Announcing new tools in Azure AI to help you build more secure and trustworthy generative AI applications

März 28, 2024

In the rapidly evolving landscape of generative AI, business leaders are trying to strike the right balance between innovation and risk management. Prompt injection attacks have emerged as a significant challenge, where malicious actors try to manipulate an AI system into doing something outside its intended purpose, such as producing harmful content or exfiltrating confidential data. In addition to mitigating these security risks, organizations are also concerned about quality and reliability. They want to ensure that their AI systems are not generating errors or adding information that isn’t substantiated in the application’s data sources, which can erode user trust.

To help customers meet these AI quality and safety challenges, we’re announcing new tools now available or coming soon to Azure AI Studio for generative AI app developers:

Prompt Shields to detect and block prompt injection attacks, including a new model for identifying indirect prompt attacks before they impact your model, coming soon and now available in preview in Azure AI Content Safety.

Groundedness detection to detect “hallucinations” in model outputs, coming soon.

Safety system messages to steer your model’s behavior toward safe, responsible outputs, coming soon.

Safety evaluations to assess an application’s vulnerability to jailbreak attacks and to generating content risks, now available in preview.

Risk and safety monitoring to understand what model inputs, outputs, and end users are triggering content filters to inform mitigations, coming soon, and now available in preview in Azure OpenAI Service.

With these additions, Azure AI continues to provide our customers with innovative technologies to safeguard their applications across the generative AI lifecycle.

Safeguard your LLMs against prompt injection attacks with Prompt Shields

A colorful illustration of red and blue cubes floating through a purple computer screen.

Prompt injection attacks, both direct attacks, known as jailbreaks, and indirect attacks, are emerging as significant threats to foundation model safety and security. Successful attacks that bypass an AI system’s safety mitigations can have severe consequences, such as personally identifiable information (PII) and intellectual property (IP) leakage.

A colorful illustration of red and blue cubes floating towards a purple computer screen. The red cubes are blocked by a shield in front of the computer screen.

To combat these threats, Microsoft has introduced Prompt Shields to detect suspicious inputs in real time and block them before they reach the foundation model. This proactive approach safeguards the integrity of large language model (LLM) systems and user interactions.

Prompt Shield for Jailbreak Attacks: Jailbreak, direct prompt attacks, or user prompt injection attacks, refer to users manipulating prompts to inject harmful inputs into LLMs to distort actions and outputs. An example of a jailbreak command is a ‘DAN’ (Do Anything Now) attack, which can trick the LLM into inappropriate content generation or ignoring system-imposed restrictions. Our Prompt Shield for jailbreak attacks, released this past November as ‘jailbreak risk detection’, detects these attacks by analyzing prompts for malicious instructions and blocks their execution.

Prompt Shield for Indirect Attacks: Indirect prompt injection attacks, although not as well-known as jailbreak attacks, present a unique challenge and threat. In these covert attacks, hackers aim to manipulate AI systems indirectly by altering input data, such as websites, emails, or uploaded documents. This allows hackers to trick the foundation model into performing unauthorized actions without directly tampering with the prompt or LLM. The consequences of which can lead to account takeover, defamatory or harassing content, and other malicious actions. To combat this, we’re introducing a Prompt Shield for indirect attacks, designed to detect and block these hidden attacks to support the security and integrity of your generative AI applications.

Identify LLM Hallucinations with Groundedness detection

‘Hallucinations’ in generative AI refer to instances when a model confidently generates outputs that misalign with common sense or lack grounding data. This issue can manifest in different ways, ranging from minor inaccuracies to starkly false outputs. Identifying hallucinations is crucial for enhancing the quality and trustworthiness of generative AI systems. Today, Microsoft is announcing Groundedness detection, a new feature designed to identify text-based hallucinations. This feature detects ‘ungrounded material’ in text to support the quality of LLM outputs.

Steer your application with an effective safety system message

In addition to adding safety systems like Azure AI Content Safety, prompt engineering is one of the most powerful and popular ways to improve the reliability of a generative AI system. Today, Azure AI enables users to ground foundation models on trusted data sources and build system messages that guide the optimal use of that grounding data and overall behavior (do this, not that). At Microsoft, we have found that even small changes to a system message can have a significant impact on an application’s quality and safety. To help customers build effective system messages, we’ll soon provide safety system message templates directly in the Azure AI Studio and Azure OpenAI Service playgrounds by default. Developed by Microsoft Research to mitigate harmful content generation and misuse, these templates can help developers start building high-quality applications in less time.

Evaluate your LLM application for risks and safety

A colorful illustration of a blue screen with three bar charts, each with text descriptions beneath them. One bar chart is highlighted and shows two short bars in a blue color and one tall bar in a red color.

How do you know if your application and mitigations are working as intended? Today, many organizations lack the resources to stress test their generative AI applications so they can confidently progress from prototype to production. First, it can be challenging to build a high-quality test dataset that reflects a range of new and emerging risks, such as jailbreak attacks. Even with quality data, evaluations can be a complex and manual process, and development teams may find it difficult to interpret the results to inform effective mitigations.

Azure AI Studio provides robust, automated evaluations to help organizations systematically assess and improve their generative AI applications before deploying to production. While we currently support pre-built quality evaluation metrics such as groundedness, relevance, and fluency, today we’re announcing automated evaluations for new risk and safety metrics. These safety evaluations measure an application’s susceptibility to jailbreak attempts and to producing violent, sexual, self-harm-related, and hateful and unfair content. They also provide natural language explanations for evaluation results to help inform appropriate mitigations. Developers can evaluate an application using their own test dataset or simply generate a high-quality test dataset using adversarial prompt templates developed by Microsoft Research. With this capability, Azure AI Studio can also help augment and accelerate manual red-teaming efforts by enabling red teams to generate and automate adversarial prompts at scale.

Monitor your Azure OpenAI Service deployments for risks and safety in production

A colorful illustration of a blue computer screen with different kinds of charts in a dashboard. A blue line graph is highlighted, with a peak on the y-axis in red.

Monitoring generative AI models in production is an essential part of the AI lifecycle. Today we are pleased to announce risk and safety monitoring in Azure OpenAI Service. Now, developers can visualize the volume, severity, and category of user inputs and model outputs that were blocked by their Azure OpenAI Service content filters and blocklists over time. In addition to content-level monitoring and insights, we are introducing reporting for potential abuse at the user level. Now, enterprise customers have greater visibility into trends where end-users continuously send risky or harmful requests to an Azure OpenAI Service model. If content from a user is flagged as harmful by a customer’s pre-configured content filters or blocklists, the service will use contextual signals to determine whether the user’s behavior qualifies as abuse of the AI system. With these new monitoring capabilities, organizations can better-understand trends in application and user behavior and apply those insights to adjust content filter configurations, blocklists, and overall application design.

Confidently scale the next generation of safe, responsible AI applications

Generative AI can be a force multiplier for every department, company, and industry. Azure AI customers are using this technology to operate more efficiently, improve customer experience, and build new pathways for innovation and growth. At the same time, foundation models introduce new challenges for security and safety that require novel mitigations and continuous learning.

At Microsoft, whether we are working on traditional machine learning or cutting-edge AI technologies, we ground our research, policy, and engineering efforts in our AI principles. We’ve built our Azure AI portfolio to help developers embed critical responsible AI practices directly into the AI development lifecycle. In this way, Azure AI provides a consistent, scalable platform for responsible innovation for our first-party copilots and for the thousands of customers building their own game-changing solutions with Azure AI. We’re excited to continue collaborating with customers and partners on novel ways to mitigate, evaluate, and monitor risks and help every organization realize their goals with generative AI with confidence.

Learn more about today’s announcements

Get started in Azure AI Studio.
Dig deeper with technical blogs on Tech Community: