Red teams think like hackers to help keep AI safe

Red teams think like hackers to help keep AI safe

By Susanna Ray

Just as AI tools such as ChatGPT and Copilot have transformed the way people work in all sorts of roles around the globe, they’ve also reshaped so-called red teams — groups of cybersecurity experts whose job is to think like hackers to help keep technology safe and secure.

Generative AI’s abilities to communicate conversationally in multiple languages, write stories and even create photorealistic images hold new potential hazards, from providing biased or inaccurate results to giving people with ill intent new ways to stir up discord. These risks spurred a novel and broad approach to how Microsoft’s AI Red Team is working to identify and reduce potential harm.

“We think security, responsible AI and the broader notion of AI safety are different facets of the same coin,” says Ram Shankar Siva Kumar, who leads Microsoft’s AI Red Team. “It’s important to get a universal, one-stop-shop look at all the risks of an AI system before it reaches the hands of a customer. Because this is an area that is going to have massive sociotechnical implications.”

This post is part of Microsoft’s Building AI Responsibly series, which explores top concerns with deploying AI and how the company is addressing them with its responsible AI practices and tools.

The term “red teaming” was coined during the Cold War, when the U.S. Defense Department conducted simulation exercises with red teams acting as the Soviets and blue teams acting as the U.S. and its allies. The cybersecurity community adopted the language a few decades ago, creating red teams to act as adversaries trying to break, corrupt or misuse technology — with the goal of finding and fixing potential harms before any problems emerged.

When Siva Kumar formed Microsoft’s AI Red Team in 2018, he followed the traditional model of pulling together cybersecurity experts to proactively probe for weaknesses, just as the company does with all its products and services.

At the same time, Forough Poursabzi was leading researchers from around the company in studies with a new and different angle from a responsible AI lens, looking at whether the generative technology could be harmful — either intentionally or due to systemic issues in models that were overlooked during training and evaluation. That’s not an element red teams have had to contend with before.

The different groups quickly realized they’d be stronger together and joined forces to create a broader red team that assesses both security and societal-harm risks alongside each other, adding a neuroscientist, a linguist, a national security specialist and numerous other experts with diverse backgrounds.

“We need a wide range of perspectives to get responsible AI red teaming done right,” says Poursabzi, a senior program manager on Microsoft’s AI Ethics and Effects in Engineering and Research (Aether) team, which taps into a whole ecosystem of responsible AI at Microsoft and looks into emergent risks and longer-term considerations with generative AI technologies.

The dedicated AI Red Team is separate from those who build the technology, and its expanded scope includes adversaries who may try to compel a system to generate hallucinations, as well as harmful, offensive or biased outputs due to inadequate or inaccurate data.

Team members assume various personas, from a creative teenager pulling a prank to a known adversary trying to steal data, to reveal blind spots and uncover risks. Team members live around the world and collectively speak 17 languages, from Flemish to Mongolian to Telugu, to help with nuanced cultural contexts and region-specific threats.

And they don’t only try to compromise systems alone; they also use large language models (LLMs) for automated attacks on other LLMs.

The group also added breadth to the depth of its expertise by releasing open-source frameworks such as Counterfit and the Python Risk Identification Toolkit for generative AI, or PyRIT, earlier this year to help security professionals and machine learning engineers outside the company map potential risks as well. The tools help expert red teamers — a limited resource — be more efficient and productive. The team also published best practices from their experiences to help others getting started.

Once Microsoft’s AI Red Team finds an issue, it sends it to the Responsible AI Measurement Team, which evaluates how much of a threat the matter might be. Then other internal experts and groups address the matter to complete the three-step approach for safe AI: mapping, measuring and managing of risks.

“Our activity encompasses a wide variety of harms we try to proof for,” Siva Kumar says. “We quickly adapt and reformulate, and that has been the recipe for our success — not to wait for the forces of change to push up, but to anticipate.”

Read our first Building AI Responsibly story about AI hallucinations.

Learn more about Microsoft’s Responsible AI work.

Lead illustration by Makeshift Studios / Rocio Galarza. Story published on July 24, 2024.

Red teams think like hackers to help keep AI safe

Tags: