Building AI that works for everyone starts with language
by Susanna Ray
A farmer takes a photo and taps a question into a phone: “What’s causing the spots on these leaves, and what should I do about them?” It’s the kind of common question people around the world now turn to AI to answer as the technology becomes a front door to information — a way to learn, decide and get help with daily tasks.
But language shapes who can walk through that door. Even with a smartphone and a speedy internet connection, whether that farmer or anyone else gets a useful answer often depends on one thing: which language they speak.
AI is powered by large language models trained on vast amounts of text to understand and generate responses. English accounts for roughly half of online content, which skews how those models learn and creates a wide disparity in who benefits from AI. That’s why Microsoft researchers and data scientists are building open tools and partnerships to expand language support so more people can use AI in their native tongue.
“AI has the greatest positive impact when it’s built for the people it’s meant to serve,” says Inbal Becker‑Reshef, managing director of Microsoft’s AI for Good Lab. “Language isn’t a nice‑to‑have, it’s what determines whether technology actually empowers communities or leaves them out.”
More than 1.2 billion people have used AI tools in the past three years, according to the 2025 AI Diffusion Report, but usage is uneven, with the rate in the Global North almost double that of the Global South. The report suggests language is a barrier in its own right, with lower adoption in places where local languages have less digital content, even after adjusting for income and connectivity.
It’s a huge imbalance. Out of more than 7,000 languages spoken worldwide, fewer than 100 have enough digital presence to significantly shape today’s large language models. That means millions of people can’t use their language to navigate AI systems.
As the technology moves deeper into everyday life, that gap can be frustrating — but it also can be consequential, limiting access to tools that are helping others improve their lives with support for work, health care, finances, education and more.
“Solving the language challenge isn’t as simple as just translating content,” says Jacki O’Neill, lab director of Microsoft Research Africa in Nairobi, a team working to expand the linguistic and cultural inclusivity of AI tools. “AI’s usefulness depends on whether it can respond to what people mean, not just what they say, and meaning lives in a community’s culture, which is deeply intertwined with its language.”
That means closing the gap requires new data, new evaluation methods and better ways to design, build and test AI models and systems in real settings, with the communities who will use them. Here are seven projects aimed at making AI more accessible around the world.
1. Project Gecko: Building AI designed for whole communities
Project Gecko is a Microsoft Research effort focused on a simple idea: AI works best when it’s designed for the people who will actually use it. Instead of treating language support as an add-on at the end, the team is building AI with local languages and local context in mind from the start.
Co-led by teams at Microsoft Research India, Microsoft Research Africa in Nairobi and the Microsoft Research Accelerator, the group’s early work in East Africa and South Asia has focused on agriculture and education, where people need practical guidance that reflects the realities of their region — not generic advice pulled from elsewhere.
The project’s research focuses on how AI can be adapted and deployed in settings where people may share devices, have limited bandwidth, speak multiple languages and switch between speech and text. It includes methods for creating data where little exists, as well as testing the tools in real settings to measure whether they’re useful, trustworthy and aligned with how people actually ask questions and use information.
Those lessons are now being shared more broadly through a set of Project Gecko playbooks, which offer practical guidance for creating AI that works across languages and cultures.
2. MMCTAgent: Finding answers in voice, images and video — not just text
A lot of important information isn’t found in text on a website. It’s in photos, diagrams and videos — including training videos where a key detail might appear for only a few seconds. But many AI tools still struggle when the right answer depends on finding a specific moment in a long video or pulling clues from a large library of images and clips.
Microsoft Research’s MMCTAgent, available on GitHub and featured in Microsoft Foundry Labs, was created to tackle that problem. Instead of a “one-shot” response, it breaks a question down into steps, uses tools to search for relevant images, frames or scenes and then pulls those pieces together into an answer it can refine along the way — even checking its own work using a built-in “critic.”
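The plan, act and critique loop described above can be sketched in a few lines of code. This is an illustrative toy, not MMCTAgent’s actual API: the tool functions, the keyword search and the critic below are all hypothetical stand-ins for the model-driven components a real agent would use.

```python
# Toy sketch of a plan/act/critique agent loop in the style MMCTAgent
# describes. All functions here are hypothetical stand-ins, not the
# real MMCTAgent implementation.

def search_frames(question, video_library):
    """Hypothetical retrieval tool: return (timestamp, caption) pairs
    whose caption shares a substantive word with the question."""
    words = [w for w in question.lower().split() if len(w) > 3]
    return [(t, cap) for t, cap in video_library
            if any(w in cap.lower() for w in words)]

def draft_answer(evidence):
    """Hypothetical composition tool: build an answer from evidence."""
    if not evidence:
        return None
    refs = "; ".join(f"{cap} (at {t}s)" for t, cap in evidence)
    return f"Based on the videos: {refs}"

def critic(answer, evidence):
    """Hypothetical critic: accept only answers grounded in a clip."""
    return answer is not None and len(evidence) > 0

def answer_question(question, video_library, max_rounds=3):
    for _ in range(max_rounds):
        evidence = search_frames(question, video_library)
        answer = draft_answer(evidence)
        if critic(answer, evidence):          # refine until grounded
            return answer
        question += " crops"                  # toy query-refinement step
    return "No grounded answer found."

library = [(42, "Treating leaf spot on maize"),
           (10, "Preparing seedbeds")]
print(answer_question("What causes spots on maize leaves?", library))
```

In a real agent the search, drafting and critique steps would each be backed by a model and video-understanding tools; the point of the loop is that the agent keeps retrieving and revising until its own critic accepts the answer.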
One place this matters is FarmerChat, a tool from Digital Green used to help agricultural extension workers support farmers. In a video-enabled prototype, MMCTAgent helps FarmerChat search and draw from a collection of local farming videos, so answers are based on guidance created within the farmers’ own communities by people who know the crops, conditions and languages firsthand.
3. Paza: Speech recognition that works in more languages and accents
For many people, talking into a smartphone is the easiest way to use it. But speech recognition can struggle when a language has little training data, or when someone’s accent doesn’t match the one a system was trained on.
Microsoft Research’s Paza effort tackles that problem by both improving the technology and making it easier to measure progress. It includes PazaBench, an automatic speech recognition (ASR) scoreboard that measures how well different speech-to-text models handle the same set of recordings. That makes progress easier to compare — especially for languages with little online presence — so researchers can see what works, what doesn’t and where gaps remain. It started this year with 39 African languages and 51 models and is the first ASR benchmark of its kind.
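A scoreboard like this ultimately rests on a metric such as word error rate (WER): the word-level edit distance between a model’s transcript and a human reference, divided by the reference length. WER is a standard ASR metric; the sketch below is generic and not PazaBench’s actual code.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Compare two hypothetical transcripts of the same recording.
reference = "paza sauti yako"
print(wer(reference, "paza sauti yako"))  # 0.0: perfect transcript
print(wer(reference, "pasa sauti"))       # one substitution plus one deletion
```

Running the same recordings through many models and comparing scores like these is what makes a benchmark useful: lower WER means a better transcript, and languages where every model scores poorly are exactly where the data gaps are.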
Beyond PazaBench, the team developed fine-tuned speech models for six Kenyan languages — Swahili, Dholuo, Kalenjin, Kikuyu, Maasai and Somali — that were tested for performance in real-world conditions, including trials on everyday mobile devices with farmers in the field.
The project grew out of fieldwork with Project Gecko, where teams saw how often speech tools fail in real settings and how that can limit access when local languages are primarily spoken rather than written. The name comes from the Swahili phrase “paza sauti,” meaning “to project” or “to raise your voice.”
4. LINGUA: Funding open datasets so more languages show up in AI
If AI is going to work well in more languages, it needs something basic: data. Languages that are less common online don’t have enough openly available text and speech for AI systems to learn from or be tested against.
That’s what the Microsoft AI for Good Lab’s LINGUA project addresses.
In Europe, the program supports 11 organizations that are collecting and sharing new datasets for underrepresented European languages — resources that others can use to build and improve language tools. The projects selected for grants span 16 languages and dialects across 10 countries, including Basque, Icelandic and Ukrainian, spoken by more than 65 million people in total.
Microsoft is building on that approach with LINGUA Africa, working with the Masakhane African Languages Hub and the Gates Foundation to award $5.5 million in grants for AI projects focused on developing African language models in areas like education, food security, health and government services.
5. Bring Your Own Language: A repeatable “recipe” for adding more languages to AI
Even when researchers improve AI in one language, the bigger challenge is doing it again and again for many others — especially languages with little online text to learn from.
Bring Your Own Language (BYOL) is a framework from Microsoft’s AI for Good Lab that lays out a practical way to do that. It starts by sizing up how much digital material a language has, sorting languages into tiers and then choosing an approach that fits. That makes the work repeatable, so more communities can build tools that account for local nuances and needs.
For languages with some data, like Chichewa and Māori, BYOL focuses on teaching the model the language directly by cleaning and expanding what’s available and then fine-tuning it so it handles that language better. The team saw a 12% improvement in accuracy with this method.
For languages with very little data, such as Inuktitut, BYOL explores using translation as a bridge. Someone can ask a question in their language, the system translates it into a language the model already handles well, generates an answer, then translates the answer back. That can offer access sooner, when building a full model from scratch in the original language isn’t realistic yet.
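That translation bridge is a simple three-step pipeline: translate the question into a high-resource language, answer it there, then translate the answer back. The sketch below uses a toy dictionary and stub functions in place of real translation and language models; every function and the `"xx"` language code are hypothetical, for illustration only.

```python
# Toy sketch of a translation-bridge pipeline in the spirit of BYOL.
# translate() and answer_in_english() are hypothetical stubs; a real
# system would call machine-translation and LLM services here.

TINY_DICTIONARY = {  # toy word-for-word lookup, illustration only
    ("xx", "en"): {"qanuq": "how", "silaup": "weather"},
    ("en", "xx"): {"how": "qanuq", "weather": "silaup"},
}

def translate(text, src, dst):
    table = TINY_DICTIONARY[(src, dst)]
    return " ".join(table.get(w, w) for w in text.split())

def answer_in_english(question):
    # Stub for a model that is strong in the pivot language.
    return "check the weather" if "weather" in question else "no answer"

def bridge_answer(question, lang="xx"):
    english_q = translate(question, lang, "en")  # low-resource -> English
    english_a = answer_in_english(english_q)     # answer in pivot language
    return translate(english_a, "en", lang)      # English -> low-resource

print(bridge_answer("qanuq silaup"))
```

The trade-off is that translation errors compound across the two hops, which is why the bridge is framed as a way to offer access sooner rather than a substitute for eventually training models in the language itself.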
6. ASHABot: A practical AI helper for India’s frontline health workers
In many rural parts of India, frontline health workers known as Accredited Social Health Activists, or ASHAs, are the bridge between households and the health system. They handle everything from newborn and child health to immunizations and family planning — often with only basic training and limited access to supervisors.
ASHABot is a WhatsApp-based chatbot that’s helping them get quick, dependable, on-demand support, even in places where a mobile signal can be unreliable. The tool was built by the nonprofit Khushi Baby using generative AI technology developed by Microsoft Research.
It’s connected to a trusted knowledge base that includes India’s public health manuals, and it can understand questions in Hindi, English or even a mix of the two. It can give responses by text or voice, allowing ASHAs to play the recommendations aloud for a patient to hear.
7. African Health Stories: Personalizing diabetes guidance
Generic health advice can be hard to follow anywhere, but it can be even more difficult when it doesn’t match a person’s food, culture, language or daily realities. African Health Stories is using generative AI to create personalized stories aimed at helping people living with Type 2 diabetes make realistic lifestyle changes.
The project is starting in South Africa, which has a higher rate of the disease than the global average. Microsoft Research is developing it with Stellenbosch University, the University of Pretoria, Swansea University and a team of medical experts.
Patients and clinicians can generate stories that answer questions and offer guidance on living with diabetes, grounded in their language, culture and everyday realities — such as diet and exercise choices shaped by local food options and routines. The stories can be shared through text, images or speech, and they adapt through interaction to tailor advice to individual needs and preferences.
Susanna Ray writes about AI and technology, with stories that show its real‑world impact and examine how innovation is reshaping work, business and society. She previously reported for Bloomberg News and other major international news organizations in the U.S. and abroad, covering beats ranging from politics and government to business and aviation. Follow her work on Microsoft Source.
All photos and videos from Getty Images. Credits from top to bottom: Sergeyxsp, Kevin Fleming, Dimple Bhati, DarioGaona, THEGIFT777, ImagesBazaar, Nattrass, Somnuk Krobkum, Tuul and Bruno Morandi, Bruce Mounde and 500px, Danm, Arctic-Images, Laviejasirena, Raimund Linke, Chameleonseye, Mayur Kakade, CR Shelare, Richard T. Nowitz and Martin Harvey.
This story was published on April 7, 2026.