Village by village, creating the building blocks for AI tools with work that also educates 

KHARADI, Maharashtra, India – At 10:30 p.m., after a long day of work, Baby Rajaram Bokale has one more task to complete before she sleeps. 

She settles cross-legged on her bed. In one corner, an elaborate shrine to the Hindu deity Krishna glows with colorful strings of lights. A portrait of her late husband, with a full, gray mustache and a direct gaze, hangs above the bed.

She opens an app on her smartphone and, in her clear, resonant voice, begins to read a story aloud in her native tongue, Marathi, the language of Maharashtra state. She lives in Kharadi, a bustling suburban neighborhood in the city of Pune.

Bokale’s voice, among others, will be used to train AI models in Marathi. But at the same time, she’s learning valuable lessons for herself – in this case about personal finance. The story she is reading is designed to deliver practical information in an entertaining way – about how banks work, how to save and how to avoid scammers and fraud.

“Now I’m able to do more interesting things with my smartphone,” she says. She learned to pay for items with India’s UPI payment system. She also learned how to use the phone for banking, among other things.

Baby Rajaram Bokale reads a story in Marathi into the Karya app on her phone. Photo by Chris Welsch for Microsoft.
Baby Rajaram Bokale using the Karya app to read and write in Marathi, the language of Maharashtra state. Photo by Chris Welsch for Microsoft. 

Bokale is working for a social impact organization called Karya, Sanskrit for “work that gives you dignity,” which describes itself as the “world’s foremost ethical data company.”

“Earn, learn and grow” is the mantra of Karya, which wants to revolutionize the way datasets are created in India and elsewhere. The group’s goal is to lift as many people out of poverty as possible while giving them the tools to thrive in the modern digital economy. At the same time, Karya is building high-quality and ethical datasets with an unconventional workforce. 

Those datasets are valuable. While about 80 million people speak Marathi, it’s not well-represented in the digital world. In India, if you don’t speak Hindi or English, it can be difficult to access technology that helps people thrive – apps, tools and digital assistants that English and Hindi speakers take for granted. The fact that hundreds of millions of potential customers could benefit from those technologies is why Microsoft and others are in a race to make their products available in those “under-resourced” languages. 

“I’m really proud that my voice is getting recorded, and someone is about to learn Marathi thanks to my voice,” says Bokale, who is 53, “and also proud that it will make these tools and features available in Marathi.” 

She runs a small business grinding spices and chili peppers out of her home. “I used what I earned to buy a part and repair my grinder,” she says. “That’s money I wouldn’t normally have.”

Karya: Creating high-quality data and alleviating poverty 

Karya creates datasets in several Indian languages to train AI models and for research while creating jobs for Indians, mainly in rural areas. 

Karya got its start as a Microsoft Research project in Bengaluru in 2017. 

Over time, it became clear that Karya had enormous potential, both as a creator of high-quality language datasets in India’s many languages and as a way to help lift rural Indians out of poverty with education and income. The project was spun off in 2021 as an organization independent of Microsoft. Its entire operation, including the app that workers use to record and write in their native languages, is built on Microsoft Azure and uses Azure OpenAI Service, as well as Azure AI Cognitive Services to validate its data. Microsoft is one of its major clients.

Karya pays workers like Bokale about $5 USD an hour, far above the minimum wage in India. Over 11 days, Bokale worked about five hours and earned 2,000 rupees, or about $25 USD. The work is engaging and educational (hence the “learn”), and continued support is intended to help Karya workers prosper with the knowledge they’ve gained. Further, if the data created by Karya is resold, the workers receive royalties. 

Karya’s founders have ambitious goals. The organization is partnering with more than 200 other nonprofits and aims to reach 100 million people by 2030. It hopes the data it gathers will serve as the basis for tools that will later serve these same people in their own languages. Karya is attempting to gather and process the datasets in ways that mitigate bias based on gender and other factors, which is one of the reasons it is reaching out to diverse groups of people to build more inclusive data.

Manu Chopra, 27, is one of the founders of the company and its CEO. He says the enormous demand for datasets in underserved languages, combined with the fact that 78 percent of rural Indians have access to a smartphone, presents a major opportunity. Karya is set up to funnel most of its profits into the hands of its workers, retaining enough to support its staff and do more research.

Manu Chopra, the CEO of Karya, in the Kharadi neighborhood of Pune, India. Photo by Chris Welsch for Microsoft.

“Let’s say the world is going to spend a trillion dollars on building AI,” Chopra says. “So over the next 20 years, what percentage of that can I bring directly into the wallets of people who need it the most? We really think that rural India can be an excellent builder of AI, but also an excellent recipient of AI technologies.” 

Bokale is among the more than 30,000 people who have so far worked for Karya in towns and villages across 24 of India’s 28 states. 

Making technology accessible in under-resourced languages 

AI tools like OpenAI’s ChatGPT and Microsoft’s Copilot work well in English because of the abundance of written and audio material on the internet in the language. India, a country of 1.4 billion, has 22 official languages, hundreds of other languages and thousands of dialects. About 60 percent of Indians speak Hindi and about 10 percent speak English, leaving hundreds of millions of people without digital tools that can help them thrive in the modern world. 

“I think we want to rectify that most of the internet being in English is not a very good place to start,” says Kalika Bali, a language technologist and researcher at the Microsoft Research Lab in Bengaluru. She uses data collected by Karya for her research. 

“People need to be part of the growth in the digital economy that’s spreading everywhere. No one should be excluded from using technology because of their language,” she says.

Kalika Bali, a language technologist and researcher at the Microsoft Research Lab in Bengaluru, India. Photo by Chris Welsch for Microsoft.

“At Microsoft, we say we want to empower the entire planet, right? And more than half the world’s population uses languages other than English.” 

Bali says that AI has greatly sped up both language preservation and the incorporation of languages into large language models (LLMs). This is useful not only for creating online and AI tools but also for preserving rare or dying languages.

“Now we can create these copilot kinds of things really quickly,” she says. “Previously when we were talking about language preservation, we were talking about efforts that took place over decades, literally. … All of that can now be shortened to months.” 

Karya, which says it is on pace to engage with more than 100,000 workers by the end of 2024, seeks participants who need work and education the most – often women in rural areas. In addition to a premium wage, it offers training and other kinds of support when the work is done.

‘Technology can really, really help amplify people’s desires’ 

Chopra grew up in a “basti” – an informal settlement – in Delhi and says that the inequities he witnessed growing up had a profound effect on his sense of purpose while studying computer science with a focus on AI at Stanford University in California. 

“When I moved back to India, the first thing I realized was, everywhere I went, people had the intent or the will to get out of poverty, everyone works really hard, everyone has aspiration,” he says. “And they have the capacity to learn new skills. And if those two things exist, technology can really, really help amplify people’s desires, to make something of themselves.” 

The work Bokale did over the course of 11 days was part of a pilot project to test whether the work of inputting data could be combined with the learning of useful information. While earning what was for them a substantial amount of money, the workers would also learn about the financial tools they need to make the best use of it.

The material was presented as a serialized story about two sisters, and it was this story that the workers read aloud into their smartphones to capture the sounds and rhythms of spoken Marathi. “We really enjoyed the story,” Bokale says. “And in that story, there were common people who are working hard every day. The money they earned would be easily spent, there were no savings. In short, the question was how to save.”

Safiya Husain, Karya’s chief impact officer, says that the story format proved a success, and that many of the participants read the story out loud to their families and friends.

Safiya Husain, the chief impact officer of Karya. Photo by Chris Welsch for Microsoft.

“They would say, ‘I’m going to do this work and read the story to you,’” Husain says. “And they would actually get excited and wonder, ‘Oh, what is happening next? Will she get her loan? Or will she have enough money to pay for the wedding?’”

She says that by combining work with education, Karya was trying to treat its workers with respect and create outcomes beyond income that are meaningful. “We were paying people for their time, and we were saying what they were doing was valuable,” she says. “It wasn’t just, here’s a lesson to learn in your spare time.” 

Husain says she hopes that eventually many of the Karya workers will join the organization in different roles, working as organizers and local administrators. In the big picture, she says, the aim is to put technology to work for everyone. 

“When we’re collecting data in these languages like Marathi, we’re trying to make sure that these communities and these populations, which have millions and millions of speakers, are not being left behind in the technology revolution,” she says. 

Engaging whole communities in the project 

Kalika Bali, the Microsoft researcher, says one of the keys to the success of Karya is that it strives to engage whole communities in the project. Most of Karya’s workers are women, and she says they have more “circles of trust” to cross than men. 

“The men only need to ask two things: Will this work for me, and will I get paid?” she says. “Women have to ask: Will my family accept it? Will this bring a bad name to my family and myself by doing this? Is this going to harm me in some way? Only then does it come to the platform and the money.”

“The advantage with Karya is that it has created a lot of trust on the ground. They are really engaged with the communities they’re in,” she says. 

In her neighborhood in Pune, Bokale is a familiar figure, universally known as Baby Tai; “tai” means “elder sister.” She runs an informal financial network with several dozen other women who pool savings monthly and take turns drawing a larger sum to use for things like starting a small business or paying school fees. Women often show up on her tree-shaded patio to talk business or just hang out. Her chili and spice grinding equipment is in a tin shed on one side of the small yard.

From left, Parvati Kemble, Surekha Sanjay Gaikwad and Baby Rajaram Bokale discussing their self-help banking group in the Kharadi neighborhood of Pune, India. Photo by Chris Welsch for Microsoft.

Surekha Sanjay Gaikwad, 51, is one of her neighbors and friends. She runs a small grocery store about a half hour from her home. She also reads Marathi into her phone for Karya. Sitting with Bokale on her front steps, she burst into a wide grin when asked what she liked about the experience. 

“I couldn’t believe I could do it at home,” she says. “I don’t have to get on a bus again or go anywhere else at the end of the day.” 

The education component of the work was a plus, Gaikwad says. She learned how to create a fixed deposit at the bank, and she did just that as a way to save more effectively for her son’s college studies.

Over the course of a recent morning, several other women who had worked for Karya stopped at Bokale’s home to chat. Meena Jadhav, 55, had used the money to buy material and sewing tools for her tailoring business – she made shirts to sell. Thanks to what she learned, she said, she can now use a savings account and knows how to use an ATM. She didn’t know you could withdraw and deposit money without going to the bank. 

Another woman used the lessons she learned and the cash she earned to start a savings account for her daughter’s education. 

They all said they enjoyed the work and found the information about financial planning and online tools useful. An added benefit for the women, Bokale says, was learning that their smartphones could open doors to other kinds of opportunities. 

She says many of the other women in the pilot project didn’t know how to use a smartphone at all beforehand. “Their husbands and in-laws, they’re saying ‘Oh, you’ve learned so many new things, and that’s so great.’”

Top image: Baby Rajaram Bokale with some of the women in her informal investment group. By recording and writing Marathi on their smartphones, they helped create datasets that will be used to build AI language models.