At Microsoft, we believe that for India to become truly digital, we need to make technology accessible and productive for all, irrespective of the language they speak or read. To break the language barrier, we started working with Indian languages two decades ago and launched Project Bhasha in 1998 to accelerate computing in Indian languages. We have come a long way since then – supporting text input in all 22 constitutionally recognized Indian languages across our products, and Windows interface support in 12 languages. Bhashaindia.com, our portal that provides computing tools for Indic languages, receives an average of 40 million hits every year.
On India’s 69th Republic Day, we are taking one more step towards making technology work for India. We’re excited to announce that we’re bringing the power of Artificial Intelligence and Deep Neural Networks to improve real-time language translation for Hindi, Bengali, and Tamil. With Deep Neural Networks-powered language translation, the results are more accurate and sound more natural.
We think of language translation not just as an add-on service but as a core part of our products and services for our users, partners and customers. Users can benefit from Deep Neural Networks-enhanced Indian language translation while browsing any website in the Microsoft Edge browser, on Bing search, on the Bing Translator website, as well as in Microsoft Office 365 products like Word, Excel, PowerPoint, Outlook, and Skype. The Microsoft Translator app on Android and iOS can recognize and translate languages from text, speech and even photos. For our partners and customers, we also provide APIs on Azure that they can use in their products.
“We’re committed to empower every Indian and every business in India by bringing the power of AI into their daily life and become a driving force for Digital India. Microsoft celebrates the diversity of languages in India and wants to make the vast internet even more accessible. We have supported Indian languages in computing for over two decades, and more recently have made significant strides on voice based access and machine translation across languages. Today’s launch is a testament of our quest to bring cutting edge machine learning tech to democratize access to information for everyone in India,” said Sundar Srinivasan, General Manager – AI & Research, Microsoft India.
Bringing Deep Neural Networks to language translation
Since the early 2000s, we’ve been pioneering the traditional Statistical Machine Translation (SMT) paradigm to translate global as well as Indian languages. Deep Neural Networks have now been incorporated into the translation of complex Indian languages to bring more accuracy and fluency to the results.
While SMT is limited to translating a word within the local context of a few surrounding words, Deep Neural Networks operate differently: they can encode more granular concepts like gender (feminine, masculine, neutral), politeness level (slang, casual, written, formal), and type of word (verb, noun, adjective). At their core, Deep Neural Networks are inspired by theories about how pattern recognition works in the brains of multilingual humans, which leads to more natural-sounding translations.
Our conversation translation comes equipped with a companion Deep Neural Networks-based system called TrueText, which filters out repetitions, pauses, and filler words, making the translation more contextually appropriate.
However, to apply Deep Neural Networks to language translation, the system needs to be trained using professionally translated documents. This allows the system to learn how words and phrases in one language are represented in another. And that was a big challenge when it came to Indian languages.
Challenges in Indian language translations
Training Deep Neural Networks involves feeding them massive amounts of high-quality data. For accurate translations, the system needs millions of parallel sentences for each language pair, across all combinations of languages. However, Indian languages, which span Dravidian and Indo-Aryan subdivisions, are complicated. The complexity increases when translating for India, where 29 different states have 22 official languages.
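To give a sense of how quickly the number of language pairs grows, here is a minimal Python sketch that counts ordered (source, target) pairs. It is only an illustration of the arithmetic: the function name `directed_pairs` is hypothetical, and the sketch assumes every pair would be trained directly in both directions, which this article does not state.

```python
from itertools import permutations


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# ISO 639-1 codes for the three languages named in this announcement.
launch_languages = ["hi", "bn", "ta"]  # Hindi, Bengali, Tamil

pairs = directed_pairs(launch_languages)
print(len(pairs))  # 3 * 2 = 6 ordered pairs

# For n languages the count is n * (n - 1); with all 22
# constitutionally recognized languages that is 22 * 21 = 462.
n = 22
print(n * (n - 1))
```

Each of those pairs would need its own large set of parallel sentences under this assumption, which is why scarce training data is such a constraint.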
Adding to the challenges was the dearth of digital content in Indian languages that could be pulled from the internet to train the neural networks. Even where digital content is available, much of it cannot be used because it doesn’t follow a standard encoding such as Unicode.
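As a rough illustration of the encoding problem, the Python sketch below checks whether a piece of text is mostly made up of characters from the Unicode block for the expected script. This is not Microsoft's data pipeline; the helper names, the 0.8 threshold, and the ASCII sample standing in for legacy font-encoded text are all assumptions made for the example. Only the Unicode block ranges are standard.

```python
# Unicode block ranges for the scripts of the three launch languages.
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "bn": (0x0980, 0x09FF),  # Bengali
    "ta": (0x0B80, 0x0BFF),  # Tamil
}


def script_ratio(text, lang):
    """Fraction of non-whitespace characters inside the script's Unicode block."""
    low, high = SCRIPT_RANGES[lang]
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for ch in chars if low <= ord(ch) <= high)
    return in_block / len(chars)


def looks_unicode_encoded(text, lang, threshold=0.8):
    """Heuristic: treat text as usable if most characters are in the block.

    The threshold is an arbitrary assumption; punctuation and digits
    outside the block lower the ratio, so it is kept below 1.0.
    """
    return script_ratio(text, lang) >= threshold


unicode_sample = "भारत"   # Hindi text stored as Unicode Devanagari
legacy_sample = "Hkkjr"   # illustrative ASCII bytes, as a legacy font encoding might store text

print(looks_unicode_encoded(unicode_sample, "hi"))  # True
print(looks_unicode_encoded(legacy_sample, "hi"))   # False
```

Text stored in a legacy font encoding only looks like Hindi when rendered with that font; underneath it is Latin characters, so a check like this would reject it as training data.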
“Six Indian languages are part of top 20 global languages by population. Ironically, these languages are not on top of the digital content list. There’s not enough material on the internet that we could use to train the system,” explains Krishna Doss Mohan, Senior Program Manager, Microsoft India, who is part of the team that works on Indian languages.
Despite the obstacles, Deep Neural Networks-powered translation systems have shown significant improvement in both automatic and human evaluation metrics. More specifically, we have seen an improvement of at least 20% in translation quality for all Indic languages currently supported by Microsoft.
Empowering people across the spectrum
Our success in Deep Neural Networks-powered translation for Indian languages can have a game-changing impact on businesses and society. “In India, about 12% of the people can speak, read, and write English, even though it’s not their first language,” says Mohan. “There are 600 million literate people, who aren’t necessarily proficient in English and prefer to consume information in their mother tongue.”
Photos by Rajesh Cheemalakonda