From search to translation, AI research is improving Microsoft products

Until recently, a multinational company looking to help customers around the world book international travel would have had to build separate chatbots from scratch to converse in French, Hindi, Japanese or other languages.

But thanks to artificial intelligence research breakthroughs that have enabled algorithms to more accurately parse nuances in the way different languages express concepts or structure sentences, it is now possible to build a single bot and use Microsoft Translator to translate questions and answers accurately enough for use in multiple countries.

Over the past few years, Microsoft deep learning researchers were the first to achieve human parity milestones in developing algorithms that could perform about as well as a person on research benchmarks testing conversational speech recognition, reading comprehension, translation of news articles and other challenging language understanding tasks. Now, the benefits of those AI research breakthroughs are making their way into products from Azure to Bing.

Search engineers are borrowing lessons from Microsoft AI researchers who developed a new deep neural network model that can learn from multiple natural language understanding tasks at once. They’ve applied those lessons to improve answers to questions and captions in Bing search results and question answering in corporate SharePoint sites. A new AI model that performed well on a recent speaker recognition challenge to recognize speakers from real life speech is being incorporated into Azure’s Speaker Recognition Cognitive Service.

“It’s really only been the recent introduction of new deep learning models that has allowed language understanding to dramatically improve,” said Eric Boyd, Microsoft corporate vice president, Azure AI. “The types of things we’re now able to do in our products because of these research breakthroughs are things that previously didn’t work as well or that we just generally couldn’t do.”

Across the company, he can point to examples of Azure AI products that grew from researchers solving challenges that also turned out to be useful for customers, such as Azure’s automated machine learning capabilities that vastly simplify the model building process or Azure’s Personalizer Cognitive Service that easily delivers relevant content to users. The latter reinforcement learning model was initially developed by researchers, proved out internally and eventually built into a product for Azure cloud customers, he said.

“This space moves so quickly that you really need to tap into the latest thinking, and it’s a very privileged position to have Microsoft Research’s vast army of super talented people pushing the envelope in all these different ways,” Boyd said. “So our work is really to figure out the most interesting places where we can apply that to our products and, on the other side, to also give them guidance on what would really make the biggest differences to us.”

In the translation field, for instance, Microsoft researchers in 2018 were the first to demonstrate that AI could match human performance in translating news articles from Chinese to English on a commonly used test set. As soon as the team achieved that historic research milestone, they began adapting the model to work in Microsoft Translator, which powers an Azure Cognitive Service that has to work instantaneously and translate a wide variety of texts ranging from historical research documents to travel websites and production manuals.

The resulting product improvements were rolled out in the first nine language pairs in June, for translation to and from English, and in eight new languages in November. For example, English to French translations have improved by 9 percent, English to Hindi by 9 percent, Bengali to English by 11 percent, Urdu to English by 15 percent and English to Korean by 22 percent. Even previously strong models such as Portuguese and Swedish have seen significant quality gains.

In one example, the improved machine translation model accurately translates a sentence from French as: “Arsenal manager Arsene Wenger believes ‘the signs are promising’ for his three injured midfielders who are due to recover for Sunday’s game against Chelsea.” The previous model translated it this way: “Arsenal’s Director Arsene Wenger thinks ‘signs are promising’ for his three wounded terrain backgrounds that need to be plumb for the game against Chelsea on Sunday.”

With these kinds of improvements, it’s much more feasible to take, for example, a human resource document that’s written in one language, use machine translation to convert it to another and simply post the document without additional editing, said Microsoft distinguished engineer Arul Menezes, founder of Microsoft Translator. It is also more feasible for an engineer working in a factory with a broken piece of equipment to communicate with an expert in the home office who speaks a different language.

“We are really getting to the point where automatic translation just works, and a lot of customers are using it for new applications they never thought were possible before,” Menezes said.

Example 1
Original sentence: Le directeur d’Arsenal Arsene Wenger pense que ‘les signes sont prometteurs’ pour ses trois milieux de terrains blessés qui doivent être remis d’aplomb pour le match contre Chelsea dimanche.
Human translation: Arsenal manager Arsene Wenger believes the ‘signs look quite good’ for his three injured midfielders as they face a race to be fit for the Chelsea game on Sunday.
Previous machine translation: Arsenal’s Director Arsene Wenger thinks ‘signs are promising’ for his three wounded terrain backgrounds that need to be plumb for the game against Chelsea on Sunday.
Improved machine translation with new AI model: Arsenal manager Arsene Wenger believes ‘the signs are promising’ for his three injured midfielders who are due to recover for Sunday’s game against Chelsea.

Example 2
Original sentence: Geld könnten Verbraucher unter anderem durch Frühbucherrabatte oder All-Inclusive-Angebote sparen, erklärte Laepple.
Human translation: Consumers could save money through early-bird or all-inclusive discounts, among others, Laepple said.
Previous machine translation: Money could save consumers through early bird discounts or all-inclusive deals, among other things, Laepple explained.
Improved machine translation with new AI model: Consumers could save money through early booking discounts or all-inclusive deals, Laepple said.

Microsoft researchers developed a new AI model that has boosted the accuracy of Microsoft Translator, which powers an Azure Cognitive Service, as shown in these before-and-after examples.

The evolution from research to product

It’s one thing for a Microsoft researcher to use all the available bells and whistles, plus Azure’s powerful computing infrastructure, to develop an AI-based machine translation model that can perform as well as a person on a narrow research benchmark with lots of data. It’s quite another to make that model work in a commercial product.

To tackle the human parity challenge, three research teams used deep neural networks and applied other cutting-edge training techniques that mimic the way people might approach a problem to provide more fluent and accurate translations. Those included translating sentences back and forth between English and Chinese and comparing results, as well as repeating the same translation over and over until its quality improves.

“In the beginning, we were not taking into account whether this technology was shippable as a product. We were just asking ourselves if we took everything in the kitchen sink and threw it at the problem, how good could it get?” Menezes said. “So we came up with this research system that was very big, very slow and very expensive just to push the limits of achieving human parity.”

“Since then, our goal has been to figure out how we can bring this level of quality — or as close to this level of quality as possible — into our production API,” Menezes said.

Someone using Microsoft Translator types in a sentence and expects a translation in milliseconds, Menezes said. So the team needed to figure out how to make its big, complicated research model much leaner and faster. But as they were working to shrink the research system algorithmically, they also had to broaden its reach exponentially — not just training it on news articles but on anything from handbooks and recipes to encyclopedia entries.

To accomplish this, the team employed a technique called knowledge distillation, which involves creating a lightweight “student” model that learns from translations generated by the “teacher” model with all the bells and whistles, rather than the massive amounts of raw parallel data that machine translation systems are generally trained on. The goal is to engineer the student model to be much faster and less complex than its teacher, while still retaining most of the quality.
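The teacher-student idea can be illustrated with a deliberately tiny sketch. It is not the Translator team's system: the "teacher" here is a hypothetical fixed word lexicon and the "student" is a simple word table, where a real system would use neural networks. All names (TEACHER_LEXICON, teacher_translate, StudentModel) are assumptions made for illustration. The point it shows is only that the student is trained on the teacher's output rather than on human-made parallel data.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the large "teacher" system: a fixed word lexicon.
TEACHER_LEXICON = {
    "le": "the", "chat": "cat", "chien": "dog",
    "dort": "sleeps", "mange": "eats",
}

def teacher_translate(sentence):
    """Translate word by word with the teacher's lexicon (toy behavior)."""
    return " ".join(TEACHER_LEXICON[w] for w in sentence.split())

def build_distillation_data(sources, teacher):
    """Label source-language text with the teacher's own translations."""
    return [(src, teacher(src)) for src in sources]

class StudentModel:
    """Toy student: learns a word table from position-aligned sentence pairs."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, pairs):
        # Count which target word appears at the same position as each source word.
        for src, tgt in pairs:
            for s, t in zip(src.split(), tgt.split()):
                self.counts[s][t] += 1

    def translate(self, sentence):
        # Use the most frequent target word seen in training; copy unknown words.
        return " ".join(
            self.counts[w].most_common(1)[0][0] if w in self.counts else w
            for w in sentence.split()
        )

# The student never sees human translations, only what the teacher produced.
student = StudentModel()
student.train(build_distillation_data(["le chat dort", "le chien mange"], teacher_translate))
```

After training, the student can translate a sentence it never saw, such as "le chien dort", using only what it picked up from the teacher's output.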

In one example, the team found that the student model could use a simplified decoding algorithm to select the best translated word at each step, rather than the usual method of searching through a huge space of possible translations.
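The difference between the two decoding styles can be sketched with a toy example. The article does not name the search method the larger model used; beam search is assumed here as a common choice, and TOY_MODEL is a hypothetical table of next-word probabilities rather than anything from Microsoft Translator. Greedy decoding keeps one candidate and picks the best word at each step, while beam search keeps several candidates alive, which costs more computing.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Keep a single hypothesis and take the most probable token at each step."""
    tokens, log_prob = [start], 0.0
    while tokens[-1] != end and len(tokens) < max_len:
        options = model[tokens[-1]]
        best = max(options, key=options.get)
        log_prob += math.log(options[best])
        tokens.append(best)
    return tokens, log_prob

def beam_decode(model, beam_size=2, start="<s>", end="</s>", max_len=10):
    """Keep the beam_size best partial hypotheses at each step."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, log_prob in beams:
            if tokens[-1] == end:
                # Finished hypotheses are carried forward unchanged.
                candidates.append((tokens, log_prob))
                continue
            for token, prob in model[tokens[-1]].items():
                candidates.append((tokens + [token], log_prob + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(tokens[-1] == end for tokens, _ in beams):
            break
    return beams[0]

greedy_tokens, greedy_score = greedy_decode(TOY_MODEL)
beam_tokens, beam_score = beam_decode(TOY_MODEL)
```

In this toy table the wider search finds a slightly more probable sentence than the greedy pass. The finding described above is that a well-trained student loses little by skipping the wider search.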

The researchers also developed a different approach to dual learning, which takes advantage of “round trip” translation checks. For example, if a person learning Japanese wants to check and see if a letter she wrote to an overseas friend is accurate, she might run the letter back through an English translator to see if it makes sense. Machine learning algorithms can also learn from this approach.

In the research model, the team used dual learning to improve the model’s output. In the production model, the team used dual learning to clean the data that the student learned from, essentially throwing out sentence pairs that represented inaccurate or confusing translations, Menezes said. That preserved a lot of the technique’s benefit without requiring as much computing.
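A round-trip check used as a data filter might look something like the sketch below. It is an illustration under stated assumptions, not the team's method: the back-translator is a hypothetical word lexicon (EN_TO_FR), the similarity measure is simple word overlap, and the 0.6 threshold is arbitrary. A sentence pair is kept only if translating the target back gives something close to the original source.

```python
# Hypothetical stand-in for a reverse (English to French) translation model.
EN_TO_FR = {
    "the": "le", "cat": "chat", "dog": "chien",
    "sleeps": "dort", "eats": "mange",
}

def back_translate(text):
    """Translate word by word back into the source language (toy behavior)."""
    return " ".join(EN_TO_FR.get(w, w) for w in text.split())

def token_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences, from 0 to 1."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def filter_by_round_trip(pairs, back_translator, threshold=0.6):
    """Keep only pairs whose translation survives a round trip to the source."""
    kept = []
    for source, translation in pairs:
        round_trip = back_translator(translation)
        if token_overlap(source, round_trip) >= threshold:
            kept.append((source, translation))
    return kept

candidate_pairs = [
    ("le chat dort", "the cat sleeps"),    # consistent pair
    ("le chien mange", "the cat sleeps"),  # mismatched pair
]
clean_pairs = filter_by_round_trip(candidate_pairs, back_translate)
```

Here the mismatched pair is discarded because its round trip comes back as a different sentence. The filter runs once over the training data, so it adds no cost when the student model is serving translations.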

With lots of trial and error and engineering, the team developed a recipe that allowed the machine translation student model — which is simple enough to operate in a cloud API — to deliver real-time results that are nearly as accurate as the more complex teacher, Menezes said.

Arul Menezes, Microsoft distinguished engineer and founder of Microsoft Translator. Photo by Dan DeLong.

Improving search with multi-task learning

In the rapidly evolving AI landscape, where new language understanding models are constantly introduced and improved upon by others in the research community, Bing’s search experts are always on the hunt for new and promising techniques. Unlike the old days, in which people might type in a keyword and click through a list of links to get to the information they’re looking for, users today increasingly search by asking a question — “How much would the Mona Lisa cost?” or “Which spider bites are dangerous?” — and expect the answer to bubble up to the top.

“This is really about giving the customers the right information and saving them time,” said Rangan Majumder, partner group program manager of search and AI in Bing. “We are expected to do the work on their behalf by picking the most authoritative websites and extracting the parts of the website that actually shows the answer to their question.”

To do this, not only does an AI model have to pick the most trustworthy documents, but it also has to develop an understanding of the content within each document, which requires proficiency in any number of language understanding tasks.

Last June, Microsoft researchers were the first to develop a machine learning model that surpassed the estimate for human performance on the General Language Understanding Evaluation (GLUE) benchmark, which measures mastery of nine different language understanding tasks ranging from sentiment analysis to text similarity and question answering. Their Multi-Task Deep Neural Network (MT-DNN) solution employed both knowledge distillation and multi-task learning, which allows the same model to train on and learn from multiple tasks at once and to apply knowledge gained in one area to others.

Bing’s experts this fall incorporated core principles from that research into their own machine learning model, which they estimate has improved answers in up to 26 percent of all questions sent to Bing in English markets. It also improved caption generation — or the links and descriptions lower down on the page — in 20 percent of those queries. Multi-task deep learning led to some of the largest improvements in Bing question answering and captions, which have traditionally been done independently, by using a single model to perform both.

For instance, the new model can answer the question “How much does the Mona Lisa cost?” with a bolded numerical estimate: $830 million. In the answer below, it first has to know that the word cost is looking for a number, but it also has to understand the context within the answer to pick today’s estimate over the older value of $100 million in 1962. Through multi-task training, the Bing team built a single model that selects the best answer, whether it should trigger and which exact words to bold.

This screenshot of Bing search results illustrates how natural language understanding research is improving the way Bing answers questions like “How much does the Mona Lisa cost?” A new AI model released this fall understands the language and context of the question well enough to distinguish between the two values in the answer — $100 million in 1962 and $830 million in 2018 — and highlight the more recent value in bold. Image by Microsoft.

Earlier this year, Bing engineers open sourced their code to pretrain large language representations on Azure. Building on that same code, Bing engineers working on Project Turing developed their own neural language representation, a general language understanding model that is pretrained to understand key principles of language and is reusable for other downstream tasks. It masters these by learning how to fill in the blanks when words are removed from sentences, similar to the popular children’s game Mad Libs.

“You take a Wikipedia document, remove a phrase and the model has to learn to predict what phrase should go in the gap only by the words around it,” Majumder said. “And by doing that it’s learning about syntax, semantics and sometimes even knowledge. This approach blows other things out of the water because when you fine tune it for a specific task, it’s already learned a lot of the basic nuances about language.”
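The fill-in-the-blank exercise Majumder describes can be sketched as a small data-preparation step. This is a rough illustration only: the [MASK] placeholder, the function name and the two-word span are assumptions, and the sketch covers just the creation of a training example, not the model that learns from it.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact symbol is an assumption

def make_fill_in_example(sentence, span_len=2, rng=None):
    """Hide a random run of words and return (masked text, hidden phrase, start index)."""
    rng = rng or random.Random()
    tokens = sentence.split()
    span_len = min(span_len, len(tokens))
    start = rng.randrange(len(tokens) - span_len + 1)
    target = tokens[start:start + span_len]
    masked = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    return " ".join(masked), " ".join(target), start

sentence = "The Mona Lisa hangs in the Louvre in Paris"
masked_text, hidden_phrase, start = make_fill_in_example(
    sentence, span_len=2, rng=random.Random(0)
)
```

During pretraining, a model would be shown the masked text and scored on how well it predicts the hidden phrase from the surrounding words.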

To teach the pretrained model how to tackle question answering and caption generation, the Bing team applied the multi-task learning approach developed by Microsoft Research to fine tune the model on multiple tasks at once. When a model learns something useful from one task, it can apply those learnings to the other areas, said Jianfeng Gao, partner research manager in the Deep Learning Group at Microsoft Research.

For example, he said, when a person learns to ride a bike, she has to master balance, which is also a useful skill in skiing. Relying on those lessons from bicycling can make it easier and faster to learn how to ski, as compared with someone who hasn’t had that experience, he said.

“In some sense, we’re borrowing from the way human beings work. As you accumulate more and more experience in life, when you face a new task you can draw from all the information you’ve learned in other situations and apply them,” Gao said.
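The idea of one model learning from several tasks at once can be shown with a toy numerical sketch. It is far simpler than MT-DNN or the Bing model and every name and number in it is hypothetical: two small regression tasks share one set of "encoder" weights, each task has its own output "head", and training alternates between the tasks so that both update the shared weights.

```python
# Toy multi-task setup: two regression tasks share one encoder (w) and
# each task has its own scalar head. All names and data are hypothetical.
DATA = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
TARGETS = {
    "task_a": lambda x: 2.0 * (x[0] + x[1]),
    "task_b": lambda x: -1.0 * (x[0] + x[1]),
}
INIT_W = [0.5, 0.5]
INIT_HEADS = {"task_a": 0.5, "task_b": -0.5}

def encode(w, x):
    """Shared representation used by every task."""
    return w[0] * x[0] + w[1] * x[1]

def task_loss(w, head, task):
    """Mean of half the squared error for one task over the toy data."""
    errors = [head * encode(w, x) - TARGETS[task](x) for x in DATA]
    return sum(0.5 * e * e for e in errors) / len(DATA)

def train(epochs=500, lr=0.01):
    """Alternate between tasks; each step updates that task's head and the shared encoder."""
    w = list(INIT_W)
    heads = dict(INIT_HEADS)
    for _ in range(epochs):
        for task in heads:
            for x in DATA:
                h = encode(w, x)
                err = heads[task] * h - TARGETS[task](x)
                grad_head = err * h
                grad_w = [err * heads[task] * xi for xi in x]
                heads[task] -= lr * grad_head
                w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w, heads

trained_w, trained_heads = train()
```

Because both tasks push on the same encoder weights, whatever one task teaches the encoder is immediately available to the other, which is the sharing effect Gao describes.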

Like the Microsoft Translator team, the Bing team also used knowledge distillation to convert their large and complex model into a leaner model that is fast and cost-effective enough to work in a commercial product.

And now, that same AI model working in Microsoft Search in Bing is being used to improve question answering when people search for information within their own company. If an employee types a question like “Can I bring a dog to work?” into the company’s intranet, the new model can recognize that a dog is a pet and pull up the company’s pet policy for that employee, even if the word dog never appears in that text. And it can surface a direct answer to the question.

“Just like we can get answers for Bing searches from the public web, we can use that same model to understand a question you might have sitting at your desk at work and read through your enterprise documents and give you the answer,” Majumder said.

Top image: Microsoft investments in natural language understanding research are improving the way Bing answers search questions like “How much does the Mona Lisa cost?” Image by Musée du Louvre/Wikimedia Commons. 


Jennifer Langston writes about Microsoft research and innovation.