What’s Microsoft’s vision for conversational AI? Computers that understand you

Today’s intelligent assistants are full of skills. They can check the weather, traffic and sports scores. They can play music, translate words and send text messages. They can even do math, tell jokes and read stories. But, when it comes to conversations that lead somewhere grander, the wheels fall off.

“You have to poke around for magic combinations of words to get various things to happen, and you find out that a lot of the functions that you expect the thing to do, it actually just can’t handle,” said Dan Roth, corporate vice president and former CEO of Semantic Machines, which Microsoft acquired in May 2018.

For example, he explained, systems today can add a new appointment to your calendar but not engage in a back-and-forth dialogue with you about how to juggle a high-priority meeting request. They are also unable to use contextual information from one skill to assist you in making decisions from another, such as checking the weather before scheduling an afternoon meeting on the patio of a nearby coffee shop.

The next generation of intelligent assistant technologies from Microsoft will be able to do this by leveraging breakthroughs in conversational artificial intelligence and machine learning pioneered by Semantic Machines.

The team unveiled its vision for the next leap in natural language interface technology today at Microsoft Build, an annual conference for developers, in Seattle, and announced plans to incorporate this technology into all of its conversational AI products and tools, including Cortana.

Teaching context and concepts

Natural language interfaces are technologies that aim to let us communicate with computers in the same way we talk with each other. When natural language interfaces work as Roth and his team envision, our computers will understand us, converse with us and do what we want them to do, much as most people can understand a complex request that requires a few actions.

“Being able to express ourselves in the way we have evolved to communicate and to be able to tie that into all of these really complicated systems without having to know how they work is the promise and vision of natural language interfaces,” said Roth.

Dan Roth, Microsoft corporate vice president and former CEO of Semantic Machines, said his team’s technology will enable computers to understand us, converse with us and do what we want them to do. Photo by Dana Quigley for Microsoft.

The natural language technology in today’s intelligent assistants such as Cortana leverages machine learning to understand the intent of a user’s command. Once that intent is determined, a handwritten program – a skill – is triggered that follows a predetermined set of actions.

For example, the question, “Who won today’s football match between Liverpool and Barcelona?” prompts a sports skill that follows the rules of a pre-coded script to fill in slots for the type of sport, information requested, date and teams. “Will it rain this weekend?” prompts a weather skill and follows pre-scripted rules to get the weekend forecast.
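The intent-then-skill pattern described above can be sketched in a few lines of Python. This is an illustration only, not Cortana's implementation: the keyword lookup stands in for the machine-learned intent classifier, and every name here (`detect_intent`, `sports_skill`, `weather_skill`, `handle`) is hypothetical.

```python
import re

# Stand-in for the learned intent classifier: a simple keyword lookup.
INTENT_KEYWORDS = {
    "sports": {"won", "score", "match"},
    "weather": {"rain", "forecast", "sunny"},
}

def detect_intent(utterance):
    """Return the first intent whose keywords appear in the utterance."""
    tokens = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    return None

def sports_skill(utterance):
    """Hand-written script: fill the slots the developer anticipated."""
    text = utterance.lower()
    teams = re.search(r"between (\w+) and (\w+)", utterance)
    return {
        "sport": "football" if "football" in text else None,
        "info": "result",
        "date": "today" if "today" in text else None,
        "teams": list(teams.groups()) if teams else [],
    }

def weather_skill(utterance):
    """Hand-written script for forecasts."""
    return {
        "info": "forecast",
        "date": "weekend" if "weekend" in utterance.lower() else None,
    }

SKILLS = {"sports": sports_skill, "weather": weather_skill}

def handle(utterance):
    intent = detect_intent(utterance)
    if intent is None:
        return None  # no script anticipated this request
    return intent, SKILLS[intent](utterance)

print(handle("Who won today's football match between Liverpool and Barcelona?"))
print(handle("Will it rain this weekend?"))
print(handle("Move my 3pm meeting"))
```

The last call returns `None`: no script covers rescheduling, so the request falls through.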

Since the rules for these exchanges are handwritten, developers must anticipate all the ways the skill could be used and write a script to cover each scenario. The inability of humans to script every possible scenario limits the scope and functionality of skills, explained Roth.

The Semantic Machines technology extends the role of machine learning beyond detecting intents, all the way through to determining what the system does. Instead of a programmer trying to write a skill that plans for every context, the Semantic Machines system learns the functionality for itself from data.

In other words, the Semantic Machines technology learns how to map people’s words to the computational steps needed to carry out requested tasks.

For example, instead of executing a hand-coded program to get the score of the football match, the Semantic Machines approach starts with people who show the system how to get sports scores across a range of example contexts so that the system can learn to fetch sports scores itself.

What’s more, machine learning methods then enable the system to generalize from contexts it has seen to new contexts, learning to do more things in more ways. If it learns how to get sports scores, for example, it can also get weather forecasts and traffic reports. That’s because the system has learned not just a skill, but the concept of how to gather data from a service and present it back to the user.
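To make the idea of a shared concept concrete, here is a rough Python sketch of "gather data from a service and present it back to the user" as a single reusable step. It is written by hand and shows only the shape of the concept, not how a system would learn it from data. The services are hypothetical stand-ins that return canned values, and no real API is called.

```python
def fetch_and_present(service, query, template):
    """Shared concept: query a service, then phrase the result for the user."""
    data = service(**query)
    return template.format(**data)

# Hypothetical services returning canned data for illustration.
def weather_service(day):
    return {"day": day, "outlook": "rain"}

def traffic_service(route):
    return {"route": route, "delay_minutes": 10}

print(fetch_and_present(
    weather_service, {"day": "Saturday"}, "{day}: expect {outlook}."))
print(fetch_and_present(
    traffic_service, {"route": "I-5"},
    "{route} is running {delay_minutes} minutes slow."))
```

The same function serves both requests. Only the service and the phrasing change, which is the kind of reuse the article attributes to learning a concept rather than a single skill.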

That’s missing in today’s intelligent assistants, which are programmed to do a list of isolated things that a programmer anticipated. The machine learning in these systems primarily focuses on words that trigger a skill, explained Microsoft technical fellow Dan Klein, a recognized leader in the field of natural language processing and a professor of computer science at the University of California at Berkeley.

“They aren’t focused on learning how to do new things, or mixing and matching the things they already know in order to support new contexts,” said Klein, who was also a co-founder and chief scientist at Semantic Machines.

Dynamic conversation

Since the Semantic Machines system can learn how to do new things, it can more easily engage in a dynamic conversation with a person, accessing and stitching together relevant content, context and concepts from disparate sources to provide answers, present options and produce results.

The Semantic Machines system also has a memory to keep track of the context in a conversation and so-called full duplex capability to talk and listen at the same time in order to keep the dialogue flowing.

“Everything you say is contextualized by what has come before so you can do more complicated things: you can change your mind, you can explore,” said Klein. “Moreover, once things get contextual enough, the notion of a skill begins to dissolve.”
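A minimal Python sketch of conversational memory, assuming each turn has already been reduced to a small dictionary of extracted details. The `Conversation` class and its fields are hypothetical and are not drawn from the Semantic Machines system.

```python
class Conversation:
    """Keeps a running context so later turns can refine earlier ones."""

    def __init__(self):
        self.context = {}   # what the conversation currently means
        self.history = []   # every turn, in order

    def turn(self, updates):
        # `updates` stands in for whatever was understood from the utterance.
        self.history.append(dict(updates))
        self.context.update(updates)  # newer details override older ones
        return dict(self.context)

convo = Conversation()
convo.turn({"action": "schedule", "with": "Alex", "time": "14:00"})
state = convo.turn({"time": "15:30"})  # "Actually, make it 3:30."
print(state)
```

The second turn mentions only a time, yet the resulting state still knows the action and the attendee, because it is read against everything that came before.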

That’s because the notion of skills confines interactions to silos of data whereas true conversation relies on connecting data from all over the place. The Semantic Machines technology orchestrates gathering data and accomplishing tasks on the backend while maintaining a fluid, natural dialogue with the user on the frontend.

Reshuffling your schedule to accommodate a high-priority meeting, for example, requires calendar and directory data to determine who is free and when, as well as contextually relevant data such as the weather, nearby coffee shops and traffic to figure out where to meet and sit, and when to leave to get there on time.
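As a toy Python illustration of combining sources, the sketch below intersects attendees' free slots and then consults a forecast to decide where to sit. The data shapes and function names are assumptions made for the example. A real assistant would pull this information from calendar, directory and weather services.

```python
def common_free_slots(calendars):
    """Intersect every attendee's free slots (given as "HH:MM" strings)."""
    slots = None
    for free in calendars.values():
        slots = set(free) if slots is None else slots & set(free)
    return sorted(slots or [])

def plan_meeting(calendars, forecast):
    """Pick the earliest shared slot; sit outside only if it is dry then."""
    for slot in common_free_slots(calendars):
        seating = "patio" if forecast.get(slot) == "dry" else "indoors"
        return {"time": slot, "seating": seating}
    return None  # nobody shares a free slot

calendars = {
    "you": ["13:00", "15:00", "16:00"],
    "colleague": ["15:00", "16:00"],
}
forecast = {"15:00": "rain", "16:00": "dry"}
print(plan_meeting(calendars, forecast))
```

Neither the calendar data nor the forecast answers the question alone. The plan comes from reading them together.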

“Once you start letting things evolve and connect contextually, the notion of a skill is way too limiting,” said Klein. “Getting things done involves mixing and matching.”

Building with natural language

At Build, Microsoft showcased a calendaring application using Semantic Machines technology that can make organizing your day with an intelligent assistant a more fluid, natural and powerful experience. The same technology can be applied to any conversational experience and will eventually power conversations across all of Microsoft’s products and services.

That will build on Cortana’s existing capabilities such as providing answers to questions, offering previews of your day and helping you across your devices from phone to laptop and smart speaker.

Once the technology is incorporated into Cortana, for example, it could make getting things done in Office more about what you need to do and less about accomplishing tasks in certain applications.

“We want it to be less cognitive load, less feeling like I have to go to PowerPoint for this or Word for that, or Outlook for this and Teams for that, and more about personal preferences and intents,” said Andrew Shuman, Microsoft’s corporate vice president for Cortana.

What’s more, added Roth, the technology will be made available through the Microsoft Bot Framework. His team is currently engineering a way for developers working in the framework today to migrate their existing data to the Semantic Machines-powered conversational engine when it is ready.

“As a developer you can start building these experiences yourself,” he said. “We can collectively move, on the basis of this technology, past this notion of skills and silos and simple handwritten programs into the kind of fluid Star Trek-like natural language interfaces we all want.”


John Roach writes about Microsoft research and innovation.