In earlier posts, I’ve talked about the huge shift that’s underway in how people interact with and use technology. The shift is toward a natural user interface, or NUI, where computers understand how humans communicate using speech, gesture and touch to interact with technology that is contextually aware and adaptive to the person, task and resources at hand.
Speech has a special place in NUI, or what I like to think of as speech NUI. Speech, more than other forms of interaction, is helpful no matter what device you’re using. But, before I go too far on that point, I should start my list:
1) It Feels Natural.
The broadening adoption of NUI is changing how we interact with entertainment and media and will redefine how we think about the TV. Remember when remote controls were first introduced? I might be dating myself, but I remember when a remote was a box with switches and a wire back to the TV, and then later a remote that relied on ultrasonic tones. Today, of course, we’re used to having multiple infrared remote controls scattered across our coffee tables. We’ve also come a long way from the handful of channels people could choose from in those early days. Even basic interactivity opened things up a lot, moving us from just watching TV channels to also having access to recorded and on-demand content — but this new functionality unfortunately also saddled us with bigger and more complex remotes. Adding cloud connectivity to TVs brought the next inflection point, bringing Internet content, applications and communications to the big (and growing) screens in our living rooms. With all this capability for entertainment and communications, the old remote control just doesn’t cut it anymore, but it’s equally inconvenient to get up to touch your TV screen, and it certainly doesn’t feel natural or social to hold a keyboard on your lap as you’re trying to enjoy some entertainment with your family and friends.
At Microsoft, we’re working hard to get technology out of your way so your experience is more natural and intuitive. At E3 Expo 2011 this year, more than 50 years after the introduction of the remote control, our Xbox team introduced a new, innovative way to experience entertainment. It begins by giving TV a new voice: yours. This Xbox experience uses Kinect for Xbox 360 and speech technology from Microsoft Tellme to combine voice and gestures in a way that humanizes the power of the TV, from searching through media to interacting with games. I believe this kind of speech NUI will also change how we interact with devices of any screen size, whether they’re in your pocket, bag, car or office. We’ll use our voices across these varied devices to get more done, quickly and easily.
Microsoft is making big bets on speech NUI. Microsoft Tellme is driving that forward, powering the speech experiences in Kinect for Xbox 360, Windows Phone, Bing Mobile and Microsoft Tellme IVR. Because speech fits well with NUI across devices of all screen sizes, Microsoft Tellme is truly at the center of the NUI evolution.
2) It’s More Than Talking.
The magic of NUI elegantly combines simple ways of interacting, adapting to your innate ability to use your voice, ears, gestures and touch to complete everyday tasks. It’s not just that speech, gesture and touch are available. It’s that you can switch between them seamlessly at different points in the same experience. The technology enables the form that makes the most sense at any given time to help you do what you want to do quickly and easily. A great example is the new hands-free messaging feature in the next version of Windows Phone, code-named “Mango,” which uses the Microsoft Tellme speech cloud service to make it easy to communicate with your social network contacts even when your phone is out of reach. If you’re like me, you probably listen to music frequently on your phone with your headphones on and the phone tucked away, such as when you’re at the gym or out for a run. Getting a text in this kind of situation can be inconvenient, but “Mango” will actually offer to read your text to you and give you the opportunity to reply with a text or instant message using your voice, all without ever looking at or touching your phone.
The big aha moment for me came as I was watching my children use Kinect for Xbox 360 to talk and gesture to our TV, and then later pick up my Windows Phone to find information they wanted just by using their voices with Bing, all without having to read a single manual or learn any complicated instructions. A few days later, I saw one of my kids trying to talk and give commands to our home stereo, without much success, and right then I knew that my kids and their generation will just expect to interact in similar natural ways with all the devices in their lives.
Imagine what we’ll all be able to do a few years from now, just by talking to our devices.
3) It Gets Smarter as You Use it.
In developing the speech NUI, we’ve designed the Microsoft Tellme speech service as a system that continuously learns and adapts. The more you use it, the more it learns and improves, and we hope it will meet and often exceed your expectations. It continually gets smarter through a natural feedback loop that spans mobile, entertainment, customer care and other interactions, learning from the great diversity of ways people speak across them. The Microsoft Tellme speech service currently processes more than 11 billion voice interactions a year, each one helping to improve the service and, along with it, your experience. It’s the ultimate crowd-sourcing example.
4) It’s for You.
That the Microsoft Tellme speech service gets better with each interaction is important. But that’s not the coolest thing about the future of speech. We aspire to deliver services that are just as natural and easy as human conversation. We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using. That is a pretty exciting future, and one where your TV experience will be as helpful and intuitive as it is natural today with Kinect for Xbox 360. In other words, you may never have to see another remote control on your coffee table again!
We’ve been delighted to see the excitement and positive reviews people have had for the voice and gesture-based experience in Kinect for Xbox 360, and the new communications and cloud experiences in “Mango.” We’re passionate about changing the way people use not only their TVs, but any device that’s connected to the Internet. With analysts at Strategy Analytics (June 2010) estimating there’ll be about 400 million connected devices by 2015 in addition to 1.5 billion mobile phones, we know there’ll be a lot of room for change.
If you’re interested in learning more, please follow us on Twitter. And if you’ll be at SpeechTEK 2011 this week, please come visit us in person and tell us what gets you excited about the future. We’d love to hear from you.