REDMOND, Wash. — Oct. 28, 2009 — Keyboards and mice still are the dominant methods for working with a PC or laptop. But big leaps in speech-recognition technology mean that talking to a computer may soon be as natural as using a mouse.
Zig Serafin, General Manager of the Speech at Microsoft Group.
Leading Microsoft’s charge to that audible future is Zig Serafin, general manager of the Speech at Microsoft group. Serafin says his team’s goal is simply to create the world’s most advanced speech platform, one that spans cloud-based voice services, mobile phones and world-class servers for enterprise customers. “Voice is the new touch,” says Serafin. “It’s the natural evolution from keyboards and touch screens. Today, speech is rapidly becoming an expected part of our everyday experience across a variety of devices. Bill Gates articulated this vision a decade ago, and we’re seeing it happen today.”
Two years ago, Microsoft acquired Tellme Networks and subsequently merged its own speech development team (formerly the Speech Components Group) with Tellme to form the Speech at Microsoft group. The group’s sophisticated speech-recognition technology and its Web speech engine, under development for more than a decade, are leading to a wave of voice-enabled products that promise easier, faster interactions across automobiles, smartphones, and personal productivity software.
For example, Ford Sync, powered by Microsoft and Tellme, provides in-dash voice-activated navigation and search. In addition, Bing for Mobile, Exchange Server 2010, Windows 7, and new Windows® phones such as the Samsung Intrepid from Sprint are all voice-enabled.
“See” Your Voice Mail
One of the most eagerly awaited features in Exchange Server 2010 is the new Voice Mail Preview, a capability that is poised to transform the way people retrieve and navigate voice mail. Using speech-to-text technology, Exchange 2010 automatically sends a text preview of voice mail right to the user’s inbox.
Instead of wondering whether the little red light on their phones is signaling an important call, people can scan text previews, right in Outlook, to determine message content and priority.
Exchange Server 2010’s voice mail feature turns an audio call into a text preview.
Rajesh Jha, corporate vice president of Microsoft Exchange, says Voice Mail Preview in Exchange 2010 makes it dramatically easier to visually sift through voice mail on your PC, mobile phone, or any popular Web browser to quickly determine the importance of a call. “For me, this feature is invaluable during meetings or other situations when actually listening to voice mail is not a viable option,” says Jha.
Exchange Server 2010 will launch at TechEd Europe, which runs Nov. 9–13 in Berlin.
“Hands-Free” Calling, Texting and Search
The Bing for Mobile application is a free, on-the-go version of Bing with voice-enabled search. Using this application, people simply speak their search query to retrieve results on their Windows phone.
The Bing 411 service works from any phone. People call 1-800-Bing-411, speak their search, and either hear the results or receive a text message with addresses, directions and other information for easy access later. Both Bing 411 and the Bing for Mobile application help users safely access important information wherever they are, when typing on a phone is slow, impossible or inconvenient.
With the newly launched Samsung Intrepid from Sprint, the first Windows phone to use Microsoft’s Tellme voice user interface, the experience gets even better. People can speak a search query or dictate a text message, making it dramatically easier to accomplish tasks on the go. Intrepid users simply press the Tellme button on the phone and say what they want — whether that’s to dial a colleague, text a friend, or search Bing for the nearest hardware store or best happy hour.
“When you’re on the go, using only keystrokes to search can be cumbersome, especially if you’re multi-tasking. It takes over 20 strokes of the keypad to find a restaurant on the Web,” says Yusuf Mehdi, senior vice president of the Online Services Division at Microsoft. “With Bing for Mobile or Bing 411, you simply speak your query to get results quickly, easily and safely. Using your voice to simply ‘say what you want and get it’ helps you do more when you’re in a mobile scenario.”
Talk to Windows 7
An improved speech recognition feature in Windows 7, launched last week, enables people to control their computer completely by voice or by touch and voice. Using Windows Speech Recognition, people can easily launch applications, access commands and even convert their voice into text in any application that runs on Windows 7. In addition, software developers can tap into these capabilities to enable rich, natural speech interactions between users and Windows-based applications.
Partners such as HP are already using these capabilities in their Windows 7-based PCs, with innovative applications that combine speech and touch to transform the user experience.
“By using the power of their voice, people can get their jobs done more efficiently,” says Ian LeGrow, group program manager for the Windows team at Microsoft. “With Windows Speech Recognition, the interactions between people and their computers can be more natural, not just in the future, but starting today.”
Voice at Your Service
The Speech at Microsoft group runs the Tellme platform, the world’s largest voice platform based on the VoiceXML standard. The platform manages more than 6 million calls every day and helps businesses improve customer service.
This month, the Speech at Microsoft group introduced an enhanced Outbound IVR (interactive voice response) Service on the Tellme platform to provide proactive customer service. With this service, businesses can provide interactive outbound messages that allow customers to act upon the alerts — to pay a bill, rebook a flight, or schedule delivery for a missed package, for example. The Outbound IVR Service is optimized to work across the phone (as a call or text), e-mail, instant messaging and the Web to deliver a personalized, efficient experience.
Says Jamie Bertasi, senior director for Speech at Microsoft, “We are delivering a steady stream of innovations to our platform in order to continue to deliver the best experience for the caller and best performance for the enterprise. By leveraging the power of the cloud and the billions of interactions we see every year, we are able to fine-tune the way companies engage their customers, enabling them to improve customer satisfaction while significantly reducing costs.”
Looking Ahead: What’s Next
According to analysts, the growing demand across industries for speech technology indicates that voice is poised to transform the user experience on a variety of fronts.
“Speech-recognition technology has matured to a level where it’s a primary catalyst for the next wave of innovation in the unified communications space,” said Nancy Jamison, principal analyst with Jamison Consulting. “Microsoft’s recent advancements in speech really strike at the heart of what true unified communications is all about — improving the user experience.”
By combining Tellme’s speech optimization and deployment experience with Microsoft’s cutting-edge speech technology, this new group brings together a cross-functional team of domain experts to drive speech technology to new heights. By using cloud-based technology, the Speech at Microsoft group is envisioning a future where speech recognition rivals human understanding.
Serafin says his team of experts remains committed to applying its many decades of experience to push the frontiers of voice-enabled technology and bring speech into everyday use.
“For perhaps the first time in the history of Microsoft, we have our world-class speech scientists and highly respected software-plus-services experts under one roof, and I believe the resulting collaboration will lead to pathbreaking innovation,” says Serafin. “The climate in our R&D environment is optimally charged to accelerate advances, leverage the power of software plus services, and revolutionize the ways customers interact with a wide range of Microsoft products.”
Bolstering that expertise is the recent addition of Larry Heck to the role of chief scientist for the Speech at Microsoft group. Heck first joined Microsoft as the partner architect for the Online Services Division R&D. Before that he led the creation, development and deployment of the search and advertising algorithms at Yahoo!, and before that he was the vice president of R&D at Nuance. Heck has joined the Speech at Microsoft group to help chart the course of next-generation elements of Microsoft’s speech platform.
“Speech belongs in the cloud. Only there can you reach the scale, the enormous volume of interactions required to create a speech system capable of rivaling human understanding,” said Heck. “With the formation of the Speech at Microsoft group, the unrivaled breadth of our platform today, and our cloud-based approach, this future is within sight.”