REDMOND, Wash., Aug. 21, 2001 — Millions of PC users may not know it, but they can ignore the keyboard and mouse when they want to create and format e-mails and Microsoft Word documents, open and close files in Excel, or create PowerPoint presentations. And they can also ignore their display screens and have the numbers and text read back to them as they enter data into an Excel spreadsheet.
That's because the software they're using, Microsoft Office XP, understands voice commands and can convey information by speaking. Office XP's ability to understand and use spoken language is based on a Microsoft technology called the Speech Application Programming Interface (SAPI), which is also being used by a growing array of independent software developers to speech-enable their own applications. Microsoft this week released SAPI version 5.1, which vastly simplifies application developers' work to enable their applications to speak and to understand the speech directed to them.
SAPI 5.1 also makes it easier for “engine vendors” (the people who create the underlying software code that translates between spoken words and text) to offer software engines that work with many speech-enabled applications, without having to rewrite the engine code to support each one. Similarly, application vendors gain the flexibility to link their software to any compatible engine.
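The decoupling described above is a plug-in interface pattern: applications code against one shared contract, and any engine that implements it can be swapped in. The Python sketch below is purely illustrative and assumes nothing about SAPI's real interfaces (which are COM-based); `SpeechEngine`, `VendorAEngine`, `VendorBEngine` and `DictationApp` are hypothetical names, and "speaking" is faked by returning a string.

```python
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical shared contract that every engine vendor implements."""

    @abstractmethod
    def speak(self, text: str) -> str:
        """Render text as speech (this sketch returns a description instead)."""


class VendorAEngine(SpeechEngine):
    # One vendor's engine, written once against the shared contract.
    def speak(self, text: str) -> str:
        return f"[VendorA voice] {text}"


class VendorBEngine(SpeechEngine):
    # A competing engine; no application code changes are needed to use it.
    def speak(self, text: str) -> str:
        return f"[VendorB voice] {text}"


class DictationApp:
    """Hypothetical application that depends only on the contract, not a vendor."""

    def __init__(self, engine: SpeechEngine) -> None:
        self.engine = engine

    def read_back(self, cell_value: str) -> str:
        # Read a spreadsheet value back to the user through whichever engine is plugged in.
        return self.engine.speak(cell_value)


if __name__ == "__main__":
    for engine in (VendorAEngine(), VendorBEngine()):
        print(DictationApp(engine).read_back("42"))
```

The application and the engines never reference each other directly, which is the property the article attributes to a common interface: engines work with many applications, and applications can use any compatible engine.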
Bringing Speech-Enabled Applications into the Mainstream
Microsoft has been researching and promoting speech-enabled software since the mid-1990s, and SAPI is the most popular PC-based text-to-speech and speech recognition interface in the industry. Microsoft expects this newest SAPI version to help bring speech-enabled applications into the mainstream by making it easier for millions of developers to voice-enable their software for the first time. That greater ease comes from SAPI 5.1's automation support, which lets developers use SAPI functionality from automation languages such as Visual Basic.
“Until SAPI 5.1, developers who wanted to voice-enable their software needed to use languages such as C or C++, which limited this speech technology to a relatively small group of developers with specific skills,” says Glenn Thompson, group program manager of Microsoft's Speech group. “SAPI 5.1 puts this capability within reach of the millions of Visual Basic developers, without requiring them to learn new languages. We think this is another important step forward in making speech recognition and text-to-speech a standard part of the desktop applications that these developers have created.”
SAPI 5.1 includes improved speech recognition and text-to-speech engines that developers can include with their applications at no additional charge, and it eliminates the need for developers to also use the Windows Platform SDK (Software Development Kit) when they're compiling speech-enabled applications. Engine vendors and developers say that these additions make it faster, easier and less expensive for developers to speech-enable their applications, with better results than they could have achieved before.
SpeechWorks International, the leading provider of over-the-telephone automated speech-recognition and text-to-speech solutions, recently created a SAPI 5.1-compliant interface that brings its Speechify engine to “new portions of the developer market,” according to Dave Burns, development manager at SpeechWorks. “This increases the size of our customer base and allows those customers to offer their users a higher level of quality in their applications.”
InSync Software has been working with a pre-release version of SAPI 5.1 for the past two months.
“Other speech technologies were focused on in-house application rather than developer tools,” says Parmod Gandhi, president of InSync Software, whose SpeechX controls incorporate speech capability into drop-down boxes and other interface elements of an application.
“In order to be usable, an application must be voice-enabled from the source and not after the fact. SAPI 5.1 is the first complete SDK to help developers, large and small, incorporate speech in their applications. SAPI 5.1 is easier to work with, especially with its new automation interface. SAPI 5.1 should open the door to a huge market for speech-enabled applications.”
“We're happy to support Microsoft's ever-expanding speech recognition efforts by adapting our applications and developer tools to this latest SAPI release,” says Chris Spencer, CEO of Wizzard Software, which creates speech-enabled applications. “We feel SAPI 5.1 is a major push in the right direction for the entire speech recognition industry.”
Application developers supporting SAPI 5.1 include Alexis Communications, Chant, Datria, EverSpeech, InSync, Ivoice, O & A, Realize Software, Speech Studio, Tangis, VoiceGenie, Wizzard, and Words+. Engine vendors supporting the technology include Agenda, ART, Babel Technologies, Fonix, Fujitsu, Lernout & Hauspie, Mindmaker, NEC, Rhetorical Systems, and SpeechWorks.
The Audiences for Speech-Enabled Applications
Speech-enabled applications are especially beneficial to consumers with disabilities, particularly those who cannot easily read a display screen. Microsoft plans to expand the speech support it offers its own users when it adds SAPI 5.1 to Windows XP, the next major version of the Windows operating system, scheduled for release on October 25.
But consumers with disabilities are only one of the audiences for speech-enabled applications, according to Microsoft's Thompson.
“Users in two of the world's largest computer markets, China and Japan, have a pressing need for speech-enabled applications, a need that's not immediately apparent to U.S. and European users,” says Thompson. “While Western languages have a 26-letter alphabet that's relatively easy to implement on a keyboard, China and Japan have character-based languages with thousands of characters, and the keyboard isn't a great way to work with them. For many users in these countries, speech-enabled applications can greatly increase the speed and productivity of entering text. That's why SAPI 5.1 includes speech recognition and text-to-speech engines in Chinese and Japanese, as well as in English.”
NEC, for example, uses SAPI for its SmartVoice 4 XP Japanese speech recognition and text-to-speech engines. According to Mitsuru Nishiura, voice interface project manager for NEC's Personal Solutions division, SAPI will “increase voice-enabled applications and make digital equipment easier to use.”
Another audience for speech-enabled software is specialty or niche users who need to keep their hands free while interacting with software. Specialty applications are particularly suited to speech recognition, since the relatively small vocabularies used in niche markets make it easier for software to interpret speech accurately. Doctors and other health-care professionals who enter diagnoses or data while examining patients are one such audience. Drivers are another: most automobiles today use embedded computers to control entertainment, climate and navigation systems.
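To illustrate why a small, closed vocabulary helps, here is a hypothetical sketch for a hands-free clinical tool. It does not show how SAPI or any recognizer works internally (real recognizers typically constrain the search with a grammar during decoding); instead it swaps in plain fuzzy string matching from Python's standard `difflib` module to show how a short command list lets a slightly misheard phrase still snap to the intended command. `COMMANDS` and `snap_to_command` are illustrative names, and the `cutoff` value is an arbitrary assumption.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical command vocabulary for a hands-free clinical note tool.
COMMANDS = [
    "new patient",
    "record blood pressure",
    "record temperature",
    "save note",
    "cancel",
]


def snap_to_command(
    transcript: str,
    vocabulary: Sequence[str] = COMMANDS,
    cutoff: float = 0.6,
) -> Optional[str]:
    """Return the closest allowed command, or None if nothing is similar enough.

    Because only a handful of phrases are valid, a transcript with a small
    error can still be mapped to the right command, while unrelated input
    is rejected instead of being guessed at.
    """
    matches = difflib.get_close_matches(
        transcript.lower(), list(vocabulary), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    print(snap_to_command("record blood presure"))
    print(snap_to_command("xyz"))
```

With an open-ended dictation vocabulary, the same misheard phrase would have far more plausible alternatives, which is the intuition behind the article's point about niche applications.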
Next Stop: The Speech-Enabled Web
While SAPI 5.1 will bring speech capabilities within reach of many more developers, applications and users, Microsoft isn't stopping there. Thompson says his unit is already looking at ways to speech-enable the Web.
“Web users, particularly mobile users, have a compelling need for speech capability,” says Thompson. “They typically use small devices (Web-enabled phones, handheld PCs and other small, interactive devices) that have small screens and keyboards, if they have screens and keyboards at all. Being able to speak commands to such devices and have information displayed on the screen or spoken back to the user will be very cool. That's where the action will be.”
And that's where Microsoft's speech group will be, too.