More than a Grain of SALT: Industry Leaders Assess How Specification Is Laying Foundation for Speech Technologies

REDMOND, Wash., April 1, 2003 – “To check your account balance, press or say ‘one’ now.”

For many people, hearing such instructions inevitably brings a steady rise in blood pressure. This kind of traditional, proprietary telephony application, which offers only limited speech interaction, has allowed businesses to take some initial steps toward lowering costs but hasn’t always been a favorite with customers.

Now the movement toward broad, widely available speech applications based on Web standards and infrastructure is gaining momentum in the technology world, and it’s only a matter of time before the older, simplistic voice response system makes way for truly sophisticated speech interaction between humans and information systems. In addition, applications and services based on Speech Application Language Tags (SALT), which can use a “multimodal” approach combining voice and visual interaction on a range of devices, are moving closer to the marketplace.

Imagine calling your computer to have it read back e-mail messages, or even instructing it to call you back when the important e-mail you’re expecting arrives. Imagine your car telling you how many miles or kilometers the fuel left in your tank will cover, then displaying a map of local service stations within that range. Imagine having instant access to account and billing information, without having to wait 20 minutes in a customer-service call-center queue.

The foundation for these kinds of applications is being laid today by emerging specifications such as SALT, which have attracted the broad industry support needed to bring speech technology and its benefits to a much larger audience, across a wider range of devices and with a wider range of service capabilities.

For businesses, this technology can offer value beyond reduced costs, such as increased employee productivity, increased customer satisfaction and new revenue opportunities. These applications are expected to bring significant value to many industries, especially those with extensive call-center service operations, such as financial services, retail and insurance. For consumers, the possibilities for new kinds of services and entertainment are limited only by the imagination.

As companies plan for this new opportunity, developers and vendors are hard at work creating the ecosystem of technologies that are making the promise of speech a reality. As part of this effort, Microsoft recently announced the shipment of a technical preview for the SALT-based Microsoft Speech Platform, which includes an enterprise-grade speech-recognition engine developed by Microsoft, a text-to-speech engine from industry partner SpeechWorks for enabling voice output of corporate or Web data, and a Microsoft SALT voice interpreter.

The new platform, together with the Microsoft Speech Software Development Kit (SDK), which integrates into Visual Studio .NET and is currently in beta 2 (available now at www.microsoft.com/speech/), is expected to help broaden and advance speech technology by giving developers a common set of tools. These tools should make it easier to build applications and services that tap the largely unused power of speech to help people interact more naturally with information systems.

Only a year and a half old, SALT has gained broad support from leading businesses across the value chain for delivering speech solutions, including hardware vendors, interactive voice response (IVR) suppliers, telecommunications carriers, handset manufacturers, speech technology application firms, and service providers such as systems integrators. These companies are developing technologies and services that take advantage of SALT and other speech-enabling specifications, not only for speech interaction by telephone but for multimodal interaction as well. SALT can support a whole spectrum of devices, including PCs, Tablet PCs, telephones, cell phones, smart phones and wireless PDAs. Because many of these devices have displays, multimodal interaction is a key focus.

To learn more about where the speech movement stands today, where SALT fits into the picture, and what to look for down the road, PressPass asked several industry leaders for comment on a new era of voice interaction with information systems:

Mark Willingham

Vice President of Marketing
HeyAnita Inc.

“HeyAnita was initially focused on the speech-only interface, and at the time multimodality was just a broad concept. But SALT really brings that promise to life, because it enables people to choose the input and output modes that are relevant to them. It empowers them with choice. We believe that all new technologies are driven when you provide people with substantial offerings, and also empower them with choice, and certainly SALT does that.

“The ability to switch between modalities is, I think, extremely compelling, and it’s actually going to become a necessity in an environment where everyone wants to be connected all the time. Think of a salesman walking into a meeting. The multimodal interface would allow him to request, by telephone, a PowerPoint presentation from his e-mail and have it sent to the customer’s laptop just as he walks in. That kind of efficiency and flexibility can be built into almost any kind of application using the multimodal capabilities afforded by programming styles such as SALT. It provides the best of all worlds, and leaves the decision up to the user.

“With the announcement of our FreeSpeech™ SALT Voice Browser last year, HeyAnita became the first to support all of its current applications with SALT. This includes powerful efficiency-based applications such as voice-activated dialing, voice access to e-mail and voice-SMS, as well as all sorts of content- and information-based applications such as weather, news and sports. We also have other offerings, such as the HeyAnita Voice Care product suite, which is geared toward call centers and again uses SALT to provide the voice interface today and then enable a company to move forward into the multimodal world.”

Peter Gavalakis

Marketing Manager
Intel Corp.

“SALT is compelling because it builds on existing markup languages and the execution models behind them, which makes it appealing to the community of Web developers. It’s also applicable both to classic, system-directed voice response applications, where users respond to prompts issued by the application, and to user-directed applications, where the user asks for what is needed and the application responds to those requests.

“SALT works well in both execution models, and that’s part of the beauty of it. It’s a single specification that addresses voice-only as well as multimodal applications. There was really no specification prior to SALT that did that. So SALT is unifying in the sense that it enables speech interfaces for a wide variety of applications.

“With the availability of standard telephony boards that can integrate servers with a company’s telephony infrastructure, a variety of complementary software technologies for deployment in the call center, and the robust development environment that Microsoft is providing, we now have the three things we’ve needed to make the widespread deployment of speech interfaces a reality. The next step is to build an ecosystem of developers to support it with products and services for the end-user community. Microsoft, Intel and others, working together around a common industry specification such as SALT, will help companies start deploying solutions, lowering entry barriers and increasing customer choice while launching us into the age of true voice integration and making speech mainstream.”

Steve Chirokas

Director of Product Marketing
SpeechWorks

“What we’re seeing today is that speech is really beginning to cut across many different application areas and segments, and some terrific solutions are starting to emerge for companies to automate call center operations, making the call center more available, allowing customers to get information in a self-service manner, and really improving overall customer support and satisfaction.

“Part of the reason that speech is starting to take off is that there are many things that can be done with speech today that you couldn’t do even in the recent past. There is a combination of reasons for that. One is that recognition is much better than it used to be. Combine that with text-to-speech, the output component, which sounds so much more natural today, and what you get is a customer service model that really works for customers. When a customer calls into an automated call center today, the technology is available to let them easily get through a dialog and be understood. And importantly, the information that they get back doesn’t always have to be prerecorded. It could be a data stream or a text-to-speech stream, and it sounds much more natural.

“I think that with the quality and effectiveness of speech applications improving as they have, with the new capabilities afforded by emerging specifications such as SALT, and with new tools coming onto the market to harness those capabilities, we’re going to see more applications that repurpose Internet and corporate data into an audio format. This will allow mobile workers to stay connected efficiently when they’re out in the field, and allow existing Web services to offer a range of new services to customers no matter where they are.”

James Mastan

Director of Marketing, Speech Technologies Group
Microsoft Corp.

“Companies looking to make an infrastructure investment in speech want to ensure that the technology they invest in will not only solve problems and benefit the business today but also allow for growth and expansion as the business evolves. We believe that our commitment to SALT as the foundation for the Microsoft Speech Platform will help customers use their existing Web infrastructure to bring all the benefits of speech to telephony applications today, while also laying the foundation for an evolution to multimodal interfaces on multiple devices (handhelds, Smartphones, PDAs and Tablet PCs) as those devices start coming into the enterprise.

“By basing our platform on an open, royalty-free specification, we hope to eliminate the ‘rip-and-replace’ proposition for customers: the idea that they would have to invest now to enable speech for their telephony applications and then, as the multimodal interface comes to the forefront, get rid of that infrastructure and rebuild their applications, instead of simply extending them to the new devices through minor modifications, as you would with an elegant Web programming style. We think the approach taken by the SALT specification will allow for seamless interaction between Web and telephony, while also positioning customers to add multimodality as those solutions develop.

“The partners that we’ve been working with in this effort have been a tight, cohesive team, and we’re constantly expanding the depth and breadth of our partnerships to help bring SALT-based technologies to the mainstream.”

John Donaldson

Vice President of Strategic Product Development
Intervoice

“Enterprise IT departments have invested heavily over the past few years in solutions that provide automated, self-service contact for their Web-enabled customers. Given that around half of all U.S. households have Web-enabled PCs, while telephony penetration exceeds 90 percent, implementing a solution that combines both channels increases customer contact and really extends the enterprise’s reach and its ability to communicate with customers in an automated fashion. This is a critical benefit of speech technology, and particularly of SALT in the Microsoft environment: the ability of the enterprise to greatly reduce its total cost of ownership of the platform while increasing customer contact through any device, any time, anywhere.

“Today, developers using the Microsoft Speech SDK can begin to speech-enable their Web pages and develop interfaces to the same data sources through telephony devices. Once customers are able to interact with a Web site or corporate server via intuitive speech interfaces, they will start to expect speech interaction on all automated systems, replacing the cumbersome touch-tone interfaces with which we are all familiar. The ability to both talk through that interface and get immediate feedback from the data source, either verbally or visually, will be very important for increasing customer satisfaction, improving usability, and expanding automation into more complex applications.

“At Intervoice, we’re working on products that are not only reactive, retrieving information from the system in response to customers’ requests, but also proactive in pushing that information out. For instance, if you commute to work every day, you may not want to wait for the radio traffic report to tell you where the accidents are. Using a Web service, an application can monitor the route you normally take, recognize when there is an accident or blockage on that route, and proactively notify you through both a speech and a data interface so that you can choose an alternative route before it’s too late. This notification can occur on any device you choose: home phone, mobile phone, wireless PDA or PC.

“The fact that the industry is coming together around a standard like SALT, which leverages a very large installed base of products from Microsoft and other companies, gives vendors such as us the ability to focus on creating these kinds of solutions without worrying about the platform. It lets us focus our 20 years of industry experience on the value of the solution and the service that we provide to the end customer. The fact that Microsoft is producing a platform and tools for speech greatly enhances this benefit by enabling Intervoice to transform the way people and information connect.”

Brian Strachman

Senior Analyst
In-Stat/MDR

“Most of the speech technology installations that we’ve seen have been in call centers, which have provided good evidence for the return on investment in speech. But the call-center market, although it’s been fairly profitable, is just the tip of the iceberg. There are a lot of call centers out there that could still benefit from speech, and I think there are going to be lots of other markets where speech recognition will be used. I think SALT will play a key role in the ability of speech technology to enter those new markets.

“For instance, speech-enabling a PDA is another great application for SALT. Many people now use their PDAs only for basic contact management, scheduling and that sort of thing. If you get more complex than that, you run into the limitations of the interface, whether it’s a small keyboard or a touch screen. SALT has the ability to make PDAs much more useful for mobility applications, and to really bring the power of the Internet to the PDA. And that’s the beauty of speech: it’s the most natural way to communicate.

“SALT also provides a programming model that is familiar to millions of developers. So in terms of training a development ecosystem to work with the specification, which is really what creates an industry, there is a large base of talent out there already. All of the small software houses that create specialized applications for vertical markets, all of the people who will customize their Web pages to make speech recognition available, all of that development talent all over the world can not only have access to SALT but use it to create cool applications that will drive the next wave of speech technology.”
