REDMOND, Wash., Aug. 16, 2001 — Imagine composing your e-mail with your telephone keypad. “That’s what using an input method editor is like,” says Xuedong (X.D.) Huang, general manager of Speech.NET at Microsoft. “You’re trying to create 6,000 Chinese characters with 26 letters.”
Currently, most Chinese-speaking computer users form characters by using an input method editor (IME) to convert pinyin, a system for spelling Chinese sounds with the Roman alphabet, into characters. The process is extremely time-consuming, since a computer user must be familiar with pinyin and must often stop to choose the correct character from a list of characters with the same sound.
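To see why that selection step slows typists down, consider a minimal Python sketch of the lookup an IME performs. It is a toy illustration only, not Microsoft's IME: the `CANDIDATES` table and the function names are hypothetical, and the table holds just a handful of well-known homophones, whereas a real IME ships a large dictionary and ranks candidates by context.

```python
# Toy illustration of pinyin-to-character lookup. NOT the Microsoft IME.
# CANDIDATES is a hypothetical, hand-picked sample of homophones.
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
    "shi": ["是", "十", "时", "事", "市"],
}


def candidates_for(syllable):
    """Return the characters that share the typed pinyin syllable."""
    return CANDIDATES.get(syllable.lower(), [])


def choose(syllable, index):
    """Simulate the user picking the index-th (0-based) entry from the list."""
    options = candidates_for(syllable)
    if not 0 <= index < len(options):
        raise IndexError(f"no candidate {index} for {syllable!r}")
    return options[index]


# Typing "ma" is not enough: the user must still stop and pick one character.
options = candidates_for("ma")
picked = choose("ma", 2)
```

Every syllable with more than one candidate forces a pause of this kind, which is the overhead dictation aims to remove.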
This difficulty spurred Kai-Fu Lee, vice president of Microsoft’s Natural Interactive Services Division, to become the driving force behind Chinese speech recognition, serving as Speech Research group manager and managing director of MSR China when the technology was developed.
Speech recognition for simplified Chinese and Japanese — as well as English — now ships with every copy of Microsoft Office XP, the newest version of Microsoft’s popular productivity suite. This may come as a surprise to some consumers, many of whom have heard of the new features for collaboration, formatting and application stability found in Office XP. But speech recognition hasn’t been as widely touted.
Does the low profile seem strange? Not to Huang, an IEEE (Institute of Electrical and Electronics Engineers) fellow who has been working on speech recognition since 1982 and recently published a book on the subject, “Spoken Language Processing,” with colleagues Alex Acero and Hsiao-Wuen Hon.
“This is version 1 of an emerging technology, and it will continue to be improved,” Huang explains. “Microsoft’s work on speech technology is for the long term. This is part of the company’s vision: The goal is to provide software to help people access information any time, any place and on any device. Speech is one of the enabling technologies to realize that vision.”
Xiaoning Ling, a program manager with Microsoft Research (MSR), agrees. “One of the major missions of the MSR Beijing (China) Lab is to help consumers who use character-based languages to use computers as naturally, as easily, as those who use other languages. The significance of this kind of technology is in the future.”
It was Huang’s group that developed the key part of what makes speech recognition possible: the kernel, known as the speech recognition engine. Language models for English, simplified Chinese and Japanese, the languages available in Office XP, are built on top of the engine, but the kernel remains the same. The kernel performs the inference that links what a user puts in to the data that comes out, Huang explains. You input Chinese, and the output is Chinese. You input English, and the output is English. Independent software developers can also create Office XP-based solutions for all languages.
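The split between a fixed kernel and swappable language models can be pictured with a short Python sketch. The article does not describe the engine's internals, so everything here is an assumption made for illustration: the `LanguageModel` and `Engine` classes, the per-word scores and the penalty for unknown words are all hypothetical, and a real recognizer also weighs acoustic evidence, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LanguageModel:
    """Hypothetical language-specific component: scores word sequences."""
    name: str
    word_scores: Dict[str, float]    # made-up log-style scores, higher is better
    unknown_score: float = -10.0     # assumed penalty for out-of-vocabulary words

    def score(self, words: List[str]) -> float:
        return sum(self.word_scores.get(w, self.unknown_score) for w in words)


class Engine:
    """Hypothetical language-independent kernel: picks the best hypothesis."""

    def __init__(self, model: LanguageModel):
        self.model = model

    def recognize(self, hypotheses: List[List[str]]) -> List[str]:
        # The selection logic is identical no matter which model is plugged in.
        return max(hypotheses, key=self.model.score)


english = LanguageModel("English", {
    "recognize": -1.0, "speech": -1.0,
    "wreck": -4.0, "a": -1.5, "nice": -2.0, "beach": -3.0,
})
chinese = LanguageModel("Simplified Chinese", {
    "语音": -1.0, "识别": -1.0, "雨": -3.0, "音": -3.0,
})

english_result = Engine(english).recognize(
    [["recognize", "speech"], ["wreck", "a", "nice", "beach"]])
chinese_result = Engine(chinese).recognize(
    [["语音", "识别"], ["雨", "音", "识别"]])
```

The same `Engine` code handles both languages; only the model passed to it changes, which is the design point Huang describes.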
The Chinese language contains over 60,000 characters; of these, approximately 6,000 are frequently used. The Speech group mined several years’ worth of newspapers for common phrases and words, and used them as the basis for the simplified Chinese language model.
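The article does not say how the newspaper text was turned into a model. One common approach, used here purely as a stand-in, is to count how often one word follows another (bigram counts). The Python sketch below assumes the text has already been split into words, which for Chinese is itself a separate step; the three-sentence corpus and both function names are hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs across pre-segmented sentences."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts


def bigram_probability(counts, first, second):
    """Estimate P(second | first) from raw counts; 0.0 if 'first' is unseen."""
    total = sum(c for (w1, _), c in counts.items() if w1 == first)
    return counts[(first, second)] / total if total else 0.0


# Hypothetical stand-in for years of newspaper text, already segmented.
corpus = [
    ["语音", "识别", "技术"],
    ["语音", "识别", "软件"],
    ["语音", "输入"],
]
counts = bigram_counts(corpus)
p = bigram_probability(counts, "语音", "识别")  # 2 of the 3 pairs starting with 语音
```

With enough text, counts like these let a recognizer prefer common phrases over rare ones that sound the same.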
For users of simplified Chinese and Japanese, the results are significant, Huang says. Speech recognition for these languages can improve productivity by a factor of two. “It’s a huge step forward,” he says. “A 10-20 percent throughput gain — the speed it takes to finish a task — is already a big improvement. A factor of two is a paradigm shift.”
Several groups at Microsoft collaborated to get speech recognition into Office XP. MSR provided the basic engine technology; Huang’s speech product group brought the technology to the market; and the IME group worked with Office on speech integration.
Huang uses Office XP’s speech recognition feature whenever he needs to do anything in Chinese. His productivity has doubled, and he shudders when he recalls having to use an IME. “I gave a talk recently at Tsinghua University, the top engineering school in China,” he says. “Some of the students in the audience didn’t believe what I was saying about the technology. So there was a challenge — a student from the audience came up, and I dictated to Office XP while he used an IME, and I beat him two to one — I finished everything in half the time, including the corrections I had to go back and make. I may not have been the best dictationist, and the student may not have been the best typist, but that made a big point right there.”
Users of the Roman alphabet can also benefit from voice-recognition technology now, particularly those who don’t type well or who suffer from repetitive stress injuries such as carpal tunnel syndrome. Office XP includes advanced speech-recognition functionality in all Office programs, such as Excel or PowerPoint, enabling people to enter and edit data, control menus and execute commands by speaking into a microphone.
Microsoft recognizes that speech functionality will be a key part of the wireless Web and the future of technology, Huang says. “Speech is a consistent input modality to help people access information any time, any place, on any device. It’s consistent because the speech UI is the same when you move from a phone to a computer, or when you’re driving or when you’re watching TV. The investment of speech recognition in Office is part of Microsoft’s long-term commitment to delivering speech functionality across many devices, including the PC.”
To use the speech-recognition function in Office XP, set the regional settings to the appropriate language. Then, under the Tools menu, choose the Speech option. All that is required beyond that is a microphone; Microsoft recommends a high-quality, close-talk model.
Ling reveals that MSR is also working on a multimodal approach that will combine speech, keyboard and handwriting to achieve the most efficient, productive results. However, this is still in the early stages of research.
“In Western countries, the keyboard is probably not going away any time soon, because people are so fast and productive with it,” Ling says. “And even for people in Asia, it may not become the standard soon, as the technology isn’t perfect yet. But it is a huge step forward.”
Keep your eye — and your voice — on the future.