MiPad: Speech Powered Prototype Listens and Learns

REDMOND, Wash., May 22, 2000 – Xuedong “X.D.” Huang, founder and head of the Speech Technology Group at Microsoft Research (MSR), figures that users of handheld computers are tired of tapping with tiny styluses or typing on minuscule keyboards to compose e-mail and schedule appointments.

To address this problem, Huang’s group, which includes some of the best speech researchers in the world, recently developed a research project called MiPad (pronounced “my pad”), which is short for “my interactive pad.” The prototype device enables users to accomplish many common tasks using speech recognition, natural-language processing and wireless-data technologies.

The device, which Microsoft Chairman and Chief Software Architect Bill Gates demonstrated on March 21 at the Latin America Enterprise Solutions Conference in Miami, combines cellular phone, Internet access and handheld-computer capabilities.

MiPad incorporates a built-in microphone that activates whenever a field is selected. When a user taps the screen or uses the built-in roller to navigate, the action narrows the number of possible instructions that the computer will expect to hear. Select an entry in the address book, for example, and the computer “knows” you are about to enter a voice command to establish a new address or modify an existing entry.

MiPad runs on Windows CE and is linked to an NT Server. An MSR continuous-speech-recognition (CSR) engine boasting a 64,000-word vocabulary powers the device. Another engine interprets language inputs from the CSR engine and maps them into meaningful actions that MiPad understands.

Users “train” MiPad to recognize their speech by speaking into the built-in microphone. Huang dictated the MiPad “readme” file into the microphone to train the device to recognize his voice. Now, when he asks his MiPad for someone’s address, the device displays it immediately, despite his rapid-fire, accented elocution.

MiPad is the first demonstration application designed to showcase the “tap and talk” interface and engine technologies of MSR’s Dr. Who project. The project name is a nod to the hero of a British science-fiction television show who travels through space and time to battle evil and injustice.

Speech is a natural, efficient method of interacting with a computer, but simply layering it onto an existing product doesn’t do it justice, says Microsoft program manager Derek Jacoby. Because MiPad is currently based on a client-server model and is a research project, MSR researchers are free to experiment with the interaction model without being bound by excessive hardware or product constraints. “Our goal is to make interacting with the computer as natural and easy as interacting with a person — at least within the limited subject domain of email and calendar tasks,” Jacoby says.

“I can imagine that one day I will be able to use speech to access the vast information on the Internet while connected and also use it to manage my personal information while off-line,” says Peter Mau, the developer lead on the project.

MiPad’s architecture is meant to accommodate both the client-server and standalone models.

Ease of use is another significant factor in the vision of the MiPad project. Senior researcher Alex Acero says he and his fellow researchers are paying close attention to what consumers want and need. “Results from our internal tests show that users of MiPad do not like wearing headset microphones,” Acero says. “On the other hand, a microphone located on the MiPad device captures a lot more background noise, which results in significantly lower recognition accuracy. Thus, we’re working to improve the robustness of the ‘recognizer’ in such noisy conditions.”

Huang says the MiPad prototype will be the launch pad from which other, similar products will be introduced. “The mission of Dr. Who is to develop a new, compelling interaction model,” he says. “By developing MiPad, we are working to perfect a model that we can generalize and extend for many other home and office uses. That mission is very important and strategic for Microsoft.”

The key goal, Huang notes, is familiar: enable people to access information any time, any place, on any device. He believes that speech technologies will help the company fulfill this mission and expand its business into the home and mobile markets.

Huang explains that incorporating speech into portable wireless devices, such as the MiPad prototype, was initially inspired by the vision of MSR Vice President Rick Rashid. Although Huang says it might be a while before MiPad and similar Microsoft products become available, he sees benefits beyond commercial release.

“I would love to see MiPad technology turned into a product,” Huang says. “But we’re not just selling it as a device. We’re selling it as a vision for Microsoft and the industry.”