Who Needs Books? MSR Researchers Look for Ways to Make Audio and Video as Easy to Use as Text

SEATTLE, Wash., April 2, 2001 —
“Angel and minister of grace, defend us.”
Theater buffs will recognize this line from Shakespeare's
“Hamlet,”
when the Danish prince first meets the ghost of his murdered father. But what does it mean? Since the line can be interpreted many ways, some productions of the play portray Hamlet as confident, others as terrified, still others as disbelieving.

So, consider the challenge when professors like Peter Donaldson try to compare and contrast these different interpretations in class, using video recordings of various performances. For years, it has taken Houdini-like manipulations of videotapes and discs, video players and Search buttons. And once instructors share their observations with students, the contextual connection between those observations and the video is lost.

“We live in a cross-media world; we need cross-media tools to understand it,”
said Donaldson, a professor of literature at the Massachusetts Institute of Technology (MIT).

At the CHI 2001 Conference on Human Factors in Computing Systems this week in Seattle, some of the top minds from Microsoft Research (MSR) will explain how they are helping solve problems such as Donaldson's, making it easier for people to control and interact with today's growing array of technology. In tutorials and papers published at CHI, they will show how new technologies being developed by MSR have the potential to improve interaction with machines and among users — and save businesses and other users time and money.

Microsoft Chairman and Chief Software Architect Bill Gates will expand on these themes during a keynote speech today at CHI, where he will discuss the history of computer interfaces, how Microsoft has played a role in this area and how the company is adding new innovations with the Tablet PC and Microsoft Reader.

“There is no limit to the possibilities when it comes to human-machine interaction,”
said MSR researcher Jonathan Grudin.
“As new approaches mature, people will be able to more easily and fully realize the benefits of technologies they already use every day, as well as others they haven't yet encountered. We are only just beginning to explore all of the possibilities.”

Grudin should know. A regular contributor at CHI and other professional conferences, he and other members of MSR's Collaborative and Multimedia Systems Group authored six of the eight MSR research papers selected for publication at CHI 2001. These papers — which detail everything from advances in remote-conferencing technology to new methods for linking public spaces — not only indicate the stature of the research being done by Grudin and his colleagues; they show the increasingly vital role collaborative technology now plays in interaction between humans and machines.

Making Video and Audio as Easy to Use as Paper

In addition to the papers, Grudin will offer an all-day tutorial at CHI 2001, in which he will describe one of the building blocks of human-machine interaction: the ability to manipulate and reference the audio and video recordings stored on many of these machines. MSR's goal is to make recordings as easy to reference as books and other paper text. Though there's still work to do, Grudin believes MSR is making significant strides.

Donaldson is the first to agree. A prototype annotation system developed by MSR allows the MIT professor to compare and contrast multiple versions of theatrical plays — as well as permanently capture typed comments about the plays — without any of the past complications. The system allows him to show performances side by side on a single screen, quickly access specific scenes in both videos, and type commentary about scenes onto the screen. The comments are saved and can be viewed over the Internet by students who miss the class or want a recap.
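
The article doesn't describe the prototype's internals, but the basic idea behind such a system — typed comments anchored to time ranges in specific recordings, so they can be saved and shared over the Internet — can be sketched with a simple data structure. The class and field names below are hypothetical, not taken from the MSR prototype.

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class VideoAnnotation:
    """A typed comment anchored to a time range in a specific recording.

    Hypothetical structure; the MSR prototype's actual schema is not
    described in the article.
    """
    media_id: str          # which recording (e.g. a specific "Hamlet" production)
    start_seconds: float   # where in the video the comment begins to apply
    end_seconds: float
    author: str
    text: str
    created_at: float = field(default_factory=time.time)

# Two instructors comparing the same scene in two different productions.
annotations = [
    VideoAnnotation("hamlet_production_a", 312.0, 340.0, "donaldson",
                    "Hamlet delivers the line in terror; note the lighting."),
    VideoAnnotation("hamlet_production_b", 298.5, 330.0, "colleague",
                    "Here the same line reads as confident, almost defiant."),
]

# Because each note carries its own media id and time range, the comments can
# be saved and replayed later alongside the video, for example by students
# who missed the class.
print(json.dumps([asdict(a) for a in annotations], indent=2))
```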

Donaldson also uses the system in online seminars for the Shakespeare Association of America. Colleagues in locations as disparate as Hawaii, Maryland or Iowa log on to watch videos and type comments to one another in real time, adjacent to the moving images, or to view the video and others' comments at a later time.

“It really enables you to make a quantum leap forward in understanding when you can share precise observations in this way, regardless of your location or when you watch a video,”
Donaldson said.

MSR developed the algorithms and other technology that power the annotation system. Researchers there also are developing ways to compress and more easily browse and skim everything from videotaped lectures to baseball games. By developing algorithms that detect and eliminate pauses and jump between other cues within recordings, MSR researchers have been able to condense a three-and-a-half-hour game to as little as two minutes of highlights, without any human editing. These highlight cues, or thumbnails, can also work much like a table of contents in a book, allowing users to jump between significant sections of an audio or video recording, Grudin said.
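
The article doesn't detail those algorithms, but the simplest form of the pause-removal idea can be illustrated with a few lines of signal processing: drop stretches of audio whose energy falls below a threshold and keep the rest. The sketch below, using NumPy, is only an illustration of that idea under assumed frame sizes and thresholds, not MSR's method.

```python
import numpy as np

def remove_pauses(samples, sample_rate, frame_ms=50, threshold=0.01):
    """Drop low-energy (silent) frames from a mono audio signal.

    A toy illustration of time compression by pause removal; MSR's actual
    algorithms, which also jump between highlight cues, are not described
    in detail in the article.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))   # frame energy
        if rms >= threshold:                 # keep only "active" frames
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([], dtype=samples.dtype)

# Example: one second of speech-like noise followed by one second of silence.
rate = 16000
signal = np.concatenate([np.random.uniform(-0.5, 0.5, rate), np.zeros(rate)])
compressed = remove_pauses(signal, rate)
print(f"{len(signal) / rate:.1f}s in, {len(compressed) / rate:.1f}s out")
```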

“This technology allows you to decide how much of the game or any other event you want to watch,”
he said.
“Or if you find a lecture on the Web that you think you might be interested in, you can skim it — much like you would with a book — before deciding if you want to sit down and view the entire thing.”

Building a Better Camera — and Camera Operator

Before users can annotate or skim audio and video recordings, they need someone or something to capture those sounds and images as easily and inexpensively as possible. In papers published at CHI 2001, MSR researchers explain technologies they are developing to automate video camera operation and to broadcast remote conferences and meetings.

“We believe, in the future, recording a meeting will become almost as simple as turning on a light switch in the meeting room, and the recurring cost will be negligible — a few dollars for the disk storage for a one-hour meeting,”
write MSR researchers Yong Rui, Anoop Gupta and JJ Cadiz in their CHI paper,
“Viewing Meetings Captured by an Omni-Directional Camera.”

They describe how they are designing and testing systems that provide realistic-looking re-creations of group meetings, using new omni-directional camera technology and image-processing software of their own design that can locate where people are within a room. The software also extracts a person's image and positions it in its own video window.

Rui, Gupta and Cadiz used cameras made by other companies in their research. MSR has since developed a camera system, called a RingCam, that provides a more realistic image and higher picture resolution at a fraction of the price of single-camera systems, which cost as much as $10,000, said MSR researcher Ross Cutler, who developed the RingCam.

In its current prototype form, the RingCam comes in a round casing, roughly the circumference of a compact disc and the thickness of two hockey pucks. It sits in the middle of a conference table, captures images of all of the people around the table and stitches these images together side by side for broadcast on a remote monitor. It also provides an enlarged view of the person who is speaking while maintaining smaller images of other people in the meeting in the background.

The RingCam provides higher resolution than other, more expensive systems by combining pictures captured by five inexpensive cameras, fixed in a circular array, rather than stretching the image from a single, high-resolution camera over 360 degrees, Cutler explained. Based on the lower cost of the cameras and other materials, he expects the RingCam could sell for about $300.
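
The article describes the RingCam's output — a side-by-side panorama of everyone at the table plus an enlarged view of the current speaker — rather than how it is built. A minimal sketch of that composition step, assuming five already-captured frames of equal size and a known index for the active speaker, might look like the following; the function name and layout are invented for illustration and ignore lens distortion and seam blending, which a real system would have to handle.

```python
import numpy as np

def compose_meeting_view(frames, speaker_index, speaker_scale=2):
    """Stitch camera frames side by side and enlarge the speaker's frame.

    `frames` is a list of equally sized HxWx3 arrays, one per camera in the
    circular array. Purely illustrative; not the RingCam's actual pipeline.
    """
    panorama = np.concatenate(frames, axis=1)  # side-by-side strip of all views

    speaker = frames[speaker_index]
    # Nearest-neighbor upscaling of the speaker's view (illustrative only).
    enlarged = speaker.repeat(speaker_scale, axis=0).repeat(speaker_scale, axis=1)
    return panorama, enlarged

# Five dummy 120x160 frames standing in for the five inexpensive cameras.
frames = [np.zeros((120, 160, 3), dtype=np.uint8) for _ in range(5)]
panorama, speaker_view = compose_meeting_view(frames, speaker_index=2)
print(panorama.shape, speaker_view.shape)   # (120, 800, 3) (240, 320, 3)
```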

“Until now, video conferencing has not been a success,”
Cutler said.
“One reason is the lack of resolution in the camera equipment. You don't really get to see the expressions on people's faces. The RingCam has the potential to change this, so that you can experience remotely the richness of interaction that occurs in a traditional meeting.”

In another CHI paper, MSR researchers offer a glimpse of another broadcast tool they are developing and testing. The technology automates camera control with software that tracks a speaker and mixes in audience shots and different camera perspectives. Grudin said this technology has the potential to make it even less expensive to broadcast virtually any meeting or other public gathering.
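
The paper's specifics aren't given here, but the behavior described — follow the speaker, occasionally cut to the audience or another angle, avoid jarring cuts — can be approximated with a small rule-based "virtual director." The shot names and rules below are hypothetical, meant only to show the flavor of automated camera control.

```python
import random

# Hypothetical shot types for an automated lecture broadcast.
SHOTS = ["speaker_closeup", "speaker_wide", "audience"]

def choose_shot(current_shot, seconds_on_shot, speaker_moving):
    """Pick the next shot with simple rules; an illustration of an automated
    camera operator, not the rules from the MSR paper."""
    if seconds_on_shot < 5:
        return current_shot          # avoid rapid, jarring cuts
    if speaker_moving:
        return "speaker_wide"        # widen out to keep the speaker in frame
    if current_shot != "audience" and random.random() < 0.2:
        return "audience"            # occasional audience reaction shot
    return "speaker_closeup"         # default: stay on the speaker

# Simulate one minute of direction decisions, one per second.
shot, held = "speaker_closeup", 0
for second in range(60):
    new_shot = choose_shot(shot, held, speaker_moving=(second % 20 == 0))
    held = held + 1 if new_shot == shot else 0
    shot = new_shot
print(f"Minute ends on shot: {shot}")
```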

The researchers have begun testing the automated camera during broadcasts of in-house lectures at MSR. Half of the time, viewers can't tell the difference between the automated camera and another camera operated by a person.

“We are getting really close,”
Grudin said.
“Pretty soon we'll refine the technology so people can't tell which one is which.”

New Ways to Keep in Contact, Protect Privacy

While the potential of collaborative technology to automate and simplify broadcasts of meetings and other gatherings has been clear for years, MSR is looking for ways to use these technologies to draw people together informally in other, less obvious places. In one CHI paper, MSR researchers detail an experiment that linked seven kitchens and eating areas for several months with live audio and video signals.

The researchers undertook the experiment after MSR's offices in Redmond, Wash., moved from one building with three kitchens to two buildings with seven kitchens, making it harder for people to informally meet with a broad cross section of colleagues. To remedy this and test some new ideas, the researchers set up a large broadcast screen, camera and audio link in each kitchen. Each screen included live video and sound from the other kitchens, along with a live broadcast of CNN to capture the attention of people in each kitchen.

The researchers wanted to see if MSR employees would use the technology. However, even before the launch of the system, some raised concerns about privacy, prompting the researchers to add buttons outside the kitchens to turn off the cameras. Other employees worried about who might be monitoring or tracking kitchen use and how this information might be used. A few times, employees anonymously disconnected the cameras or covered the camera lenses.

Although only one in five MSR employees surveyed after the experiment said they waved or gestured to people in the other kitchens and even fewer used the technology to speak to one another, almost half wanted to see the experiment continued or expanded.

Grudin learned a lot from the experiment, but one thing in particular: Most people need to understand the purpose of cameras in public places before they will see value in them.

“Nobody complains about cameras in banks because they increase security,”
Grudin said.
“But you put one outside their office for no reason and they may not be real happy.”

Putting Research into Action

Don't expect to see the research and prototypes MSR will unveil at CHI included in Microsoft's next lineup of new software. Some ideas may never make it beyond the experimental stage. A few will require the support of hardware manufacturers. Still, Grudin predicts the importance of these and other types of collaborative technology will increase as digital audio and video become even more pervasive on the Internet and even cheaper for users to capture and store.

Until then, universities and other Microsoft partners will put them to the test. The University of Washington (UW) in Seattle will use the RingCam to link a class with students in two locations, beginning next winter. The UW and Stanford University in Palo Alto, Calif., plan to begin using an annotation system similar to the one at MIT.

Donaldson, the literature professor, plans to expand the use of the MIT system and predicts no shortage of applications beyond academia for these types of technologies.
“Our culture is one of sound, moving images and text,”
he said.
“We need tools like these to provide new context for these interrelated forms of media.”
