Remarks by Craig Mundie, chief research and strategy officer for Microsoft
January 5, 2011
MODERATOR: Well, good afternoon and Happy New Year. I’m happy to welcome you to the first of the 2011 Speaker Series, and today we have an absolutely outstanding opportunity.
Craig Mundie is the chief research and strategy officer at Microsoft Corporation, and a leading figure in business and information technology. He charts the future of the world’s largest software concern. Reuters News Service calls him Microsoft’s top visionary. He is shaping the technology that is going to shape our world. Craig Mundie was born here in Cleveland. His father was an accountant for the auto industry. The family moved to Detroit when Craig was six months old, but he remembers Cleveland fondly. (Laughter.) He ultimately settled in New Orleans. Craig displayed all the childhood symptoms of a budding engineer.
At eight years of age, he taught himself how to use a slide rule, which was subsequently stolen from him at age 12 by the class bully. He built his first computer at age 12, and went on to earn his bachelor’s and master’s degrees in computer-related fields at Georgia Tech.
While still in college, Craig went to work for Systems Equipment Corporation, which subsequently became Data General’s first acquisition. Then, in 1982, he co-founded Alliant Computer Systems Corporation, which developed minisupercomputers. He held various positions there before becoming CEO. Craig Mundie’s future boss, Bill Gates, was also busy during this period. He co-founded Microsoft in 1975 in his parents’ garage. Microsoft built the operating system that powered the personal computer revolution.
In 1992, Microsoft passed GE to become the largest corporation on Earth measured by market capitalization. That same year, Bill Gates recruited Craig Mundie to lead the new Consumer Platform Division. Craig went on to work closely with Gates as chief technical officer for Advanced Strategy and Policy, focusing on privacy, security, and cybersecurity. He initiated Microsoft’s Trustworthy Computing initiative to set new security standards for the company’s products.
Craig has served as Microsoft’s liaison with foreign governments, including China, India and Russia. He serves on the U.S. National Security Telecommunications Advisory Committee and on the Task Force on National Security in the Information Age. He also sits on President Obama’s Council of Advisors on Science and Technology. Recently, he co-chaired a report on the use of advanced information technology to enhance the nation’s evolving healthcare capabilities.
In his spare time, he enjoys piloting his boat, Serendipity, and dabbles in advanced digital photography, video and audio. He has learned to drive racecars and dragsters, and has taken air combat classes at Boeing Field.
Since Bill Gates retired from active duty at Microsoft in 2008, Craig has taken over responsibility for Microsoft’s research and its 850 computer science researchers, and sets the company’s overall long-term technology strategy. He oversees new technology incubations, including a quantum computing research unit, and Microsoft’s healthcare technology business, including the advanced HealthVault program that was piloted here at the Cleveland Clinic.
Craig Mundie lives in a world where the most advanced computer concepts are everyday topics. Bill Gates once said, “I’d hire ten more just like him if I could.” On top of that, let me add that Craig Mundie is one of the few people who can discuss the computer industry with manual laborers like cardiac surgeons, and have them think they almost understand it.
Please welcome Craig Mundie. (Applause.)
CRAIG MUNDIE: Thanks, that’s a fine introduction.
It’s great to be back in Cleveland. As he said, I was born here, although I never really had an opportunity to grow up here. Many of my mother’s family stayed here throughout their lives, and I had a chance to visit with them.
Today, I want to talk a bit about the way I think computing is evolving, and I entitled this talk More Like Us. Computing has grown up over quite a long period of time, and that evolution has been remarkable in terms of the capability that computers have acquired. And yet, I think people are going to be surprised in the next decade or two by the kinds of things that we’re going to find the computer capable of doing. Today everybody, whether they use their personal computers, or their pads, or tablets, or phones, is increasingly aware of the ease of use that has been brought to computers through the evolution of the user interface.
In the early days, people wrote programs, typed punch cards, and if you weren’t a programmer you couldn’t get the computer to do much for you. Things got a little better in that we got to the point where people could use generalized tools. In fact, the birth of Microsoft was really around creating things like word processors and spreadsheets, which for the first time allowed people to get the computer to do something that they wanted to do without having to learn to become a computer programmer.
The next big step was the advance toward the graphical user interface. And in that environment, we realized that if people could point and click in a familiar environment, that they would be able to do a lot more with even less learning. And so now we stand at the beginning of a new era in computer interfaces, which we call the natural user interface, and we think of that as sort of the successor or the enhancement of the graphical user interface. So, GUI becomes NUI. And a lot of what I’ll talk about today is what will happen as the interface between man and machine becomes more natural. In essence, more like us.
Recently Microsoft introduced a technology called Kinect, and I’ll talk a little bit more about that as we go along, but it’s the first time where people who have absolutely no concept of dealing with the computer itself are, in essence, interacting with it like it was another person, in some sense. And these are the baby steps toward a world where computers really will integrate with us, and help us in ways that we don’t quite imagine now.
It wasn’t that many decades ago that the computer was created. Basically, after World War II, the Army needed it, and the ENIAC was created. That begat what we knew as mainframes. Mainframes were then miniaturized through advances in electronics to become minicomputers. That put them into people’s individual offices, if you will, but certainly not into their homes.
Today, we’ve seen that evolve to where servers and personal computers, on your desk or in laptop form, are something that people just think of as infrastructure now. There is nothing novel about them. People are oftentimes still surprised when they see that there’s something new a computer can do, but computing has really become very, very well absorbed into society. In essence, computing is a bit like electricity and running water now: people just expect that it’s there, and that they get to use it.
But, in fact, computing has evolved in quite dramatic ways, and the kind of thing that’s happening now is that it’s really getting inside lots and lots of other devices. So, the laptop allowed you to take your desk with you. That wasn’t quite good enough, so we now essentially put your laptop into your tablet or your pad or your phone. But embedded computing has become quite popular and powerful, too.
And so, today contemporary book readers and cameras, even bathroom scales, are really computerized. I have a picture here of one from a company called Withings. It makes a scale that you step on in the morning; it measures your weight and your body fat percentage and wirelessly transmits them to your computer, which stores them in a cloud service for you, and then applications can plot and analyze the data, and give you coaching or indications that problems may be emerging. So, these things are becoming very intelligent and, in fact, very connected.
The next thing that’s happening is that in combination with radical changes in sensing technology, the whole world has become a connected sensing environment. And the ability to exchange this information is creating new opportunities for people to get the computer to help them more. So, in a sense, this world where the movement of cars on roadways, the orientation of the device in your hand, all of these things are being sensed without you having to explicitly tell it to do that. And as a result computing just becomes more and more completely invisible.
So, today if you buy a contemporary car it probably has upwards of 50 or 60 microprocessors in it. And the ones that are now becoming all electric, for example, are even more computer controlled in the way that they operate. This is a picture of that scale that sends on your morning measurements. And of course, in places like the Cleveland Clinic and other hospital environments, the advances in diagnostic equipment, largely driven by computational capability, have been quite dramatic. But now everything from digital cameras to your washing machine is increasingly computerized, not only to make it easy for them to perform their task, but ultimately to assist in the diagnostic and maintenance activities that make them easy to use and easy to maintain.
So, in essence computing has become all around us. And the question now is how will that evolve. One thing I wanted to talk about before we get into that is a bit specifically about how I think IT is going to transform healthcare. We’re here at the Cleveland Clinic, or at least they’re sponsoring this event, and Toby and I have had conversations on and off about how the healthcare environment is going to change. As he indicated in the introduction, I spent most of last year working on a report for the president on advances in how we could think about using information technology to transform healthcare in a more radical way.
Everybody obviously wants better outcomes and lower costs, and that’s true not only in this country, but around the world. And yet there’s really still a big disparity between the way the information technology industry has evolved broadly, particularly in other sectors, and the way that it’s been applied within the healthcare field. And so there are the beginnings of a convergence in these things. But I think this is an area where we should think of healthcare as a more networked environment, much like the network of devices and sensors that are part of our daily life and our normal work experience. How do we start to think about the entire universe of healthcare, the hospitals, the doctors, the clinics, the patients themselves, as all being part of this network, bound together by the same type of cloud services, as people call them, which are big computing facilities in the sky, or really in the backbone of the Internet? And how do we use these to integrate and learn from the collective information and activity that’s going on?
We do this at scale in other areas. If you pick up the Wall Street Journal, or read other articles from recent months, you begin to see that people are even concerned about the degree to which this type of cross-correlation of information can be done. The search engines, the advertising systems, the retailers: if you go to any of these things on the Internet now, you get advertising that’s delivered to you on a personalized basis, based on what you’re doing and the kind of activity you’re performing.
A decision is made to deliver an ad to you in the middle of a Web page, and that happens in about 30 thousandths of a second. Everything that has happened to you in the past has been recorded and is being analyzed in order to try to make this a useful presentation. There’s clearly an economic incentive, and in sectors like retailing and advertising those investments have driven this forward at an incredible rate.
For some reason that same kind of technology and rate of evolution has not yet found its way into the healthcare environment. And yet the question seems obvious: given these technologies that provide the incredibly networked, data-mined, interactively driven world that you use every day in the rest of your life, why aren’t they being brought to bear more directly? I think they can be and they will be, but a revolution of sorts will be required to get people to think about the healthcare ecosystem in that way.
Next I want to move back a little bit to the technology, and talk a bit about some of the things that we’ve been doing over the last few years at Microsoft in our research labs in order to try to get people prepared for radical new ways of interacting with computers. In a sense we want to get computers that can emulate many if not all of the human sensory capabilities. And so I’m going to show you a video in a minute and I’m going to explain a little bit of what you see, and then I’ll highlight some at the end. But, we’ve been looking at ways for people to use cameras, and touch sensors, and things that measure electrical impulses in the body, all different ways of trying to figure out both convenient and interesting and easy ways for people to provide data to, or get data from a computer.
So, let me just run the video. These are just little vignettes of the kind of things that go on and then I’ll explain a few of them.
So, if you look at a few of the things that are up on the screen, in the one on the far left, the person writing on the paper, there are actually two people. One is remote, and sees and experiences the same thing as the person at the other end, and they can interact together. So, as you saw in the video, you can take a pen and write, and the person at the other side can take a pen and write, and their arms and hands are projected in real time. They can essentially collaborate with you on the same document.
So, today we talk about collaboration as: you can make a phone call and talk, you can have a video conference, or other types of interactions. But increasingly we think this sort of interaction at a distance is going to be really important. I generally tend to use the term telepresence as a way to think about what it’s going to be like, and I actually think it will be the next big thing in the tele family. There was the telephone, which collapsed distance for people, but only with the spoken word. Then there was television, which allowed us to do that with images. And I think the next thing we’re going to see is telepresence, where more and more you’ll be able to interact, in a very lifelike and realistic way, with people who aren’t there.
For those of you that are perhaps science fiction fans, if you watched Star Trek, they used to have the thing called the Holodeck, which was essentially a completely synthetic environment. We know how to build the Holodeck now, and so it isn’t that many years away where you’re going to find it quite natural to interact at a distance.
In fact, one of the things that I think you’ll see quite soon is the ability for people to, at least in small groups, go and have meetings together where none of them are actually physically in the same room. But your ability to look at each other, and talk and communicate is as if you are in the same place. The technology to do that is coming very quickly now, and I think will represent a revolution in what people come to expect in multi-party interaction at a distance.
Kids today are already starting to move in this direction. If you look at things like Xbox and Kinect, they have avatars that they use to represent themselves, to communicate with their friends, and to do things. Today, those avatars are just caricatures of you. But at the end of the day, there’s no reason that they can’t become very lifelike. A year or so ago when I used to give a talk like this, I would talk about avatars, and people would say, what’s that? Thanks to Jim Cameron and the movie Avatar, people don’t ask that question so much anymore. But the idea that you can have some very lifelike representation that you’re essentially projecting yourself through is not really science fiction. And you’re going to see that come much more quickly than many people expect, and it creates a whole new model of interaction.
Other things are interesting, too. In the one on the right, there’s just a small projector like you might actually have in your phone, or might wear as a piece of jewelry. They also have a thing like a wristwatch, and the wristwatch is able to sense the electrical impulses and the motor muscle movements in your arm. Computationally, by sensing them in one place, it is able to work out where in your arm they are happening. And so this gives the ability to project something onto your arm, animate it just by moving your fingers, which can be detected, and then touch your arm as a way to select something. So, that’s an example of a very intimate binding together of how human physiology works and how you might interact with a computer.
The one in the center on the bottom is actually the depth map that comes out of these new three-dimensional cameras. All the cameras you would know today, like these TV cameras, just take whatever is in front of them and flatten it into a 2D image, and you get to look at that as a planar image. And yet people see in 3D, and a lot of what is important in interacting naturally is in three dimensions. And so the breakthrough this year has been the introduction of these cameras that actually see in three dimensions. That’s an example of how those cameras work: if I stand in front of one today, it knows that I’m here, and that the podium is there. And that depth perception turns out to be as important to the computer in the future as it is to people in trying to get things done.
Another big thing, of course, is vision. And some of the things you saw in the demo represent new ways in which we’re trying to give vision to machines. One of the things that’s really unnatural about interacting with computers today, like in a video teleconference, is that the camera is not co-located with the image that you’re looking at. So, you always have this unnatural difference between where your gaze actually is and what the people perceive on the other side, because you’re looking at the person, but the camera is above the screen.
And it turns out that those subtle cues are very important to humans. And so you really would like a machine to see the same way that you do. That if you look at the person, you look them in the eye, and it looks back. So, one of the demos you saw there was a computer screen, and the computer sees through the screen.
And, in fact, there will be new technology literally within the year where an LCD panel, like you would have for your television at home, will not only have each tiny little pixel that makes up the picture, but each one of those pixels will be a camera that allows the screen to look out. And you’ll be able to basically create images that way. And so now, when you look at an image on a screen, you’re looking at it, and it’s looking right back at you. So, many of these things are very important in terms of moving people to comfort in dealing with computers, or dealing with people interacting at a distance.
So, the next thing is this natural user interface, and I like to think of that as computing without the learning curve. Many, many people today get great value and utility out of computers, but computers have historically required a lot of training and acclimation to really get a lot of value out of them. As we’ve moved to these advanced graphical interfaces, and then to what we’ll call direct manipulation interfaces, where you can do things with your fingers or add voice commands, the ease of use is getting a lot better.
In the new phones that just came out, the Microsoft version, you just have a button on the front, and you hold it down and then you can talk to the phone, and give it commands. People keep asking me, well, I hate typing on the little phone, the little keyboards are too small. I say, well, why do you type? You just ask it what you want. And, in fact, you can do that more and more. You can tell it to call Toby at work, and it will just dial him out of your contacts. And so more and more the things that have frustrated people in using computers are going to be overcome by making them behave more like we do.
The thing that we’ve been extremely gratified by is the work that we did to create this Kinect sensor, which has been one of the hits of the recent Christmas season. This is one of these special 3D cameras hooked up to a game console. That, coupled with an integral array microphone, allows you to just talk to the game console to control it, and to talk to people you want to play with who aren’t actually in the room with you, and it allows the computer to see, recognize, and map the skeletal structure of four people simultaneously, 30 times a second.
So, essentially, the angular positions of the skeletal joints are the data that we give to the people who write the programs, and from that they’re able to make whatever interface they want. And so we now have dance titles, and yoga titles, and car racing titles, and all of these things are essentially being done by the creative use of the machine’s ability to see and to represent that in useful ways.
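As a rough sketch of the kind of per-frame joint data being described, a program might consume something like the following. The structure and field names here are purely illustrative, not the actual Kinect SDK types:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Joint:
    name: str    # e.g., "head", "right_hand"
    x: float     # camera-space position, in meters
    y: float     # positive y is up
    z: float     # distance from the sensor

@dataclass
class SkeletonFrame:
    player_id: int           # the sensor tracks up to four players at once
    timestamp: float         # frames arrive roughly 30 times per second
    joints: List[Joint] = field(default_factory=list)

def hand_above_head(frame: SkeletonFrame) -> bool:
    """Toy gesture check: is the right hand raised above the head?"""
    pos = {j.name: j for j in frame.joints}
    return pos["right_hand"].y > pos["head"].y

# Example frame: hand at shoulder height, below the head.
frame = SkeletonFrame(1, 0.033, [Joint("head", 0.0, 1.6, 2.0),
                                 Joint("right_hand", 0.3, 1.2, 1.8)])
print(hand_above_head(frame))  # False
```

A game title would layer whatever interface it wants, dance moves, yoga poses, steering gestures, on top of simple joint comparisons like this one.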
One of the things you saw in the video clip was an early piece of work we did where surgeons were using a prototype of this kind of capability to give sterile, touch-free control of imagery in the operating theater. And as more and more imagery becomes important, and the granularity of it becomes more and more detailed, you obviously can’t see everything you want to see at once. So, the ability to just make fine adjustments, or flip through things, may become an interesting application of this kind of technology even in the medical domain.
An example of how people are thinking about this will be shown in this next video, and I’m going to explain it a little bit. This is work that’s hot off the press, done at the University of Washington over the last couple of months. They took one of these Kinect sensors, and instead of using it in the game environment, they said, let’s explore how we might use this in different kinds of medical environments. And so I thought for this audience it might be interesting to show this.
What you have is a guy sitting at a table. And what you’re seeing is the depth map of the table. And so the shadows are where the camera can’t see behind it, it’s occluded by the objects in front of it. Up in the top right corner, you see his colleague, and what he’s actually holding onto is a force feedback system. You know, we built these and have used them in gaming for a long time, so if you’re playing a game, or flying a simulated fighter plane, the joystick pushes back on you. And, in fact, cars and planes and other things use this kind of force feedback system today to give tactile feedback to pilots or drivers in a very natural way.
But this system is a completely general one, in that if you hold this object and try to move it, it will respond with force, just as if your hand was bumping into something at the other side. So, what they’ve done is couple these things together, as you’ll see in a minute, so that the room, the objects, and even the hand of the person can be seen, and can essentially be felt tactilely. In this case the resolution is fairly coarse, but their interest is in trying to understand how these things could be used in robotic surgeries, where you’re trying to give some additional sense of feel to a surgeon who either is literally not present, or is working through a robot that is inside the patient when he is not. How can you regain some of that tactile capability, and what role might vision systems play?
So, today we’re doing this at a granularity suited to people standing at a distance, but the technology and the algorithms to do this could obviously be applied at any scale, very miniaturized or quite large. So, let me play the video for you.
So, the red dot is the distant guy, where his hand is. So, now he’s feeling the object that was placed on the table, and so as he reaches around in space, when the dot bumps into the head model there, he feels it, and he can’t move that force feedback system through it, because it would have to go through the guy’s head to do it. And so, more and more I think there’s going to be interesting ways where we are able to use electronics and mechanical systems to telegraph feeling, physical feeling, haptic sense, even across a distance, as well.
And if I let this thing finish in a minute you’ll see they actually close this segment by doing a handshake, where the guy puts his hand out and the other one can essentially reach out and feel his hand and they can essentially push it back and forth. And it just shows there’s a lot of creativity that’s emerging, once you take some of these core technologies and put them in people’s hands. And the thing that I think is so compelling in the world that we live in today is the Internet has allowed us to bring all these capabilities together to bring literally hundreds of thousands of clever people to bear on these problems, and to do it at a speed that has never really been seen before.
All of this is part of making computers work on your behalf. My view is that up to this point in time computers were largely a tool, and that if you did your apprenticeship and you learned how to operate the tool you could do some incredible things. And the tool started out fairly elementary and has become incredibly sophisticated, and those whose skill level advanced with the computing capability can do some stunning things. But, the bulk of the population is not able to achieve that level of mastery of the technology, and so we’ve got to find a way to get more value out of the computer, without you having to use it as a tool. So, the key to this is essentially to make it work less at your command and more on your behalf.
Our goal, my goal I guess you could say for Microsoft, has been to help move computers to a point where they are great assistants. Those of us who work in jobs that require assistants, whether it’s in an office environment, or with the people who help you in the operating theater, realize that those people are not immediately fungible, because they have understanding, they have history, and you begin to depend on that. It becomes an integral part of how you function as a team together. And today the computer is becoming perhaps the most intimate element in understanding what you do, whether it’s at work or at home, what you read, or play, or do, and who your friends are. These things are all observed now by computers, even at a granularity finer than that of other people who might live in your household or work in your office.
So, as all this knowledge is essentially garnered, and collected, and organized, it creates a new database from which to mine capability, and to try to help you get things done. So, I want to show you in the next three little clips some very, very simple examples of how we changed our mindset in our work, from just providing you a tool to providing you something that does more work on your behalf. When we created this Bing search system, which is like the Google search system, or the Yahoo! search system, we asked ourselves the question, why do people search? They don’t just search for the hell of it. They’re trying to get something done. So, the question is, can you figure out what it is they might be trying to do, and do more of the clicks for them, so that with one input, one click, the computer does what you were likely to want to do, or even tries several interpretations of what you’re trying to do and offers you those as examples.
So, the first one I’m going to show you is, you just type in a word like Denver, and you can see as soon as you put that in it says, well, based on the history of what you’ve been doing perhaps you want to know something about Denver, so here’s a bunch of its statistics. Maybe you want to fly to Denver, so here’s what it would cost to buy an airplane ticket.
If it turned out that what you were interested in doing was traveling to Denver, because you’re not there now, it goes out and essentially gets all of the fare history and plots it as a real-time graph, so it can show you the fare day-by-day, and it can statistically forecast the day that will probably have the lowest fare. And we do all that in 30 milliseconds and put it on this Web page. So, all you do is essentially say, yes, I did want to travel to Denver, and I’d like the lowest fare, and I can travel on June 5th, and so I’ll click on that one and it will take you there and order your ticket.
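A toy version of the fare-history lookup just described might look like this. The real system’s statistical forecasting was far more elaborate; the dates, fares, and names here are purely illustrative:

```python
# Lowest fare observed on each candidate travel date (illustrative data).
fare_history = {
    "2011-06-03": 312.0,
    "2011-06-04": 298.0,
    "2011-06-05": 274.0,
    "2011-06-06": 305.0,
}

def cheapest_day(history: dict) -> str:
    """Return the date whose recorded fare is lowest."""
    return min(history, key=history.get)

print(cheapest_day(fare_history))  # 2011-06-05
```

The point of the anticipatory interface is that the computer runs this kind of query over the whole fare history before you ask, and simply presents the answer.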
So, think about what it would have been like: you say, OK, I want to go to Denver, so you go to the airline site and navigate to the reservations and information section, or you go to Expedia, which shows you a lot of different airlines. Then you’d have to figure out, wow, do I know whether the prices are going to change? You can’t see any of that, because you don’t have that history. So, if you think about how much gets done on your behalf, all in an anticipatory way, you’re beginning to move down the path to where these things are wildly more useful.
Another example here, closer to the medical world: you just type in BMI, or body mass index, and it asks, okay, well, why would you type that in? One of two things: you either don’t know what it is, because somebody just told you about it, or you want to know what yours is. So, the very first thing it provides when you type this in is a little body mass index slider. You just adjust the sliders for your height and your weight and it tells you what your BMI is.
So, you didn’t have to go to a lot of different places and try to figure out how you would work that out, or, having read the definition in Wikipedia, get out your Excel spreadsheet and do the arithmetic. It just does it for you. So, more and more we’re trying to get these computer systems to anticipate the kinds of things that you would want to do. In essence, it’s like having a great assistant: when you ask them something, they take all of the history and your preferences, and they factor that into what they do for you. Many times they do it without you even asking.
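The arithmetic the slider automates is the standard BMI formula, weight in kilograms divided by the square of height in meters. A minimal sketch (the function name is illustrative, not Bing’s):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# A 70 kg person who is 1.75 m tall:
print(round(bmi(70, 1.75), 1))  # 22.9
```

This is exactly the kind of trivial-but-tedious computation the search engine can do for you in place, instead of sending you off to a definition and a spreadsheet.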
So, the question is, how do we get the computer, which now has all this intimate information and understanding of you and people like you, to use that to essentially proactively help you get things done?
So, the last part here: this is sort of what the interface looks like for our new phone software, and people can organize this. One of the things we found is that today people are very, very interested in social networking. So, historically, if I wanted to see Toby’s Facebook page on my phone, I’d go find the Facebook app, log in to it, and type in Toby. And then, since I really want to see the pictures he took recently and posted, I’d have to navigate there.
So, if you look at the number of steps it takes just to see what he posted recently in his photos, it’s quite a bit of work. So, we said, how can we make the phone do that for you all the time? In this case you just put in all of the social networks and the people that you care about, and then the phone essentially continuously sucks in all the social networking information for each person. And you can just touch on them in the contact page, and it will show you that stuff. So, you didn’t have to log on, start an app, go searching, put in a name, nothing; it’s all essentially acquired, sorted, pivoted and placed there under the name of the person that you care about.
So, these are just simple examples. The other one I’ll just show you on this phone is we all go to meetings, and if you’re like me most of the time you’re running late. So, when we were doing the calendar for this we asked people, what’s the thing you’d like to have automated most frequently? And they said, sending somebody a message that says I’m late. So, sure enough, that’s what we did, we added a little button at the bottom, and if you open up your calendar appointment, and you’re on the fly, and you just hit that button, it sends the people who are in the meeting a little message that says, I’m running late. So, some of these things are very simple, and others are more complex.
But if you say, well, if you wanted to send somebody a message that you were running late, how would you do it, would you send them all a text message, would some get e-mail, some get text? Here, basically the system just figures it all out and notifies them, and you don’t have to address it. It knows who is in the meeting, it’s in your calendar appointment. So, many of the things the computer was capable of doing only if you were willing to drive it to do them are all now things that are going to get done largely on your behalf.
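The one-button "I'm late" feature amounts to fanning a single message out over each attendee's preferred channel, using what the calendar and contacts already know. A hedged sketch, with invented contact records standing in for that knowledge:

```python
# Hypothetical contact records: the system already knows each attendee's
# preferred channel, so one button press fans out to all of them.
contacts = {
    "Alice": {"channel": "sms",   "address": "+1-216-555-0101"},
    "Bob":   {"channel": "email", "address": "bob@example.com"},
}

def notify_late(meeting_attendees, contacts, minutes=10):
    """Build one 'running late' notice per attendee, on that person's channel."""
    notices = []
    for name in meeting_attendees:
        c = contacts[name]
        notices.append((c["channel"], c["address"],
                        f"Running about {minutes} minutes late."))
    return notices

# The attendee list comes from the calendar appointment; the user never
# addresses anyone.
for channel, addr, msg in notify_late(["Alice", "Bob"], contacts):
    print(channel, addr, msg)
```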
So, think about this looking forward, and how the computer might work on our behalf tomorrow. I’m going to show you a few examples of how we think this is going to come together. A few years ago in our Cambridge lab in the U.K., we developed this camera, which we called “sense-cam,” because we realized that the computer not only wanted to see, but for many people, if the computer could record a history of the things that you were doing, it might be very valuable in certain instances. And in particular, one group the researchers were interested in was people in law enforcement, and another was Alzheimer’s patients.
If you look today, most law enforcement agencies that have patrol cars now put cameras in them, and the cameras are there to record everything that’s happening, so that if something bad happens, you can go back and look at the video and say, hey, what happened. But people on foot patrols, and other assignments, don’t have a car, and they don’t want to lug the thing around. So, we said, we’ll make this camera as a pendant. But in order to make it practical, you couldn’t record video the whole time. So, the computer had to be embedded in the camera so that it was smart enough to decide when something was interesting. And then it would just take a picture of it. So, when scenes changed, when you went through a doorway, when a person came in front of you, then it would take their picture and annotate it with where you were and the time it happened. And so you got a staccato pictorial representation of your day.
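The decide-when-it's-interesting step can be approximated by keeping a frame only when it differs enough from the last kept frame. This is a toy version of that idea, not SenseCam's actual algorithm; the tiny frames and the threshold are made up.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)

def select_keyframes(frames, threshold=30):
    """Keep a frame only when the scene has changed enough since the last keyframe."""
    keyframes = [0]                      # always keep the first frame
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        if frame_difference(frame, last) > threshold:
            keyframes.append(i)
            last = frame
    return keyframes

# Three tiny 2x2 "frames": the middle one is nearly identical to the first;
# the last is a big change (say, walking through a doorway).
frames = [[[10, 10], [10, 10]],
          [[12, 11], [10, 9]],
          [[200, 210], [190, 205]]]
print(select_keyframes(frames))   # indices of the frames worth keeping
```

A real device would add annotations (location, timestamp) to each kept frame, which is what turns the keyframes into the "staccato pictorial representation" of a day.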
One of the most stunning results from this was when they used it with Alzheimer’s patients, because people could sit down at the end of the day, and sit in front of the computer, and they could review the day. And they could mine things like all the people you met. And so the computer could begin to note who you saw frequently, it could remind you who people were that you were forgetting. And they actually found that early stage Alzheimer’s patients were able to have much better recall because they were able, at the end of the day, to review what had actually happened in the day. And that review process somehow was an assistant in improving their recall capabilities. Normally the day goes by, and they forget, the short-term memory just goes away. But somehow the ability to review the important events, and the things you encounter in the day turned out to make it move more toward long-term memory, and therefore their function was improved.
Another example, and you’ll see a little bit more of this in a minute, is where we really are trying to get the computer to present itself as an avatar. So, not just in the case where people have an avatar that represents them, but if you want it to be more natural, then why doesn’t the computer just present itself as an avatar. And so that can be increasingly lifelike, as you’ve seen in the movies, and gives the computer a persona that is easier for people to deal with. And I’ll show you a medical application that we’ve been working on of that.
And then, finally, when you start to integrate all these things together where gesture and display and seeing things all come together, you begin to have the computer really become a lot more of an assistant. So, this is an example of a guy who is a chef. He wants to make a menu. He’s speaking the words that he wants to be annotated on that thing. He can use gestures to alter it, because that’s really not a blackboard; it’s a display. Each thing that he’s doing is being converted into shopping lists, and other things that are required for the restaurant on his computer. He can check those things, and when he’s done, just says place the order and it’s all done. So, instead of having to make his menu, and then figure out what the requirements are, and what he’s got, and how to place an order, in a sense the computer software can basically link all that together and automate that entire task.
So, more and more what we see is this new baseline of computing that comes when you couple together the sort of cloud of intelligent devices and sensors, and all the data that they accumulate. You put it in a place where it can be organized. And then you put this increasingly powerful capability of computing in all manner of devices that are close to the people. And therefore, the computing is sort of ever-present, not in the case of a device where you have to go and call it the computer, but basically you just encounter computing in the course of doing whatever it is you’re trying to get done. So, whether it’s reading a book, or communicating with other people, trying to solve problems, or just drive your car, increasingly these things are all being integrated together in real-time.
One of the things that I think is really telling is to look at what’s happening in broadband connectivity. So, there are two graphs here: the left is developed countries, and the right is the developing regions of the world. In each, the blue is existing telephone lines, the line that people used to run to your house. The green is essentially broadband connectivity, which could come either over those telephone lines or over your cable. And the yellow is mobile broadband. And these, as percentages, are essentially the number of these things that exist per hundred inhabitants in the country. And so what you can see is that in the developed world there was quite a high penetration of land line phones, and those are even being slightly eroded by the arrival of mobile capability. And as mobile becomes more broadband, that trend will accelerate.
But the developing countries, you can see, had, percentage-wise, a much lower penetration of traditional phones, and it wasn’t growing very fast. But it’s been completely overtaken by the use of mobile telephony, and now fixed broadband and mobile broadband are both growing quite dramatically. This is going to be very important in terms of being able to take these new technologies and apply them to problems, whether they’re business problems, education problems, or healthcare problems, in the developing parts of the world.
One of the things that we looked at to put a fine point on this was the density of doctors. And so here we took the map of the world and we color-coded it where this is a measure of the number of physicians in the country per 10,000 of population. So red says there’s zero to 10 docs per 10,000; light green, 10 to 35; next one, 35 to 50; and dark green, 50 to 70.
And it’s kind of interesting in that some of the things you see are misleading. So you look at Russia, and you say, wow, Russia is really pretty good. Well, they don’t have very many people, but they educated quite a few doctors. And so the small countries that have little populations, but are quite advanced from a societal point of view, do pretty well. But if you look at this big band in Sub-Saharan Africa, and sort of to the right and left of India in the Asian subcontinent there, you begin to realize that a huge percentage of the world’s population has an essentially desperate need for more doctors and, of course, all the attendant facilities.
But look at even a country like the United States, arguably the richest country in the world, and we’re still struggling and arguing about how we’re going to get healthcare for all Americans. And we represent only about 4 percent of the world’s population. There’s no chance that the rich-world model of healthcare as we know it today is going to get expanded to the six-and-a-half billion people on the planet. And we actually know that in the next 30 years, that number will go asymptotic at about nine-and-a-half billion people. And so, something has to give. We can’t really afford it here, and we certainly, as a society globally, don’t have a strategy today to make this work more globally.
In fact, if you look at this, and you said, I just want to get enough doctors such that every one of the red countries on this map gets up to the lowest level of green, which is just 10 docs per 10,000, or one per thousand, of population, we’d need today 1.8 million more doctors. And I think that’s going to be hard to come by. And certainly it would be hard to make it all work economically if it was operating under the models and with the level of care that we deliver today.
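The arithmetic behind a figure like 1.8 million is just a per-capita gap summed over countries: target ratio times population, minus the doctors already there. The country rows below are invented for illustration, not real WHO data:

```python
# Illustrative only -- these country rows are made up, not WHO figures.
# Each row: (population, current doctors). The target is the talk's
# "lowest level of green": 10 doctors per 10,000, i.e. 1 per 1,000.
countries = [
    (150_000_000, 60_000),   # a large country at 4 docs per 10,000
    (40_000_000, 12_000),    # a smaller country at 3 docs per 10,000
]

TARGET_PER_1000 = 1.0

def shortfall(countries):
    """Doctors needed to lift every country up to the target ratio."""
    total = 0
    for population, doctors in countries:
        needed = population / 1000 * TARGET_PER_1000
        total += max(0, needed - doctors)
    return int(total)

print(shortfall(countries))
```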
And the question is, how do you deal with this kind of problem. So, this is where I think all of these technologies have the promise to come together, and be an amplification factor for the skilled, highly trained people, whether they’re teachers or doctors or other high skilled people in order to be able to scale up our capabilities in a more cost-effective way on a planet that’s going to continue to see an increasing population.
So, the last thing I want to show you is a video. This is an actual system that we put together. It’s not deployed. It’s not a product. But, what we did is we took many of the technologies that I showed you earlier here, where we’re taking machine vision capabilities, machine learning capabilities, Bayesian inference systems, avatar generation from computer graphics, and we stitched it all together to build an experimental system and the goal was could we build essentially a triage nurse, if you will, that was just a computer, but had access to and knowledge of the necessary information to triage the basic childhood diseases of the poor.
And so, we worked with some people and put that together and what you see on the right here is sort of a map of many of the different elements that are being considered as this diagnosis takes place. The goal was, could we put a computer in a village and have somebody come in and just talk to the computer like they were talking to a physician’s assistant, or a nurse, explain the symptoms, and have the computer basically make a reasonable assessment as to the urgency of care requirement, and make a recommendation.
And as you’ll see, it asks questions, it does sort of a natural interaction between the mother and the child, and this is all still fairly crude, but I think it shows a lot of promise, and it’s the kind of thing that personally excites me in terms of thinking that these technologies are going to come together, not just to make the standard of care at places like the Cleveland Clinic ultimately higher and higher, but ultimately to figure out how we can take that standard of care and begin to have it trickle out to a much, much broader population around the world. So, you’ll see in this video how we think some of this might come together.
So, we’ve got to work on the bedside manner a little bit, but you get the idea. It’s interesting in that despite the progress we’ve made in computing, and all these sensing capabilities, the quality of what we can do here is computationally limited. So, the reason we keep working on how to make these computers faster and faster, and cheaper and cheaper, is because it isn’t actually hard for us to find problems that we can’t compute yet, at least not at the rate that we want to compute them in order to have a completely natural interaction. But, I think that computing is one of the rare things that has continued to grow at an exponential rate over a long period of time. That is very rare, and yet I think it’s something that we are going to continue to be able to drive forward at that exponential rate for some period of time yet to come.
And as a result you can see that we’re really at this transition point where the computer has become wildly powerful enough to do the kind of old tool-oriented things that we asked it to do. If I told you I could make your Word, Excel, or PowerPoint presentation 100 times faster would you care? Probably not, it’s more than adequate for that kind of tool task. But, if I tell you, I want to be able to do this, and do it at much higher quality, perhaps with an avatar that is almost indistinguishable from a video of another person, then that’s going to take a lot more computing. We see this happening in special systems. If you look at the quality of games, quality of figures that are produced in real time in gaming, or in the movies these days, you realize we can make these things incredibly realistic.
Many people don’t realize it, but many of the second-tier actors that you see in films these days are all computer generated. There weren’t any people there, all right. And so we’re getting to the point where these transitions are going to take place, and I’m very enthusiastic about all of this. I don’t know how this medical thing will play out per se. It’s the kind of thing where, by making the technology available, my hope is that we’re going to find a lot more clever people, whether it’s through the innovation programs at places like the Cleveland Clinic, or the world’s universities, there’s lots of smart people and when we put this kind of technology in their hands we get some incredible things to happen.
Today we’re starting to see that happen in the gaming environment, where for the first time we’ve really given the computer vision at a practical level. To be able to do this kind of three-dimensional seeing for $149, quantity one at Best Buy, is a revolution compared to what people have done in the past. A 3D camera used in a professional environment to get that kind of resolution has historically cost, in commercial terms, about $40,000 or $50,000. So, now that you can buy one for $149, there’s just an explosion in the number of people who are getting these things and trying to do interesting things with them. That’s the kind of thing that the technology industry is really good at driving, and I’m very proud to be a part of that.
So, I think we are at a point where computers are going to be more like us, and from that we’re going to open up a completely new realm of what the computer can do for us and with us, and I think it’s an exciting future. So, with that let me stop and we have time for some questions and answers. There’s microphones at the two aisles and I’m happy to take any question on any subject. Does anybody want to start? (Applause.)
There’s a gentleman ready here, let’s start on this side.
QUESTION: Thank you. Regarding going back to wide area networks, and access across all kinds of different devices, I wanted to ask you about a couple of areas you didn’t touch upon. One is, especially for medical IT, the need for open standards for data. We kind of have that with communication protocols, but not necessarily for data. So, in the past we’ve had this siloization between different systems. And then a second question has to do with privacy and security.
CRAIG MUNDIE: Let me do them one at a time.
CRAIG MUNDIE: I’ll make a comment about that then we’ll go on to the next one. One of the big things that has happened in other domains where information technology has been present is that we sort of gave up on the idea of coming up with rigid standards on an a priori basis to define everything. And we realized that it was a lot more interesting to come up with machine-readable languages that would allow us to describe the data, what we call metadata, and that having a meta-description is far more useful than having a fixed standard, because it allows much more rapid evolution.
So, if you look at how the Internet works today, nobody created a standard for the world’s Web pages. What we did is we created a very small machine-readable language by which people can describe what a Web page has. And that’s the thing that allowed us to build these giant indexing systems, and to bring all the world’s Web pages together.
As Toby mentioned, I worked on this PCAST Health IT Report, which was released last month. And if you are interested in this topic, I would encourage you to look at it because it makes three specific recommendations, broadly. One is that the industry, the medical community, should move more towards a focus on meta-description of the data with a common XML-derived language that would then sort of ingest all the work that historically was done on standards, but wouldn’t continue this focus on a priori standardization as the model.
The second is that then there should be some type of access system that’s built that’s more like a big indexing or Internet search environment. It could be built in modular or federated ways, but that appears completely possible now when you look at what’s happening in other domains. And then the last, which kind of touches on the privacy question, was the other thing that you can embed in the metadata are essentially the controls on privacy. And so, instead of having some sweeping thing like HIPAA, which basically overconstrains secondary uses, we think it’s possible to use metadata to describe controls on privacy that would allow us to have our cake and eat it too relative to the use of data.
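One way to picture that recommendation is a health data element whose metadata carries both a machine-readable description and its privacy controls, so software can check permitted uses mechanically before releasing anything. The element names below are invented for illustration, not any actual standard or the PCAST report's language:

```python
import xml.etree.ElementTree as ET

# An invented metadata wrapper: the tag names are illustrative only.
# The point is that the description of the data and the controls on its
# use travel together, machine-readably.
record = """
<healthElement>
  <meta>
    <type>blood-pressure</type>
    <units>mmHg</units>
    <privacy allowResearch="true" allowMarketing="false"/>
  </meta>
  <value systolic="128" diastolic="82"/>
</healthElement>
"""

def allowed(xml_text, use):
    """Check the embedded privacy controls before releasing the data."""
    root = ET.fromstring(xml_text)
    privacy = root.find("./meta/privacy")
    flag = "allow" + use.capitalize()
    return privacy.get(flag) == "true"

print(allowed(record, "research"))    # permitted by the embedded controls
print(allowed(record, "marketing"))   # refused by the embedded controls
```

An indexing or search layer would read only the `<meta>` block, which is what makes the federated-access idea workable without a single fixed schema.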
QUESTION: Thank you, Mr. Mundie. I had a similar question. Given the explosion of how we’re being connected as a society, what are you, or what are Microsoft’s thoughts on privacy and security? It’s one thing we hear a lot from our patients.
CRAIG MUNDIE: As Toby said, one of the things I started at Microsoft with Bill Gates about ten years ago was this Trustworthy Computing Initiative, in part because we recognized even a decade ago that as the computer became sort of intimately involved with us, whether in our work life or our personal lives, that people would ultimately have to trust them, and if they didn’t trust them, they’d start to reject the engagement.
And so, we spent a lot of time thinking about this question. The pillars of trust turn out to primarily be security, privacy and interoperability. And so, I think that our focus, and that which we advocate to other people in this field and others, is that you need to basically have a great deal of focus on the security aspects of this, because loss of privacy, for example, through a breach of data operationally is a big problem. But, more importantly, people have to have some understanding of what data they have given, what rights people who acquire it have, and what they intend to do with it. And that’s, of course, one of the big issues in the industry today. One of the reasons it’s such a big issue is that so much of what people now take for granted and use on the Internet is all ad supported. They don’t pay any fee. They don’t pay a subscription. It’s sort of like free.
Well, it isn’t really free. It costs a lot of money to do these things. So, who is paying for it? Well, advertisers are paying for it. The way to make the advertising more and more effective is to basically mine all the data. That becomes potentially a lot more invasive relative to privacy. So, it isn’t just the clinical and healthcare world that’s starting to bump into these issues, but the rest of the world is, too.
In the, I’ll say, non-clinical, or non-medical area, there is a lot of work being done to create sort of self-regulatory models among the advertising community and the people that operate these big Web services, so that people would have a better understanding about the data and its use and the controls on it. I actually think in some sense that the health community has one thing going for it in this regard, which is that it doesn’t have to operate in the open Internet environment. And, therefore, we have a vehicle in this country, and in others, to establish a regulatory framework, supplemented, I think, by a different technological approach, that will allow us to have much stronger guarantees around privacy than that which we actually have today in the paper systems. So, I’m an optimist about our ability to solve that problem.
QUESTION: In your talk, you mentioned much of what we do moving to the cloud. A lot of content providers have done that as well, like Netflix and Hulu, and we’re using a lot of bandwidth to do that. And the more things that go in the cloud, the more we want to use the cloud. I’m concerned that we’re moving towards a perfect storm, because at the same time, the big telcos, like Time-Warner, are threatening to charge us by how much bandwidth we use per month. So, something is going to have to give. They’re either going to be charging us astronomical rates for all of the cloud that we’re using, or things are going to have to come off the cloud. Is that how you see it?
CRAIG MUNDIE: Not exactly. I think your concern is appropriate. There are really two issues at play here. And I’ll talk about the United States, because this is actually different in different countries; different countries have chosen different economic and regulatory models for this. You’ve probably read a lot in the papers; in fact, just recently the FCC put out some rules on what’s called Net Neutrality. So, the first question that the society has to deal with is whether or not any given packet is worth more than any other packet. And if you’re a network operator, and you own a private asset, in a sense you’d like to charge a rent for the use of that facility. And in the case of these communications capabilities, it’s actually possible to make a distinction that says, look, if that packet is going back to the Cleveland Clinic, you must be working. That must be more valuable, so let’s charge you more for that packet than the next. We’ve never had that kind of tiered pricing, although a number of the network operators in years gone by did experiment with that kind of tiered offer.
At the end of the day, you’re right, all of this connectivity and application use is going to drive more demand. And you have two choices. You’re either going to have to say, we’re going to regulate those facilities in order to control prices. I’ll point out in the cable industry, for example, there was a long period of time where they regulated cable pricing, because when it became the predominant way television was distributed, they could say, we’ll just charge whatever we want, and we’ll raise the price all the time. Regulators intervened. Regulators act on behalf of the society. If this society, or any in the world, in my view, is going to become as dependent on this aggregate capability as it is on electricity and running water, and other things, then through its legislation, it will ultimately decide how it wants to put controls on that.
In the United States, we made a decision a long time ago that the government was not going to subsidize those networks. So, they were all bought and paid for by private people in the financial markets. And so, in a sense, it’s a delicate balance. Other countries have chosen different ways. They say, there’s one national carrier, and therefore we control it, and we’re not as interested in profit making. So, each society, I think, is going to have to make its own decisions about the right balance of incentive and regulation.
What we’ve banked on in the United States is competition. And if anything, we are starting to see quite a bit of competition between the cable companies and the phone companies. The one thing I don’t think is going to happen is, wireless is not going to be a big savior by and large in terms of a lot of new capability for broadband, but it’s going to provide the mobile capability. But I do think that the society will find a balance, and it will either get it through regulation, or it will get it through incentives that provide competition, or the incentive for novel ways to bring new networks forward. And I think those may be possible.
And so, I think we’ll see some rough patches, but I don’t have a permanent fear that we’re going to find ourselves priced out of using these facilities.
QUESTION: I was wondering about the user interface. I saw a lot of innovative things up there. One thing I didn’t see was implants, to consider your Star Trek analogy to the Borg, are you looking at that at all?
CRAIG MUNDIE: We’ve done only a limited amount of that. We’re primarily computer scientists, and not doctors. And so other than in partnership with other people, it’s difficult for us to do a lot of the things that are physiologically invasive. There’s certainly lots of people in the universities that are exploring those things, and we work with them. But we don’t do that much ourselves.
We have done a number of things, like I showed in that video, where the guy was essentially tapping on the arm. We’ve been focusing a lot on what we call non-invasive sensing. For example, if you had a pair of glasses, and the temples were somehow electrical sensors, would you be able to decode enough about the rough brainwaves to do things? And we have done some experiments like that. But I don’t hold out any near-term prospect that this non-invasive stuff is going to be good for much beyond this coarse granularity of, can I figure out where I touched my arm by looking at the reflection of the waves in the muscle tissue, which is what we do.
We put a little piezoelectric lever and a little radio in a thing under your watch band, and basically the motion of that produces something like a seismic wave, you know, except instead of measuring it on the earth you’re measuring it on your arm. Then you do the same kind of processing, and from that you can say, yes, he touched there. So, we’ve been focused more on the non-invasive stuff.
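The tap-localization processing he sketches can be caricatured as matching a vibration signature against templates calibrated per location. Real systems extract far richer features from the piezo signal; these three-number vectors are entirely made up:

```python
# A toy version of the idea: calibrate a vibration "signature" for each
# tap location, then classify a new tap by nearest match.
templates = {
    "wrist":   [0.9, 0.4, 0.1],
    "forearm": [0.5, 0.8, 0.3],
    "elbow":   [0.2, 0.5, 0.9],
}

def classify_tap(signal, templates):
    """Return the calibrated location whose signature is closest to the signal."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda loc: dist(signal, templates[loc]))

print(classify_tap([0.85, 0.45, 0.15], templates))   # closest to "wrist"
```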
QUESTION: Thank you for your very exciting talk. Do you foresee Microsoft making a major product line out of health business?
CRAIG MUNDIE: Out of which?
QUESTION: Out of health business?
CRAIG MUNDIE: I’m trying hard. We started a group at Microsoft five years ago called the Health Solutions Group, run by Peter Neupert, who works for me. And we’ve been focused on trying to produce a special kind of software that deals with essentially how to aggregate all the data that exists in the clinical environment. And we’ve made some good progress, and a number of major institutions have acquired that product from us. And I think it leads in a positive direction. The other half that we focused on was, we believe that the patient, the consumer, is ultimately going to have to have access to the data, too. So, we created a thing called HealthVault, which is essentially a safety deposit box on the Internet, where you can have all your medical data deposited.
The idea there in my mind is that there’s going to be two kinds of data. There’s going to be what you call your episodic clinical record, that comes from the episodes of care, and that’s all that’s out there really today, and then there’s what I’ll call your continuous personal record, which is everything that can be sensed when you’re not in an episode of care, if you will. And by combining those things you actually have a whole new set of data.
So, for example, the scale I talked about is just one example of a thing where you get on it every morning, it takes your weight, you don’t think about it, you don’t have to do anything, you don’t have to go write it down or type it in, it just ends up in HealthVault. But, it’s interesting, somebody told me, and I’m in way over my head here, obviously, medically, so you can correct me. He said, are these measurements really interesting? And one of the examples they gave me was, they said, if somebody has had a heart attack, I guess they’re susceptible not only to a second heart attack, but to this movement toward, I guess, congestive heart failure. And one of the symptoms of that is a fairly rapid elevation in fluid retention.
These scales, they measure that every single day. So, if you had a heart attack and you had one of these scales, and you have a program that runs in HealthVault, what does it do? Every morning you get on the scale, it checks that out, pattern matches it and sends you a little text message on your phone that says, you might want to stop at the hospital on the way today, because your fluid level is going up. And trust me, it’s a lot better to stop today than tomorrow.
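The HealthVault program he imagines could be as simple as flagging a rapid multi-day weight gain in the daily readings. The window and threshold below are illustrative, not clinical guidance:

```python
def fluid_alert(daily_weights_kg, window=3, threshold_kg=2.0):
    """Flag a rapid rise: gain over the last `window` days at or above threshold."""
    if len(daily_weights_kg) < window + 1:
        return False
    recent = daily_weights_kg[-(window + 1):]
    return (recent[-1] - recent[0]) >= threshold_kg

# A week of morning weigh-ins: stable, then a sharp 3 kg rise over 3 days.
weights = [82.0, 82.1, 81.9, 82.0, 82.8, 84.0, 85.0]
if fluid_alert(weights):
    print("Your fluid level is going up -- consider stopping at the hospital.")
```

The interesting part is where it runs: against a data platform the readings flow into automatically, so the patient never has to type anything in for the check to happen.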
So, I think what’s going to happen is we’re going to get a huge amount of processing of this data, and whether it’s something like that, where you have specific domain knowledge that you’re trying to apply, or you’re just trying to coach people in other matters related to diet or health, that’s where all this software is going to come into play, once you have a data platform and everybody can get at it. So, that’s where we focused at Microsoft: on creating the two data platforms, one for the clinical environment and one for the consumer.
QUESTION: My question is, obviously, we live in a country of immigrants, and a lot of the time a doctor and a patient don’t speak the same language. Have you guys done any research in overcoming that language barrier?
CRAIG MUNDIE: That’s actually one of our most active areas of research, and not just because of the problem you described here. The world is sort of globally connected now and people want to do business all over the world all the time and language is always a problem. I actually every year do a technology demo for some journalists, and last year, I guess about 10 months ago, one of the things we showed them was a prototype of what we called the real-time translating telephone. So, we actually had two people, one here and one there, they basically had a phone. This guy spoke in German. This guy spoke in English. And when this guy spoke German, this guy heard English, when this guy spoke English, this guy heard German. And it happens in real time.
And so we’re pretty close. It’s another one of those examples where we’re not quite ready for primetime, but the ability to do probably 40-by-40 real-time language translation I predict will be commercially available within the decade. So, any of 40 languages, real-time translated into any of 40 other languages, so you can essentially have the real-time translator droid.
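On the scale of "40-by-40": one common engineering approach (not necessarily what Microsoft's system does) is to route every language through a pivot language rather than build a model for each directed pair, which shrinks the model count from n(n−1) to 2(n−1):

```python
def direct_pairs(n):
    """Directed translation models needed if every pair gets its own model."""
    return n * (n - 1)

def pivot_models(n):
    """Models needed if each language only translates to and from one pivot."""
    return 2 * (n - 1)

n = 40
print(direct_pairs(n))   # every directed pair among 40 languages
print(pivot_models(n))   # to-pivot and from-pivot models only
```

The trade-off is quality: chaining two translations compounds errors, which is part of why fully natural real-time translation remains computationally and linguistically hard.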
So, with that I think there’s no more questions. I want to thank you for your attention, and thank you, Toby, for having me come. (Applause.)