Craig Mundie: Emerging Technologies Conference, MIT

Remarks by Craig Mundie, Chief Research & Strategy Officer, Microsoft
Massachusetts Institute of Technology (MIT)
Cambridge, Mass.
Sept. 25, 2008

CRAIG MUNDIE: Good morning, everyone. I don’t have that much to do now that Jason said it all in his opening remarks. In the 40 minutes or so we have together this morning, I want to show you a little bit about how we’ve been rethinking computing, and how that will evolve to change not only the way those of us that have computers today use them, but how I think it becomes the basis of getting a lot more people in the world to benefit from access to these key technologies.

It’s also important to realize that even in an audience of people who have the backgrounds you do and the level of innovation that we see reflected in the TR-35 award winners, for example, how the things that really change lives broadly, they take a long time, in some terms, anyway. If you look back at the beginning of Microsoft, the things that we’re the best known for — Windows and Office — those products took us more than a decade in each case, and some nine releases of the products, before they actually displaced in the market, by volume, either our products or the other companies’ products that they were the successors to. So, it’s very important to realize that if you want to sustain this type of innovation over a long period of time, you have to make investments that have to be pipelined out over time, and without that, I think you both lose your ability to disrupt your own environment, and you certainly lose your ability to respond to the innovation that happens around you.

So Microsoft has, for more than 15 years, invested in a pure research function which gives us that shock absorbing capability and the basis of moving things forward at an appropriate point. We’ve really been through several cycles in this process. People today now look at Windows Live, the Xbox, and things like that, and say okay, these things are real on a global basis and we understand a lot about them. The mobile and embedded work that we’ve been doing, this evolution to bringing a service component to all the products, those are out there, visible. The AutoPC, many of the new Office Live things, are really about to be launched.

My own job, as Jason said, in splitting Bill’s role, historically, between Ray Ozzie and I, is to deal with the really long cycle issues. So, most of the stuff I do is sort of in the three- to 20-year time horizon, and the projects in the leftmost column here are the ones that I am most directly involved with personally, and trying to drive the company towards. So, we’ve got new business divisions in education and health care, a whole new division that’s focused on technology applications for the global emerging middle class. And we are moving ahead in technical computing, new work in robotics, and a number of other activities. So, I’m going to show how we see some of these things coming together to create an interesting environment going forward.

The Evolution of the Computing Industry

The computer industry really operates, ever since the mainframe days, in a fairly long, cyclical process, where we move in somewhat jerky steps, in about a 15-year cycle, from one platform to the next. In the background, of course, the process of invention goes on on a steady basis, but there’s so much hysteresis in a system of getting everybody to write programs and adopt these technologies that you can’t really insert the technologies continuously. You have to ultimately accumulate them, and then you make another big leap forward.

If you look at the last few generations of computing, there was sort of the early personal computer era, the DOS era, where we began to move from the mini-computer and mainframe eras before it. I contend that each of these lurches is driven not only by the coming together of a new set of technologies, but their presentation to the marketplace, in the form of some type of killer application or experience, which promotes a grassroots adoption of these new technologies.

In our own case, the business franchise that we think of today as Office and Windows, really was established on the back of spreadsheets and word processors, neither of which were by themselves a new capability, but blending them together with an operating system on the PC that was Windows or graphically user interfaced was an important combination, and that combination created the personal computing platform as we even today still know it.

Another platform began to emerge about a decade or more later, which we now think of as the Internet, or the Web. I contend it also was driven into broad adoption by two killer applications: e-mail clients and the Web browser, and that had so much ubiquitous appeal that it got the Web established, and once that happened, more and more people were able to start to build on that. But I think the big question now — you could say, okay, the Web, we’ve been in that component of this now for more than a decade, in a big way, and so the real question is what happens next?

So clearly, sort of on the blue line, we can continue to see the continuous insertion of new technologies — the arrival of parallel programming and the underlying hardware to drive these high scale, high core count microprocessors, these live platforms that we and other people are building as essentially an operating system service set that exists in the Cloud, as we now call it. The perfection, to some degree, of many of the components of a natural user interface that make interacting with computers more like interacting with people, completely new technologies, like robotics. Clearly, robots have been around for a while in dedicated applications like factory floor automation, but if you think of robotics more broadly, it’s certainly an emerging capability.

The Next User Engagement Model

So the real question is not what are the technologies that are going to take us to the next level, but in a sense, what is the killer application, and what is the user engagement model that is going to take us there, or drive this adoption. No one really has ever been able to predict exactly when these things happen, or exactly what those applications are going to be, and in fact it’s the innovation that either comes from a company like Microsoft, or perhaps from start-up companies around the world that ultimately brings these things to fruition. But certainly a company like ours has to always be very, very actively on the lookout for how we can bring these technologies together — those that we invent ourselves, those that the world brings to our doorstep, and move this platform along.

So today, I think this user engagement model is changing. We’ve gone from the desktop to the Internet, we’ve inserted more rich media, we’ve got sort of computer-mediated communication. Search as a navigational metaphor has become extremely popular, and I think that we can now see the outlines of at least in the user engagement model, what are the key components of this natural user interface, or NUI, if you will, to replace the GUI.

It’s going to be more context-centered. It will have more a sense of the environment in which the computing and the interaction takes place. There will be many more surfaces, not just the one that you historically thought of as the dedicated computer display surface that the interaction will take place on, and I think it will be a much more immersive environment. It will be more diffused into the environment around us, and not viewed as discretely as we have seen the traditional desktop computer, or the specific computing devices that we — we call them computers. And I think more and more, most of the computers in our life, we won’t address them as computers, and you can see that today with intelligence being embedded in phones, cars, game machines, and virtually all types of electronic devices.

So if that’s the model of the engagement, then what are the attributes of the applications that we’re going to have to have in this environment? So I might call this next era spatial computing, where the environment in which we operate, the world in which we want to get things done, has to have a representation in the computer environment, with a level of fidelity that will make that a big part of how we get more people involved and give them an easy way to do it.

To do this, we’re going to have to leverage the arrival of the many-core heterogeneous microprocessor architectures, along with the communications in the wide area environment and these massive sort of exa-scale computing facilities that we’ll put in the Cloud. They’ll have to be seamlessly connected. They’ll have to be more fully utilized than the machines that we all have today. The applications that we’ve designed in the past have been really targeted at being responsive to a user’s interaction. As a result, we actually waste a lot of the computing cycles that are available. We really haven’t had a class of applications that demanded that type of continuous computing process.

Sensors are becoming extremely cheap, and as a result, part of the context will be continuously monitored and developed through a variety of sensing devices, and I think more and more, the computing will be model-based as opposed to very discrete applications that were developed by each individual programmer. If we do that, I think that we’ll see spatial computing will be more personalized, that speech, vision, gestures, will become a very well accepted part of interacting with the computer system, and in a sense it will be more humanistic in the way that we do that. The immersion will come from an evolution in the display technologies, both perhaps itself moving to 3D capability in display, not just the projection on a 2D surface of a 3D model-based environment, and I think it will be a lot more adaptive to the needs of the people that want to use it.

Demos: Evolving the Programming Model for the Parallel Computing Environment

Microsoft Robotics Studio

So let me use three demonstrations now to take you through the evolution of the programming model for this highly parallel computing environment, and a way to see how the parts start to come together, in order to create this environment. Two years ago, we released a new product called Microsoft Robotics Studio, essentially a development kit for people doing robotics. So, this is — one of my colleagues backstage is actually taking this. It’s a visual programming environment, much like people use Visual Studio to program in C, or other traditional programming languages. But this is essentially a visual programming environment. It’s a drag-and-drop one where all of the components are completely composable. This is a property that we don’t actually have in most of the world’s software today, certainly not the software that’s been written for distributed or concurrent environments, because we have old languages and lock-based models of synchronization, neither of which allow large scale compositions.

So I think what we’re going to build in the future are higher level programming abstractions. Of course, that’s what the business of software has been for decades, is to incrementally increase the abstraction level, and we now see an opportunity to move it up again. It’s extremely important to do this increase of the power of each generation of the software, in order to get more people to be able to participate. If you think back to the era of Basic, a very simple programming language, many people felt they could use the language, but to do things that were really complex in their presentation, it was too weak a tool.

So Microsoft invented Visual Basic, where we added some templatized drag-and-drop graphical user interface components, so that that difficult part of presenting or interacting with the application became very, very simple for people, and therefore the language only had to be used for the actual logic that you wanted to add. That single change moved the world of Basic programming from an academic exercise, or very few people in the business world using it, to one of the world’s most popular programming languages. Today, there are literally hundreds of millions of applications in the world that were developed using this Visual Basic.

If you move this up a complete click in the abstraction level, you get to the kind of thing you see in this Robotics Studio product, where you have whole subsystems. Each of them could be built at an arbitrary level of complexity. It’s a lock-free programming environment, and yet the ability to drag and drop things together becomes very, very potent. So, while we were talking, what Tim did backstage, as you saw, was he wired up essentially this Xbox controller as a component to a very simple two-wheel robot in that environment, and if I can get close enough where we can hook up, the — so I’m just going to take the joysticks in this, and I can essentially drive this little robot around. So, he just built this application.
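As an illustration only, the wiring described here (a controller component connected to a two-wheel robot, with no locks between them) can be sketched as a small message-passing pipeline. This is a minimal sketch in Python using the standard asyncio library, not the Robotics Studio runtime or its API; the component names joystick, differential_drive, and robot, and the simple stick-to-wheel mixing rule, are assumptions made up for the example.

```python
import asyncio

async def joystick(out: asyncio.Queue, samples):
    """Hypothetical input component: posts (x, y) stick readings as messages."""
    for sample in samples:
        await out.put(sample)
    await out.put(None)  # sentinel: no more input

async def differential_drive(inp: asyncio.Queue, out: asyncio.Queue):
    """Hypothetical logic component: turns stick readings into wheel speeds."""
    while (msg := await inp.get()) is not None:
        x, y = msg
        await out.put((y + x, y - x))  # assumed mixing rule: (left, right)
    await out.put(None)  # pass the sentinel downstream

async def robot(inp: asyncio.Queue, log: list):
    """Hypothetical output component: records the commands it receives."""
    while (cmd := await inp.get()) is not None:
        log.append(cmd)

async def run_pipeline(samples):
    # The components share no state and take no locks; the only coupling
    # between them is the queues they are wired to.
    sticks, wheels, log = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        joystick(sticks, samples),
        differential_drive(sticks, wheels),
        robot(wheels, log),
    )
    return log

if __name__ == "__main__":
    print(asyncio.run(run_pipeline([(0.0, 1.0), (0.5, 0.5), (-1.0, 0.0)])))
```

Because each component only reads from and writes to queues, any one of them could be swapped out (for example, a simulated robot for a physical one) without touching the others, which is the composability property the talk is pointing at.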

The product actually has three parts. It has the execution environment, which allows this highly concurrent model of program development interaction. It has the visual programming model that people can use to wire up this interaction, which is a lot closer to the way they think about what the robot should do, and then it has a full 3D physics model that we can use to actually see what the robot does.

So debugging this environment is done in a full physics-based simulator. So, if I drive the — I’m going to get back here where the radio works — if I drive this forward and drive it into some of these objects, you can see that that little red line is actually an infrared laser range finder where the robot is seeing the things that are in front of it. So, I can drive around and run into the balls and knock them loose, and this is, you could say, a relatively simple thing. But if you think back a few years and said you wanted to build this application, you know, it would take very, very sophisticated people quite a lot of programming to be able to put such a thing together. So, we think we can take many, many more people and bring them into an environment where these kinds of robotic applications can come together.

But when I think of robotics, I don’t just — I think of it as really two parts. There’s the brain part, and the brawn part, and in the past, we largely thought of the robotics for their brawn, but I actually think that we’re going to find that the model of building these things is as useful in sort of the brain sense as the brawn sense, and how we might bring these things together.

So actually what we have on stage with me is actually a little product called iRobi.

iROBI: Hello, Emerging Technology Conference.

Demo: Microsoft Photosynth

CRAIG MUNDIE: Good morning, iRobi. This is actually a product that comes out of Korea, and it actually is built and programmed using this Robotics Studio. But I wanted to show you the kinds of perhaps innovative applications that people will use these to do, to build this new immersive environment. So, last night, after you all left, we had the old iRobi here go run around the auditorium and the spaces outside, and used his vision system as an input to the Photosynth process or technology. Blaise Aguera y Arcas, who was one of the award winners here yesterday from Microsoft, developed this technology a few years ago, and it’s become more and more popular.

Photosynth is a capability where we can take images, 2D images, and without any prearrangement, analyze them and piece them together in order to create a 3D model of a particular environment.

iROBI: I have finished Photosynthing the area.

CRAIG MUNDIE: Thank you. Why don’t you show us what you found last night as you walked around? So basically, here’s an image collection that came as the robot wandered around in the hallway outside this auditorium. So, each of these images was a 2D view, but taken together, we analyzed them. So, what we’ll show you just briefly is when we take the pictures, we determine where the camera angles are. We can develop a 3D geometry. So, this is actually a point cloud. It’s not — it’s a little difficult to see in this light, which basically shows all of the different specific points in the images that we know are part of a 3D model. So, in essence, by looking from all these different angles, you can synthesize the geometry behind it.

Then we can take the photos and paste them onto this model. In one of the next demos I’m going to show you why I think this process, whether done by robotic surveyors or everybody’s daily interaction, using their cameras and cell phones, will ultimately allow us to build a composite model of the real world. So, many people are familiar with the Second Life, which was a synthetic 3D world that people became quite enamored with. But our view was that was going to be a fairly limited audience who were willing to deal with that and the construction of avatars and operating in that virtual space. But we think the idea of having a parallel universe, essentially a cyberspace representation of the physical world, is going to be an important metaphorical change in the way that we will interact with computers, and so we want to create an environment where everybody can contribute to that on a continuous basis and drive this forward. So, in a minute I’ll show you a demo where we’re going to use that kind of capability to a much greater degree.

Demo: Robotic Receptionist

So the next thing I want to show you — thanks, iRobi — is the program that we call the robotic receptionist. This has been developed in Microsoft Research by Eric Horvitz’s group. Eric is here at the conference with us today, and it brings together and is today my sort of favorite example of what it might be like to have a really natural user experience in dealing with a computer system. Just because of the complexity of setting this up, I’m going to show you a video of this system in operation. In a few months we actually hope to be able to take a version of this and put it in the actual reception areas of a number of the buildings at Microsoft, to test it in a real-world deployment environment.

But it really brings together a number of things that I think are quite compelling. So, what I’m going to do, using this static slide, is explain to you what you’re going to see in the video, because it will go by in 90 seconds or so, and if you don’t know what you’re looking at, it would be difficult perhaps to appreciate how much is going on in this apparently simple demonstration.

The large pane is actually what the avatar of the robot receptionist is seeing, and the things that are annotated on this, so you can understand what’s going on, the yellow boxes are where the computer vision system has detected that there are people present, and where their heads are, and what the orientation of the head is — the red dot, which you can see on the person on the left right now, is actually the eye point gaze of the avatar. So, that’s the person which the avatar is seeking to maintain eye contact with, just as we would in a human-to-human interaction. The green circles at the bottom represent the array microphone through which the avatar listens to what’s going on in the room, and the solid green dot, which moves around, is its localization as to who is speaking, where is the voice actually emanating from.

On the top left, you see a standard Windows performance monitor. This is an eight-core, high performance machine, and what’s interesting about it is while this avatar is idling — essentially nothing is happening — it uses continuously about 40 percent of an eight-core machine, of all eight cores. If you contrast that to almost any application that we would have today on a personal computer, those things consume in a bursty way a few percent of a machine, and on average over time, maybe one or two percent of a machine, just to do the background processing.

If you look at this, you’ll see we’ve really just barely turned it on. The avatar is a very limited facial representation. The features are not elegantly computed. The voice processing is effective both on speech and recognition, but certainly not the best that we know how to do, and yet you can see it just barely turned on. We’re really starting to eat up a lot of compute cycles. In a way I think that shows what will be possible as we get this new generation of microprocessors, where we have high core count, heterogeneous machines, and are able to put them to tasks as computationally intensive as just this type of interaction.

The face you can see in the middle on the left is the actual 3D model of, in this case, Laura, the robot receptionist. Her job is to essentially assist the human receptionist by taking shuttle reservations. It turns out many people pass through the lobby at Microsoft, but it turns out the vast majority of them are employees who want to go to another building, and so we have a big shuttle fleet. A large part of our receptionists’ day ends up being interacting with those people to find out where do they want to go, then they phone up and order a shuttle. The shuttles are all GPS located and computer dispatched, except we don’t have an effective way to bridge the needs of the rider with the shuttle fleet. So, we wanted to bring all that together, so there’s a communications capability and a presentation.

Then on the bottom is just an overview of what you can see is happening in the room, and in fact the screen is essentially today the kiosk presentation of this very simple face of the robot receptionist. But there are many things that are interesting. One of the more recent additions to this that Eric’s group did — the third person who you’ll see enters the frame on the right is actually dressed differently. Most people at Microsoft are dressed casually, and this person comes in wearing a coat and tie. The receptionist actually addresses them differently and suggests — asks if they’re there to see someone, because the odds are higher that they’re there to visit than they are to get a shuttle. So, the level of resolution that’s going on in the processing of these images goes well beyond the idea can you find the face and the eyes? It’s essentially a more holistic analysis of not only the scene but the people in it and even their dress, and the robot’s behavior is altered as a function of that.

So this is essentially a mostly brain and not so much brawn robot, but you can perhaps make the leap to a wide array of things that could be done if we could perfect this level of man/machine interaction. So, with that I’ll let you watch the video.

(Video segment.)

So this, to me, is a powerful example of what it’s going to be like to change the whole way in which people interact with computing. You can see that we’re getting to the point where many things that really are human-like, in terms of the listening and overhearing your conversation, determining whether there’s a relevant thing, like her decision to say, well, are you going to Building Nine? I mean, they didn’t actually turn and say to her, okay, we want to go to Building Nine. They were discussing that, and she extracted that from their conversation, and I think that these things are subtle but powerful in terms of making people feel comfortable in these types of interactions, and as we seek to bring computing to not just the billion and a half people who have had access to it up to this point, but perhaps another 5 billion people, where instead of a robot receptionist making shuttle reservations, maybe what you want is this type of robotic health care worker to help with do-it-yourself medicine in a rural clinic environment, where it’s not possible to have people to be physically present and trained in these things at the level that we need.

For literally a few thousand dollars, we may be able to put an assistant there who can guide this type of robotic interaction and perhaps help you solve other problems in health care or specialty tasks in providing assistance in education. So, I think there’s a wealth of opportunity for this, but it’s important to not only allow people to develop these applications, but to change the way in which the bulk of the population interacts with them.

Large-Scale, Distributed, Concurrent Systems and Programming Models

So let me go on to talk a little bit about how we’re going to move beyond robotics. I think this is just an example of the kind of broad problems we’ve got in computer science, in both concurrency and complexity. Today, software, as it has evolved for 30 or 40 years and ridden up the power curve of the microprocessor, has in fact gotten a lot more complicated, to the point now where I contend that for us and virtually everybody trying to do high scale software, it’s very, very difficult to make these systems be performant and reliable and secure and predictable at the level that the society really will demand.

At the same time, we’re building these systems out of very high scale, distributed components. Whether that is at a microcosmic level, as we talked about in some of the breakouts yesterday, where we’re putting fiber optic interconnections between the processing elements that may in fact be on a single die, but may range into the hundreds of separate processing elements. So, it doesn’t matter whether you squint and look at this at the micro scale or step back and look at it at the macro scale. The Internet also poses us a challenge of a large scale, distributed, concurrent programming environment. Our tools were really never engineered to be able to address this level of complex system design and deployment in that environment, and I think we’re going to have to see a paradigm change in the way we write applications.

In this slide, we talked about CCR and DSS. Those are the names of the underlying technologies in the Robotics Studio, and we’re moving forward to productize those in some of our upcoming releases of our run times and programming tool chains. And the reason is, if we’re going to do all these other applications in this context, the attributes of those things, as described by the adjectives on the bottom right of this slide, are almost the inverse of the adjectives that you would use to describe programs today. Today, they’re mostly tightly coupled, synchronous in the way that they operate. They’re largely not concurrent, and when they are, we have a great deal of difficulty with them. We can’t compose large-scale systems.

I contend software is one of the important technological disciplines that hasn’t yet graduated to be a real engineering discipline. Because I think a real engineering discipline ultimately gets this level of formal composition that allows people to structurally build larger and larger systems and be able to reason more about their ultimate correctness. They will, of course, have to be more decentralized and they’re going to definitely have to be more resilient. Today, the world is of course completely dependent on computer systems and networking, but underneath it all we know the resiliency of these systems is not really up to the task and the world would not weather a disruption of these systems in any wholesale way very well. So, I think these all represent challenges to the computer scientists among us, to drive this whole thing forward.

Extending the Web

I want to finish with talking about how these things may come together to actually extend the Web as people think about it. We’ve seen the Internet evolve from simple interfaces and presentation models of text and pictures. We made those move through animations. We added video. We added some 3D capability and the ability to get immersed in different things within that environment. More recently, high definition television has come to the Internet.

Then the next stage might be in fact this spatial web, where we bring together this 3D environment, potentially even a 3D display capability, a lot more surfaces for interaction, and we introduce this as a navigational and interaction metaphor on the Internet. So, I want to show you a demonstration that we put together from a lot of these components that might use this spatial Web metaphor as a way of interacting and getting around.

So let’s say I’m just traveling — my wife and I have actually built a house recently, we’re accumulating some Northwest art and I’m going through an airport. So, I see this magazine at the airport, and so I say, oh that’s interesting. I don’t have time to stop and read this now, but I want to remember to do it and read this story later. So, here I have the physical magazine, but I really want to interact with it in a cyberspace way later on. So, I take a picture with the cell phone, I check into my hotel, and when I get to the hotel, in the future I might see a desk that has a surface computer in it, and in fact a display capability, another surface embedded in the walls, and we would expect that this will become more and more commonplace.

So when I get there, I can put my phone down on this desk and it gives me a biometric identity opportunity. So, I put my hand down, that gives me access. It basically links the computers together now, so that I can use them as a composite system. So, here’s the picture I took at the airport. I can drag it out, I can essentially make it bigger and say, okay, I’m interested in this. So, I’ll say, okay, analyze this image, in this case, to determine a way to use the picture of the magazine as a way to get into this cyberspace environment. So, it analyzes it, does the image recognition. So, much as we saw the robot receptionist analyzing the environment and the people, this is analyzing this object. By doing that, it gives me the ability to use this as a navigational metaphor, and I say it was the article on the Eskimo art that I was interested in, and so it gives me the story. So, I didn’t actually have to read the physical magazine. I can decide to read the same equivalent online.

So here it talks about the store in downtown Seattle, in Pioneer Square, that I could look at this at, and I’ll say, okay, I want to go to this store. So, I’ll say, go here, and so it presents me an image, which is now this Photosynth composite of a set of pictures of the actual space in downtown Seattle, and a 3D model that’s been built behind them. In fact, if I look at this alternate presentation mode, you can actually see here a number of the images that were used to develop these models, and then to composite those onto the 3D model.

So if I go back in this environment, I can say, okay, what I really want to do is I want to go into that store, and so I just sort of walk into the model. So, the model is actually a very realistic, photorealistic and geometrically realistic model, of that actual space downtown. So, I can do this in sort of the public space, but when I get there, then this business can have its own interior presence in this spatial web kind of environment. So, I can touch this and go into the store. The storekeeper can interact with me with either text messaging or two-way speech. I can essentially wander around in the store, perhaps go back here and look at this piece of Eskimo art. I can go there and ask about the artist, see the price, tell me more about the artist and his workshop. You could be offered an opportunity to watch a video, which I could elect to watch or not watch.

(Video segment.)

I learn what I want to learn, I can go back to wandering around. Here’s a notice that my wife, Marie, has actually joined me in this virtual space, so the presence of each other in your buddy list, essentially, can be extended into this environment. So, she has decided to come, even though she’s at home, and also look through this same store. So, as I continue to walk around, she can essentially bookmark things that she might think are interesting. So, I can look at this virtual bookmark, and she wants to know, okay, how about this to go in my office at home? So I say, okay, well, let me look at this thing. So, I can essentially take this device, pick it up, look at it, spin it around, as a 3D model. This could have been developed by Photosynthing it by the storekeeper, or perhaps the artist, when they actually produced the object, might give you a 3D representation of it for this type of examination. Then when I’m done I can put it back.

What do you think? I could meet you tomorrow at the gallery. Maybe you could find a restaurant nearby for a quick lunch.

Okay, I’ll see what I can find, and make a reservation. So, we will integrate this type of real-time telephony and telepresence together in order to create this experience. So, I’ll go back to my desktop and ask it to show me restaurants in the area. So, now I’m back outside in the 3D model of the streets around Pioneer Square. I can look at a mash-up of information about the various restaurants. I can look at the specials, I can essentially make a reservation.

So tomorrow, Marie’s going to come down and we’re going to meet for lunch and go to this restaurant. So, let me show you a little bit about how we think navigation in this world may ultimately be assisted by a range of these advanced technologies. This is actually a current-generation Ultimo (ph) PC by Sony PC, but it’s real easy to think that the computing power that we have in this today would readily be in your cell phone in a year or two, and what we’ve actually done is put a high definition photo of this part of Pioneer Square down there, just to show you what might actually happen when you do this.

So I’m going to point this thing at that screen, and it says, okay, the café where you made the reservation, you’re due there. I say, well, where is this thing? It says, okay, it’s over here. So, what’s happening is there’s real-time image recognition going on of that scene. I could look at this, get information. I could hail a taxi, and just like the robot receptionist, that becomes a more automated process. So, it tells me that Taxi 427 is two minutes away and it will come pick me up.

I know this is hard to see, so I actually have a video which I’ll close with, which shows what this is like, not where I’m just looking at this image of downtown, but what it is actually like. So, Janet, who’s on my team, has actually videoed this, actually working in downtown Seattle. You can see that as she points to the live space, there’s traffic going by, there are people going by, but using the compute power of this local device and image maps that are provided by a Web service, we’re able to, in real-time, cross-correlate those things and then do mash-ups of personalized information. So, each of these things is essentially shown as a dot on that screen. So, this is, you could say, a step beyond Virtual Earth, or Google Maps, or GPS, where you have contextual information. You’re using the power of these local devices to do a level of processing and recognition and personalization that makes the experience genuinely more useful and more compelling.

So these are the kinds of things I think that the combined capability of what I call the client plus the Cloud, as the next-generation computing platform, will make possible. We’ll have a common programming architecture that allows us to deal with these resources as if they’re just one big computing platform. I think that as you begin to look at the class of computing, and the class of resources that we’re going to have in this broad array of client devices, the exploitation of that capability is going to be an important part of how we alter this user experience.

So this diagram is a very simple one, but it parses the spatial Web demo that I just gave you into the different blocks here. Not quite as much color here as I might have hoped, but the ones on the left and lower right are the client computational environments. It really wouldn’t make any sense to take these real-time, high-res sensory inputs of audio and video, pump them up the network, and try to compute them in the cloud, because as the robot receptionist showed, it’s just a continuous computing process. So, trying to do that in a system that really is only ideally engineered and can only be resourced to do timeshared activities shows why there’s going to have to be this to-and-fro between what you can compute locally and the kind of orchestration or coordination or centralized data services that you really want to get at scale from the Web services.

Of course, the other things like identity matching and the Virtual Earth datasets, and all the geospatial information, these are things that you wouldn’t want to try to carry around in your devices all the time, because they need to be updated continuously, and just the absolute size of them is very, very large. So, our ability to bring these things together, I think, will create a radically different experience for people, and one that will make computing that much more important in our lives, going forward. Thanks for your attention. I guess we’ll do some Q&A with Jason now. (Applause.)
