Remarks by Craig Mundie, chief research and strategy officer for Microsoft, on the ways in which more computing power, better user interfaces, and ubiquitous high-resolution displays will change how people use computers
MSR Faculty Summit 2009
Redmond, Wash.
July 13, 2009
HAROLD JAVID: Welcome, welcome. Welcome to the 10th annual Microsoft Research Faculty Summit.
By way of introduction, I wanted to take us back 1,000 years or so and mention a very famous poet and mathematician whom you all know. You may not know what mathematics he did, but you certainly know who he is: Omar Khayyam. I understand his poetry is not the best poetry in the original language, but he was very fortunate to have an excellent English translator who was a poet. So that’s why we know him so well in the West. They know him in the East as well. And that’s Omar Khayyam, who was born in the year 1048.
And as you may not know, he devised a general geometric solution to the cubic equation, the equation x³ + 2ax = b. So he has something to do with us, because we end up with equations that look like that, only with lots more elements and lots more diversity and lots more interest.
But the reason I was thinking of him with regard to this summit is his very famous lines: “A loaf of bread, a glass of wine, and you by my side in the wilderness. Ah, that wilderness is heaven now.” Well, I hope we have some kind of heaven here. We’ve had some interesting challenges this year: we never questioned that we would like to have an excellent faculty summit, but given all the situations in the world and the business, we wanted to do it on a slightly reduced budget.
In order to do that, we had to decide what was fundamental. So this poem really became the key to my thinking about it. I thought, well, we certainly can have bread, we’re going once again this evening on this cruise on Lake Washington. I think there will be a little wine on that cruise as well. You may see around the edges other things that have been cut from our expenses, but we focused on that beginning part.
And then the second part of this poem, it says, “And you by my side.” Well, you know, there’s all kinds of different “you.” The you, in this case, is all of you. We really feel that one of the key elements of the faculty summit is to bring together a group of people which is you and all of us together who have a love for learning, investigation, new discoveries with a diversity of background and thought. And so really as we put together the faculty summit this year, we kept those key elements in mind.
Now, a faculty summit really is built on two key pillars: One of them is a theme or set of topic areas which is sufficiently rich and sufficiently broad to give great dialogue. And the second element is to bring together a diverse group of people, real leaders in thought, with diverse technical backgrounds, who can really talk and dialogue and work with each other on those topics.
And this year, the intersection between some of the world challenges we’re facing, the world-class problems that are available, and the advances in computing technology has really brought us to this idea of a theme of world-class and world-sized challenges and making a difference, and that’s really what we’ve done. That’s what we had in mind to put together.
The program, and you have your programs here, just to give you a brief introduction: we have color-coded by topic area the five key themes of the summit this year, which, as you’ll see, are earth, energy, and environment; health and wellbeing; education and scholarly communication; computing advances; and making a difference through outreach. And that last one is something that we at Microsoft Research have been focused on as well: how can we really make a difference through the activities we’re doing, both for the young people who are coming into the profession and for advances in research and technology across the ecosystem, not just within the company.
Well, that brings us up to the details. So this is a device that’s called a cell phone. My wife calls it my tether; of course you’ve heard the expression before. Throughout our sessions, we would find it wonderful if you would make them as quiet as possible, and perhaps even turn them off so you can enjoy each other rather than your connections outside. Hopefully we have enough richness here for enjoyment. Flash photography: we would prefer not to have flash photography here in this room.
I want to mention one more thing, you have badges. The blue badges mean that it’s somebody from Microsoft. So if you need to ask a question, you can ask someone that has a blue badge. I just wanted to mention that.
Now, I wanted to comment about Tony Hey. Tony Hey is next on the program. It’s always my pleasure to see Tony, whether we see him in the office or on the road. He brings a certain light and energy wherever he goes, a certain cheerfulness, and at the same time is continually pushing us toward a deeper focus on the interaction between computing and computing technologies and big challenges that really make a difference.
This has really been the tone which he’s brought to our organization. He’s continued to urge us down every line that seems to make sense along those lines, and I’m hoping that you’ll find that you really enjoy this faculty summit. I can’t think of anyone better to kick off this event than Tony Hey. (Applause.)
TONY HEY: Thank you. Well, it’s great to be here, and welcome to Redmond.
So what I’m going to do is give a quick review of what we’ve done over the last year, and then make some announcements of some things that I think are pretty exciting. This year we’re focusing on addressing world-scale challenges where computation can be a really powerful agent for change. That is certainly true in healthcare and the environment, but also in energy and education. And so what we would like to do is work with you, the community, to see if we can come up with solutions that can really make a difference.
On the right, you’ll see a picture of our Web page so you can find full details of the summit, and talks and videos will be posted after the event.
So, collaboration and community is really what it’s about, and I’m here the whole time, and I welcome feedback from you at any stage. I apologize for the fact that we’re not providing so many meals and so on, but I think that’s an appropriate response. What we wanted to do, as Harold has said, was to focus on the essentials, which is having a stimulating academic conference and providing some networking opportunities. So enjoy the cruise tonight, because that’s probably the only food you’ll get. (Laughter.)
OK. What I’ve been focused on since about 2001, and have continued on here, is the fact that science is being transformed and we’re in the midst of a revolution. And it’s a revolution caused by the fact that we’re getting a huge deluge of data from all sorts of things: satellites, sensor networks, supercomputers, and the Large Hadron Collider, which switched on last year and, we hope, will probably switch on again this summer and will deliver petabytes of data.
On the right, you see an image from the WorldWide Telescope, and all these things, visualization tools, data management tools, data mining tools, data cleansing tools, are becoming really important for scientists to do their work. And what we need to do with this huge amount of data that’s coming in is go from data to information to knowledge. And so semantic computing is going to be an important enabler, and computing is going to be underpinning almost everything we do in healthcare, energy, and educational and social progress. So we will see lots of activities, and I hope you’ll see that through the course of the summit and in the demos tomorrow.
Just for those of you who don’t know me, I used to be a real academic and did the usual things like writing books and writing papers. I even helped write the first draft of the MPI standard. I’d just like to pick out from the top, Research Councils U.K. and E-Science. I ran the E-Science program for the Research Councils U.K., which in the U.K. are the equivalent of NSF. The E-Science program was really about the transformation of science by the volumes of data being created by scientists and by instruments.
And so what I’m doing here is really a continuation of that, because I became convinced over the course of the five years that I was doing that, that Microsoft has a really important role to play in enabling scientists to do their work more quickly, more efficiently, and to accelerate the pace of discovery.
So what I now lead is Microsoft External Research, which was responsible for organizing this conference. And what we’re really focused on are partnerships with academia, but also with industry like ourselves and other partners (for example, we have a partnership with Intel on multi-core computing) and with governments and research-funding agencies around the world, in the fields of computer science, education, and the scientific disciplines.
We hope you’ll see examples of groundbreaking research that we’re supporting. And, again, if you have ideas, there’s plenty of us around, and you’re welcome to come and talk to us and tell us about your ideas.
What we’re trying to do, and what I really believe Microsoft can do, is develop technologies to support all stages of the whole research process. And I think that’s the unique thing that Microsoft can bring to science, and it’s something I feel very strongly about because in the course of the five years I did E-Science, I saw what I would say are generations of graduate students abused by their advisors by doing menial jobs rather than doing the research that they wanted to do. So what we’re trying to do is enable them to work at a higher level of abstraction.
So interoperability, choice, openness are clearly key elements to engage with the research community and what we hope we can do is embody that in what we do.
So last year, we announced the reorganization of External Research into four key themes: core computer science, which is of course central to what Microsoft does and to Microsoft Research, but also looking at energy and environment and looking after the earth. Education and scholarly communication. Scholarly communication is important to me because we are actually in the midst of another revolution. We have the data revolution, but we also have a revolution in scholarly communication in what you can do. And I’ll show you some examples.
And health and wellbeing: I think we’re going to see the emergence of personal healthcare, personal healthcare records, all sorts of electronic data which can be used, if we can manage the complexity, to make healthcare better and more available for all.
So underpinning this is what we call our advanced research tools and services, and you’ll see two examples later in this talk. And also across the whole lot is what we do not only in North America, but also in Latin America, in Europe, Asia, India, and Australia, for example. So you’ll see examples of that as you go through the conference.
This is what we’ve done since the summit a year ago. I won’t go through these all in detail. That gray-headed fellow shaking hands with the lady there is, unfortunately, me shaking hands with the President of Argentina at our faculty summit. There are pictures of the audience in China, in Cambridge, in India. All the sort of things we’ve done. And on your desks, you will have got one of these booklets, which tells you in a little more detail some of the examples that we’ve been doing in the past year. So I hope you find that of interest and it gives you some idea of the breadth of our programs and what we do.
Harold talked about blue badges. So Judith Bishop was a visitor last year, this year she now has a blue badge. And I’m very pleased to welcome her as the head of our core computer science theme, which is the last element that we have to fill in in our team to support this.
Judith has been a long-term computer scientist. She was in one of the first groups to do computer science in South Africa. I knew her when she was in Southampton briefly. I was a physicist at the time, and she was working on something called computer architecture. I wondered if that was the color, the paint, but no, actually, I eventually learned the error of my ways and understood that she was working on something that was more interesting than what I was working on.
So she was, in that sense, very instrumental in changing my research field from particle physics to computer science. She’s also written many books about many programming languages. I remember her book on Pascal and then going through Java. Pascal precisely, Java gently, and C# concisely. I think she’s got to the right place, finally. Right? Pleased to hear that. And so her 14 books are available in many languages. So she will bring a new dimension to our core computer science team in the programming languages and environments and we look forward to Judy being here. She’s around, and please feel free to talk to her.
So, accelerating time to insight. I was talking to Craig Mundie, our next speaker, and when I had just arrived at Microsoft Research in 2005, I was at Supercomputing 2005 with Bill Gates giving a keynote, and I remember Craig emphasizing accelerating time to insight with examples from oceanography. And so what we believe, what I believe, that Microsoft Research can bring to scientists is support for the whole of the data pipeline: from data acquisition and modeling, collaboration and visualization, data analysis and data mining, then dissemination, publication, and sharing, and finally archiving, preservation, and curation.
All these things are important, and Microsoft actually has technologies that can help support scientists in every aspect of their research. We haven’t yet demonstrated that, and we’re trying to work with the community to develop collaborative solutions that enable researchers to build on core technologies to make their research life easier and more effective.
So what we’re trying to do is help scientists spend more time on their research than on the IT issues. And with the data volumes that are coming in now, scientists do require more in the way of data management and support for their data analysis. So we believe we can provide that, and I’ll give you a couple of examples.
So what we have to announce today is two releases. One is Project Trident, which is a scientific workflow workbench, and the other is Dryad and DryadLINQ, which are tools for distributing data and computation across clusters of up to a thousand computers, and doing it at a higher level that makes it much easier to configure and run your jobs.
So let me say a little bit about each of them. The scientific workflow workbench was in fact the sort of example that Bill talked about in 2005, and the idea is to make it easier for scientists to ingest and make sense of their data and to enable them to ask questions of their data in a way that would previously have taken them a lot of effort and time to organize and put together. It also captures the provenance: when the data was taken, who took it, what’s been done to it since you got it, and so on. That’s going to become increasingly important, in my view, for scientists as they do their research.
So the example I will give is from oceanography, but we believe it is also applicable in astronomy, environmental science, and medical research, and workflows are going to become an integral part of dealing with this data flood.
So the tools will be available at no cost to academic researchers, and they are extensible, so we can work with the community to put in features that we haven’t put in, and for you to educate us about what else we should be doing. But the key take-away here is that what once required weeks or months of custom coding, you can do in just a few hours or less. And that’s the key. So instead of those research graduate students spending their time getting data here, writing scripts there, you can actually write these workflows that you can share, you can edit, and so on. So we believe this is a key tool for researchers.
So these are the details, and I won’t go through every bullet on this slide because you can find them out in the demo; you can go and ask about them in detail. But it lets you visually program workflows, you can annotate them, you can pass them around, you can create libraries, and provenance capture is supported automatically. It can automatically schedule workflows on high-performance compute clusters. It can store the data locally or up in the cloud, and it’s fault tolerant. So this is, we believe, a tool that scientists will find extremely helpful. It’s built on Windows Workflow Foundation and can link to SQL Server, Windows HPC Server, or data storage and computation in the cloud.
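To make the workflow-plus-provenance idea concrete, here is a toy Python sketch. It is not Project Trident and not its API; it only shows the shape of the two ideas Trident combines: composing analysis steps into a pipeline and recording provenance for every step automatically. The step names and sample sensor readings are invented for illustration.

```python
# Toy sketch (not Project Trident): compose analysis steps into a workflow
# and automatically record provenance for each executed step.
import datetime
import getpass

class Workflow:
    def __init__(self, name):
        self.name = name
        self.steps = []          # ordered (label, function) pairs
        self.provenance = []     # one record per executed step

    def add_step(self, label, func):
        self.steps.append((label, func))
        return self              # allow chaining

    def run(self, data):
        for label, func in self.steps:
            data = func(data)
            # Record who ran what, when, and how much data came out.
            self.provenance.append({
                "step": label,
                "user": getpass.getuser(),
                "time": datetime.datetime.utcnow().isoformat(),
                "output_size": len(data),
            })
        return data

# Hypothetical oceanography-flavored steps, just to exercise the sketch.
readings = [12.1, 11.8, None, 12.4, 99.9, 12.0]   # raw sensor temperatures

wf = (Workflow("sensor-cleanup")
      .add_step("drop_missing", lambda xs: [x for x in xs if x is not None])
      .add_step("drop_outliers", lambda xs: [x for x in xs if x < 50.0])
      .add_step("average", lambda xs: [sum(xs) / len(xs)]))

result = wf.run(readings)
print(result)            # cleaned summary value
print(wf.provenance)     # who/when/what for each step
```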
So this is the picture of something that’s very relevant to us since we live in the northwest. Project Trident is being first applied with oceanographers and computer scientists at the University of Washington and also the Monterey Bay Aquarium Research Institute looking at these sensor networks they’re going to put down on the Juan de Fuca plate just outside Puget Sound, where it’s a very highly active earthquake zone.
And what we’re doing is part of the larger NSF Ocean Observatories Initiative, which is going to instrument the ocean floor in a way that we’ve never been able to do. And John Delaney, for example, our colleague at UW, believes that oceanography will go from a data-poor science to a data-rich science overnight. And so we believe this is actually going to be a really exciting thing. So please have a look at it and see how we do.
Workflow is also part of publishing. So again, when you’re writing or looking at a Word document which refers to some data, you can now associate the pipeline which created the data with the document, and you can re-run the pipeline. So you can actually go from your Word document to the pipeline, go and get the data, and rerun it. You can see what was used to produce the data, you can produce a new version and update it, and so on.
And so that, we believe, is part of a more general approach to science, which Jill Mesirov at the Broad Institute at MIT calls “reproducible research.” So in that case, going from a Word document to that particular repository, GenePattern, you can actually reproduce the data, give scientists the provenance of the data, and enable them to test it themselves and change the workflows and so on. So we believe that’s an example of how publishing will also be changed by the data.
So the last example is DryadLINQ and Dryad on high-performance computer clusters for academic research. If you have a job with lots of data, which is many copies of the same program running on different data, what Dryad enables you to do is distribute that easily, and it gives you a higher-level layer at which to do the programming.
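The pattern Dryad and DryadLINQ target, the same program applied to many different pieces of data, can be illustrated with a single-machine Python analogue. This is not the Dryad or DryadLINQ API, just the shape of the idea: partition the input, run the same function on each partition in parallel, then merge the partial results.

```python
# Single-machine analogue (not Dryad/DryadLINQ) of the "same program,
# different data" pattern: partition the input, run the same function on
# each partition in parallel, then merge the partial results.
from concurrent.futures import ProcessPoolExecutor

def count_words(lines):
    """The 'same program' that runs on every partition."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def partition(items, n):
    """Split items into n roughly equal chunks."""
    return [items[i::n] for i in range(n)]

def merge(partials):
    total = {}
    for counts in partials:
        for word, c in counts.items():
            total[word] = total.get(word, 0) + c
    return total

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox again"] * 1000
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(count_words, partition(lines, 4)))
    print(merge(partials)["the"])   # combined count across all partitions
```

Dryad and DryadLINQ apply this same idea across whole clusters of machines rather than the cores of one machine, which is what the next part describes.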
So it implements that on Windows clusters. We hope at some point to move to Azure, but at the moment what we’re releasing is a Windows cluster solution, and there’s an installation guide, programming examples, an installer, management tools, and so on. So those you can find, again, in the demo. We did a pre-release at Indiana University and the University of Washington to check out the actual systems and to develop applications, and what we’ve got is a bioinformatics application and also an astronomy application. The bioinformatics application is from Indiana, and the astronomy application is for the LSST, the Large Synoptic Survey Telescope, which is the latest vision for high-data-rate mapping of the sky and is going to start in about 2015 or something like that, and what we’re doing is looking at how one will do the computing and the workflows for that.
So we’ve been using it internally, and we hope that it will be of interest to you, and therefore, again, we’d like to hear about your experiences and understand what other things you need, whether you find this useful, and what changes you might like to make. So we’re very keen to collaborate with you and to find out more about that.
So on the right is a picture of the Dryad wood nymph, they tell me. Why it’s called Dryad, I’m not sure. But anyway, so we think this is exciting and we hope that you will find this useful for supporting your research and we will continue to improve this and again with your feedback, that will be really very helpful.
So the tools can be downloaded on this site here. There are other resources I haven’t talked about, tools to access petabytes of data, and I’d like to give you a forward reference to the session Beyond Search, on data-driven intelligence, where Harry Shum, a corporate vice president in the search group, is talking about the future of search, focusing on data-driven research to help advance the state of the art in the online world. I recommend that, and again, at DemoFest booth 4, tools and services for data-intensive research, you’ll find more tools.
So where are we? What I hope you will do is have a great faculty summit. I hope you will actually give us feedback on what we have done right, what we’ve done wrong, what you liked, what you didn’t like. The agenda: you have a printed version, but there’s also an online version. So please, let us know if you have any other questions, or you can find someone with a blue badge, and we’re happy to talk and answer questions.
So with that, I’m going to wrap up my talk and welcome you again to the faculty summit. Thank you very much. (Applause.)
Now it’s my great pleasure to welcome Craig Mundie, the chief research and strategy officer for Microsoft to give his talk on rethinking computing. Craig, welcome. Thanks a lot. (Applause.)
CRAIG MUNDIE: Thanks, Tony.
Good morning, everyone. Last year I was on vacation, I didn’t get to come and talk to this group, so I was happy this time to be able to come and share some thinking with you about what I think is going to happen within computing broadly in the next few years and why there are some discontinuities that are likely to happen in that time period.
When I think back over the history of certainly the Microsoft era of computing, we’ve already seen it evolve through a number of cycles. The beginning, of course, was the text-mode interface, and there was a big change in computing with the introduction of the graphical user interface.
And as we go from each generation to the next, I think that these things are always driven by two interesting phenomena: One is there’s some killer application that takes advantage of this technological change to create a compelling experience of some greater utility than what people thought they were currently getting from computing. And two is that the adoption of that is almost always driven by the individual, not by an institution.
Research institutions don’t drive that adoption and the business institutions don’t drive that adoption. It’s people who are the sort of early adopters of these things that tend to create this phenomenon. And as time went on, you could say the Internet evolved as a computing platform and Web browsers and e-mail were the killer apps that brought us into that generation.
I think we’re at a point now where it’s clear that the Internet, primarily as a publishing medium, is evolving quite rapidly into something which has a lot more capability in terms of connectivity, a broader range of devices, and increasingly the ability to do things programmatically in that environment, not just in terms of a publish-and-consume model.
And so many people today talk about a Web-based model of computation. I actually think that this is a transitional phase, and that what you really are going to see is the continued evolution of the local computing paradigm, where the graphical user interface has been the predominant model of interaction, although, as I’ll talk about in a minute, I think that’s going to expand quite a bit, and then this connected-world kind of environment, where the Internet has been the prototype of that.
And so the next big platform which I think of as one thing, not two things, is this client plus cloud environment where we need a programming model that will allow us to deal with that as if it was a single entity, that we need for people to be able to cast applications across that combined computing capability with the same facility that historically they were able to write either an application that ran on a local computer or to publish on Web pages that people would consume and perhaps discover with mechanisms like search.
And so we’re really at a point in time where I think this client-plus-cloud environment is in search of its killer applications. But I think that we can already see the outlines beyond the client-plus-cloud platform. In particular, I think we’ll see, again, a major change in how people interact with computing, and we might see that era emerge and perhaps be called the natural user interface, as something that goes beyond what we currently expect from computers today. And that’s a lot of what I want to talk about with you this morning.
The business of software, certainly, from the beginning has always been about increasing the abstraction level that people have in the way that they deal with computers.
In the earliest days, you know, even the introduction of assembly language was an abstraction, one that gave programmers a very, very tightly corresponding view of what the machine was capable of and how you instructed it, literally, one machine instruction at a time.
And certainly, actually, when I started going to college at Georgia Tech in the late ’60s, you know, that was still the predominant thing that people were being taught. It was certainly the first thing that I did as I started to write programs. And I can remember having to flip switches on the front of machines to actually enter the bootstrap so that the machine would start up.
And so in one man’s sort of working lifetime so far, you know, we’ve seen that go from literally instruction-by-instruction capability and programming models to things that are obviously, increasingly, dramatically more powerful. And so if you look at how these abstractions are changing, I think we’re going to see some big changes. One is that modeling will increasingly replace coding. The idea that people will sit there writing programs will be reserved for a smaller and smaller percentage of the population who want to get computers to do things for them.
And we’ve seen these things before. You could say the introduction of the spreadsheet, the killer app of the PC, or one of the two killer apps, was the introduction of a modeling mechanism that the average person could use that didn’t require them to write procedural programs. And I think more and more we see higher and higher modeling concepts being introduced, and I think that will continue to be important.
We’ll see continuous networking be the assumption more and more, as opposed to a thing that is nice to have. But even there, I’m quite skeptical about a world of perfect networking, and certainly skeptical about it when we look at the total number of people across the planet who should be getting some benefit from computing capability.
If you think of computing broadly as computation, storage, and interconnection at any scale, the one that I think from an economic point of view will not progress as fast as the others will, in fact, be connectivity in the wide area. And as such, I think we’re going to always be under pressure to find ways to be parsimonious in the use of bandwidth, and that’s particularly true as we extend this to more than the billion and a half richest people on the planet.
I talked about the client plus the cloud as a new programming paradigm. This has been an area of considerable focus for Microsoft, and I think you’ll see us continue to push that forward.
I think computing everywhere will be something that people just come to take for granted. This is my 17th year at Microsoft, and I came here to do non-PC computing in 1992. At that time, that seemed like a pretty radical thought, certainly inside Microsoft, but you know, broadly people were still focused more or less on the computer was this thing that was on your desk, you know, and maybe it was in some back room someplace, but the idea that computing would be almost ever present in people’s lives and the places where they would interact with it would become so diverse certainly was not common knowledge as little as 15 years ago.
But I think anticipating that was important and it certainly has prepared Microsoft for an evolution to an environment where we do, in fact, think of computing as being everywhere.
I think we’re also going to see new data models, the way in which we have stored data I think grew out of literally the punchcard record model. Then we put some indexing mechanisms around that. But increasingly, we recognize that we want to preserve huge volumes of data, we want to be able to derive information from that data that wasn’t anticipated at the time that the record keeping was begun.
And that’s quite a change. Most data applications in the past were designed for a particular purpose. And today, the world is full of bespoke-engineered systems, application systems and their attendant databases. And so one of the things that we’ve been doing, particularly in the healthcare area, is looking at new models of integrating the data from these bespoke systems into one new data structure which will support a lot more ad-hoc analysis, interaction, and automation against that data. And I think those kind of things will be increasingly important.
And finally, the change of the model of interaction between man and machine I think will go up a significant level to this concept that I’m calling the natural user interface. I want to talk briefly about that.
You know, many people at Microsoft –- including me and Bill Gates – over the last few years have talked about the emergence of this natural user interface. But as I’ve really thought recently about what that really might imply and how big a change would be required, I really convinced myself that all the things that we’ve talked about as elements of natural user interface so far have been used largely one at a time as enhancements to the GUI.
Now, this is not really unusual. I mean, if you look back over time, you often find that when new technologies are brought forward, you know, the first thing everybody does is they try to do the old thing the new way. And then finally, you realize, no, no, there really is a new thing to do. You know, I remember when I came here, Nathan Myhrvold used to cite the example of the movie camera. You know, when Edison created the movie camera, he pointed out to me, that the very first thing they did for a few years with movie cameras was they took films of plays. They’d stage the play and film it so people could watch the play.
It wasn’t until a few years later that people realized, well, I don’t really have to do that, you know, I can basically take snippets of this stuff and glue the pieces of film together and I can make a movie. And the movies as we know them today, of course, fundamentally depend on that architecture of construction and not just filming of something that was staged.
And I think we’re kind of going through that with computing right now where we’ve had each element of these more human-like interaction models, whether it’s handwriting recognition or touch or gestures or other things, but we’ve largely been using them in the context of the graphical user interface. And I don’t think we have stood back far enough and said is there a completely different way to think about how people will interact with these machines? And if we did, you know, then what would that be like and how should we start to build it?
And so I am now saying we’re going to move beyond this enhanced GUI to a real natural user interface, and its attributes will be quite different. One, you know, speaking to the machine and having it speak back to you, having it understand context: I think these things will be very important. The machines will be increasingly environmentally aware.
You know, you can look today at, again, specific examples of this where today people at Microsoft and elsewhere are taking cell phones and saying, well, look, now that it has cameras and microphones and GPS capabilities and accelerometers, can’t we use all of this contextual information in order to anticipate what you might want or to provide some implicit context to whatever the task at hand is? And, indeed, as we experiment with that, we find it to be very powerful.
I think we’re also at a point in time where we’re likely to see the very popular emergence of three-dimensional display capabilities. Today, you know, a lot of the recent movies that are being introduced, certainly for kids’ entertainment in theaters are targeted at a 3-D display environment in the big screen environment.
And we also see now that there are emergent technologies that are going to give us the ability to do this in everything from the cell-phone-sized display to the desk-sized to the wall-sized. Each of them may use a different technology, but I think it will be commonplace to actually be able to render and see real 3-D presentations. And that has a very powerful effect, if you haven’t experienced it, in terms of your sense of the reality of the interaction.
Another thing computers are going to do is they’re going to become better at anticipating what it is you might want. And I think that this is going to be a very important part of changing the way in which people get benefit from computers.
So lately, I’ve taken to talking about computing more as going from a world where today computers work at our command to one where they work on our behalf. And today, more and more, I think about computers, as powerful as they are, as just really, really great tools. And we sort of evolved these things from the hammer and chisel to the six-axis milling machine. But it’s still a tool. And if you haven’t done your apprenticeship and you don’t really know how to master the tool, you don’t get as much out of it as you might.
And so the question is: Can’t we change the way in which people interact with these machines such that the machine is much better able to anticipate what you might want to do to allow you to describe that in some much higher level of modeling or abstraction and to provide a richer form of interaction.
And so a lot of the work that’s been going on in Microsoft Research for the last 10 to 15 years has really so far only been applied in this enhanced GUI model that I described a few minutes ago. But about a year ago, we started for the first time to pull these together, largely anticipating the arrival of each of these technological underpinnings, particularly in the computation and storage arena, and also in the programming construct arena to say can we bring them together and start to change that model so it really is a lot more about working on your behalf.
And some of you may have seen – I think it’s been about a year since I first gave the demo that’s depicted on this slide. You know, this at Microsoft has come to be known as Laura, the robot receptionist. Eric Horvitz’s team, in particular Dan Bohus, set out to use our robotics toolkit to assemble many of these individual components into a real-time interaction system. And we started out with what we thought was a relatively simple task, which was: could we design a 3-D presentation of a robotic assistant that could sit in the lobby at Microsoft and whose context, whose domain of discourse, was ordering shuttle buses to go around the campus.
And so they studied how that happens and they built this thing in a relatively short period of time. And while I won’t show you this demo because many of you may have seen it, I’ll just point out that when this thing was idling, it basically used about 40 percent of an eight-core machine. Because in a sense, when it was idling, it was watching, it was listening, it was essentially trying to maintain context on a continuous basis.
And it’s those kinds of things that make it so clear to me that in the future this all will have to be built on this hybridized client-plus-cloud architecture, because it won’t make any sense to take high-definition video and stream it up the wire so that you can process it in the cloud on chips that are actually the same ones that’ll be sitting around in all the local devices. You’re just imposing a latency penalty and a communications cost that is largely impractical when scaled up to the level that would be required to do these types of continuous functions.
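To make that argument concrete, here is a back-of-envelope calculation in Python. All of the numbers (per-stream bitrate, client count, round-trip latency, interaction budget) are assumptions chosen for illustration, not figures from the talk.

```python
# Back-of-envelope arithmetic (assumed numbers, not from the talk) for why
# streaming everyone's HD video to the cloud for continuous analysis is costly.
hd_stream_mbps = 5.0          # assumed bitrate of one 720p camera stream
users = 1_000_000             # assumed number of always-on clients
round_trip_ms = 80            # assumed WAN round-trip latency

aggregate_gbps = hd_stream_mbps * users / 1000.0
print(f"Aggregate ingest: {aggregate_gbps:,.0f} Gbps for {users:,} clients")

# A "natural" interaction budget of roughly 100 ms per response leaves little
# room once the network round trip alone has consumed most of it.
budget_ms = 100
print(f"Compute budget left after WAN round trip: {budget_ms - round_trip_ms} ms")
```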
I think this is one of the most fundamental things that people have to wrap their minds around: as we move from this sort of time-shared model of the computer and the graphical user interface to more of a continuous computation, you know, continuous contextual awareness, anticipation of what people might really want from the computer, the idea that you can time-share these giant resources in some cost-effective way will just be increasingly impractical. Even if communication were free, the latencies that would be implied, we now know, make it implausible.
That was one of the things we saw with this: the timing tolerances for people to experience something that appeared to be a natural human interaction style, the facial expressions, the timing of the words and the speech, you know, all these things turned out to be critical to a really natural experience. And we know that those things are not likely to be computed in real time if you interpose the latency of a wide area network in the middle of it.
So what I want to go on and show you just because it’s a little novel – the goal that Eric and I have talked about each time he’s produced a version of this was, well, what’s the next highest-level task that we can give him. And I’ve already told him that my dream for this in the years ahead is I want a robotic teaching assistant and a robotic physician’s assistant. And in particular, I want them because in the less developed parts of the world, there are billions of people today who don’t get any high-quality education, and they certainly don’t get any high-quality healthcare. And it appears quite clear to me that the only scalable way to approach those problems is, in fact, to use some type of robotics. Not of the factory machine tool type, but perhaps more of this type in order to be able to provide that type of personal and professional services.
And so each step now we take has been to try to say how can we move this thing forward more and more and put higher and higher levels of intelligence, if you will, not in the AI sense, but in terms of the subject matter expertise into this.
So here’s a video of Eric putting a robot receptionist as his personal assistant outside the door of his office. So I’ll just show you a little video clip of how that might work.
(Video segment.)
CRAIG MUNDIE: So it’s interesting, as we build each version of this and you realize we’re sort of using all the compute power that we’ve been able to build sort of in a dual quad-core machine, and yet each aspect of the interaction is a bit rough. You know, obviously, the model is not what you’d get in an Xbox game today where we dedicate a lot more computing to the 3-D modeling and the characters. Because here we’re actually having to spend a huge amount of time on the machine vision capability and the speech synthesis part.
But the thing that’s exciting to me is to recognize that we’re only a couple of years away from a time where I think we’re going to see dramatic changes in the computing capability through these fundamental changes in the underlying microprocessor architecture. And those of you who have heard me talk anytime in the last five or six years, you know, will think that I’m a broken record about that. But indeed, we see all of these things materializing now. And I think that certainly by 2012 or 2013, we’re going to see a step function change in the amount of computing capability that will be in all of these devices.
And so with that, I think that it’s time to go back and reassess, you know, what are the categories of computing that will become important, you know, what is the relationship one to the next, and to some extent, how do they relate to what’s popular today and what work lies ahead for us to do this?
I think that you can basically group all of the computing devices at the client end into largely four buckets of form factors. And I’ve named them phones, specialty devices, portable computers, and fixed computers.
Now, it’s interesting when you think about that taxonomy and think back about our own history at Microsoft: we started out with fixed computers. The desktop PC was a fixed machine. We bought it, we put it on our desk, we didn’t move it. It became so valuable to people that we decided, wow, I need to take it with me. And so we created the portable desktop, a.k.a. the laptop or the notebook.
And that has been hugely successful. And today, if you say in the PC industry, you know, what’s the dominant form of personal computing, you know, they say well, it’s actually the laptop.
Now, I contend that it’s actually a failure of the industry and even Microsoft itself that we have not continued to go back and ask ourselves the question about, well, isn’t there something more you can do if you don’t have to move it? And I think that that day is coming now, and I want to show you a little bit what that’s going to be like.
The second obvious thing is that today people have taken what we started out doing as embedded computing and combined it with this desire for portable, continuous computing, and that’s made the phone incredibly popular and increasingly more powerful. So now we’ve got two different things. The difference between portable and phone, if you will, is the difference between, in my mind, portable and mobile.
One, you can move it and then use it, the other you can use it while you’re moving. And I think that’s the difference between, if you will, the first and the third categories on this slide. But it’s also clear to me, despite having come here 17 years ago to work as much in the specialty category as in the phones and portable category, that we’re going to see a huge emergence of connected devices in this specialty category. And I’ll share with you a little bit more of how I see that evolving in a minute.
Let me turn first to what I think is going to be a really transformational concept in how people are going to deal with computers that are in the environment where the people are. And so I brought a demo this morning of what I think it might be like to have an office of the future. There are obviously versions of this vision that you can apply in homes or other spaces as well.
But we all know from a commercial point of view, certainly, and from an adoption point of view, if you can bring together things that change people’s fundamental notions of business productivity, then they pay money for that. And so it’s very important to think about how we take each of these concepts of computing, communication, and interaction and say, in a world where the room is the computer, not just the thing on your desk, you know, how much does interaction in that environment change?
So what I brought is essentially a mock-up of what may be my office of the future. Today, my office has just three monitors and a PC, and it’s clear that even having that surface area was a productivity enhancement, and we’ve measured that. But we also see now that the cost of the different display technologies is dropping quite dramatically. So we have actually been experimenting with different display technologies, and it’s not out of the question that we may be able to make high-resolution, wall-sized displays where the cost of the display is less than the cost of a piece of sheetrock.
When you think about it that way, you say, well, then it gets really interesting. If it costs me no more to basically build my walls in my office out of these display technologies than it would have to basically screw gypsum board and apply paint or wallpaper on top of it, why won’t I have a lot more display surfaces? And what would it be like to have that?
So we built a prototype of an office where surfaces become active and an integral part of the experience. So let me take you through a little bit of what it might be like.
So I come into my office. The computer wakes up. You know, this is sort of the replacement of the screen saver. It’s interesting in that they’ve even done some research that says if you have this thing that looks like a window and you maybe can’t hear it, but you have the sound of the waves and people think they can look out the window, it actually has a calming effect. So even if you happen to have an interior office because you live in the United States, then you might think this is a good thing to have.
But in any case, you’ve got a desk, it has an active surface on it. And as I approach it, the thing will basically go – because it can detect me using vision systems, it can detect my presence here. And it’ll change to my most recent work experience. And so in this environment, there’s a lot of things that I can do. One of the things is I can have on my desktop little buttons here for projects. And I want to return to working on this architectural water project that I was looking at before.
Here, I can have a keyboard which is actually just projected on the desk surface. This is a touch-sensitive display capability. You know, I can have high-resolution documents that are presented to me on that. You know, I can have navigational models in the desk environment that would allow me to zoom in on a lot of these different capabilities. And so if I have this project and I want to work on it with people, then there can be more ways of doing it.
The UI control can actually advance in different ways, and more and more we’re going to have simulation as an integral part of this. So I might have been looking at this model, and I can say, look, you know, let me look at the simulation and control it. So the human touch interaction system can change dynamically to do that too. And I can essentially drag this and change model parameters and see what the effect of those might be.
So I get a little notice, a little post on my desktop that says I’ve got an upcoming meeting. Perhaps a colleague is going to come and join me and we’re going to discuss this. And of course, one of the problems I always have in my office today is if you have a whiteboard, it has a bunch of stuff on it, and you want to get rid of that and be able to use it. So in this environment, we can essentially take gestures and camera systems and use them to change the context of the wall so I can save away the thing that I was doing. I can now start to have a meeting, perhaps, with someone who might be physically present.
You know, I can say, hey, we were talking about this meeting, let’s basically sort of rehydrate it to where we were when we ended the last discussion. You know, it can bring up notes, you can annotate them, you can mark them. Because all of this is anticipating the use of cameras and vision systems as a much more integral part of how people interact with the system. Look at what we’ve done with Project Natal, which we introduced recently with the Xbox, which is essentially a real-time motion depth camera capability. In that case, we’re using it to create a gesture-based interaction model for gaming. But we believe that those technologies will become easily as usable in this kind of environment as in others.
And so as the cost of very high-resolution cameras and even displays come together, it becomes a lot easier to do that. When you think about this display as we’ve actually built it, there’s essentially a large area of lower resolution that projects these walls. But in this particular work area, both on the surface and this area, I have very high resolution display capabilities. So I could read fine text even as it was presented in this same environment.
So I could take a paper document, I could say I want to add that to this environment and add it to the collection. So you will get more and more ability to combine electronic record keeping, paper record keeping and ultimately get these things all together. You know, I might have some ability to basically take a device that has notes on it that I might have done, and I can use gestures and camera-based movements to say I want to take information from a tablet or a pocket computer or some other mobile or portable computing device and move it around and add it to this record as well.
So this meeting might end and they say, okay, thanks, I’m going to go back to the context I had before. I saw that my assistant has interrupted me, I’ll basically restore the context that I had before. Good morning, Dag, you know, what’s happening?
COMPUTER ASSISTANT: I have some answers to the question raised at the meeting this morning. Do you want to see them now or should I just post the information to the project site?
CRAIG MUNDIE: Just post it to the site, please.
COMPUTER ASSISTANT: Done. You also have a meeting with Patricia San Juan in a few minutes.
CRAIG MUNDIE: I don’t remember what the context of that was. Can you give me some background?
COMPUTER ASSISTANT: Of course. Here’s some background.
CRAIG MUNDIE: Oh, yeah, that’s the water project, we’re working for that rooftop design. I see the cost has gone up substantially from the last time I reviewed it. Can you give me some background on that particular number?
COMPUTER ASSISTANT: Let me check on that. Here. If we zoom in on this part, we can see that the pricing of some of the upstream dependencies has changed. Would you like me to get Patricia San Juan on the line?
CRAIG MUNDIE: Yes, please, go ahead and set up that call.
COMPUTER ASSISTANT: Is there anything else?
CRAIG MUNDIE: No, that’ll be all, thanks. So here we’re going to have a video call.
PARTICIPANT: Oh, hi, it’s good to see you again.
CRAIG MUNDIE: Good morning.
PARTICIPANT: And now my team has refined three of the designs since our last conversation, here’s what we came up with. Now, given what we know from you and the customer, these are the three we think would work the best. No. 3 is the one the customer liked the most, but we’ve come across a small problem. I’m hoping we can talk through some of our ideas on how to make this one work.
CRAIG MUNDIE: Sounds good.
PARTICIPANT: Let’s set aside the other two for a second. OK. Now, here’s the problem: The water system the customer was so excited about is in a great spot, and the model shows that it really enhances the existing daylighting system. But the model also indicates that we may come up against some of the regulatory changes that are starting to take hold in some other cities.
CRAIG MUNDIE: OK.
PARTICIPANT: So our solution is to use a different water capture system here and here. Okay, now, watch the other variables and see if there’s anything you’re concerned about.
CRAIG MUNDIE: They look OK to me.
PARTICIPANT: Plus, we get a sourcing incentive so we can get most of the materials right here in town.
CRAIG MUNDIE: That’ll make people happy.
PARTICIPANT: So this way we keep the aesthetics the customer likes, we’re forward-compliant with the regs, and we end up saving a bit of money. What do you think?
CRAIG MUNDIE: Sounds good. Let’s do it.
PARTICIPANT: I think it’s going to be great. I can’t wait to start construction.
CRAIG MUNDIE: OK. Thanks a lot. Appreciate your help.
So in this environment, you know, it becomes clear to me that modeling human interaction becomes really important. One of the things that she mentioned was that we had the ability to use more and more of this computational capability for these models and simulations.
One of the things I think will be equally important is to help people use this to understand what these projects might be like, whether you’re the designer or the ultimate customer.
So let’s do one more thing. Dag, can you bring up the real-time simulation of that project?
COMPUTER ASSISTANT: Right away.
CRAIG MUNDIE: So here what we’ve got is essentially a model that is being constructed of this proposed project. And the background is essentially an actual photographic image of the environment in which this building and this rooftop garden would be built. But because the cameras are basically tracking my position and placing me within this physical environment, as I essentially move around, it actually recomputes my eye point to be exactly what it would be from that location. I didn’t have to have some 3-D modeling or simulation controller, I didn’t have to use a joystick; it’s just like what we’re doing in Xbox games, where I am the controller.
And so I can move my head a little bit, it’s essentially tracking where the eye points are. As I walk around and look at these things, I can check out different elements of the view. If you want to know whether you can actually go over here and see the water out there, you can say, yeah, you can see the water.
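A minimal sketch of what “the cameras recompute my eye point” amounts to in rendering terms, assuming a standard look-at camera: each tracked head position becomes the eye of a new view matrix, recomputed every frame. This is illustrative only, not the demo’s actual code, and the sample head positions are made up.

```python
# Minimal sketch (not the demo's code): recompute a look-at view matrix from a
# tracked head position each frame, so the rendered eye point follows the user.
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f /= np.linalg.norm(f)                 # forward direction
    s = np.cross(f, up)
    s /= np.linalg.norm(s)                 # right direction
    u = np.cross(s, f)                     # corrected up direction
    view = np.identity(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye      # move the world into eye space
    return view

# Each head position reported by the tracking cameras yields a fresh view.
for head_pos in [(0.0, 1.7, 2.0), (0.3, 1.7, 1.8)]:   # meters; made-up samples
    print(look_at(head_pos, target=(0.0, 1.5, 0.0)).round(2))
```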
So if you’re someone who’s trying to make a presentation or make these final adjudications as to the product’s design efficacy, these kind of tools become increasingly powerful compared to what we know today. So the technologies that today have found singular application like 3-D modeling and simulation in a CAD/CAM environment or 3-D modeling in a game world, you know, these things all are going to come together I think to create a much higher-level user interface concept for people in the future.
So let’s go on and talk a bit about some other aspects of this. As I said before, I think there are four form factors, and this is my demo of why I think the fixed one is going to become resurgent, that there will be a successor to the desktop, it’ll be the room. But I think that the specialty category will also become increasingly important.
We’re living in a world where the cost of computing itself on a local basis is going to continue to drop dramatically and yet become wildly more capable including this discontinuity in a few years due to the heterogeneous many-core chip architectures that are going to come out.
I think the investments that we’ve been making collectively to come up with new concepts for writing parallel programs, for creating this cloud-plus-client programming environment is going to make it increasingly possible to build these very, very sophisticated applications.
Whether I think about it at the specialty device level or certainly at the level of the room, you know, to me, it’s clear that the future is not merely about enhanced Web pages or just being able to search and find more information. It’s really about a qualitative change in what people expect from computers and how they interact with them. In this specialty category, you know, I think it’s driven a lot by a world full of sensors. And these sensors will play important roles in many societally important areas.
I’ve just chosen a few to highlight here where we have interactions now with many of you in the room: you know, the environmental forecasting which we’ve been doing, medical image analysis and a lot of processing of sophisticated images, and the ability to synthesize or use 3-D display technology in concert with 3-D medical imaging. I think all of these hold great promise.
One of the ones that I think is particularly interesting, and I just brought it to show this morning, is this last one, where we’ve been working with people — and this is an example of an ultrasonic detector. Today, if you were going to use this to provide prenatal care, as we do in the rich world, you buy a dedicated machine; it’s reasonably expensive, it’s not very portable, and it doesn’t run on batteries. But in a world where we want to provide Dr. Laura, in terms of the evolution of that robot receptionist, then you want less-trained people, or even a robot, to be able to care for a pregnant woman who has no access to medical care. You know, is there a way to basically get these images and do prenatal exams with someone who has minimal training?
And so this is an example of a cell phone with this ultrasonic scanner plugged in through the USB port. There’s now enough computing power even in today’s smart phones to basically be the data capture, analysis, and graphical presentation capability for that. If you say on top of that maybe I even have some cloud capability backing it up, that I’ve got some ability to take the locally captured data and move it somewhere, or to use it to create some type of interaction if you think there’s a problem, then I think each of these things creates an opportunity for us to use technology to solve some of society’s biggest problems.
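As a purely hypothetical sketch of that capture-locally, sync-when-you-can architecture (the probe reader and the upload URL below are invented placeholders, not a real device API or service), the phone-side loop might look like this:

```python
# Entirely hypothetical sketch of the architecture described above: capture
# frames locally on the phone, do a cheap local check, and only upload the
# data when connectivity (or a remote expert) is available. The probe reader
# and endpoint are invented for illustration.
import json
import urllib.request

def read_probe_frame():
    """Stand-in for reading one scan frame from a USB ultrasound probe."""
    return [[0.0] * 8 for _ in range(8)]       # fake 8x8 intensity grid

def frame_brightness(frame):
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))

def maybe_upload(frame, url="http://example.invalid/scans"):   # placeholder URL
    payload = json.dumps({"frame": frame}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
        return True
    except OSError:                              # offline: keep data local
        return False

frame = read_probe_frame()
if frame_brightness(frame) >= 0.0:               # trivial local "analysis"
    uploaded = maybe_upload(frame)
    print("uploaded" if uploaded else "stored locally for later sync")
```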
I think the issues around education and healthcare, particularly in the emerging countries around the world, are among society’s biggest challenges, and I personally think it’s sort of the consumerization of this new form of computing that is really going to provide us all an opportunity to move to address some of these problems.
So I’m extremely bullish about how computing itself is going to evolve. I think that despite the progress and tremendous benefit that has accrued from personal computing and the Internet as we’ve known it, and even now the arrival of the smart phones, I think we’re really again just at the beginning, and we’re going to be able to use these increasingly powerful machines to qualitatively change what people get from computing and how they get it. And I think that all of us need to raise our sights a little bit when we think about the task at hand. Whether you’re working down at the level of the physics of the electronics or all the way up at the highest level of modeling and human interaction, there are many, many things to do. And it’s why we continue to make the investments we do in Rick’s area here at Microsoft Research and why we work so hard to collaborate with many of you to try to find interesting applications and evolutions of each of these technologies.
So let me stop there and use the remaining time for questions and answers as long as they’ll allow us.