RICHARD LEVIN: Hi. Good afternoon. (Applause.)
It’s really a special pleasure for me today to be able to introduce Craig Mundie to all of you. Craig is here under the auspices of the Gordon Grand Lecture Series, which was set up about 40 years ago.
Actually, interesting story, Gordon Grand was a very distinguished alumnus of Yale College, who went on to be a business leader at the Olin Corporation and other enterprises. And in his memory, his classmate Clint Frank — how many people recognize who Clint Frank was? Anybody? He was Yale’s Heisman Trophy winner in 1937. Clint Frank gave a gift to endow the Gordon Grand lectureship, and for that entire period of time we’ve had outstanding business leaders come to Yale to talk to Yale College students. When the School of Management opened later in the 1970s, the program was expanded so that usually we can tap these visitors to speak to both Yale College students and to SOM students.
Anyway, Craig Mundie — I had the great pleasure, I guess we both did, in January 2009 of being appointed by President Obama to the President’s Council of Advisors on Science and Technology, where there are some real scientists and real technologists and a token economist.
I have learned so much from Craig through his participation in those meetings. He’s got an extraordinary grasp of the whole range of what technology can do for the human condition. I’ll come back to that.
Craig’s history is that he spent a number of years after graduating from Georgia Tech in the computer industry in a variety of startup businesses and some bigger ones. And then in 1993, he moved to Microsoft where he’s been the last 20 years, and where his role has been quite extraordinary for many years. Up until just recently he was the Chief Research and Strategy Officer for this technological giant, directing the entirety of Microsoft’s research organization and also the chief strategist of technology and the approach the company should take towards the development of new technologies.
He also wore simultaneously the role of being Microsoft’s chief liaison to foreign governments when they were entering foreign markets, and indeed in Washington as technology policies were being considered there.
He’s recently gone into a kind of phased retirement where he’s passed on the large management role that he had before, and now he’s serving as senior advisor to Steve Ballmer. But here’s a man who’s been close to Bill Gates first and then Steve Ballmer over the last 20 years, and has had a lot to do with the shaping of one of America’s great companies.
Craig has an astonishing grasp of emerging technologies, and that’s what he’s going to talk about today. And I’m sure he will welcome questions in any area that he talks about. But frankly you can quiz him on just about anything, because in any area where technology intersects policy or changes human life, he knows something about it. It’s quite remarkable. He’s got great interest in the use of technology in biomedicine, and one of the contributions he made to the President’s Council of Advisors was his leadership on a report on healthcare IT, where he essentially conceptualized a whole new architecture for keeping electronic health records that’s really a generation ahead of where we are now, and has provided very important policy guidance as we think about the future of healthcare IT.
So it’s just a great pleasure to introduce a great friend and a great colleague and a real leader of technology and business strategy, Craig Mundie. (Applause.)
CRAIG MUNDIE: Thank you, Rick. It’s very kind of you to give such an elaborate introduction.
It’s a pleasure to be at Yale. We had originally planned to do this a few months ago, and the giant storm forced us to cancel it at the last minute. But I was committed to coming here, and to completing that goal of having this talk here.
These discussions on college campuses are something that Bill Gates and I, for the 20 years that I’ve known him and worked with him, have both really cherished. And each year that Bill was at Microsoft, and in the time since he’s left, I’ve sort of continued what was my own commitment to spending about a week a year in the United States visiting four, sometimes five, universities and giving a talk like this.
But I also spend time, I have a roundtable with students, a roundtable with the faculty, a roundtable with the administration, and with Rick’s help we were able to arrange all those today, and it’s been a fantastic experience.
Part of the goal in doing these talks is not only to share with you a little of what we think is going to happen that you might find interesting, particularly as technology becomes so pervasive in its influence on our lives and our work and ultimately our societies, but also to give old guys like me a chance to get grounded in what’s happening with young people and the fresh thinking that happens largely on university campuses. So it’s a two-way transaction.
Today, I’ll spend about the first 45 minutes or so and give you some ideas and demonstrations of some technology that we’re working on now. By and large, almost everything I’ll show you is not a shipping product. It’s mostly derived from work in our research groups that’s not in the far future but in the relatively near future. And so we’ll keep our fingers crossed that all of these prototype systems actually work right.
But I think people, like the people at Yale, would certainly be pretty familiar with what you can actually buy today and what you can do with it. At least that’s my operating assumption. And so I think the value add is to tell you what some of the major things are from a trending point of view and what the technology is that’s going to help you to realize some benefit from those things.
After that talk, we’ll actually give away some of the door prizes in case any of you actually have to leave for another commitment sort of on the hour, but then I’m going to stay at least until 5:30 and have an open Q&A.
And as Rick said, basically there’s no real rules. You can ask me anything about any topic, and if I have anything to offer that’s worthwhile, I’ll offer it. So it doesn’t have to be confined to the technology I talk about or Microsoft or anything else. I’m happy to just have a discussion with people.
And then I guess we’re going to have free pizza after that. So for those who want to stay you get fed, too.
So let me start first and talk about how our own business is evolving. Microsoft for many years has been known primarily for Windows and Office, you could say, although if you really look at it, the company is a very huge and diverse enterprise these days. We have about 93,000 people, about 45,000 engineers, and we build products across the entire range, from game consoles and high-tech gaming capabilities and content at one end, to critical infrastructure components for enterprises and governments at the other. And so I think as a company it gives us a perspective across everything that software and computing gets applied to that’s pretty broad.
Also we’re, I would argue, one of the world’s most globalized businesses. I think last count was like 201 countries that the company actually operates in. And so we end up having to deal with a lot of the things that are imposed on companies simply because of the diversity in the planet and its people that we ultimately deal with.
But there is a big change afoot in our world over the last few years. Many of these are things we’ve anticipated. We didn’t always do a great job in execution, but it wasn’t for lack of understanding them. And that’s the emergence of many, many devices, and all of them being computerized and increasingly all of them getting connected.
In fact, when Bill Gates and at the time Nathan Myhrvold hired me at Microsoft, it was to do startups, and the first startups were to focus on how to think about creating a system for when all these devices would become computerized. And so we’ve been working on interactive television and game consoles and watches and cars and televisions literally since I went there in 1992. And many of these things have now come to market and are big businesses, and some of them like interactive television are still in the birthing process.
And it just shows that when you are actually trying to change society’s infrastructure it takes a long time, and it takes a huge investment. So merely having the technology or the insight about how it might be applied still requires that you think carefully about how these things ultimately are going to get deployed, and there are many interesting challenges.
As Rick indicated, one of those challenges is that many of these things are now in regulated industries, and while much of the computer industry was born in an environment where we didn’t really think much about regulation, you know, when you start to let this seep into virtually every form of critical infrastructure and operating environment that the world has, you’re inevitably going to face a lot more regulation, and that affects how you do the business and how you think about the products. And it was that that really originally forced me to get involved in all of these diplomatic activities for the company.
The other thing that’s really changed is once everything got connected, we have evolved to where there’s almost no device or application that we think about in isolation, either from the other devices in the family of devices or from the services that exist in the backbone of the Internet and which essentially help to empower these things. And yet they’re all derived from software and computing in one form or another.
And so we’re going through I think the sixth or seventh major evolution in 38 years of the way we think about our business, to not think about it so much as just building software as a component that we ship to other people and that you buy on a perpetual use license basis, but rather that people want to buy these things more from us as a service, something that we curate and operate, and that they can buy on a usage-based model. So we’re changing a lot of how Microsoft not only builds its products, but even how we think about monetizing the software capabilities that we have in the future.
Now, one of the things that has obviously been valuable so far has been the graphical user interface. And almost everybody today thinks of that as the innate or basic way in which they can operate computers. For a long time, you operated that by pointing and clicking and typing, and in the last six or seven years it’s really exploded in terms of how people have sought to use touch and ultimately other forms of human sense-like capabilities as an alternative way to drive that graphical interface.
But one of the things I’ll talk about later in this talk is a major focus at Microsoft, which has been around the idea of what we call natural user interface, or NUI, to replace or supplement GUI; natural in the sense that we want it to emulate the way people interact with other people. We’d really like the computer to be more like us, and we’d like to find a situation where through that many more people can get much greater utility from computing capability than even that which has been achieved so far.
You know, if you look today, there’s arguably maybe about 2 billion of the 7 billion people who really benefit from computers. And so there’s more who haven’t yet had that benefit than who have already enjoyed it. And, of course, it’s this question of utilization and the skills necessary to use computing that in many ways, along with cost, has been a limiter.
But today, the high volume of these things has continued to drive the cost down, and, in fact, you can go almost anywhere in the world, in the poorest places even, and you’ll start to find televisions and cell phones. And I think that there’s no reason to believe in the future that any person anywhere in the world who can afford to buy a television or a cell phone won’t in the process, in the bargain, be buying a computer system and probably some type of inherent connectivity. And with that, we can really think about how computing then provides a much different range of benefits for all those people.
So touch has been an important change. This last version of Windows that we developed, Windows 8, was really the first time the company elevated touch-based operation to sort of first-class citizen status, and we brought to market, with OEMs and by ourselves, these Surface products and Windows 8-based tablets.
But we tend to believe that over time we’re going to see all sizes of these types of touch-based displays, and that they’re going to find many applications beyond that which we historically have known.
So I brought this, which is a new product from Microsoft, which is basically I’ll say the second largest tablet that we sell. (Laughter.) This one is 55 inches, and actually we currently sell one that’s 82 inches. In fact, it is exactly the same interface and model of operation that you have on any standard Windows 8 tablet.
We’re beginning to find some interesting things when you take this model beyond mobility. Everybody has sort of focused so much on the cell phone and the small tablet. And while that’s important and powerful, our belief is that, largely through the cost reductions that have been driven by television sales around the world, and ultimately even the emergence of projectors like the one that drives that thing to well over 100 inches of display capability, these large displays have just continued to plummet in price.
So even without assuming there are any radical breakthroughs in technology, we find it completely plausible to believe that there will be a continuum of screen sizes that will ultimately be touch- and gesture-enabled, ranging from the very small, probably as small as your watch, up to the very large, certainly wall-sized in your office.
So inside Microsoft we’ve actually started to deploy these things, not completely broadly yet because we’re still sort of driving the cost out of them, but those who have them, including our CEO, find that having it in your office is sort of transformational in the way that you think about doing meetings and how you interact with other people.
In a university environment, certainly my experience was you’d sit and talk to a lot of people as you’re trying to work on a problem. It’s almost hard to do it without either having a shared piece of paper or frequently a chalkboard or a whiteboard. There’s just something powerful about that kind of shared experience. So by being able to put these things on almost any surface in the future, you start to really think of how a computer-mediated form of that type of interaction becomes possible.
The other thing that’s happening is as computers just get increasingly potent, we start to think about doing applications in radically different ways.
So I’m going to show you one of my sort of personal favorites. I personally sponsored the beginning of this work in our sort of research and incubation area I guess six or seven years ago, and it now actually ships as a product called Fresh Paint.
I don’t know if any of you have seen this, but these are things that you can now buy, sort of like the little paint books you buy for your kids.
Let me just first show you a little bit about what’s interesting about this. You can say, oh, hey, I’ve seen Paint on my computer, but you’ve never really seen Paint probably like this, because what this thing actually is, is a full-on physics simulation of painting.
So the paper, including its texture, is represented by a 3-D model. The different art media, in this case oil paint, crayons and pencil, are all modeled physically, and the interaction of the brushes, which are kind of indicated here, including the movement of the bristles and the twisting of them, is all done with physics simulations.
And so in this case if I pick any color and any brush, and change the line width or whatever it is, then what I’ve got is an ability to just paint with my finger or with a brush. So you have a capacitive brush with real bristles. More and more, there’s this direct coupling between sort of the virtual world and the physical world, and we’re able to model more and more of it.
You can say, well, that’s nice, dummy, but that’s the wrong color for Yale and we don’t make the A that way. (Laughter.)
So I say, okay, well, what else can we do to help me since I’m not a good artist. I can say, well, in here somebody actually helped me out and they took a Yale logo and stuck it in here. We’ll say we’re going to use this nice fat brush because I’ll be in a hurry.
I can show you one other thing. This is simulating blue oil paint. Now, when this drawing was started, you see the outline was actually done in black. Oil paint dries slowly. So if I actually start to paint in this thing, you’ll see as I go across the lines the blue paint mixes with the black paint because it wasn’t dry.
But if I say, well, I didn’t really want that, I’m in a hurry, the nice thing about digital oil paint is I have this little fan, and if I turn the fan on, it dries instantly. (Laughter.) So now I can paint and no mixing of the blue and the black.
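The wet-paint behavior he just demonstrated can be sketched in a few lines. What follows is a toy illustration in Python, not Microsoft's engine: each canvas cell carries a color and a wet flag, new strokes blend with any wet paint underneath, and a fan pass dries everything so it no longer mixes.

```python
# Toy wet-paint model: cells hold a color and a "wet" flag; strokes
# blend with wet paint; a "fan" pass dries everything instantly.

from dataclasses import dataclass

@dataclass
class Cell:
    rgb: tuple = (255, 255, 255)  # start with white paper
    wet: bool = False

def mix(a, b, t=0.5):
    """Blend two RGB colors; real engines model pigments, this is linear."""
    return tuple(round(x * (1 - t) + y * t) for x, y in zip(a, b))

class Canvas:
    def __init__(self, w, h):
        self.cells = [[Cell() for _ in range(w)] for _ in range(h)]

    def stroke(self, points, rgb):
        for x, y in points:
            c = self.cells[y][x]
            # wet-on-wet: new paint mixes with paint that hasn't dried
            c.rgb = mix(c.rgb, rgb) if c.wet else rgb
            c.wet = True

    def fan(self):
        """The 'instant dry' fan: freeze all paint so it no longer mixes."""
        for row in self.cells:
            for c in row:
                c.wet = False

canvas = Canvas(8, 8)
canvas.stroke([(1, 1), (2, 1)], (0, 0, 0))   # black outline, still wet
canvas.stroke([(2, 1)], (0, 0, 255))         # blue smears into wet black
print(canvas.cells[1][2].rgb)                # blended color
canvas.fan()                                 # the fan: dry everything
canvas.stroke([(1, 1)], (0, 0, 255))         # now blue covers cleanly
print(canvas.cells[1][1].rgb)
```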
This thing is in the app store for Windows now, where it’s I think either the first or the second most popular application. And it really shows what happens when you give a high-quality tool to people, whether it’s kids or, it turns out, professional artists who are trying to figure out the future of the art business (at Yale I assume everybody is doing this already). What’s it going to be like when you want to preserve the classical concept of oil-based painting and yet you want to at least practice or learn or be able to have these kinds of digital tools to do it? The really, really wonderful part about this is not only can you dry the paint, but if you make a mistake, you can take it off.
So if you think about this for training kids, it’s unbelievable because there’s no mess. They develop a fearlessness, which of course is characteristic of young people, but it’s really hard in art, because you make a mistake, you trash the whole thing, you crumple it up and start over again. Here the rate at which people can learn turns out to be quite accelerated because you basically can correct errors instantaneously.
So now we see all kinds of these interesting things happening. For example, we wanted to create a themed version, so we actually went to Disney and said, hey, we want to license Nemo so that kids can paint Nemo.
And the thing that was remarkable, they said, sure, you know, we’re happy to do that. And then we showed this thing to their animators, and the animators became so intrigued with the whole thing, that they ended up creating a whole custom set of Nemo characters just specifically to be able to be painted on this medium. And I just think we’re going to see a lot more of that very clever kind of interaction in the future, and who knows where it will end up.
One of the more challenging things that we intend to do (it’s not in this product, but we think we know how to do it now) is watercolors, including all the diffusion and mixing that happens in the paper.
When you realize the amount of computing that is going on in order to be able to do this, it’s really stunning, and yet it’s going to be available to anybody on a tablet for a couple of hundred bucks, and I think that that’s going to be quite remarkable.
The next thing I want to talk a little bit about is big data. You hear a lot about it. You can hear different definitions of what it might mean. Some people talk about the three Vs, you know, variety, velocity and volume as somehow indicators that your data is big.
In practical terms for us, it really has meant you’ve got so much data that you can’t get a sort of practical insight out of it in any reasonable amount of time from even the largest single machine.
And so we’ve been building these really super-scale facilities, in many cases just to allow us to build and operate these Internet-scale services. And what’s happening more and more is we’re taking the architectures of those machines and adjusting them in ways that allow us to not only ingest but then to process these really staggering amounts of data that are being created.
And the data is flowing from all different sources. It’s not just the stuff that humans traditionally input directly or as a byproduct of doing transactions. We’re living in a sensor-driven world. Every cell phone you carry around is a sensor pack; maybe some of you have — I have the Fitbit, which basically is a sensor pack in your pocket that keeps track of your movement, stair climbing, activity levels, sleep. So more and more, I think people are going to live an instrumented life, and that will all get plugged into this environment.
There’s all the classical big data sets, like the things that Rick and I worked on in the healthcare area for President Obama, and as these come together, our ability to learn new things from them represents a real chance for breakthroughs in both the cost and the efficacy of our healthcare.
But there are two different ways that we’ve thought about how to get value out of this big data, ways that we employed initially inside Microsoft for quite a few years and that increasingly will be made available as products for other people to use, too.
One of these is sort of to build on the continuing trend in visualization, which depends on using some type of graphical system typically to couple human insight into the problem. That’s a powerful tool, and we think in the big data world the ability to do that can be made even better.
The other one, and I’ll talk about that more in a minute, is machine learning. This is something that many of us in computer science have pursued for decades. But I think in the last few years, we’ve really made some breakthroughs, in part because of the scale of the machines that we have to operate on, and ironically in part because of the volume of the data sets that we have to operate on with those big machines.
And we’re actually starting to find that many of the problems that we had in some classes of machine learning were really just the fact that we were sort of on the right path a long time ago and gave up on it too early simply because we didn’t have enough data to feed it and a big enough machine to crunch it. And when you do, you find out that you can actually solve some of these problems.
So the next thing I want to show you came out of Microsoft Research very recently, and it’s called internally Sketch Insight. When you start to make the assumption that you have tablets, big and small, on all kinds of surfaces and you want to use them for this type of collaborative discussion or presentation, then you want the same kind of naturalness that you get out of using a chalkboard. But again, why shouldn’t you get more help from the computer?
So here this one uses — this has a pen. So unlike just a normal touch thing that many people use, these systems are quite different in that they also support a very high-resolution pen. And so it really is like having a pen or pencil that you can write on these things with. It’s not finger-width drawing.
And so this one, we’ve loaded a few different data sets in, and the one I’m going to demonstrate has some population dynamics in terms of global population.
So it’s been taught a few gestures. So the first gesture, hopefully, is this one. And as soon as it recognizes it, it sort of straightens it out and says, okay, what that is, is he wants to have a graph. To make it a little easier, I’ll stretch this out for now and then move it over.
And in this graph, I want this to be population versus year. So if I start to write population here, it says, oh, okay, well, I only know a couple things. In this case, one thing that’s in the data set, it says, do you want population? So yeah, I don’t have to write the rest of it. I’ll put it in there for you.
And then the same thing goes for year. So if I go down here, and I start to type year, it recognizes that and says, okay, yeah, year, I want year.
So then it goes and extracts all the data and hooks it up.
And you can say, wow, gee, I don’t know, do I like that data or not? Well, maybe I want to have a bar graph. So I can say, make the thing into a bar graph. So it will redo that.
I can say, no, I really think I’d rather have a line graph. So I’ll just say, you know, maybe I can make a line graph out of it.
And I can say, I don’t want all those individual points. What I want is sum. So it sums them all up.
And what I really want to know is not just the global population, which from 1960 to 2010 went from 3 billion to 7 billion, I want to know what it looks like by continent. So yeah, it knows about continent. So it splits it all out and puts it in continent form.
But then a few other interesting things. So I’ll shrink this back here a little bit. It says, okay, I also know how to make other graphs. So how about a pie chart, which is year versus population. And now it turns out that these all get linked. So these become controls for the data on this side. If I turn off the red one, it turns off 1960, and if I turn it on, it comes back. So suddenly I’ve made myself an interactive tool that allows me to explore different parts of the data.
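Two of these behaviors are easy to sketch: completing a half-written axis label against the data set's known fields, and using one chart's legend as a filter on the others. This is a minimal Python illustration with invented data and field names, not the Sketch Insight code:

```python
# Two Sketch Insight behaviors in miniature: autocomplete of a
# half-written field name, and one chart acting as a filter on another.

data = [  # (year, continent, population in billions), illustrative only
    (1960, "Asia", 1.7), (1960, "Europe", 0.6),
    (2010, "Asia", 4.2), (2010, "Europe", 0.7),
]
fields = ["population", "year", "continent"]

def complete(prefix):
    """'pop' -> 'population': match the ink against known column names."""
    matches = [f for f in fields if f.startswith(prefix.lower())]
    return matches[0] if matches else None

print(complete("pop"), complete("ye"))   # population year

# Linked views: toggling a year "off" in the pie chart filters the graph.
enabled_years = {1960, 2010}
enabled_years.discard(1960)              # user taps the 1960 slice off

line_chart = [(y, c, p) for (y, c, p) in data if y in enabled_years]
print(line_chart)                        # only the 2010 rows remain
```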
Now, it knows how to make other graphs, too. And so it has this inverted gesture. It knows how to do geo-plotting and maps.
So now I’ve got a map, and this map is color coded to match that thing, and it also becomes a vehicle for controlling this. So if I say, hey, I really don’t want to look at all this data, I only want to look at Eurasia and North America. So I turn all them off, and then I only get that data.
So suddenly you’re doing all this interactively. You’ve got this huge data set behind you, and yet you come up with this very, very simple way to make the presentation. You can then record all these things, you can play them back.
And so the whole idea of how you make a presentation, how you give a presentation, how you collaborate all will evolve over time.
So let me go on and talk a little bit more about the way that we think that this collaborative type of work is going to go.
We see more and more of these business-intelligence-type tools that we want to build. And while this gave you some assistance in trying to control sort of the drawing and the linking of these elements, we also wanted to try to build more powerful tools that allowed you to sort of see into super-complex data in ways that would be really hard to do otherwise.
So the next one I’m going to show you is a thing called SandDance. That’s a code name inside Microsoft Research. And what this is, is pretty much the entire U.S. Census data set, which it turns out can now easily be stored in a single computer. And if you just said, well, okay, there’s the Census data, it doesn’t really make any sense as just a big blob of all kinds of data.
So now you can start to say, well, I want to assign different colors to different things. So here are all the different things that we extracted from the Census. In this case, what I’m going to do is color code income from low to high, green to red. So now it goes through and starts to color code each of these Census records. This is Census data at, I think, the county level. And if you look carefully, you can at least now start to see what the general income distribution is, but you can’t really tell where it is.
But the thing has different modes. So if I actually say what I want you to do now is go look for latitude/longitude data and plot the thing against lat and long. So now this isn’t actually a map that got filled in, it’s just the counties plotted with their latitude and longitude. And, of course, what does it make? It makes a thing that looks like the United States, because it’s all the counties of the United States.
Now, you can sort of also see where the population densities are, and you can see that basically as you get into the major cities wherever they are in the country, on the coast, in the middle, that it goes from red to the middle range. And you get up here into New York and New Haven and a few other places, Seattle and San Francisco, Los Angeles, a few in the Midwest, Chicago, you start to see a few sprinkled places get all the way up to green there where the highest incomes are.
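That lat/long view can be approximated in a few lines. Here is a rough sketch using matplotlib and a handful of made-up county records (the real demo loads actual Census data): plotting each county at its coordinates, colored from red for low income to green for high, redraws a recognizable map of the country with no map layer at all.

```python
# SandDance's lat/long trick in miniature: position = geography,
# color = income, and the counties themselves draw the "map."

import matplotlib.pyplot as plt

counties = [  # (name, lat, lon, median income), illustrative values
    ("New Haven, CT", 41.3,  -72.9, 66000),
    ("King, WA",      47.5, -121.8, 75000),
    ("Apache, AZ",    35.4, -109.5, 32000),
    ("Cook, IL",      41.8,  -87.6, 59000),
]

lats    = [c[1] for c in counties]
lons    = [c[2] for c in counties]
incomes = [c[3] for c in counties]

plt.scatter(lons, lats, c=incomes, cmap="RdYlGn")  # red = low, green = high
plt.colorbar(label="median income ($)")
plt.xlabel("longitude"); plt.ylabel("latitude")
plt.title("Counties plotted by lat/long, colored by income")
plt.show()
```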
Then you can do all kinds of interesting things. You can say, you know, let’s start to examine some of these things. So instead of lat/long, let’s plot education levels on this axis, and let’s plot income levels on this axis. And they’re still color coded. Education, and... actually, unemployment is what I wanted, not income. That’s different.
So now what you see is sort of education in the county in terms of average amount of education, versus the unemployment rates. So you begin to ask interesting questions.
So now you can ask questions by just graphically saying, I’m going to put a box around these three because they’re sort of outliers, with very high unemployment and very low education, and I can say, well, who are those?
It turns out that those three things are Apache County in Arizona, another county in Arizona, and Coconino County in Arizona. So basically three of the Indian reservations in the United States happen to have the lowest average education and the highest average unemployment in the whole country.
You can go at the other end of the spectrum and basically pick this one and say, okay, what’s that guy? And you’ll all be happy to know that’s Stanford. (Laughter.) So what’s interesting is in Stanford County, they have a lot of education, but their unemployment level is pretty low. But they also don’t have a green dot for income.
In any case, there’s things that you could discover in this data by just crawling around in it and organizing it and sorting it in these ways that you might never know.
If I went back and looked at that particular data in the geographical form, sure enough, it shows up over here as Stanford.
Now, another thing that we think is happening a lot, and if you were here before we started and watched one of the videos, is that many of the applications of these devices, whether they’re tablets or phones or these wall-sized or conventional computers, are no longer going to be consumed one device at a time.
We’ve already started to ship this technology we call SmartGlass where your phone or your tablets become sort of an in-your-lap control mechanism for your entertainment experience. You can play games that way, you can browse the Internet on your television, and the keyboard and mouse manipulation on your TV screen are just done through the device that’s in your hand.
But increasingly we expect that the same thing will be done in the business environment. So if you’re collaborating with somebody, and you’re looking at all this data, then it turns out that a colleague might be sitting and having a tablet in their hand in your office, or they could be sitting across the country, and they could be looking at either the same data or they can be looking at a slightly different one.
So on this one, let’s just say I’m interested in knowing where the middle-income people are that are of median age. I can select that in this other graph, which plots median age against income, by highlighting a part of it. And on my screen in my office, what lights up is the same yellow dots but in the configuration I’m looking at. And you can say, well, it turns out that the people who make reasonably high incomes and are fairly young seem to be clustered around the big cities, which of course you’d expect. But your ability to explore in this way and to do it collaboratively in real time is increasingly going to happen more and more.
The video that we showed during the walk-in part, for example, of the choir at TED, which was part Skype and part physical presence, I think all of these things are just examples that more and more the geographic barriers are just going to disappear, and whether you’re physically in the same place or in different places, that the idea of real time collaboration, communication and even entertainment will increasingly come to be commonplace.
I mentioned earlier that if I think of these things as sort of the analytical tools based on visualization and where I’m trying to get the human intellect coupled into the problem as directly as I can, that’s one way to do it.
But there are a lot of things buried in the data where, in fact, the people don’t have an idea. We’d like to have an answer, but we don’t really know. And that’s where machine learning really is going to be a powerful tool.
Today, a lot of that is based on supervised learning, where it takes some sort of high priest to manipulate the machine learning capability, but increasingly we’re able to improve that, the Holy Grail ultimately being a lot of unsupervised learning. But even along the way, the toolkits that allow people to apply this type of high-scale machine learning to many huge data sets are getting better and better.
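As a toy contrast between those two modes (not Microsoft's toolkit), here is a scikit-learn example on invented data: the supervised model needs labels supplied up front, while the unsupervised one discovers the grouping on its own.

```python
# Supervised vs. unsupervised learning on four points in two obvious groups.

from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[1, 1], [1, 2], [8, 8], [9, 8]]

# Supervised: a "high priest" supplies the labels up front.
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2, 1], [8, 9]]))   # expect [0 1]

# Unsupervised: no labels; the algorithm finds the grouping itself.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                      # two clusters, found without labels
```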
So let me show you one other way that we’re starting to think about coupling together this idea of searching and learning in order to create other ways to make discoveries and do navigation.
So what we have again is another prototype, and what this thing is, it’s called Doc Map. And if you look at these topics, you should find all of them familiar to somebody here.
What we ingested was, I don’t know, some large number of years’ worth of all the research papers published by the computer science department at Yale.
And what this system does is it actually reads all the papers, and it categorizes them. It tries to sort of understand them and cluster them around words. And based on the size of the words, you get some sense of the frequency with which that topic appeared in this whole space. For any word that’s selected, it basically gives you a list.
So think of this as like a search engine, but this search engine figures out what the interesting words are you want to search on, and then it shows you all of the words that are adjacent to those words. So you start to find, you discover topical relationships that you might have never known about.
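The core of that idea can be sketched with standard tools: vectorize the paper text, cluster it, and surface each cluster's dominant terms, which become the words on the map. The titles below are placeholders, not the Yale corpus the demo ingested, and the code is an illustration rather than Doc Map itself.

```python
# Doc Map in miniature: TF-IDF vectors, k-means clusters, top terms
# per cluster become the words on the map.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

papers = [
    "active vision system for a social robot",
    "shape recovery and visualization of surfaces",
    "fast transforms for data on a cartesian grid",
    "sparse matrix algorithms for scientific computing",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(papers)

km = KMeans(n_clusters=2, n_init=10).fit(X)
terms = vec.get_feature_names_out()

# List each cluster's highest-weight terms: the bigger the cluster,
# the bigger the word would be drawn on the map.
for k in range(2):
    top = km.cluster_centers_[k].argsort()[::-1][:3]
    print(f"cluster {k}:", [terms[i] for i in top])
```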
So this works where you can actually scan and do things — maybe. There it is. It just takes a little time. And if I pick one of these words — well, we picked a couple, so I wouldn’t have to type too much. So I’ll pick vision and put it in, and this thing will go grind around, and it will find everything that had vision in it. And there’s a few clusters of them. So here’s one around visualization, one around shape, other things.
And if I basically take these things and scroll into them, I could look, and it will start to add more and more granular detail. I can pick any one of these capabilities, let’s just say transformational, and it goes out, and it finds the papers that were in that cluster, and it gives me a list of them.
So here, for example, is one, “Fast Slant Stack: A Notion of Radon Transform for Data in a Cartesian Grid, Which Is Rapidly Computable, Algebraically Exact, Geometrically Faithful and Invertible.” That’s quite a title. (Laughter.)
But if I actually touch this, it says, okay, you want me to get you the paper, and I say, yeah, just save it for me, and it does it. So now I basically discovered a paper that I didn’t know existed in a field that I may have an interest in, and I’m able to go get it.
Now, as I said, we really expect these devices to come in many different shapes and sizes and flavors. This one, which was built by the guys who do this stuff, is special in that it also has things that make it change its shape.
But increasingly, you may not buy one that will actually either stand up or lay down by itself, although you clearly can do that. I think people will find that they’ll replace their whiteboards with these things, and increasingly they’ll replace their desks with these things. What you want is an environment where the whole model of operation is, again, very natural: the same things that you already know how to do with your hands and a pen, you want them to apply in exactly the same way to the manipulation or review of documents in this environment.
So if I take this thing and drag it out here from where it was saved, and if I blow it up, sure enough, there’s that paper. I can read it, just flick through it as I turn the pages of a book. I could if I wanted to have a different document, let’s just take another one and pull it out over here, and I could move this guy around and put him up there, make it bigger. “Active Vision System for a Social Robot.” Let’s just say I’ll pick yellow to highlight this.
Now I’ve got my pen. And, of course, if you think about how you would normally work, what people don’t really realize unless you study it is that if I give you a piece of paper and I ask you to write something, let’s say you’re right-handed, the first thing you do is put your left hand down on the paper, and then you start to write. Someday, go try to write without using your left hand. You’ll find it incredibly difficult. People, when they write, are almost continuously changing the relationship of the paper to themselves and to the thing that they’re writing with in order to facilitate it. So writing is really a two-handed thing for most people, not a one-handed thing.
But, of course, computers in the past haven’t really facilitated that because if you had a touch interface it got confused. Well, was he touching or writing? So in this system it actually can distinguish completely separately the touch of your hand from the touch of the pen. And in addition, it’s not just one touch, it’s arbitrary touch.
So now, if I want to review this document, I can turn the pages, and I say, well, it’s really kind of hard to annotate it that way. But if I actually just put my hand down, this paper works just like real paper. So I can write with two hands. I can annotate this. I can flip it up; I can go to the next page. I can make the whole thing bigger; I can make the whole thing smaller.
All of the things that you would naturally do by just the movement of your left hand to control the relationship of the paper to your pen and to your eye, all that happens completely naturally.
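The routing logic he describes might look something like the following sketch. The event format and the palm-size threshold are invented; the point is only that pen contacts, resting palms, and manipulating fingers are treated as three different things:

```python
# Toy palm rejection: the digitizer reports pen and finger/palm contacts
# separately, so ink goes to the pen and paper handling to the hand.

def route_event(event):
    """event: dict with 'source' ('pen' or 'touch') and 'area' in mm^2."""
    if event["source"] == "pen":
        return "ink"                      # high-resolution writing
    if event["area"] > 400:               # big blob: a resting palm
        return "ignore"                   # don't smear ink under the hand
    return "manipulate"                   # fingers: pan/zoom/rotate paper

events = [
    {"source": "touch", "area": 900},     # left palm planted on the page
    {"source": "pen",   "area": 1},       # pen tip writing
    {"source": "touch", "area": 80},      # two fingers rotating the page
]
print([route_event(e) for e in events])   # ['ignore', 'ink', 'manipulate']
```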
So much like Fresh Paint was an attempt to simulate the physics of painting, more and more we’re trying to emulate the physics of dealing with pen and paper. Pen and paper have been around a while for a good reason, and just because you can say everything should be on a screen doesn’t mean you should give up the long-developed capabilities and benefits that people have from operating in that environment.
You can do other interesting things. Here I can pull in a map. I can basically make the map big. With one hand, I can twist it around like I would if I was — let’s say I’m looking here for New Haven. So I just kind of zoom in. Okay, here comes New Haven. And I can just keep moving it around with one hand until I find it, and say, okay, here we are, Yale University.
So the ability to take this map in real time over the Internet, suck it up, manipulate it, twist it around, annotate on it, clip parts out of it and send it to other people, it’s basically better than paper, because I haven’t given up almost anything that was a problem in the ability to manipulate the thing, but I’ve gained the ability to pan and zoom, to rotate it, to clip things out of it, and do things that are just impossible with paper.
So I think that more and more these are examples of what we call natural user interfaces. It’s not simply the idea, can we emulate speech and vision and haptic kinds of things? It’s, can you then use those to build new models of interaction for applications that make it just completely natural for people to use these computer systems?
So we’re very excited about that. And while the classical graphical interface isn’t going to disappear, I think we’re kind of in a stage now which has I’ll say frequently occurred when technology shifts happen. People remind me — I’m old, but I’m not quite this old — that when the motion picture camera was invented, the first thing they did with it was they filmed plays. And it was actually only years later that they realized that they didn’t have to film the whole play at once, they could actually film chunks of it and then splice the films together.
And so what we know today is the major motion picture wasn’t at all what anybody had in mind when they created the movie camera. And I think we’re kind of in that stage now in this transition between graphical interfaces and the movement to natural interaction, that when you get to this point, you really want to stand back and ask yourself the question: How do I just get the computer to help me more?
I always tell people one of my goals for the work at Microsoft is I say, okay, how should a farmer in rural India be able to use this machine learning capability? He should be able to pick up his phone and say, “What day should I fertilize?” And the thing comes back and says, “Thursday.” All right?
So, in essence, he asked the question exactly as he would have asked it of a human colleague who knew the context, was an expert in the domain, and knew everything about what the guy was already doing. And the power of this big data/machine learning combination is that it allows us to essentially continually learn about context.
For those of you who have people who work with you, whether they’re assistants, personal assistants, people who work in your lab with you, you realize those people get more valuable to you the longer you work with them — for the most part. And the reason is: They have context.
Right now, your computers, by and large, are not getting more valuable the longer you work with them, because they aren’t essentially aggregating the context that they have.
But with computers so intimately involved in our work, our play, our shopping, virtually everything that we do, the amount of context that the system is capable of inhaling and managing is really stunning.
That brings with it a whole set of new challenges both on the policy and regulatory front, as well as the technology front, for example in the privacy space.
When Rick and I worked on the health IT report, one part of it was actually the invention of a mechanism that would solve the problem of combining the data. And the happy thing was that it also turned out to be a mechanism that gave you a way to have the technology manage usage-based access to the data. Hence, an attempt to solve the privacy problem.
And I think this is one of the great challenges for the computer science and computer business community these days: to figure out how you design these systems such that the value of the big data isn’t forced to be lost simply because people are paranoid that certain bad things could happen. Many more good things can happen; we just have to be sure that we can reel in the bad ones ex post facto, and I think it’s completely clear that we have the technical means to do that. It’s just that people haven’t been sort of forced or incentivized to go build those things yet. But I think that’ll come.
So let me move on beyond this type of, I’ll say, analytics and discovery where the human is driving, a little bit more to the kind of things that we’re doing where the machine is driving the discovery process.
You know, if you look at this slide, it’s just six examples I selected where Microsoft, inside the company, for as far back as at least the eight years I can remember, has been using machine learning in order to build sort of surprising capabilities into products. Speech recognition in phones and other devices for quite a long time.
TrueSkill. Many people in the room probably have at some point played an Xbox game. And if you play multi-party versions, particularly over the network, and you go to the game lobby to find somebody to play with, it turns out one of the challenges we discovered very early on is that when you have so many people, if you use just random assignment of players, you get a very unappetizing experience for everybody, because the odds that you have people who are reasonably well matched are not that high. The quality of your gameplay can vary in so many dimensions that there’s no simple metric that says, hey, you know, is this guy a good match for that guy?
So we started to use machine learning to figure out out of this huge multidimensional space which things produced satisfying competition. You know, otherwise, it’s sort of like putting a pro against an amateur. The amateur says, “Why do I even try?” And the pro says, “Why am I wasting my time playing this idiot?” And you don’t want that. So you really want all these competitions to be pretty much even up, and that system does that.
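TrueSkill itself models each player's skill as a Gaussian and updates it with Bayesian inference. The Elo-style toy below is only meant to convey the matchmaking idea: estimate skill from outcomes, then pair players whose ratings are closest, so nobody is the pro playing the amateur.

```python
# Toy ratings-based matchmaking (Elo-flavored, not real TrueSkill).

def expected(a, b):
    """Probability that rating a beats rating b (logistic curve)."""
    return 1 / (1 + 10 ** ((b - a) / 400))

def update(winner, loser, k=32):
    """Shift ratings toward the observed outcome."""
    e = expected(winner, loser)
    return winner + k * (1 - e), loser - k * (1 - e)

ratings = {"pro": 1600, "amateur": 1000, "newcomer": 1200}
ratings["pro"], ratings["amateur"] = update(ratings["pro"], ratings["amateur"])

def best_match(player, pool):
    """Pick the opponent whose rating is closest: an even, fun game."""
    return min((p for p in pool if p != player),
               key=lambda p: abs(pool[p] - pool[player]))

print(best_match("newcomer", ratings))   # pairs with the nearest rating
```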
The whole search and ad business is all built on machine learning. You know, knowing what ad to deliver, when to deliver it, how to scan the Web, what to keep and what not to keep is a machine learning problem.
One of the things that we actually just shipped, which has been a dream for many of us for a long time, is the intelligent inbox. In Office 365, we’ve just started to turn on this capability. It observes how you handle your mail over several thousand messages. And based on what it learns, it begins to automatically manage your inbox for you. It clusters things together; it boosts things it thinks you would consider higher priority. And it even gets to the point where it can start to recommend dispositions for those messages.
You know, you sit down and the thing sees you and says, I looked at your mail, and all these, I think you’re going to delete, should I? (Laughter.) All right? And you just say yes, and they all go away.
And these it says, I know you’re going to respond to these, and I’ve drafted some basic responses for you, do you want to send them?
So more and more, the computer just starts to say when it can pattern match and say I know that this person is important and this one less, you know, it starts to help you figure that out.
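A hedged sketch of that triage idea (not the Office 365 implementation) using a simple Naive Bayes text classifier: learn from how past mail was handled, then propose dispositions for new messages. The messages and labels are invented.

```python
# Inbox triage in miniature: train on past dispositions, propose new ones.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

history = [  # (subject text, what you did with it), invented examples
    ("newsletter weekly digest",  "delete"),
    ("sale ends tonight",         "delete"),
    ("meeting moved to 3pm",      "respond"),
    ("question about the report", "respond"),
]
texts, actions = zip(*history)

vec = CountVectorizer()
X = vec.fit_transform(texts)
model = MultinomialNB().fit(X, actions)

new_mail = ["huge sale this weekend", "quick question about the deck"]
print(model.predict(vec.transform(new_mail)))  # e.g. ['delete' 'respond']
```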
The whole concept of gesture recognition in Kinect was all based on machine learning. And in Bing Maps, traffic prediction. If you go and ask for a route, do you want to know what the traffic is now? Telling people, here is the route, based only on the traffic we currently have, is not all that interesting. What’s a lot more interesting is to predict, over the time it takes them to drive the route, what the traffic will be at those points.
So by basically taking all the historical factors and all the exogenous things like weather and ball games and everything else, and then monitoring accident reports, the thing can actually figure out a forecast of what congestion will be like at every point in the route before you drive it. And then it’ll adjust the route dynamically.
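The forecasting idea reduces to this: score each road segment not with the traffic now, but with the congestion predicted for the time you will actually reach it. A toy version, with an invented history table:

```python
# Route ETA using forecast congestion at each segment's arrival time.

# avg minutes of delay by (segment, hour of day), learned from history
history = {("A", 17): 12, ("A", 18): 4, ("B", 17): 2, ("B", 18): 9}
base_minutes = {"A": 10, "B": 15}

def eta_for_route(route, depart_hour):
    clock = depart_hour * 60.0
    for seg in route:
        hour = int(clock // 60)
        # forecast congestion at the hour we REACH this segment
        clock += base_minutes[seg] + history.get((seg, hour), 0)
    return clock / 60

# Leaving at 5pm: segment A is jammed now, but B jams by the time we
# arrive, so the planner can weigh both before choosing a route.
print(round(eta_for_route(["A", "B"], 17), 2))  # arrival time in hours
```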
One of the great things that’s been a holy grail for us at Microsoft, and many people in the computer science and research community for years, has been the quest to do real-time speech-to-speech translation.
At Microsoft, one of the ways that we try to aggregate our research results into solving larger-scale, sort of composite, real-world problems is to give ourselves these kinds of objectives.
And so about two years ago, Rick Rashid, who is shown in the next slide (he’s the guy who founded Microsoft Research and actually runs it day to day), said, hey, you know, by the end of 2012, I want to go to China, give a speech, and I want to be able to speak in my normal voice in English, and I want the loudspeaker system to project Chinese in essentially real time in my own voice. (Laughter.)
And so we actually did that. We think that’s sort of a real milestone achievement in the field. And it was all based on some real breakthroughs in the last few years in what’s called deep neural networks, and the use of these very-high-scale machines and very large data sets to train machines to do this.
So I’m going to show you a one-minute clip from that speech in China last fall, but you get an idea. What you’ll see happening, and I’ll explain it now so you can watch it, is: he talks, it converts his speech into English text, it translates the English text to Chinese text, it does a text-to-speech synthesis in Chinese, and then it plays that through a vocoder model that we developed from his voice. So it’s actually his vocal tract — just like Fresh Paint was a simulation of painting, this is a simulation of the physics of his vocal tract. And we use that so that, even though it’s a language he doesn’t speak, it’s his voice in another language.
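The pipeline he walks through can be written down as composed stages. Every stage below is a stub standing in for the real components: a deep-network speech recognizer, a machine-translation model, and a text-to-speech engine driven by a model of the speaker's vocal tract.

```python
# Speech-to-speech translation as a composition of stubbed stages.

def recognize(audio):             # English speech -> English text
    return "machine translation is hard"

def translate(text, target="zh"): # English text -> Chinese text
    return "机器翻译很难"

def synthesize(text, voice):      # Chinese text -> audio in HIS voice
    return f"<audio: '{text}' rendered with {voice} vocal-tract model>"

def speech_to_speech(audio, voice="rick_rashid"):
    return synthesize(translate(recognize(audio)), voice)

print(speech_to_speech(b"...mic input..."))
```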
So we’ll play that for one minute.
(Rick Rashid Video Segment.)
CRAIG MUNDIE: So, what do you think? (Applause.)
So this turns out to be, again, important. If you go back to the walk-in video of the TED Choir, there it was music, and you didn’t have a language problem. And you had people from all over the world who could collaborate with each other.
But by and large, language is a problem and will obviously remain so for a long period of time if you want to have a sort of global communication and collaboration, particularly in real time.
And so the idea is — and I’m going to ask Ethan to come up, and we’ll move into this next demo a little bit — to think about what it’s like when you’re sitting at a desk like this, where he is or I am, and there’s just a tablet sitting here running Skype, on a desk with what looks like a desk lamp. These are very special desk lamps in that they both project and see at the same time. They don’t just light. We’ll show you in just a second.
But if you think about it, if I sit down here today, Ethan and I both speak English. And I could ask him for some help, or he could ask me for some help. But, again, we wanted the thing to be very natural.
So this is a research thing called IllumiShare. And what it does is it basically brings the idea of natural drawing, reading, writing, examination of objects at a distance to an environment where people don’t have to know anything to be able to do that.
So you’re looking at essentially what I’m seeing. So he’s writing in real time with a regular pen. It shows up on paper on his side, and it shows up projected on my side.
So we both appear to see exactly the same thing. I’ll just use green, and I’ll circle this. In real time, he would see that on the other side. (Laughter.)
We can give this to children at literally 3 years old, and they sit down, and they use it immediately. And the idea that the person they’re collaborating with is somewhere else, they see them on the screen, they draw, it works, what’s the problem? But you can take things and say, “Hey, how about helping with my math?” So he can take his triangle, and he can draw something there. He can put another line there.
And I say, look, all right, well, I can finish this drawing. And I could be getting some help. You could think of doing this: Rick and I talk a lot about MOOCs and their impact on education at every level around the world. But if you have this as a vehicle where you’re taking online courses, and you’ve got social interaction, you can do this kind of help-me-figure-this-out work.
I can say, “Hey, this looks like it’s 90 degrees, this one who knows? I don’t know what this thing is.” But he could measure it for me and say, “Oh, 39 degrees.” So I can say, “All right, well, what do you think this one is?” And he can explain to me that they have to add up to 180 and it all works. So there’s a lot of interesting stuff.
Now, these discussions, these interactions are all being recorded in the cloud. So to some extent (laughter) — close enough. Now, if I take this paper and throw it away, you could see I only had half of the conversation, and he had half of the conversation. And what we really want to do is be able to each, ultimately, walk away thinking we had the whole conversation.
So we’ll do one more that shows how this might work. So because we’re more and more focused on machine vision, another big use of machine learning, you know, here’s a set of objects. And as the system begins to recognize these objects, you can do more interesting things.
So here what this is, it’s a Gadgeteer board. It’s a thing that people who want to prototype small electronic gadgets buy.
So he may have said to me, “Hey, look, I’ve got this thing, I have no idea really how to hook it up.” And I’d say, “Yeah, well, I played with this, and I can tell you that these five things here are all where you plug in the USB connectors. And this set of things down here is where you plug in the LEDs. And this little button right here is the reset button.”
And you can say, okay, thanks, that’s really useful. But he says, “Now I’ve got to go away and use it.”
So if I walk away and crumple this thing up and he walks away, the next time he comes and sits down at a system, the thing basically can look at the physical object, hopefully will recognize that it’s seen it before and basically brings back the annotations. And it’ll register them no matter where he puts it and how he moves it around.
So, basically, it is as if he had the whole conversation even though part of it was virtual and part of it was physical. Thanks, Ethan. (Applause.)
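One plausible way to make saved annotations follow a recognized physical object, sketched with OpenCV: match features between the stored view and the live view, estimate a homography, and map the saved ink into the new frame. The image paths and annotation coordinates are placeholders; this is an illustration of the registration idea, not Microsoft's system.

```python
# Re-register saved annotations onto an object wherever it now sits.

import cv2
import numpy as np

stored = cv2.imread("board_when_annotated.png", cv2.IMREAD_GRAYSCALE)
live   = cv2.imread("board_on_desk_now.png",   cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(stored, None)
kp2, des2 = orb.detectAndCompute(live, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)[:50]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# saved ink strokes, in the coordinates of the original session
annotation = np.float32([[[100, 40]], [[180, 40]]])  # hypothetical points
moved = cv2.perspectiveTransform(annotation, H)
print(moved)  # where to redraw the ink, however the board was placed
```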
So this is one of the really popular things. It turns out, you can take this lamp and turn it up and shine it on the wall, same thing works. Okay? So whether you want to do whiteboards or desktops. We think that, ultimately, the cost of this shouldn’t be substantially more than the cost of a desk lamp. If you look at how small we can make cameras and how cheap LED lighting and these little projectors are, that’s all we assemble this out of. And then a bunch of clever software algorithms and interfacing technology. Because it turns out, what you have to do is interleave the frames. So one frame you project, then you blank, then you look. And by interleaving these things at a high rate, it appears that you’re doing both at the same time even though it’s in a high-ambient-light kind of environment.
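That interleaving loop might be sketched like this, with the hardware calls stubbed out and the frame rate an assumed parameter:

```python
# Project, blank, capture, repeat: fast enough that projecting and
# seeing appear simultaneous even in a bright room.

import time

FPS = 120          # per-phase rate; projector and camera each get 60 Hz

def project(frame): pass           # drive the lamp's projector
def blank(): pass                  # go dark so we don't film our own light
def capture(): return "frame"      # read the lamp's camera

def illumishare_loop(remote_frames):
    for remote in remote_frames:
        project(remote)            # show the other side's desk
        time.sleep(1 / FPS)
        blank()                    # don't photograph our own projection
        local = capture()          # see only what's physically here
        time.sleep(1 / FPS)
        yield local                # send our desk to the other side

for outgoing in illumishare_loop(["their pen strokes"] * 3):
    print("sent:", outgoing)
```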
So this is the way that we’re making computers more like us. And with it, I think we’re making these things much more powerful, much more available to people, and I think easier to use and get real benefit from.
So let me just close by showing you a short video that came from TechFest. Every year, Microsoft Research holds a thing we call TechFest, and it’s sort of our internal market for ideas that we generate in research. There are, I will say, nominally 1,000 people in the core research group and about 35,000 or 40,000 people in the engineering groups that make products.
The real question is always, when you do basic science, how do you get the tech transferred in an efficient way? And it’s a challenge. Even in one company, it’s a challenge.
So we created TechFest a few years ago, which is like a trade fair, and all the people from our labs in the 10 places around the world come, and we set this big thing up for three days, and they come, and they just walk around and see all these things.
So a lot of what I gave you today were things I selected. It turns out, each year we down-select to about 200 of the best things that come out of research. What I showed you today was about four or five. So I’m going to show you a video that’s just eye candy that flashes by, and you’ll see the incredible range of things that is ongoing.
I offer that to you to help you understand — so many people get jaded. They say, “Oh, computers, I understand it all, how is it going to change? I mean, we’ve seen everything.” And our view is: You haven’t seen anything yet. The rate at which this stuff is changing is very high and the amount of benefit that society will get from it is amazing. So let’s close with the last video.
(Video Segment.)
CRAIG MUNDIE: So just another dozen or two of the things that are going on. It’s an incredible privilege to be able to be affiliated with this group and company, and I hope you get a little appreciation for the range of things that we do.
END