Craig Mundie: UC Berkeley

CRAIG MUNDIE: Thank you very much. Thanks, Kevin. Good afternoon to everybody.

Let me just tell you what we’re going to do in the next hour or so, so you understand the logistics, and then I’ll get into the talk. I’m going to talk for probably 40 minutes or 45 minutes or so, and then we’re going to actually — we have some door prizes that are probably worth hanging around to get. You have to be present to win. And so we’re going to do that right after I talk, because I know the class technically is supposed to end around 6:00 or so.

And so after we give out the door prizes anybody who has to leave then or at any point after that is welcome to do so as required, but then I’m going to have a Q&A session with the whole audience.

And then when the whole thing is over, I’ll stick around for a while and we’ll have all the big toys here on the stage, and if you want to come down and talk or test drive any of this stuff, you’ll be welcome to do that, too. So, we’ll be here probably at least until 6:30 to do all those things.

So, let me start and talk about how things are evolving at Microsoft. You know, a couple of years ago, we introduced a new line of phones, and with those we started to create a new model of a user interface. So, this week was a big week for the company in that we yesterday in San Francisco announced Windows Phone 8.

The significance of these devices is that we’ve been on a path at Microsoft to try to unify from an architectural point of view, not only at the low levels of the machine and operating system but at a higher conceptual level how people will interact with them.

So, on these phones we created these smart tile-based interfaces, and have a lot of focus on customization. You can essentially organize the front of the phone any way you want.

And then last week, of course, was a super big week. Not only did we introduce Windows 8, but we introduced Surface, which is the first time the company’s actually designed and built a computer of its own; has these pretty cool snap keyboards, and as you’ll see in this environment it has the same big sort of touch-first kind of interface.

What’s different about our approach to this compared to Apple, for example, or even some of the Android things is that we’ve decided that it should be valuable to people to be able to have both a classical view of a PC environment for running things that aren’t really touch-oriented, and to allow the same interface and operating system to scale from the very small like the phone all the way to the very big. And so we’ve done that, and we feel pretty good about that.

So, what I’ve brought today was really the world’s largest tablet. This is the 82-inch tablet. (Laughter.) There it is. And what it is, is it literally is an 82-inch tablet. Everybody can talk about whether they’ve got the seven-inch or the nine or the 10; well, I’ve got the 82. (Laughter.)

What’s interesting about this is this interface scales in a very nice way. It’s all touch-enabled. You can basically shrink these things to any environment you want.

The other thing that’s interesting about this device that’s really kind of first in the world is this actually supports a pen, a high-resolution pen, as well as touch, and it does it with a common underlying technology.

People have had touch devices obviously for a while. We’ve been building pen-enabled devices since 2000. But the real trick is to try to be able to do high-resolution pen and touch at 82 inches. It turns out it’s a physics problem that’s quite challenging, but, in fact, we’ve been able to do that pretty effectively.

And so to give you an idea just briefly of what some of these things are like, there’s a lot of built-in capabilities now. So, for example, we have a news reading feature that’s all built in, and of course works the same way; everything just scrolls with your finger. And every one of these things in a sense is like an active tile; you touch it, and you get to drill into the story. And so more and more of the things that Microsoft has built all have this attribute.

The way these interfaces work is they all essentially come in from the edges. So, if I want to go back to the Start screen I can hit that button.

Another one, for example, let’s see if I see it. Well, just the weather, for example. One of the things that we wanted to do also is recognize that while the complexity, if you will, and value that was in the conventional multi-window interface where windows could be any size and moved around and occluded, you know, has been great for a desktop kind of environment, in a mobile environment or one where you want to have a touch-first interface the flexibility of that turns out to be a liability.

So, in this model we’ve actually come up with essentially two ways to think about applications and screens. You can just flick them out from the side and they’re all running underneath, and desktop applications would be exactly the same.

If, for example, I’m reading the news and I decide I wanted to track the weather — sorry, I did that wrong. This thing has essentially two windows. There’s one that’s a quarter of the screen and one that then becomes three-quarters and everything is designed to fit into these two modalities.

So, a minute ago you saw the weather app was essentially designed for the full screen experience. The way we’ve specified these, both the ones we’ve written and other people’s, you now have the ability to say each app has two modes, the little panel mode and the big panel mode, and they’re both interactive. You can essentially just shift from one to the other by sliding the bar back and forth, and they’ll both adjust. So, here the news would just now be converted into the list of little tiles that you can essentially scroll through and watch at the same time.
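The two-panel arrangement described here can be sketched in a few lines. This is only an illustration under stated assumptions: the quarter and three-quarter split is taken from the talk, while the function names `snap_layout` and `slide_bar` and the pixel widths are hypothetical and are not the actual Windows API.

```python
def snap_layout(screen_width, left_app, right_app, left_is_small=True):
    """Split the screen into a quarter-width panel and a three-quarter panel.

    Returns a list of (app, x_offset, width) tuples, ordered left to right.
    """
    small_w = screen_width // 4
    big_w = screen_width - small_w
    left_w = small_w if left_is_small else big_w
    return [(left_app, 0, left_w), (right_app, left_w, screen_width - left_w)]


def slide_bar(layout):
    """Slide the divider: the apps keep their sides but trade panel sizes."""
    (left_app, _, left_w), (right_app, _, right_w) = layout
    return [(left_app, 0, right_w), (right_app, right_w, left_w)]
```

For a 1920-pixel-wide screen, `snap_layout(1920, "News", "Weather")` puts News in the little panel, and `slide_bar` on that result gives News the big panel instead, which is the point at which each app would switch to its other mode.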

And so we think that this is going to be quite interesting, and one of the areas where this is particularly appropriate now is we’ve just released the Skype app, both for the phone, the tablet and the giant tablet, and you can put Skype, for example, in here as one of the little panel apps. So, you have continuous connectivity, the ability to actually maintain a conversation with people, and still operate all of the classical applications.

So, let me show you some other cool things that are being done now. As computers have gotten a lot more powerful, we have taken the interesting things that you wouldn’t have thought were particularly complicated and said, what does it mean to refresh them in this environment where you have touch and a lot of computational capability?

And this application — let me just get rid of this guy — is called Fresh Paint, and of course everybody probably played at some point in their life with one of the little painting programs, and all you were doing was essentially turning pixels on and off in some color. And what we wanted to do was say, you know, could we actually build a full physics model of paint and painting or art, and, in fact, we’ve done that. So, underpinning this is essentially a computationally intensive model.

So, if I look at a file, those are my paintings. At the launch the other day for Windows we actually had one of these giant tablets set up, and what you have is essentially an easel. And if I touch away from this, so this is a painting that was painted on this tablet, a giant-screen environment, by an artist in the Bellevue store in Washington the other day.

What’s interesting is that this thing today supports oil paint, and the paint dries in the same time that it would if it was real paint. And you can erase things.

I’ve got this little fan down here. If you’re an oil painter and you’re in a hurry, you can basically turn the fan on and it will dry faster. (Laughter.) Really convenient.

But the way the interface works is you have crayons, you have brushes. In fact, I brought with me a capacitive brush. And because the thing is capacitive and pressure sensitive, we actually can give artists real brushes to paint with, and the brushes can essentially be made different sizes, and you can do different colors.

So, if I wanted to create my own, for example, I’m not much of an artist, now I’ve got a completely new canvas. I could pick different paper colors. We actually modeled the texture of the surface of the paper so as the brush or the pencil goes across it, it actually behaves the same way as it would in — it’s a full physics simulation. So, if I wanted to pick a color, if I was going to I guess pick Cal colors I’d want something blue-like here, and I change to blue. If I wanted to paint, I can basically write, “Cal.” (Applause.)

What’s cooler is I’ll say, hey, how about this. I want to get the other color, which is sort of like that I guess, and I’ll put that on a really small brush here of this style, and I’ll keep it in the oil paint. And if I was really good now I could basically outline Cal. But if I happen to touch it, you’ll see that it actually blends the ink together.

And so this really is a physics simulation of oil paint on paper. I’m personally infatuated with this kind of thing, because I’ve given this to my two-year-old grandkids, and they can essentially start to learn. And it’s way cleaner than giving them real oil paints. (Laughter.)

Let me go on and show you other things that we’re doing.

For us we expect more and more this world to be a multi-device world where the experiences that people want to engineer operate in this way.

For us, of course, business productivity applications are also equally important. So, if I start the OneNote notebook, which many of you may be familiar with, you know, I’ve now got essentially a version of it here that zooms and scales and does all the appropriate things just as you would — you know, pinch and zoom all work with two hands or two fingers or whatever it might be, and I can essentially scroll around in this.

But if I wanted to pick part of this and zoom in on it, and I might want — this is actually the history of essentially the evolution of touch. This display was built by a company called Perceptive Pixel, which Microsoft acquired this year, and they’ve really focused on this question of very natural interaction where people — if you think about it, this is a test you should give yourself, try to take a piece of paper out, put it on a flat surface, and write with just one hand. You’ll find out it’s incredibly difficult, that, in fact, people are continuously adjusting the paper with their other hand as they write. But today you can’t do that on a computer because most computers either only do touch or they might do pen, but they don’t do touch and pen together. And, in fact, there’s a lot of very challenging problems like palm recognition and forearm recognition on these sized screens that you have to figure out. But these guys have really focused on it, as have people at Microsoft for a decade.

But now with the pen you can say, well, back here in 2011 I had this capability. If I wanted to draw in a different color ink, I could make it thicker. I could say I wanted to highlight something. So, I really want to talk to you about, you know, our acquisition of Perceptive Pixel, and now we’re, for example, I’ll say super excited about the launch of the Surface.

Now, what’s interesting is if I go over here, I’ve got one of the regular Surface devices. This is actually a Samsung tablet. And if I actually start OneNote on this thing, it starts the same application, and with a little luck when I get down here I’ll find out that all the things that I wrote over there have been annotated on here.

Now, one thing I didn’t show you but this also has a recording feature, and if I’d have basically started and hit the recording button, everything I said would have been recorded. It would have been time synced to every keystroke as I drew on the thing, and when I came back over here I could play it again.

So, we think that this kind of thing will turn out to be quite important in the future, for example in an educational environment, because you all can actually not even be in the same place, you can see a lecture, you can see what happens, and it essentially will be synchronized to your own devices in the course of time.

Another place where we’re building these multi-screen experiences is a technology that Microsoft built called SmartGlass. Increasingly we find that people, particularly people I’ll say your age, live in a world where they have access to game consoles and interactive televisions, plus their tablets and their cell phones, and increasingly they don’t just want to have one experience that they could use serially on different devices, for example the way we’ve all done email for a long time; they want many devices to have this sort of synchronized type of interaction, and increasingly they want to actually see the experience designed so that they can consume it across multiple devices at the same time. So, last May we introduced this SmartGlass concept, and have now built it into the tablets big and small, the phones, and the Xbox.

So, for example, if I go and start up the video player on this tablet, I can start to play a movie. Let’s see, I’ll play Snow White. And I watch part of it, I’ll just say resume this thing from where I was.

Now, it starts playing the movie, which you can see from the camera over my head here, but if I actually flip up the controls in this environment, I can say play it on the Xbox. So, it pauses the movie on this device, and that screen is now connected to an Xbox, and it basically moves it over there and resumes exactly at that point.

Now, what’s more interesting is that now the experience on the tablet has been changed such that it basically will get information related to the movie. So, now if you can see what’s popped up on the tablet is profiles of the stars that are in the movie, an abstract of the movie. I have all the controls for the movie. I can even ask it to show me the different scenes for the movie.

So, what it does is when you start playing it will actually scan the movie, find the individual scenes —

(Video segment.) (Laughter.)

CRAIG MUNDIE: I don’t like that scene; let me go to the next scene. (Laughter.)

But you can see by just touching I can essentially change where it plays, and I have all the controls. So, if I don’t want to compete with her I’ll just pause her.

Now, next to this you can actually see I also have one of the Windows Phones. The phone actually has the same SmartGlass application (that’s over here), and just as with the dual-pane version of the tablet, here on the smaller screen the experience is designed to be optimal there. You still get the actors and the profile information, and you can still call up the scenes and navigate among them. And, in fact, you can have all these things active at the same time.

So, more and more we believe that whether it’s for entertainment purposes or communication or collaboration in a business environment or a classroom environment, that the ability to design and operate these systems so that they span multiple devices will increasingly be important.

All of this moves us in a direction — while this has all been, if you will, still oriented around the graphical user interface, obviously one thing that people liked about the addition of touch or direct manipulation was there was something more natural about it, but you were still confined to the interface that was created for you by the person who wrote the application or created the basic graphical user interface model. And that’s been super powerful and valuable, and it isn’t going to go away, but five or six years ago, we became very focused at Microsoft on trying to think what would it mean to make the computers more like us, to introduce what we think of as natural user interaction where the same attributes that people have in terms of the ability to see and speak and listen and manipulate things would be used in order to create a new type of interaction with computers, one that would elevate the semantic level of interaction with the device and would really allow many, many more people over time to get a benefit from computing than currently do, and we’ve been very focused on that. And this ranges, and I’ll show you more as time goes on today, across a very broad array of ideas of natural interaction.

The first real product in this space for us was the Kinect on Xbox where the business unit basically said, hey, our objective is to have controller-less gaming. We want people to be able to map themselves and the things that they know how to do in the physical world directly into the game. And, of course, now we’ve applied that to the entertainment and control environment.

So, a lot about NUI is emulating human senses, if you will, but other aspects really are sort of the design sense of thinking about applications in a way where you can just essentially sit down and do things that you didn’t know how to do.

So, what I’m going to show you now is a demo called IllumiShare. This is not a product, it’s actually a research prototype, but we built this in order to be able to see what it’s like when you can give people an almost completely natural ability to collaborate at a distance.

So, Matt, you want to come up here? Matt works with me, and he’ll just sit on the other one of these tables.

What you have here is a thing that for all intents and purposes looks like a desk lamp and a tablet, and what’s on the tablet is essentially a Skype call. So, I can talk and see Matt sitting at the other side there.

But this is a very fancy desk lamp in that it’s designed in a way where sort of in interweaving frames it can project and it can see. What that gives us the ability to do is to create a composite image on each desk that is half what you physically have in front of you and half what you observe on the other side.

So, he just took his pen and set it down, and it shows up here. Now, I’ve just got a video of it, he’s got the real object, but I’ve got the red pen and I can put it here and do the same thing.

And so I can take this thing and say hey, you know, let’s say I was trying to teach somebody geometry or something, and I say, well, look, I’m going to draw this nice triangle here. And as I draw it, it will show up on his, too.

And I can say hey, you know, tell me about this, teach me about angles. And so he can take out his protractor, he can measure these things. He can say, okay, well, that one’s 30 degrees and that one, let me guess, looks like 90. (Laughter.) I know, it’s been a long time but I do get that.

And so he says, okay, well, what’s that one going to be? And I can say, well, no, I don’t think it’s 65 because there’s this rule that they have to add up to 180 — (laughter) — so I can say this is really 60.

So, we’ve basically put young kids down in front of this with literally no training. You can give them books, magazines, crayons, games, physical toys and that kind of stuff, and in a matter of seconds they just start playing with each other, because they can see them, they can talk to them, they can manipulate these things, and they basically have for all intents and purposes the ability to operate on what appears to be the physical world at the same time even though they’re not in the same place.

So, we think of this as essentially telepresent collaboration, and while this is essentially a version that will be I think employed in a work environment or maybe just for kids who want to communicate and play, this can be really, really cheap, the lamp, in that it has a very, very simple camera in it, it has a Pico projector in it, and a small amount of intelligent circuitry that knows how to interweave these images.

And so our view is that in the future people will buy furniture for their office or furniture for their kids to do their homework on, and they’ll think nothing of being able to collaborate, not just by saying, hey, I can make a phone call or I can have a Skype call but rather I will be able to really show people things, demonstrate things, measure things, move things. We think this is going to be important.

There’s a big effort in the company at all scales of thinking about this idea of telepresent collaboration, and I think it’s going to be one of the more important things that we ultimately will master how to do.

Beyond NUI another big trend nowadays, you can certainly even read airport ads about it or TV ads, is what people call big data. And you can say, hey, what is big data? My definition of big data is it’s datasets that are so large that in practice you can’t operate on them on a single machine. No matter how big that one machine is, the datasets are just too big to do anything interesting on that machine.

And so there’s been a big activity at Microsoft to think about how do you give people better tools to support analytics, and I really think of this as two different classes of ultimately tools.

One set is where much as we have for many years with powerful visualization tools and analytic tools you’re trying to couple the human intellect and their intuition or perhaps domain knowledge into trying to develop some new insight, and the human really has to master the tool and drive it around in the data.

This will remain very important, but ultimately I think will be superseded or augmented in a dramatic way by the next topic I’ll talk about, too, which is machine learning. But first let me give you a demonstration of some of the advances that are being made in the tools for this sort of big data and the analytics around it.

What you see up here is a screenshot. This is essentially down at the Monterey Bay Institute they are doing a lot of work on studying the oceans obviously, and in particular even Monterey Bay, and the left is a floating buoy that has all kinds of instrumentation built into it, and on the top right is an autonomous underwater vehicle, and that could be programmed to work in conjunction with this floating guy. On the bottom right is a picture of one of the instruments which actually does — using laser light it can in real time do measurement of chlorophyll in the water, for example, or even do RNA type analysis.

And so what I’ll show you now is some tools that we built lately. These are derived from a thing that we did out of research at Microsoft a few years ago that was called the WorldWide Telescope. That was one of the first examples of a lot of really big data, which was essentially all of the astronomy community’s collected images, observations and measurements of everything in the sky, and we set out to create a tool where people could ingest all of those things and essentially in real time manipulate them, examine them, zoom in on them, and importantly make stories to be able to share their observations and explorations with other people.

So, we went on with the product group to basically make this new thing, which is called Layerscape, and it takes the same idea of being able to build interactive stories and time-based sequences around big data. In this case it’s not astronomy images, it’s just other types of data.

So, I’m just going to jump into what is a prerecorded sequence of activities that were built using this tool now, and this has actually been released by Microsoft. So, if I just touch this, it will continue.

The first thing you see here is a trace over time of that thing floating in the Monterey Bay and sampling chlorophyll at a depth of about 10 meters, and the chlorophyll intensity is reflected in the colors on this.

And now if you want to basically try to understand, well, what’s going on here, why might this change, the next thing that they do is they basically rotate it around and they overlay satellite measurement of surface chlorophyll. So, now you see these composited and you can see where this thing floated around, and what the other observations show, and you can see that it turns out that new water kind of flows up at this part of Monterey Bay, the currents basically flow this way — that’s why the buoy actually follows in this path — and then there was sort of older, colder water that was over here.

And if I go on and want to understand how to correlate chlorophyll at different layers, then you want to bring in a new layer of data. In this case this thing called a Tethys was the name of the torpedo-like thing up there, and what it does is it’s programmed to swim around the buoy. They call it porpoising. It basically goes up to the surface and then swims down to 100 meters and up and back again and it goes around and around. And what it’s trying to do is essentially to produce a correlatable set of data at every point with what the buoy is able to measure. Here, of course, I could do the same kind of zooming and panning if I wanted to.

If you look at this graph, this is what you would historically have been presented with as a scientist for the output of the Tethys system. So, you get a time sweep of the data, you’d have color dots for depth, salinity and chlorophyll, and you’d spend your evening staring at the dots trying to figure out what did they mean.

It turns out it’s a lot easier to figure out what they mean if you can look at them in a time sequence, so you can sort of wind this thing back to the beginning, and then to essentially plot it in a completely different way where you’ve now grouped things into colored bands which represent each of the circles that the thing had as it swam around in this environment.

And so in this environment I could zoom in on it, I could kind of conclude that there are some anomalies in this data. Normally you’d expect, for example, that if green was the lightest and blue was the middle and red was the bottom, you’d expect that the layers would always be that way. It turns out when you look at it graphically this way you can find that, no, in fact, there are anomalies in this environment, that occasionally you get layering and you might want to go figure out why that was the case.

So, this is just an example of very powerful new tools that are being developed so people who have domain knowledge have a much richer way to try to gain insight from the data.

But I think at Microsoft the thing that we’ve been doing research on for almost 20 years, and I think is now really coming of age is this concept we call machine learning. We’ve been using it for some years at Microsoft already inside the company to solve problems, and increasingly we’ll make these tools available to people outside, much as we have the kind of analytic tools that I just showed you.

But to give you an idea of the dynamic range of the applications of these machine learning technologies I’ve picked just six things that we do with it today. One thing we do is speech recognition, and, in fact, almost everything that we do now about speech and language learning, the ability for the machine to see and read, all is done by essentially learning, not by programmers being able to describe this.

This is a very fundamental trend, and you’ll see as this goes on that more and more people whose job is to write programs are no longer by and large writing programs whose output is completely predictable. In the simple sense if you go back, people said, oh, I know, programs are algorithms and data, and you’d take some fixed set of data, you apply some algorithm, you get some answer, and the job of the programmer was to figure out what that algorithm should be.

But increasingly the things that we do are not so predictable, that what the programmer is doing is harnessing these superscale engines together that are really operating on a more statistical level, not on a completely predictable level.

Now, this poses all kinds of new challenges to the programmer in terms of deciding what’s correct and what isn’t correct, you know, how do you debug these kinds of things? This is a problem both at the platform level and ultimately at the application level.

But it’s our belief that the things that really are going to make computing both easier for people to use and ultimately to have it produce more interesting and useful results are now going to be predominantly determined by your ability to master these superscale machine learning techniques.

The second one we did, this is quite a few years ago, when we introduced Xbox we introduced Xbox Live, which was a gaming service. And the idea was we wanted to allow people to play games together who weren’t physically at the same console.

One of the things that turns out to be a challenge in any gaming environment is how do you match people. You know, you’ve got 60 or 100 million people who are playing games, they play different kinds of games at different skill levels, and when they show up in the game room and say, hey, I’m ready to play a game, who do you match them up with? And it’s very hard because the games span such a wide genre, set of genres, and it’s very hard to describe what constitutes excellence or high skill in any particular game.

So, what we did is we applied machine learning and created a thing we called TrueSkill which for each game by observation would figure out what skills could be observed in people who had the highest scores, and by observing those things and measuring them you are able to essentially not just give the sort of points for winning the game but you could basically come up with a profile of different aspects of the skill of an individual player on an individual game, and then use that to match people up.

And this produces a very satisfying gaming experience because, in fact, you can make it such that people are so evenly matched that you can’t actually reliably predict who will win, and, of course, that is the ultimate thing that you want in a gaming environment. You don’t want to always get creamed by the person you’re playing; that’s just no fun at all. And so this is an example of very subtle ways you start to see this applied.
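The idea of matching people so evenly that you can’t reliably predict the winner can be illustrated with a small sketch. It is not Microsoft’s TrueSkill code; it assumes each player’s skill is summarized as a Gaussian (a mean `mu` and an uncertainty `sigma`) and uses the two-player match-quality formula from the published TrueSkill description. The value of `BETA` and the names `match_quality` and `best_opponent` are assumptions made for illustration.

```python
import math

# Assumed spread of a player's performance around their skill in any one game.
BETA = 25.0 / 6.0


def match_quality(p1, p2, beta=BETA):
    """Score in (0, 1]; higher means the outcome is harder to predict.

    Each player is a (mu, sigma) pair: estimated skill and its uncertainty.
    """
    mu1, sigma1 = p1
    mu2, sigma2 = p2
    c2 = 2 * beta ** 2 + sigma1 ** 2 + sigma2 ** 2
    return math.sqrt(2 * beta ** 2 / c2) * math.exp(-((mu1 - mu2) ** 2) / (2 * c2))


def best_opponent(player, candidates):
    """Pick the waiting player (by name) who gives the most even match."""
    return max(candidates, key=lambda name: match_quality(player, candidates[name]))
```

A call such as `best_opponent((25.0, 2.0), {"ann": (40.0, 2.0), "bob": (26.0, 2.0)})` picks the candidate whose estimated skill is closest, which is the evenly matched pairing the talk describes.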

The whole concept of building superscale search engines like Bing couldn’t be done without machine learning, and even the monetization system like what ad should be presented to somebody in 30 milliseconds based on their current activity and history, that’s also a machine learning driven system.

Years ago, we started with this issue of having to filter mail and instant messaging around spam, and in a minute I’ll show you a video that shows how this whole concept of filtering has become important in other areas.

When we introduced Kinect where the goal was to have the ability to have controller-less gaming, what we had to do was figure out how the machine could recognize people very quickly, map them into a skeletal position, and then return that description to the game designer.

So, we developed the algorithms that would do the skeletal mapping. We can do four people, 42 major joints at 30 hertz simultaneously using just a tiny fraction of the power of the game console.

It turns out the challenge then becomes how in the world do you actually describe the gesture that is meaningful to you in the game you’re writing? So, let’s say you wanted to make a handball game. So, what you have to really do is figure out how to describe how you swat the ball back. You might swat with your right hand, you might swat with your left hand. But it turned out in the first generation it took really skilled people about 200 hours per gesture to figure out how to describe the translation of all the skeletal joints to reflect what was considered a swat and what wasn’t, because, in fact, it could be here, it could be here, it could be there. A human, you look at them, you say, hey, is that a swat, and they say, yeah, but how do you tell the machine, how do you describe that?

So, because we wanted more and more people to be able to build more and more things, not just games but now many applications of Kinect, we built a machine learning system that makes it brain-dead simple for anybody to teach a gesture, and literally what you do is you stand in front of the thing, you start the record thing, you make a few swats that you consider representative of a thing you want, and you make a few gestures like a kick or you turn yourself around, anything that is not a swat. And then like a home movie editor you just put little brackets around the things that you believe are representative of a swat and you feed it into the thing and in about 60 seconds it analyzes it all and comes out with a canonical description of a swat in terms of the simultaneous deflection of all the joints.
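The teach-by-example workflow described here (record a few swats, record a few things that are not swats, then let the system work out the description) can be sketched with a deliberately simple learner. This is a nearest-centroid classifier, swapped in purely for illustration; the talk does not say which algorithm the Kinect tooling uses. It assumes each bracketed clip has already been reduced to a fixed-length list of numbers (for example, joint angles), and the names `train` and `classify` are hypothetical.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors component by component."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def train(swat_clips, other_clips):
    """Learn one representative vector per label from the bracketed examples."""
    return {"swat": centroid(swat_clips), "not_swat": centroid(other_clips)}


def classify(model, clip):
    """Label a new clip by whichever learned centroid it lies closest to."""
    return min(model, key=lambda label: math.dist(model[label], clip))
```

The designer never writes down what a swat is; the description comes entirely from the positive and negative examples, which is the shift the talk is pointing at.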

So, these things look easy, they’re very difficult, and only using these techniques can we make it possible for most people to do.

Another one is if you look at all the mapping stuff we do in Bing and for the phones and tablets and other devices, we wanted to be able to predict traffic, because when you ask for a route it turns out unless it’s a very short route, based on the time of day and other events, the traffic may be very different when you get to a particular point than it is when you leave.

So, years ago, we started and have now built a machine learning system that ingests all these different factors in every locale, along with the current measured properties of traffic, and it will actually predict over the duration of your trip what the predicted traffic is going to be at each point, and then builds a route based on that prediction.

So, these are just six examples of that.

Let me just run a video for you that highlights the work of David Heckerman. He’s a longtime Microsoft researcher. In fact, as he states in the video, his original work was in spam filtering, but he’s also trained as a physician. A few years ago, we decided — he decided he really wanted to see if these techniques could be applied in the hunt for an AIDS vaccine. So, let’s watch this video.

(Video segment.)

CRAIG MUNDIE: You know, I think of this as a precursor to the kinds of techniques that are going to have to be employed not just to find a vaccine for AIDS but to ultimately move the world completely to sort of the world of digital biology and digital medicine.

Today, we’ve basically taken this kind of stuff in Microsoft Research and put it into the Azure cloud platform to get the ability to scale it up and have other people access it. This space requires a lot of focus on parallel computing but not in the conventional computational modeling but in essentially this big data handling capability. It requires that you be able to take and annotate and understand things like genomic data and the genome of the virus. And then there’s many different machine learning algorithms that have to be developed and employed in order to be able to get these insights.

But ultimately I think this is just one tiny example, and I think all of the world of medicine and pretty much every field of science and engineering and even sociology will no longer be able to be progressed without the use of these big data and machine learning capabilities. So, I think that’s an essential feature of the use of computers in the future.

So, let me close with just a few thoughts about how we’re moving to use these combinations of big data, everything we can observe about people and their interactions, the data sets that we have to process, and this goal for natural interaction to really make computers that are more like us.

In this environment we’ve made some progress. I mean, clearly Kinect was a major step forward. We got a very strong global reception simply because it really was the first system where people could do otherwise complex tasks like put yourself into a 3D game with no prior training and able to operate in that environment. Why? We already knew how to operate in the real world; you just didn’t have a way to translate what your brain and body knew into actions that were meaningful in the game, and Kinect showed us a way to begin to do that, and now we’re applying it in many other areas.

So, if you look at the next slide, you know, I’ve annotated what looks like just the average, you know, I want to sit in my living room and talk to my Xbox with sort of the next class of problems that we’re trying to solve now that we’ve been able to endow the television through its Xbox interfaces and computational capability and the cloud behind it with all these things.

We do the skeletal tracking, but of course we’d like to be able to not just do the 42 major joints, we’d like to be able to do finer grained tracking including all your fingers and things, but that turns out to also be a physics and an optics problem, as well as a machine learning problem.

We want to recognize people. We do a limited amount of that today, so that when you walk into the room you just login because the device recognizes you. But ultimately there’s lots of other benefit to doing it. But to be able to do it reliably and quickly, particularly in low light situations, all these represent interesting challenges. When you’re not playing games or watching movies, there’s very low ambient light and yet you want to reliably be able to identify people.

All different kinds of gesture recognition, I talked about skill matching. We want to be able to do content recommendations. Today, we can do a good job serving up ads for people against search queries, but the world doesn’t do a very good job of trying to really recommend content.

In a world where video entertainment is essentially everything that you’ll be able to stream off the web, any classical concept of navigating through a guide is just completely implausible.

So, a lot of our focus now is to be able to apply the search engine capability to the content environment through a verbal or gesture-based interface and be able to get that on your Xbox, and in fact we do that in the latest version of the product today.

I’ll close by just showing you one more short video. Let me show you this slide first. So, this is Rick Rashid. Rick works for me and is the founder of Microsoft Research 21 years ago. We’ve collaborated together for 20 years at Microsoft, and across a wide range of things. In recent years a lot of our focus has been to try to create an optimal impedance match between the basic science that we do in MSR and the needs of the product groups.

But we also at times seek to challenge the researchers particularly to collaborate more. This question of collaboration and the ability for multidisciplinary cooperation to solve really hard problems is going to be an essential requirement for people in the future. It’s a bit like medicine today. You know, general practitioners are largely going away because medicine has just become too complicated, and so you become more and more specialized. The same is going to happen in all of our disciplines, and therefore the premium on being able to function as a team in order to solve problems will be higher and higher.

So, every so often we give a set of challenges, if you will, our own little internal grand challenges, to the Microsoft Research labs, and one that Rick gave them and emphasized two years ago — we call them Impossible Thing Initiatives because at the time they sort of seem impossible — was to complete the ability for realtime speech translation, which is not just can I say something and have it done, the challenge was a little higher. We wanted it to translate your spoken word into your native language in text, which allows a level of verification, to basically convert the text into the corresponding output language, to synthesize also the spoken word for that target language, and to synthesize the speech in a physical vocoder model of your voice so that it is exactly as if you spoke the other language, not heard another person speaking the other language.

So, this has been a goal we’ve had for a while. So, last week was really not only important for us at the company because of Windows 8 and all these other things, but last week we actually achieved this goal, and at the 21st Century Computing Conference in China Rick for the first time gave a public demonstration of this. So, this is from the audience, just a little video clip, but I’ll show you how this has progressed. Of course, everything I’ve talked about, big data, machine learning, natural interaction, all of that is essentially underpinning this ultimate goal of sort of Star Trek of you speak in one language and you’re heard in a different language.

Go ahead and run the video.

(Video segment.)

CRAIG MUNDIE: So, as far as I know, this has never been done before. It’s quite an achievement, a technical tour de force I think in that there are very few things that I consider as seminal demonstrations in computing as this is, and yet I think of it as just the beginning of a completely new era of computing and how it will evolve.

So, there’s obviously lots of other things we could talk about, but I want to stop there for now, and before we have the Q&A we’ll have the giveaways. So, I’ve got some good toys today for those of you who stick around. (Applause.) Thank you.

So, here we have all the ticket numbers. So, get your little stub out that we gave you when you walked in the door. The first thing we’re giving away, we’ll give three of these away, is an Xbox with Kinect, 250 gigabytes of storage, and a one-year Xbox Live gold membership and 4,000 Xbox points.

The first one of those will be — ends in 361, 361, the lady in the red sweater, 361. (Applause.)

So, when you’re done, just find Erica. She’ll be standing back there in the back, and she’ll arrange to get you these lovely little things.

Number two Xbox winner ends in 434. Ah, there you go, another lady. These ladies are going to be popular here. (Applause.)

Okay, the next one ends in 476. Okay, in the back against the wall. (Applause.)

So, those are the three Xboxes.

So, the next three things I’m giving away are three of the brand new Microsoft Surface RTs. (Cheers.)

So, we’ll stir the box up here. The first Surface goes to 496, 496. Anybody with 496? Ah, too bad, you have to be here to win. Okay, throw that one away.

Okay, now the first one goes to 505. Right here, 505, okay. (Cheers, applause.)

Okay, number two, number two goes to 324. The man in the green shirt. (Applause.)

And the last Surface RT goes to 486, 486, back in the back. (Applause.)

Your last two chances to win, we’re going to go small here, we’re going to give away two Windows Phone 8s. They were announced yesterday. Everybody else will actually get the things before you leave. The phones, because you have to pick your carrier, we’re just going to give you a little coupon and you tell us which one you want and which carrier, and we’ll get you the phone.

So, two phones, the first one to 508. Okay, there’s the man in the white shirt right there. (Applause.)

Okay, the last giveaway, the last phone, 337, 337. Right here. Okay. (Applause.)

All right, if you didn’t like the talk, at least the gifts were good.

END
