Remarks by Microsoft General Manager, Microsoft Research, Kevin Schofield during Convergence 2008, Microsoft’s annual Convergence customer conference, March 14.
Friday, March 14, 2008
ANNOUNCER: Ladies and gentlemen, please welcome General Manager, Microsoft Research, Kevin Schofield. (Applause.)
KEVIN SCHOFIELD: Well, good morning. Thanks to all of you for coming so early in the morning. It’s a privilege to be here. I’m Kevin Schofield, General Manager, Microsoft Research. I’ve got some fun stuff to show you this morning. Actually, as you’ll find out about me over the next hour or so, I’m sort of a one-man show. I write all my own talks. You’re going to see me do all my own demos. I even volunteered to do my own acrobatics out here. The folks backstage put the nix on that pretty quickly. They pointed out that this session is right after breakfast, and it’s probably best for both me and for you that none of us see me in a white leotard right after eating.
So I decided I’d go a little more cerebral this morning. I want to talk about data, because, as the title of the talk says, we all live in a data-intensive world. We’re surrounded by it; it pervades our work lives, and it increasingly pervades our personal lives. It’s sort of like the Force, if you don’t mind me riffing on the Star Wars thing again for a second: it surrounds and binds everything, and holds the universe together. And there are definitely some people who are more adept at harnessing it than others. But I think we’d all really like to be data Jedi Knights, if you will. So I’m going to talk a little bit about technology and how it can help with that. I don’t often make a point of quoting literary figures in my talks, but this time I actually thought Eliot kind of nailed it, because we don’t really want data and information; what we want is knowledge and wisdom. We want to be able to extract out of that ocean of data and information the useful part, the insight, the trends that help us make smarter decisions. So in Microsoft Research, as well as in Microsoft as a whole, as you’ve seen going through the sessions about our Dynamics line of products, we feel a very strong commitment to making sure that we’re giving people great tools to help extract that knowledge and that wisdom.
A little bit of an introduction to Microsoft Research. I actually want to share with you what at first looks like a really god-awful complicated chart. It was actually created by an organization whose mission is to extract knowledge and wisdom: the National Academy of Sciences, which is chartered by Congress to go off and think about hard problems that Congress asks them to think about, and then come back with their thoughts and insights on them. Back in 2003, Congress asked the National Academy of Sciences to go think about the value of basic research to our country, to industry, to our economy, because since Sputnik and the space race, for about 50 years, this country has been investing very heavily in basic research. It’s certainly worth asking the question: how much value, how much ROI are we really getting out of it?
So this is the chart they came back with, and it tracks 19 different information technologies that we either came to depend on, or continue to depend on: everything from relational databases and the Internet, to the World Wide Web and speech recognition. There are several commonalities across them. One is that all of them eventually became billion-plus dollar businesses. But really, when I look at this, there are three big takeaways for me. The first takeaway is that research does, in fact, pay off. Not every piece of research individually pays off, but research in the aggregate pays off. You do enough of it, you invest broadly enough, and there are going to be some big winners, right? Billion-dollar businesses do come out of doing basic research.
The second one is that it’s a long-term play. If you look at these, the red lines are academic research, the blue lines are industry R&D; and, for the people in the back of the room who have a hard time seeing this, the black dotted lines are when it actually hits products, real products out in the marketplace, and green is when it hits billion-dollar businesses. You can see that from the inception of the research to the time they became billion-dollar businesses was at least 10 years, and in some cases as much as 25 years. It’s really a long-term play. The mythology that you do some great piece of research, and you stick it out there, and suddenly you’re making a billion dollars, that never really happens. But over the long term it really does pay off big.
And the third big takeaway from this is that research is messy. You’ll see lots of arrows going back and forth between academia and industry, and that’s very, very common for the way that research works. Technologies tend to bounce back and forth. There’s a lot of interplay, and a lot of collaboration, between basic research and industry. They learn a lot from each other. It goes back and forth. But in the end you get a great result out of it.
I walked through that because those three points really serve as the heart of why, in 1991, Bill Gates decided that he wanted to create a basic research lab for Microsoft. To put that in perspective, in 1991 Microsoft’s biggest-selling product was DOS. It’s a bit of a stretch from there to say, you know, we should really have a big research lab. But they did it, because the thinking at the time, from Bill, and Steve Ballmer, and the other heads of the company, was: we’re really in this for the long term. We want to make sure that we’re delivering great value for our customers for a very, very long time. So we’re going to make lots of short-term bets on our product line, but we’re also going to make long-term bets as well.
So from that start in 1991, 16 years later, we’re now 800 people in five labs on three different continents. And we actually announced a few weeks back that we’re going to be starting our sixth lab this summer in Cambridge, Mass. We’ve got about 400 or so people in Redmond, Washington, that’s our biggest lab. We’ve got about 200 based in China, about 100 in Cambridge, England, about 50 each in Silicon Valley and in Bangalore, India. So it really is sort of a worldwide operation.
Our mission is three-fold. Number one, across the 55 or so different areas where we do work – everything from the lowest levels of operating systems, networking and databases, to the highest levels of defining new kinds of user experiences, and lots of stuff in-between – advance the state of the art, and keep a steady drumbeat of moving technology forward.
The second part is getting these advances into Microsoft products as fast as we possibly can, and really making sure that those products stay fresh, and stay state-of-the-art.
And in a sense that leads into the third part, which is making sure that Microsoft knows the future, and that’s two parts. One is making sure that we can actually keep our products state-of-the-art, that we continue to evolve forward to meet your needs as our customers and our partners; because you’re making such big bets on our product lines, we have to make sure they never become technological dead ends. The other part is to place a hedge for the future. Because if you asked Bill Gates or Ray Ozzie today what are the important technologies, the crucial technologies that are really going to make or break our business three years from now, they couldn’t tell you. They have some ideas. I have some ideas. I’m sure each and every one of you has some ideas. And we’re going to be right about some of them, and we’re going to be wrong about some of them. So by having an investment in basic research that tries to cover, in aggregate, the breadth of the field, two-and-a-half years from now, when we turn a corner and say, oh, my God, suddenly this technology is super, super important, at a minimum we’ll have expertise in house, and most likely we’ll have technology that we can bring to market very, very quickly. And time after time after time, it’s proven to be a really good bet over the last 16 years. So that’s the thinking and reasoning behind our mission at Microsoft Research.
A quick snapshot of some of the areas that we work in; this is not a complete list. One of the downsides of talking about Microsoft Research is that, because we cover so much area, there’s no way in an hour I can tell you about everything we’re doing, and show you demos of everything we’re doing. But I do have some very cool things to show you this morning; I just want to give you a little highlight here. We’ve actually got a ton of technologies in Windows Vista and Office 2007. Most of what we do is working on individual technologies; we don’t really incubate a lot of products. There have been a few that have come out of Microsoft Research over the years. For example, Microsoft Surface came out of Microsoft Research; ResponsePoint, a great new product that came out about six months ago, which is a voice-operated, voice-over-IP-aware phone system for small businesses; the Tablet PC came out of Microsoft Research; and you might have seen the RoundTable on the expo floor in the next room over, which also came out of Microsoft Research. But, once again, most of what we do is really focused on the underlying technologies, so things like the spam filters that we develop can end up in a lot of different Microsoft products, from Outlook to Exchange to Hotmail. In fact, we use them on our own Microsoft.com mail gateway as well. So we really cover the breadth.
And the other thing we have a really huge investment in is development tools themselves. And we use our own internal product groups as guinea pigs as well. So the Dynamics lines of products, as well as all of our products; in fact, at this point every single product that Microsoft builds and ships today was built using technologies that came out of Microsoft Research. In fact, as long as I’m mentioning Dynamics, I seem to remember last year at Convergence we showed some Microsoft Research visualization technology, and I’m going to show you some more today; the ones we showed last year are still making their way into the products. We have a bunch more things I can’t quite tell you about that are actually on their way into Dynamics, but lots of fun stuff is coming down the line. It’s a group that we’ve partnered pretty closely with, and there’s going to be a lot more great stuff coming out in the future.
Let’s get back to data. You may recognize this man. This is Oscar Pistorius, a South African Paralympic runner. He’s a double amputee. Sometimes they call him the Blade Runner, or the fastest man on no legs. Back in January, the IAAF, the International Association of Athletics Federations, ruled that Oscar is ineligible to compete in the Beijing Olympics this summer. Why? Because his prosthetic legs are better than human legs. They actually tested him and found that he has a distinct performance advantage in running over those of us with dumb old human legs. They’re actually better than human legs. Wouldn’t it be great if we had data and cognitive prosthetics, if we had ways to extend our own cognitive power beyond normal human abilities? Our world keeps getting more and more complex; going all the way back to cavemen drawing on their walls, the amount of information, the complexity of the world, has continued to increase. But after that initial 21-year boot-up period, we don’t really get any smarter, do we? I keep hoping, but it’s not happening. In fact, I can feel it start to drop off a little bit now that I’m into my 40s. But there’s a great promise in technology, because we can actually build tools that take that complexity and map it back down, like a prosthetic, into something that we’re actually capable of dealing with. This is one of the reasons that we invest so heavily in technology and research projects, as well as our product lines: to help give people great new abilities to manage, work with, and visualize data, so that we can bring this ever more complex world back down to within our own abilities.
Let’s break this down into some of the underlying problems. The first problem, of course, is: have you got the right data? Have you got the data that you can actually use to extract knowledge and wisdom, right? I know we all share the experience of seeing a presentation, somebody pitching something to you, where you go through slide after god-awful slide, and there’s just no useful information whatsoever. It’s charts, and it’s graphs, and it’s diagrams, and there’s just nothing there that’s helpful at all. For all intents and purposes they could be showing you this.
On the flip side, when you do actually get the right information, and you can see it and really see how it relates, it can, in fact, rise to an art form. This is one of my favorite examples. This is a drawing by Charles Joseph Minard, a 19th century French engineer. Many of you may have seen this before; it’s a pretty famous drawing. This is a depiction of Napoleon’s march on Moscow from Poland in the winter of 1812 and 1813. It’s an amazing diagram, because it actually has six different dimensions of information on it.
It’s got the latitude and longitude of the troops, so the physical location of the troops, and the troop strength, which is the thickness of the band as it goes along. So it starts out really thick, and it gets smaller as it goes. It’s got the direction they were marching: the brown is towards Moscow, and the black is going back towards Poland. And it’s actually got the temperature at some of the places along the way. You can see at the bottom he’s got a temperature chart, and he’s drawn lines at particular points on the march back, so you can see what happened to the temperature.
Together this tells an amazing story of what happened with Napoleon’s march on Moscow. You can see he left with 422,000 troops, and by the time he got to Moscow he had 100,000, so 322,000 were lost along the way; actually a little fewer than that, because you can see he split off a couple of reserve detachments to watch his flank. He got to Moscow, and the city had already been sacked and mostly deserted. There was nobody there, nothing to capture, so he turned around to go home. And the winter just kept getting worse, and worse, and worse. And by the time they got all the way back to Poland, he had 10,000 troops left. From 422,000 down to 10,000.
They didn’t fire a single shot; over an entire winter along this long march, 400,000 people died of the elements. And that’s, in essence, why Minard drew this picture: because he wanted to talk about the horrors of war that nobody ever talks about. It’s not necessarily about battles and bullets, right? There are awful things that happen in war that come simply from dealing with the elements, the exposure, the marching, and the exertion that goes into it. And he had the right data to tell that story.
One of the areas where we in Microsoft Research are looking at how to get the right data to tell the story of what’s going on is data centers. We’ve made a huge investment at Microsoft, across our online services and our other businesses, in building out data centers. And we have to manage them; and for those of you who look into these things, a huge amount of the power, the energy that goes into data centers, isn’t actually powering the CPUs and the disks; it’s going to the HVAC. In fact, by some estimates 50 percent of the power in a data center goes towards HVAC.
Most people don’t really understand the interplay between all these different things. In fact, we don’t even fully understand it yet, but we’re running research experiments on it, and one of the things we’re using to get better information, the right information, is sensor networks. We have some experimental sensor network nodes that have little radios in them, and sensors, and they can collect information. These can measure temperature and humidity, and a couple of other interesting factors, and then they can all talk to each other, so they can bounce information back to a collection station, or they can hop through each other if they’re spread out far enough that they can’t all reach the collection station directly.
One of these will run for about four years on two AA batteries, so it’s easy to spread them out in different situations. So we’re taking a bunch of these and instrumenting one of our data centers; we started with a couple of rows of racked machines. I want to show you a demo here, which is actually a live display of what’s going on in one of our data centers. This is a data center just outside of Seattle.
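As a quick sanity check on that battery-life figure, the arithmetic works out to an average draw in the tens of microamps; the AA capacity below is an assumed typical value for an alkaline cell, not a number from the talk:

```python
# Rough average-current budget for a sensor node running ~4 years
# on two AA cells. Two cells in series share one capacity figure
# (they double the voltage, not the milliamp-hours).
capacity_mah = 2500.0          # assumed typical alkaline AA capacity
hours = 4 * 365 * 24           # four years in hours (35,040 h)
avg_current_ma = capacity_mah / hours
print(f"average current budget: {avg_current_ma * 1000:.0f} microamps")
```

A budget that small is why nodes like these spend nearly all their time asleep, waking only briefly to sample their sensors and transmit.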
So this is the physical layout of the data center, where we have rows of individual racks of machines, and we have the HVAC systems in the middle here, and you can see what the utilization is on them. We’ve taken these two rows of machines and, as a pilot project, instrumented them with those sensor networks. So we collect temperature information at various points along the racks and rows, and we’re overlaying here some of the information we’ve collected so far.
The darker the bar here, the fuller the rack is of machines. So really dark means it’s pretty much full; the white ones are almost empty. And we can see the aggregate temperature of what’s happening. These numbers right here represent the power utilization; we know how much power is available in each of these racks, and we can see how much of it is being used. One of the things we’re going to work on is pulling in more information about CPU utilization and disk utilization, to see the actual resources we’re consuming versus what’s available.
I can click into one of these, and it will take a second to come up here, and it will show me a visualization of that whole row, a full row of machines. And you can see rack by rack what’s going on with them. You can also see where the individual sensor network nodes are; the blue marks are places where we actually have nodes. You can see there’s an almost empty rack and a very full one, and what I can do here is take the information from these individual nodes and draw a temperature contour across this entire rack that lets me see what’s going on with the temperature in different areas. There’s a very cool area down here, mostly cool in the middle.
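A contour like the one in that demo can be built by interpolating between the sparse node readings. Here is a minimal sketch using inverse-distance weighting; the node positions and readings are hypothetical, and the real system may well use a different interpolation scheme:

```python
# Sketch of estimating a temperature field from a few sparse
# sensor readings, the way a contour overlay might be built.

def idw_temperature(x, y, nodes, power=2):
    """Inverse-distance-weighted estimate at (x, y) from
    nodes given as (nx, ny, temp_c) tuples."""
    num, den = 0.0, 0.0
    for nx, ny, temp in nodes:
        d2 = (x - nx) ** 2 + (y - ny) ** 2
        if d2 == 0:
            return temp              # exactly on a sensor
        w = 1.0 / d2 ** (power / 2)  # nearer nodes weigh more
        num += w * temp
        den += w
    return num / den

# Hypothetical readings: cooler near the floor, warmer up top.
nodes = [(0, 0, 18.0), (0, 2, 24.0), (4, 0, 19.0), (4, 2, 26.0)]
print(idw_temperature(2, 1, nodes))   # mid-rack estimate: 21.75
```

Evaluating this on a fine grid of (x, y) points gives the smooth contour picture; the node readings anchor the field, and everything in between is estimated.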
Of course, as we’d expect, because heat rises, it’s warmer up near the top. This is the front; if I switch to the back of these machines, I’ll see the back, and it will definitely be warmer. It’s still very early in the morning in Seattle, so the load on these machines is not terribly high. If you looked at them later in the day, you’d see that it’s much higher, which is what you would expect. It’s much cooler in the areas where there aren’t as many machines, and warmer higher up and in the more dense areas.
In fact, I can even run an animation of this, and show at 5-minute intervals how it’s changed. We’re not going to see a lot of change right now, but it’s interesting to think about the kind of information we can get from this. As we look at this information in the aggregate, and as we continue to instrument the entire data center, we can start to learn how to better manage the HVAC systems, and what the interplay is between where we provision different services within our data center and the temperature, the humidity, and the load on the various systems.
So if I go back, I can see I’ve got an HVAC system over here that’s not getting very much load at all, and I’ve got other ones running full out. I could potentially move some of the load (not the physical machines, but the applications and services that are running on some of these machines) over here; I dynamically move them to help balance out the load on the HVAC systems. Or think about it from a planning point of view: if I need to take some of these machines down for servicing, I can look at the overall monitoring, the HVAC, and the environmental controls of the room as a whole, and make sure I understand what capacity I have, so I don’t cause an overload somewhere else in the system. So that’s one fun example of a project we’re working on related to really getting the right information.
Now, of course, the next problem is: once you’ve got the right information, are you looking at it in the right way? And here again there are lots of interesting cases of people doing it well, and people doing it badly. I want to talk about one example of people doing it badly that happened very close to here, down the road at Cape Canaveral, and that of course is the Space Shuttle Challenger.
On January 28th, 1986, 73 seconds after takeoff, the Space Shuttle Challenger exploded. There were congressional investigations, and the President created a blue-ribbon commission to look into it, called the Rogers Commission. They interviewed lots of folks. All the testimony, charts, graphs, and evidence is online, and it’s actually fascinating to look through. And when you go back through it, and read the conclusions of the Rogers Commission, you find out a couple of really interesting things.
You find out that the engineers at Morton Thiokol, the company that provided the solid rocket boosters for the Space Shuttle, knew that they had a problem. They strongly suspected they had a problem with the O-rings, the rubber seals at the joints between the sections of the solid rocket booster, which is what failed. They knew they had a problem, but they were having a hard time presenting it in a way that convinced themselves, their managers, and NASA managers that this was really a serious issue.
Here is one of the charts that they used; this is literally one of the Morton Thiokol charts. And I think it’s kind of telling that down at the bottom it says, basically, don’t use this chart unless somebody gives you an oral explanation. That’s a tip-off that, hmm, maybe there’s a better way to look at this. What they literally did was take every flight of the Space Shuttle up to Challenger and represent its two solid rocket boosters. And they marked on there every incidence of O-ring problems on the solid rocket boosters, and you can see, up at the top of each solid rocket booster, they actually wrote in the ambient temperature that day when it took off.
I have a hard time getting any actual knowledge and wisdom out of this. And of course, this is with perfect 20/20 hindsight, and I really don’t want to pick on the Morton Thiokol engineers, because they had a super, super hard job. But with 20/20 hindsight, it turns out they actually had all the right data in this chart to get the insights they needed; they were just looking at it in a way that wasn’t optimal. Experts have gone back since then, and this is one of the ways they suggested would have been a much more insightful way to look at it. This is a chart with a dot on it for every single shuttle flight, plotting the temperature that day versus the number of O-ring problems on that particular flight.
So you see in the bottom right corner there’s a whole bunch of flights that had no incidents. You actually learn two things from this. First, you learn that there was probably a design flaw in the O-rings in general, which is something a lot of mechanical engineers believed up to a year before the Challenger accident, because no matter what the temperature was, there were some O-ring problems. And second, they never had a flight that took off at less than 65 degrees that didn’t have O-ring problems.
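The redrawn chart’s insight amounts to a simple grouping by launch temperature. The records below are illustrative stand-ins, not the actual flight history; the point is only that the right grouping makes the pattern jump out:

```python
# Group O-ring incidents by launch temperature, the way the
# redrawn Challenger chart does. These (temp_f, incidents)
# records are made-up illustration data, NOT the real flights.
flights = [(57, 1), (63, 1), (66, 0), (67, 0), (68, 0),
           (70, 1), (70, 0), (72, 0), (75, 2), (76, 0), (81, 0)]

cold = [n for t, n in flights if t < 65]
warm = [n for t, n in flights if t >= 65]

print("flights below 65F with incidents:",
      sum(1 for n in cold if n > 0), "of", len(cold))
print("flights at 65F+ with incidents:",
      sum(1 for n in warm if n > 0), "of", len(warm))
```

With the data split this way, every cold launch in the illustrative set has an incident while most warm launches have none, which is exactly the visual argument the scatter plot makes.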
Does anyone remember what the temperature was on the morning of the Challenger accident? It’s kind of hard to believe anybody in their right mind, looking at the data this way, would have decided to launch that morning. Once again, I don’t want to pick on the Morton Thiokol engineers, because the basic problem is there’s no manual for how to do this kind of visualization. There’s no book you can pick up and look through that says, if you’ve got this kind of data, here’s exactly how you should display it. There’s just no manual like that.
So what we do in Microsoft Research to work on data visualization is build interdisciplinary communities. It’s really a hybrid problem: it’s partly about design, and it’s partly about studying the user and their cognitive capabilities, how people perceive things, how people understand data and different kinds of visualizations. So we’ve built an interdisciplinary team of graphic designers, experimental psychologists, cognitive psychologists, and in some cases sociologists and ethnographers, so we can understand the cultural aspects of looking at data and visualizations as well, to try to put together a larger picture. And we build lots of test beds of different kinds of visualizations, and then we try them out with real people, and collect data about whether people get them or not.
I want to show you three different test beds that we use, with different kinds of data, just to give you a flavor of the kinds of work we’re doing in this area. The first one is something called Photo Mountain. Obviously, now, in the world of digital photography, we’ve all got lots of photographs, and organizing and sorting through them all is a big issue. I’ve got literally thousands and thousands of digital photographs, and I know many of you have lots and lots of photos as well.
And this is an example of trying to use people’s spatial abilities to help them remember things, because, particularly in 3D, we have amazing spatial abilities and spatial memory. I can put things around my house (other than car keys), and without thinking about them for months and months, I can remember exactly where they are. I know where my coat is. I know where the rooms are in my house; I remember how to get to that door. We just have really amazing 3D spatial abilities.
The problem with trying to map that onto a computer is that we don’t have any good 3D input tools. So this is a test of how much of those 3D spatial abilities we can tap into with just a 2D input device. We create an inclined plane, where I can take these different clusters and move them to the back, or bring them up to the front. They get smaller as I push them to the back, and larger as they come up to the front. We did some fun things with that: things push each other out of the way, so they don’t actually occlude each other. In each of these clusters I can click on a picture and it comes up, and I can see it in full; I can rearrange the pictures, I can move pictures between different clusters and see them immediately change color. There are lots of interesting things I can do with this.
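The shrinking of clusters as they are pushed back is a standard perspective divide. A tiny sketch of that depth-to-scale mapping, with arbitrary illustration values for the focal length and depths:

```python
# Sketch of the perspective scaling behind an inclined-plane
# layout: thumbnails pushed toward the back of the plane get
# smaller by a standard perspective divide. The focal length
# and depth values here are arbitrary illustration numbers.

def thumbnail_scale(depth, focal=2.0):
    """Scale factor for a thumbnail at the given depth into the
    scene (depth 0 = front edge of the plane, scale 1.0)."""
    return focal / (focal + depth)

for depth in (0.0, 1.0, 3.0):
    print(f"depth {depth}: scale {thumbnail_scale(depth):.2f}")
```

The same function, run in reverse, is what lets a 2D mouse drag along the screen correspond to motion in depth on the plane.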
Now, we’ve done tests with this. We’ve asked people to organize a set of photos; in fact, we asked them to bring in their own photos and organize them on this inclined plane, and then asked them to go retrieve a particular set of pictures. And they could do it almost instantly; they distinctly remembered. Then we sent them away for three months and brought them back, after they hadn’t even looked at the thing for three months, and they still remembered where everything was. It’s just a truly fascinating, amazing spatial ability that people have for working with data in this way.
We’ve also tried this with Internet Explorer favorites, and found the same effect: if you let people organize things on inclined planes, in clusters or without clusters, just organizing them individually, people have this amazing ability to recall where things are. So that’s one example.
The second example I want to show you is an animated graph technology that was popularized a couple of years back by an NGO called Gapminder. They’re an organization that’s trying to raise awareness of global population trends: things like life expectancy, infant mortality, and overall population trends around the world. And one of the things they really have a lot of fun with is putting graphs in motion. Instead of just having a still snapshot of what happened in 1975, you put the population trends in motion, in this case from 1975 to 2004. So this is a test bed interface we built to get a baseline of how that works, and then to try some different flavors of it.
So, for example, here is a mapping of infant mortality versus life expectancy for a bunch of different countries. And I can flip over these and see Liberia, Rwanda, India; the bigger bubbles are countries with bigger populations. There’s the United States down here. And if I put it in motion, we see movement down and to the right, which is good, because life expectancy over time is going up and infant mortality is going down. But then, right about 1994, two countries just go crazy, and that’s the nice and interesting thing about this kind of visualization: when something goes off in a different direction, it immediately catches your attention. That’s Rwanda and Liberia.
So it’s interesting for us to look at that. But as researchers we have to ask: did you really need the animation to see that? Are there other ways to do it? Because if you printed it out on paper you wouldn’t have that animation; it’s just less portable in that form. So we decided to try some of the other standard methods for showing data like this. One is what’s called traces, which shows, faded out behind each point, the whole history of where it has gone over time. You can see Rwanda turning around and heading off to the left, Liberia coming back up, and overall the rest of these tending to trend down to the right.
It’s kind of messy when you do that; it’s really hard to pick out the individual parts. So we wanted to look at one more option, which is small multiples. You can see each of these individually as a smaller version, and you can still see that one is weird, and that one is definitely weird, while most of these quickly move down toward the bottom right corner. With quick snapshots in these thumbnails you get an idea of what’s going on with each individual part of the data set.
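Laying out small multiples comes down to packing one thumbnail panel per country into a near-square grid. A minimal sketch of that layout calculation (the panel count is just an example number):

```python
# Sketch of laying out N small-multiple panels in a near-square
# grid, one panel per country, as in the thumbnail view above.
import math

def grid_dims(n_panels):
    """Rows and columns for a near-square grid of panels."""
    cols = math.ceil(math.sqrt(n_panels))
    rows = math.ceil(n_panels / cols)
    return rows, cols

print(grid_dims(11))   # e.g. 11 countries -> (3, 4)
```

Each panel then gets the same axes and the same time range, which is what lets the eye compare the thumbnails at a glance.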
So we take these, and we run them with users from different demographic groups (men, women, different ages, people from different parts of the world, because there are cultural and regional influences) to try to understand how these different factors affect comprehension. This study isn’t done yet; it’s one of the tests we’re actually running right now. But this is part of our decision-making process to figure out which of these technologies we should push into our products sooner, because they’ll be more helpful than others.
The third one I want to show you is for a different kind of data: data that’s got a ton of metadata associated with it. The example the user interface researchers used when they created this was a database of all the research papers from the most popular user interface research conference, which started in 1982 and is still running; they have data up to 2004 here, so about 23 years of data. They took these papers, with all the metadata associated with them, and tried to build a user interface where you could flip through and slice and dice them along different dimensions, without any preconceived notion of where you might want to go next.
So at a high level, with all the papers in this database, I can really start anywhere. I can drill down by topic, or by author affiliations. I can drill in and say, for example, show me the papers on usability. And you’ll see we spend a lot of time on transitions here. We’ve done some research showing that transitions between different views are super, super important, because people can very easily get lost; they’ll lose track of where they’re situated in the data. Providing transitions significantly reduces the amount of time it takes people to get re-acclimated to where they are as they’re browsing through data.
So now we’ve filtered down to papers on the topic of usability, and I can see the different authors here; there are four papers in this database. As I flip over these different papers you see occasionally some of these other ones highlighted in different places. That’s because it’s either the same paper appearing in one of those other categories, or it’s a paper that’s related by a link, a paper citation, something like that; there’s a link between the two papers, and they get highlighted as well. I can right-click on any of these and find out about the paper, which authors are cited, who the authors were, more metadata on it. So it’s very easy for you to kind of click through.
In fact, I can continue to browse further in, and you’ll see it will transition and move another filter up to the top. Zooming down further I can see, okay, here are the papers on usability, really only four over the years, and the years they appeared in. Once again, I can flip through them and see all the other places where those papers show up, or I can choose one of them on the side here and see which different categories it appears in.
One of the other nice things I like about this is that I can easily delete any of these filters and pop back out, and I don’t have to delete them in the order I added them; I don’t have to pop out the way that I came down. So, for example, at this point I can delete the usability filter, and it will rearrange the whole thing to show me all of Dennis Wixon’s papers across all the different topics. It’s a very flexible interface that lets us move in all sorts of different directions through data that has just a ton of metadata.
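That kind of order-independent filtering can be sketched very simply: keep the full collection, keep a set of active filters, and recompute the result set on every change, so removing any filter "pops back out" no matter when it was added. This is just an illustrative sketch; the paper records and field names below are made up, not the real conference database.

```python
# Minimal faceted-browsing sketch: filters are a dict, and results are
# always recomputed from the full collection, so filters can be added or
# removed in any order. All records here are invented for illustration.

papers = [
    {"title": "P1", "topic": "usability", "author": "Wixon", "year": 1990},
    {"title": "P2", "topic": "usability", "author": "Smith", "year": 1994},
    {"title": "P3", "topic": "input",     "author": "Wixon", "year": 1998},
    {"title": "P4", "topic": "vision",    "author": "Jones", "year": 2001},
]

class FacetBrowser:
    def __init__(self, records):
        self.records = records
        self.filters = {}          # field -> required value

    def add(self, field, value):
        self.filters[field] = value
        return self

    def remove(self, field):
        # Deleting any filter "pops back out" without retracing your steps.
        self.filters.pop(field, None)
        return self

    def results(self):
        return [r for r in self.records
                if all(r.get(f) == v for f, v in self.filters.items())]

b = FacetBrowser(papers)
b.add("topic", "usability").add("author", "Wixon")
print([p["title"] for p in b.results()])   # -> ['P1']
b.remove("topic")                          # drop a middle filter
print([p["title"] for p in b.results()])   # -> ['P1', 'P3']
```

The design choice is the same one the demo makes: there is no navigation stack, just a current filter set, which is why deleting the usability filter can rearrange the whole view in one step.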
Once again, it’s a test bed. We’re running this through another very active experiment right now with users, to try to understand what parts of this work, what we still need to iterate on before we try to put it in customers’ and partners’ hands, and what kind of training we’d have to do with it. We don’t want anybody to have a bad experience with this kind of interface.
So that’s how we learn about presenting information the right way. The next problem is really: how do we deal with a lot of data, and how do we make sure that we have the right context and expertise? Just to give one more historical perspective on this, this is Dr. John Snow. He was a surgeon in London in the middle of the 19th century, and he was there at a very opportune moment, because, as tended to happen every 5 or 10 years in London and in many big cities in Europe, in 1854 there was a cholera epidemic in London, particularly in the Soho district. And John Snow happened to work right on the edge of the Soho district. He didn’t live there, but he worked right next to it, so he sort of had a front-row seat to this as it happened.
Now, just to put this in perspective: this is before epidemiology, this is before public health initiatives, and most importantly, this is before germ theory. In 1854 they didn’t know anything about bacteria or viruses; they didn’t know that cholera was a waterborne disease. In fact, the prevailing theory about illness in Europe, and in London at the time, was what was called miasma, literally "bad air." They thought it was bad air that made people sick. And given that in London people would dump their waste in the basement, on the street, in the Thames River, London kind of stunk. So it’s pretty easy to believe that bad air was really the cause of a lot of people getting sick. And, yes, it might have been the cause of some airborne illnesses as well.
But it wasn’t actually the cause of the cholera epidemic. And John Snow didn’t believe that it was the air. He thought that there was something in the water, so he set out to prove it. But the problem was, there was a cholera epidemic going on, and people were scared out of their wits, because lots of people were getting sick, and lots of people were dying, and they died very quickly, usually within about 72 hours of getting sick. So lots of people were dying quickly, nobody really knew what was happening, and nobody really knew what was causing it, other than this crazy theory about bad air.
And John Snow, because he didn’t know these people and he didn’t live there, really didn’t have a good way to get information about what was happening on the front lines. Fortunately, he met this gentleman, the Reverend Henry Whitehead. Whitehead lived in the Soho district and worked there; this was his parish. And he thought that John Snow was onto something. So with his connections, because he knew everybody in the Soho district, he went around and got good information about where people were falling ill with cholera, and they drew this map that you see here.
You see these little dark tick marks; every little tick mark is a place where somebody fell ill with cholera. You can see there are places with a long thick bar, a particular place where lots of people fell ill, and you can see places where there are just a few; those are the smaller ones. They laid all this out on this map and marked the places where the water pumps are in town. And it became very clear that most of the cholera cases kind of circled around this one water pump, the one I’ve marked in red in the middle.
So John Snow took this and went running to the town leaders, and they said, well, there are a lot of people who live around that pump who haven’t gotten sick, and there are a lot of cases of cholera in the outlying areas that have nothing to do with it, so we don’t believe you. So the two of them, Snow and Whitehead, went back out and talked to more people, and got the rest of the story. And they found out two things. Number one, right near this mark on the map there’s a workhouse and there’s a brewery, and the brewery had its own well and its own water supply. By the way, every one of those pumps draws from a separate well, so they’re all completely disconnected water sources. And there were a bunch of people who lived right near it who worked in the brewery, drank the beer, drank the brewery’s water all day, and never drank from that one red pump.
The other thing they found out was that one red pump was known throughout the Soho district for having the best tasting water. So people would go way out of their way to go collect water from that pump. And when they went back out and talked to all of the outlying cases they found out that those were all people who went out of their way to the red pump to get their water. So they took all that data, and went back to the town leaders, got the one red marked pump shut down, and within a week the cholera epidemic was over.
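For what it’s worth, the core of that analysis can be sketched in a few lines of modern code: attribute each case to its nearest pump and count. The coordinates below are invented for illustration, with "Broad St" standing in for the red-marked pump.

```python
# Hypothetical sketch of the reasoning behind Snow's map: assign each
# cholera case to the nearest pump and count cases per pump.
# All coordinates are made up for illustration.
import math
from collections import Counter

pumps = {"Broad St": (0.0, 0.0), "North": (5.0, 5.0), "West": (-5.0, 4.0)}

cases = [(0.5, 0.2), (-0.3, 0.4), (0.1, -0.6), (1.0, 0.8),  # clustered
         (4.0, 4.5),                                         # an outlier
         (0.2, 0.3), (-0.8, -0.2)]

def nearest_pump(case):
    """The pump closest to this case, by straight-line distance."""
    return min(pumps, key=lambda p: math.dist(case, pumps[p]))

counts = Counter(nearest_pump(c) for c in cases)
print(counts.most_common(1)[0][0])   # -> Broad St
```

Of course, the whole point of the story is that the raw counts weren’t enough; the brewery well and the people who traveled for the good-tasting water are exactly the context a nearest-pump tally misses.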
Thus was born the public health initiative in London, and thus was born the field of epidemiology, which made John Snow’s career. It’s really a testament not only to getting a lot of data, and getting the right data, but also to the need for context and expertise in order to really understand what’s going on with it.
So at Microsoft Research we believe in this very strongly. We’re doing a lot of research around how to deal with very large data sets, and when we do this we try to partner with the experts in these areas. A lot of the work that we’ve been doing is branching out beyond computer science to look at other sciences as well, partnering with life scientists, with physicists, with chemists, with astronomers. And it’s interesting: today all these different sciences are data-intensive, computational sciences. You can’t do science in any of these fields anymore without computers, and without collecting and analyzing an enormous amount of data.
So I just want to highlight a couple of projects. One is with Jeff Lichtman at Harvard, where we’re actually building a 3D model of all the neural circuitry of the brain. You can fly through it, slice it up, and look at particular pieces of it. You can analyze how particular areas of the brain are connected. You can programmatically walk through this entire thing, too, and use that to build your own models and test theories about the cognitive uses of parts of the brain. It’s a super, super interesting project.
The second one is actually two projects we’re doing with the Berkeley Water Center. One of them is in cooperation with the largest environmental data repository for North America and the largest environmental data repository for Europe, which have been running separately for a long time. We’re working with the two of them and the Berkeley Water Center to create this thing called FluxData.org, which is one global repository for climatology, hydrology, earth sciences, every kind of environmental data you can think of, all in one place. With it they can do global modeling, and better modeling across all these different kinds of data, and get a better sense for how the environment as a whole is working, as opposed to just individual parts of it.
The other project we’re working on with the Berkeley Water Center is looking at the Russian River in Northern California, which is a threatened river. The water sources are drying up, and the salmon are having a harder time swimming upstream to spawn. So we’re working with them to build what we call the digital watershed of the Russian River.
Once again, we’re taking some of our sensor network technology and spreading it out up and down the river to collect information about what’s actually happening in real time, and to build a model over time of what’s happening with water flow and with various forms of life, flora as well as fauna, trying to build as complete a picture of that whole watershed as we possibly can.
The third project I want to talk about relates to astronomy, and this is one place where I particularly want to highlight the sea change that’s happened over the last 20 years. For hundreds and hundreds of years astronomy was about being at the telescope in the middle of the night to take an observation, but it isn’t anymore. Now we have hundreds of instruments on the ground, we’ve got the Hubble up in space, we’ve got visible-light, X-ray, microwave, and radio telescopes, all digital, all collecting data 24/7/365, and they literally have petabytes and petabytes of data now that they want to share. One of the cool things about this, and one of the reasons why we’ve gotten involved with astronomers over the last seven or eight years, is that, as they say, their data has zero commercial value to anyone, so they can completely share it with each other and with us. It’s a great, huge data set that’s really easy to share and get access to. Actually, the biggest problem we have is transporting it from one place to another.
One of the things we’ve been doing with the astronomy community is to try to get all that data out of flat files and into databases up on the Internet, so that they can programmatically run queries over it, and so that we can take more and more of these disparate kinds of data and start to show them together. Some of you may have heard of this; it’s gotten a little bit of press over the last couple of weeks. It’s called the Worldwide Telescope, and I actually got permission to show it to you this morning. So let me go ahead and fire that up. One of the very cool things about the Worldwide Telescope is that this really is a case where we’ve gotten data from all these different sources, and a lot of the research we’ve put into it, and there’s real research behind this, is building a coordinate system and an overlay system so that we can actually see all this data, all these different kinds of data, within the same frame.
So welcome to the night sky. This is the Worldwide Telescope. You’ll see the different constellations of stars; it highlights them in yellow, and the red shows the connecting lines of the constellations. That’s actually the sun. I can zoom around and see these different aspects, and I can zoom in on different parts. We’re actually pulling down data; it’s cached locally, pulled down on demand from a server farm out on the Internet. I can move up here, for example. As I look at any of these constellations, any particular elements that have been highlighted in the database show up at the bottom. And as I flip through, it will show them to me; I can click on any one of them, and it will transition me in and zoom over to it, and you’ll see it bring that data down, arriving in higher and higher resolution as we pull it down.
I’m going to zoom back out here. We have a ton of different sources of data here: from Hubble, from Chandra, from Spitzer, some of the larger sky surveys. For example, I can look at the background microwave radiation of the universe here. I can look at infrared dust. One of my favorites is actually background X-ray radiation, because you catch things like this. That is the remains of a supernova. I can zoom in on it; in fact, it’s a burned-out supernova that’s still putting out this massive amount of X-ray radiation. And I can fade back over from there to the visual, and you can see that’s the visual remains of the supernova. Isn’t that cool? At least a couple people thought it was cool. (Applause.)
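The overlay idea underneath all of this can be sketched simply: once every source is expressed in the same equatorial coordinates (right ascension and declination), overlaying an X-ray source on the visible sky comes down to matching positions by angular separation. The catalog entries below are invented; only the rough Cassiopeia A position is real.

```python
# Sketch of cross-matching sources from two instruments once they share a
# coordinate system: find the visible source closest on the sky to an
# X-ray source. The "catalogs" here are tiny invented stand-ins.
import math

def ang_sep(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, via the spherical law of cosines.
    All inputs are in degrees (RA/Dec)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_d = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp for floating-point safety before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

# Hypothetical X-ray detection near Cassiopeia A's rough position,
# and a two-entry visible-light catalog.
xray = (350.85, 58.815)
visible = {"src_a": (350.85, 58.82), "src_b": (10.68, 41.27)}

# The visible counterpart is whichever catalog source lies closest on the sky.
match = min(visible, key=lambda s: ang_sep(*xray, *visible[s]))
print(match)   # -> src_a
```

Real survey cross-matching also has to handle projection, proper motion, and positional error, but the common coordinate frame is what makes the overlay possible at all.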
Once again, this is a case where we have all these disparate sources of information, and we can literally overlay them all. When we talked to astronomers they told us, you know, I work with my small piece of data, I work with X-ray data, and I’ve never really managed to take that other data and overlay it. We worked with astronomers to build this, because they really have the expertise in how to do this kind of thing. They told us it’s new for them, and it’s super exciting for them, because they’ve never really had the chance to pull all this data together in one place before.
So one of the things that we think is important about this is, you know, it’s easy to get lost up in space. So we’ve created a way for people to build guided tours, whether they’re an expert or whether it’s you or me. Alyssa Goodman, for example, is an astrophysicist at Harvard, and she built a guided tour of one of her favorite things, which is dust, not dust in the Golden Compass sense, but literally cosmic dust. And she could author this whole thing very simply and very easily with the visual-experience authoring system that we built into this. She can say, we’re going to go in here; you can see right now she’s switching between different data sources, and you can see the visualizations, and she can pop up images on top of it. She wants to make the point right now that the CCD cameras in telescopes are the same as the CCD camera you might use to take a picture of your cat.
One of the cool things about this is that we’re really actually navigating the Worldwide Telescope live. At any point I can stop the guided tour, and I’m live: I can go zooming around and pick out something nearby, and whenever I want I can just start it up again, because it’s all authored within this environment. So we’ve been having a lot of fun with this.
I want to show you one more thing here. We’ve actually been working with Virtual Earth to start to incorporate Virtual Earth data into this as well. So this is the Earth, and I can fly around it and zoom in on a particular area, and you’ll see it pulls down more imagery and gets more detail as I come in. I wanted to show you that real quick, because there’s actually one more guided tour I wanted to show you, which is this one right here. This is Benjamin. He’s six years old. He wrote his own guided tour.
Okay, I’m just going to stop it there for a second. We didn’t actually help him with that. He did it all himself. His folks didn’t help him with it. That’s all Benjamin. (Applause.) But it makes us super happy, because we really wanted this to be an environment where amateurs and experts together could build a community, anybody could build one of these guided tours, and really share their thoughts and their explorations of space with each other. So we’re trying to build a way to bring expertise into this. We’re going to build the community portions as well. This isn’t released yet. We’re going to put it out later on this spring. We’re still madly trying to build out the server farm, and fix a few more bugs in the interface. But we’re super, super excited about this, and it’s a lot of fun for us. We’re going to release it free of charge to everybody, for research, for education. (Applause.)
I want to talk about a couple of interesting opportunities. One of the opportunities with data, riffing off Surface a little bit, is to ask: are there ways we can make our data more tangible, so we can literally interact with the data itself? Taking some of the notions of Surface and moving them forward. Surface, as I mentioned before, came out of Microsoft Research, but we’re not done with it. We continue to look at different ways we can build Surface-like computing devices. If you can start the video here: Andy Wilson is the key researcher who actually created Surface in the first place. He has been looking at whether we can turn any tabletop into a Surface computing device. This is a rig he built that’s got two cameras, an infrared one and a webcam, and a data projector that bounces off a mirror and displays an image down on a surface. So instead of a Surface computing device that has all the equipment underneath, you just pop this down on a table, and your table is a Surface computing device. And you can have this seamless interaction of physical and virtual objects together. You see he’s got these little disks in the bottom left corner; each has a little 2D barcode on it. When you drop them down, the cameras see them, read the barcode, and uniquely identify them and their position and orientation as you move them around. You can get a lot of different objects in there that can all be uniquely identified by the system.
You can take a piece of paper and drop it on there, and suddenly it’s a video playback surface. All these things can seamlessly work together in this environment. That’s a start, but that’s just where we start with this, right? (Applause.) We’re thinking about this with mobile phones as well. Bill Gates has shown over the years a number of examples with Surface computing where you take your phone and drop it into the environment, and it has a Bluetooth chat back and forth with the system. In this case it recognizes, oh, I see a phone, talks with it, can potentially log you in, can pull your photos off of it. We’re showing these new virtual objects you can work with on the Surface; it may be that simply dropping your phone on the Surface is how you log in to the environment as a whole.
We’re actually even thinking beyond that. What happens if we take two of these Surface computing devices and connect them together, maybe for two people separated by a long distance? Does this become a fun environment, or potentially a collaboration environment in a workplace? Here’s one example on the fun side: two people playing checkers, where each person has a checkerboard and half the set of pieces, and they can see the other person’s checkers and play the game together. Now, I grant you, the ghost hands are kind of spooky. That bugs me, too. But it’s a lot of fun and it really works. And we get all sorts of other research things out of this, too. A ton of vision technology goes into this, even things like: can you take the checkerboard and turn it, and can we keep the two boards aligned as different things move within the environment?
But we really want to think beyond just games here, to work and collaboration as well. Could we take, for example, a document and have two people mark it up together, literally by putting it on the Surface? And this is just a simplistic example of that: can we have two people drawing a picture together? In fact, you can do it. And the goofy thing about this, when you step back and think about it, is that neither one of them actually has the full picture on a piece of paper. It’s a shared surface, and the computer has the entire picture; it’s capturing everything. But either one of them, if they pick up their piece of paper, is only going to have half the picture on it. But it really does work. And it’s a fun experimental environment that we’re using to try to see where collaboration, and a very tangible way of working with your data, could be going in the future.
Okay. So the last point I want to talk about (applause), fun stuff, the last point I want to talk about is really how these different kinds of technologies actually change our culture, because every time you put a new technology into an environment, whether it’s a home environment or a work environment, you actually change the culture. Think back to when copy machines first showed up in the workplace. It really did change the way we work, but it also changed the way we interact with each other; people hung out at the copier. In some cases it became a great source of frustration for people, but it really changed the environment. Think what cell phones have done, just the ubiquity of everybody having a cell phone, so that it’s almost become okay now to walk around with a Bluetooth dongle hanging out of your ear. There are cultural changes that come out of it.
A guy named Melvin Kranzberg wrote six laws of technology and society, and one of them is "invention is the mother of necessity," which is an interesting twist on the classic way we say it. What he meant by this is that every time we put a technology into an environment, it uncovers a new set of needs. Sometimes the needs are in the technology itself, and sometimes the needs are simply uncovered by the technology. For example, think of the printing press: when the printing press was invented, suddenly there was a much greater demand for paper. And as people started ramping up their manufacturing of paper, suddenly there was a need for more logging, because we needed more trees. Right? There’s a downstream effect. We create a technology, it creates a new set of needs, we invent new technologies to satisfy those needs, which then uncover a whole new set of needs, and this just keeps going on and on forever.
Within this data-intensive worldview, and the cultural changes it’s been causing, within the computer science community we think about this in terms of: what concepts from information processing are people going to need just to think about how we work with data, how we manipulate it, the different ways you want to slice it? What are the underlying conceptual models of something like Excel, or a general ledger, or all the different ways that people are working with data now? How much of just the notion of a database has crept into our address book once it’s gotten really large, or into all the different sources of data that we keep and carry in our personal and our work lives?
There’s actually a woman named Jeannette Wing, now at the National Science Foundation, who, back when she was at Carnegie Mellon University, coined this notion of computational thinking: literally, the concepts from computer science that everyone is just going to have to know to survive in this data-intensive world where computing and data permeate our entire lives. Microsoft Research has actually funded a Center for Computational Thinking at Carnegie Mellon to explore how we teach these underlying computational concepts in our educational system, to help people thrive in this kind of world. And on our own, beyond this collaborative work we’re doing with academia, we’re doing some projects as well, and one of them is a project we call Boku.
And Boku is, at its simplest level, an Xbox 360 game. But it’s an Xbox 360 game that’s about teaching the concepts of programming without forcing kids to actually write code. So, if you can start the video, I’ll give you a quick snapshot of Boku here. Boku is the little 3D character in a 3D world, and you program him. You program him by telling him, through a set of menus, the kinds of things you want him to do. So you can say: if you see a fruit, then move forward toward it; and when you touch a fruit, go ahead and eat it. It’s just a very simple, game-controller-driven notion; you can do that. Then you tell Boku to go for it, and he moves toward the fruit and eats it; a very, very simple kind of control like that. And it can get much more complicated: we can do iteration and recursion, we can do some concurrency and parallel programming. You can have lots of different kinds of robots at different points, and create multiple instances.
And you get the whole notion of multiprocessing: what happens when you have lots of different things all working independently. You can teach a submarine to navigate above the water, or in the water, and what happens when it hits obstacles. So we’ve actually taken this and we’ve been running experiments and workshops with kids. We couldn’t get them to stop; we had a three-hour session with no breaks at all, and the kids just totally got into it. And we learned a lot about which concepts are easy for them to pick up and which ones are not. So we’re iterating there, trying to learn more and more about how we can really teach these programming and computational concepts to kids, so that they’ll be available to them for the rest of their lives, without them really having to write any code. (Applause.)
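The programming model being taught, "when this condition holds, do this action," can be sketched outside the game as a tiny rule interpreter. Everything here (the one-dimensional world, the rule and state names) is invented to illustrate the idea, not how the game itself is implemented.

```python
# A rough sketch of menu-driven "when <condition> do <action>" rules:
# each tick, the first rule whose condition matches fires. The world
# model here is an invented one-dimensional stand-in for the 3D game.

def see_fruit(state):   return state["fruit"] is not None
def touch_fruit(state): return state["fruit"] == state["pos"]

def move_toward(state):
    """Step one unit toward the fruit's position."""
    state["pos"] += 1 if state["fruit"] > state["pos"] else -1

def eat(state):
    state["eaten"] += 1
    state["fruit"] = None

# "If you see a fruit, move toward it; when you touch it, eat it."
# The touch rule is checked first each tick so eating takes priority.
rules = [(touch_fruit, eat), (see_fruit, move_toward)]

def run(state, ticks):
    for _ in range(ticks):
        for cond, action in rules:
            if cond(state):
                action(state)
                break   # only the first matching rule fires per tick

state = {"pos": 0, "fruit": 3, "eaten": 0}
run(state, 5)
print(state["eaten"])   # -> 1
```

The kid-facing insight is the same one the rules make explicit: a program is just conditions and actions, and concurrency is what you get when several characters run their own rule lists at once.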
That’s pretty much everything I wanted to share with you this morning. I just want to leave you with one parting thought, which is: we can’t go back. This really started with the first caveman who picked up a stick and started writing on the wall of his cave. That was really the start of the information age, and it’s just grown from there. It’s grown and grown in complexity, and we can’t undo that complexity. But there is this great promise of technology to take these kinds of tools and map that complexity down to stuff that those of us with dumb human brains can actually understand and work with in our world. I’ve shown you some examples of cases where doing it right saved lives, or really brought home how we can save lives in the future. And some of the research projects we’re working on with environmental scientists hopefully, if we do this right, will one day save the world.
Thanks for being so generous with your time this morning. (Applause.) And I hope you enjoy the rest of the conference, and have a safe trip home.