Speech Transcript - Craig Mundie, Trustworthy Computing - Today and in the Future

Silicon Valley Speaker Series
“Trustworthy Computing – Today and in the Future”
Remarks by Craig Mundie, Senior Vice President and Chief Technology Officer, Advanced Strategies and Policy
Nov. 13, 2002

DAN’L LEWIN: Hi, everyone. For those of you who don’t know me, my name is Dan’l Lewin. I’m responsible for much of the activity that Microsoft has in Silicon Valley and in general business development for the .NET initiative. I work with early stage companies, et cetera.

I’m really pleased to introduce Craig Mundie today for our Speaker Series. For those of you who are new to the Speaker Series, it’s something we do roughly on a monthly basis or so for community outreach, both our own internal campus community but equally important anyone within the Silicon Valley community that is interested in learning more about particular topics that might be of interest relative to what Microsoft is doing.

And with that: Craig’s involvement with Microsoft, just to give you a little bit of a history for those of you who don’t know him, is long and actually illustrious in terms of the different kinds of things that he’s been involved in. Today, his role is that of Chief Technology Officer of Advanced Strategies and Policy at Microsoft. He works for Bill Gates and has broad purview over a whole series of initiatives that are very important to Microsoft and the industry in general.

And with that, the presentation that he’ll make today I think does cover both things that we’re doing as a company but perhaps more importantly some of the broader initiatives that are fundamental to the success of the industry as a whole, particularly as we go through these kind of trying times in general where there’s another level of what I would say abstraction in the architecture of interoperability in computing, all about XML and Web services and, of course, with the Internet ever present that becomes more and more important.

So just to give you a little background about Craig, he is known within the company and I think within the industry in general as the champion behind this Trustworthy Computing initiative and I think he’s got some great perspectives that you’ll hear about, but again as part of his background he does work on all of our policy related activities: security, privacy, encryption and general telecom regulation. His early history at Microsoft, which goes back to 1992, was in starting the consumer platforms division. And, in fact, everything that’s going on within Microsoft that’s not PC-centric, Craig has had a hand in starting and initiating and in many cases leading the early, early stage development, and in some cases into maturity, some of the different projects.

Prior to Microsoft he was the CEO of Alliant Computer Systems, which was a very high-end massively parallel system architecture. So his background in computing is deep and his vision, which goes back quite a distance in this industry, has always been about the inevitable outcome of computers as they disappear into consumer-oriented devices and as they become really part of the fabric of society.

So the role that he plays for the company and I think for the industry is an important one because it’s rooted in great insight into major and deep parallel system architecture.

His earlier educational work was at Georgia Tech, and he has a master’s in Information Theory and Computer Science.

And so without any further rambling on my part, I want to introduce Craig, someone I’ve known for almost ten years, and I’m really glad he’s here today. Thanks.

(Applause.)

CRAIG MUNDIE: Good afternoon, everyone. This is sort of the one-year birthday party almost exactly. It was last week a year ago that I sat on this stage and gave the first public talk about the Trustworthy Computing initiative at Microsoft. And it had been in development for in some sense more than a year at Microsoft; and in the course of thinking about it, it became clear to me that this was not a problem that we would face alone or could ultimately solve alone, and we had decided in May of 2001 that we would have this trusted computing conference in November a year ago and we did.

And between the time we planned it and the time we had it, of course, September happened and there was a terrorist attack and the NIMDA virus and other things that tended to really change people’s perspective on particularly security — personal security, network security, infrastructure security and security has been perhaps the most visible component of activity in this area at an industry level for the last year or so.

But we’ve really focused a lot, refined our thinking about what Trustworthy Computing is and to some extent how we get there and I’m going to try to give you a status update today.

Despite the malaise from an economic point of view, at least for computer folk, we still continue to think we live in interesting times and the Internet is going to continue to change personal and professional ecosystems. And we are of a mind that most of the computing up to this point has been deployed in enterprise and a lot of the productivity gains that the economies of the world have enjoyed for the last 10 or 20 years does trace a substantial part of it to the effect of IT investment, mostly in medium and large enterprise.

And we believe that this next phase of the Internet, the connectivity of lots of different classes of devices and the ability to have them operate in more of a service relationship one to the next will give us at least the potential for a productivity enhancement that’s akin to all that we’ve got up to this point, so that’s a huge opportunity in its own right. But it’s the personal ecosystems that are being redefined by putting computers and software and connectivity in them that also looks at the margin to be a huge opportunity for us.

So we’ve spent a decade; as Dan’l said, I came here to Microsoft in 1992 to start the work in non-PC computing, and back then it was kind of speculative and today people begin more or less to take it for granted. So every aspect of life now, whether it’s personal or corporate productivity, communications is pretty much being integrated now into this form of computer-based interaction and then entertainment, are all going to be defined by this.

So, the concern that has emerged is will this be caught up short, will it stop to some extent or not happen at the rate that we think it could happen simply because people don’t trust the computer systems. Eighteen months ago, two years ago when I started working on these things and became responsible at Microsoft for our security and privacy initiative, I was at the outset struck by how much tension there seemed to be between the privacy advocates and the security people. The privacy people would tell you that their dream was for perfect anonymity, because then, of course, privacy was assured, and then you’d go talk to the security people and they’d say, hey, the biggest problem we’ve got is all that anonymity because we can’t get any law enforcement, we can’t get any deterrent effect because we can’t track anybody who’s doing anything bad. And at some level if you take it to the extreme, these things are irreconcilable differences.

But the more I thought about it, the more I began to believe that the real question would be this question of trust and not a specific issue of do you feel you have enough privacy, do you feel that you have enough security, but really the question of whether all the factors that conspire to build comfort in the use of a system and the services related to that would, in fact, emerge.

And our concern was that our efforts in this area weren’t adequate and certainly it wasn’t something that the industry at large was focused on and we began to be convinced that even as the economy picked up and we had all this great technology available to us that we, in fact, wouldn’t see the kind of uptake that we would dream about.

So a year ago here we offered the first version of a Trustworthy Computing framework and we’ve done a lot to refine it and make it simpler, but we can just think about it, and we’ve kind of reduced it to four high level objectives. We kind of call these things the pillars that hold up this trust framework.

And the four areas are security, privacy and reliability, which are largely technical or product in nature in many regards, and then a broad class of things we call business integrity, which really speaks to the business processes, cultural aspects and operational aspects, with which a company and its brand are represented or reflected to the ultimate consumer.

So security is about making computers resilient to attack, protecting whatever is entrusted to them and then that turns out to be a building block on which you can at least get one aspect of privacy done, which is to ensure that the inadvertent release of people’s personally identifiable information or otherwise confidential information isn’t the result of a security breach.

But then there are many other parts of privacy, which really speak to the business practices and ultimately the intent that a company has anytime that they acquire and hold and manipulate data that belongs to an individual.

A big issue, which frankly at Microsoft we started working on quite a few years ago, was this question of reliability. If you go back certainly to the time four and five years ago and you ask people what they were most worried about in using at least our software systems, more often than not they would complain about the reliability of the product before they would complain about any other aspect of it, and that was something we took to heart back then and have done a lot of work on.

As time went by, it became clear that as the machines got connected and the uses of the machine became much more intimate in what people did with them, then these gradual concerns about security because of the connectivity and privacy because of the intimacy, if you will, tended to increase.

And so reliability, the third one, and this is a question of just does it do what people expect, if we’re going to diffuse this into every aspect of your life, then is it reliable enough, does it meet the test of being critical infrastructure.

And my contention is that nobody in the computer industry really has stepped back and focused on the question of what’s it going to take to get these things to be good enough that they ultimately disappear in the way people think about them, that just some software, hardware and connectivity tends to support whatever the task at hand is and you don’t worry is it going to work, is it going to crash, am I going to understand how to use it; all these things need to meet expectations.

And then the business integrity question I think at the end of the day will boil down to the question of brand. Today in many other areas people make purchase decisions based on brand. When they think all other things are equal, brand tends to convey something about the value system or the relationship that a customer has to a company and I think that in the computer industry brand has been somewhat of a factor. I contend it will be more and more of a factor as time goes on. And so the values that the brand stands for, the transparency in the dealings that the company has with its partners and customers will all become more of a factor than they have been in the past.

So, to some extent we decided, of course, we have to start at home and we coined a term — I call it SD Cubed or SD3, which stood for three things: secure by design, secure by default and secure in deployment, and I’ll talk briefly about each of those.

In fact, the more we thought about it the more we realized that these three phases, the design phase, the sort of installation phase and at which point a product is initially configured and then its maintenance in a deployed state, each require some very careful thought and consideration. And so we started to think about applying this first in the area of security.

So on the design side we really told people, look, we want to go back and think about this differently, we want to raise the bar in our own company and we started about this time a year ago with the Visual Studio .NET group. They’d been in development for about three years. They thought they were about done. But because of this initiative, the executive management of that group brought the group over, we sat down and talked about it, we hooked them up to some people in my group and other specialists from other parts of the company and said, look, we’re going to look at this with a more collective lens and think about what other changes we could make largely to improve the design of the product or ultimately to improve mostly what we could do at that point was to improve this installation default configuration.

So we’ve applied a lot more people to this process. Today, if you look at all the R & D that we do in the company from the pure research to the development and testing, security related things now claim in some groups as much as 40 percent of all the spending, at least from a manpower and equipment point of view.

And we’ve retrained a lot of people. In the early part of the year we actually stopped development on Windows and decided we would train every single person in the Windows organization on these questions about security and trust, put them through special programs to familiarize them with these and give new tools to some of the people involved in both the design and testing work. We estimate that we stopped all development on Windows for about two months. We figure we spent about $100 million incrementally on that particular analysis of both the shipped versions and what we could do to remediate some of the things that we felt were still not good and then making new design choices for both the .NET Server product, which is finishing its development cycle now and is in test and then ultimately for things that could be fundamentally changed in a subsequent implementation of the product, which is now known as the Longhorn release that’s several years hence.

So in general we’ve really put a lot more emphasis and, in fact, in the course of this year virtually every major product group in the company has done a similar security stand-down, where we stop development, we bring people in, we provide special training, we raise people’s awareness of these issues and we basically are in the process of creating a cultural change within the company.

This is not merely something where you go to a few designers and say we want you to get better about this thing. We’ve recognized that to address this we really have to do it pervasively and that represents a bit of culture change. Partly it’s culture because it reflects a need to think differently about the long-term sustained business relationships that we have and the mass market software industry has with customers. In the past Microsoft would produce a product, people would buy it either on a machine or as a packaged product and they would have the right of quiet enjoyment of that product until they chose to buy another product. And so we as a company we would only get paid when we would offer somebody enough new features or in concert with a new machine new features that they would decide to buy again.

And what that does is it creates a tension where we’re incented to put as many new features in as fast as we can to give the most new people whatever it would take to incent them to buy again. And as you do that over a period of 20-odd years you end up with a lot of features that are not used by the majority of people. And when they’re not managed and they’re not administered they turn out to become one of the big places where you get security holes.

And so by changing the way to think about this and, in fact, the snippet — I don’t know if you can read it up there — it’s from a piece of e-mail that Bill Gates wrote to the whole company. He only does this very rarely and the last time was, in fact, when we were finishing the development of Windows 95 and Office 95 and we’d been pondering what the Internet meant to Microsoft and the software business at large and Bill wrote a somewhat now famous memo to the whole company saying, “Look, the Internet, either you’ve all got to be using it or improving it and otherwise your business won’t be relevant here.” And that was this thing that got the whole company acclimated to move to the Internet very aggressively as an objective.

And what’s interesting is Bill wrote this mail in January where he said to the whole company it won’t matter how much great development work and how much great technology we have; if people don’t trust these computer systems and don’t trust Microsoft, then they won’t buy it and it just won’t matter anymore.

So this was a really significant event for our company and I think it has had the desired effect of getting the whole company to think about this sort of person by person and move it forward.

The second of the SD3 was secure by default and here the goal is to ensure that when a product is installed, unless somebody who’s expert makes a choice to the contrary, that we have minimized the attackable surface area and have maximized the integral protections that are available so that the product can defend itself against attacks or is minimally susceptible to errors of administration or management. And so many of the things that we would have historically just turned on so that people could discover them or use them easily are now just turned off or not installed unless people specifically ask that they be turned on or installed.

And so this is really an inversion, as I said, of the way that we historically thought about merchandising the features that were put into products. And you could say there’s ultimately some risk in this from a business model standpoint, but we thought that a couple years ago, and despite all the somewhat pain that was created, we started people moving down a path to a different licensing model, particularly in large organizations. In part, to create a different financial relationship, more like the ones that people had for very large-scale software products where they have some recurring payment that entitles them to the upgrades and future releases, simply to take some of the tension out of the relationship between the vendor and the customer. And we think that all these things over time will conspire to both make the products better, be what people want them to be and to have a smoother financial relationship in the process.

We continue to change things. We’ve reduced the privilege levels that exist in .NET Servers. We’ve actually created new service levels with finer granularity within the core operating system itself. All of these things tend to help mitigate what failures do occur or what breaches ultimately do occur because by having minimization of privilege then even people who do get in by some means have a much tougher time doing anything that they’re not in theory supposed to do. A lot of focus has gone into not only minimizing privilege but making it much, much more difficult for people to find ways to do privilege escalation.

And these are things that you could say the average person wouldn’t know about — and shouldn’t have to know about — but they are part and parcel of making the system such that they’re just intrinsically more secure.

We’ve done a lot of other things. For example, in Service Pack 1 for Windows XP we changed the configuration defaults again for all the wireless local area networks so that if you don’t have encryption on the link even as a basic form of security it won’t actually automatically install and connect you to a network; it will find it and tell you it’s there but it makes you make a specific election if you want to connect your machine to essentially a completely open network. We’ve changed the default firewall configuration so that in a consumer machine it’s always on, even though in a corporate machine it would not default to on because it would break some corporate applications. So we’re using what intelligence we can build into the system to really make the default configuration appropriately more secure.

The third part is secure in deployment and here we realize that we needed to do a better job to provide people tools that would allow them to assess the current state of affairs in all the machines that they currently have, and we took some of these — which we had developed internally or used in our internal corporate operations — packaged them up, and have delivered them for free to the community so that they can do these assessments in a more expected way.

We’ve provided hotline support and had a big, big focus in the service pack for Windows 2000, which is the currently most actively deployed server configuration, in order to really take what we’ve learned and roll it out even into the products that are already shipped.

We’ve done a lot to change the source-licensing program so that a broader class of people, individuals, universities, companies have access to these things and can partner with us and be more involved in doing assessments.

And we’ve done a lot both to upgrade operationally and extend the relationship of our security response center to those other security response facilities that exist with various governments of the world and other institutions that do similar things.

Perhaps the most important thing though is the Windows Update facility. Two significant things have happened in the last year. We’ve produced a corporate version of this that allows a corporation to subscribe to the changes, get them staged automatically to a server in their institution, they can qualify them by whatever means they want and then they can automatically deploy them under their own control. That’s mirrored by what in SP1 was just released as the first fully automatic update facility for an end consumer or a small business user’s machine. You can now opt into a completely automatic installation of critical updates and you pick a time like 3:00 AM every morning and if there’s an update it will stop and it will install it and it will go forward with no manual intervention.

And for us this has been the Holy Grail we’ve been after for a while, and one that we anticipate will be more and more required as software and computer hardware finds its way into virtually every device in an average consumer’s life — your television, your car, your telephone; all these things will have lots of computers and connectivity and we can’t presume that there’s an intelligent IT organization around or even a less than intelligent one who might be able to care for these machines. They have to be able to take care of themselves to a much greater degree.

So this notion of thinking about sort of SD3 as important for every aspect of Trustworthy Computing is something we’re now more focused on. So, thinking again in the privacy area, in the reliability area, how we think consciously about the design phase, the deployment phase and the installation phase, all of them working together to build more trust is a key way we’re addressing this.

We’ve also had some significant events that show, in fact, that we didn’t fall off the turnip truck just a year ago and decide we should think about these things. We started three years ago on a Common Criteria certification of Windows 2000 and just a few weeks ago I went to Washington and in a ceremony picked up this certification.

This is actually a pretty significant result because unlike any of the Common Criteria work that has been done before, we didn’t just certify a special version of the product or just the kernel of the operating system; we actually certified the standard shipping version of Windows 2000 that everybody gets, and then we actually certified some of the subsystems that run alongside it, in particular Active Directory, which is the thing that allows single user credentialing to be done across a network of machines, the virtual private network facility, the IPsec authentication and link security mechanism, and the encrypted file system.

So why did we select those? Well, when we talk to people who really want to build highly secure systems and deploy them in mission critical, particularly government applications, they all want to build distributed systems now. And in the past, even if you had a Common Criteria evaluated kernel operating system everything else was left as an exercise for the user. There was no way to have a certified or evaluated system that had common logon across multiple machines. There wasn’t a way to have remote access over the Internet in an evaluated system.

So by doing these things we did more of a comprehensive sort of what I’ll call a usable system than we think anybody has ever done before. So we did this and got EAL level 4, which is the highest level of evaluated certification that you can achieve for a commercial product.

We also achieved what’s called Remediation Level 3, where we took the mechanism whereby we identified flaws and can control the repair of systems in the deployed environment, and got that at the highest level that’s been achieved for a commercial system too.

So, we’ve now made the commitment to go on. We think we can now do this on a somewhat more expedited basis, but we’ve agreed to achieve the same level for .NET Server but actually to take some other significant parts of the system that people are interested in for these large-scale distributed applications and get them pushed through evaluation too.

So this in itself is a big effort. It takes a lot of time and money. It has to be done by a third party. The attestation is not done by Microsoft itself and so it requires a level of engagement, transparency and documentation that is quite substantial.

So this took three years, and we hope we can accelerate that a little bit now as we go on, but we do think it’s a significant achievement and it’s all part and parcel of creating the internal processes, the controls that we think will ripple out and have effect. So even if you don’t want to buy or aren’t required to buy an evaluated system, the fact that we did this for the standard product we think will ultimately turn out to have a lot of benefit for just the average company or user who buys it.

One of the biggest challenges we face as a company and to some extent as an industry is depicted by this graph. The various humps and bumps on this graph are the current estimated deployments of the different generations of Windows. And they total something about 400 million active users.

And to put this in perspective, about what the challenge is: the little black line growing slowly, that’s the population of New York City. The big dotted line going up very quickly is the rate of growth of the number of people connected to and using the Internet.

And so what you realize is that the newest systems, the ones that have had all this work done to them are down here in these little slices. They’re the ones that are in the earliest stages of deployment.

And what society is doing and we’re doing as a business is dragging around behind us a giant tail of systems that, of course, were built and deployed quite a long time ago. And this is infrastructure for our society today. And any time you decide to change infrastructure it costs money and takes time. And so it doesn’t matter how fast you push this stuff out.

If we wanted to go out, and some days I think about the challenge that we face and we say, oh, if you have to do this with the conscious effort of real people it would be roughly many times worse than just saying, okay, we just want to get every single person in New York City to do the same thing today to their computer system, please to fix it today. And even if it was just New York City you’d have a tough time. The reality is we have the equivalent of about 30 or 40 New York Cities that all want to in some sense move together or get repaired in one fell swoop.

So we know that in practice it’s impossible for us to remediate the threats that we know exist in the world today in systems that were designed in 1991, ’92 and ’93 and deployed in ’95 and which are actively still in use today. It’s interesting the single largest bump on this graph is Windows 95. And while it’s actually shrinking now, Windows 98 has kind of surpassed it, but the newest stuff is still considerably less deployed.

Now, we know that these waves just keep rolling through and they will ultimately change, but it shows how long the threat exists of bad things happening and why it’s not completely possible to fix every old system.

The message here is that there will have to be two tradeoffs that have to be made, and to some extent the events of last September have facilitated us in making one of those tradeoffs or changes.

We have decided that we will begrudgingly forsake certain app compatibility things when, in fact, they don’t allow us to have a default configuration that opts for more security. In the past, the biggest thing that happened to us was IT managers would come to the company and say, hey, all those new features, they’re great, all that new security stuff, that’s great, but whatever you do don’t break my app. So just turn it all off and trust me, we’ll fix the apps and then we’ll turn it all on. And the reality is that never happened.

And so we’re going to tell people that even if it means we’re going to break some of your apps we’re going to make these things more secure and you’re just going to have to go back and pay the price.

And the other thing is that the customers, whether they’re individuals or corporations, are going to have to make a decision about when and how much they spend to get these machines to be more secure. And to some extent you can do it by insulating them, to some extent you can do it by putting things around them or in front of them that protect them, you know, firewalls in some sense. And then in some cases, you can just replace them when you get new machines or new software or both that have intrinsically better capabilities.

But I think one of the things that we say, and even if you look at the national cyber security plan that was put forth, Dick Clarke and the people at the White House have realized that security is going to cost some money, whether it’s having a new transportation safety authority to make people feel like they have more security in the airport or spending other things on homeland defense. It isn’t free, and to some extent as the threat models continue to emerge in new ways, then we are all going to collectively have to spend more, both in the development and maintenance of these machines if we’re going to be secure.

This cycle has no end. Just like in the physical world crime hasn’t been eliminated, despite lots of efforts, crime won’t be eliminated here. There’s a lot of focus on whether these are flaws in systems but to some extent we also have to realize that we are not in a state of equilibrium relative to the normal functioning of a stable society relative to cyberspace.

For things to be stable, there’s really three things that have to come into balance. One is sort of the mores of the society. Two is the effect of legal deterrence, and three is the technical approach. And to some extent, all the focus in our industry today is more or less just on the technical approach; you know, are we seeking perfection, are we close enough to perfection technically? But the reality is that we now have a completely homogeneous adoption of these technologies across a world that is very heterogeneous today, in terms of both the mores of the society and the state of the laws as they relate to the things that these people now want to apply the computer systems to. And because this is a transnational phenomenon, the traditional role of law enforcement and the role that that has in creating a deterrent effect in terms of people not just capriciously doing bad things to computers is not at the level that it should be either.

So part of the reason to give these talks not just to the technologists but to the policy people is the realization that there needs to be as much energy applied in the policy domain I believe as is applied in some of these other domains if, in fact, we’re going to continue to address this or bring it into a state of equilibrium that people are happy about.

But between now and then all we can do is continually run around the cycle where as the bad guys get smarter and the tools that they have get better, we build better defenses and deploy them into product, and put them back as far as we can and the whole world just keeps going through this cycle.

And I don’t think that even if we thought we could write perfect software that over the useful lifetime of these products that, in fact, they would be able to protect themselves adequately against the evolved attacks. And so I think in this way it really does emulate our real-world experiences.

I’ll talk for a minute about progress on the privacy side, and a number of things have manifested themselves in products in the last year or so. P3P, the Privacy Preference Mechanism in Internet Explorer, gave people a level of control or at least visibility into these things that they didn’t have before. We’ve basically ensured that the mechanism, for example, of activating Windows XP is a completely anonymous mechanism. MSN 8 was recently launched and has a lot of tools for parents to basically create more of a partnership requirement between parents and children to establish what it is that is appropriate for them to do (who they can talk to, what they can see), and therefore it kind of goes to this question of mores: Are family values being essentially conveyed to kids as it applies to cyberspace? Do they understand what the difference between right and wrong is in that world as opposed to just in the tangible world?

And we’re trying to recognize that we can’t as a company establish what those policies should be and, in fact, neither are they the same in any two countries. And so, ultimately it has to be reduced to the common denominator, I think, of family, and here we’re trying to go forward and let families play a role in establishing these bounds.

Media Player 9 was I think a technological tour de force, but perhaps one of the most celebrated features was shown here as the first screen. When you activate it or run it the first time, you basically had to select your privacy option. You couldn’t do anything with the Media Player unless you actually went through and made your opt-in selections for what features you wanted to use and what information you were willing to exchange in order to get those features.

And so to me it was with some pride that the company said, look, we’re being very transparent about these things, we’re telling people what their options are, we’re giving it on an opt-in basis. It isn’t buried in some language hidden in the middle of a program or an obscure privacy statement.

And I think privacy in deployment is true too in MSN, where we really show the users what data we hold on them, thereby I think taking away some of the concerns that people had about where they speculate about what we might have or might do with them. We’re trying to be very explicit about that.

We’ve basically been working to support Gramm-Leach-Bliley, the Safe Harbor Agreements and many of the other new privacy regulations that exist and to facilitate those. We basically subjected a lot of what we do to third party audits on a voluntary basis. And my own group has been trying to drive the process of creating a set of indices that become part of the company’s internal measurement system, one of which is the privacy health index, whereby all the employees of the company essentially participate in an online process which allows us over time to make an assessment about whether people are appropriately aware, trained; assess operational issues with respect to privacy and then measure these things so that really they become a management focus at every level in the company. So this is partly how we’re trying to institutionalize this.

We really are adopting these things and using the tools ourselves. We’ve taken Passport and are using it for Microsoft employees and family members. We work with a company called WellMed to put together Microsoft’s internal health portal, which is essentially the means by which all healthcare services are provided to the Microsoft family, if you will. And here again it forces us through the process of being very explicit about opting in for the sharing of data with healthcare providers, how you make updates and benefit changes, and then having a mechanism for a lot of these things, because under a lot of the HIPAA provisions and the other things, anybody involved in the medical area or healthcare provision area has to be really very scrutinizable, I guess is the way to say it, or auditable, with respect to their policy and execution in these areas. So we are really living with these things on a day-by-day basis.

Today we did an announcement where we have come up with a new facility, a service called MSN Messenger Connect. One of the trust issues around instant messaging, which has, of course, been wildly popular, you can say there’s actually two issues here. One is if you’re a user of instant messaging and you wanted to actually do something other than casual chat with it, it’s hard to know who it is that you’re really talking to on the other side because most of the IDs for these things are just pseudo-anonymous names and they can be spoofed or you might not be sure who you’re talking to.

So when you go out of the realm of just talking to somebody in a chat room or somebody you bump into and there’s no value being exchanged, where you say, oh now I want to use instant messaging to tell my broker to trade my stock, right, well suddenly it takes on a different character.

Reciprocally, the company who wants to do this stuff doesn’t know how they represent themselves and, for example, if you’re a financial institution you’re under a set of legal obligations to be able to have records of all the communications and be able to hold those records and audit these records. And so, instant messaging in the way it’s been has been essentially a completely out of band communication mechanism, and so corporations don’t have any way to control it, audit it, know whether it’s essentially a leak for information or whether they can even meet their legal compliance requirements against financial reporting.

And so we, as have some other companies, are now offering a service where we’re going to operate managed name spaces within the instant messaging environment. So we’ve worked with a number of financial institutions because they seem to have the first real need and interest in this. And Reuters has been one of the development stage partners here, and they’ve kind of defined for us what they want to be able to do. They want to be able to come and say, look, I want, for example, all the Reuters employees to be able to instant message within the company and still meet all these financial reporting guidelines and auditing requirements and how do we do that, and we can’t let our employees just have these pseudo-anonymous names that appear somewhere else.

So this is a service where for basically a couple of bucks a month per person you can actually host, for example, the Reuters IM domain within the overall MSN messaging domain, and who gets into that IM pool and out of it is essentially automatically controlled by linking that to the domain name security mechanism of that corporation. So the names are the real names, they’re the same user names that exist within the company. So now you’ve got real names in the IM world as opposed to just pseudonyms.

And we think these again are steps moving us toward this notion of mutual trust, so e-commerce can move into the exploitation of these new forms of communication and yet still meet some of the controls that are imposed on these things from a regulatory point of view.

This is just a simple diagram of the architecture of these things. There are some third party software companies that are involved that actually create the logs and audit control mechanisms. That’s their business in this. The MSN Messenger Service and Passport, what we’re essentially doing is allowing the Active Directory entries of a company that has their domain names stored that way to be directly put into Passport as a sub domain for instant messaging purposes. And once they do that they could extend this to customers too and thereby create both internal and business-to-customer instant messaging applications that they have some trust in.

So we think each of these is a step towards this next level of sophistication in the use of Internet connected machines.

On the reliability side, we have a product quality initiative that has been underway for quite some time, where we establish metrics. We have a lot of investment in new software development and testing tools that deal both with each of the different classes of historical failures that systems programmers and app programmers produce and also with just the vagaries of trying to build and assemble very large-scale software things.

So, pretty much all the phases of development from the test case prioritization, this also affects how we are able to turn patches around for problems that come up that are critical without having to go through a complete retesting cycle: automated checking of the models and the interfaces between modules of a system, compile time, detection of certain classes of bugs in essence where the computers are looking over the shoulder of the programmer looking for patterns that have been demonstrated in the past to generally produce flawed code or buffer overruns or what have you, and then systems to take whole assembled systems and examine them for deadlock and other types of potential problems.

Another thing that’s been hugely successful this year for us has been the Windows error reporting mechanisms. If you run one of the new versions of Windows, if an application fails or terminates abnormally or you stop it abnormally it basically comes and offers you the option of sending a failure report to Microsoft if you have an Internet connection. And if you choose to send it, then we log it, you can optionally ask to be informed about what happens with that, but if not you can be completely anonymous.

But if you choose to send these things in, and it turns out millions and millions of people do send them in, it has allowed us to really get a handle on what kind of problems exist in the real world in the huge panoply of deployed configurations that are out there for which there’s no practical way to test. We maintain labs that have literally thousands of distinct configurations of machines by vendor configuration, bus type, you pick it. We’ve got many permutations and combinations, but that’s still a drop in the bucket compared to the number of permutations and combinations of things that are out there in the real world.

And we also have to deal with the fact that Microsoft’s operating systems are really at the heart of an ecosystem and many other companies are developing not only new hardware but drivers and other things that get essentially married up to Windows at a late stage or even post shipment state of life, and it turns out a lot of the problems come from the operation of that loosely coupled ecosystem of development.

And so this process has allowed us to expedite getting data about what these problems are, focusing on prioritizing which ones get fixed and then using the Windows Update mechanisms to propagate those things back out.

And the net effect is we’ve found some very interesting things. In the second half of ’02 we’ve had so far about 1.7 billion downloads through Windows Update. So people are definitely using it. And with the Service Pack 1 release where it now can be completely automatic, the number of machines, as that little sliver grows, who will be getting this on a completely automated basis will increase too.

There are about 13 million unique users who are getting downloads today in just Windows 2000 and Windows XP configurations.

This is perhaps what people might have expected, but it really has put a fine point on for us how much leverage there is in getting this kind of data and being able to operate against it in a real time way. This is sort of a graph of the number of defects against the number of crashes in rank order. The 80/20 rule clearly applies: 20 percent of the defects produce 80 percent of all the failures. More notable was that 1 percent of the defects produce half of all the failures of every machine that’s in the field that’s reported on.
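As a rough illustration of the rank-order analysis described above, the following Python sketch ranks defects by the number of crash reports attributed to them and computes the running share of all crashes. The defect names and counts are invented for the example, and the function names are hypothetical; nothing here reflects how Microsoft's actual error reporting pipeline is built.

```python
from typing import Dict, List, Tuple


def cumulative_crash_share(crashes_by_defect: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank defects by crash count and return the running share of all crashes."""
    total = sum(crashes_by_defect.values())
    if total == 0:
        return []
    # Most frequently reported defects first (rank order).
    ranked = sorted(crashes_by_defect.items(), key=lambda kv: kv[1], reverse=True)
    running = 0
    shares = []
    for defect_id, count in ranked:
        running += count
        shares.append((defect_id, running / total))
    return shares


def defects_needed(crashes_by_defect: Dict[str, int], target_share: float) -> int:
    """Smallest number of top-ranked defects whose crashes reach target_share."""
    shares = cumulative_crash_share(crashes_by_defect)
    for rank, (_, share) in enumerate(shares, start=1):
        if share >= target_share:
            return rank
    return len(shares)


# Hypothetical crash counts per defect, for illustration only.
sample_reports = {
    "driver_a": 500,
    "driver_b": 250,
    "app_c": 120,
    "app_d": 80,
    "app_e": 50,
}
```

With data shaped like this, asking how many top-ranked defects account for half of all reported crashes is a single call to `defects_needed(sample_reports, 0.5)`, which is the kind of question that lets a team decide which fixes to prioritize first.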

And so when you look at that 1 percent, we’ve been able to distill them down to a very small number of classes of things that in most cases third party device driver people do wrong and have been able to go back to them now with very specific information that says here’s what’s happening with your machine configuration in the real world, and if you would fix this and we put it out there, a small number of changes can actually eliminate half of all of the failures that have been reported on Windows machines in the last year.

And so the leverage is super high, and getting this closed feedback cycle working with more and more people with broadband connectivity really we think poses a great option for increasing reliability of these systems quickly over time, both because it affects the design of the next version and it allows us to remediate some of these things even for the shipped version.

As we look forward, we face a number of difficult problems. To some extent there’s a lot of processors that are spreading too fast and the system that is really now operating around the world is not one that we designed or anybody designed. It is now one of these systems that exhibits emergent behaviors and we can expect that that will continue.

I think programming as we all know it today and have known it for many years is too error prone, and as the scale of the software continues to go up and up, as the capacity of the computers and ultimately the diversity of the computer type goes up and up, I think it’s going to continue to be more and more of a challenge to manage these things and that’s going to result in us having to put more effort into new tools or new ways to build large scale systems.

People frankly I think are losing ground in this battle. There are too few knowledgeable ones. We’re not actually graduating, if you will, people that are trained in these disciplines nearly at the rate that the society has demand for the adoption of these technologies. Therefore, it will require that that become a more automated or self-administering process.

And to some extent, the kinds of problems that emerge from these networks and even individual machines are really kind of baffling for the mere mortal. The speeds and scale at which things are happening really make it very hard for people to understand what went wrong, the way we build them today and instrument them today, and when you start building them in large scale and you get the connectivity and asynchronous nature of these things it gets very, very hard to reproduce the failures.

And so getting these kinds of reporting mechanisms built in and making systems that are more sort of aware of what these things are and watching for them as an integral part of the design I think will become more the norm as time goes on.

And as is always the case, policy, law lags reality. It’s only when a society has some dependencies or has some problems that the legislative or regulatory mechanisms come in to play. They’re always retrofit, ex post facto.

And so what we’re starting to see is, in fact, I think the regulation of the computer industry where, more and more, as society has critical dependencies on this technology, it will decide that it’s got to help us out and make sure that it’s regulated into a state of reliability or privacy or security, and there’s a real threat in terms of whether we can sustain the kind of pace of innovation and deployment as an industry that we’ve had in the past if, in fact, people start to, from a regulatory point of view, decide that we have to go a lot slower or be more deliberate.

But I think there are some longer-term fundamentals we have to start and work toward today, having hardware that is both more redundant and error resilient in its own right and more trusted in terms of being able to really build from the hardware up a trusted system. We do this today around the world in various specialty systems. Someone who designed your satellite set-top box is basically building you a special purpose computer that starts with some very fundamental, very carefully engineered hardware security tokens like smart cards and other things and builds up from those to build a system that is good enough in terms of protecting the high value distribution of video content that people make a business around it.

What’s happening now is that all content, whether the value is monetary or otherwise, is now going to be moved in and through these computer systems but we’ve never really engineered a general purpose computer system to be able to exhibit those kind of strong guarantees, and that will be we think one of the next things that will have to happen and there’s certainly work underway to do that.

I think the systems ultimately will be more modular and loosely coupled in their construction at every scale, from the low-level components of a machine on up to Web scale applications, and that means that the subsystems will be more autonomous, that the interfaces will almost be exclusively in the form of protocols, message schemas and policies, and that that will be the new way in which people think about writing applications, and less the idea that they’re writing to a specific API within a specific machine.

And as I said before, I think that ultimately we have to work to get these things to be self-organizing, self-managing and more or less self-repairing, simply because there’ll be too many of them in the hands of people who couldn’t care less about how they work, just that they work, and therefore they’ll have to do it themselves.

So we view this as a long journey. I think the stage we’re in right now is a little bit remediation where we’re trying to fix some of the sins of the past and make improvements of the things that are already deployed. We’re trying to make improvements in the design, particularly relative to configuration management and default security provisions for the things that have been in development for a while but are now kind of coming to the end of the design phase, and then in the longer term what I think we’re doing at least in our company is architecting these things more for trust, starting with fundamental mechanisms way down in the hardware even and then building forward.

So Longhorn, which will be the next big version of Windows — the rights management architecture, the underlying Palladium, which is the codename for our system working with the hardware folks to create a trusted security environment within the hardware framework — all of these things will be there.

And, to some extent, when all these things become invisible, when people don’t think about them anymore, when the journalists aren’t having to write about them as problems anymore, that is when we will have succeeded as an industry in getting Trustworthy Computing. It will just be part of the fabric of our society.

So I think there is a rising awareness. Certainly it has risen to a significant level in our company. We continue to be an evangelist for getting other people to think about these. It is a set of hard problems. They are not just engineering issues. You can’t just say to a few people in your company, go forth and make everything trustworthy. It really comes down to a cultural acclimation to the need to have these attributes in everything you do and everything you deliver.

We’ve started working at home, but we are working with others. We’ve been party to the efforts with this government in the United States and in other countries. We participated in this development of the national strategy for cyber security with the White House. We’ve been coming up with new products in the source license area to allow governments to take our products and build things that they need out of them.

At the industry level, we’ve been working with third party companies on auditing for privacy, for example, on trust marks that people can apply to Web sites or individual products, which is sort of I think a stage that the industry will go through. It’s sort of like when the electrical industry needed the UL, the Underwriter’s Lab. If you didn’t have that little sticker on the cord people wondered whether or not the product would electrocute them. But eventually people realized, well all these things are good enough now, I don’t even care about looking at the label anymore. But we’re not at that level in this industry yet, and we may need some of these third party attestations in order to help people get over the hump.

Standards I think is an area where we can do a lot of work and are doing a lot of work in wireless and Web services and we’ll continue to be actively engaged in that.

But I think beyond Microsoft, however you choose to think about it, this issue I think will be the defining issue for the industry, and if we all collectively want to enjoy the same kind of business results and personal prosperity that that’s brought for the last 10 or 20 years in the high-tech business, I think we are going to have to put some of our IQ to dealing with this class of non-technical problems in some sense and in conjunction with continued refinement of the product concepts and the engineering processes behind them.

And I’m personally quite committed to that. I think you’ll find that Microsoft is as a company very committed to that and we’re paying some price now in order to be prepared for a future that we are sure will come.

So with that, I’d be happy to answer questions for the time we have left.

QUESTION: Can you comment on bridging the gap between the Trustworthy Computing sort of at the operating system and trusted network information systems or trusted electronic systems at the very high business level that embodies information management systems, identity management systems, time management systems and how you have to integrate all of that into a system that enables a trusted digital enterprise?

CRAIG MUNDIE: Well, I guess there’s no simple answer. My view is that all of the things that we build with computer systems today are one of the sort of food chain kind of things. Some guy started with the transistor and then worked it up into the microprocessor and then we put the OS on top of it and then we put the tools in the middle layer.

And what we’re doing is really saying we’re going to give people a firmer foundation on which to address those questions when they have to address them at the level of the application, that at least the infrastructure will either take on more of those things directly or provide you a richer framework with which to address the trust issues at the app level, and that’s why I say everybody has to really focus on this. No matter what we do, that doesn’t mean that an application couldn’t do the wrong thing relative to either collecting data or divulging data in a way that was inappropriate.

We can give you, for example, a rights management architecture that’s hard core where data is all tagged with meta data that includes the rights and you can have an architecture that would guarantee that they will always travel together, and if you had that uniformly implemented then, in fact, it would be a lot easier for somebody to say, look, this piece of data should never be used for other than this purpose and you wouldn’t have to trust to the correct implementation of an application programmer that they never actually took the data and copied it to the wrong place or sent it to the wrong IP address or whatever it might be.
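To make the idea of data that always travels with its rights a little more concrete, here is a minimal Python sketch. It is purely illustrative: the `TaggedData` class, the `RightsViolation` exception and the purpose names are all hypothetical and do not describe any Microsoft product or API. It is also software-only, so it offers none of the strong guarantees being discussed; a determined program could still reach the payload directly.

```python
from dataclasses import dataclass
from typing import FrozenSet


class RightsViolation(Exception):
    """Raised when data is requested for a purpose its rights do not allow."""


@dataclass(frozen=True)
class TaggedData:
    """A payload bundled with its rights metadata so the two travel together."""

    payload: bytes
    allowed_purposes: FrozenSet[str]

    def read(self, purpose: str) -> bytes:
        # The rights check lives with the data, not in each application.
        if purpose not in self.allowed_purposes:
            raise RightsViolation(f"purpose {purpose!r} is not permitted")
        return self.payload


# Hypothetical record that may only be used for billing.
record = TaggedData(b"account statement", frozenset({"billing"}))
```

Here `record.read("billing")` returns the payload, while `record.read("marketing")` raises `RightsViolation`. The point of the sketch is only that the check is attached to the data rather than left to each application programmer to implement correctly.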

So I think we are going to escalate substantially the class of the tools the developer has to deal with these things, but at the end of the day you can make a mistake at any level of the food chain.

QUESTION: Two questions really: Could you give us an update on Palladium and can you tell us a little bit about any work on a Digital Rights Management operating system?

CRAIG MUNDIE: Okay. Well, Palladium is a codename that we have for an initiative we started three or four years ago and which we’ve disclosed the existence of but we haven’t actually disclosed the detailed technical specs, so I can’t give you blow by blow exactly what is contained within this architecture. But let me tell you what the problem is we’re trying to solve with this. I alluded to it earlier in the analogy to the set-top box.

Today we cannot make any kind of strong guarantees about several things in computers. We can’t actually make a strong guarantee about what machine you’re actually interacting with. We can’t actually make a strong guarantee about what software you’re actually running on any given machine, and to some extent we can’t make a strong guarantee about who the person is that is interacting with that machine. And when taken together, all these things mean that there are fundamental limits to how much trust you can put into that computer system.

So the function of the value of the data you have and the value could be ascribed to be the monetary value of media content, Hollywood films or music, it could be highly confidential information that belongs to a business or a government, it could be valuable personal information that you’d choose to write, it could be e-mail you consider to be private for any of a variety of reasons that you want to control the dissemination of. We can take some strides in just the software systems and the machines we have today to make that look pretty good, but we can’t really make any strong guarantees about it. So the higher the value of whatever the item is people want to protect, the less they’re inclined to just sort of say, well trust or I assume that everybody will assemble this all in the right way.

So the general problem I just described is the one that we want to control or solve by giving people a set of mechanisms that I’ll generally call Digital Rights Management.

So we don’t think of Digital Rights Management as a separate operating system itself but rather a set of facilities that can be used to create in a variety of applications, whether it’s content or e-mail or whatever it is, some controls on how rights are specified, how those things are recorded, how they’re not separable from the data itself and how all the underlying systems then adhere to those specifications in the way that the data is manipulated, and so long term we want to be able to build that general set of tools.

Our problem, and so we’ve got limited Digital Rights Management things today, for example, the Media Player and the E-book reader have a digital rights architecture, but today people who are determined can easily subvert that because they can change the underlying operating software, they can change some underlying mechanism in the machine and they could subvert it. And that’s why there’s such a big issue and interest in the media industry about these kinds of questions and why they’re willing to trust a special architecture, for example, like a set-top box and aren’t willing to trust a general purpose architecture like a computer.

So the real question is what is it you have to do to a computer such that you can programmatically guarantee that it is as trustworthy as an otherwise specially designed machine. And we think that that has to happen because all the media is going to flow through these things, the general-purpose machines. It won’t be constrained to specialty boxes.

And so Palladium is the project that Microsoft is working on with a number of hardware vendors to fundamentally add new things to the underlying microprocessor and hardware of the machine so that we have this trusted core on which to build a rights management mechanism or other things related to security and identity and to then be able to prove that that is as trustable for that application as any specially designed system might have been. And so that’s where Palladium fits relative to DRM.

The two are not necessarily bound together. You can have DRM without Palladium as we do today, but it’s a weaker guarantee. You can have Palladium as a mechanism with which to build a whole variety of trustable application mechanisms but may have nothing to do with rights management or media management or any other particular general facility. But for us the idea of hooking them together at least allows us to create a set of general applications that we think have broad applicability and which are an integral part of future platforms.

QUESTION: For the next year or two of the Trustworthy Computing initiative, what high priority action items do you see?

CRAIG MUNDIE: Well, I think we will make continuing progress in the security area, new versions of many products will come out, and I think will be ultimately quantitatively as well as qualitatively better in that regard.

We’ll continue to make progress in the privacy area with more and more of both the services and systems being very transparent with respect to personally identifiable information and how that’s gathered and administered, and probably more things will move from either unspecified or opt-out to opt-in simply because consumers tend to prefer that in our own products and services.

There will be continued work to try to find ways to be compliant with the emerging regulations like HIPAA and Gramm-Leach-Bliley and even the Sarbanes-Oxley stuff that just happened. I think all of these things will come back and require us to do more work to make it easier for people to comply with these new regulations.

In the reliability area, I think we will probably make some big strides in the next year simply because this process of the closed feedback loop will continue to accelerate, and we’ll apply that to more products in the company. Today we apply it largely to Windows and our applications; only a limited number of third parties are actually benefiting from it yet. But I think as more of our customers, the third party software vendors, instrument their apps in a way that they can take advantage of that closed loop thing and we come up with better ways to distribute more than just Windows and Office on an automated update basis, all of these things will basically ratchet overall system and application reliability up another level in the next 12 to 24 months.

And then on the business integrity side, I think for us it’s just always scrutinizing all the processes by which we communicate and touch the customer, whether it’s contracts or support mechanisms or what have you and always now looking at those with an eye towards saying are we doing this in a way that really inspires trust in the company and the brand.

So the last part of that is also making sure that all the service components really meet people’s expectations. So there’s sort of an implicit Service Level Agreement that I think people expect when they deal with these commercial services now and in the world of the Internet those things haven’t always had the same level of reliability that people really want them to have and we’ll probably continue to focus more there too.

QUESTION: Thank you for today. It’s been very interesting. And much of what you’ve said reminds me of the context of the capability maturity model from Carnegie Mellon. You’re at the far end of the maturity model, you’re facing these ongoing engineering challenges, they’re not going to go away anytime soon. The cost of managing those problems is, as you’ve pointed out, large, probably a lot larger than you’ve estimated in these charts. And maybe by this methodology you may never reach what you’ve called Trustworthy Computing.

So my question relates to formal methods. I know that Microsoft employs in its research division a number of reputable characters in that field. Can you say something about formal methods in this picture?

CRAIG MUNDIE: Yeah. In the talk, I alluded to the fact that there are several phases to this. There’s what I think of as sort of the remediation phase, the phase where we’re doing a better job with the techniques we currently have and then a longer term need to have new techniques. And so your question relates to what I think of as the “we need new techniques.”

I do think that to some extent necessity will be the mother of invention here, that the exponential expansion of the capacity, sort of as the product of expansion of computational capacity, storage capacity and interconnect capacity, you know, that creates the container, which is growing and which keeps getting filled up with software. And so you get this exponential increase in how much software people demand.

I think Windows today, at 50 million lines of code as a single integrated thing, is a huge engineering feat to some extent, to get it to do what it does today against an array of configurations, even not-yet-built machines that it’s supposed to run perfectly on.

And so I do think that we are going to have to move to a more loosely coupled construction of things. That allows, I think (this is my belief, not a solved problem), for the incremental incorporation of some more formal methods in allowing us to do this. I talked about some of the tools that we have today where we’re trying to actually look at correctness and deadlocking and other things. That’s sort of applying an overlay on the current implementation method.

I think ultimately you can, and we are doing, as you point out, both at the research level and even in some incubation work that we’re doing, we’re looking at whether we can, in fact, adopt some more formal methods in order to address these problems and claw back some ground relative to the exponential expansion of capacity and complexity that comes with the increased code size and the number of people that are working on it.

So I tend to agree or believe that that will be important. We are investing in that area. I personally am quite close to that and I can’t tell you that I see in either the academic world or any corporate environment that I’m aware of an obvious answer on the horizon. No one has had the big “aha” that says “Oh, yeah, I know exactly what to do with any of a variety of the formal method mechanisms in order to make progress at the scale of these systems.” And so there’s always the dilemma of how do you get from here to there.

If you say, “Oh, well, I’ll just jettison everything we, the industry, already have and use,” it’s really hard to get back. And yet unless somebody comes up with a really clever new idea, and you try to figure out how you can apply it in some sense back to the things we’re already doing to break the back of this exponential growth of complexity, I don’t know exactly how to solve it.

So I’m always open to new ideas. If you have any, come on down. But we are interested, we are pursuing it and we do believe we have to do something. I just don’t think the answer has emerged yet.

Well, thanks for coming today. I hope you found it useful, and have a good day.

(Applause.)
