Microsoft Research Tech Transfers: Better Decisions Faster

REDMOND, Wash., Oct. 31, 2005 – Faster results. Better decisions. Turning insight into action. Those are the capabilities Microsoft hopes to deliver Nov. 7 with the launch of Microsoft SQL Server 2005, Visual Studio 2005, and BizTalk Server 2006, and key innovations in those products stem from contributions by Microsoft Research.

Used together, the new products are intended to enable customers and partners to gain deeper business insight and to take appropriate action throughout their business and software-development processes. The collaborations between Microsoft Research and the teams that produced these products are the latest in a long history of technology transfers from research labs within Microsoft into the products the company ships.

“Our research labs and efforts across the company have created a large portfolio of innovative technologies that extend the reach of personal computing today, with much of it going into Microsoft products,” says Rick Rashid, senior vice president of Microsoft Research. “Our researchers are here to push ahead the state of the art in computer science. When we have great ideas that work, we strive to move those ideas and technologies into Microsoft products as rapidly as possible.”

Fostering rapid, smooth technology transfer through deep relationships with Microsoft product groups has been a top priority for Microsoft Research from its 1991 creation, and the upcoming launches of SQL Server 2005, Visual Studio 2005, and BizTalk Server 2006 are no exception. A dedicated technology-transfer team bridges the long-range research and near-term product-development functions within Microsoft, and that team focuses on building strong, collaborative partnerships between researchers and product teams. The result is a shared vision: seeing innovative research work reflected in improved software products for customers.

“Overall, the collaboration between Microsoft Research and the product teams worked extremely well,” says Scott Wiltamuth, a product unit manager who is part of the team that worked on Visual Studio 2005. “We directly applied research work to our product efforts, with great success from both research and product perspectives.”

Such success is the result of intensive collaboration between researchers and product groups. Asked about the scope of his team’s collaborative involvement with the SQL Server 2005 team, David Heckerman, research-area manager, replies: “Many days, weeks, and months of meeting, designing, polishing, and testing.”

It’s not quite as simple as tossing researcher-devised code over a transom to a product team and saying, “Hey, implement this, OK?”

“Tech transfer,” states Paul Larson, senior researcher, “is not necessarily code. It’s more about sharing and communicating ideas and algorithms than actual implementation.”

Those tactics seem to be working.

“I only wish all cross-group collaborations worked so well,” says Jamie MacLennan, a development manager for the SQL Server 2005 team.

A few examples of successful technology transfers from Microsoft Research into the about-to-be-released products:

Data Mining and Visualization

Heckerman and his team, the Machine Learning and Applied Statistics Group within Microsoft Research, have been working on using graphical models for data analysis and visualization. Their continuing study of how data can be analyzed more effectively is evident in the list of technologies the team has contributed to the forthcoming SQL Server 2005: Decision Trees, Clustering, Sequence Clustering, and Time Series.

Those technologies are the result of a long series of innovative research projects, an example of Microsoft placing an extra-long-term bet and seeing it pay off with new, useful features for customers.

“Some of the technologies in SQL Server 2005 began in 1992 as pure research,” Heckerman says. “We picked up practical insights as we went along, and many of those insights made it into this product.”

The technologies Heckerman’s team provided to SQL Server 2005 are based on graphical modeling and Bayesian statistical methods. “There’s a statistical algorithm component and a visualization component,” Heckerman says. “It doesn’t do much good to find patterns in your data if you can’t understand them.”
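
The article doesn’t spell out the shipping algorithms, but the Bayesian idea Heckerman describes can be sketched in miniature. The toy C# classifier below (hypothetical code, far simpler than anything in SQL Server 2005) applies the naive Bayes approach, the simplest of the Bayesian statistical methods: it counts feature values per class, then uses Bayes’ rule to score which class best explains a new row.

    using System;
    using System.Collections.Generic;

    // Toy naive Bayes classifier: learns P(class) and P(feature = value | class)
    // from labeled rows, then scores a new row with Bayes' rule.
    class NaiveBayes
    {
        private Dictionary<string, int> classCounts = new Dictionary<string, int>();
        private Dictionary<string, int> featureCounts = new Dictionary<string, int>();
        private int total;

        public void Observe(string label, string[] features)
        {
            total++;
            Increment(classCounts, label);
            for (int i = 0; i < features.Length; i++)
                Increment(featureCounts, label + "|" + i + "=" + features[i]);
        }

        // Returns the label maximizing P(label) * product over i of
        // P(feature_i | label), with add-one smoothing so an unseen
        // feature value does not zero out a whole class.
        public string Predict(string[] features)
        {
            string best = null;
            double bestScore = double.NegativeInfinity;
            foreach (KeyValuePair<string, int> c in classCounts)
            {
                double score = Math.Log((double)c.Value / total);
                for (int i = 0; i < features.Length; i++)
                {
                    int n;
                    featureCounts.TryGetValue(c.Key + "|" + i + "=" + features[i], out n);
                    score += Math.Log((n + 1.0) / (c.Value + 2.0));
                }
                if (score > bestScore) { bestScore = score; best = c.Key; }
            }
            return best;
        }

        private static void Increment(Dictionary<string, int> d, string key)
        {
            int n;
            d.TryGetValue(key, out n);
            d[key] = n + 1;
        }

        static void Main()
        {
            NaiveBayes nb = new NaiveBayes();
            nb.Observe("buys", new string[] { "age30-40", "income-high" });
            nb.Observe("buys", new string[] { "age30-40", "income-medium" });
            nb.Observe("skips", new string[] { "age20-30", "income-low" });
            Console.WriteLine(nb.Predict(new string[] { "age30-40", "income-low" })); // prints "buys"
        }
    }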

Making data patterns visible, and thus helping customers make better business decisions, gives Heckerman a lot of satisfaction.

“My favorite part is the graphical-modeling interface, which connects math and statistics with the human mind,” he says. “Our tools translate complex patterns into a visualization that people are very good at understanding. It’s a two-way interface between human intuition and probability theory.”

SQL Server 2005 is a much enhanced product as a result.

“Our collaboration with Microsoft Research has brought data mining to every SQL Server user,” says MacLennan of the SQL Server 2005 team. “Data mining in the past has been a highly expensive, ‘white-coat’ technology, accessible to only the few with the technical training required to use it. Due to this, data-mining technology, although extremely valuable to almost any firm, has been out of reach for all except the few willing to take the chance that their investment will return.

“By taking the work of Microsoft Research and turning it into a commercial product in SQL Server 2005, we have dramatically lowered the bar for our users to take advantage of data mining—allowing them to fully exploit their data resources in order to improve their businesses, experiments, or any other situation where they maintain or collect data.”

The technologies provided by Heckerman and his team—which includes Max Chickering, Chris Meek, Bo Thiesson, Jesper Lind, Alexei Bocharov, and Carl Kadie—also enable SQL Server 2005 to help customers peer into the future.

“There’s a predictive component with using Time Series,” Heckerman says. “You can look at business data, at sales data, and extend the patterns into the future. Database tools traditionally have been used to store and look at data, but with SQL Server 2005, users can look into the future.”
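
How might a database extend a pattern into the future? One classical building block is an autoregressive model, in which each value is predicted from the values before it. The sketch below (illustrative C#, not the product’s algorithm) fits y(t) = a * y(t-1) + b to a short sales series by least squares, then iterates the fitted equation forward three periods.

    using System;

    // Fit y(t) = a * y(t-1) + b to a series by ordinary least squares,
    // then iterate the fitted equation to forecast future periods.
    class ArForecast
    {
        static void Main()
        {
            double[] sales = { 100, 104, 109, 115, 121, 128 };
            int n = sales.Length - 1;              // number of (y(t-1), y(t)) pairs
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int t = 1; t < sales.Length; t++)
            {
                double x = sales[t - 1], y = sales[t];
                sx += x; sy += y; sxx += x * x; sxy += x * y;
            }
            double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double b = (sy - a * sx) / n;

            // Extend the pattern three periods into the future.
            double last = sales[sales.Length - 1];
            for (int step = 1; step <= 3; step++)
            {
                last = a * last + b;
                Console.WriteLine("t+" + step + ": " + last.ToString("F1"));
            }
        }
    }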

MacLennan confirms that the collaboration between the SQL Server 2005 team and Microsoft Research was extensive.

“Far from a simple technology transfer,” he says, “David’s team was actively involved in the design and development of the SQL Server data-mining product from day one through completion. Their input and insight into the variety of issues we ran into allowed us to provide one of the most compelling data-mining offerings in the market today.

“Microsoft Research and the SQL Server team also have continued to work as ambassadors for the technology and the product, educating and evangelizing data mining and SQL Server, directly contributing to customer deployments where such solutions would not have been considered prior to our collaboration.”

Heckerman agrees that the teams worked well together:

“The SQL data-mining team has been fantastic in recognizing that these innovations are useful. They’ve done a tremendous job integrating them into a product that everyone can use, and they have given us great feedback that has helped in our research.”

Another of Heckerman’s passions is working on halting the spread of AIDS. The data-mining and visualization technologies being implemented in SQL Server 2005 can help there, too. “This technology can be used in both business and scientific computing,” says Heckerman, who is using the same advanced techniques in an effort to overcome roadblocks in the hunt for an HIV vaccine.

All in all, it’s been quite a rewarding effort.

“I’ve had great success using these technologies over the years,” Heckerman says. “It’s great to know that, soon, everyone will have access to them.”

Generics: Reliable, Efficient, Flexible

Generics, technology devised in Microsoft Research’s Cambridge lab in the United Kingdom, constitute an extension to the .NET Common Language Runtime (CLR) that enables object-oriented code to be annotated with parameters that indicate how the code can be reused in different ways. The Visual Studio 2005 team found the technology invaluable.

“Generics are the most important language feature in C# 2.0,” says Wiltamuth, of the Visual Studio 2005 team. C# (pronounced “C-sharp”) is an object-oriented programming language designed for building a wide range of enterprise applications that run on the Microsoft .NET Framework.

“For customers,” he adds, “generics provide the benefit of strongly typed collections without per-collection hand coding. The result is better reuse, less code to accomplish the same task, increased developer productivity, and high performance.

“In addition, generics are a key building block for the Language Integrated Query (LINQ) work that we demonstrated in tech-preview form for Visual C# and Visual Basic during the Professional Developers Conference 2005. Generics deliver value for customers today in C# 2.0 and are a key building block for our future C# 3.0 work.”

Generics let developers reuse and write efficient, flexible, reliable code components, including those provided with the .NET Framework 2.0. Generics metadata is understood by Visual Basic .NET, Visual C#, Visual C++, Visual J#, and other .NET language compilers, and generic code is executed efficiently by the CLR.
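
The benefit is easy to see in a short C# 2.0 sketch (illustrative code, not drawn from the product): an untyped collection defers type errors to run time, while a generic one catches them at compile time, and a single generic method replaces a family of hand-coded overloads.

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class GenericsDemo
    {
        static void Main()
        {
            // Before generics: ArrayList holds object, so a type error
            // compiles fine and only fails at run time, and value types
            // are boxed on every insertion.
            ArrayList untyped = new ArrayList();
            untyped.Add(42);
            untyped.Add("oops");            // allowed; blows up later on a cast

            // With generics: the element type is checked at compile time,
            // and ints are stored without boxing.
            List<int> typed = new List<int>();
            typed.Add(42);
            // typed.Add("oops");           // compile-time error
            int first = typed[0];           // no cast needed

            Console.WriteLine(Max<int>(3, 7));        // 7
            Console.WriteLine(Max<string>("a", "b")); // b
        }

        // One generic method replaces a family of hand-coded overloads.
        static T Max<T>(T x, T y) where T : IComparable<T>
        {
            return x.CompareTo(y) >= 0 ? x : y;
        }
    }

Because List<int> stores its elements directly rather than as boxed objects, the generic version also avoids allocation overhead, part of the performance gain Wiltamuth mentions.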

“The design is widely recognized as being one of the key technical features of .NET 2.0,” says Don Syme, Cambridge-based researcher. “Over the coming years, we believe generics will positively affect the working lives of hundreds of thousands of developers.”

The generics project had its genesis in 1999, when Syme presented the white paper “Language Innovation for COM+ 2.0” to Microsoft’s Developer Division. From 1999 to 2002, Syme and fellow researcher Andrew Kennedy were responsible for the design and the prototype implementation of generics in an experimental version of the .NET CLR, based on the product team’s internal version.

At about the same time, Syme also wrote and implemented the initial version of the C# language specification for generics, and Kennedy wrote the initial drafts of what eventually will become the generics specification in the ECMA International Common Language Infrastructure standard. In 2002, the Developer Division made an internal commitment to move toward shipping generics in its next release and agreed to have the code written by Kennedy and Syme integrated directly into the Visual Studio 2005 development tree. Syme, Kennedy, and colleague Claudio Russo had primary responsibility for the design, the development, and the testing of the core CLR support for generics throughout 2003 and 2004, and they remain engaged in design discussions for future releases.

“Microsoft Research was instrumental in shaping the design and implementation of generics for .NET Framework 2.0 and C# 2.0,” Wiltamuth says. “Don Syme’s team worked independently on a prototype CLR and C# compiler while the product teams were dedicated to the Visual Studio 2002 and Visual Studio 2003 releases. This effort expanded when Visual Studio 2003 was released, and the research effort turned into a product effort involving the CLR, Visual C#, Visual Basic, Visual C++, and Visual J#. The Microsoft Research work did not end with the prototype and related academic papers; the researchers drove the design and the implementation of generics in the CLR.”

Microsoft Research’s Cambridge-based Programming Principles and Tools group made it happen.

“Implementing generics was deeply challenging, requiring significant alterations to core components under extremely rigorous reliability criteria,” Syme says. “Generics are a great example of the potential for technology transfer from Microsoft Research into the Developer Division, and the project has led to many excellent, ongoing opportunities for interaction among researchers, architects, developers, and upper management.”

Indexed Views

For Paul Larson, senior researcher, delivering fast, reliable query results is like preparing dinner: It’s all in the planning.

Larson supplied indexed-view-matching technology for the impending SQL Server 2005 release. Indexed views can provide massive improvements in query processing: the results of a view are computed ahead of time and stored, and when a new query arrives, the optimizer checks whether a stored view can supply some or all of the requested data. Answering a query from a precomputed result is significantly faster than computing it from scratch.

“Think of it as pre-cooking a meal instead of preparing it from scratch,” Larson says. “You can cook potatoes for 20 minutes, or you can use potatoes you already have prepared ahead of time.

“In view matching, you get a query, and you need to figure out if part or all of it can be computed faster by using some pre-cooked materials you already have. The challenge is to do this very quickly, even when there are lots of options.”
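
Larson’s “pre-cooked materials” can be sketched conceptually. In the hypothetical C# below (a toy, nothing like the optimizer’s actual view-matching machinery), an aggregate is materialized ahead of time, and requests that match it are answered from the stored result instead of being recomputed from the base rows.

    using System;
    using System.Collections.Generic;

    // Toy "view matching": an aggregate is materialized once, and
    // requests that match it are served from the stored result.
    class ViewMatcher
    {
        // Materialized view: total sales per region, computed ahead of time.
        private Dictionary<string, double> salesByRegion =
            new Dictionary<string, double>();

        public void Materialize(KeyValuePair<string, double>[] baseRows)
        {
            foreach (KeyValuePair<string, double> row in baseRows)
            {
                double total;
                salesByRegion.TryGetValue(row.Key, out total);
                salesByRegion[row.Key] = total + row.Value;
            }
        }

        // A request for a region's total "matches" the stored view,
        // so it is answered without rescanning the base rows.
        public bool TryAnswer(string region, out double total)
        {
            return salesByRegion.TryGetValue(region, out total);
        }

        static void Main()
        {
            ViewMatcher m = new ViewMatcher();
            m.Materialize(new KeyValuePair<string, double>[] {
                new KeyValuePair<string, double>("West", 120.0),
                new KeyValuePair<string, double>("West", 80.0),
                new KeyValuePair<string, double>("East", 50.0) });

            double total;
            if (m.TryAnswer("West", out total))
                Console.WriteLine("West total, from the view: " + total); // 200
        }
    }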

Such query optimization, based on a view-matching algorithm produced by Larson and fellow researcher Jonathan Goldstein, has found a home in SQL Server 2005.

“Paul is a great sounding board,” says Cesar Galindo-Legaria, a SQL Server development lead who works on relational query optimization. “His work influenced significantly the improvements on indexed-view matching we put in place for SQL Server 2005, which had enhanced functionality and efficiency over SQL Server 2000.”

Larson and Goldstein also contributed technologies to the upcoming version of SQL Server that improve the product’s cardinality estimation (its estimate of how many rows a query step will return) and that deliver faster, more reliable hash functions and partitioning.

In addition, they wrote an algorithm for efficient, scalable hash partitioning, although that work had to be scaled back because of scheduling constraints.

“We do plan to use this work when we enhance the feature in the future,” Galindo-Legaria says. “I have been in close contact with Paul over the past few months, looking into various areas of interest for query optimization. I expect some of the results of the collaboration to be part of future releases.”
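
The basic mechanics of hash partitioning, though not the researchers’ scalable algorithm, are easy to illustrate: hash each row’s key and route the row to a partition, so equal keys always land together and partitions can be processed independently or in parallel. A minimal, hypothetical C# sketch:

    using System;
    using System.Collections.Generic;

    class HashPartitionDemo
    {
        static void Main()
        {
            string[] keys = { "order-1", "order-2", "order-3", "order-4", "order-5" };
            int partitionCount = 3;
            List<string>[] partitions = new List<string>[partitionCount];
            for (int i = 0; i < partitionCount; i++)
                partitions[i] = new List<string>();

            // Route each row to a partition by hashing its key. Equal keys
            // always land in the same partition, so each partition can be
            // joined or aggregated independently.
            foreach (string key in keys)
            {
                int p = (key.GetHashCode() & 0x7fffffff) % partitionCount;
                partitions[p].Add(key);
            }

            for (int i = 0; i < partitionCount; i++)
                Console.WriteLine("partition " + i + ": "
                    + string.Join(", ", partitions[i].ToArray()));
        }
    }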

For Larson, it’s the view-matching part he savors.

“The part I’m most proud of,” he smiles, “is the view-matching algorithm. It does a very good job. It’s plenty fast and scales to lots of views.”
