SQL Server 2000: Data Mining Helps Customers Make Better Business Decisions

REDMOND, Wash., April 24, 2000 — Microsoft today announced the public beta availability of SQL Server 2000, its comprehensive database and analysis offering for building and deploying scalable e-commerce, line-of-business and data warehousing solutions. Along with other sophisticated features, SQL Server 2000 integrates for the first time new data-mining tools designed to help customers make better, faster business decisions. To explain data mining technology and what it means to businesses and developers, PressPass turned to Amir Netz, architect and development manager of Analysis Services in SQL Server 2000.

PressPass: What exactly is data mining?

Netz: In simple terms, data mining is the computerized art of discovering knowledge by examining data. Data-mining tools “dig through” large volumes of data, such as tables of product purchases or visits to Web sites, in an effort to uncover patterns, rules or relationships contained within business transactions. Using special mathematical formulas called data-mining algorithms, a computer can extract knowledge and reach conclusions that are beyond simple human analysis.

PressPass: How can people use the data-mining features of SQL Server 2000?

Netz: Data mining can be used everywhere, from e-commerce to retail, telecommunications, banking, insurance and health care. In a Web scenario, for example, companies can sift through data to figure out what type of content, ads or cross-sales opportunities might interest visitors to their sites, and then use that information to personalize the Web for customers. Insurance companies can browse claims and predict the likelihood of fraud, depending on specific attributes and historical experience. Banks might use data mining to determine whether certain loans should be approved based on applicant characteristics.

In fact, the applicability of the technology is so wide that it can benefit practically any business with a large database — it helps humans gain insight into data that would otherwise be too vast or complex.

PressPass: What implications does data mining have for e-commerce?

Netz: Data mining is especially compelling for e-commerce applications. Because Web site visitors interact with computers, e-commerce companies can very easily collect a huge body of information and a rich set of data that provide a “360-degree view” of their customers. This data — which pages customers visit, which options they select on a page, which transactions they perform — is typically compiled in a relational database. Mining this collected data will likely cause intriguing patterns and rules to emerge, which in turn can drive business decisions such as product promotions or bundling strategies. The next step would be to apply this set of rules to customers in real time. In this way, companies can personalize and improve the customer’s Web experience and at the same time increase their revenue potential. Microsoft has embodied this approach to customer interaction on the Web in an initiative called Business Internet Analytics, which relies on both SQL Server 2000 and Commerce Server 2000 for these kinds of closed-loop interactions.

PressPass: What’s the benefit to companies that use the data-mining tools in SQL Server 2000?

Netz: Companies gain valuable understanding of the hidden patterns and relationships in their data. These are things that only a computer can identify because of its ability to examine hundreds of thousands of variables, evaluate a staggering combination of attributes, weigh various outcomes, analyze rules and apply statistical methods. Generating this type of knowledge can help companies make better decisions, automate their decision process, identify and avoid problems, discern business opportunities and optimize their systems. The overall advantage is that companies can increase their top line by selling more, and decrease their expense line by identifying unnecessary overhead and costs.

PressPass: What is unique about Microsoft’s data-mining offering?

Netz: SQL Server 2000 offers two benefits in terms of data mining that are unique in the industry. First, it provides a comprehensive analysis platform that fully integrates with relational databases and Online Analytical Processing (OLAP) databases. Second, it’s designed to be extremely intuitive and easy to use for database administrators (DBAs) and developers.

PressPass: What is Microsoft doing to help the developer community take advantage of data-mining technology?

Netz: Our goal is to transform data mining from a niche technology to a mainstream technology. This can be done only if enough application developers find it approachable. As we integrated data mining into SQL Server 2000, we built a set of interfaces that capitalize on what developers already know. If they understand the SQL language and know how to program in Visual Basic and use ADO [a popular data access interface], they can become data-mining developers within hours, with no prior knowledge of the technology, advanced statistics or mathematics.

We believe that the ability to marry data-mining concepts with relational and OLAP (multidimensional) databases is a huge advantage to commercial and corporate developers because they can easily integrate data mining into their database applications in a natural way, with a flat learning curve. This ensures quick time to market and rapid deployment. We hope that this fast, easy route to data mining will expose hundreds of thousands of developers to this exciting new technology.

PressPass: What data-mining tools are included in SQL Server 2000?

Netz: We support two classes of data-mining techniques called clustering and decision trees. The corresponding typical uses for customers would be segmentation (for clustering) and classification (for decision trees). Both classes can also offer valuable data-prediction capabilities.
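
[Editor's illustration: To make the classification idea concrete, the following is a minimal Python sketch of a one-level decision tree (a "decision stump"). It is not the algorithm that ships with SQL Server 2000, which the interview does not describe in detail. The function name best_stump, the visitor records and the attribute names are all hypothetical, and real decision-tree algorithms typically grow deeper trees using measures such as entropy rather than raw accuracy.]

```python
from collections import Counter, defaultdict


def best_stump(rows, label_key):
    """Pick the single attribute whose values best separate the labels.

    Returns (attribute name, {attribute value: predicted label}).
    This is a one-level tree chosen by training accuracy, for illustration only.
    """
    attributes = [key for key in rows[0] if key != label_key]
    best = None
    for attr in attributes:
        # Count how often each label occurs under each value of this attribute.
        groups = defaultdict(Counter)
        for row in rows:
            groups[row[attr]][row[label_key]] += 1
        # Each branch predicts its majority label; count the rows it gets right.
        correct = sum(c.most_common(1)[0][1] for c in groups.values())
        rule = {value: c.most_common(1)[0][0] for value, c in groups.items()}
        if best is None or correct > best[1]:
            best = (attr, correct, rule)
    return best[0], best[2]


# Hypothetical historical records of Web site visitors (made up for this sketch).
history = [
    {"returning": "yes", "region": "west", "bought": "yes"},
    {"returning": "yes", "region": "east", "bought": "yes"},
    {"returning": "no", "region": "west", "bought": "no"},
    {"returning": "no", "region": "east", "bought": "no"},
    {"returning": "yes", "region": "west", "bought": "yes"},
    {"returning": "no", "region": "east", "bought": "yes"},
]

attribute, rule = best_stump(history, label_key="bought")
print(attribute, rule)
```

[In this made-up data, splitting on "returning" classifies five of the six records correctly, while splitting on "region" classifies only four, so the sketch selects "returning" as the deciding attribute.]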

PressPass: How did Microsoft decide which data-mining tools to include in SQL Server 2000?

Netz: We determined which data-mining algorithms would be the most popular and useful, especially applied to e-commerce, and we focused on making those very easy to use and highly integrated with both relational and OLAP technologies. As a result of this focus, SQL Server will ship with a scalable clustering algorithm for segmentation and a scalable decision-tree algorithm for classification. More algorithms are under development.

PressPass: What is unique about the data-mining algorithms in SQL Server 2000?

Netz: These are cutting-edge algorithms that have been in research and development for up to five years at Microsoft Research. Some of the world’s best minds in data-mining research work here, and this team developed algorithms that have proved to be extremely scalable, highly accurate and high performing. Our data-mining algorithms were also optimized, tested and validated by the academic community before they went to our product team. This is a great example of how the immense investment that Microsoft has made in its research labs is becoming fruitful in terms of incorporating technology into products.

PressPass: Will you support third-party data-mining solutions?

Netz: Absolutely. In fact, we encourage data-mining vendors to plug their algorithms into our extensible environment and enjoy the platform benefits of security, client/server computing, scalability, user interface and so forth. Because third-party support was imperative to us, we began working with a consortium of data-mining vendors in May 1999 to define a specification called OLE DB for Data Mining. Numerous vendors have already adopted this interface.

PressPass: Can you give us an example of segmentation?

Netz: Say a company is looking at a huge volume of data, such as tens of millions of records of visitors to its Web site. To the human brain, the records are just raw data, but data-mining algorithms can distill patterns by grouping records together based on similar attributes. Soon, distinct customer profiles emerge from these clusters of data. As companies gain the ability to segment and identify certain “populations” according to specific distinguishing characteristics or similar buying behaviors, they can begin to present information in customized ways. For example, an online store could recommend products, serve up Web pages or display ads based on customer interests and needs.
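
[Editor's illustration: The grouping of records by similar attributes can be sketched with plain k-means clustering in Python. This is an assumption chosen for illustration; the interview does not say which clustering method SQL Server 2000 uses. The kmeans function, the two visitor attributes and the starting centroids are all hypothetical.]

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return assignments, centroids


# Hypothetical visitor records: (pages viewed per visit, average order value).
visitors = [(2, 10.0), (3, 12.0), (2, 11.0), (20, 150.0), (22, 160.0), (21, 155.0)]

# Seed the two clusters with two of the records (an arbitrary choice).
segments, centers = kmeans(visitors, centroids=[visitors[0], visitors[3]])
print(segments)
print(centers)
```

[With this made-up data, the first three visitors fall into one segment (light browsers with small orders) and the last three into another (heavy browsers with large orders), which is the kind of "population" an online store might then target with different content.]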

PressPass: Can you give us an example of prediction?

Netz: Prediction can work hand in hand with segmentation. Using data-mining algorithms, a company could examine historical data to identify which products or services past customers bought, their credit status, age, gender, city and so forth, then use that knowledge to predict the buying preferences of new customers with similar characteristics. Prediction is commonly used for credit-risk analysis and market-basket analysis [the practice of analyzing which products are commonly purchased together].
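
[Editor's illustration: Market-basket analysis can be sketched in Python by counting which pairs of products appear together and how reliably one implies the other. This is a simplified pair-counting approach assumed for illustration, not a description of any SQL Server 2000 feature. The pair_rules function, the min_support threshold and the baskets are hypothetical.]

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): confidence} for item pairs bought together often enough.

    support    = share of all baskets that contain both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # De-duplicate and fix the pair ordering.
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:
            rules[(a, b)] = count / item_counts[a]  # "customers who buy a also buy b"
            rules[(b, a)] = count / item_counts[b]  # "customers who buy b also buy a"
    return rules


# Hypothetical purchase histories (made up for this sketch).
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
]

rules = pair_rules(baskets)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} -> {b}: {confidence:.2f}")
```

[In this made-up data, only bread and butter appear together in at least half of the baskets. Every butter buyer also bought bread, while two of the three bread buyers also bought butter, so a store might reasonably promote bread to a new customer who puts butter in the cart.]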

PressPass: What else has Microsoft done to make SQL Server 2000 a better overall database solution?

Netz: We’ve focused on several key areas of improvement. First, SQL Server 2000 is fully Web-enabled through XML support. Second, we added scalability and reliability features for e-commerce, line-of-business and data-warehousing solutions. Third, we incorporated enhanced ease-of-use features such as setup wizards, familiar metaphors and intuitive development tools so that solutions can be put in place quickly.

PressPass: When will SQL Server 2000 be available?

Netz: SQL Server 2000 is scheduled to ship this summer.
