Ted Kummert: Microsoft Business Intelligence Conference

Remarks by Ted Kummert, Corporate Vice President, Data and Storage Platform Division
Microsoft Business Intelligence Conference
Seattle, Wash.
Oct. 6, 2008

Editor’s note – Nov. 26, 2008 –
The company name Dundas Software is corrected in this transcript.

TED KUMMERT: (Applause.) Thank you. And welcome to Seattle, welcome to the BI conference. It’s my pleasure to be here talking with you today.

I was reflecting this morning on last year’s BI conference. It was last May. I was about five months in this role and this was actually my first major speaking engagement. It was also the place where we came to tell the world what SQL Server 2008 was going to be and when we were going to deliver it. So I’m very excited to be back here again with you, and I’m very excited to be back here again having shipped SQL Server 2008 on our scheduled commitment. Otherwise, you might be speaking to somebody else. (Applause.)

I want to talk to you about a couple of journeys, and a couple of journeys that intersect. Last year, I talked about my team’s mission of building a platform for all data, and we continue to pursue that mission. All data is all types of data. It also means a platform that runs across all tiers, from the desktop and the mobile platform to the data center and the cloud. And it seeks to provide a common platform and rich services to enable you to get value out of your data. That’s one journey we’re going to talk about.

The second one is toward a notion of people-ready BI, and essentially, this is BI that works for everyone in the organization. And that’s from an end user perspective in terms of enabling end users to get their questions answered more directly, but that’s also speaking to everybody in their role from IT to compliance. These two journeys toward people-ready BI and towards the all-data platform clearly intersect.

So we’re going to talk about those two journeys. We’re going to check in a little bit on SQL Server 2008 and where we are so far, and then we’re going to talk about two releases that we’ll deliver in the first half of calendar year 2010. They are incremental releases that build around SQL Server 2008 and deliver functionality in two key areas: the first is data warehousing, taking SQL Server to data warehouses of all sizes, and the second is end user empowerment.

Guy just talked about self-service BI and how we’ll enable it; that’s what one of these release vehicles is about. And we won’t just talk about it: we’re going to let the code speak in both of those areas by demonstrating a little of what we have so far. So let’s get going.

So people-ready BI. It’s about BI that works for everyone. There’s a clear end user aspect to this. As far as we’ve come as an industry, there are still too many questions that require some level of indirection. I need to go ask somebody else for help if I want to get a business question answered. That could be because I need a new model built, I need to do some new analysis, I need a modification to an existing analysis model. I need a new report, I need a modification to an existing report. But every time I have to go ask and somebody else needs to go do work, that costs time and that costs money.

The ultimate vision is about providing solutions that enable end users, in the tools they use every day, whether that’s Office and Excel or capability embedded in the line-of-business applications they use to do their jobs, to get their questions answered more directly out of the data they have.

The second part of it is about the infrastructure. Everybody wants to be able to trust the data, trust the integrity of the data, understand the lineage. And the infrastructure needs to fit in with the manageability and compliance policies of the overall organization. People-ready BI is about providing a solution that speaks to end users, speaks to IT, speaks to compliance. It’s about empowering end users and providing an infrastructure that will enable IT to manage data and manage the infrastructure end to end.

My team’s mission is to build what we call the platform for all data. All data means all types of data, be that structured, semi-structured, or unstructured; it means the full life cycle of data, from birth to archive; and it means running across all tiers, from the edge to the mobile platform and the desktop, in the data center, on the server, and in the cloud. And it means providing a common platform and a rich set of services to enable you to get value out of the data.

We talk about four pillars every time we think about a new product that advances this vision, and we made substantial progress in each of these areas with SQL Server 2008. The first pillar, the enterprise data platform, is our recognition that you bet your business on the solutions you build around our platform. It has to deliver on the abilities (security, availability, performance), and it has to provide all of that within a framework of low total cost of ownership.

Yes, even today — especially today — total cost of ownership really matters. In SQL Server 2008, we’ve made significant investments here, whether it was in performance, whether it was in reliability and scalability of the product.

A few of my favorite features are in the area of security. One is called transparent data encryption, and I like it a lot because it’s exactly what it says it is: it’s completely transparent to existing applications. So, turn on transparent data encryption, and your existing application now gets the benefit of an encrypted store and encrypted backups. Making security simple is another way to help people secure their data.

Another feature was All Actions Audited, which makes it easy to track all activity within the database, and to do so with low overhead; that’s something that enables compliance solutions within the organization.

The second pillar is one we call beyond relational, taking the database beyond facts and figures to sights and sounds, because there are new types of data you want to use and can get value from in your business processes and applications.

In SQL Server 2008, one area was the addition of a set of spatial types, geometry and geography. Think of any application or dataset with a location that could get value out of this type of feature by doing analysis in a spatial domain.
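To make analysis in a spatial domain concrete, here is a minimal, stand-alone sketch of a location query: find which stores fall within a given radius of a point. In SQL Server 2008 this kind of work is done inside the database with the geography type; the sketch instead uses the spherical haversine formula as a simplified illustration, not how SQL Server computes distances, and the store names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical store locations (latitude, longitude)
stores = {"Seattle": (47.61, -122.33), "Portland": (45.52, -122.68), "Boise": (43.62, -116.20)}

def stores_within(lat, lon, radius_km):
    """Names of stores within radius_km of the given point, sorted alphabetically."""
    return sorted(
        name for name, (slat, slon) in stores.items()
        if haversine_km(lat, lon, slat, slon) <= radius_km
    )

print(stores_within(47.61, -122.33, 300))
```

A database spatial index would avoid computing the distance to every row; the linear scan here is only for clarity.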

Another type we introduced was FILESTREAM, making it efficient to deal with high volumes of unstructured data, and to do so within the database programming model.

Think about imaging data and enhancing a whole class of applications by being able to use that in a performant way within that same programming model.

The third area is about time to solution. We need to deliver a solution with low TCO, but it’s also about the cost of building the solution, and enabling developers to build richer and richer solutions.

An area of innovation we delivered across Visual Studio, .NET 3.5, and SQL Server 2008 was the Entity Data Model, effectively raising the level of abstraction to the business concept level, so you can code against entities like customer and product while the model takes care of mapping to the underlying physical infrastructure.

Another area was Language Integrated Query (LINQ), making it efficient to work with collections of data within the code. These two things come together to enable not only quicker time to solution, but richer solutions.

Then the last area is Pervasive Insight. And this is about enabling end users to get their questions answered more directly. One place this starts is in the infrastructure, the data warehouse. The data warehouse is the hub of most BI solutions within companies around the world. It’s where the business schema comes together, and out from there extends the entire BI solution.
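As a rough analogy only (Python has neither the Entity Data Model nor LINQ), the sketch below shows the idea of querying at the level of business entities: classes stand in for entities such as customer and order, and a comprehension stands in for a language-integrated query. The class names, fields, and data are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    product: str
    amount: float

@dataclass
class Customer:
    name: str
    region: str
    orders: list = field(default_factory=list)

# Hypothetical in-memory entities; a real model would map these to tables.
customers = [
    Customer("Contoso", "West", [Order("Bikes", 1200.0), Order("Helmets", 300.0)]),
    Customer("Fabrikam", "East", [Order("Bikes", 800.0)]),
    Customer("Northwind", "West", [Order("Gloves", 50.0)]),
]

# Query in business terms: total order amount per western customer, largest first.
# No tables, keys, or joins appear in the query itself.
west_totals = sorted(
    ((c.name, sum(o.amount for o in c.orders)) for c in customers if c.region == "West"),
    key=lambda pair: pair[1],
    reverse=True,
)
print(west_totals)
```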

We invested in no area of SQL Server 2008 more than in data warehouse scale, because of this critical function within the end-to-end BI solution. It is an area where we’ve made a lot of progress. With SQL Server 2005, we like to say, we went beyond the data mart to the core data warehouse. We’re no longer just the solution for the surrounding marts.

We’ve got several customers with data warehouses of tens of terabytes and thousands of concurrent users on SQL Server 2005. And in SQL Server 2008, we’ve got many features that are going to take that to the next level. That’s in scaling storage, with features such as data compression. That’s in scaling performance, with features such as Star Join optimization, improvements to partitioned table parallelism, and improvements in the ETL pipeline such as Change Data Capture, all oriented around improving performance in the data warehouse.

The third is scaling end-user concurrency, with a feature we call Resource Governor. Resource Governor basically lets you divide up resources by user across a large workload. The example here is preventing runaway queries from reducing the effective availability of the data warehouse. We’re taking big steps in the core, and that core needs to interconnect, so we’ve included line-of-business connectivity to Teradata, SAP, and Oracle to enable you to extend that solution in a heterogeneous environment.

The last area was visualization. We announced at the last BI conference that we’d acquired some controls from a company called Dundas Software. And in SQL Server 2008 we delivered rich charts and gauges in the SQL Server product.
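Resource Governor works by classifying each incoming session into a workload group, whose resource pool can cap its share of resources such as CPU. The sketch below mimics only that classification step in plain Python; the group names, login prefix, application name, and CPU caps are hypothetical, and in SQL Server itself the classifier is a user-defined function you register, not Python code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadGroup:
    name: str
    # Hypothetical simplification: in SQL Server the CPU cap lives on the resource pool.
    max_cpu_percent: int

# Hypothetical groups; real groups and pools are created with T-SQL.
GROUPS = {
    "reporting": WorkloadGroup("reporting", 60),
    "adhoc": WorkloadGroup("adhoc", 20),
    "default": WorkloadGroup("default", 100),
}

def classify(login: str, app_name: str) -> WorkloadGroup:
    """Assign a new session to a workload group, as a classifier does at connection time."""
    if app_name == "Report Server":
        return GROUPS["reporting"]
    if login.startswith("analyst_"):
        # Ad hoc analyst queries are the likeliest runaways, so they get the tightest cap.
        return GROUPS["adhoc"]
    return GROUPS["default"]

print(classify("analyst_kim", "Excel").name)  # adhoc
```

Because classification happens once per session, a runaway ad hoc query is confined to its group’s share instead of starving the reporting workload.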

So we released SQL Server 2008 to manufacturing in August. So far, we’ve seen unprecedented levels of interest and adoption. We’ve never seen this level of downloads at this stage of a release. We’ve got numerous customers in production around the world on applications ranging from SAP systems to mission-critical OLTP applications to large data warehouses.

We’ve gotten a lot of positive feedback from customers on the fit and finish and the feature set that we’ve provided. Another area where we got positive feedback was on the decision we didn’t make. We didn’t decide to increase prices. We didn’t take our new features, carve them out into expensive options. We didn’t decide just to increase prices in the middle of open water just because.

We continued with the same pricing that we had with SQL Server 2005. In the area of decision support and business intelligence, we set some benchmarks as well. In ETL, we set the world record for a one-terabyte data load, loading it in just under 30 minutes and beating the prior world record, held by Informatica and Oracle, of 45 minutes.

And then, correlating with our investments in data warehouse scale, we posted our first 10-terabyte TPC-H result with Hewlett-Packard. And in the industry, we’ve seen great recognition of our leadership in both data warehousing and BI.

Now, to bring this to life a little bit, I’d like to invite one of you to come to the stage to talk about your experience with the Microsoft BI platform and SQL Server 2008. So I’d like you to join me in welcoming Mark Chafin, chief information officer of Acosta, to the stage. (Applause.) Welcome, Mark.
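The record can be expressed as throughput. Assuming a decimal terabyte (1,000 GB) and treating 30 minutes as an upper bound on the new time, the arithmetic below gives at least about 0.56 GB/s for the new result versus about 0.37 GB/s for the prior one, a speedup of at least 1.5x.

```python
TB_IN_GB = 1_000  # assumption: decimal terabyte

def throughput_gb_per_s(gigabytes: float, minutes: float) -> float:
    """Average load rate for a given volume and elapsed time."""
    return gigabytes / (minutes * 60)

new_record = throughput_gb_per_s(TB_IN_GB, 30)    # "just under 30 minutes": 30 is an upper bound
prior_record = throughput_gb_per_s(TB_IN_GB, 45)

print(f"{new_record:.2f} GB/s vs {prior_record:.2f} GB/s, {new_record / prior_record:.1f}x")
```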

MARK CHAFIN: Thank you.

TED KUMMERT: Thanks for coming.

MARK CHAFIN: Thank you.

TED KUMMERT: Can you tell us all a little bit about Acosta and some of your challenges in providing business intelligence?

MARK CHAFIN: Sure. Acosta provides retail merchandising services to 1300 manufacturers and over 100,000 retail locations throughout North America. We have 12,000 employees, of which 3,000 are mobile users.

Business intelligence for Acosta provides critical insights for both internal and external users. Matter of fact, we have 2,000 regular internal users and 1,000 of our client users accessing the system regularly. Some of the challenges we face include providing BI through the firewall and making sure that our client platforms can handle it.

TED KUMMERT: Okay. Can you tell us a little bit about the Microsoft BI platform and how you use it?

MARK CHAFIN: Sure. We have a four-terabyte relational data warehouse on SQL Server 2008 that we recently migrated. We have a 450-gigabyte Analysis Services cube that’s also running on 2008, and we use Excel 2007 for reporting and SharePoint for sharing and collaboration.

TED KUMMERT: Oh, that’s great. Can you tell us a little bit about your usage and experience with SQL Server 2008?

MARK CHAFIN: Sure. We’ve been using the database engine, Analysis Services, Reporting Services, and Integration Services. We’ve seen a 50 percent performance improvement in Reporting Services report rendering, and some of our Analysis Services queries are executing eight times as fast as before.

TED KUMMERT: Wow, that’s great. Thanks for coming.

MARK CHAFIN: Thank you, Ted.

TED KUMMERT: Thank you. (Applause.)

So now I’d like to shift forward and look ahead to where we’re going next, and there are two areas we’re going to focus on. The first is in terms of data warehousing. In our mission statement, we talk about serving the needs of customers of all sizes, and that’s to reflect our commitment to deliver solutions that work for customers from small to large. It’s also a commitment to provide the features that scale to any application from OLTP to data warehousing. And we are on a path to support data warehouses of all sizes. So I’ll talk about that.

Then the second area is in terms of pervasive insight, the journey toward people-ready BI. And we’ve got some technologies to show you that we believe are going to enable an order of magnitude more end users to be able to produce, consume, and collaborate on BI solutions.

So the first area I want to talk about is Project Madison, to be released in the first half of calendar year 2010. We’ve been on this journey in data warehousing for a while now, and SQL Server has come a long way. We’re serving the needs of the core data warehouse in many companies around the world; one of the largest HMOs in the world has over 17 terabytes of data in SQL Server, and that’s just one example.

We have a great scale-up solution, and it’s always been our vision to take this and scale it to the very largest data warehouses, and to do so by scaling out; that’s always been in our plan. This past July, we announced that we intended to acquire DATAllegro Corporation, a data warehousing company delivering those solutions on industry-standard hardware.

We saw in DATAllegro a company that had customer successes with very large data warehouses; TEOCO is one example, with a 400-terabyte data warehouse in production. We saw a company with an approach similar to ours, building solutions that run on off-the-shelf, industry-standard hardware, enabling very low combined hardware and software solution costs, at 1/5 to 1/8 of what others were charging for comparable solutions at the same scale.

And we saw a company whose technical approach matched how we saw SQL Server moving toward a scale-out model. We’re moving forward on this rapidly. With Project Madison, we will provide a version of SQL Server that scales to hundreds of terabytes, and from a manageability perspective, it’s SQL Server. It will deliver low total cost of ownership, as we always have. And from a BI perspective, fitting into our overall BI platform, it’s SQL Server.

So since the acquisition closed at the end of August, we’ve been working pretty hard on this, and we already have something to show you. So in order to do that, I’d like to invite Jesse Fountain to the stage. Please join me in welcoming Jesse. Welcome.

JESSE FOUNTAIN: Thank you very much, Ted. Hello, everyone. It’s a pleasure and a privilege to be in front of you presenting a very early glimpse of the work we’ve done integrating DATAllegro into SQL Server.

This particular effort has been a Herculean effort for us, but it’s been a very fun and a very exciting effort, indeed. In this very short period of time, we’ve managed to build a 150-terabyte database, running as a true MPP, or massively parallel processing system, using 24 separate instances of SQL Server. Also during this time, we’ve been able to take advantage of some of the latest and greatest advances that we’ve made in SQL Server 2008, including Star Join optimization as well as data compression.

This architecture, then, sits on top of a set of standard reference platforms, as Ted mentioned earlier. It combines SQL Server and Windows with the DATAllegro technology that ties all of this together, and that combination is what we’re offering as part of our integration.

Let’s take a look at our reference architecture to begin with; this is what Project Madison is all about. On the left-hand side, we have EMC storage, coupled with compute nodes, or servers, from Dell in the middle. Together, these two components make up what we call a compute node, and this is where each individual instance of SQL Server is installed.

On the left-hand side, you see what’s called a control node. The control node acts as the brain of the system: it manages the balance of the entire system and makes sure that each individual instance of SQL Server is running fine and that together they perform as one homogeneous database.

After the appliance is actually built, we then went about generating 150 terabytes of data. And the very first thing you should see up there, and I was told to do this, is one trillion rows of data actually loaded in that central fact table. To the best of my knowledge, this is the largest instance of this particular table in existence today, and then surrounding the fact table, of course, are all the dimensions that we generated to fill out a 150-terabyte database. This is actually a very large database. It took us about 75 hours to generate and to load this data and get everything tuned for a massively parallel system.

It’s all about data distribution when you look at data systems this large. Let’s talk a little about how this data gets put out on the system. Starting with the fact table in the middle, we slice it up evenly across all the nodes in the system. Then we take the smaller dimension tables and copy, or replicate, them onto each node. This gives us immediate performance, because every join in the system runs locally on an individual SQL Server instance.

So that’s enough talk. Let’s now switch over and show you a little of how Madison actually works. What we’re showing you here is Reporting Services and a fictitious company that wants to analyze 150 terabytes of raw data and turn it into information. On the inside screen, you’re seeing a client, and on the outside screen, you’re seeing what we’re calling the performance monitor, a graphic representation of the entire system.

As I said, we have 24 individual instances of SQL Server, so you’ll see 24 little boxes down below on that screen. On the top, we have the control node. The little green lines running vertically are the CPU cores, and the horizontal line under each of them is the disk I/O on that particular server. This is truly a shared-nothing architecture: CPU, memory, and storage are not shared across the system; instead, they’re used and tuned individually for each SQL Server instance.

So let’s begin the actual demonstration by showing you the query behind the first report. It’s not a whole lot of fun showing a query, but here we go.
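The distribution scheme can be sketched in a few lines: fact rows are hash-distributed across the nodes, the small dimension table is replicated to every node, and each node joins only its own slice. At the scale of the demo, a trillion fact rows spread over 24 instances works out to roughly 41.7 billion rows per instance. The hash function, column names, and data below are hypothetical; this is a sketch of the general technique, not DATAllegro’s actual implementation.

```python
import zlib

NUM_NODES = 24  # one slice per SQL Server instance, as in the demo

def node_for(key: int, num_nodes: int = NUM_NODES) -> int:
    """Choose a node for a fact row by hashing its distribution key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode()) % num_nodes

def distribute(fact_rows, dimension, num_nodes=NUM_NODES):
    """Hash-distribute fact rows; give every node its own full copy of the small dimension table."""
    nodes = [{"fact": [], "dim": dict(dimension)} for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[node_for(row["order_id"], num_nodes)]["fact"].append(row)
    return nodes

def local_join(node):
    """Runs independently on one node: join its fact slice to its local dimension copy."""
    return [(row["order_id"], node["dim"][row["product_id"]], row["amount"]) for row in node["fact"]]

# Hypothetical data: 1,000 fact rows and a five-row product dimension
products = {0: "Shirts", 1: "Pants", 2: "Shoes", 3: "Hats", 4: "Socks"}
facts = [{"order_id": i, "product_id": i % 5, "amount": float(i % 7)} for i in range(1000)]

nodes = distribute(facts, products)
joined = [r for node in nodes for r in local_join(node)]  # union of the per-node results
```

Because every node holds the whole dimension table, no rows move between nodes during the join, and the union of the local joins equals the join a single server would produce.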

TED KUMMERT: It’s fun for me.

JESSE FOUNTAIN: So the very first thing we want to show you is that this is real. We’re flushing all the buffers, so nothing in this query comes from cache. Then we select the columns and the aggregations we want from the four tables, joining them together. And of course we restrict the rows that come back to only those showing more than $8,000 in profit; otherwise we’d be returning a whole lot of rows to Reporting Services for this demonstration.

So as I start to execute this, you should see on the outer perimeter that once the query is submitted and starts running on the system, everything goes to work. We really like this diagram; it’s a lot of fun to show. What you’re seeing is all the cores going into action and all the I/O happening, and notice that very little is happening on the control node. He’s just making sure everything runs nice and smoothly. And because this is a relatively evenly balanced system, as each of those queries finishes, we return 250,000 rows to Reporting Services, and the report renders itself here.

This report is a little more interesting. We’re looking at profitability attainment, and at how much we’re selling by hour of day for children’s clothing. You’ll see that this report doesn’t go through as much processing as the last one, so he goes out, quickly fetches the data he needs, does a little processing in between, and then returns a few sets of rows to the report this time.

And then our last report is a little more involved; it looks at a whole lot of categories, by size, by gender, by children, and so on. You’ll see some different characteristics on this report. He’ll fetch a very large amount of data: we’re analyzing a whole year’s worth, and each year represents approximately three billion rows. And it comes back very, very quickly, and then the report renders directly.
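One reason the control node can stay nearly idle is that, in a scatter-gather plan, the heavy lifting happens on the compute nodes and only small partial results travel to the control node. The sketch below illustrates that pattern under this assumption, with hypothetical customer names and profit figures; it also shows why a filter such as "profit over $8,000" has to be applied after the partial results are merged rather than on each node.

```python
from collections import defaultdict

def partial_aggregate(rows):
    """Runs on each compute node: sum profit per customer over the node's local rows."""
    totals = defaultdict(float)
    for customer, profit in rows:
        totals[customer] += profit
    return totals

def merge_and_filter(partials, threshold):
    """Runs on the control node: combine partial sums, then apply the HAVING-style filter.

    The filter must come after the merge, because one customer's rows can be
    spread across several nodes."""
    combined = defaultdict(float)
    for partial in partials:
        for customer, profit in partial.items():
            combined[customer] += profit
    return {c: p for c, p in combined.items() if p > threshold}

# Hypothetical (customer, profit) rows held on three nodes
node_slices = [
    [("Ann", 5000.0), ("Bo", 9000.0)],
    [("Ann", 4000.0), ("Cy", 1000.0)],
    [("Bo", 500.0), ("Cy", 2000.0)],
]
result = merge_and_filter([partial_aggregate(s) for s in node_slices], threshold=8000.0)
print(result)
```

Note that Ann’s profit is below the threshold on every individual node but above it overall, which is exactly the case a per-node filter would get wrong.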

So that concludes our demonstration, I hope you’ve enjoyed it. Thank you. (Applause.)

TED KUMMERT: Very exciting stuff.

JESSE FOUNTAIN: If you would like to learn a little bit more about how Madison is put together and some of the technical things that we’ve done, we have a room set up down on the second floor. Thank you very much.

TED KUMMERT: Thanks, Jesse.

JESSE FOUNTAIN: Thank you very much. (Applause.)

TED KUMMERT: So what we just saw, 150 terabytes in a SQL Server-based data warehouse, a trillion rows, queries returning in seconds from reporting services, and we’ll have that available in the first half of calendar year 2010. Pretty amazing. (Applause.)

So in terms of how we’re going to deliver this: at last year’s BI conference, we announced our first set of hardware reference configurations for data warehousing. This is about working together with our hardware partners to build matched sets of hardware and software that give you an easier deployment and management experience for the combined solution. And we’re continuing that; we’ve got more announcements around SQL Server 2008 that we’re making at the conference.

This is also our approach with Project Madison and how Project Madison will come to market. There are other vendors out there that have chosen kind of a one-size-fits-one approach: they sell you an appliance, the hardware and the software together as one bundle. Now, there are some things that delivers on, but seldom when I talk to customers do I hear, boy, I’d really like some new and different piece of hardware to learn how to manage.

This approach gives you the power of choice: multiple hardware reference configurations from multiple vendors, where we have worked closely to enable a great experience, the appliance experience, if you will, for deployment and ongoing management. And together those two things enable not only low solution cost, through the choice they provide, but low TCO as well.

So now I’d like to shift to the second area, our journey toward enabling pervasive insight. As Guy mentioned, that’s our journey toward providing managed self-service, a model where end users get their questions answered more directly out of the tools they use every day. We’re very excited about where we’re headed here. We believe there’s an opportunity for us to dramatically increase the number of end users who are able to get their BI problems solved quickly.

So to illustrate this, we’re going to step outside of the usual slide mode for a bit and go through a fairy tale. So bear with me. I think this fairy tale is going to ring true to a lot of you. So it’s a tale of business intelligence.

As all fairy tales do, this one begins with once upon a time. Once upon a time, there was a company. And like many companies, it had a lot of data. And it had an IT department that everyone loved. I can feel the love, I know you can. The IT team built solutions for anyone who needed them, and everyone was happy, until one day the HR team needed a new report comparing salary data against industry trends, one of those pesky business questions that needed answering.

And as they always do, they called IT for help. Hmmm, IT said, “Industry trends, huh, that’s not in the data warehouse. You know what, probably shouldn’t be in the data warehouse. And we’d really like to help you, but we’ve got a lot of projects, don’t think we can get that done.” And so HR understood. (Laughter.)

But then someone had a great idea: we could build the solution ourselves. He found some data and entered it manually. He had a network of friends in IT who could help him get some of the data. And even when the data was secured, he found a way to get it. I’m sure this never happens. And what application do you think he used? That’s right, Excel. And the result was pretty good. Pretty soon, a lot of people were using it, people you didn’t even know about. Some had modified it, but only he knew how it was built, only he knew where the data came from, and only he knew which versions of the data were in the solution. IT didn’t even know it existed. So what could possibly go wrong? I think you know, but let’s turn the page.

Well, people may not always be there. One day, for some reason, he stopped working. He could be on vacation, he could have retired. He could be sick, or even worse (laughter), he might not be coming back. Of course there’s a moral to this fairy tale: as much as IT would like to, they can’t meet all the demands. Users have got to get their jobs done, questions need answering, and they need to do so in a timely way. They’re going to find a way to help themselves. People are going to work hard and bridge the gap on their own, but they’re really looking to us to provide solutions that help them.

They want a solution that works in the tools they use every day; they don’t want to be taught some new area of technology, they just want to get their solution built and their job done. And everybody, IT and end users, wants it to fit in with the end-to-end infrastructure; everybody wants to be able to trust the data; and of course it has to deliver on all of the abilities. If you can bridge these two gaps, everyone lives happily ever after, as they always do in fairy tales.

So I think that’s a familiar story, or at least aspects of it are I think familiar to everyone. What I’m about to talk about are some features that we’re on the way to delivering that are going to move toward creating that happy ending.

So I first want to talk about SQL Server Kilimanjaro. So before we get a blog post that says SQL Server Kilimanjaro is the next major release of SQL Server, I wanted to clarify that we’re going to continue on our commitment to deliver major releases of SQL Server every 24 to 36 months, just as we did with 2008 from 2005, we’ll do it with the next major release post SQL Server 2008. That’s not what we’re talking about. We’re working on that, we’re not talking about that today.

What we’re talking about is a release focused on this set of capabilities: Project Gemini in the area of self-service analysis, plus additional capabilities in self-service reporting. This isn’t rework, a rewrite, or an upgrade of things we delivered in SQL Server 2008; it’s additional, new capabilities to enable this scenario, in a release vehicle coming in the first half of calendar year 2010. That’s SQL Server Kilimanjaro.

So we’re focused on enabling this self-service BI model, and that’s a statement about both end users and IT: managed self-service means IT can manage the solution while end users get their work done in a self-service manner. So let’s first talk about Project Gemini.

So, project code-named Gemini: we’ve been working on this since before SQL Server 2008 shipped, so we’ve been underway on this technology for a while. And let’s take the fairy tale forward. Say I work in HR, and my boss has asked me to support some decision making around where salaries should go. I need some data we have, which is where salaries are today; I need some industry data; and I want to bring it together in a BI solution to support that decision making.

So what do I need? I need a user experience that works the way I expect it to, as part of Excel. So what we’re delivering is an add-in to Excel that will let you bring together high volumes of data and build that type of BI solution to do that type of analysis, but not through a modeling experience. We’re not teaching every end user about star schemas or multidimensional models; we’re doing it in a task-focused way that lets them build their solution without having to understand all the underlying technology. That’s the powerful thing in terms of enabling an order of magnitude more end users to use this technology.

The second area is enabling high volumes of data to be handled interactively, even on the desktop. So we’ve been working on a column-based, in-memory store that will underlie this and will let very high volumes of data perform interactively, even on a typical desktop machine.

And the third area is enabling sharing and collaboration. I want to share what I’ve built with others, I want others to be able to get value out of it, and I want them to be able to add value. I want that within a managed infrastructure, and think of that as SharePoint. In fact, it’s just like documents: SharePoint is where I go to share and collaborate on documents, so it’s also where I should go to share and collaborate on my BI solution. That’s what we’re enabling with SharePoint, from an end-user perspective and also from an IT perspective.
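The internals of the Gemini store aren’t described here, so the following is only a sketch of the general column-store idea, assuming dictionary encoding: each column is stored separately, each distinct value is stored once, and rows hold small integer codes. An aggregation then reads just the two columns it needs. The column names and values are hypothetical.

```python
class Column:
    """A dictionary-encoded column: each distinct value is stored once; rows store integer codes."""

    def __init__(self, values):
        self.dictionary = []  # code -> value
        lookup = {}           # value -> code
        self.codes = []
        for value in values:
            if value not in lookup:
                lookup[value] = len(self.dictionary)
                self.dictionary.append(value)
            self.codes.append(lookup[value])

def sum_by(group_col, measure):
    """Sum a measure column grouped by an encoded column, reading only those two columns."""
    totals = [0.0] * len(group_col.dictionary)
    for code, amount in zip(group_col.codes, measure):
        totals[code] += amount
    return dict(zip(group_col.dictionary, totals))

# Hypothetical HR data held column by column rather than row by row
department = Column(["HR", "IT", "HR", "Sales", "IT", "HR"])
salary = [50.0, 70.0, 55.0, 60.0, 80.0, 45.0]
print(sum_by(department, salary))  # {'HR': 150.0, 'IT': 150.0, 'Sales': 60.0}
```

Repetitive business columns such as department or region compress well this way, which is part of why a columnar layout can keep large volumes in memory on an ordinary machine.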

IT should view this as just managing another component in the SharePoint infrastructure. So that’s self-service analysis. The next part of it is self-service reporting, and this is a journey we’ve been on since SQL Server 2005. In SQL Server 2005, we delivered the first version of a tool we called Report Builder. And Report Builder was a tool oriented toward end-user reporting and getting some simple reporting tasks done easily in more of an information-worker-focused user experience.

Shortly, in the feature pack for SQL Server 2008, we’ll be delivering the second version of Report Builder. It will unlock all the features in Reporting Services, and we’ve upgraded the user experience: it’s now built around an Office-style ribbon, and we’ve added wizards to make common tasks easy to get done.

Now, where are we taking this next? So imagine I’m that user in HR again and I’m going to build a report. In this report, I also want to include some data from another department. Let’s say it’s the revenue for the company. But I’m in HR; I really don’t want to learn that query, and there’s no reason I should have to. I’m not even particularly interested in building a visualization for that. What I really want is to be able to pick up a component and include it in my report easily, and that’s where we’re going next with self-service reporting: enabling a componentized model. Think of a component library. I can share out the components I’ve built for the data I’m the expert in and the visualizations I’ve built, and people can build composite reports, kind of a grab-and-go model for building reports. That’s pretty exciting in both the areas of analysis and reporting.

So in order to bring this to life, I’d like to invite Donald Farmer to the stage to demonstrate some of this for you. Donald? (Applause.)

DONALD FARMER: Somebody left this backstage.

TED KUMMERT: Oh, thank you.

DONALD FARMER: I won’t make you hold it. Let’s hope we don’t need it.

TED KUMMERT: Yeah, I’m going to turn you into something else. Actually, we like Donald a lot.

DONALD FARMER: So I have a pesky business problem.

TED KUMMERT: Okay.

DONALD FARMER: And I’m going to solve it with Gemini, and where do I start? I start in Excel, as you said, this is where the power user lives.

So in this version of Excel I have a new add-in called the Gemini add-in that you can see in the top, right-hand corner. And I’m going to build my solution using this. Now, the problem I have is I work for an online movie store where people can download movies, and I’d like to do an analysis to compare the sales of my online movies with what’s happening at the box office. Is there some difference between my market and the classic box office market?

Now, as it happens, IT has provisioned me with a data warehouse. So I have data in the data warehouse, let’s have a look at that. I can go in here and open the Gemini client, which is an add-in to Excel, pull data in from a database and say, okay, let’s go to the Sales table and pull in some sales. I’m going to pull in a few tables at the same time here, some of these are pre-loaded, and here’s the Gemini environment.

Now, the interesting thing about this, of course, is that typically my data won’t be perfect, so I may have to do some data cleaning, and if I need to do data cleaning, here’s the data cleaning ribbon that has the functions for me. Now, this is a demo, so the data is perfect; we won’t do any data cleaning today. But there are some interesting things here. Now, you said, very fortunately, that the Excel power user also wants to work on a typical desktop machine. This is not running on a server; this is running backstage on a $1,000, eight-gig, quad-core machine, the kind of thing I bought myself two weeks ago for just under $1,000.

Let me show you what we can do here. Here is the media table which shows all the different movies. I’ve got geography. I’ve got times. But look at purchases, this is every purchase that has been made in my online store. And let’s just zoom in here and look at that number in the corner. You see it? Twenty million rows in an Excel add-in. I think that’s actually pretty extraordinary. (Applause.)

TED KUMMERT: Yeah.

DONALD FARMER: Tomorrow, Amir is going to be doing a session called New Horizons where he’s going to show you even more data, but I want to get across another point. It’s not just about the volume of data; 20 million rows is great, but you’ve got to be able to work with it. And the Excel user, watch this: this column store we’re talking about enables us, for example, to sort it. Twenty million rows, sorted. I can click over here and maybe do some filtering. Let’s filter it down to the UK. Twenty million rows, filtered. This is incredible functionality for the power user. They can interact with this data extremely easily in this environment.

So this is great for the corporate data. I’ve pulled in my data warehouse data, but of course I also need to get the box office sales, and they’re not going to be in the data warehouse. They’re published by Variety magazine or Billboard. So what I can do is flip over to Excel here, where I’ve downloaded this from Variety magazine or Billboard. I can just copy that data, go back into the Gemini client and say, paste this in from the clipboard. And now I have my corporate data and the data that I’ve created myself on the desktop, so I’m mashing up the corporate data with data that I’ve acquired through whatever other means I need to, and it will support many other data sources as part of that.

So I’ve got my data. Now as a power user in Excel, I need to analyze it. Now, where do Excel power users do their analysis? Well, they do it in pivot tables. So this is a pivot table solution as well. This is just classic Excel, this is not anything particularly new here. I’m just going to say okay and give me a pivot table.

Now, in Gemini we have some really cool pivot tables that we can handle. So, for example, let me take the number of tickets sold from the box office. Fill that into the pivot table. Now, I’m going to pull in the media and watch the bottom right-hand side of the screen as I pull in the media by genre. You see that? We infer a relationship between the two tables. We know what you’re trying to do so we can understand the relationship.

Let’s also get in here and look at the purchases from my business, the number of purchases that people have downloaded. So now I’ve got my data mashed up together, my corporate data and my personal data, and I can, for example, create a pivot chart of that so that I can compare the two. Now, as it happens, they’re completely out of scale. I’ve got something like three billion cinema tickets sold over the years, and only 20 million sales from my business.

So here’s what I can do: I can take this data and just say show it as a percentage of total, and then I get a better comparison. I can see, for example, that action/adventure does much better at the box office than in my online sales. I can see that thrillers do rather better in my online sales than at the box office. But of course what I’d really like to do is slice and dice this and get some other views of it.

So to do that, let’s slip in here and add a new feature. This is a new feature in Excel that is made specifically for Gemini, but it’s also available to other Excel users. I’m going to pull in category and add what we call (whoops, sorry, wrong one there), drag in category here, and create what we call a slicer by category. I can also come in here and say, well, in addition to that, I’d also like to see the rating. And now you can see the way they kind of rearrange themselves there? Very easy to use, very flexible. I can also of course pull in year and I can pull in month, and let’s also grab geography and stick that in the vertical.

So now everything’s rearranged for me and I’ve got a nice visual view, and I can start to do analysis such as, well, let’s look at sales in the UK. Turns out that the patterns of sales in the UK are substantially different. Let’s have a look at sales in Canada, and it turns out that sales in Canada are substantially different too.

I can also look at sales by year. Notice this, for example, the slicer for 2006 gives me access to all the months, but these slicers are related. They know about each other. If I look at 2008, you’ll see that only the first six months are available, and I can’t slice by the other six months because I don’t have data for that period yet. So the slicers are intelligent and interactive and also give me this nice visual layout of the data.

So here I am, I’ve created this and what have I created? You saw the relationship appearing in the corner. Well, basically what I’ve created under here, you guys recognize it. I’ve created a star schema. But as an Excel power user, I’m just creating an analysis. Now, I want to publish this and put it out onto the Web for my colleagues to collaborate on. How am I going to do that? Well, again, it’s Excel, so I want to publish it out to SharePoint. Before I do that, I’d like to kind of pretty it up a little bit, so I’m going to apply some graphics and hide the grid lines and make it good looking. So I’m going to click “set theme” and now I’ve formatted this, ready to share.

To publish, I choose to publish to SharePoint, and I can save this up to the SharePoint server. Now, actually pushing 20 million rows up, even in this demo, would take a little bit of time, so I’m just going to show you one that we created earlier. So let’s have a look at this one. And here is the report center with Gemini projects published to it.

Now, you’ll notice that this is nicely rendered and we have many reports here, and because I’m a power user and I’m used to collaborating in a social environment, we have social tools for collaboration. So, for example, we can score the different projects that people have published. I can look at who’s using it, and I can even set a refresh rate so that this project will refresh itself automatically from the data warehouse.

TED KUMMERT: Just like SharePoint.

DONALD FARMER: Just like SharePoint. Exactly. Now, when I want to browse this, if I’m a user who doesn’t have access to the original project, I’ll just click on this and choose to see it in a browser, and now I’m seeing my report in a Web browser with a completely thin client. And, you know, the slicers work in the thin client too. So as a power user, I have published a refreshable business intelligence solution that can be browsed with a thin client throughout the enterprise.

Now, this is incredibly powerful. But the interesting question, the other half of the fairytale, is what happens behind the curtain. What does this look like to an IT professional? What does it look like to a business intelligence professional? Well, let’s see what it looks like from the IT side.

I’m just going to click on here and open up a little demo of a dashboard that we can provide for IT. One of the things that’s significant here is, first of all, in the SharePoint environment, when I come to manage this application, look what the application is: it’s Analysis Services. This is a traditional BI solution that I published. We like to say that when somebody starts using Gemini, the BI professional hasn’t lost a daughter, they’ve gained a son. A good-looking and intelligent son as well; what could be better?

So from this point of view, there’s no new technology stack. This is not an alternative to traditional BI; this is business intelligence that is deployable for partners, for IT, and for business intelligence professionals. This is Analysis Services.

Now, the IT department in the fairytale, of course, had the problem. They didn’t know this solution existed, and they didn’t know that people were using it actively. So let’s go in here and look at the operations dashboard we provide IT. You can see which servers are active and how their usage changes over time. IT can also see which of the objects that we’ve posted (we call them sandboxes for now; each is essentially the data layer in Analysis Services and the presentation layer in Excel and Excel Services) are getting most popular.

And if they’re getting popular, which one of these is maybe stressing the server, or which one is being used so much that the volume of data and the number of users are growing? And then what can we do about that?

Well, remember that I said this really is, in many ways, a traditional BI solution that they’re publishing. There’s no new technology here. So if I find that an application is becoming so powerful and so widely used in the enterprise, I may want to do something about it: let me formalize that, let me upgrade this, not so much for better performance, but formalizing the solution so it goes from being something that a power user built to being something that IT can manage. And this is what we’re talking about with Gemini: the twin stars are the power users who can work in Excel and the IT and business intelligence professionals who can manage this like Analysis Services, and like SharePoint. That’s really the essence of Gemini.

TED KUMMERT: Great.

DONALD FARMER: Before I kind of head off stage, I just want to say one thing: I’ve been around this business for a very long time and I’ve never been so excited to show a product to people. And this isn’t even beta one, this isn’t even our first coding milestone we’re showing you. I’m incredibly proud of the team that built this, they’re doing an amazing job, so thank you very much. (Applause.)

TED KUMMERT: Thanks, Donald.

DONALD FARMER: Thank you.

TED KUMMERT: So the journey continues. I always feel privileged to talk to this particular audience. You and your businesses make a difference every day. The work that you do changes the course of the businesses that you’re in. You know, your BI solutions enable the people around you to make better business decisions and change course. We take very seriously our responsibility to provide you better tools and a better platform to enable you to get your job done. When we make you better, everybody gets better. With SQL Server 2008 and Office 2007, we’ve taken some pretty significant steps forward: steps in data warehousing, steps in visualization, steps in integration with SharePoint, all moving toward not only an all-data platform, but also people-ready BI.

We’re moving ahead quickly, as you saw with Madison, taking SQL Server to the highest levels of scale, to hundreds of terabytes, with a low-cost solution that brings hardware and software together, delivers low TCO, and fits together as part of our end-to-end BI platform. That’s coming in the first half of calendar year 2010, with a customer technology preview (CTP) in the next 12 months.

As for the second area, we just looked deeply at Project Gemini in terms of self-service analysis. Self-service analysis and self-service reporting come together in that release: additional capabilities built around SQL Server 2008 and Office to deliver on end user empowerment, enabling an order of magnitude more end users to produce, consume, and collaborate on BI solutions. That too will ship in the first half of calendar year 2010, and we’ll have a customer technology preview in the next 12 months.

So it’s an exciting time and an exciting future. I want to thank you for your time today. I want to thank you for your partnership with us. I hope you have a great time at the BI conference this week and a great time in Seattle. Thanks very much. (Applause.)

END
