BILL GRAZIANO: Please join me in welcoming Ted Kummert. (Applause, music.)
TED KUMMERT: All right, all right, good morning, thank you. So welcome to the PASS Community Summit of 2012, it’s great to be here with you again. And welcome to everyone who’s joining us via live streaming. Welcome to Seattle. Welcome to the PASS Community Summit.
You have a fantastic summit experience ahead of you. This is our biggest summit ever. I continue to be impressed by how this community invests in itself.
Earlier this week, I was at a pre-conference function. I was having some dinner with some longtime community members. And they made an analogy that the PASS community is like a family. And the Summit is like the family reunion. So if this is your first visit ever to the PASS Community Summit, welcome to the family. And if you’re back for the PASS Community Summit again, welcome to the family reunion.
We get so much value out of this community in how we build SQL Server, in how we help people using SQL Server be more effective. From Microsoft and the SQL Server family, I want to offer our thanks to all of you, all of you volunteers, the board, everybody who puts their time and talent into the PASS community organization, thank you very much for all that you do. (Applause.)
We’re going to spend a lot of our time today talking about the future, about opportunities we see for you. Now, just to ground ourselves, for Microsoft, these are pretty important times for us. Steve has said this may be one of the most significant years in the history of Microsoft. This is certainly anchored by the recent launch of Windows 8, with Surface, the launch of Office 2013, and we have major releases all across our product and service portfolio.
Now, this includes, of course, SQL Server 2012, which released this past spring in April. I have up on the screen a picture. It’s a picture of the team celebrating the day we signed off on the release. We gathered in the cafeteria, it’s one of these great, warm moments. We’ve been working on this release, building it for you, we gather, it’s a very, very cathartic event to sign off on the release. It’s the kind of thing where I wish we could have had you there because you’re a big part of this. This is a release we built for you. You shaped it. You shaped it. But, instead, this picture will have to do.
We said last year we thought this was one of our most significant releases ever. There are lots of features. I don’t really think of it in terms of count of features, but I think of it in terms of the impact this release is going to have.
One of the absolute best parts of my job is the time I get to spend with customers. It’s been great to talk to customers about the things they built around AlwaysOn, the xVelocity Columnstore, how they’re using Power View, how that’s starting to change BI in their organizations. Some great work that’s been done. And the momentum continues.
We’re announcing today that the first service pack for SQL Server 2012 is available. This is aligned with supporting our integration with Office 2013 and all this is fueling the fastest adoption we’ve ever had for a release of SQL Server and our record-setting growth in the business. These are very exciting times.
One of the great things about being in this business for as long as I have is the change you see, the change you get to be a part of. Think back five years, 10 years, 15 years. Think about the tools, the capabilities you had, the type of problems you were able to solve. And think forward. Think about what you’re able to accomplish now. Some of that comes as a result of change. Change can be difficult. But change also brings opportunity.
As an industry, we’re talking about things like cloud, we’re talking about big data, and these are some real things here. There’s change here, but that change brings opportunity, and that’s a lot of what we’re going to talk about today.
So we asked some of you, some of you who have been in the community a long time, about the change you’ve seen over your career and some of the things you’re excited about for the future, so let’s roll that video.
(Video segment.)
TED KUMMERT: There’s a lot to be excited about. It seems like these days everyone in the industry is talking about the work that you do.
Now, we’ve always understood just how important the work you do is. Data is the lifeblood of business. Data is the transactional record of the businesses you’re a part of.
The business intelligence solutions you build help people be better at their jobs, help businesses move forward. The work you do is incredibly important. So along comes big data. You know, what does this really mean? What does it mean for us?
Now, there are some who might say it’s a technology, that there’s a specific approach, but I don’t think of it that way. I do see a larger opportunity here. And I assert we are at a tipping point, a point where things are going to be fundamentally different. And it comes together as a result of four factors.
You know, we’ve been talking a lot about the growing data volumes that come from new types of data, new sources of data. All of these represent opportunities to create new value. We’ve got great hardware innovation. Storage costs continue to fall. We can store more and more data. As Moore’s Law has continued to move on, silicon densities increase. We’ve got an amazing amount of memory and computational power in these systems.
And these start to change assumptions, critical architectural assumptions, behind how we built things. When you can change those kinds of assumptions, dramatic increases in performance are possible. We’re not just talking order of magnitude 10X, we’re talking 50X and above.
There’s new software technology. Software technologies that have matured in other domains, and now can be applied to a broader set of problems. A lot of these came from large-scale Web applications, the problem being storing vast amounts of unstructured data in distributed storage, and the need to reason over that to, say, serve you an ad or serve you a page.
This is where patterns like MapReduce and Hadoop have been applied. These have now matured to the point where we can apply them to broader domains. Machine learning, too, has matured to the point where we can apply it to broader domains.
And then there’s the cloud. And the cloud really is a way to build infrastructure. It’s a way of enabling the scale, the elastic scale characteristics you need with an economic model that goes along with that.
These four things together. We’ve got more data to deal with, we’ve got an ability to store more of it, we’ve got now new technologies to harvest unique value from it, and we’ve got an ideal infrastructure to run it on.
So what does big data mean? Does it mean MapReduce? No. It’s bigger than that. It’s about new insights. I think it’s new insights on data you have, the latent value that exists in data you have, and new sources of data. That’s what it’s about.
So one emblematic story. In the healthcare industry, there’s a problem called the readmission problem. This is where you’re a patient, you go in for treatment, and unfortunately, sometimes in short order you might find yourself there again. Not really a satisfying situation for anyone involved.
Microsoft Research took a look at this problem. They were given access to hundreds of thousands of anonymous patient records with 25,000 variables, and they applied machine learning technologies and techniques to see what unique signals they could find.
That ultimately led to a predictive system that was actually able to assist in and improve the standards of care, amazing. A new service powered by data and analytics. And it’s very significant in terms of what it in and of itself was able to do.
Now, why am I telling you this? It’s a great example of the idea there’s data you have and new insights to be gained from it. Way back when, when those patient records were being gathered, no one had any idea that this value was in there. And some new approaches were able to harvest and operationalize that value to improve things for, in this case, that business.
That’s really what big data is about. It’s about new insights for data you have and new sources of data.
So today we’re going to talk about some opportunities we see and that we’re moving forward on. We’re going to talk about business acceleration. Accelerating business process, accelerating time to insight. We’re going to take a comprehensive look at our in-memory strategy.
We’re going to talk about our strategy to manage any data and enable you to gain value from it. We’re going to give you an update on Hadoop and on parallel data warehouse. And we’re going to talk about how we’re continuing to make insights easier for every end user to gain and to do so on any data. We’ve got some great demonstrations and exciting announcements across all three of these areas. So let’s get going.
This is a discussion I think everyone can get excited about. When your business process is more responsive, everyone is happy. When you get higher transactional volumes through the system, everyone is happy. When the queries against the relational data warehouse return more quickly, everyone is happy. And this is really the potential of in-memory technologies. I think we all understand if everything fits in memory, if the full working set fits and you design around that assumption, transformational performance improvements are possible.
You know, we’ve seen that with VertiPaq and PowerPivot on the desktop, we’ve seen that with the xVelocity Analytics Engine in Analysis Services. You’ve seen that with the xVelocity Columnstore in the relational engine.
Now, the true power is unleashed, though, as it’s integrated into a data platform completely, and it also considers the needs of existing applications and how they can exploit these capabilities. This is a good example of why when we brought the xVelocity Columnstore into the relational engine, we implemented it as an index type so that you can take advantage of it without implications to your applications.
Now, it’s one thing to accelerate queries. If you really want to accelerate business processes, if you really want to accelerate the full path to insight, you need to consider all workloads that are involved. In the path to insight, accelerating time to insight, you need to take that through from the desktop to the data warehouse. We cover all tiers with PowerPivot on the desktop, with the xVelocity Analytics Engine inside Analysis Services and the xVelocity Columnstore in the relational engine.
The same thing for business process. You need to consider the entire architecture of the business application, from the application tier through to the data tier. We have the Windows Azure Caching Service in the mid-tier in order for you to offload transactions from the data tier, more scalability, gives you higher throughput, gives you lower latency. And we’re making an announcement today that we’re bringing an in-memory transactional capability to SQL Server. (Cheers, applause.)
Project code name “Hekaton” will ship in the next major release of SQL Server. This is a fully in-memory transactional engine delivered as a part of SQL Server, the platform you know, the application and programming model you know, and we’ve thought deeply about how to enable you to bring existing applications into the in-memory world and to take advantage of the performance that’s possible. And this thing is fast.
So rather than talk more about it, why don’t we show it to you? Would you like to see “Hekaton”? (Cheers, applause.) So not only that, we’re going to show you “Hekaton” and also some capabilities we’re adding to the xVelocity Columnstore. So join me in welcoming Shawn Bice to the stage. Shawn? (Cheers, applause, music.)
SHAWN BICE: Hi. Good morning. Good morning. Thank you for the introduction. So as Ted mentioned, there are a number of in-memory technologies that span the SQL platform, and today I’m going to demonstrate two of those to you, the first will be in OLTP, the second will be in data warehousing.
So when you think about “Hekaton,” in a nutshell, this is a high-performance, in-memory database that has been architected from the ground up to take advantage of the hardware trends Ted was just talking about, and it’s built right into SQL.
Here’s another way I can put this. If you have applications in your environment right now where you feel like you’ve done everything you can to increase the performance but you still need it to go faster, let me show you why you’ll love “Hekaton.”
All right, let’s see here. So for this demo, I’m going to basically toggle between two things. What you see on the screen right now is our OLTP demo app. And then I’m going to switch over to Management Studio.
So what’s behind this demo app is what we’d consider a traditional OLTP app, with characteristics like get, put, and commit. And it’s basically processing a bunch of sales orders. So we’re going to use the tool to jam a bunch of sales orders into it and see what happens from a performance perspective.
So let’s go ahead and start the tool. And you’ll see that straight away, we’re at roughly 2,000 transactions per second. Now, if I expand this diagnostic window, you’ll see this green row that represents CPU usage, and we’re using roughly a third of the CPU. But if you look at that red row, that is all about latching. So we have a ton of latches. So let’s go ahead and stop this. And let’s switch over to “Hekaton” and see how we can increase the performance of the app.
So I’m going to jump into Management Studio. Now, I should pause here for a second and say this is not the final experience. As Ted mentioned, this is coming in the next release, but you’ll see the gist of the workflow.
Now, I actually don’t know where this problem is. You could imagine that we saw some latching, and hey, I want this tool to help me find those tables that can be great candidates to memory optimize. So let’s, in fact, do that. So I’m going to run this. And you’ll see, straight away, the tool is coming back and saying that the sales order detail table is probably the best one that we should memory optimize for performance gains.
So we’ll go ahead and minimize this. And we’ll go back to our demo tool. So the team did us a big favor, they basically wired in the script here, and I’m going to go and migrate that table. Now, what “Hekaton” is actually doing right now is converting that table so that it will basically run in memory.
TED KUMMERT: So there’s no application changes here at all. All we’ve done is really lift that table and the associated indexes and structures into memory. That’s it.
SHAWN BICE: Absolutely. In fact, we believe that’s very canonical. So when we worked with customers, we wanted to make sure we had a very good understanding of what we think is representative so that folks can do exactly what you said, just literally convert or memory optimize the table.
All right, so let’s run the tool again and see what kind of performance increase we get. So I’ll go ahead and start it up. And you’ll see straight away almost a 10X performance increase. I haven’t changed the app.
TED KUMMERT: Haven’t changed the hardware either.
SHAWN BICE: All I did — I didn’t change the app and I didn’t change the hardware. In fact, if I expand this diagnostic window, you’ll see that we’re using more of the CPU and latches are gone. (Cheers, applause.) Love this stuff. All right. Let me hide this window and I’m going to stop this because the story gets better. Want to see it?
TED KUMMERT: Sure. (Laughter.)
SHAWN BICE: All right. So let’s switch back over — oops, wrong one. Let’s switch back over —
TED KUMMERT: That’s the next demo.
SHAWN BICE: — the next one. Let’s switch back over here. Look, my app is not just made up of schema and data, I have code in my app. And where I’m going with this is I want to ask this tool to have a look at these stored procedures to see, hey, are there some of these stored procedures that we can memory optimize as well to increase the overall performance of the app. So let’s do that.
So I run the tool, and you’ll see here that the tool is basically saying that the insert order detail stored procedure would be a good candidate for us to memory optimize. So I’m going to go ahead and minimize this. And let’s go ahead and migrate that stored procedure. Now, what “Hekaton” is actually doing is recompiling that stored procedure so it will run native in memory.
TED KUMMERT: Right. So no code changes here.
SHAWN BICE: No code changes. At this point, I’ve memory optimized the table. We have recompiled that stored procedure so it runs in memory. You ready to run the tool and see what kind of perf gain we get?
TED KUMMERT: Absolutely.
SHAWN BICE: All right, here we go. (Laughter.) All right, so I’m going to start the tool, and look at that. Almost 30 times performance increase. I haven’t changed the app. It’s running on the exact same hardware, and that’s why you’ll love “Hekaton.” (Cheers, applause.)
All right. I love this stuff. Let’s go ahead and kill this window here, good stuff, all right.
TED KUMMERT: Columnstore.
SHAWN BICE: Columnstore. Data warehousing. As Ted mentioned, I think he said it perfectly. When things run faster, everybody’s happy. And for this demo, I’m going to toggle between two things. I have basically a data warehousing demo app. I’ll switch between this and Management Studio.
Now, where I want to start is pre-SQL 2012. So I think we’re all familiar with data warehousing queries that would result in full table scans, and it takes a while.
Now, I’m actually going to run a query like that. I’m not going to force us to sit here and wait for this to complete. But I’m going to switch back over here, and we’ll just look at a query, an example of a query that will result in a full table scan.
Now, here’s the behavior behind this. If I’m a business analyst and I ask a question, it doesn’t matter if it’s business or you’re at home or anything, but you ask a question, and if I have to wait on the order of minutes, or on the order of hours, to get an answer back, I’m probably not going to be eager to ask another question.
That’s what motivated Columnstore Index. So when we switch over here and look at Columnstore Index, it’s a very, very straightforward syntax. Now, the power of this is that instead of doing a fan-out query where I have to read all rows, Columnstore Index allows us to pick the columns we care about while not incurring the expense of reading all rows.
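The row-versus-column point can be made concrete with a toy sketch in Python. This is only an illustration of the general idea behind columnar storage, not SQL Server’s implementation; the table contents and the names `rows`, `columns`, `total_from_rows`, and `total_from_columns` are hypothetical and were made up for the example.

```python
# Toy model: the same table stored row-wise and column-wise.
# All names and data here are hypothetical, for illustration only.

rows = [
    {"order_id": 1, "customer": "A", "region": "West", "amount": 10.0},
    {"order_id": 2, "customer": "B", "region": "East", "amount": 25.0},
    {"order_id": 3, "customer": "A", "region": "West", "amount": 5.0},
]

# Column-wise layout: one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_from_rows(table):
    """Sum 'amount' by scanning whole rows; every cell of each row is touched."""
    cells_read = 0
    total = 0.0
    for row in table:
        cells_read += len(row)  # a row scan fetches the entire row
        total += row["amount"]
    return total, cells_read


def total_from_columns(cols):
    """Sum 'amount' by reading only the one column the query needs."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


row_total, cells_read_rows = total_from_rows(rows)
col_total, cells_read_cols = total_from_columns(columns)

print("row scan:   ", row_total, "cells read:", cells_read_rows)
print("column scan:", col_total, "cells read:", cells_read_cols)
```

Both scans produce the same total, but the column-wise version touches only the cells of the one column the query asks for. With wide rows and hundreds of millions of them, that difference in data read is the intuition behind the speedup described here.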
Now, here’s the power and benefit to a business user. I’ll go ahead and minimize —
TED KUMMERT: And this is in SQL Server 2012 shipping today.
SHAWN BICE: This is in SQL Server 2012, it’s available to everybody today. Now, let me show you the power of that. So if you look in the lower corner, let me do a control one and go down to the corner there, you’ll see that that slow query took roughly 30 seconds. If I escape out of that, and we run a Columnstore Index query against 404 million rows, look at that, about a minute and a half. That’s a huge, huge —
TED KUMMERT: That’s a second and a half.
SHAWN BICE: I’m sorry. A second and a half, yeah. (Laughter.) A second and a half. Huge, huge, huge performance gain. Right?
But it doesn’t end there. This is where I want to give you some insight as to what we’re going to add to this — this is really coming from your feedback.
TED KUMMERT: This is also in the next major release of SQL Server, what we’re about to show next.
SHAWN BICE: What we’re going to show next is literally going to come in the next release of SQL.
So we added two new improvements to Columnstore Index. The first is we make it updatable so we can continuously load data into that data warehouse. (Cheers, applause.) Right? And then the second is we support clustered indexes. So we know that a clustered Columnstore Index will give us good space savings because we won’t have a B-tree or heap representation on disk.
So let’s switch over. Let’s minimize this. And what I’m going to do is basically run that clustered Columnstore Index just to see — let’s take a look at the query time. And this one ran in a second. Just over a second, right?
TED KUMMERT: I can see that. And it’s doing it over and over.
SHAWN BICE: And it’s doing it over and over, so this query is running over and over and over.
Now, let’s go ahead and I’m going to basically run this load tool that’s going to take data out of our OLTP system and just continuously load it into this data warehouse.
So keep an eye over on the left-hand side of the screen, rows added. So I’m going to run the load tool here. And what’s happening right now is we’re taking data, and now we’re inserting it, updating it continuously into that data warehouse. You’ll see that the syntax was relatively straightforward, and you just saw that in just over a second, we’re reading over 405 million rows. So that’s the power of Columnstore Index.
TED KUMMERT: Great. (Applause.) Thanks, Shawn.
SHAWN BICE: Thank you.
TED KUMMERT: Awesome. (Applause.) So Project “Hekaton,” next major release of SQL Server, the enhancements to the xVelocity Columnstore, next major release of SQL Server, very exciting.
Now, we’ve been working very closely with customers in the development of “Hekaton” to make sure it was right and make sure we had the right surface area. One of those customers, Bwin, has a very impressive transactional system that runs on SQL Server today, it’s the backbone of their business. They’ve been working with “Hekaton,” and so we’ve got a short video of them talking about their experience with “Hekaton.” Let’s roll that video.
(Customer Video Segment.)
TED KUMMERT: (Applause.) Great. I love that video. I could almost watch it again, but we’ll move on.
Three things I’d like you to know about our in-memory strategy. Incredibly fast, wicked fast. Number two, if you know SQL Server, you know our in-memory. Number three, we’ve implemented this comprehensively, thought through the complete path to insight, the complete architecture of your business application to provide all the business acceleration that you need.
Now, you know, I hear about other products, there’s these point solutions out there, maybe you’ve heard of some of them. Sometimes we get a little irritated. It seems like it should take more than an in-memory column store and an in-memory row store to make a data platform. Right? That’s not it, even if you give it a really nice-sounding name. You know, we’ve been at this for a while, you know, since 2010. We’ve got over 1.5 million units of our in-memory technologies in customer hands. That has to be 10 times what Oracle and SAP have combined.
Really, the point here is you really do want something that’s part of a proven platform, part of a platform you know, and that really looks at the end-to-end architecture of the whole thing. And the potential here, though, is the business acceleration opportunity. I mean, everybody’s happy when things go faster.
This part of our strategy goes all the way back to the beginning. We’ve used different words for it, but it’s all about enabling you to manage any data and gain value from it. All the way back to SQL Server 7.0, that’s why we incorporated OLAP services, why we added the XML support, why we did StreamInsight, and also why we announced last year that we were bringing Apache Hadoop into the platform.
There are two parts of our strategy there. The first is this is first class on the Windows platform. That means enterprise-grade, fully integrated with the rest of the data platform. Think about manageability, the security model, the integration into .NET and Visual Studio for developers. Then we think all the way through to that end user, using Excel, using PowerPivot, that wants to gain insight from data that started out in HDFS. We’re going to connect all the dots across the platform.
Now, the second is we’re building around Apache Hadoop. And this is because we want you to be able to use everything that ecosystem has to offer. We’ve been working hard on this in the last year, and we’ve just recently released two previews. On Windows Server, we’ve got the Microsoft HDInsight Server. That’s in CTP now, it’s available for you to download and use.
And in the cloud in Windows Azure, we have the Windows Azure HDInsight Service, that’s also available for you to sign up and explore.
Now, this may not be technology you’re real familiar with. And what I do is just encourage you to sign up for these, download, start exploring, use the samples. And of course along the way, please give us your feedback as we move forward to bring these things to market.
It was two years ago on this stage we announced the release to manufacturing of the first version of Parallel Data Warehouse. Parallel Data Warehouse is a massively parallel data warehouse appliance built on SQL Server to take SQL Server to the hundreds of terabytes. We have appliances available from both HP and Dell.
Since that release, we’ve done three software updates. This has been about scalability, it’s been about performance, it’s been about the compatibility surface area. This includes, in the last update, bringing in a new distributed query processor that was built around SQL Server.
We’re announcing today the next release of Parallel Data Warehouse will be available in the first half of the next calendar year. (Applause.)
We’ve got innovations at multiple levels. In terms of the storage architecture, we built around Windows Server 2012 and an innovative feature called Storage Spaces. And the net effect of this is we’re going to have dramatically lower cost per terabyte.
We’re building around SQL Server 2012 and the xVelocity Columnstore with the updates, with the clustered index to enable dramatic improvements of performance. And we have many other features across the surface area of the product.
So I’d like you to join me in welcoming Christian Kleinerman to the stage. We’re going to show you Parallel Data Warehouse 2012. Thanks. Welcome. (Applause.) What do you got?
CHRISTIAN KLEINERMAN: Thank you, Ted, good morning, everyone. We’re going to do a quick look at the next version of Parallel Data Warehouse. And the first thing I’m going to do is log on to the administration console.
Yeah, I’m logging in as SA and we know we shouldn’t do it, so do as I say, not as I do, right? (Laughter.)
So here’s what you see on the console. This is where you see the entire state of the appliance. You can see information on sessions, queries running, loads running. We’ve been getting customer feedback on improving it as real-world use scenarios come up. There’s an interesting one on storage, you can see how much space is being allocated to data, log, operating system.
One of the most popular tabs in the admin console is this appliance health page. Here you can see one row for every node in the appliance. For today’s demo, we’re going to be working with a 40-compute-node appliance. And you can see integrated health information about everything: software, state of the cluster, cooling, et cetera. You can see that there are actual failures and problems. This is not planned or rehearsed. It is just that when you have a very large cluster, hundreds of disks, there’s always something bound to be in a failed state, but the system has total redundancy so that things will work even if there are some problems in the appliance. So this is the admin console, but I’m assuming all of you want to see some more performance and scale. Yes? (Applause.)
So let’s go first to this tool, which you all know by now, SQL Server Data Tools. And you’re going to see that I’m using it to connect to another SQL Server instance. One of the big investments that we’ve been doing is bringing the SQL surface area onto PDW to make sure that everything you know about SQL Server ports to PDW.
So the first thing we’re going to do is just use the right database. And just to establish a little bit of credibility, I want to show you the data set we’re operating on. And the first thing is let’s see how wide the rows are in the table I have that is called items. This is a typical wide table. This is the max size metadata query, and I can guarantee, most of the rows are pretty wide.
So then the next thing that we’ve always done is — let’s see how many rows we have in that table. And a little bit of a story here, which is in the past when we’ve been on stage here with Ted and we show you — we’ve always done like 100-200 terabytes of data, and we try to do a count to see how many rows are there. And the count is one to two minutes. And it leaves us with this awkward silence, which by the way, two minutes to count 100 terabytes is not bad, right? It’s all in perspective.
So we are in this awkward silence, and Ted and I started doing this little quick banter. And sometimes he throws a question which is supposed to be an easy question and I’m totally stumped. (Laughter.) Right? So we fixed it. We fixed the silence, not Ted. (Laughter.)
TED KUMMERT: That’s a different discussion. We’ll have a one-on-one later and discuss all of that. (Laughter.)
CHRISTIAN KLEINERMAN: That’s the boss. So we made the product insanely fast. Let’s try it again. Let’s see how many questions you get to ask. We’re going to count. And we’re done. (Applause.)
So what is that number? Is it large enough?
TED KUMMERT: 294 billion.
CHRISTIAN KLEINERMAN: 294 million.
TED KUMMERT: Close.
CHRISTIAN KLEINERMAN: So let’s do a very quick math. The previous number was 3945 divided by — I’m not going to copy this, or I’m not going to type it. And how does — oops. Paste.
TED KUMMERT: This is a critical audience for you to be writing query — (Laughter.) No pressure.
CHRISTIAN KLEINERMAN: So we go from bytes to kilobytes, megabytes, gigabytes, terabytes. Good enough for you?
TED KUMMERT: You forgot the star. (Laughter.)
CHRISTIAN KLEINERMAN: Oh, my God. (Laughter.) Okay. So we have one petabyte of data. (Applause.) Just what we saw, we counted a petabyte of data in almost a second.
TED KUMMERT: Yes.
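The on-stage arithmetic can be retraced with a short Python sketch. It assumes that 3945 is the row width in bytes returned by the max size metadata query and that the table holds roughly 294 billion rows; both figures are read off this transcript, and because the width is a maximum row size, the result is best treated as an upper-bound estimate rather than a measured size.

```python
# Back-of-the-envelope check of the "one petabyte" figure from the demo.
# Assumptions (taken from the transcript, not measured):
#   - 3945 is the maximum row width in bytes
#   - the table holds roughly 294 billion rows

ROW_BYTES = 3945
ROW_COUNT = 294_000_000_000

total_bytes = ROW_BYTES * ROW_COUNT

# Divide by 1024 four times: bytes -> KB -> MB -> GB -> TB.
terabytes = total_bytes / 1024**4
petabytes = terabytes / 1024

print(f"{terabytes:,.0f} TB, about {petabytes:.2f} PB")
```

Under those assumptions the estimate lands a little above 1,000 TB, which is consistent with the "one petabyte" figure quoted on stage.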
CHRISTIAN KLEINERMAN: So I’m sure many of you are thinking, okay, they optimized count star like crazy and that’s how they did it. So let’s try it.
I have a different query, a more traditional data warehousing query, and I’m going to tell you what I’m going to do first. This is not the main table, this is just a subset of the table that has the row store, the traditional storage mechanism that you had before we had the columnstore and, as Shawn showed you, the clustered columnstore, which we’re going to show you in the next query.
So this is roughly 6 billion rows. This is 1/50th of the large table that we just counted. And you can see it’s typical data warehousing. Aggregate, some grouping, some order by. And it processes an entire year of data. You see I get 48 rows, the query took 23 seconds.
Now, I’m going to do, for those of you that believe we’ve optimized only count star, let’s do this exact same query on the entire 294 billion rows and let’s see how that goes. I run it, and I get — it’s one to two seconds. (Applause.)
TED KUMMERT: And that’s because of the xVelocity Columnstore.
CHRISTIAN KLEINERMAN: It’s the Columnstore, and the scale-out technologies both combined. It’s really good performance.
And then the other benefit of the clustered Columnstore is that you get storage savings, space savings, right? Because you no longer need to have a secondary index on top of the row store.
So to demonstrate what you can expect in terms of compression and storage benefits, I’m going to do something simple, which is I’m going to extract 1 million rows out of this table onto two separate tables. So I’m going to run this query. You see, I create a table, it’s called row store, the definition says it’s just a clustered index.
And I’m going to run the exact same query. You see the syntax, all that I changed is that I said it’s a clustered columnstore, and I’ve exported the same million rows onto two different tables, one is called row store, and the other one is columnstore.
And I could say how much space is being used by the columnstore table, and you get 320 rows which PW returns one row per distribution, there’s eight distributions per node, 40 notes, so you get a lot of rows.
Since there’s a lot of data and we want to go and get insight from the data, why don’t we use Reporting Services to just process that information? So I have a predefined report where you can see that I have it already connected to the same appliance. Worth noting, I’m not connecting to it as if it’s SQL Server Parallel Data Warehouse, it’s not a different thing. It is SQL Server as you know it today.
Connect to it, and I have the definition of this data set, exactly the command, space used, exactly one for the row store, one for the columnstore.
And if I run this report, what you’re going to see is at the bottom I have all 320 rows, which shows you the distribution of the data, and at the top I have a sum of the space. I’ll run this report and you’ll get a summary of the data, how much space is being used by the row store, how much space is being used by the columnstore. And you see 55 megs for the row store, 5 megs by the columnstore, which gives you a 10X improvement in compression, which translates to all the math that you can do on price per terabyte or how many hard drives you need to buy. All this benefit translates to you.
Obviously, your mileage varies based on data and schema and not everyone will get 10X, right now the ranges we have are somewhere between 5X and 15X.
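The arithmetic behind those numbers can be checked in a few lines. This is only a sketch using the figures quoted in the demo (eight distributions per node, 40 nodes, roughly 55 MB versus 5 MB); the function names are made up for illustration.

```python
DISTRIBUTIONS_PER_NODE = 8   # figure quoted in the demo
NODES = 40                   # figure quoted in the demo


def space_report_rows(distributions_per_node: int, nodes: int) -> int:
    """One row comes back per distribution, so the row count is the product."""
    return distributions_per_node * nodes


def compression_ratio(rowstore_mb: float, columnstore_mb: float) -> float:
    """How many times smaller the columnstore copy is than the row store copy."""
    return rowstore_mb / columnstore_mb


print(space_report_rows(DISTRIBUTIONS_PER_NODE, NODES))  # 320
print(compression_ratio(55, 5))                          # 11.0
```

With the rounded sizes quoted on stage the ratio comes out at 11, which the demo describes as about 10X and which sits inside the 5X to 15X range mentioned above.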
So with this, we just saw the new interface, the administration interface for PDW. We saw compatibility with SQL Server. Hopefully, you believe that it’s a fast product and there are improvements in the compression.
TED KUMMERT: That’s not quite it. You can clap for him, but he’s not done. (Applause.) So I think most of you know the gentleman on the screen, Dr. DeWitt. He’s given great keynote sessions here the past three years. A spotlight session on Friday, so this is not a substitute for him talking to you again.
He also leads the Gray Systems Laboratory, their mission is all about advanced development work for the data platform. They’ve delivered many innovations into SQL Server, including the distributed query processor and the fundamental work behind that that ships as a part of Parallel Data Warehouse.
Not too long ago, we went to David and his team with a question. You know, this strategy of now — you know, all these different forms of storage, processing engines. Our strategy enabled you to gain value out of a query processor in light of all these capabilities we’re bringing into the data platform.
And the answer to the question is a feature we call PolyBase, which we’re announcing today. It’s a query processor unifying things at the T-SQL level across multiple types of storage and processing capabilities.
In the first instance, it’s about unifying queries between relational and Hadoop data, data in HDFS. But as we look forward, this is the base to support other types of data in the future.
Now, the reason you stayed out here —
CHRISTIAN KLEINERMAN: Encore performance.
TED KUMMERT: Encore performance, but wait, there’s more. Let’s take actually a look at PolyBase.
CHRISTIAN KLEINERMAN: Yeah, so with PolyBase, what we’re going to be able to do is query data through the PDW query engine that is sitting in Hadoop and HDFS. So I’m going to do a very quick glance, what you’re seeing, this is the dashboard from our HDInsight Server. You can see very similar to the console that we have for PDW, you can see information about the Hadoop cluster. It has an interactive console, so for example I could do — show me the data. Web crawler data.
TED KUMMERT: You misspelled.
CHRISTIAN KLEINERMAN: Wow.
TED KUMMERT: That’s two corrections in one demo. (Laughter.)
CHRISTIAN KLEINERMAN: I don’t know how that one-on-one is going to go. (Laughter.)
PARTICIPANT: What kind of a boss are you?
TED KUMMERT: What kind of a boss are you? That’s a good line from the front row.
CHRISTIAN KLEINERMAN: Thank you. Thank you.
TED KUMMERT: We don’t have that kind of time either. (Laughter.)
CHRISTIAN KLEINERMAN: So I can read from this file. I can show you the content. I’m being careful to not make any more mistakes. (Laughter.) And you can see, this is just sample data of what forums data would have for a set of Microsoft products: whether the posts are positive, neutral, negative, and some comments.
So this is what’s sitting on our Hadoop server. And some of you may be a little bit like me, which is I have heard a lot about Hadoop, but I don’t really know how to query it. And I do know how to use SQL Server, I do know how to use T-SQL, so how about we use T-SQL for that? And then the question is, do I need to learn a hundred new concepts? And the reality is you need to learn one new concept.
So I’m kind of going to connect to this different database, Hadoop DB. And all that you need to do to use Hadoop is create an external table. You give it a name, you give it the schema, and you specify the location of the Hadoop cluster. Create this, and that’s all you need to know.
At this point, how would you go and get the top 10 rows from a table? Easy. Top 10 star. And this is coming from the Hadoop server. Also, probably the more interesting scenarios are when you’re combining both relational and non-relational data coming from Hadoop.
So I have this table, just a helper table, line item. It has a million rows, which as you can see, based on all our in-memory investments, a million rows is a small thing these days. But I can go and run a query that is doing a very simple thing. It joins the forums crawler table that we just created with a line item table that has a million orders on our products, or a million rows indicating orders, and I can start doing aggregation. I can do a min on the comment status, I can do a count on the number of posts, and it is exactly as you would expect. It is just another data source for the query engine. Some data coming from HDFS, some data coming from relational, and you can see the results of the query.
So with this, you’ve seen PolyBase, the way to query across relational and non-relational data. Thank you very much.
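The statements Christian runs are not reproduced in the transcript, so the following is only a sketch of the idea, not PolyBase syntax. It uses Python's csv and sqlite3 modules: a pipe-delimited text stream stands in for a file in HDFS, it is given a name and a schema so SQL can see it, and it is then joined to an ordinary relational table. The table names, columns, and rows are all hypothetical, and unlike a real external table the rows here are copied in rather than read in place.

```python
import csv
import io
import sqlite3

# Stand-in for a delimited file sitting in HDFS (made-up contents).
hdfs_file = io.StringIO(
    "product_id|status|comment\n"
    "1|positive|works great\n"
    "1|negative|slow install\n"
    "2|neutral|it is fine\n"
)

conn = sqlite3.connect(":memory:")

# Ordinary relational table (hypothetical orders).
conn.execute("CREATE TABLE line_item (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO line_item VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Imitation of an external table: a name plus a schema over the file's rows.
conn.execute(
    "CREATE TABLE forums_crawler (product_id INTEGER, status TEXT, comment TEXT)"
)
reader = csv.DictReader(hdfs_file, delimiter="|")
conn.executemany(
    "INSERT INTO forums_crawler VALUES (?, ?, ?)",
    [(int(r["product_id"]), r["status"], r["comment"]) for r in reader],
)

# Once it has a schema, the file data is queried like any other table.
first_rows = conn.execute("SELECT * FROM forums_crawler LIMIT 10").fetchall()

# Join file data with relational data and aggregate.
joined = conn.execute(
    """
    SELECT f.product_id,
           COUNT(DISTINCT l.order_id) AS orders,
           COUNT(DISTINCT f.comment)  AS posts
    FROM forums_crawler AS f
    JOIN line_item AS l ON l.product_id = f.product_id
    GROUP BY f.product_id
    ORDER BY f.product_id
    """
).fetchall()
print(first_rows)
print(joined)
```

SQLite spells "top 10" as `LIMIT 10`; the T-SQL form in the demo would use `TOP`.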
TED KUMMERT: Great, thanks. (Applause.) So PolyBase is also going to be in the next release of Parallel Data Warehouse. This thing’s built for big data. From the storage architecture, dramatically lower cost per terabyte, in terms of performance with the xVelocity Columnstore, and now we’re introducing PolyBase to unify the relational and non-relational worlds at the query level. It’s going to be a very exciting product.
So sometimes I’ll tell people about my first BI moment. The first question they’ll ask me is: Hey, what’s a BI moment? And I assert it’s that point in time where you have a question that you know is answered by the data that’s out there, and you don’t have it. That’s the moment where you realize, I need business intelligence. And this discussion about insights, it’s really all about those BI moments. It’s about enabling people to get the insights they need so they can move forward.
Our strategy is to deliver BI for everyone. It’s built first on the work of BI professionals, delivering the tools and capabilities for BI professionals to build the corporate BI solutions that are the backbone of business intelligence in the enterprise.
Now, the other part of it is to enable BI for everyone else. Those people who may not be able to understand the technology and don’t want to, they’re having a BI moment, they need some insight. And we delivered a scenario we call Managed Self-Service Business Intelligence, which is all about those kind of BI moments for those end users that don’t want to know the technology and the terminology necessarily associated with BI.
Managed Self-Service Business Intelligence comes together in three parts. It comes together in a user experience, user experience in Excel called PowerPivot. It’s about enabling sharing and collaboration, bringing it into SharePoint so you can discover, build on the work of others.
And then it’s about enabling IT visibility and policy control. You need that balance, you need end-user empowerment and then you need IT oversight and governance, and that’s what really makes this work for the full enterprise.
Since we first released Managed Self-Service Business Intelligence in 2010, it’s been great to see what our customers have done with it. What it’s done for BI within their companies. We’ve been working very closely with Great Western Bank for a number of years now. They use SQL Server for data warehousing, their BI professionals use SQL Server to build their corporate BI solutions. And they’re heavy users also of our Managed Self-Service BI capability. So we’ve got a short video of them talking about their use of our BI technologies. Let’s roll that video.
(Video Segment: Great Western Bank.)
TED KUMMERT: (Applause.) That’s great. We’re continuing forward on our vision for self-service business intelligence. In SQL Server 2012, we delivered Power View. Power View being your experience, that tool to really enable you to visualize and explore your data. We’ll sometimes say your data has a story to tell, and Power View is the tool that enables you to unlock it.
We just recently released Office 2013. We worked throughout the development cycle with the Excel team on an idea. And the idea being how can we make this even simpler for the Excel user? And so what we’ve done with Excel 2013 is we fully integrated Power View and PowerPivot into Excel. Excel is now the complete end-user BI tool.
We want to show you that, we want to show you some of that experience in Excel 2013. And we want to also show you some of what it means to gain insight from some of these other forms of data we’ve been talking about.
So he’s familiar to you. I want you to join me in welcoming Amir Netz, technical fellow from the BI team to the stage.
AMIR NETZ: (Applause.) Hi there. I’m so excited being here today. This is the month where we release Excel 2013 to the enterprise so you can all use it. And Excel 2013 is truly a game-changer. This is the first time we are bringing the full power of BI and big data analytics to hundreds of millions of users, and I’m going to give you a taste of that power today.
As you know, I love movies. So we’re going to use a lot of movie data. So we’re going to learn something about the movies, it’s all going to be real. As usual, no data was harmed in the preparation of the demo. (Laughter.)
And we’ll start here. You know, you see here — let’s have some fun. You see Excel 2013 on the screen, see here a table. And this table holds all the movies that were nominated for the Academy Award for the category of the best foreign movie. So you can see here the year the movie was nominated, the name of the movie, the country the movie originated from, director, whether the movie won or was just nominated.
In order to do BI in Excel 2013, the only thing that you need is to go to the insert menu and press Power View. I just want to show you. It’s just there in the menu. You don’t have to download anything, there are no add-ins to activate, it’s just built into Excel.
So I just go in, click on Power View, and okay, and I have my first Power View ready here, already populated with the table I was working on. There’s no need to go model PowerPivot before that, there’s no need to save to SharePoint, it’s just built into Excel. Just like that.
And I can go in and immediately start doing some manipulation of the data. So let’s go and maybe take that table, let’s go and create a nice, new table here for every country. I want to see whether the movie won or not and which movie it was. Actually, let’s go to the count of the movies. So we have here the count of movies, how many movies were nominated, how many movies won.
And what I want to do here is improve the visualization a little bit. And this is Excel 2013, so we don’t have to deal with just simple visualizations. Look at what I have here in the ribbon. You see the basketball item? One click and I can turn this table into a map. Full interactive maps inside Excel. (Applause.)
And notice that it is interactive so I can zoom in, for example, into Europe and see what’s going on here. Let’s go to northern Europe. Let’s go to Sweden, for example. You can see Sweden. You know, it has 11 movies that were nominated, three movies that won. Now, this is actually part of the new release, we also have pie charts. You know, it’s kind of small, but some people care about that. (Laughter.) So we have the pie charts outside of the map, inside the map.
One thing I notice about Sweden, if you want to win the Academy Award on a Swedish movie, there is only one requirement: You have to make sure that your last name is “Bergman” because only Ingmar Bergman can win movies for Sweden. (Laughter.)
Now, you go in, you could go down to southern Europe a little bit and see, for example, what’s going on in Italy. So you can see here, Italy, 17 nominations. Very high win ratio, 12 wins. Very nice. You click on Italy, you can find the legendary director Fellini, Vittorio De Sica, right? All the history of movies unfolding in front of your eyes.
You can also see who gets the shaft; who really, really doesn’t get anything. Poland, for example. Poland has nine movies and not a single win. And the worst of all, Israel. Ten nominations, not a single win. As an Israeli, I take it personally, I can tell you. (Laughter.) These are actually really good movies.
So visualization and interactive exploration inside your Excel. But this is not just nice on small data here locally in the sheet. I want to show you another worksheet. And this one here, what I’m going to do is I’m going to connect to a server model. So we build here a big data server model in Analysis Services and I’m going to connect to it. So I’m going to go here to data. It’s the same thing that you’ve done all the time; going to an existing connection, you can see here connection to Analysis Services.
And I’m going to connect. So I’m going to do connect. And you can see the regular dialogue that we always had here which is show me the pivot table report, that’s how you start pivot tables connecting to Analysis Services. But notice that the fourth item here is Power View report. So you just connect to that model in Analysis Services, ask for Power View report, and at this point we are just working live against Analysis Services with a big model out there.
In this case, it’s a big data model. What do I mean by big data? We actually worked inside HDInsight to collect a bunch of tweets about the movies in the first half of the year. So I can show you, actually, how many tweets we have here. We have here 12 million tweets, and we actually have all those tweets in the model. I can just go here, change it into a card so we can all see it more clearly. So just resize it a little bit.
So see we’re starting with 12 million tweets. 12 million, 140 character strings. And let’s just go and do something interesting. So, first of all, let’s go change it into a column chart and look at the pattern over time. And you can see here that the tweets were collected from February to May. And you can see some really interesting patterns. You can see, for example, we have a big spike here on March 23rd. You can see we have a big blip here in May. And we might want to figure out what’s going on.
Now, using term extraction in HDInsight, we can go and do some really interesting things. For example, I can go and take a look for each movie how many tweets mention that movie. And just sort it a little bit, make it into a bar chart and you can see here, this is the first half of the year. Familiar movies, right? “The Hunger Games,” lots and lots of tweets. Now, can “The Hunger Games” explain something? Click on “The Hunger Games” and immediately I can see that “The Hunger Games” is responsible for the big spike on March 23rd. And March 23rd is when “The Hunger Games” was actually released to the theater. And you can see also the anticipation of the movie building up because of the books. A lot of people read the books and were waiting for the movies.
Now, “The Avengers,” another big hit, it is responsible for the blip that we have here in May. And you can see it kind of took the industry by surprise, not a lot of anticipation before the release of the movie. So just like that, you can see how we can go and get interactive big data exploration directly inside Excel.
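How the term extraction was actually done in HDInsight is not described here. As a much-simplified sketch of the idea (count, per movie and per day, how many tweets mention the title), here is a small Python example. The sample tweets are invented, and plain case-insensitive substring matching is an assumption standing in for whatever extraction the team used.

```python
from collections import Counter

MOVIES = ["The Hunger Games", "The Avengers"]

# Invented sample tweets as (day, text) pairs; not real data.
tweets = [
    ("2012-03-23", "Just saw The Hunger Games, wow"),
    ("2012-03-23", "the hunger games midnight show!"),
    ("2012-05-04", "The Avengers was a surprise"),
    ("2012-05-04", "lunch was good"),
]


def count_mentions(tweets, titles):
    """Count tweets mentioning each title, overall and per day."""
    per_title = Counter()
    per_title_day = Counter()
    for day, text in tweets:
        lowered = text.lower()
        for title in titles:
            if title.lower() in lowered:
                per_title[title] += 1
                per_title_day[(title, day)] += 1
    return per_title, per_title_day


per_title, per_title_day = count_mentions(tweets, MOVIES)
print(per_title.most_common())
print(per_title_day)
```

The per-title counts correspond to the bar chart, and the per-day counts for one title are what light up in the column chart when a movie is clicked.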
Now, we can go further than that. I’m going to go here and you can see here a list of 20 actors on the screen. And if you look at those actors, you know, the names might resonate, but something is common to all of them. And you forgot, but these are the 20 actors that were nominated for the Academy Award at the beginning of the year. So leading role, supporting role, males and females.
And what we’ve done is using the same kind of terms extraction, we actually were able to go and count for each one of those actors how many tweets mentioned their names. So you can see here, clearly, you know, the A list versus the B list. Brad Pitt and George Clooney and Meryl Streep with lots of tweets, and then everybody else much less.
And we can go and imagine that you are a brand manager in a marketing department. You manage brands. Actor names are brand names, they’re worth a lot of money. They show up in commercials, they show up in ads. You know, it’s really important for you to know how to manage your brand. And awareness is one of the things we most care about. So let’s see how awareness affects those brand names.
So I’m going to show you here the number of tweets over time that mention those 20 actors. You can see a big peak here on January 22nd. Something really, really special happened on January 22nd for those actors. Anybody want to take a guess? It was the nominations to the Academy Award. So the nominations were announced on that weekend, and suddenly all of Twitter is full of tweets about those actors. And you can see this big spike here.
But if you’re the brand manager for one of those actors, you may get a very, very different picture depending on what brand name you’re managing. If you manage Brad Pitt, click on Brad Pitt. You can see this is the tweet pattern. For Brad Pitt, January 23rd was just another day in the office, nothing changes. Brad Pitt has so much — yeah, he doesn’t need nominations for Academy Awards. So many people are tweeting about Brad and Angelina and Jenn — you know, he doesn’t need that. (Laughter.)
But look at Melissa McCarthy here. That’s a very different story. You’re managing her brand, this is what you see. Barely anybody is tweeting about her, and then January 23rd is coming and a big spike of awareness. And then quickly everybody forgets about her, right? (Laughter.)
If you’re the brand manager, you probably say, oh, I missed a big opportunity, I didn’t make that announcement stick. You know, nobody remembers her.
Now, we can go in and if you’re still the brand manager, you don’t care only about awareness, you also care about sentiment. And sentiment is kind of what people feel about your brand. And what we’ve done here, again, using the term extraction and HDInsight, we are able to get good words and bad words that showed up in the tweets. So if somebody said “good” and “wonderful” and “terrific” that’s a good thing. If they said “bad” and “horrible” and all sorts of F-words, this is a bad thing. Right? (Laughter.)
And then we can weigh those things in and try to get how much — you know, when people talk about an actor, how many good and how many bad and get a sentiment score.
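The exact weighting used in the demo is not stated. One simple way to turn good and bad word counts into a score, offered here purely as an assumption, is (good - bad) / (good + bad), which runs from -1 (all bad) to +1 (all good). The word lists below are just the examples mentioned on stage.

```python
GOOD_WORDS = {"good", "wonderful", "terrific"}
BAD_WORDS = {"bad", "horrible"}


def sentiment_score(tweets):
    """Return (good - bad) / (good + bad) over all words, or 0.0 if none match."""
    good = bad = 0
    for text in tweets:
        for raw in text.lower().split():
            word = raw.strip(".,!?\"'")  # drop surrounding punctuation
            if word in GOOD_WORDS:
                good += 1
            elif word in BAD_WORDS:
                bad += 1
    total = good + bad
    return 0.0 if total == 0 else (good - bad) / total


# Invented sample tweets about one actor.
sample = ["A wonderful, terrific role!", "Horrible movie."]
print(sentiment_score(sample))
```

Computing this score per actor per day, alongside the mention count as the awareness measure, gives the two axes used in the charts that follow.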
So we’ll start here with — these are male actors that I see here. And we’ll start seeing what’s happening with Max von Sydow. On the X axis, we’re seeing here the awareness score, and then the Y axis, the sentiment score. And Max von Sydow has been nominated for a supporting role in a movie called “Extremely Loud and Incredibly Close.” Max von Sydow is already 85 years old, so that should do something good for whatever is left of his career. (Laughter.)
So let’s see what happened here. So I’m going to go and, okay, let me just stop it for a minute. I can just do it again. So let’s see what’s happening with Max von Sydow. So we get to January 23rd and you can see this big spike in awareness, big spike in sentiment. The sentiment goes down, but it’s generally a much, much better sentiment and much better awareness than what he started with. And of course you can imagine that being nominated for the Academy Award is a really good thing for somebody’s career, and he should end up at a much better, you know, point than what he started with before the nomination. Well, that’s what you would think, but it’s not always the case.
Let me show you somebody else. This is Demian Bichir. Demian Bichir was nominated for a leading role in “A Better Life.” “A Better Life” was actually, I checked it on IMDB, it got a score of 6.5, wasn’t a very successful movie. And Demian Bichir, before the nomination, had not a lot of awareness, but pretty positive sentiment. And then we start going and you can see we get to January 23rd and a big spike in awareness, but look at the sentiment, it is actually going down. It’s going down consistently. You’re going to see that, in fact, people for some reason did not like the fact that Demian Bichir was nominated for the Academy Award. And he ended up with much worse sentiment than what he started with. I don’t know why, but if you’re the brand manager for Demian Bichir, wow. You better submit your resume. Okay. (Laughter.)
Now, the last one I want to show you here is — you know, let’s look at this. This is Brad Pitt again. Brad Pitt, again, January 23rd comes and goes. It doesn’t really change anything. Everybody already has made up their mind about Brad Pitt. They love him, they hate him, nomination doesn’t change anything, awareness is there, completely constant. If you’re Brad Pitt, you can relax, don’t worry about nominations. Enjoy life, enjoy Angelina, get ready for the wedding, right? Very, very easy. (Laughter.)
So you’ve seen how we took Excel and made it useful not just as training wheels on data: with the same skills that hundreds of millions of users already have, you can use Excel as a front end to the big, complex, big data analytical models that you have in the enterprise. Again, you don’t have to deploy anything new.
The last thing I’m going to show you here is another report. This one I’m going to go to the browser. So let me just make it a bit larger so the guys in back can see well. No, right? I know, last time you told us you cannot see well. So we actually went back to the team and said, hey, we have to have a presentation mode scaling capability. So we just made it available, so you can just go and make the fonts larger. (Applause.) Yeah, we read the tweets, we know.
So what do we have here? We have here a list of movies with the domestic gross sales of those movies. And what I want to do is I want to do some analysis about the performance of actors. In fact, actresses. So I want you to shout to me the name of an actress, any actress you want that is going to be married and has six kids, mostly adopted. (Laughter.) Yes? I’m waiting. Yes, Angelina Jolie, good choice. Okay. (Laughter.)
So let’s go with — let’s show Angelina. Angelina Jolie. So these are all the productions of Angelina Jolie. I’m going to make it into a column chart here and sort by the gross revenue of the movies.
So imagine that you are now a producer in Hollywood and you want to cast your new movies. And you’re thinking, well, Angelina Jolie, I’m looking at a lot of blockbusters that I see here, I should cast her in my movie. And you’re going to do that, but you think, you know, back of your head, maybe I shouldn’t just look at the sales, I should also look at the profits, you know, basically the sales minus the budget of the movie. Right?
So let’s see how is Angelina Jolie doing on the profit side? So I’m going to go here and go for the profit and loss, replace the domestic gross with profit and loss, do the sorting again. Okay. Now the picture is not very pretty, certainly not as pretty as her, right? There is — you know, there are quite a few profitable movies, but a lot of barely break-even and even some losses here.
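The switch from gross to profit is simple arithmetic: profit and loss is the domestic gross minus the budget, and the chart is re-sorted on that value. A minimal sketch, with placeholder film titles and figures that are entirely made up:

```python
# Placeholder titles and figures (in millions); not real box-office data.
films = [
    {"title": "Film A", "domestic_gross": 150.0, "budget": 100.0},
    {"title": "Film B", "domestic_gross": 200.0, "budget": 210.0},
    {"title": "Film C", "domestic_gross": 60.0, "budget": 20.0},
]


def profit_and_loss(film):
    """Sales minus the budget of the movie, as described in the talk."""
    return film["domestic_gross"] - film["budget"]


by_gross = sorted(films, key=lambda f: f["domestic_gross"], reverse=True)
by_profit = sorted(films, key=profit_and_loss, reverse=True)

print([f["title"] for f in by_gross])
print([(f["title"], profit_and_loss(f)) for f in by_profit])
```

In this toy data the top-grossing film is the only one that loses money, which is the kind of reversal the profit view is meant to expose.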
At this point you say, well, I need to think. But maybe that’s kind of how the movie business works, right? You make a bunch of blockbusters and a few kind of barely break even. But the best way to check that hypothesis is to benchmark her against somebody else, another actress that she knows, that she doesn’t like. Ideas? (Laughter.) Ideas? Jennifer, very good. We’re going to go and check and compare Angelina Jolie to Jennifer Aniston. Okay?
Let’s go for Jenn, let’s see this on Jennifer Aniston. So this is the picture for Jennifer Aniston, the profits and losses for Jennifer Aniston. Ooo! (Laughter.) Yeah, at this point, I’m thinking I made a big mistake. (Laughter.) I need to call Angelina and tell her, Angelina, I’m sorry, it was the wrong number, you both were on the speed dial next to each other, you stay at home, get ready for the wedding, enjoy Brad, I’m going to go with Jenn.
But before I make that decision, I better check to see what Jenn is actually doing there. So I’m going to go and take Jenn, take Angelina, and I can see that Jenn is doing all these reddish movies. Jenn is the queen of comedy. If you’re producing a comedy, there is no better choice than Jenn. You cannot lose money with Jennifer Aniston. She’s money in the bank. In fact, if you think about Angelina Jolie, you know, two losses, one win, not a winning team there, right? But Jenn is really, really good.
But what is Angelina really good at? Angelina is really good with the blue things. Angelina is really good with animated movies, right? This is kind of surprising. The best way to make money with Angelina Jolie is to have movies where nobody can see her. (Laughter.) It’s not me, it’s the data. (Applause.)
Now, we actually spent a lot of time also making sure that these can really, really look good. So when we look at those — we spend a lot of time to enable you to change styles, set some background images, and even control the transparency of the images so that you can really make the whole thing pop out.
So I already have a background image to put in, it’s already 100 percent transparent. I’m going to reduce the transparency. And you can just make the whole — (Laughter.) too much skin, too much skin, too much skin. Okay, that’s it. (Laughter, applause.)
Okay. It was fun and I showed you a few cute, little features here. But there’s one big feature that is actually hiding behind this last report. It’s not just for the fun aspect of it. I want to take a look here at what we have.
Notice the icon set that we are using here. I’m just going to close it a little bit so we can see. So I have here two icons that we’re using. There is one that looks like a table, another has the sigma sign. What does the sigma sign symbolize? It symbolizes, let me show you, measure groups. Because the whole report here was built on a MOLAP cube. This is a complete support of Power View to MOLAP cubes. And the way we’ve done it, we actually went and implemented full DAX querying support on top of MOLAP cube, so you can see here, we are sending DAX queries to your MOLAP cubes.
So now you can go and use Power View not just for personal data, not just tabular model, but to MOLAP cube, all the same experience available to you. So from personal data to team data to the largest enterprise data, all these great experiences. (Applause.) Thank you.
TED KUMMERT: Well, there’s a tough act to follow. (Laughter.) And you never know, when you come to the PASS Community Summit, you never know what you’re going to learn.
Sometimes after these keynotes, Amir and I do spend a lot of time on stage together. People will come up to me afterwards and say, “Is he always like that?” And the answer is, yes, his passion, his energy, his technical excellence, one of the best parts of my job is the folks I get to work with like Shawn, Christian, and Amir. It’s just fantastic.
Now, what did we see there? How we’re continuing to make BI simpler by integrating into Excel. We saw a great demonstration of additional insights, additional value that was created by using other sources of data. Data all the way down there, in this case, unstructured data, it’s Twitter data. Producing additional insights, additional value you’re going to be able to create. And that it’s part of a complete platform. The solution for the BI professionals and also for all end users.
The world of data is changing and we’re changing with it. And, yes, change is difficult, but change brings a lot of opportunity. I hope you saw in what we discussed today some opportunities for you. I really hope what we do today is not only show you some cool stuff, but really inspire you on some things that you’re going to be able to do with what we’re bringing forward.
Business acceleration, bringing more performance to those business processes, bringing quicker path to insight, quicker ability for end users to iterate. That’s exciting. Being able to bring other data into the solution, it adds richness, it adds additional insight.
And then as the video said, BI makes heroes. And we’re going to make it simpler by integrating it and landing that on any data.
Now, PASS is changing, too. They just announced this morning, we’re having a business analytics conference in the spring in Chicago. We’re very excited about that, we hope to see a lot of you there.
So I want to thank you for your time today. I want to thank you for your partnership with us, how this community works together, how this community adds value to what we do. It’s incredibly valuable to us. I hope you have a great week in Seattle and at the PASS Community Summit. Thank you very much for your time. (Applause.)