The Query Store's Open for Biz, w/ Microsoft's Conor Cunningham

15-12-2020 • 1 hr 19 min

Conor Cunningham is a Microsoft Architect and he knows a great deal about data, storage, and everything Microsoft. He and Rob go way back and were even roommates for a couple of years during Rob's Microsoft days. There's quite a bit of history between Rob, Tom, and Conor. A history that fosters an insightful discussion and a little bit of busting chops! We discuss quite a bit including:

What a Microsoft Architect does
Organized storage and Data lakes
Conor's celebrity status
How the Query Store was almost killed, but Conor didn't give up on it
Remote presentations and working remotely
And so much more!

Episode Timeline:

2:25 - Conor the Celebrity
4:40 - Conor The Architect, and his unusually organized kitchen
9:40 - Speaking of meticulous organization, we talk Organized Storage and the transition from row and column models to....something else
16:10 - Data Shape and Data Volume in selecting databases, and are the old dusty data warehouses being replaced
21:50 - Curves and Cliffs in warehousing, some avenues are easier and cheaper than others
25:25 - The Data lake has changed the analysis process
29:50 - Rob pushes Conor into a long-awaited breakthrough, some failed projects from their past, and how the name "SharePoint" was almost given to a different product
34:55 - Let's go back to Conor The Celebrity, some good stories from Tom's Microsoft MVP days, and how Conor had no fear of the salty MVPs
46:50 - The "Query Store" isn't a place you go to purchase queries
55:00 - Keeping up with the competition: AWS and Babelfish
1:02:00 - The (ORIGINAL) Windows Phone debacle, and Windows Media PCs
1:06:20 - Doing presentations in the COVID-19 era and the future of the remote workplace at Microsoft
1:14:30 - Redmond's new buildings, and getting caught in the infinite loop that was the old Redmond facility design

Episode Transcript:

Rob Collie (00:00:00):
Welcome friends to another episode of Raw Data. Today's guest is Conor Cunningham, thee Conor Cunningham. Some people in the SQL community simply refer to him as Conor. Conor is an architect at Microsoft on the Data Platform. That spans a lot of ground, as you'll see during the podcast. He's long been involved with SQL Server. He's up to his eyeballs in everything Azure-related, but that's not why I know him. Conor and I met as friends at Microsoft when we were both youngsters. We had both just shown up out there, and we actually ended up, the two of us plus two other guys, renting a house together for like a year and a half. It was a very interesting time in our lives. So as you might expect, this conversation is a blend of the personal and the professional. I think you'd expect nothing less of us at this point.

Rob Collie (00:00:50):
We talk a lot about some of my favorite topics, things like the evolution of storage from purely rectangular to curly and how that interacts with the worlds of data warehousing, and how it interacts with our world of analytics. We talk about presentation styles and authenticity. We talk about a lot of things. He's delightfully nerdy, and he explained to us how his SQL architecture sort of background influences the layout and the design and how he stores things in his kitchen, which was awesome. But he is a really interesting guy, a funny guy, a nice guy, really a great friend and I really enjoyed this. So let's get after it.

Announcer (00:01:34):
Ladies and gentlemen, may I have your attention please?

Announcer (00:01:38):
This is the Raw Data By P3 Podcast with your host Rob Collie and your co-host Thomas LaRock. Find out what the experts at P3 can do for your business, go to powerpivotpro.com. Raw Data By P3 is data with a human element.

Rob Collie (00:01:57):
Welcome to the show. Conor Cunningham, how you doing, man?

Conor Cunningham (00:02:01):
Pretty good, Rob. How are you today?

Rob Collie (00:02:03):
Oh, fantastic. You know, it's been too long. We need to invent things like this. Like the podcast, really, the real purpose of it is that it's a forcing function for talking to old friends. That's what we're doing here, maybe not 100%, but it seems to be working.

Conor Cunningham (00:02:17):
I like it. I like it.

Rob Collie (00:02:19):
Yeah. Unlike many of our guests, you and Tom already know each other.

Conor Cunningham (00:02:25):
Yep.

Rob Collie (00:02:25):
This is also usually the vehicle by which I introduce Tom to all kinds of people, you know, all kinds of weirdos that he doesn't know. But today we have a weirdo that he does know. And we're all weirdos here so, as you know, we're not picking on you particularly. But I think there was a moment years ago where I was talking to Tom and I said something about having lived with Conor and Scott and whatever. Anyway he goes, "Whoa, whoa, whoa, whoa, wait, wait, wait, wait, wait. Conor-Conor Cunningham?" I'm like, "Yeah." And he's like, "You know Conor Cunningham?" And I'm like, "Yes, yes. I know Conor Cunningham." Tom's voice took on this like hallowed tone. I think honestly that's when Tom really decided that he liked me, was that I knew Conor.

Conor Cunningham (00:03:13):
I'm glad I was your in.

Thomas LaRock (00:03:15):
No, it gave you legitimacy is what it did.

Rob Collie (00:03:18):
Yeah. I know him that doesn't mean that-

Conor Cunningham (00:03:21):
It's good enough, I guess.

Rob Collie (00:03:24):
... Conor would knight me. I thought that was hilarious because I had no idea really. I didn't know that the Conor that I knew was somehow like a celebrity in the SQL space, but it makes sense. You know, I just didn't know.

Conor Cunningham (00:03:41):
I've been pretty lucky that people kind of all know my name now since I have been doing keynotes for my boss, Rohan. We have this thing called the Bob and Conor Show; we do it at PASS every year. And so it's rare that you get known by just your name, your first name. And I guess I've been lucky in the sense that at the right time I was able to get in front of thousands and thousands of people. And now it's just thee Conor. I don't necessarily let it get to my head. It's only in a very specific community. After that I'm not really worth anything, but I think within that group, they seem to like me.

Rob Collie (00:04:12):
Yeah. You're like Alice Cooper, you go on stage and then you're thee Conor. But when you walk off stage, you put your pants on like everybody else, one leg at a time, you know?

Conor Cunningham (00:04:23):
Probably. Yeah, if I were wearing pants during this pandemic.

Rob Collie (00:04:25):
Just to know yeah, I know. I think I'm actually wearing pajamas right now. So I don't know if that qualifies as pants, but it's a good COVID day for that. So when people ask you what you do? What's your job at Microsoft? What do you tell them these days?

Conor Cunningham (00:04:39):
It's a little odd. My title is architect, but that means different things to different people. And what you think it means is only maybe 10% of what my job actually is. So architect at Microsoft means different things in different parts of the company. It's maybe a little bit of a glory title, but really the way to think about it is you're a senior engineer and you're respected for what you do. You often are given these amorphous problems that don't really have solutions, and in that sense it's kind of what an architect does. I'm not really drawing like class diagrams or anything about how you put software together, but I am often trying to make sure that the whole picture of what we build for SQL or any of the other data offerings that I work on works together for the customer. So that often can be different things. Like when GDPR came in, I had to go figure out how do you get GDPR, the European privacy rule, to work across all of our products so that we didn't hopefully get sued by the EU.

Conor Cunningham (00:05:39):
And that's not probably a typical architect task but that's the kind of thing that would be normal for me. So I work on getting large companies to adopt our platform and to sort of solve all the issues that come up with that. Sometimes technical, sometimes training, sometimes legal, sometimes whatever. And usually it's like a joker in a deck of cards so you just be... You're thrown these weird problems and then you kind of get to go choose your own adventure and hopefully solve them. So it's a variety of different things and on any given day, I like to think of it as you don't really know what the day is going to be, but there's probably going to be a bunch of fun problems. And then you just have to kind of keep up because the pace is relentless.

Rob Collie (00:06:23):
Yeah. Well, I do remember that about Microsoft for sure. It's actually one of the things that I, in hindsight, I've credited with a lot of my development. I mean, really most of my development came at Microsoft really, but the pace of it, the number of decisions that you're forced to make per day that are going to stick, these are decisions that are going to impact the software. And some days you're making easily mid-double digit decisions about the products. It's crazy, it's a high pace of decisions with enormous impact, potentially enormous impact anyway. And so you've really got to be on your game and it really rewires your brain in some ways that I never really anticipated.

Conor Cunningham (00:07:05):
You have to get in the headspace to do it, and so one of the challenges that I face is when I get done with work, I have to figure out how to turn the brain off in order to go back to doing normal human things like making dinner or whatever. You typically need to drive home and during the pandemic, you don't really get to leave the house, right? So you end up needing just to stop and give it five or 10 minutes to "drive home" so that you can just let yourself unwind a little bit and then become hopefully normal human Conor to be able to go watch football or whatever it is that you want to do that's for fun.

Rob Collie (00:07:41):
I'm envisioning you like cooking dinner and going all like Rain Man lining up all the Tater tots on the tray.

Conor Cunningham (00:07:49):
I will admit that I have taken every single thing out of my kitchen and organized it by function.

Rob Collie (00:07:56):
I mean, you've got to be able to store and retrieve those utensils now don't you? And you would not want to be inefficient about that.

Conor Cunningham (00:08:06):
Nope. I had too many colanders, so I got rid of some of the colanders and everything is partitioned by serving, baking, et cetera. I had so much fun doing this. It was too many years without doing it and now that I've done it, I am so happy that my kitchen is ordered.

Rob Collie (00:08:24):
You know it's all indexed, you know.

Conor Cunningham (00:08:27):
Pretty much.

Rob Collie (00:08:28):
Do you have failover clusters? Do you have a backup kitchen?

Conor Cunningham (00:08:32):
I have a second pantry.

Rob Collie (00:08:34):
Oh.

Conor Cunningham (00:08:35):
The first pantry's a little small, so I have this room in my house. It's not quite big enough to be a room, it's like where you'd put a second fridge, I guess, but I don't. I have a wine fridge in there, but it was just big enough to kind of walk in. Originally, I tried to put like a treadmill in there but it was just too big. So I made it a second pantry and now it's like a big walk-in pantry and it actually works out pretty well.

Rob Collie (00:08:57):
So you've got an onboard cache pantry?

Conor Cunningham (00:08:59):
Yeah.

Rob Collie (00:09:00):
And then sort of a secondary, more offsite type of... Yeah, I understand I get you.

Conor Cunningham (00:09:04):
Level two cache for the pantry definitely.

Rob Collie (00:09:06):
Mm-hmm (affirmative). Yep. It's just basic, it's pretty elementary really when you get down to it.

Conor Cunningham (00:09:11):
It took me a few years to figure out that was the right plan, but now that I have settled on it, I'm extremely happy.

Rob Collie (00:09:16):
I can totally see you getting into that. We should do like an MTV Cribs type episode where you talk about your kitchen and check us out, man, over here. All right. Well, until we get the budget for that, we're just going to continue with the podcast. So when you and I first met, you were working on Jolt, the OLE DB driver for the Jet engine that powered Access. And then as the years went by, and I'm sure there's some interesting things to talk about with Jet. I just don't know what they are, so we'll have to circle back. But one of the number one things I want to talk to you about is when I knew you at Microsoft, it was basically pre-Hadoop. There's pre-Hadoop and there's post-Hadoop. And there's two different eras in our professional careers when it comes to organized storage.

Rob Collie (00:10:05):
And actually we have talked about this on a previous podcast, about how Microsoft sort of halfway anticipated this move to semi-structured storage. The first iteration, I think, completely got the implementation wrong, which was XML blob storage in SQL. You know, it was anticipating a trend that data wasn't going to be just rectangles anymore. It was going to be curly, as a certain architect referred to it. So when you say architect I think about these old white papers that I used to read with a certain degree of ceremony. You would read these because they were hallowed, you know. Then you look back years later and kind of chuckle at how off they kind of were, even though they were at the same time pretty infused with foresight.

Rob Collie (00:10:55):
So I would really like to talk about just in general, this transition from this rigorously row-and-column model, which still obviously is very, very common and useful today. It's not gone at all, but there's all this other stuff that's happened since then. And I think you've now been dragged into that. You've been dragged out into those deep waters as well. Like you're in the middle of all that too, aren't you?

Conor Cunningham (00:11:23):
Yeah. I work across all of the Azure Data offerings that we have, both SQL Server on-premises and also our cloud offerings, including the recently released Synapse Analytics platform. And that works over data in a logical data lake and it can be whatever format you want. It could be CSV files, it could be XML files. It could be whatever, Parquet, and that lets you kind of query over data where the structure isn't necessarily forced on you by a database and its schema. It's something where you tell the database what that structure is and then have it interface with it sort of indirectly.
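The idea Conor describes here, where the reader tells the engine what the structure is instead of the storage enforcing it, can be illustrated with a small sketch. This is plain Python, not Synapse or any Microsoft API; the file contents, the `ORDERS_SCHEMA` mapping, and the function names are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical raw file as it might sit in a data lake: plain text, no types.
RAW_CSV = """order_id,amount,region
1,19.99,west
2,5.00,east
3,42.50,west
"""

# A stand-in for an "external table" definition: column names mapped to parsers.
# The file itself is never rewritten; the schema is supplied by the reader.
ORDERS_SCHEMA = {"order_id": int, "amount": float, "region": str}


def read_with_schema(text, schema):
    """Yield typed rows by applying `schema` to raw CSV text at read time."""
    for record in csv.DictReader(io.StringIO(text)):
        yield {column: parse(record[column]) for column, parse in schema.items()}


def total_by_region(rows):
    """Aggregate typed rows: sum of amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


totals = total_by_region(read_with_schema(RAW_CSV, ORDERS_SCHEMA))
print(totals)
```

A different reader could apply a different schema to the same bytes, which is the flexibility being discussed; the trade-off is that type errors surface at query time rather than at load time.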

Rob Collie (00:12:01):
Yeah. So in terms of a personal journey, I kind of remember you as a SQL snob. There was a purity to the SQL and a purity to everything. When I say SQL snob, I say so lovingly, but I've never gotten a chance to talk to you about this. That was a real expansion of the world, and the fact that you're involved in all sides of it now, what was that journey like for you, sort of opening up to this new stuff? Like in the beginning did it seem kind of, ah, this will never work?

Conor Cunningham (00:12:34):
You know, when I was younger I would probably have said that, right? That doesn't solve all the problems that you want and you guys are crazy, leave me alone. As I've gotten a little older, I've had to just really catch myself the first time someone says a new idea. I try really hard just to bite my tongue and listen for a little bit longer because there's usually a kernel of truth in the crazy new idea. And if you let yourself sort of sit there and listen to it and stew over it for a little while you figure out what is interesting about it. So the general trend about how we've got to the scale-out, NoSQL, maybe not necessarily strict schema view of the world is partially based on just the explosion of the amount of data that's out there now.

Conor Cunningham (00:13:23):
And at some basic level you can scale up a computer to a certain size and then all of a sudden when the problem is too big for whatever the biggest computer is that you can buy, you need a different approach. And the world got to the point where that much data exists and we actually do want to process that. And once you start scaling out, then you don't necessarily need to have a bunch of really expensive computers to do that. You just need a whole bunch of computers with the right scale out algorithms. And that world gives you a lot of flexibility. It's also a little bit more complicated. The core algorithms, the area where I play to go do architect Conor stuff are the same when you get down to the bottom of it. It's just a question of how do you make sure that you have a cohesive technical solution in whatever it is you're trying to sell to the customer so that you can hopefully solve their business problem.

Conor Cunningham (00:14:09):
And a lot of the world that I play with these days is some combination of scale up and scale out or structured data and semi-structured data, depending on where people are focusing and figuring out how do we apply the right algorithms generically across all of those. And it's a fun problem space, but it's also a very big world. It used to be very simple inside the SQL Org. We would just go build more SQL features and life is grand. And we still do a lot of that and that's still a very valuable business for us. But it also is an area where you have to be open to go meet the data where it is because there's so much of it now, just loading it into a database can take too long.

Conor Cunningham (00:14:48):
So you need to think differently about how to solve problems like that and that gives you opportunities to kind of rethink, well, what could you do? What is a database? Well, a database is just a thing that answers questions for you and then maybe provides you transactions and maybe does this and that. If you don't need all of those, then you can build a "database" that maybe works differently and solves problems better, cheaper, faster, whatever it is that you need to really meet customers where they are today.

Rob Collie (00:15:13):
Yeah. The volume of data is exploding, but it's not just that. It's also, right, the shape of it isn't quite... It doesn't want to like fit into neat rows and columns quite so cleanly anymore. And if it did, it would fit into the world's most complicated ad hoc schema. Can we explore that a little bit? This problem, and this type of storage, sprang up in response to basically trying to store the internet, right? Like it was like internet search engines need to index the whole internet, how do we save that? So certainly tremendous volume, check. And also though the properties of an individual page or whatever, a webpage doesn't just squish down into SQL, it doesn't squish down into tables and everything. And so to the extent that we can, this might be an impossible question, but to the extent that we can, how much of it is volume and how much of it is shape that drives not the introduction of these tools like data lakes and things like that, but their usage today?

Conor Cunningham (00:16:18):
Yeah. It's a good question. I think that one way to look at it is even if you just had purely structured data, you would still need all this scale-out stuff. We have scale-out versions of traditional database engines and there's several commercial companies that provide them, including Microsoft, and that exists. But you're right, Rob, that the amount of data that is not formally structured is much, much larger than that. And therefore, if you want to go process that data, you need a different toolbox of stuff. And some of that you can build a new database engine, sometimes it's on the side or complementary. Sometimes it's just something completely different. Say I'm going to embrace that and just store random content: Cosmos DB is a NoSQL store that we sell and it basically just lets you pass in JSON and you can store whatever you want, but up to a certain size.

Conor Cunningham (00:17:07):
And then you can try to find ways to store different kinds of content, as long as you can figure out the interface that you want to use over it to make sure that you can write code to process it however you want. And that can be HTML, that could be images, it could be whatever it is that you want, but in each of those domains, there's probably going to be more specific tools because the amount of data is big enough now that it warrants having tools that kind of go after each space separately.

Rob Collie (00:17:32):
It's just so fascinating. So like as I've been trying to, and I think reasonably well adapting in the way that you have, I've also had to adapt to this new world. And I did have a little bit of exposure to it when I briefly worked for Bing with their storage. Let's talk about data warehouses, one of my favorite things, my favorite straw men to set on fire from time to time. When you think about the term data warehouse, right? It's supposed to store like you can see the dust in the crates like the Indiana Jones, the end of the first Indiana Jones movie like this is where you put data so it doesn't get lost. Because your operational systems don't have a need to keep around the five years of history. Your operational systems are just really mostly concerned with the now and the recent.

Rob Collie (00:18:21):
And so the original name given to these things implies, just don't lose anything. That name, anyway, does not imply the priesthood that evolved around it. It became this like Tower of Babel type of enterprise. It doesn't even really matter what the data warehouse is used for. The data warehouse is a thing that we're going to spend careers on and decades on. And one of my sort of pet theories has been that these semi-structured storage techniques have been eating into the data warehouse space, even though they don't do what we think of a data warehouse as doing, right? Like really neatly organized, highly structured, intelligently materialized storage.

Rob Collie (00:19:09):
They don't do that. They don't replace that, but they absolutely replace the don't-lose-anything goal. Right? Like they do that, right? And it's like a garbage disposal, you just like almost like throw anything at it. It's like no, no. Yep. Got it. Got it. It's not going to be lost, not going to be lost. Part of my agenda for today is to run some of my pet theories by you and get your opinion on them. And if you disagree, if you think they're bad ideas, it's no big deal. We'll just edit them out. It's no problem.

Conor Cunningham (00:19:39):
It's an interesting problem. Like enterprise data warehouses have existed at the center of large companies for like the source of truth for running those businesses. And there's a trend generally that we've seen where either just because they wanted to get off of expensive enterprise data solutions and move to Hadoop, they've been able to move to a solution that's more scale out. I think that there's a chicken and egg question about is it because they wanted semi-structured data or because they just wanted to get off of the maintenance fees for a particular vendor. I suspect that the maintenance fees and the size of the data are probably driving it more strictly in my mind than just whether it's semi-structured or not. But once you go down that path, you have the option of storing hierarchical data or data that's structured differently. And that then becomes a tool that's in your quiver that you can kind of play with all the different options that you can with I want to store more data there.

Conor Cunningham (00:20:34):
And I still think there's a lot of structured data that's going to play in that world, but you can mix in semi-structured data, JSON payloads on the side, whatever it is that you want, more easily without having to worry about, oh, how long is my backup going to take. The other nuance that's kind of interesting is in clouds today, the cost to store things that aren't being touched is generally very, very close to zero per byte. It's pretty cheap. Like we store petabytes and petabytes of data just to run one server in our public cloud. And that's just fine. It costs a fair amount of money to store things at that scale, but it's completely possible to do so. And we use that data to run our business effectively. Any big company can do that as well, even any modest size company can store tons of data cheaply compared to what it would've cost in one of these earlier solutions.

Conor Cunningham (00:21:26):
And I think that that's part of it is that it democratizes the ability to get at lots of data. And then you can do lots of different things. Whereas before it was so expensive that you had to structure it, you had to spend all this time. Now you can choose if you want to cook that data into a structured format or not choose it if it's not worth your time right now.

Rob Collie (00:21:44):
Yeah. There's a big difference between curves and cliffs and traditional enterprise data warehousing was a cliff. It's like, if you want to store anything, just write down a transaction. First, you have to read this thousand page Bible. It's like the cost of entry, even for the simplest things, is that you've really... It's all up front. And the ability to sort of scale how much you want to think about it and to be able to turn a knob rather than just flip a switch. You know it's not binary all or nothing. I think that the implementation cost even just figuring it out, what the schema should be and all of that kind of stuff, even that is part of the cost of ownership. And of course, I think you're right, like cost drives everything. So if that goes hand in hand with like reduced licensing costs like hey, sign me up.

Conor Cunningham (00:22:39):
Absolutely. I think that the space here for proper enterprise data warehouse design still exists. Even in the scale-out world, you can benefit from Kimball type 2 dimensions and fact tables that are as narrow as you can make them and all that type of stuff, because that does reduce the time to get results in some cases. It's not required in as many cases, and the ability to add scale-out processing can also be another lever to go after that same problem without necessarily having to resort to the same exacting rules that you would have in order to play the data warehousing game effectively.
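For readers who have not met the term, a Kimball type 2 dimension keeps history by end-dating the old row and adding a new one rather than overwriting. The following is a minimal sketch of that idea in plain Python, assuming a hypothetical customer dimension; the column names (`surrogate_key`, `valid_from`, `valid_to`) and the function name are illustrative, not taken from the conversation or from any particular product.

```python
from datetime import date


def apply_type2_change(dimension_rows, customer_id, new_city, change_date):
    """Record a changed attribute by closing the current row and adding a new one.

    Type 2 keeps history: old rows are never overwritten, only end-dated.
    """
    current = next(
        (row for row in dimension_rows
         if row["customer_id"] == customer_id and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["city"] == new_city:
            return dimension_rows  # attribute unchanged, nothing to record
        current["valid_to"] = change_date  # end-date the old version

    next_key = max((row["surrogate_key"] for row in dimension_rows), default=0) + 1
    dimension_rows.append({
        "surrogate_key": next_key,   # fact rows would reference this key
        "customer_id": customer_id,  # business key from the source system
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,            # None marks the current version
    })
    return dimension_rows


# Hypothetical starting state: one current row for customer C-100.
customer_dim = [{
    "surrogate_key": 1, "customer_id": "C-100", "city": "Seattle",
    "valid_from": date(2019, 1, 1), "valid_to": None,
}]
apply_type2_change(customer_dim, "C-100", "Redmond", date(2020, 12, 15))
for row in customer_dim:
    print(row)
```

Because facts point at the surrogate key rather than the business key, older facts keep referring to the version of the customer that was current when they were recorded.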

Rob Collie (00:23:15):
People ask me, what's your opinion on data warehouses? And basically I say to the extent that you already have one, they're awesome. They're the best thing in the world. They make everything easier. Where I get into trouble with people, where we start getting into arguments, is when I say it's just not worth waiting on you to build one. Analysis and actual business value can proceed and actually should proceed even if all the data that you need isn't already in the EDW. When it comes to going to work with our clients, if ahead of time I could sort of magic wand, sort of choose, I would say, hey, it would be great if this client already had everything in their enterprise data warehouse, just lower friction, we're off and running. You know I would love that. It's just never reality. It's always lagging behind the realities of the business by years really, in terms of what's been incorporated and digested. There's always so much that's not stored in there that's relevant.

Rob Collie (00:24:12):
I love that we've reached the point in this profession where we can, we can hybridize, we can use data from your data warehouse if you have one, but at the same time, there's also things being pulled from elsewhere and modeled together. And it's kind of a happy place to be.

Conor Cunningham (00:24:25):
Yeah. I mean, just last week we released an update to our Synapse Analytics platform that lets you just throw all the data into a logical data lake and then you create some external tables on top of that and you can query it on demand. And instead of having to provision the data warehouse upfront, it charges you by the terabyte processed, not for any provisioned thing. It just goes away when you're not using it and it just costs storage and that's it. So there's going to be lots of work to make that easier and easier for people to get value out of their data with less investment to build that data warehouse upfront. Partially just because of the size you can't load it all anymore. And partially because that's where customers need help. They want time to solution for each team, not just the team that's in central IT for a large organization to have control over that data warehouse. If you want to spin up a new one and just have your own data lake on the side, that's completely possible now and it's really easy.

Rob Collie (00:25:16):
Oh, that's great. That's great. This is probably my last theory of this sort. The game of data warehousing, old data warehousing was in order to store it, you had to design the tables to store it. You had to put it into rectangles in order to store it. Whereas now with data lakes, you don't have to go through that process if you don't want to. You can store it in sort of non-rectangular curly format, closer to its original format. But when it's time to analyze, which is typically where our company comes in, when you're analyzing across lots of "records or entities" you're not as concerned about the ways in which those entities are structured differently. You tend to be analyzing them across dimensions that they share. It's their commonalities that are interesting. And so you end up with rectangular-powered analysis like we still get to tables. When we're building our data model, tables are still absolutely 100% the way to go even if that data's coming from a format that wasn't stored as tables.

Rob Collie (00:26:23):
So it's like you delay the rectangularization of the data until query time in a way, because every query is different. Every query would require you to rectangularize the data in a different way, right? And so it's kind of back to that curve versus cliff thing, right? You can kind of delay tomorrow's decisions until tomorrow when you actually know what they are rather than trying to anticipate every possible decision tomorrow in your schema that you design today.

Conor Cunningham (00:26:52):
Yeah. I mean there's truth to that. I think the way that most of these logical data lake things work now are sort of tied to this runtime binding of the idea of here's the format now, and this is the subset I'm going to look at. So you can definitely push that decision out. There is still a trade off between like how fast is that query or how much does it cost to run if you're going to do that every single day, maybe it makes more sense to schematize it to get that to go faster or be cheaper or whatever. But that's a trade off that you can choose if you need to, as opposed to having to do it to be able to start playing.
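The "rectangularize at query time" idea from this exchange can be sketched in a few lines. This is a toy illustration in plain Python, not how any data lake engine works internally; the documents, field names, and the `rectangularize` helper are made up for the example.

```python
import json

# Hypothetical "curly" documents stored in their original shape.
# Each has some fields the others lack.
RAW_DOCUMENTS = [
    '{"id": 1, "type": "page", "url": "https://example.com/a", "size_bytes": 1200, "links": ["b", "c"]}',
    '{"id": 2, "type": "image", "url": "https://example.com/b.png", "size_bytes": 50000, "width": 640}',
    '{"id": 3, "type": "page", "url": "https://example.com/c", "links": []}',
]


def rectangularize(raw_documents, columns):
    """Project each document onto the requested columns, one tuple per document.

    Fields a document does not have come back as None, so every row has
    the same width no matter how the source documents differ.
    """
    rows = []
    for raw in raw_documents:
        document = json.loads(raw)
        rows.append(tuple(document.get(column) for column in columns))
    return rows


# Two different questions produce two different rectangles from the same storage.
size_rows = rectangularize(RAW_DOCUMENTS, ["type", "size_bytes"])
link_rows = rectangularize(RAW_DOCUMENTS, ["id", "links"])
print(size_rows)
print(link_rows)
```

Each call re-parses every document, which is the per-query cost Conor mentions: fine for exploration, potentially worth schematizing once the same rectangle is needed every day.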

Rob Collie (00:27:34):
Right. No crystal ball delaying the work until you actually know what it is, is a tremendous advantage. And I completely agree with you. Let's say we walk up to an unstructured storage like a data lake and we start extracting things from it for analysis. There's going to be a lot of experimentation and iteration in that process. So as the project proceeds, we even refine the rectangles that we look for, right? But then after a while we dial it in and this dashboard, whatever it is that we built you're right, Conor like it's now running every day, powering business decisions and at refresh time or maybe we're using real time pass through queries or whatever, at refresh time, the performance of "rectangularizing" paying that every single time might suck. And so now that the spec for this solution is essentially stabilized. It's almost trivial I would think to define those rectangles as like a cache format that you can pull faster, you can optimize at that point rather than trying to optimize for every possible answer in the future, which is really a losing game, isn't it?

Conor Cunningham (00:28:43):
Yeah. We definitely see people do that at different layers of the stack. You get like result set caching, for example, or indexed views, materialized views are things that you can see inside the database to help you with that as well, to kind of delay, even on top of these external formats, to say, you know what, I'm going to pre-process it once and store it here to make the layers above easier to serve. And you can also do that within your serving layer as well. So I think there's lots of different options to try to reduce that cost, to decide when you want to make that investment to go and really schematize things, and you can choose to do so or not now in a lot more cases than you used to be able to.
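
The "pre-process it once and store it here" idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not how SQL Server's indexed views or any result set cache is implemented: SQLite has no materialized views, so a plain table named `sales_cache` (a hypothetical name, as are `RAW_LINES` and `materialize`) stands in for the stored rectangle.

```python
import json
import sqlite3

# Hypothetical raw records, as in a data lake file.
RAW_LINES = [
    '{"user": "ann", "amount": 12.5}',
    '{"user": "bob", "amount": 7.0}',
    '{"user": "ann", "amount": 3.0}',
]


def materialize(conn, lines):
    """Parse the raw records once and store the resulting rectangle.

    Later queries read the table instead of re-parsing the raw lines.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_cache (user TEXT, amount REAL)"
    )
    conn.execute("DELETE FROM sales_cache")  # full refresh on each run
    rows = []
    for line in lines:
        record = json.loads(line)
        if "amount" in record:
            rows.append((record["user"], record["amount"]))
    conn.executemany("INSERT INTO sales_cache VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
materialize(conn, RAW_LINES)

# The daily dashboard query now hits the stored rectangle directly.
total = conn.execute("SELECT SUM(amount) FROM sales_cache").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM sales_cache").fetchone()[0]
```

The parsing cost is paid once per refresh rather than once per query, at the price of committing to a schema for that one stabilized workload.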

Rob Collie (00:29:21):
What a glorious world we live in. And you know what else is glorious is that, Conor, now you and I actually have a place where our careers meet. For the longest time, like, we were just both nerds working in tech at Microsoft, but like we'd come home and talk to each other because we did. There were four of us who shared a house together for like a year and a half, a very interesting year and a half. But we were never on the same page in terms of what we were working on. Now, I do remember though, you telling me that the MSI files that we used in Windows Installer, that I was not allowed to call those databases.

Conor Cunningham (00:29:56):
Yeah. That was back in my earlier dogmatic days. You can call it whatever you want now, Rob.

Rob Collie (00:30:01):
Well, we all come from where you're coming from, what you're talking about. I mean, I have my own versions of that with age and experience and humbling experience, right? Is where you start to get a little bit more flexible, a little bit better about these sorts of things. Okay. So we could call an MSI file a database now. Okay, here's the real test. Does Exchange run on a database, Conor?

Conor Cunningham (00:30:28):
Yes.

Rob Collie (00:30:29):
Yes.

Thomas LaRock (00:30:31):
The look on his face.

Rob Collie (00:30:33):
It was an important moment. We had a breakthrough. It was a running joke where I'd run around and say, I forget, it's JET Red and JET Blue, right? I forget which JET you worked on, but one of them was better than the other one.

Conor Cunningham (00:30:47):
You know, the worst part is that the other one is more successful than the red one that I worked on but-

Rob Collie (00:30:52):
Oh, bitter, bitter, bitter, bitter pill.

Conor Cunningham (00:30:55):
No, I don't feel so bad about that anymore. I think that it's definitely the case that blue got used in a number of different things, including the storage engine backing for Exchange. And it's used in a few other places as well across Microsoft, but it's also a very specialized engine and it has a different sweet spot. It's not a general purpose database engine that you can use to solve any problem. So SQL is a far more general thing. I've worked on the main SQL Server code base now for, well, say, a very long time, and that has some flexibility to do a lot of things. And I kind of enjoy that space because it is so flexible.

Rob Collie (00:31:35):
Well, don't worry, Conor, the single biggest failure in our personal circle that has anything to do with the Exchange Store is my project that I worked on for two and a half years, Office Designer. It never saw the light of day, got killed mercifully, was killed before it ever went to market. And believe it or not, at one point in time, that awful, awful thing we were working on that was going to just be terrible, it was going to really, really suck, that thing was in contention to be named SharePoint. The SharePoint name was up for grabs. It was like Office knew this was a hot name and it was going to apply to either the technology that the Office web server folks were working on, or it was going to apply to the stuff that we were working on with Exchange. And it was like they were just waiting to see which one of us deserved it, and they picked the right horse in that battle.

Conor Cunningham (00:32:28):
Yeah. SharePoint is an amazing business. They're not a normal database app. They run on top of SQL Server, but they don't really follow normal rules for rows and columns the same way that a normal app would. But it's a very, very large business for Microsoft, and in some sense that's proof that you can take a problem area that's not quite normal rows and columns that are fully schematized and derive a lot of value for customers that way.

Rob Collie (00:32:56):
I remember when I was still there, like, the SQL org almost regarded the SharePoint product as like an invader in a way, like a barbarian, because it was not... SQL was never intended originally to power something like that, but the SharePoint folks figured out how to make it work. And I remember, I don't know if this is still true, but like when you would go to save a file, like to a document library on SharePoint, there was like this ridiculous series of stored procedures that would run that would like tear that file into a million pieces and store them in individual places. And of course it took forever to, say, upload a file to SharePoint because of it. And I suspect that that's kind of calmed down and things have kind of gotten more streamlined over the years, but originally it was kind of like the rogue effort in a way.

Conor Cunningham (00:33:41):
Yeah. It's definitely a unique application on top of SQL in the sense that it doesn't follow the normal design rules that we would use for any traditional relational database application. That doesn't mean that they can't make it work and that they're not getting value out of the layer that SQL is. They can use us to do backups. They definitely run queries. They happen to work, but it's not like we use SharePoint as an example for how to build an app. That said, they have a huge investment in the enterprise readiness about how they run, say, SharePoint Online, and they are an excellent partner of ours internally to be able to go drive us to be better at making sure that all the core keep-the-lights-on things that we have to do to run our cloud service are being stressed every single day, both on just the metrics of delivery and also on cost. So we have that running inside of our cloud today. SharePoint Online runs on top of SQL Azure, and they do that at scale with all of the different SharePoint sites that you have on there running. And it's a huge enterprise and it's one of our big internal customers.

Rob Collie (00:34:47):
Let's go back to Conor the celebrity.

Thomas LaRock (00:34:51):
I got some stuff.

Rob Collie (00:34:52):
Yeah.

Thomas