Kubernetes Podcast from Google: Episode 90 - CockroachDB, with Peter Mattis

#90 February 11, 2020

CockroachDB, with Peter Mattis

Hosts: Craig Box, Adam Glick

Peter Mattis is a creator of the CockroachDB open source database and co-founder and CTO of Cockroach Labs. His history in open source goes back to the creation of the GIMP image editor and UI toolkit Gtk at university in 1995, and his history at Google saw him work on storage and build systems. Hosts Craig and Adam ask him about all of the above.

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.

[MUSIC PLAYING]

ADAM GLICK: So Craig, how's the weather?

CRAIG BOX: Well, funny you should ask. It's been a little windy here the last few days, with Storm Ciara having blown in across the UK and blown a few things around. Lovely little picture I saw on Twitter of a trampoline thrown at the side of a train.

ADAM GLICK: Which one bounced?

CRAIG BOX: It's hard to tell. The train seemed like it was stopped. But in saying that, trains stop in Britain for any reason they feel like. Quite often, you get a lot of trains stopping for reasons like, "there's leaves on the tracks". Saw another great tweet, which was basically, well, the leaves are still attached to a tree which fell over. So that's the reason that train stopped.

ADAM GLICK: [LAUGHS]

CRAIG BOX: On the BBC news, we have video from Heathrow Airport, which shows a jet trying very, very hard to land. I won't spoil the surprise for anyone who's interested in the story of big jet aviation. They'll see this video and be shocked at what weather can do.

ADAM GLICK: I love the video that you shared -- it's in the show notes. But I was watching that and all I could think is, this has got to be the same guy who saw a double rainbow like a decade ago. Like he's moved and he's now a plane spotter.

CRAIG BOX: He's very enthusiastic about it.

ADAM GLICK: He is. He's remarkably enthusiastic. It's fantastic. You should watch it. It's definitely the best use of like 30 seconds of your time.

CRAIG BOX: Has your KubeCon preparation begun?

ADAM GLICK: It has. Looking forward to KubeCon. It's going to be exciting. My first time in Amsterdam. And I was taking a look. Normally, they have a 5K. And I've decided that, you know, maybe I should go for it this time. I've always looked at it, never done it. I can possibly make it through about one mile, which is about a third of a 5K, before I collapse in a gasping and heaving mass. And I was like, I should try and go for it.

So I'm using the podcast as a commitment device here to try and get myself to go run the 5K at KubeCon this year. So if any of you are out there listening, and you're going to go run it, come run it with me.

CRAIG BOX: Well, hopefully it will be around the canals-- a lovely, flat ground to do a 5K on, I'm sure.

ADAM GLICK: Yes. If it's through the Alps, it's completely different.

CRAIG BOX: Let's get to the news.

[MUSIC PLAYING]

ADAM GLICK: Docker has released the Docker Index, a collection of anonymized data based on the Docker Desktop and Hub tools. The index provides insights into the Docker community and their usage patterns, including top repos, like Nginx, Postgres, Ubuntu and Node. It also shows growth in image pulls, which sit at an impressive 30 billion in total, as well as a breakdown of platforms their developers use, which showed 61% of users running Macs.

It's a fun dive into developer systems and appears to be targeted to remind people of the reach that Docker has after their pivot towards dev tools.

CRAIG BOX: Apache Aurora, a Mesos framework for running long running services and cron jobs, has voted to archive itself in the Apache Software Foundation.

Aurora was created by Twitter in 2010 and open sourced in 2013. Twitter is currently moving from Mesos to Kubernetes. Committee member Stephan Erb said that with great sadness he had to report the community had diminished so significantly that the overhead of the Apache Software Foundation umbrella outweighed its benefits.

The project reported only a single contributor since September 2019. An official fork of Aurora continues on GitHub under the name aurora-scheduler.

ADAM GLICK: The CNCF has posted another project journey report, this time taking a victory lap for containerd, the topic of Episode 71 with Derek McGowan. Highlights include 157% increase in companies contributing to the project, a 300% increase in contributors, and an almost 150% increase in documentation contributors. Congratulations to Derek and the many other members of the community. We wish you continued growth and success.

CRAIG BOX: Three weeks ago, we brought you the news that Fedora CoreOS was now generally available. Now, the original CoreOS also known as Container Linux, has announced its upcoming end of life. The final release will be made on May 26th, and on September the 1st, the update service will be turned off and the creation of new instances on clouds will be disabled. Existing images will continue to work, but you are encouraged to migrate to a supported system as soon as possible. Suggested upgrade paths include the aforementioned Fedora CoreOS or Flatcar Linux, a CoreOS fork, which you can learn more about in Episode 79 with Chris Kuhl.

ADAM GLICK: Terse Systems wrote a wonderfully counter-intuitive blog this week about testing and production. Their main argument is that trying to duplicate a full production environment is expensive, difficult, and rarely the right way to test things in a service-based architecture. The blog goes on to discuss what the challenges are, and to talk about ways to siphon off small amounts of traffic to test services with production traffic, to speed up development and ensure integration works.

As with many counter-intuitive things, the post acknowledges that this can be messy and difficult. It's likely to generate a lot of debate, but it's an interesting argument for a very different way to build services.

CRAIG BOX: Thanos is a storage system for storing Prometheus metrics in perfect balance. The team at Banzai Cloud use it to monitor many Kubernetes clusters and have been maintaining a popular Helm chart for it. They've now announced the alpha release of a Thanos operator and have released it to their stable of operators to encourage a community to grow around it. Banzai Cloud has also written a summary of a new feature in Kubernetes 1.18, where you can mark a container in a Pod as a Sidecar, to control the order in which the containers are created and removed.

ADAM GLICK: There's a new certified Kubernetes distro on the net thanks to Intel. Clear Linux is Intel's cut down Linux distro focused on fast startup times, minimal components, and a reduced attack surface. Clear Linux has a Cloud Native Basic Bundle, and that bundle has passed certification as a Kubernetes distribution. Congratulations.

CRAIG BOX: If you're in the process of upgrading from Helm 2 to Helm 3, there are a few gotchas to look out for. Dawid Ziolkowski points out required changes and non-obvious bugs, including failures that aren't failures, so you can have a better chance of success.

ADAM GLICK: Can't get enough of the stateful versus stateless debate? Nor can the folks at MinIO. Nitish Tiwari has posted a nice intro into the different kinds of storage, why people use each, and where he thinks the industry is going. If you can't get enough of one of the great storage questions, then this post may serve as your fix for the week.

CRAIG BOX: Brian Carey is working towards a Cloud Foundry developer certification. Not wanting to take the road well traveled and use OpenStack, or the hosted version from Pivotal, Carey thought, "it's a Kubernetes system, I know this!", and has shared his experiences. He found kubecf, built by SUSE, which uses the cf-operator from the Cloud Foundry Quarks project.

Carey says that he hasn't run into any significant limitations yet and has been able to complete his coursework with no OpenStack required.

ADAM GLICK: If you did want to stick with OpenStack, the Kubernetes blog has a very specific post this week on deploying on OpenStack with kubeadm. If you're working on prem with Kubernetes and OpenStack, then this walkthrough is just for you as it goes step by step through the deployment on top of OpenStack VMs, including the CloudProvider integration with the Cinder storage system.

CRAIG BOX: The CNCF has posted a case study for Falco, the open source container security project originally written by Sysdig. Frame.io uses Falco to secure their media processing pipeline, and the write-up is an overview of how a user can meet their compliance needs at a much lower cost than with a commercial product.

ADAM GLICK: Are you a full stack developer interested in Kubernetes? Digital Ocean might have just the voluminous tome you're looking for. They've released an e-book for full stack developers to help you understand Kubernetes. The e-book is available in PDF and e-pub formats, which I for one thank them for, as reading a 637-page PDF sounds like the opposite of developer friendly.

CRAIG BOX: Finally, the CNCF has posted a reminder that the day 0 events for KubeCon are open for registration. The pre-day is Monday, March the 30th at the RAI Amsterdam and runs from 9 AM to 5 PM.

Events include Cloud Native Security Day, Serverless Practitioners Summit, and ServiceMeshCon. Do note that there are additional fees for some Day 0 events.

However, if you haven't purchased your ticket to KubeCon, you can use our discount code to get 15% off your ticket and maybe put that saving towards the Day 0 events. You can find the discount code in the show notes. And please be aware that there's no cost to enter the 5K run.

ADAM GLICK: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Peter Mattis is the co-founder and CTO of Cockroach Labs, the company behind CockroachDB. Previous to his founding of Cockroach Labs, Peter worked at Square and Google and was the creator of the open source image editing program, GIMP. Welcome to the show, Peter.

PETER MATTIS: I'm glad to be here.

ADAM GLICK: You were successful with open source software very early on, with the GNU Image Manipulation Program, which most people probably know as GIMP. And you did that with your roommate, Spencer Kimball. What made you create something that was basically an open source alternative to Photoshop?

PETER MATTIS: The word "boredom" comes to mind. That was probably the primary reason. We were a little bit bored of our classes. We never actually had aspirations, and we didn't see how large it was going to grow at the time we created it.

And we got started, and we just started fiddling around with some stuff, fiddling around with some software, some image manipulation. We put a UI on it, and it just kept on going and going. It kind of snowballed from there.

Certainly if we had known how much work we were going to invest in it up front, we might never have gotten started. I kind of think this is true for most things you do. If you know how bad the marathon is going to be, you would never start running the marathon.

But when you're like far into it, you're like, uh, I've come this far. Let me just do one more thing. Let me just go another couple of minutes. Same is true of starting a company.

Yeah, in hindsight, I look back and I'm like, wow, I can't believe people are still using this. Nothing's ever replaced it. But I'm also pretty thrilled when people come up and tell me that they use it, they still use it, they, you know, cut their teeth on open source early on by looking at the GIMP source code. That's a pretty big thrill for me.

CRAIG BOX: GIMP was the progenitor of the GTK Toolkit, which then went on to underlie a lot of modern open source desktop software. Do you think that the toolkit or the application is the achievement that's more likely to be remembered?

PETER MATTIS: Well, I think people remember the application. People say, oh, you wrote the GIMP. And this is for the user community. Users know about GIMP. Developers know about GTK. They know about the tool kit. And I think GTK has kind of had a larger, broader impact than the GIMP has.

I mean, even that's questionable. I mean, I think the GIMP was kind of the first serious end-user application in open source. You know, there was operating systems, there was Linux, there was the GNU compiler. But that was something for experts and developers. And the GIMP was kind of the first end-user application that really hit it in the open source community.

ADAM GLICK: At the time when you launched GIMP, a lot of the software that was out there was proprietary. And that was the predominant model. And you chose to take a very different tack with that. Open source was still somewhat of a nascent kind of community out there.

What made you decide that open source was the right model you wanted to do for the program you were building?

PETER MATTIS: Well, there's a couple of factors. One of them, we were using Linux at school. We were using GCC, the GNU C Compiler. And we kind of felt we wanted to give back.

Another aspect is, we didn't really know what we were doing. So we didn't want to have the chutzpah to say like, hey, you know, we're going to make this proprietary software, you should buy it. We were just college students at the time and immature in our abilities. And it felt like, hey, we'll do this thing, we'll fiddle around with it. We don't want it sitting as an albatross around our necks for the rest of our life. So we'll kind of release it out and see if it lives or dies. And it turned out to live, but I wouldn't have predicted that at the time.

CRAIG BOX: After you graduated from college, you ended up in the world of professional engineering, spending some time at Google and working on a bunch of technologies. With that background in open source that you had developed at school, how did you go about deciding where you wanted to work?

PETER MATTIS: Well, it's interesting. I first-- before I ended up at Google, I ended up at this other company, Inktomi. One of the founders of Inktomi was one of my college professors. And it was just happenstance I ended up there. But then kind of the way my career has developed over time, it's like you meet someone at that first company, they hook you up with your next gig. That next gig happened. I founded a company back in the dot com boom. That company never made it.

But during the time period-- during the founding, I happened to meet Larry and Sergey, who were the founders of Google. They said, hey, you should come interview at Google. And so it was like this whole meandering path.

If I hadn't gone and worked at Inktomi, I wouldn't have done the startup. I probably wouldn't have met Larry and Sergey, I probably wouldn't have wound up at Google. So I kind of see that meandering path happen very frequently in my career and other people's careers, as well.

CRAIG BOX: Would your college professor in question happen to be Eric Brewer, our guest on Episode number 49?

PETER MATTIS: That would be the professor in question. That's correct.

CRAIG BOX: So you worked on a number of systems at Google. Let's talk about those. First of all, the Colossus system. Tell us about Colossus and why it was built.

PETER MATTIS: Colossus is a distributed file system. As far as I know-- I've been gone from Google for about eight or nine years now-- it is still the distributed file system in Google. It had a predecessor called GFS, the Google File System. GFS and Colossus, they were meant to run on thousands of machines. So they were built at a time before there was like HDFS and some open source alternatives. And really, those first alternatives were inspired by GFS.

The rationale for having them is that people didn't want to be managing storage on kind of an individual machine basis.

GFS actually preceded Colossus. And the reason for wanting to build Colossus, and the need for Colossus, was GFS was running into scaling limitations. I mean, GFS did something amazing. It showed that you can have a distributed file system of thousands of machines. And then Google was finally like, well, that's not big enough, we actually want bigger data centers. And there was just fundamental design decisions inside GFS that prevented it from scaling.

So we went and tackled those. And I was scaling the-- you know, essentially, what is the master node. In GFS, there's a single master node. In Colossus, it's distributed. And then the other big addition was the addition of something called Reed-Solomon storage, erasure coding storage. And I don't even know if I can give a concise description of this. But it's a way to have kind of redundancy and reliability without actually having replication.
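To make "redundancy without replication" a little more concrete, here is a minimal sketch of the idea using a single XOR parity block. This is a deliberately simplified stand-in and not Reed-Solomon itself, which can tolerate several lost blocks at once; the function names and sample data are assumptions made up for illustration and say nothing about how Colossus is actually implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, missing_index):
    """Rebuild a single lost block by XOR-ing every surviving block."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)

# Three data blocks plus one parity block: 4 blocks stored for 3 blocks of data.
data = [b"GFS!", b"Colo", b"ssus"]
stored = encode(data)

# Pretend the machine holding block 1 died, then rebuild it from the others.
rebuilt = recover(stored, 1)
print(rebuilt == data[1])
```

In this toy layout, any one of the four blocks can be lost and rebuilt, at a storage cost of 4/3 of the raw data, where keeping three full copies would cost 3 times the raw data for the same three blocks.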

CRAIG BOX: You also worked on Google's build system, which people may now know today as Bazel. What do you recall from your time on that project?

PETER MATTIS: That was super interesting. Inside Google, there's kind of several different build environments. When I got there, they were working on something called google2. Before google2 was google1. Both google1 and google2 were just ginormous make files. I think there was actually a series of make files. And at some point, you know, some very senior people got together and said, this is not going to scale, not with the growth rate of Google.

And somehow, I got conscripted into helping out there. I was always kind of interested in build systems. So I got conscripted into helping out. And what we ended up doing initially was essentially trying to replace "make" and the Makefiles with something that generated Makefiles, and that was the genesis of google3, and it described this structure. You had BUILD files instead of Makefiles, and the original system was called gconfig. And it took those BUILD files as a big old Python script and it output a Makefile.
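The general pattern Peter describes, a script that reads declarative build descriptions and emits a Makefile, might look roughly like the sketch here. The target format, the `generate_makefile` function, and the `.cc` to `.o` convention are all hypothetical choices for illustration; they are not the real BUILD file syntax or the behavior of gconfig.

```python
def generate_makefile(targets):
    """Render a list of hypothetical build targets as Makefile text."""
    lines = []
    for target in targets:
        # Assume every source file is C++ and compiles to one object file.
        objects = [src.replace(".cc", ".o") for src in target["srcs"]]
        # Link rule: the binary depends on all of its object files.
        lines.append(f"{target['name']}: {' '.join(objects)}")
        lines.append("\t$(CXX) -o $@ $^")
        # Compile rules: one per source file.
        for src, obj in zip(target["srcs"], objects):
            lines.append(f"{obj}: {src}")
            lines.append("\t$(CXX) -c -o $@ $<")
    return "\n".join(lines) + "\n"

# A made-up target description standing in for a parsed build file.
example_targets = [{"name": "hello", "srcs": ["hello.cc", "util.cc"]}]
print(generate_makefile(example_targets))
```

The appeal of this design is that developers write short declarations while `make` still does the actual work; the limitation, as the conversation goes on to note, is that the generated Makefile itself can become the bottleneck.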

And then, kind of over time, some other people came in and were like, you know what? We really shouldn't be actually outputting a Makefile. That's a limiting factor. And that's where this system inside Google called Blaze got built. And when they open sourced Blaze, it became Bazel.

ADAM GLICK: You're quite the polymath in terms of the kinds of technologies that you've worked on. And you made an interesting shift from a front end application, to a number of back end systems, to the build systems that actually helped create those systems. What was your thinking on why you took on such varied challenges in the software pipeline?

PETER MATTIS: I like to take on things that are affecting me in some way. Working on the GIMP initially was because I actually had just bought a scanner. And it was back in the days where they were kind of rare, and I just wanted to do some manipulation. And in high school, I'd actually had a license to Photoshop through the school newspaper. I got to college, I didn't have that license, and the tools were limited. So I wanted to build something to actually just kind of futz around with my photos I was scanning in.

You know, I'd get into-- like why did I get into build systems? Well, developers are interacting with a build system every day. And you run into frustrations. And I also found various aspects of "make" and whatnot confusing. I'm like, eh, you know, I can fix this and make my life better.

How did I get involved in Colossus? Well, before Colossus, I was involved in Gmail, and I really like working on-- like I wanted to work on Gmail because I wanted to make my email life better. I wanted to have faster search on email. I wanted to have large amounts of data storage.

But Gmail itself was running on top of GFS, so there's kind of a natural transition from working on the backend for Gmail over to working on the distributed file system that sat underneath Gmail.

CRAIG BOX: When you go onto challenges outside Google, we've heard from a number of people and a number of projects in the past that they say, "I don't have all those tools anymore. I no longer have access to Colossus. I no longer have access to the Blaze build system", for example.

You work on a project which reimplements one of those systems. Before we talk about the project specifically, what were you doing at the time that had that same need?

PETER MATTIS: My exit from Google found me at another startup. And I was working on a mobile photo sharing website and application. And this was not Instagram, so we didn't succeed and they did. It was not Snapchat. We didn't succeed, and they did.

But we knew we needed a distributed storage backend for the system. And we looked at the tools that were available in the open source world, and we kind of examined a lot of them in depth and came away kind of feeling that they were lacking, and that we really wish we had access to Bigtable, and in particular Spanner.

Spanner was still being developed when I left Google. But we were well aware of the reasons it was being developed and kind of the aspirations for that system.

ADAM GLICK: Your most recent project is CockroachDB. For those that aren't familiar, what is CockroachDB?

PETER MATTIS: CockroachDB is a horizontally scalable, geo-distributed SQL database.

CRAIG BOX: There's a lot to unpack there.

PETER MATTIS: Yeah. Traditional SQL databases are monolithic in the sense that they usually are running on a single node. If you want to scale the database, you just have to have bigger and bigger hardware on the node. You know, going from one CPU to four CPUs up to 64 gigabytes of data, terabytes of storage on one single node.

But if you actually get to the limits of what can be fit on a single node, then you have to do something called sharding. It means you have to have multiple databases. And at the application level, you have to kind of spread your data across those databases.

And this has been the traditional path for people using SQL databases for the past 20 or 30 years. What we want to do is make it so all that kind of just happened internally to the system.
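As a rough sketch of what application-level sharding means in practice, the snippet here routes each key to one of several databases by hashing it. The shard names and the `shard_for` helper are hypothetical, and real deployments often use lookup tables or consistent hashing instead of a plain modulo.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2"]  # hypothetical database names

def shard_for(key, shards=SHARDS):
    """Pick the database that owns this key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# The application must call shard_for() before every query.
print(shard_for("user:42"))

# Adding a fourth database changes the owner of many existing keys,
# which is part of why manual sharding is painful to operate.
keys = [f"user:{i}" for i in range(1000)]
grown = SHARDS + ["db-3"]
moved = sum(1 for k in keys if shard_for(k) != shard_for(k, grown))
print(f"{moved} of {len(keys)} keys would have to move")
```

Every query, join, and transaction in the application has to be aware of this routing, which is the burden the speakers describe moving inside the database.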

With CockroachDB, you have a SQL database. It can run on one node. Usually, though, you run it in a multi node configuration. If you need more capacity, you just add another node, add another node. And it's very seamless to scale up capacity horizontally.

I also mentioned it's geo-distributed. And what that refers to is the fact that you could actually put the nodes in the cluster in various geographic locations, and that allows you to survive data center outages and regional outages. It also allows you to push your SQL data closer to the users for that data. You can imagine how this would come up in an email system where you want an email user in New York to have their data co-located on the East Coast. You want a user in Hong Kong to have their data co-located somewhere near Hong Kong being in a Chinese data center.

This also frequently affects Facebook and others. But the need for geolocated, co-located data isn't just affecting the big players anymore. It's super easy. I know a startup here in New York. They do recruiting software. And they have a ton of US users. And then they have users over in-- I think it's Singapore or China. It's just super easy as a startup now to have global users, and you need to put your data close to those global users.
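The email example can be sketched as a simple "nearest region" choice. The latency table and the `home_region` function are invented for illustration, with made-up numbers, and this is not how CockroachDB decides where data lives.

```python
# Hypothetical round-trip latencies in milliseconds from each user
# location to each region; the numbers are made up for illustration.
LATENCY_MS = {
    "new-york": {"us-east": 8, "asia-east": 210},
    "hong-kong": {"us-east": 210, "asia-east": 12},
}

def home_region(user_location, latency_ms=LATENCY_MS):
    """Pick the region with the lowest latency for this user."""
    regions = latency_ms[user_location]
    return min(regions, key=regions.get)

for location in LATENCY_MS:
    print(location, "->", home_region(location))
```

The point of the sketch is only that placement is a per-user (or per-row) decision driven by geography, which is why a single global location for all data serves some users poorly.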

CRAIG BOX: Now you mentioned that you were at a photo sharing company. Is that where the software was first built?

PETER MATTIS: No. As we were getting started on this photo sharing startup, we were looking at what technologies to use. And it was actually where the name Cockroach came up, when we were at that startup. That was named by my co-founder, and he actually started sketching out the design for what the system should be. And like, hey, we should build this system, it'd be pretty cool. And I'm like, no, we're building a photo sharing application. We don't need to do this.

It just happened to be happenstance that, right around that time, about the same month, Amazon announced DynamoDB.

CRAIG BOX: Mhm.

PETER MATTIS: We looked at it. We took a good look. It didn't give us everything we wanted, but it gave us enough that we just decided to build that photo sharing app on top of DynamoDB.

CRAIG BOX: Which then came first? The company or the software?

PETER MATTIS: The software came first. That startup, which failed-- sometimes startups fail by just going extinct, and sometimes they get acquired or acqui-hired, in our case. So we got acqui-hired by Square. And we got inside Square and we saw that, wow, they were having some of the same challenges with sharding MySQL and sharding databases that we'd seen externally. And they were also having challenges trying to use NoSQL database software-- challenges that make it difficult for applications to be built on top of them.

So my co-founder proposed to Square management like, hey, can I work on this? Let's make it open source. They gave him a thumbs up on doing that. And so we actually got started on the software-- I think that was in 2014 if I'm remembering this correctly, about a year before the company was founded.

Then it started getting enough traction. He was doing stuff on the open source side, some external contributors start contributing. And I can't remember the details, but somehow he got a news article written about it. And once that news article got written, suddenly there was like a bunch of interest externally, and people pushing us, like, you guys, this is what you should be doing. Square is a nice company. It's a great company to be at. But just, you got to go and try to make it work with CockroachDB.

CRAIG BOX: You said that you were inspired by Spanner and F1, though you weren't working on those projects directly. Did you learn from the papers? Did that help kind of motivate how you think about the challenges that these databases solve?

PETER MATTIS: Certainly. We definitely learned from the Spanner paper. We learned from the F1 paper. I'd say we're more inspired by Spanner than by F1, but we've actually taken probably more concrete tangible actions and designs from the F1 paper.

I have this mild frustration sometimes with the way these papers are written, in that they have to focus on something that's unique and novel. But that ends up leaving out a ton of interesting implementation details. It's hard to go from the Spanner paper and actually just build that system. They leave out way too many implementation details. They've had follow on papers that have gone into some more details. But still, it's like, there's so much there that you have to fill in between.

So we say that CockroachDB is inspired by Spanner. There's just a vast swath of differences, though, when you actually get down to the implementation. And, actually, from the user's perspective as well.

An example of that is, very early on, we went full in on SQL. And SQL was kind of-- to some degree, an afterthought for Spanner. And eventually, F1 was actually a SQL system built on top of Spanner.

And they eventually kind of merged the two together, I believe. That's my understanding. Again, I'm speaking from rumor here, because I don't work at Google any longer. I don't know the truth.

CRAIG BOX: There's a long history of implementing Google papers in open source with the Hadoop ecosystem. Obviously, it came from the MapReduce paper and a number of them. Do you think that's an apt comparison in this case?

PETER MATTIS: Yeah. I think that's a fair comparison. And actually, we actually heard from VCs that they were well aware of how that had happened, that various papers were written by the Google folks, the open source community embraced them, implemented stuff, spawned these huge ecosystems. And we knew of VCs that were aware of the Spanner paper, and they were waiting for some open source folks to go ahead and implement that kind of open source version of Spanner.

It was actually very nice to have that during our initial VC conversations. Because we could basically say like, yeah, we're just kind of the open source Spanner. And a lot of VCs would be like, got it, OK, I understand. This is probably going to be big if Google--

And so it gave us some cred that we didn't have to explain things in a nitty gritty way.

ADAM GLICK: Your old professor Eric Brewer is famous for his work on the CAP theorem.

PETER MATTIS: That's correct.

ADAM GLICK: I'm curious. One of the things when, as a developer, you're thinking about choosing a database-- you know, you're looking at all these things you have to think about. Do you want to do SQL or NoSQL or sometimes NewSQL? And taking a look at latency versus consistency.

I know you have a focus on transactions. And if you have something that's transactional, but it's also distributed, how do you make the trade-offs with transactions versus latency?

PETER MATTIS: I want to first touch a little bit on CAP itself-- kind of generally three areas of CAP. You have consistency and availability and partition tolerance. Those are the areas. And the eventually consistent systems, their priority is availability. We're always going to have the data available even if it's inconsistent.

For CockroachDB, we focused on consistency. And what that actually ends up meaning though is, in the event of a network partition, the minority part of the partition is going to have unavailability for the data. You're not going to be able to access it.
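The "minority side loses availability" behavior can be sketched with a simple majority rule. This assumes data is replicated across a fixed set of replicas and that a side of a partition can serve requests only if it can reach a strict majority of them; the function names and the example partition are hypothetical.

```python
def has_quorum(reachable, total):
    """True if this side of the partition can reach a strict majority."""
    return reachable > total // 2

def partition_availability(groups):
    """Map each side of a partition to whether it can keep serving data."""
    total = sum(groups.values())
    return {name: has_quorum(count, total) for name, count in groups.items()}

# Five replicas split 3 / 2 by a network partition.
print(partition_availability({"majority-side": 3, "minority-side": 2}))
```

Because at most one side can hold a strict majority, two sides can never both accept conflicting writes, which is the consistency guarantee being bought with the minority side's unavailability. An even split, such as 2 and 2, leaves neither side with a quorum.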

The other part of your question was about transactions and latency. There have been a number of papers written over the years about how to do distributed transactions. And we had an original distributed transaction protocol inside CockroachDB. We've since been evolving that over time. And our general feeling is, we have to lower the latency of distributed transactions down to the minimum, whatever the minimum is.

I don't think we've reached the minimum yet. We've gotten it down to this point, with the most recent release of CockroachDB, that we can do a distributed transaction in one round trip of distributed latency.

But there's some asterisks behind that and some other things that happen in the background. And we're still working to reduce that further. There's some research papers that say like, hey, you can actually do some forms of transactions, distributed transactions, in half a round trip of latency. Just the one network hop. And there's certain restrictions that they put in place in order to make that happen.

But this is something we're paying attention to. The lower the latency is, the more flexibility it gives application developers, and the less they have to think and plan upfront.
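To see why each round trip matters, here is some back-of-the-envelope arithmetic. The 80 ms round-trip time and the "two round trips" baseline are assumptions chosen purely for illustration, and the model ignores the background work and caveats mentioned above.

```python
def commit_latency_ms(rtt_ms, round_trips):
    """Network time a commit spends waiting, ignoring processing cost."""
    return rtt_ms * round_trips

RTT_MS = 80.0  # hypothetical round-trip time between two distant regions

for label, trips in [
    ("two round trips (hypothetical baseline)", 2),
    ("one round trip", 1),
    ("half a round trip (one network hop)", 0.5),
]:
    print(f"{label}: {commit_latency_ms(RTT_MS, trips):.0f} ms")
```

At geographic distances the network wait dominates, so removing a round trip roughly halves what the application sees for each commit.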

One of the things we're still discovering is, even when we give people kind of the tools and flexibility, application developers, they're focused on their business needs and not necessarily understanding the details of how distributed transactions work and what the trade-offs in how you partition your data and move your data around are. And so we just have to make all the abstractions in our system as seamless and easy to use and high performance at the same time as possible.

CRAIG BOX: When describing the eventual Cloud Spanner product, that Google offers, Eric Brewer said that, effectively, when your availability becomes such that you have so few occasions of network partitions, then it's not C, A, and P all at the same time, but it can effectively appear as such.

Does Cockroach have the same properties?

PETER MATTIS: Yeah, the same properties apply. I mean, really, network partitions still occur. But nowadays, you're finding it more and more that people just need kind of the flexibility-- like we want the consistency, because dealing with eventual consistency, it's just a massive, massive load and burden on application developers.

And we've seen that kind of time and time again in industry. I've seen it in my own experience. Google saw this as well. And what you do have to do is provide kind of strict consistency. And the network partitions-- there's enough networking infrastructure in place now that when those things occur, it's better to have the consistency and put that burden on maintaining good networking with the network engineers-- network engineers have done a great job.

I think kind of the remarkable thing that makes both Spanner and Colossus and a lot of these other systems including CockroachDB possible now is that network speeds have gotten so vastly faster now than they were when databases were first designed.

I mean, it used to be the case that you wanted to read something from disk. You're talking 20 milliseconds. You want to read something from remote machines, and you're talking hundreds of milliseconds.

And now to get something from a remote machine in the same data center, it's sub-one millisecond. It's probably like 300 microseconds in Google. It might even be lower now. I don't know what their actual network latencies are, but they're very, very low.

CRAIG BOX: We can neither confirm nor deny.

PETER MATTIS: [CHUCKLES] Yeah. Well, we test some stuff on the clouds. Amazon, Azure, and Google Cloud, they're all in the hundreds-of-microseconds range. That's oftentimes faster than you can actually get it if you're reading it from an SSD. There's like this change that's occurred that enabled this new class of applications. And big thanks to the networking folks who made this possible.
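To put the figures quoted above side by side, here is a small back-of-the-envelope calculation. The constants are the numbers mentioned in the conversation; "hundreds of milliseconds" is taken as 100 ms, which is an assumption chosen as a conservative lower bound, and none of these are measurements.

```python
# Figures quoted in the conversation, converted to microseconds.
DISK_READ_THEN_US = 20_000     # "20 milliseconds" to read from disk
REMOTE_READ_THEN_US = 100_000  # "hundreds of milliseconds"; 100 ms is an assumed lower bound
REMOTE_READ_NOW_US = 300       # "probably like 300 microseconds"


def speedup(old_us: float, new_us: float) -> float:
    """Return how many times faster new_us is than old_us."""
    if new_us <= 0:
        raise ValueError("latency must be positive")
    return old_us / new_us


# Remote reads today versus remote reads then, and versus an old local disk read.
remote_vs_remote = speedup(REMOTE_READ_THEN_US, REMOTE_READ_NOW_US)
remote_vs_old_disk = speedup(DISK_READ_THEN_US, REMOTE_READ_NOW_US)
```

Under these assumed figures, a remote read in the same data center is a few hundred times faster than it used to be, and faster than the old local disk read by well over an order of magnitude.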

ADAM GLICK: Is there a set of architectural guidance that you provide to the people who are setting up CockroachDB as they think about reliability and consistency?

PETER MATTIS: Definitely. We have a large amount of documentation on how they should be thinking about distributing their data. I mean, it's an unfortunate reality. We kind of give this illusion that your data can be spread anywhere in the cluster, and that you can do a read, you can read the data and whatnot. And you have that illusion on the one hand, and then that illusion gets pierced by the geographic latencies.

So you have to be well aware of your geographic latencies and thinking about that in how you lay out your data.

We have a lot of documentation on how to do that. We have mechanisms in place that control where the data resides, in order to help place it close to users. We have some automatic mechanisms inside the system to move the data around. But we still do not have the final answer for just making this easy and obvious for end users.

CRAIG BOX: One of the key takeaways that a lot of people took from the Spanner paper was about the atomic clocks. That Google had developed a system where they had super precise time available to the machines. That's something that CockroachDB doesn't require. So could I ask your opinion, first of all, on that being the thing that people thought was necessary, and then your implementation, which clearly does not require that hardware?

PETER MATTIS: I think part of the reason the Spanner paper focused so heavily on the use of atomic clocks is that that was the really novel thing about the system. And for some of these academic papers, novelty is just a gigantic factor.

I think if you ask the engineers-- I mean, I was working right next to the Spanner team for a long period of time before they started working on the atomic clock part. And 90%, 95% of the system is about everything besides the atomic clocks.

So I don't know if I quite have time on this podcast to get into how CockroachDB deals with not having atomic clocks. We kind of inverted things. They have some stricter guarantees they provide on consistency inside Spanner by having the atomic clocks. Our guarantees are slightly weaker but not hugely weaker. And it's basically just allowing there to be more clock skew.

I do think another kind of interesting thing that's happened over time though is, there's kind of this expectation at the time that, in order to get very accurate timekeeping, and time synchronization, you needed to use atomic clocks.

There was a paper a couple of years ago, which is just fascinating. It came from Stanford. I can't remember the name of the paper, but they've actually formed a company called Tick Tock Networks. And they're basically able to get even lower clock offsets by doing some fancy kind of machine learning algorithms to reduce the clock skew.

And this actually has some real benefits. If some of these systems start getting deployed everywhere and available in public clouds, it kind of opens up the door to-- you kind of get faster than light coordination between systems on opposite sides of the world.
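The idea of "allowing there to be more clock skew" can be illustrated with a simplified sketch. It assumes one general technique for tolerating bounded skew: a reader treats any value written shortly after its own timestamp, within an assumed maximum clock offset, as ambiguous and retries at a later timestamp. The names `Version`, `read` and `MAX_CLOCK_OFFSET_NS`, and the 500 ms bound, are hypothetical and chosen for illustration; this is not presented as CockroachDB's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical upper bound on clock skew between any two nodes (illustrative value).
MAX_CLOCK_OFFSET_NS = 500_000_000  # 500 ms


@dataclass(frozen=True)
class Version:
    timestamp_ns: int  # timestamp assigned by the writer's clock
    value: str


def read(
    versions: List[Version],
    read_ts_ns: int,
    max_offset_ns: int = MAX_CLOCK_OFFSET_NS,
) -> Tuple[str, Optional[object]]:
    """Read the newest value visible at read_ts_ns.

    A version stamped in (read_ts, read_ts + max_offset] might really have been
    written before the read began, by a node whose clock runs ahead. The reader
    cannot tell, so it reports "retry" with a later timestamp to read at.
    """
    uncertain = [
        v for v in versions
        if read_ts_ns < v.timestamp_ns <= read_ts_ns + max_offset_ns
    ]
    if uncertain:
        return ("retry", max(v.timestamp_ns for v in uncertain))

    visible = [v for v in versions if v.timestamp_ns <= read_ts_ns]
    if not visible:
        return ("ok", None)
    return ("ok", max(visible, key=lambda v: v.timestamp_ns).value)
```

The trade-off in this sketch is that a larger assumed offset makes more reads ambiguous and therefore more likely to retry, which is one reason tighter clock synchronization is attractive.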

ADAM GLICK: Are they communicating through short videos?

PETER MATTIS: Hah! No, I don't think so. That's hilarious. Different Tick Tock. I think it's spelled different. This is actually the full T-I-C-K T-O-C-K.

CRAIG BOX: CockroachDB predates Kubernetes. Not by much, but the Spanner system at Google, as everything else, runs on top of Borg. So what was the plan for running CockroachDB without Borg when you started developing it?

PETER MATTIS: Our plan from the get-go was there would be other orchestration tools out there. Mesosphere was out there. I think it's Apache YARN was another one. And then people just running like Chef and some other of those tools.

I mean, basically we wanted though to make it very easy. Like our design from the outset for CockroachDB was it just can be a single binary. The binary could adopt different roles, but you wouldn't have to run different sets of binaries and services in order to run CockroachDB. You just had the single binary, and everything would be super easy to run.

And we kind of took that on and we made that happen. And then Kubernetes came out and has actually made it vastly easier to run complex services.

That ended up making CockroachDB fit very nicely into Kubernetes, because we had done all that work upfront to just make it super easy to run.

But our plan, if Kubernetes hadn't come out, people would probably be running it via Chef and other tooling like that.

ADAM GLICK: There is a very animated discussion that starts up anytime you start talking about running databases on top of Kubernetes.

CockroachDB is a cloud native database and can run inside, on top of Kubernetes. What are the benefits of that design?

PETER MATTIS: The thing that enables CockroachDB to run so smoothly inside Kubernetes is the fact that we kind of have replication just built right in.

So the problem with running something like Postgres or MySQL or just a monolithic database is your data is just only stored on one node. And Kubernetes kind of wants to abstract you away from the node.

I want to be able to run these services. I can just add on additional containers to the service. And that just doesn't fit in with the way traditional SQL databases have been written.

With Cockroach, you're able to like-- you know, you have your Kubernetes cluster. I'm going to run my Cockroach cluster within there, and it's going to be running on-- let's say it's five nodes inside the Kubernetes cluster. And I can easily just add on additional nodes.

I mean, this is the architectural advantage of CockroachDB, is that horizontal scalability that I talked about before. Need to add some more capacity into my database? Boom. Really just a change to the Kubernetes config, those YAML files or whatever else you're using there. And it's minutes to add on that additional capacity.
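The "change to the Kubernetes config" can be pictured with a minimal sketch. It assumes the cluster is deployed as a StatefulSet named `cockroachdb`; that name, the manifest fragment, and the helper `scale_statefulset` are hypothetical and for illustration only. The manifest is shown as a Python dictionary rather than YAML so the example is self-contained.

```python
import copy

# Hypothetical fragment of a StatefulSet manifest, as it might look after
# loading a YAML file. Only the fields relevant to scaling are shown.
manifest = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "cockroachdb"},
    "spec": {"serviceName": "cockroachdb", "replicas": 5},
}


def scale_statefulset(manifest: dict, replicas: int) -> dict:
    """Return a copy of the manifest with spec.replicas set to `replicas`."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    scaled = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    scaled["spec"]["replicas"] = replicas
    return scaled


# Going from five nodes to seven is a one-field change to the manifest.
scaled = scale_statefulset(manifest, 7)
```

In this sketch, applying the updated manifest to the cluster is left out; the point is only that adding capacity amounts to changing a single number in the configuration.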

CRAIG BOX: Now that cloud native databases like CockroachDB exist, should anyone still be running monolithic traditional SQL databases on top of Kubernetes?

PETER MATTIS: Well, I think no.

CRAIG BOX: You heard it here first.

PETER MATTIS: [CHUCKLING]

CRAIG BOX: We've spoken with the team behind Vitess, for example. There are people who have made those things fit the more cloud-native model. But is the compatibility of things like CockroachDB such that you can drop them in now as a replacement for Oracle, SQL Server, and so on?

PETER MATTIS: Yeah. I mean, there's this general challenge with databases, is-- in SQL databases, they all speak SQL. But the dialects of SQL are different. There are unique properties to SQL Server versus Oracle versus Postgres versus MySQL.

And you can't just drop in one as a replacement for the other. And CockroachDB falls into this.

So CockroachDB strives to be PostgreSQL compatible. If your application is written on Postgres, depending on what parts of Postgres you use, you might be able to drop CockroachDB in as a replacement.

But the surface area of SQL databases is huge. And when people think of Postgres, they don't think of the base Postgres. They think of Postgres and all the extensions that come along with Postgres. And CockroachDB doesn't support those extensions. And we also have a few extensions of our own that other databases don't support.

It's a little bit of an unfortunate world we live in, where there are no kind of perfectly compatible SQL databases.

CRAIG BOX: Back in June last year, you relicensed your software. This was a time when a number of software vendors found themselves competing against large cloud vendors, quite often the kind that also sell books, who are running their open source product as a service.

Those vendors are finding that the cloud is monetizing the software in a way that they are not able to. Take us back to that time and how you came to that decision.

PETER MATTIS: For a long period of time, open source seemed to have found a viable business model. And the name for that business model is open-core. The open-core business model is, you're going to have-- you know, the core of the software is going to be open source under something like the Apache version 2 license. And then you will provide kind of enterprise functionality around that core, and you'll sell that enterprise functionality.

And the enterprise functionality might be closed source. It might be licensed under a more restrictive license than Apache. And essentially, that worked well for a long period of time.

And then this online bookseller came along. Well, even before the online bookseller came along, something started to shift in the industry. And that is, everybody wanted to start consuming their software as a service. And this is just a general trend. We see this happening.

I mean, we consume our hardware as a service now. No one runs their own hardware except the big cloud providers and legacy companies, which have their own databases. Instead, we consume hardware by just starting up a VM in the cloud, in Amazon, or Azure, or Google Cloud.

So there's this general move that everyone wants to consume their software as a service. And this move is also starting to encompass databases and other kind of database like software.

So what happened is this online bookseller is like, OK, I'm going to take the open-core part of these systems, and I'm actually going to start reimplementing the enterprise functionality so I can provide the full kit and caboodle to my customers.

And that was kind of a wake up call that just having that enterprise functionality didn't protect the businesses as much as we thought it was going to protect them.

We made the transition to what's called the BSL-- the Business Source License-- in order to provide an additional layer of protection.

The reason we did this early on wasn't because we knew of an active threat, but it kind of was putting a moat in place for something that might happen a couple of years in the future.

The problem is, as soon as one of the behemoths in the industry turns their sights on you, it's a little bit too late. You want to have prepared yourself before then.

ADAM GLICK: We're not lawyers. And certainly, this isn't a show for legal advice. But for the technical folks out there, when we think about these various licenses, like what's the way to think about it?

PETER MATTIS: The Business Source License is kind of interesting. It's a time limited license. And after that time limit expires, I think-- we had to put in what we want the limit to be. It can't be more than four years. And I think we said four years.

When four years goes by, the software that's under this license, under the BSL, will revert to another license, another open source license.

So for CockroachDB, for our last release, which was the 19.2 release, which happened in the fall of 2019, four years from that date, in the fall of 2023, all of that CockroachDB 19.2 source code will switch from being licensed under the BSL to being licensed under Apache.

I think there's actually kind of this nice aspect to a time limited license like that, in that it's a little bit like short term patent protection. It's on a time frame that like the internet moves. After four years, we will have made significant strides in advancing CockroachDB, or we should have, and this definitely aligns our interests with the interests of our users, that we will be continuing to enhance it. And that's why you should actually engage in an enterprise license with us.

The BSL, we actually also get to put restrictions on how we want people to use it while it's still licensed under the BSL. And we put fairly relaxed restrictions there.

We still want developers to use it. We still want enterprises to use it. We just don't want enterprises competing with us and providing CockroachDB as a service. And we were kind of careful to spell out exactly what that limit is.
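The time-limited behaviour described above comes down to simple date arithmetic, sketched here. The four-year term is the figure given in the conversation; the exact release day used (1 November 2019) is a hypothetical placeholder for "the fall of 2019", and the function names are illustrative only.

```python
from datetime import date

TERM_YEARS = 4  # the time limit as stated in the conversation


def change_date(release: date, years: int = TERM_YEARS) -> date:
    """Return the date on which a release switches to the later license."""
    try:
        return release.replace(year=release.year + years)
    except ValueError:
        # 29 February in a year whose target year is not a leap year.
        return release.replace(year=release.year + years, day=28)


def license_on(release: date, day: date) -> str:
    """Return which license governs the given release on the given day."""
    return "Apache 2.0" if day >= change_date(release) else "BSL"


# Hypothetical release day standing in for "the fall of 2019".
RELEASE_19_2 = date(2019, 11, 1)
```

With these assumptions, `license_on(RELEASE_19_2, date(2020, 2, 11))` reports the BSL, and any date from the fall of 2023 onward reports Apache 2.0 for that release. Later releases would each carry their own change date.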

ADAM GLICK: What's next for Cockroach Labs as a company?

PETER MATTIS: Well, we do have CockroachDB as a service. Our service is in its infancy, and we really want to see that service grow. We think there's a lot of very interesting things to do to basically make database as a service more ubiquitous and easier to kind of spin up very lightweight database clusters and geo-distributed clusters.

We also have a variety-- the SQL feature set is long. One of the features that is just coming up now, and looks like we're going to be working on it very soon, is geospatial indexing. I think this is our most requested feature in CockroachDB right now.

And then we're continuing to put a lot of effort into improving our compatibility with Postgres, and in particular, to improve our compatibility with ORMs. It turns out that almost nobody actually interacts directly with the database. They all interact with it indirectly through an ORM.

And so, if there's any sort of small incompatibility there with the ORM, it shows up. And above and beyond the small incompatibilities, we just want to make that experience fantastic for our users.

ADAM GLICK: For people that might not be aware, what is an ORM?

PETER MATTIS: It's an Object Relational Mapper. They're basically libraries that sit inside the client application that often abstract away some grungy details of interacting with SQL.

So very frequently an ORM will provide some kind of query building language, and it'll provide a way to extract data from the result set.

So SQL is returning result sets and tables, and you might want to extract that into an object.
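The two jobs described above, building queries and turning result rows into objects, can be shown in a toy sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example runs on its own; the `users` table, the `User` class, and the helpers `select` and `fetch_users` are hypothetical, and real ORMs do far more than this.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class User:
    id: int
    name: str


def select(table: str, columns: List[str],
           where: Optional[Dict[str, object]] = None) -> Tuple[str, list]:
    """Tiny query builder: return SQL text plus its bound parameters.

    Table and column names are interpolated directly, so they must come from
    trusted code; only the filter values are passed as parameters.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params: list = []
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
        params = list(where.values())
    return sql, params


def fetch_users(conn: sqlite3.Connection,
                where: Optional[Dict[str, object]] = None) -> List[User]:
    """Run the built query and map each result row onto a User object."""
    sql, params = select("users", ["id", "name"], where)
    return [User(*row) for row in conn.execute(sql, params)]


# In-memory database with a couple of rows to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "ada"), (2, "grace")])
```

Calling `fetch_users(conn, {"name": "grace"})` returns a list of `User` objects rather than raw rows, which is the "extract that into an object" step in miniature.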

CRAIG BOX: Finally, earlier on, you blamed one of your co-founders for the name of the database and the company. The implication, as I understand it, is that cockroaches will survive the impending nuclear war which will otherwise consume all of us.

But that felt like it might be a risky thing for something that's so unpopular with so many people, to name a technology and a company after?

PETER MATTIS: Yeah, no, we've had a number of people tell us that we should change the name. Right up in our face. I had someone very early on in the lifetime of Cockroach Labs saying that "no CIO ever would sign a check made out to Cockroach Labs". I can tell you that that was mistaken, and they have signed checks made out to Cockroach Labs. Or actually, I think they just do wire transfers. Maybe that makes it easier.

Yeah, it's a little bit risky. But on the other hand, you know, there's a lot of bland names out there. Especially in the database landscape and even elsewhere, there's names that are-- just like they're non memorable.

And my co-founder, Spencer, who came up with the name Cockroach, he actually describes the name CockroachDB as this little mental worm. It gets inside your brain. You can't forget it.

I mean, I can tell you-- like it doesn't-- database names right now, the one you're going to remember, not just because we've been talking about Cockroach for the past 20 minutes, but the one you can remember is Cockroach, because it just kind of sticks in there.

To the people who kind of really get offended by it, I'm like, you know, you get used to it over time. And actually, I kind of look fondly-- whenever I see the word cockroach somewhere, I kind of smile and take a closer look at it now.

CRAIG BOX: You don't have any plans to name any features or any other companies after, for example, the huntsman spider?

PETER MATTIS: [LAUGHS] Yeah, no. We have some jokes that maybe we can just name this like soft fluffy database or whatnot-- no, I think we're going to stick with the-- we call a lot of internal tools, we put like-- we have roach-prod, roach-test, roach-dash.

We kind of embrace it. We call employees Roachers. Everybody here has gotten used to it at this point.

In fact, for Christmas, one of my friends gave my daughters little robotic cockroaches and they just got a thrill out of that.

CRAIG BOX: Well, thank you very much Peter for joining us today.

PETER MATTIS: Yeah, it was my pleasure. Thank you, guys.

CRAIG BOX: You can find Peter Mattis on Twitter at @petermattis, and you can find Cockroach Labs at cockroachlabs.com.

[MUSIC PLAYING]

CRAIG BOX: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell two friends, one more than last week. If you have any feedback for us, you can find us on Twitter, @kubernetespod, or reach us by email at kubernetespodcast@google.com.

ADAM GLICK: You can also check out our website at kubernetespodcast.com, where you'll find transcripts and show notes along with that discount code. Until next time, take care.

CRAIG BOX: See you next week.

[MUSIC PLAYING]
