Kubernetes Podcast from Google: Episode 81 - Vitess, with Jiten Vaidya and Sugu Sougoumarane

#81 November 26, 2019

Vitess, with Jiten Vaidya and Sugu Sougoumarane

Hosts: Craig Box, Adam Glick

Vitess is a cloud native database clustering system for horizontal scaling of MySQL. It was built for YouTube, open sourced, and has recently graduated from the CNCF. Two members of the team who wrote and ran Vitess at YouTube, Jiten Vaidya and Sugu Sougoumarane, are CEO and CTO of PlanetScale, a company they founded to support Vitess commercially. They join Craig and Adam to talk databases.

Links from the interview

Vitess
- About Jiten and Sugu
Graduated from the CNCF
Database shards
Vitess history
- YouTube acquired by Google in 2006
Go; 10 years old
Google storage systems:
- Bigtable
- Colossus
Scaling MySQL in the cloud with Vitess and Kubernetes and Cloud Native MySQL Sharding with Vitess and Kubernetes by Anthony Yeh, Google Cloud
Case studies: Stitch Labs, HubSpot, JD.com
Vitess at KubeCon:
- Vitess: Stateless Storage in the Cloud by Sugu Sougoumarane
- Geo-partitioning with Vitess by Deepthi Sigireddi and Jiten Vaidya
- How to Migrate a MySQL Database to Vitess by Sugu Sougoumarane & Morgan Tocker
- Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess by Derek Perkins from Nozzle
Postgres support
PlanetScale
Announcing PlanetScale’s CNDb
The name
- Voltron
- Strong Bad’s advice on naming things
Jiten Vaidya and Sugu Sougoumarane on Twitter

Transcript

ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.

CRAIG BOX: And I'm Craig Box.

[MUSIC PLAYING]

ADAM GLICK: How was your KubeCon?

CRAIG BOX: It was great. It was fantastic meeting people. Thank you so much to everyone who came up and said hello. We've come to the conclusion that it's a little hard to come and say hello to people that you know the sound of, but you don't necessarily know what they look like.

ADAM GLICK: [LAUGHTER]

CRAIG BOX: We may have to put up posters.

ADAM GLICK: Yeah. We'll have to find a solution for that for KubeCon EU. But for those of you who came and found us, either at our meet-up, or to the dozens of folks that we ran into on the show floor, just thank you for listening. It was great to get your feedback, and to talk to you folks, and meet you in person.

We really enjoyed it. We do it for you. And we are glad to get the feedback on it.

CRAIG BOX: Were there any particular booths that stood out to you at KubeCon?

ADAM GLICK: Well, if I think about who won the Twitter Chatter Award, I think it's probably the WeWork booth.

CRAIG BOX: Oh, yes.

ADAM GLICK: Did you manage to wander past that one and notice the sign there?

CRAIG BOX: I did. There was a little sign on it that someone who may or may not have been WeWork staff put on it. It said, free co-working space. But aside from that, there was no obvious sign of life around it.

ADAM GLICK: Yes. I saw someone staffing it the first day. And obviously, they've had some unfortunate news recently. And so I think someone was making the best of that situation with the sign. But that's probably the one I saw people talk about most.

CRAIG BOX: I do hope that person wasn't employed on the first day and then unemployed on the second.

ADAM GLICK: I certainly hope not.

CRAIG BOX: Been playing any games since you came home?

ADAM GLICK: You know, I had a little time on the plane. And so I, of course, decided to stick with the transportation theme. I tried out a game called "You Must Build A Boat", which is kind of like if you created an 8-bit version that mashed up an endless runner and a match three game.

And it has some kind of character-building elements. So they're fairly light.

CRAIG BOX: Why must you build a boat?

ADAM GLICK: You must build a boat because it is there. It tells you. It literally puts it in the biggest letters possible on the screen. You must build a boat. And you just have to accept that as a premise.

CRAIG BOX: I played a great game a few years ago called "You Have To Win The Game", which I think has a similar premise. It was a retro room-based platform game, like you might have had in the CGA era. And it hearkened back to that aesthetic with the lovely cyans and browns and so on. But it was a fantastic little game. And you can find a link to both of those in the show notes.

ADAM GLICK: Let's get to the news.

[MUSIC PLAYING]

Google Cloud made a number of announcements last week at their Next conference in the UK. Migrate for Anthos, Google Cloud's tool for automatically moving VMs into containers, is now generally available, as are Cloud Code, the free IDE plug-in for writing Kubernetes apps in Visual Studio Code or IntelliJ, and Apigee's hybrid API management offering.

Support for connecting any Kubernetes cluster to Anthos, first announced earlier this year, is now in public beta.

CRAIG BOX: Also at Next UK, GitLab announced serverless deployment to Cloud Run for Anthos. This feature gives users the ability to have GitLab provide their CI/CD pipeline, and then deploy serverless directly to their Cloud Run for Anthos environment running on GCP, with on-premises support to follow soon.

Documentation, a sample app, and a walk-through are all provided on their site.

ADAM GLICK: VMware has announced Project Antrea, an integration to use the Open vSwitch data plane with Kubernetes to provide pod networking and help enforce network policies. Using CRDs, and custom controllers, Antrea is able to program the multi-platform Open vSwitch for faster networking with portability and operational improvements.

Control of Project Antrea is available as a kubectl extension, or as a GUI via plugin for the open source project Octant.

CRAIG BOX: IBM announced that Managed Istio is now generally available on IBM Cloud Kubernetes Service. This lets you install and run Istio 1.4 on Kubernetes 1.16, including integration with IBM's logging and monitoring services out of the box.

IBM co-founded the Istio project with Google and Lyft, based on their experience in running service mesh internally. Product manager Ram Vennam says that the service uses the new Istio operator under the covers.

He told the Kubernetes Podcast that IBM, Google and Red Hat put a lot of work into the new operator in a very short amount of time, and that he is excited to see upcoming improvements in the operator, including a single place for configuration, faster multi-mesh configuration, and more.

ADAM GLICK: IBM also announced two new open source projects, Kui and Iter8, with the number 8.

Kui is a meta-CLI tool designed to help give users one place to access their different CLI tools, like kubectl, Helm, and istioctl. Kui also provides visualizations to help developers view their data in a world of hybrid and multi-cloud Kubernetes deployments.

Iter8 is a tool to do comparative analytics across the Istio API. This can be used for canary testing, A/B testing, and troubleshooting, giving users the ability to detect and address problems with an application earlier in the process.

Additionally, IBM announced integration of the open source Tekton build tools into their cloud continuous delivery service. For more on Tekton, check out episode 47 with Kim Lewandowski.

IBM also announced that their Razee multi-cluster continuous delivery project now runs on OpenShift and in their Cloud DevOps service.

CRAIG BOX: Solo.io has announced Autopilot, an open source project to make changing your service mesh configurations automatic, to avoid operator toil and mistakes.

Solo is coining what they call an adaptive mesh as a term to describe an automated service mesh control system. Autopilot provides an opinionated way to control the service mesh via operators, GitOps, and the Service Mesh Interface.

ADAM GLICK: Cilium has announced Hubble, an open source, fully distributed networking and security observability platform for cloud native workloads. Hubble is built on top of Cilium and eBPF, to enable deep visibility into communication and the behavior of services, as well as the network infrastructure, in a completely transparent manner.

This project allows you to visualize how services are connected, as well as providing network monitoring and alerting at layer 4 and layer 7, but stops short of describing itself as a service mesh. The tool also provides reporting on application failure rates and on security connection errors due to network policy.

CRAIG BOX: ByteBuilders has announced Kubeform to provision Kubernetes resources with custom resource definitions in Terraform. Kubeform is an open source project built on Terraform, and supports cloud providers as well as Helm versions 2 and 3.

ByteBuilders say they started the project because the Kubernetes service catalog and Open Service Broker APIs were not as useful to them as CRDs.

ADAM GLICK: CloudBees introduced a GUI for Jenkins X last week. Jenkins X is the Kubernetes-focused version of the Jenkins CI/CD tool that otherwise requires command line control. The web UI is part of the free community distribution, though it's not clear if this is open source.

CRAIG BOX: Juniper Networks has announced updates to the Contrail Project to enable software-defined and orchestrated networking to pair with Kubernetes-orchestrated compute. Contrail has added support to encrypt traffic between Kubernetes nodes, joining their existing features of deep packet inspection and a web application firewall.

ADAM GLICK: The CNCF this week posted a case study of how Slack uses the newly graduated Vitess project. You can learn more about Vitess in this week's interview.

CRAIG BOX: Kubernetes has become the standard deployment tool for GitHub. But sometimes, their applications don't perform as expected, especially with regards to network latency.

Senior production engineer Theo Julienne tells the story of debugging one such problem in a blog post this week. After working down through the layers of network encapsulation, the problem became apparent. It's an easy and accessible explanation of a complicated problem, but well worth the read. So I won't spoil the ending.

ADAM GLICK: Volterra has started a blog series talking about their decision to build their own multi-cluster control plane. The first post calls out why they decided to build one. The other control planes they evaluated were not yet feature-rich enough to provide for all of their current needs. They called out RBAC, security, and PaaS needs amongst the things they felt weren't currently available.

In general, this post shows the challenges that organizations face when adopting new technologies and deciding if they will wait for the community to catch up to their needs, or if they will create their own tooling and thus accept the technical debt that can result when the community eventually provides those offerings.

CRAIG BOX: Startup Gravitational closed a $25 million Series A round this week, adding to their $6 million in previous funding, to help bring their tools for multi-environment Kubernetes app deployment to more users. Their tools allow people to build on top of upstream Kubernetes, packaging an entire application, including things like Helm charts and settings, into a single file. This package can then be deployed wherever the user wants, in cloud, or on-prem.

ADAM GLICK: As we reported last week, Datadog has released their 2019 report on container data. Some of the interesting findings in that report include that Google Cloud continues to lead the major cloud vendors in terms of Kubernetes adoption, while Azure is showing the greatest percentage growth. Most customers on AWS still prefer to roll their own Kubernetes instead of using EKS.

Orchestrated containers churn at about twice the rate of un-orchestrated containers, and Nginx and Node.js are both leading technologies in containerized environments.

CRAIG BOX: Last week, container security company Aqua Security announced they are expanding into cloud security posture management with the acquisition of CloudSploit, an open source and hosted security and configuration monitoring tool for cloud environments. Details of the deal were not disclosed.

ADAM GLICK: Finally, congratulations to our recent guests, Lachlan Evenson and Katharine Berry, who are the 2019 Top Ambassador and winner of the Chop Wood Carry Water awards, respectively. Frederic Branczyk won the Top Contributor award for his work on Prometheus and Kubernetes. Congratulations to all the award winners.

CRAIG BOX: And that's the news.

[MUSIC PLAYING]

Sugu Sougoumarane is the co-founder and CTO of PlanetScale, and one of the co-creators of Vitess.

Jiten Vaidya is a co-founder and CEO of PlanetScale. Jiten managed the DBA and SRE teams at YouTube that operated Vitess. Welcome to the show, Sugu.

SUGU SOUGOUMARANE: Hi.

CRAIG BOX: And Jiten.

JITEN VAIDYA: Hello.

ADAM GLICK: Vitess is the latest project that has graduated from the CNCF. First of all, congratulations to both of you and to your teams.

SUGU SOUGOUMARANE: Thank you.

JITEN VAIDYA: Thanks.

ADAM GLICK: Let's pretend that we were in an elevator together. What's the short version of what is Vitess?

JITEN VAIDYA: Vitess is basically a sharding middleware program that gives your application the view that it's talking to one humongous database, when actually it is talking to multiple shards. And this allows you to horizontally scale your relational databases the way you typically would associate with a key value store. So you get to keep all the relational goodies like transactions and secondary indexes, and so on. But it allows you to really scale your databases without any limits at all.

Also, it allows you to run your databases in Kubernetes as well, which is an important property these days.

ADAM GLICK: When you say your databases, is it any database? Or is Vitess both the front end that does the sharding as well as the database that sits behind it?

JITEN VAIDYA: Vitess is both. Your experience is basically as a single database system. But Vitess sits on top of MySQL, and so your application is talking the MySQL binary protocol. It also supports gRPC if you are interested in using gRPC clients. But mostly, we recommend that people use the MySQL binary protocol.

ADAM GLICK: Would it be safe to say that it is a horizontally-scalable MySQL-compatible database?

JITEN VAIDYA: That's absolutely right. I would also add the two words "cloud native" to it, because it runs very well in Kubernetes.

CRAIG BOX: Vitess was created at YouTube. Let's go back and have a look at the technology at the time.

So first of all, YouTube was an acquisition by Google. I assume that it wasn't running on Google's native database stack, like some of the rest of Google may have been.

SUGU SOUGOUMARANE: Exactly. Yeah. We were running on our own data centers at that time.

CRAIG BOX: Traditionally, you will scale a MySQL instance by sharding or using replicas. Did you do that? And why did you need to do something different?

SUGU SOUGOUMARANE: We actually started off sharding in the application. So YouTube was actually already sharded before Vitess was born. And that is when we started seeing the huge limitations that the approach of using application-level sharding was posing to us. And we had to start thinking about how to keep scaling, because once you shard with the application, scaling beyond that is a real challenge.

JITEN VAIDYA: Feature velocity suffers when you have to carry the burden of the sharding logic in the application. And we were experiencing that firsthand at YouTube.

CRAIG BOX: Quite often, people will consider resharding a database as "a take everything offline, do some work, bring it back" kind of thing. And I assume that that's not something that was going to work at the kind of scale you were operating at?

SUGU SOUGOUMARANE: Exactly. And the other problem was, having sharded it, we knew that we were going to run into our limits very soon. There were also other problems that were just related to managing a large number of MySQL instances. And we needed a way to automate all that management. So that, combined with the ability to manage the sharding, is what gave rise to Vitess.

CRAIG BOX: What year was this?

SUGU SOUGOUMARANE: This was 2010.

CRAIG BOX: Did you just take the MySQL source and look to see how you could make it solve these problems? Or did you look to build a utility from the beginning?

SUGU SOUGOUMARANE: It was actually a utility. The one thing that we realized was, changing the MySQL source code is challenging, because there are versions of MySQL you have to keep up with. There are security bug fixes that we have to adopt.

So we decided not to change MySQL at all, and instead, solve every problem possible with a layer in the middle, which is why we sometimes call ourselves a middleware.

CRAIG BOX: Or a sidecar, perhaps.

SUGU SOUGOUMARANE: Or sidecar, yes. Both.

JITEN VAIDYA: And an interesting point that I think, Sugu, you should talk about, is the choice of Go as the language for developing Vitess.

SUGU SOUGOUMARANE: Oh, yes. Yes. In 2010, Go had just been announced. Even 1.0 wasn't out. At that time, we were looking for a language to implement this layer. We had the choice of C++, Python, Java. And Go was kind of beginning to be talked about at Google. And we chose Go just based on who the authors were. [CHUCKLING] We looked at Rob Pike, we looked at Russ Cox and Ian Taylor, and--

CRAIG BOX: Ken Thompson, obviously.

SUGU SOUGOUMARANE: Ken Thompson and Robert Griesemer. It felt like they were really solid people who knew what they were doing. And we really felt that they were going to take this to the end.

So we said, we are going to trust them, and we are going to commit ourselves to writing this in Go.

CRAIG BOX: 10 years later, you think that's a bet that paid off?

SUGU SOUGOUMARANE: It's one of the best decisions we have ever made.

ADAM GLICK: So this originally started as an internal project within YouTube in 2010. It's now an open source project. When did that happen? And how did you make that decision?

JITEN VAIDYA: I think it has been an open source project from fairly early on. I mean, I remember the discussions inside. And what we were solving was a scaling problem. And that was sort of independent of YouTube.

So from early on, we decided that, let's have the code in a GitHub repo, which is in public domain, and then build and deploy in YouTube data centers.

And early on in 2010, we were still in YouTube data centers. So we didn't have to integrate with Chubby, Stubby, and other Google internal services. So in the beginning, it was more straightforward.

SUGU SOUGOUMARANE: Yeah. Yeah. In 2010, GitHub wasn't popular, actually. So we deployed it in code.google.com, which was--

JITEN VAIDYA: That's right.

SUGU SOUGOUMARANE: The recommended way that Google recommended.

CRAIG BOX: Like SourceForge, but better.

SUGU SOUGOUMARANE: Yes. [LAUGHTER]

So, yeah. Initially, it was pretty straightforward because all we had to do was take some config files as input and run. But in 2013, we were required to move all our data within Google.

At that time, everything changed about -- the entire identity of Vitess as a project changed, of what it was then and what it became later.

ADAM GLICK: So it went open source in 2012, if I remember correctly.

SUGU SOUGOUMARANE: Until then, what we actually did was, we never submitted that code into the Google Perforce repository. We were actually running our code within one of our home directories as a Mercurial repository.

So the intent was to open source from the beginning, which is why we thought if we submitted it into Google, it would be a challenge. We didn't know much about how Google operated. We were like--

CRAIG BOX: It was a little bit of a skunk-works project?

SUGU SOUGOUMARANE: [CHUCKLING] Yeah. And somebody said, you're running this entire thing from your home directory?

We used to deploy it into YouTube, built off of one of our home directories. And then we said, OK, we need to be more formal about this. And that's when we open sourced it first.

ADAM GLICK: If I look at that timeline, that puts you well ahead of when Kubernetes was even created within Google, let alone made an open source project externally.

SUGU SOUGOUMARANE: Exactly.

ADAM GLICK: So what were you running it on at that point?

SUGU SOUGOUMARANE: Until 2013, we were running it on YouTube's on-prem data centers. In 2013, there was a mandate that all data need to be moved inside Google data centers.

And that's when we started this migration project. And we ran into a very interesting scenario, which is, you look at Google's Borg. Most engineers wrote applications that were stateless, and for storing state, there were well-defined APIs for you. And you had to choose one.

You say, I want to use Bigtable. I want to use Colossus. But there was nothing for MySQL. And we were running on MySQL.

And there was no mounted block storage, even. So the only option that we had was to run Vitess with MySQL on local disk, which in Borg, is considered ephemeral, which means that if Borg rescheduled you, it would just reclaim all the data. It's not accessible for you to use.

ADAM GLICK: Borg is the predecessor to Kubernetes, as we think about it--

CRAIG BOX: Spiritually.

SUGU SOUGOUMARANE: Yes.

ADAM GLICK: Spiritual predecessor.

JITEN VAIDYA: We call it the blueprint. [CHUCKLING]

ADAM GLICK: Yeah. Was that how you started to think about this becoming part of the cloud native world and part of the Kubernetes ecosystem?

SUGU SOUGOUMARANE: Yeah. We essentially stumbled upon it. Because we got Vitess running in Borg, as a stateless application. Essentially, Borg did not know that it was running a storage when it ran Vitess. It just thought it was running something like a front end.

And when Kubernetes came out, because it was based on Borg, it had the same properties. Kubernetes initially was tuned to mainly run stateless applications. But because we were able to run Vitess on Borg in that mode, we actually announced that we were ready for Kubernetes before even 1.0 came out.

JITEN VAIDYA: And so Kit Merker and Anthony Yeh wrote this blog post. I think it's 2015-- which talked about running databases in Kubernetes.

SUGU SOUGOUMARANE: It's a blog from the Google Cloud Platform. So which is a proud moment for us.

JITEN VAIDYA: And I think one detail that I would like to add about how you can actually run stateful databases for a large-scale system like YouTube on Borg, where the disk is ephemeral. So the way we did it is by enabling semisync replication in MySQL, where the durability was basically guaranteed because in semisync replication mode, the master does not acknowledge that a commit has succeeded until all the data associated with that transaction is in the relay logs of one of the replicas.
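
The semisync guarantee described here can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MySQL or Vitess code: the `Replica` and `Primary` classes and their methods are hypothetical names, "primary" stands for what the transcript calls the master, and real semisync replication also involves binlogs, network acknowledgements, and timeouts that are left out.

```python
class Replica:
    """Toy replica that stores received transactions in a relay log."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.relay_log = []

    def receive(self, txn):
        # Acknowledge (return True) only if the transaction reached the relay log.
        if not self.reachable:
            return False
        self.relay_log.append(txn)
        return True


class Primary:
    """Toy primary that reports success only after at least one replica acks."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.committed = []

    def commit(self, txn):
        # Send the transaction to every replica and collect acknowledgements.
        acks = [replica.receive(txn) for replica in self.replicas]
        if not any(acks):
            # No replica holds the data, so the commit is not acknowledged.
            return False
        self.committed.append(txn)
        return True
```

In this model, losing the primary's local disk after an acknowledged commit does not lose the transaction, because at least one replica already holds it in its relay log.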

CRAIG BOX: Vitess is widely used on Kubernetes, even though at the time of its invention, Kubernetes didn't exist. And even at the time of its release, Kubernetes wasn't the target platform that you wanted to run it on yourself. Was it targeted to run on Kubernetes from the very beginning?

JITEN VAIDYA: Vitess acquired the properties of a cloud native database service starting 2013, because of all the work that we did to have it work well inside Borg. And the fact that it ran well in Kubernetes was a side effect.

SUGU SOUGOUMARANE: And actually, even though we did not run Vitess in Kubernetes, it was strange that a few people caught onto that message and essentially asked us, like, "do you think we can run this on Kubernetes?"

And we didn't know any better. It's only much later that people started cautioning people about running storage in Kubernetes. Because we had the confidence that we could run it on Borg, we told them, yeah, sure. Go for it. And people did launch it. And touch wood, the oldest instance that runs Vitess on Kubernetes was launched in 2016. So it's about over three years old now.

JITEN VAIDYA: Stitch Labs, right?

SUGU SOUGOUMARANE: Stitch Labs. They were the first ones.

JITEN VAIDYA: Stitch Labs, HubSpot, JD.com - I mean, these people bet their companies on databases running in Kubernetes on Vitess.

CRAIG BOX: When you have a MySQL database and then you split it into two, we now need a way of addressing those two things logically, as if they are one. And I believe that's called a key space in Vitess.

JITEN VAIDYA: That's correct. So a logical database in Vitess is called a key space. And if your key space has a single shard, then it's identical to a MySQL database. But otherwise, there are multiple clusters spanning the whole key space.

CRAIG BOX: For someone who has experience with MySQL as running a single instance of the database, moving to a sharded model where you have portions of the data in different places is one set of mental leaps that they have to make, and then also using something like Vitess that does different things, is a different set of leaps.

How would you describe the process of moving from single instance on single machine through those scaling models?

SUGU SOUGOUMARANE: The first thing that we recommend people to look at is look at their transactions, and what rows they are touching, each of their transactions.

And we also make them look at what kind of joins they are doing between their tables. And those are the strong factors that indicate how you want to group your data.

And the third factor is, what's the highest QPS that you have on your tables.

CRAIG BOX: If we have a set of tables, would you traditionally take different tables and put them in different databases? Or would you say, the tables that are grouped together, and you join against them, should be grouped together and we should take this big table, get split into 10, and now we have 10 different replicas that include a piece each of that data.

SUGU SOUGOUMARANE: Absolutely. That's how we recommend you shard your data.

CRAIG BOX: The second one.

SUGU SOUGOUMARANE: Yeah.

JITEN VAIDYA: The way to do it is that you look at your tables. And typically, what ends up happening is that you have a few tables with tens of millions of rows. And then you have a few tables which typically are reference tables, which have 1,000 or 10,000 rows, which you use to do joins, right?

So you asked the question: how do I start thinking about a sharded database and migration to Vitess, starting from a monolithic database? Right?

So in a monolithic database, all these tables end up being in the same schema. The first step that you would take is to identify these reference tables. And Vitess allows you to do table migration while continuing to serve your data, where we just move these reference tables into a separate key space.

Application doesn't need to know about it. We internally do the queries across two different databases to give you the same results. But now that these reference tables have moved, your large key space only contains tables that have millions or tens of millions of rows, and you can start thinking about how best to shard it. And then you horizontally shard those tables.

ADAM GLICK: So if I were to give an example, let's say you might have a user database. And when you have a single monolithic database, it has all of that. At a certain point, you get enough users that you don't want them all in one table. And so you might say, split into two. Thus, two shards.

JITEN VAIDYA: Correct.

ADAM GLICK: And one might be, at least in the English language, A through M, and the other one might be N through Z users, for instance.

JITEN VAIDYA: That's precisely right. So let's say that you have 200 million users. 0 to 100 million in one shard, 100 to 200 million in the second shard.
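
The two-shard user example can be sketched as a range lookup. This is an illustration only, not how Vitess routes queries internally: the shard names, the boundaries, and the `shard_for_user` function are assumptions made up for this example.

```python
import bisect

# Exclusive upper bound of each shard's user-ID range, in ascending order.
# Both the boundaries and the shard names are hypothetical.
SHARD_UPPER_BOUNDS = [100_000_000, 200_000_000]
SHARD_NAMES = ["shard-0", "shard-1"]


def shard_for_user(user_id):
    """Return the name of the shard whose range contains user_id."""
    if not 0 <= user_id < SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    # bisect_right finds the first shard whose upper bound exceeds user_id.
    return SHARD_NAMES[bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)]
```

With these boundaries, user 42 lands in `shard-0` and user 150,000,000 lands in `shard-1`. The application would call one routing function rather than carrying this logic itself, which is the burden the guests say a middleware layer removes.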

ADAM GLICK: As you grow, so as YouTube grew, and as people who are using Vitess grow, does it automatically re-shard in order to handle the growth and scale of users or data in the system?

JITEN VAIDYA: Typically, the way this process worked at YouTube, was that we would have a capacity planning meeting every month. And we would look at the rate at which the databases are growing, and our current size, or amount of infrastructure and so on, and decide whether we need to re-shard specific key spaces.

And that's what we recommend people do also. So you decide that you want to shard a particular database. If you have chosen a sharding key which has high cardinality, most of your shards are about the same size, in which case, we recommend that you just split all shards.

But if you are sharded, say, using a vendor ID or something like that, and you have a hot shard--

ADAM GLICK: Hotspots.

JITEN VAIDYA: Yeah, exactly. Then you just split those shards. So the decision to shard is made by the operator. But all of the actual workflow of being able to do it while continuing to serve traffic at really high rates is taken care of by Vitess.
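
The two resharding choices described here, splitting every shard or splitting only a hot one, can be sketched on half-open key ranges. The function names and the integer ranges are hypothetical, and the hard part that Vitess handles (copying data and cutting over while serving traffic) is deliberately omitted.

```python
def split_range(lo, hi):
    """Split the half-open key range [lo, hi) into two halves."""
    if hi - lo < 2:
        raise ValueError("range is too small to split")
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def split_all(shards):
    """Split every shard, doubling the shard count (evenly sized shards)."""
    result = []
    for lo, hi in shards:
        result.extend(split_range(lo, hi))
    return result


def split_hot(shards, hot):
    """Split only the shard named by `hot`, leaving the others untouched."""
    result = []
    for shard in shards:
        result.extend(split_range(*shard) if shard == hot else [shard])
    return result
```

Starting from `[(0, 100), (100, 200)]`, `split_all` yields four shards, while `split_hot(..., hot=(100, 200))` yields three. In both cases the operator decides when to call them, matching the point that the decision to shard is made by a person.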

SUGU SOUGOUMARANE: So there's two things to take into consideration. One is automatic sharding is a feature that people have asked us for. But there are also people who do not specifically want it. The reason is because if there is an accidental overload, and the system decides to re-shard you, it's something that people don't want to happen automatically. So they'd rather control that.

And on the other side is, every time you re-shard, at least at YouTube, we essentially doubled the capacity. So at some point of time, re-sharding leaps ahead of the growth rate at which any company would grow. So it is not something that is going to be a burden on you forever. At some point of time, you are going to be so far ahead in terms of capacity of what you need to serve that it becomes a no-stress situation.

ADAM GLICK: What would you consider the consistency model here? Is this an eventually consistent model?

JITEN VAIDYA: We think that the right layer at which to decide whether you want read-after-write consistency or whether you want eventual consistency is in the application.

So we allow you to make that choice. If you want read-after-write consistency, you just give the name of the key space as the name of the database. If you want reads from replicas, where eventual consistency is OK, you say database@replica. And then we load balance your reads across replicas.

And the one example that I give is that if you are making changes to your own profile, you want read-after-write consistency. But with other parts of the app, where I am browsing your profile, it's fine if it's a few milliseconds or seconds delayed. And then you can use the eventually consistent reads.
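
The per-query consistency choice can be sketched as a small router. This is a toy model, not Vitess code: the `Keyspace` class and `route` function are hypothetical, "primary" stands for what the transcript calls the master, and round-robin is just one assumed way to load balance replica reads.

```python
class Keyspace:
    """Toy key space with one primary and a list of replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin cursor for replica reads

    def pick(self, tablet_type):
        if tablet_type == "primary":
            return self.primary
        if tablet_type == "replica" and self.replicas:
            tablet = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return tablet
        raise ValueError(f"no tablet of type {tablet_type!r}")


def route(target, keyspaces):
    """Route 'name' to the primary and 'name@replica' to a replica."""
    name, _, tablet_type = target.partition("@")
    return keyspaces[name].pick(tablet_type or "primary")
```

Editing your own profile would use the target `users`, so the read sees the write that just happened. Browsing someone else's profile could use `users@replica` and accept a short delay.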

ADAM GLICK: The classic examples would be something like a social feed can be eventually consistent, because you don't need it. But things like doing a transaction for a purchase, you want to make sure are strongly consistent.

JITEN VAIDYA: Precisely.

ADAM GLICK: How do you handle the fact that if you're going to be strongly consistent in something that's distributed, then latency does become a part of that system? And so if you're all located in one location, that's OK. But if you want, for availability, to be spread out, then you have to think about where those replicas are and what the latency is between them. How is that handled?

JITEN VAIDYA: The largest key space that we had in YouTube was 256 shards. That means 256 masters. And YouTube traffic is very read-heavy. So each master had between 80 to 100 replicas distributed across 20 data centers all around the world.

And what Vitess allows you to do is, the vtgate layer, which is the stateless proxy, can load balance reads across the replicas in that data center.

So all of our masters ended up being in Mountain View. I mean, that's how we had architected it. So all the writes, you went across the Pacific. In my mind, I'm thinking about Japan.

But all of your reads were local reads. So all of that, because we needed to run at scale, was built into Vitess as like first-class infrastructure.
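
The routing pattern described here (writes travel to the single master, reads stay in the local data center) can be sketched as follows. This is a simplified model under stated assumptions, not how vtgate is implemented: the `Tablet` and `Router` classes, the data center names, and the round-robin policy are all hypothetical and chosen only for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    name: str
    datacenter: str
    role: str  # "master" or "replica"


class Router:
    """Stateless-proxy sketch: one instance per data center."""

    def __init__(self, local_datacenter: str, tablets: list):
        self.master = next(t for t in tablets if t.role == "master")
        local = [t for t in tablets
                 if t.role == "replica" and t.datacenter == local_datacenter]
        if not local:
            raise ValueError("no replicas in the local data center")
        # Round-robin is an assumption; any load-balancing policy would do.
        self._local_replicas = itertools.cycle(local)

    def route(self, is_write: bool) -> Tablet:
        # Writes always go to the master, wherever it lives.
        if is_write:
            return self.master
        # Reads are balanced across replicas in the same data center.
        return next(self._local_replicas)


tablets = [
    Tablet("m1", "mountain-view", "master"),
    Tablet("r1", "mountain-view", "replica"),
    Tablet("t1", "tokyo", "replica"),
    Tablet("t2", "tokyo", "replica"),
]
tokyo_router = Router("tokyo", tablets)
```

A request handled in Tokyo would pay the trans-Pacific latency only on writes, while its reads alternate between `t1` and `t2`.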

CRAIG BOX: You gave a talk at the recent KubeCon about jurisdiction-aware clusters using Vitess. You're able, I guess, to configure this in such a way to support globally-replicated workloads, and have people use the most local replica to them for their reads, for example?

JITEN VAIDYA: Exactly. One of the coolest things about Vitess is the ability for you to have a lot of control on how to co-locate your data. And the way we do it is that for every table that gets sharded, you get to choose the sharding column, and the sharding function based on the type of that column.

And you can also write your custom sharding functions. And for data locality, we have written a custom sharding index. We call it 'vindex'. That allows you to locate data in different jurisdictions without the application having to be aware of it. I think that's a pretty cool way to do it.
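
To make the idea of a sharding function concrete, here is a minimal sketch of two of them: a hash-based one that spreads rows evenly, and a region-aware one that keeps rows inside a jurisdiction's own shards. Everything in it is an assumption made for illustration: MD5 is only a stand-in hash, and the country table, shard numbering, and function names are hypothetical rather than taken from Vitess's vindex implementations.

```python
import hashlib

# Hypothetical layout: each jurisdiction owns a fixed set of shards.
REGION_SHARDS = {"EU": [0, 1], "US": [2, 3]}
COUNTRY_TO_REGION = {"DE": "EU", "FR": "EU", "US": "US"}


def hash_shard(user_id: int, num_shards: int) -> int:
    """Map a sharding-column value to one of num_shards equal key ranges."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    keyspace_id = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    # Scale the 64-bit key into [0, num_shards): shard i owns one contiguous range.
    return (keyspace_id * num_shards) >> 64


def region_shard(country: str, user_id: int) -> int:
    """Pick a shard inside the row's jurisdiction, then hash within it."""
    shards = REGION_SHARDS[COUNTRY_TO_REGION[country]]
    return shards[hash_shard(user_id, len(shards))]
```

The application only supplies the row (here, a country and a user ID); the mapping to a physical location happens in the sharding function, which is the point Jiten makes about the application not needing to be aware of it.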

CRAIG BOX: How does Vitess support cross-shard transactions?

SUGU SOUGOUMARANE: We actually have about three modes. The mode that we recommend that you run the most is to group your data in such a way that most transactions are within one shard. That's the best way to scale a system indefinitely. So basically, the more loosely coupled your components are, the higher you can scale. And Vitess actually has a very powerful way of specifying your configuration in such a way that you can group the data that is supposed to be together in the same shard. And that is actually the most common configuration that people run on.

There are people who use cross-shard transactions in a way where they sequence their commits in such a way that if a particular commit fails, then the application can tolerate it. And then there's the final feature of Vitess, which is to provide two-phase commit, in case you really, really want consistency, and you want transparent cross-shard transactions where the application doesn't want to manage this. Then you can use two-phase commit protocol.
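
For readers unfamiliar with the third mode, the following is a textbook sketch of two-phase commit across shards. It is a generic illustration of the protocol, not Vitess's implementation: the `Shard` class and `two_phase_commit` function are hypothetical, and a real coordinator would also need durable logging and recovery, which are omitted here.

```python
class Shard:
    """Toy participant; can_commit simulates whether prepare succeeds."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: promise to commit if asked, or refuse.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(shards: list) -> bool:
    """Commit on every shard or on none of them."""
    prepared = []
    for shard in shards:
        if shard.prepare():
            prepared.append(shard)
        else:
            # One refusal aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    # Phase 2: every participant voted yes, so all of them commit.
    for shard in shards:
        shard.commit()
    return True
```

The extra round trip in phase 1 is the cost of making the cross-shard transaction transparent to the application, which is why the single-shard grouping is the recommended default.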

ADAM GLICK: As Vitess has grown and graduated, one of the criteria is the number of people using it and how big the community is that is involved with the project. Are people using Vitess the way that you expected them to? Are you seeing them start to use it in ways that you would never have expected? What have you seen?

JITEN VAIDYA: One cool way that people are using it in a way that I had not expected to see is what Nozzle is doing. They are a startup. They had a bunch of credits from one cloud provider. And they were using their application servers and their databases in Kubernetes using Vitess.

And basically, this has allowed them to think of their cloud provider as a commodity. And when somebody else provided them more credits, they were able to migrate tens of terabytes of data in less than 60 minutes.

CRAIG BOX: As a cloud provider, we hate it when people do that.

[LAUGHTER]

SUGU SOUGOUMARANE: I can reveal the names. I think I revealed it in the keynote. They actually migrated from Azure AKS to Google's GKE. So they got a better deal from Google. So they moved there.

ADAM GLICK: Does any version of Vitess care which version of MySQL is sitting under the covers, so to speak?

JITEN VAIDYA: We specify that you should be using MySQL 5.7 with GTIDs turned on. But that's for the MySQL version. But it can be MySQL Community, MySQL Enterprise. It could be Maria. It could be Percona. It could even be hosted MySQL systems like RDS or Google Cloud SQL.

We need the GTID because the re-sharding workflows depend on that.

CRAIG BOX: If a new version of MySQL came out, would you need to update Vitess to support new features in it?

JITEN VAIDYA: Very minimally. So when MySQL 8.0 came out, probably 20 lines of code changed.

SUGU SOUGOUMARANE: Yeah. We had to update a few configuration files. And MySQL 8.0 introduced some new SQL syntax. So we had to accommodate that. So those are the changes we had to make.

JITEN VAIDYA: Right.

CRAIG BOX: Do you have any plans to support Postgres in Vitess?

JITEN VAIDYA: There is nothing in Vitess architecture that would stop us from supporting Postgres, because it's a true middleware.

If you think about how we would go about supporting something like Postgres, it can be thought of in two ways. First is, just query routing at the vtgate layer. And to support Postgres, we would need to extend our parser support for the Postgres-specific SQL syntax.

So, yeah. Postgres-specific SQL syntax we will need to change our parser. And once we do that, an already sharded database can be abstracted by vtgate.

The second part, which is the more complicated one, is the re-sharding workflows. And to do that, we need to start understanding the Postgres write-ahead log that they use for replication. Because parsing the replication logs is how Vitess does live re-sharding.

And that's the more complicated part. I once sat down and tried to figure out how long it would take. I think that it's like 9 to 12 man-months worth of work. And if there are people out there who really want to contribute to Vitess and get this work going, we would love to support them.

SUGU SOUGOUMARANE: Yeah. Short answer is, we would love to make this work for Postgres.

JITEN VAIDYA: Yeah.

ADAM GLICK: In 2018, you both became founders of PlanetScale.

JITEN VAIDYA: That's right.

ADAM GLICK: What is PlanetScale and what's its relationship to Vitess?

JITEN VAIDYA: Basically, by 2017, we had started receiving queries about whether there is a commercial entity that people could pay to get Vitess support. And that was sort of the genesis of PlanetScale. And indeed, PlanetScale's goal is to make it as easy for people to use Vitess as possible, right?

But what we are building at PlanetScale is a database as a service which is cloud provider agnostic, and which is built on top of Kubernetes. So if you are just using a database as a service, it doesn't really matter to you whether it's built on top of Kubernetes or not.

But what we are planning to also do is allow this new feature that we call internally BYOK, which stands for Bring Your Own Kubernetes. And what that would entail is that you would give us access to your Kubernetes cluster, you would use our control plane to define your Vitess clusters, and when you say deploy, the actual resources would get deployed in Kubernetes.

And the bet that we are making is that as people move their stateless workloads to Kubernetes, they would want to have their databases running alongside, adjacent, in the same Kubernetes clusters.

CRAIG BOX: Do you think that bet goes against the cloud providers who are offering hosted SQL services next door to Kubernetes environments?

JITEN VAIDYA: We believe so. I mean, we think that, to be able to do that, basically, you're dealing with two different ways of doing things. And I think that this unification helps.

SUGU SOUGOUMARANE: Yeah, the counterpoint I heard from a large number of people is, cloud providers are more concerned about making their hardware get used, whether they use a hosted service like Cloud SQL or RDS is mainly a way of attracting customers.

So as long as they are getting customers, they don't feel that threatened by somebody making it work in their platform. As long as they make it work in their platform, they are extremely happy about it.

JITEN VAIDYA: But also, I mean, as a user, the data has gravity, right? You don't want to be locked into a single cloud provider. And so if you're running on top of Kubernetes, it becomes that much, at least theoretically easier for you to not be locked in.

CRAIG BOX: Does Vitess make it easier for me to move data between different instances or providers?

JITEN VAIDYA: Yes, it does. In fact, the multi-cloud database as a service that PlanetScale is offering-- we want to get to a point, and it will happen in the next six weeks or so, is that you could actually create true multi-cloud clusters where your masters are running in GCP and replicas in AWS, or vice versa. So yes.

ADAM GLICK: Does your service currently run in multiple cloud providers?

JITEN VAIDYA: It currently runs in GCP and AWS, and Azure to follow.

ADAM GLICK: Vitess wasn't the original name of the project. What was the original name?

SUGU SOUGOUMARANE: [CHUCKLING] Vitess was named Voltron. The idea was that we are building multiple pieces that come together to form this awesome entity.

And it was a cool name. As a matter of fact, if you go inside YouTube's internal sites, there is actually a picture of Voltron on crutches, which was our view of Vitess, which is basically, it's still being constructed.

[LAUGHTER]

ADAM GLICK: And the important part. Cars, or cats? There were two Voltrons, the one that was made up of 30 cars, and the one that was made up of five lions, or cats.

SUGU SOUGOUMARANE: The five lion one. Yes, the five lion-- the cats one.

JITEN VAIDYA: It was the lions. The cats.

[INTERPOSING VOICES]

SUGU SOUGOUMARANE: So what happened was, we actually started naming our files with VT, because it stood for Voltron. When it came time to open source, we go to the open source office of Google and they say, you can't use Voltron for your project name. [LAUGHING]

CRAIG BOX: Codename.

ADAM GLICK: Someone else stole that name.

SUGU SOUGOUMARANE: So we had to find something that had V and T in it. So that's one part of the story.

The other part is, some people may remember Strong Bad. So there's one skit of Strong Bad where he says, the way to invent a cool name is to take a normal name and completely misspell it. Like 'limozeen', with a Z and stuff.

[LAUGHTER]

So we thought, OK. We need VT. We need a name that we need to misspell. And there was another criteria, which is, if you did a Google search, there should not be too many hits.

So using these three, we came up with Vitess. 'Vitesse' means speed in French, but if you drop the E, there were actually not that many hits.

So we said, VT, all criteria are met. We are going with this.

CRAIG BOX: Well done. You heard it here first.

ADAM GLICK: Thanks Jiten, Sugu. It was great having you on the show.

JITEN VAIDYA: Thank you very much. Thank you for having us. It was a pleasure chatting with you guys.

SUGU SOUGOUMARANE: Thank you.

ADAM GLICK: You can find PlanetScale at planetscale.com, and Vitess at vitess.io. Jiten is on Twitter at @yaempiricist, and Sugu is on Twitter at @ssougou.

CRAIG BOX: And you can find all those links in our show notes.

[MUSIC PLAYING]

ADAM GLICK: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter at KubernetesPod, or reach us by email at kubernetespodcast@google.com.

CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes. Until next time, take care.

ADAM GLICK: Catch you next week.

[MUSIC PLAYING]
