Kubernetes Podcast from Google: Episode 95

#95 March 17, 2020

etcd, with Xiang Li

Hosts: Craig Box, Adam Glick

If you’re running Kubernetes, you’re running etcd. The distributed key-value store was started as an intern project at CoreOS by Xiang Li, who is still maintaining it but now working on infrastructure at Alibaba. Xiang joins your hosts to discuss.

ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.

CRAIG BOX: And I'm Craig Box.

[MUSIC PLAYING]

CRAIG BOX: Remember when all we had to worry about was if they were going to cancel KubeCon or not?

ADAM GLICK: Yes, those wonderful days just weeks ago, right? Are you staying safe?

CRAIG BOX: Yes, I've obviously had to adjust my schedule. I wasn't able to do my usual "fly into Seattle, record with you in person, and fly back" weekly routine.

ADAM GLICK: [CHUCKLES] Yes. It's a shame the Concorde isn't here anymore.

CRAIG BOX: I have to take a moment to talk about an interaction I had earlier today. I've been going out to the supermarket the last few days. They've been out of everything, but especially toilet paper, which is the new currency in the post-apocalyptic world we now live in.

But there was a gentleman at the end of my street today as I was going out on my bike, and he was carrying a roll-- a roll of that magic stuff. And I did the thing which you don't normally do in Britain, which is make conversation with a stranger. And I asked him, where did you get that? And so we looked at each other furtively, and he said, oh, the newsagent down the road, they've got some.

And it turns out that if you go to those little shops that people normally look over in their goal to go and buy the nine-pack from the supermarket, whatever, you can just pop down the little corner shop, and they've got everything you need.

ADAM GLICK: There's your tip of the week right there.

CRAIG BOX: I can't guarantee they'll have it for long, but if anyone's looking for it, I can point you at a shop in my neighborhood.

ADAM GLICK: [LAUGHS] Good enough. My tip of the week for folks is that-- I enjoy a fair amount of electronic music, as I know you do, as well, Craig. And there's a tool out there-- a couple of tools, actually-- that the manufacturers have made free for people just this week. So if you're listening this week and you want to go get them--

CRAIG BOX: Very kind of them.

ADAM GLICK: --you can check it out. One of them is Minimoog Model D, available on iOS, which is a little Minimoog machine. Normally it's a few bucks, and they're offering that for free.

And then there's Kaossilator-- or, on iOS, iKaossilator-- from Korg. And that's normally a $20 app. I actually purchased that app and have used it for a couple of years and really enjoy it. It's a fun little loop creator, but that one's available free.

So if you like just kind of mixing up loops and beats, it's fun and easy to use. It's a touch interface to just kind of-- you move your finger around to make different sounds. I'd check it out.

A fun little note on that is, when we did our initial preview of this podcast, when we were starting it out, the music was actually a little different than the music you hear each week. And that music was created on a Kaossilator.

CRAIG BOX: Wow.

ADAM GLICK: And if you want to go hacking, you can hack and find that one. It is still out there, but you may not find it in the feed. You'll have to do a little bit of haxx0ring.

CRAIG BOX: I enjoy a YouTube channel, which I'll link in the show notes. But basically, it's a person who is using GarageBand-- or "Gar-aj Band," on the iPhone, and who is sequencing songs. So they'll take the drum loops, and the guitar, and so on, and they'll gradually build up to a song like you would have heard in the studio. It's well worth checking out if you like the idea and all this electronic music gadgetry that Adam has shown you, but you'd rather watch someone with a bit of skill do it rather than mess it up yourself.

ADAM GLICK: Shall we get to the news?

CRAIG BOX: Let's get to the news.

[MUSIC PLAYING]

ADAM GLICK: Last week, VMware announced vSphere 7, the first version of their hypervisor that can also manage Kubernetes clusters. The announcement comes as the company appears to be pivoting their cloud-native work all under the Tanzu brand. VMware Tanzu is being promoted as their application modernization platform, with three new products joining three that have been rebranded.

The new Kubernetes Grid, a distribution of Kubernetes, provides a common way to provision clusters across environments. Mission Control was released in preview last August and is VMware's management experience for Kubernetes clusters, while App Catalog, which was previously named Project Galleon, brings the open source container images work of their Bitnami acquisition to customers.

A further three products have been rebranded as part of this announcement-- Pivotal Application Service is now called Tanzu Application Service, Wavefront will now be called Tanzu Observability by Wavefront, and NSX Service Mesh, which is powered by Istio, has become Tanzu Service Mesh.

CRAIG BOX: The new slimmed-down Docker gave us a first look at its intentions last week. VP of Products Justin Graham shared their goal to focus on developer experience through Docker Desktop and add new features to Docker Hub to make it a place to manage all application components. The new Docker will be available as a monthly subscription. Along with these announcements is a new roadmap by way of a GitHub project board, which Graham invites everyone to contribute ideas and votes to.

ADAM GLICK: Many startups these days have a second life after closing. Containership was one of the first companies in the cloud-native space to close, with a blog post in September 2019 announcing their end of operations. This week, Containership's assets have been acquired by storage vendor Hitachi Vantara. The new owners promised to integrate the Containership Kubernetes Engine into their platform and will announce more over the next quarters.

CRAIG BOX: In another second life story, two weeks ago, we brought you the news that the team behind Finnish company Kontena was hired by Mirantis. That acquisition didn't include Kontena's IP, and the original team has regrouped on nights and weekends to keep it alive. The team are now operating as Lakend Labs, after their Finnish home. They have taken Kontena Lens, previously a commercial product, and released the source code on GitHub. Users are still required to accept an end-user license agreement to run the packaged binaries, but the source code is MIT-licensed.

ADAM GLICK: Two new projects have joined the CNCF sandbox. The first is KEDA, or Kubernetes-based Event-Driven Autoscaling, a service to expose metrics to the horizontal pod autoscaler to allow scaling based on events. The second is the Service Mesh Interface, a lowest common denominator API for programming service mesh functionality.

CRAIG BOX: Amazon customers have been looking with envy at purpose-built Linux distributions like CoreOS Container Linux or Google's Container-Optimized OS. Now, the EKS team has joined the party with their own container-focused Linux distribution called Bottlerocket. Like the aforementioned systems, Bottlerocket removes the concept of package management in favor of image-based updates and container application deployments. Bottlerocket has been released as a developer preview on EKS, and the code has been released on GitHub.

ADAM GLICK: In other news from AWS, their proprietary App Mesh now supports using AWS Certificate Manager or customer-provided certificates to provide TLS between services. Amazon also announced that EKS now supports Kubernetes 1.15. We hope there is another update coming soon, as 1.15 will fall out of support later this month.

CRAIG BOX: Two Twitter threads have talked about the pros and cons of modern sandboxing technologies in the Kubernetes context. Micah Hausler from AWS talked about the Firecracker VM. As it virtualizes an entire computer, it doesn't suit the container runtime interface in Kubernetes, and changes to the kubelet would be required to make full use of it. Ian Lewis from Google Cloud replied to contrast the gVisor approach, where you virtualize just an operating system, and how this model handles storage in a more Kubernetes-native fashion.

ADAM GLICK: Kublr, with a K and no E, has released version 1.16 of their cluster management tools. The most notable feature of the release was the addition of rolling updates capabilities, which are designed to ensure updates with zero downtime.

CRAIG BOX: Two security-related changes at Google Cloud this week. GKE supports management of SSL certificates automatically via its Ingress Controller. The TLS provider for these has changed from Let's Encrypt to Google Trust Services, their own internal certificate authority. Next, Workload Identity, the GKE feature which securely maps Kubernetes service accounts to IAM identities, is now generally available.

ADAM GLICK: Are you challenged with how to manage Redis? Would you like to learn from others who have walked the path already? A post from Flant engineer Vasily Marmer is probably destined for your reading list. Vasily talks about the challenges of a number of different Redis management operators and about how he decided on RMA as his management tool of choice. It's a fairly in-depth write-up, so if you're looking at the Redis Enterprise Operator, or other operators from the Amadeus IT Group or Spotahome, this might be a good way to get a user's perspective on each.

CRAIG BOX: Speaking of operators, Bank-Vaults, the HashiCorp Vault operator built by Banzai Cloud, has hit 1.0. This release adds support for hardware security modules, or HSMs. It has been tested with HSM software and physical USB devices, though anything supporting the PKCS #11 standard should work. You can also now purchase support for Bank-Vaults from Banzai Cloud separately from their Pipeline platform.

ADAM GLICK: As they have done for many years, the CNCF have joined Google Summer of Code, with more than 30 projects spread across 15 separate CNCF projects. Applications are being accepted starting on March 16 and need to be submitted by March 31. So if you're interested, you should sign up soon. The CNCF have also announced another Kubernetes case study with LifeMiles, a Latin American loyalty card program provider serving millions of customers.

CRAIG BOX: Finally, Rancher Labs, whose CTO was our guest on episode 57, has announced the closing of a $40 million Series D funding round. The investment was led by Telstra Ventures and brings Rancher's total fundraising to $95 million. The money will go to growing their offering in the fleet management, telecom, and edge computing spaces.

ADAM GLICK: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Xiang Li is a senior staff software engineer and director at Alibaba Group, as well as the founder of and core contributor to the etcd project. Welcome to the show, Xiang.

XIANG LI: Hi, nice to be here.

ADAM GLICK: Today we're looking at etcd, which is a project at the core of Kubernetes and probably how a lot of people use etcd. To frame our discussion, how would you describe what etcd is?

XIANG LI: etcd is a distributed key-value store. You use it to store the most critical metadata for your system and to coordinate the critical components in your system. So the major focuses of etcd are consistency and partition tolerance. etcd also provides a set of high-level features like mini-transactions, watch, and locking. We hope etcd can make it easier for developers to build a reliable distributed system in an unreliable world.

ADAM GLICK: You mentioned a lot of pieces that we'll unpack today as we talk about etcd. It sounds like there's a big focus on consistency. Is etcd something that people use directly within Kubernetes, or is it an underlying technology that helps the whole system run well?

XIANG LI: etcd is the underlying technology. It's used by a number of infrastructure-level systems. Developers typically don't really interact with etcd.

ADAM GLICK: Since etcd sits under the covers and developers don't interact with it, what is actually going on in etcd? What is it doing for Kubernetes?

XIANG LI: Kubernetes stores all its metadata inside etcd, including the pod metadata, the node metadata, and also your deployment status and the ReplicaSet status. Basically, all the metadata inside Kubernetes is in etcd.

ADAM GLICK: So if we were to think of it as, it stores all the information about your cluster rather than about your application, is that a safe way to describe it?

XIANG LI: Correct. So it's cluster state.

ADAM GLICK: If we think about how this whole story starts, you were actually an intern at CoreOS way back when it was a Y Combinator company, correct?

XIANG LI: Yes.

ADAM GLICK: How did you end up with an internship at CoreOS? What made you interested in that particular role, and how did you end up landing that?

XIANG LI: There is an interesting story. I was a student at Carnegie Mellon University, and the tuition of a private school was pretty high. So my strategy was to select as many courses as I could, but then I had very little time to prepare for the actual internship interviews. For example, many people were applying to Google, and Google's campus recruiter asked us to spend time preparing for the interview questions, or the chance of getting into Google would be very low.

So then I thought, OK, I always wanted to build something from scratch. Why not just join a startup? Then I saw the CoreOS job post, and two things really caught my eye. First, it mentioned that their business people can code and can probably code better than you. I wanted to figure out why. But then it turned out to be the other way around. Polvi, the engineer, was the CEO of the company.

ADAM GLICK: [LAUGHS]

XIANG LI: And second, they mentioned that they paid very horribly, and they really did. I had to live in a garage converted into a studio, since I could barely pay the rent in Palo Alto. At that time, I thought few people would actually apply for the position anyway if the pay was really low, and that they would feel guilty asking me to prepare for interview questions.

But the interview was actually interesting. They asked me, what happens when you push the power button of a computer? I talked them through the BIOS, the bootloader, the OS initialization, up to the init system. And then I got an offer a few days later and accepted the offer. That's how I got into CoreOS, the company.

ADAM GLICK: What do you think that they were looking for when they asked you what happens when you turn on the power to a computer?

XIANG LI: I think they wanted to understand whether I knew the basics of computers and how deeply I understood computers in general. And CoreOS is an operating system company, so they wanted me to be able to deal with the details of the kernel.

ADAM GLICK: Clearly, it worked out well, so congratulations. And obviously, CoreOS moved out of Y Combinator and had a successful exit with their acquisition later on. So it all has worked out very well.

XIANG LI: Yeah, thank you.

ADAM GLICK: You talked about starting in a garage, kind of the proverbial Palo Alto garage. There's actually an early blog post from the CoreOS team where they show a picture of an actual garage. Was the company actually running out of a garage in that phase?

XIANG LI: Initially, we actually started in the house, not the garage. But we wanted to mimic the Silicon Valley culture, so we moved into the garage later on.

ADAM GLICK: [LAUGHS] So the garage was the upgrade, moving out of the house?

XIANG LI: Yes.

ADAM GLICK: Nice. When people think about CoreOS, a lot of people will think about the operating system that CoreOS built, and one that is narrowed down, and really looked at moving fast and the ability to run containers. Where does etcd fit into that, if you think about what the company was building and doing?

XIANG LI: CoreOS really focuses on security, and we want to enable updates, so if there is a security issue, people can update their CoreOS operating system accordingly. But updating the core operating system needs some coordination. For example, in your data center, if I update all the operating systems at the same time, all the machines will be rebooted, and your application will lose availability. So we want the auto-updates to be coordinated, and etcd plays a significant role there.

So when you want to restart your machines, the machine needs to grab a lock, and then it starts a reboot process. After it finishes reboot, it releases the lock so that the next machine can do it over again.
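The reboot lock described here can be sketched in a few lines. This is a minimal in-process simulation, assuming a single shared lock stands in for a lock held in a store such as etcd; the names `RebootCoordinator` and `rolling_reboot` are hypothetical and are not part of etcd or CoreOS.

```python
import threading
import time


class RebootCoordinator:
    """Hypothetical stand-in for a cluster-wide reboot lock kept in a shared store."""

    def __init__(self):
        self._reboot_lock = threading.Lock()  # the lock every machine must grab
        self._stats_lock = threading.Lock()   # protects the counters below
        self.rebooting = 0                    # machines rebooting right now
        self.max_rebooting = 0                # highest concurrency ever observed
        self.order = []                       # order in which machines rebooted

    def reboot(self, machine):
        with self._reboot_lock:               # grab the lock before rebooting
            with self._stats_lock:
                self.rebooting += 1
                self.max_rebooting = max(self.max_rebooting, self.rebooting)
                self.order.append(machine)
            time.sleep(0.01)                  # pretend to reboot
            with self._stats_lock:
                self.rebooting -= 1
        # The lock is released here, so the next machine can start its reboot.


def rolling_reboot(machines):
    """Ask every machine to reboot at once; the lock serializes the reboots."""
    coordinator = RebootCoordinator()
    threads = [
        threading.Thread(target=coordinator.reboot, args=(m,)) for m in machines
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return coordinator


coordinator = rolling_reboot(["node-1", "node-2", "node-3", "node-4"])
print("max machines rebooting at once:", coordinator.max_rebooting)
print("machines rebooted:", sorted(coordinator.order))
```

Even though all machines ask to reboot at the same moment, only one is ever down at a time. A real deployment would additionally need the lock to survive process crashes and network partitions, which is the part a consistent store provides and this sketch does not.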

ADAM GLICK: This was really a technology that you built in order to help one of the core features of CoreOS, and that ability for it to not take every machine down within a company at once, but actually move its way through machines to increase reliability? Is that a fair statement?

XIANG LI: Yes.

ADAM GLICK: And so you needed a new database to do that. Is there any reason you didn't use one of the existing database technologies that was already available?

XIANG LI: Oh, yes, there are quite a few key-value storage systems like Redis, MemcacheDB, Riak. Back then, they focused on different scenarios. Some of them focused on end-to-end latency, some of them focused on scale-out abilities. But there are actually not that many key-value stores that focus on consistency.

ZooKeeper and Doozer are two products with similar goals, but ZooKeeper was kind of heavy for us because it relies on the JVM. Our goal was to have a very small operating system that can be updated very easily, so we could not afford to ship the JVM on top of CoreOS. So we had to find something similar and lighter. We looked at Doozer, but the product was not super stable at the time, and the community was inactive at that moment. So we started to think about building a new open source product and a strong community around it.

ADAM GLICK: So you looked at Doozer and ZooKeeper, decided those weren't the right things out there. What about something like MySQL?

XIANG LI: Theoretically, you can use MySQL. But MySQL itself is more focused on the single-node scenario, on the SQL features, and on how to process the data. It doesn't really focus on consistency or availability for a distributed system.

ADAM GLICK: I believe there is also a project and a paper called Chubby. Can you talk to me about the relationship between etcd, Doozer, and Chubby?

XIANG LI: Chubby is a lock service for loosely coupled distributed systems. It was developed at Google internally, and later on, Google published a paper about it. So it got very popular in the distributed systems world and inspired a couple of products. I believe both Doozer and ZooKeeper are somewhat influenced by Chubby, and then these two products influenced etcd.

Also, before I started the etcd project, I did read the Chubby paper and its sibling paper, "Paxos Made Live," which is also from Google. I would say etcd is more similar to the Paxos DB described by the "Paxos Made Live" paper. etcd also adopted some of the testing methodologies from the "Paxos Made Live" paper. And the etcd lock API is inspired by the semantics from the Chubby paper.

I also met Kurt, a Chubby developer at Google, a few times later on. He shared with me some lessons learned about Chubby inside Google, which I think was super helpful. For example, he discussed with me how they scaled out the read throughput of Chubby. So we built a read-only caching proxy for etcd in the open source world.

ADAM GLICK: You mentioned something called Paxos. What is Paxos?

XIANG LI: Paxos is a protocol for state machine replication in an asynchronous environment where crashes can happen. Roughly speaking, I think there are two levels of Paxos. The original Paxos is not that complicated. However, it only deals with one round of agreement. If you think of a key-value store, then you can have only one key if you only use traditional Paxos.

So to get an upgrade, you need Multi-Paxos. It enables multiple rounds of agreement so you can build a key-value store with an unlimited number of keys. However, it is much more complicated. Also, there are many different ways to implement and to optimize Multi-Paxos, since the Paxos paper only describes the idea, but not the implementation.

ADAM GLICK: Paxos is a consensus algorithm that you use to make sure that your data was consistent in the event of a failure?

XIANG LI: Yes.

ADAM GLICK: So you looked at Paxos, but you didn't actually build on Paxos. You actually used Raft. Can you explain to me the difference between the two and why you chose to build with Raft?

XIANG LI: Conceptually, I think Raft is more principled than Multi-Paxos. It clearly defines a set of terminologies and describes how one can implement it correctly. So I would say Raft focuses more on how to actually implement the algorithm, and how to implement it for good performance. Also, Raft really activated the distributed systems world. I would say it made normal engineers feel empowered to work on consensus algorithms rather than being afraid of them. And now I can see many more publicly built distributed systems with Raft than before with Paxos.

ADAM GLICK: You built etcd, and then your internship ended. What happened both to you and to the project at that point?

XIANG LI: We launched etcd in August, right before I left for school, and etcd was on GitHub's trending projects list for a few days and soon reached a thousand stars. I was so excited about the product, and I wanted to keep working on it after I went back to school, so I had to choose fewer courses for the next semesters to maintain the product. And I hoped CoreOS would pay me for that, but they didn't.

And so after a couple of months, CoreOS hired Ben as a contractor, and then we hired Blake, the creator of Doozer, to work on the product with us. They really contributed a lot to the early versions of etcd.

ADAM GLICK: Did that influence what you decided to do once you graduated? You graduated from Carnegie Mellon, and you went back to CoreOS. Did you go back to work on the etcd project? Was that the inspiration?

XIANG LI: Yes, I soon went back to work for CoreOS full time, and I worked on etcd full time for about one or two years.

ADAM GLICK: etcd has gone through a number of evolutions. There was version 0, version 2, version 3. What are the differences between the major versions?

XIANG LI: etcd0 is the initial alpha-quality project I put together very quickly during my internship. It worked, but it had many rough edges. The biggest problem was the Raft implementation. It was not that reliable, and users could run into various consistency issues, especially during reconfiguration and snapshotting. So etcd2 is the production-quality version of etcd0. We don't have etcd1 because the marketing people thought it was a big jump and we had to skip 1.

ADAM GLICK: [LAUGHS]

XIANG LI: The API is similar, but we redid the Raft implementation. Basically, we changed the heart of etcd, and this time, we did not rely on an external Raft implementation, but wrote our own inside the etcd repo. So it is easier for us to control the quality, and we saw huge reliability improvements.

etcd3 focuses more on the API and the efficiency parts. The etcd0 API was done by me in a couple of weeks. Although I learned from ZooKeeper and Doozer, the API was still not very well designed. In etcd3, we adopted gRPC as the main API surface and exposed an HTTP API by using a gRPC HTTP gateway. gRPC allows etcd to have a better-defined API and better performance.

We also added features like mini-transactions, multi-version concurrency control, and reliable watch into etcd3. After migrating Kubernetes to etcd3, we saw huge performance improvements.
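To make the terms concrete, here is a toy sketch of multi-version concurrency control with a compare-then-write mini-transaction, plus the compaction that an MVCC model makes necessary. It is an in-memory illustration under simplifying assumptions (no deletes, no watch, no persistence); `MiniMVCCStore` and its methods are hypothetical names, not etcd's implementation or API.

```python
class MiniMVCCStore:
    """Toy multi-version key-value store; not etcd's implementation or API."""

    def __init__(self):
        self.revision = 0   # store-wide revision, bumped on every write
        self.history = {}   # key -> list of (revision, value), oldest first

    def put(self, key, value):
        """Write a new version of key and return the new store revision."""
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, at_revision=None):
        """Return (mod_revision, value) as of at_revision, or (0, None) if absent."""
        if at_revision is None:
            at_revision = self.revision
        result = (0, None)
        for rev, value in self.history.get(key, []):
            if rev <= at_revision:
                result = (rev, value)
        return result

    def txn_put_if_unchanged(self, key, expected_mod_revision, value):
        """Mini-transaction: write only if key was last modified at the expected revision."""
        mod_revision, _ = self.get(key)
        if mod_revision != expected_mod_revision:
            return False    # someone else wrote first; the caller should retry
        self.put(key, value)
        return True

    def compact(self, revision):
        """Drop versions that were already superseded at or before revision."""
        for key, versions in list(self.history.items()):
            keep_from = 0
            for i, (rev, _) in enumerate(versions):
                if rev <= revision:
                    keep_from = i
            self.history[key] = versions[keep_from:]


store = MiniMVCCStore()
r1 = store.put("leader", "node-1")
r2 = store.put("leader", "node-2")
print(store.get("leader", at_revision=r1))                  # (1, 'node-1')
print(store.txn_put_if_unchanged("leader", r1, "node-3"))   # False, stale revision
print(store.txn_put_if_unchanged("leader", r2, "node-3"))   # True
store.compact(r2)
print(store.get("leader", at_revision=r1))                  # (0, None) after compaction
```

Old versions stay readable until they are compacted, which is why compaction cost grows with write volume.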

ADAM GLICK: I also, if I recall correctly, saw that V3 moved to a flat key space from a tree model. What drove the decision to make that shift?

XIANG LI: There are a few reasons. First, we believe that a flat model is more flexible, and you can build a tree model on top of it. Second, at the time, we wanted to support binary keys, so we had to support a flat model. With a tree model, it's very difficult to support binary keys.
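The point about building a tree on top of a flat model can be illustrated with a sorted key space and prefix range queries. This is a minimal sketch, assuming keys are arbitrary byte strings kept in sorted order; `FlatKeySpace` and `prefix_end` are hypothetical helpers written for this example, not etcd's API.

```python
import bisect


class FlatKeySpace:
    """Toy flat key space: sorted byte-string keys with half-open range reads."""

    def __init__(self):
        self._keys = []     # sorted list of bytes keys
        self._values = {}   # key -> value

    def put(self, key: bytes, value):
        if key not in self._values:
            bisect.insort(self._keys, key)   # keep keys in sorted order
        self._values[key] = value

    def range(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def prefix_end(prefix: bytes) -> bytes:
    """Smallest key that sorts after every key starting with prefix."""
    for i in reversed(range(len(prefix))):
        if prefix[i] < 0xFF:
            return prefix[:i] + bytes([prefix[i] + 1])
    raise ValueError("prefix has no upper bound")


ks = FlatKeySpace()
ks.put(b"/nodes/a", "ready")
ks.put(b"/nodes/b", "not-ready")
ks.put(b"/pods/x", "running")
ks.put(b"\x00\xff\x10", "binary key")   # arbitrary bytes are fine in a flat space

# "Listing a directory" is just a range read over a shared prefix.
print(ks.range(b"/nodes/", prefix_end(b"/nodes/")))
```

The slash-separated names are only a naming convention here; the store itself never parses them, which is what leaves room for binary keys.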

ADAM GLICK: Is there an etcd4 planned, and what change would be necessary to make an etcd4?

XIANG LI: Hopefully not. We bumped the major versions because of compatibility issues. We don't really want to break our users again.

ADAM GLICK: In an interview last week, we talked about gRPC. Did etcd always use gRPC for its API? You mentioned that as part of V3.

XIANG LI: No, from the very beginning, it used HTTP, because there was no gRPC at the time. We actually started to look at gRPC very, very early on. I think it was the gRPC alpha or beta.

And then we started to work with the Google gRPC Go team. We wanted to use gRPC because I think it can provide a much better-defined API surface for very serious distributed systems. And then we worked on a few early features of gRPC together to support the etcd use case. And I think it was a really good decision for us.

ADAM GLICK: How has the growth and usage of etcd changed with the growth of Kubernetes?

XIANG LI: We actually didn't see a sudden or very clear jump. If you look at it from the GitHub stars or issues perspective, etcd has grown almost linearly since the very beginning. But I think Kubernetes did put a lot of challenges and pressures on the etcd side.

We had people ask questions related to Kubernetes in the etcd repo. If you didn't know Kubernetes, you had no idea what they were talking about. But I was one of the first CoreOS employees to look into the Kubernetes project, so we were able to help some of our users out.

And also, as Kubernetes has gotten more and more popular, we started to keep an eye on its progress, and we'd like to make sure etcd works well for it. We actually did the etcd3 implementation for the API server in Kubernetes, and now I actually work more on Kubernetes than etcd at Alibaba.

ADAM GLICK: How did etcd have to change to handle the amount of data that Kubernetes wants to store and the throughput needs that it has when managing clusters?

XIANG LI: At Alibaba, we operate very large-scale Kubernetes clusters, and we have to improve etcd accordingly. There are several major things. So the first one is the performance of the Raft algorithm. The second one is to improve the on-disk storage for etcd. And the third one is to improve the compaction efficiency because of the MVCC model.

After we made those improvements, we can support very large Kubernetes clusters.

ADAM GLICK: Many people think of etcd as simply a component of Kubernetes as opposed to a separate product or project, but indeed, it started separate from Kubernetes. Can you give some examples of how people use etcd outside of Kubernetes?

XIANG LI: Yeah, today, a couple of infrastructure-level open source products still use etcd. For example, Apache Flink, the Rook project from CNCF, and OpenStack. We also have internal systems other than Kubernetes that use etcd at Alibaba.

I think it applies to other companies as well. The etcd Raft implementation is also used by a number of open source projects-- for example, TiKV, a distributed key-value store from CNCF, and CockroachDB, a distributed database system.

ADAM GLICK: Am I right in thinking that etcd was also the first project to have an operator developed for it?

XIANG LI: Yes, at CoreOS, we developed the first few operators, and I'm the author of etcd-operator. And we created the operator concept together.

ADAM GLICK: etcd has had security audits and undergone two Jepsen tests.

XIANG LI: Mm-hmm.

ADAM GLICK: What has the project learned from those audits and tests?

XIANG LI: For the Jepsen tests, the results are pretty positive, and we learned that testing is really, really important. We did a lot of tests for etcd: the unit tests, the integration tests, and the functional tests. For the functional tests, we actually run an etcd cluster 24 hours each week, and we inject failures into etcd itself, and we see how the failures can be recovered from and whether etcd is still consistent after those failures. I think that really helped a lot for the Jepsen tests later on.

Also, we learned that it's very important to have a very good user experience and very good user-facing documentation. The Jepsen tests helped us to figure out some of the areas that are weak and that we need to improve. We didn't really put a lot of time into the security side of etcd, because most of the time, etcd will be used as an internal project supporting the infrastructure level of the system. In the security test, I think they discovered some really interesting bugs, and in the future, we need to improve the security state of etcd accordingly and use state-of-the-art algorithms for encryption.

ADAM GLICK: It started as a project at CoreOS but has now been donated to the CNCF and is a CNCF project. When did you decide to donate it to the CNCF, and what was the motivation for making that decision?

XIANG LI: etcd was developed at CoreOS initially. All the major contributors were from CoreOS. We wanted to keep the goal focused and move really fast from the beginning, so we didn't really try to acquire external developers.

As the product became more and more mature later on, we found it important to keep the product stable and seek opinions from different parties and from different perspectives. So we started to find external maintainers. Because of the usage in Kubernetes, Google and IBM started to put engineering effort into the product. And after Red Hat acquired CoreOS, I also moved to work for Alibaba Cloud.

Another co-maintainer, Gyuho, moved to work for AWS. Then, moving etcd to a neutral place, more specifically donating it to CNCF, became a clear choice for us. And overall, I think it was a wise decision. We got free access to testing infrastructure from CNCF. CNCF helped us with the Jepsen tests and analysis for the correctness of the distributed system. It's not that cheap, but CNCF paid for it. They also helped with the security audit for the product. We really are grateful for the help from CNCF.

ADAM GLICK: You mentioned you're now working at Alibaba. What are you working on these days?

XIANG LI: I'm working on Alibaba's internal cluster management system. So we used to use in-house built systems to support all the Alibaba machines. We have tens of thousands of machines. And my main job was to move those machines to use Kubernetes. And we now use Kubernetes to manage all the Alibaba applications.

ADAM GLICK: We've been chatting for a while about etcd, which is spelled E-T-C-D. Where did that name come from?

XIANG LI: Actually, etc is the directory for storing configuration data on a Linux box. And we wanted to have a system to store configuration data for distributed systems. So we called it etcd. We added the D for "Distributed." etcd is short for "etc Distributed." We thought about other names before the launch, but we could not figure out a better one, so we stuck with etcd.

ADAM GLICK: I heard a rumor that one of the names that people looked at was Alpaca. Is that true?

XIANG LI: Yes, it is true, but I don't like the name particularly because it's similar to the f-word in Chinese.

ADAM GLICK: But in a parallel universe, there's plushy little alpacas that people are giving out at etcd booths?

XIANG LI: Yes.

ADAM GLICK: What's next for etcd? What's on your roadmap?

XIANG LI: I think there are three main efforts for etcd for now and, I believe, for the future. The first is availability and correctness. Although we passed the Jepsen tests and the results are positive, there are still more areas to improve. People are still hitting consistency issues.

For example, Tungsten Cloud just hit a bug when authentication is enabled during cluster updates. And they are fixing it with etcd maintainers. Second, we want to make etcd easy to operate. We are operating thousands of etcd clusters at Alibaba Cloud, and discovery.etcd.io is backed by a real etcd cluster.

Normally, if you get a cluster up and running and never touch it, it'll work just fine. But when something really happens, it's not easy to debug and to roll back to a healthy state. So we are working on improvements to the etcdctl tool for better debugging ability, as well as downgradability to recover your cluster to a healthy state.

Third, I think, is scalability. We know people want large Kubernetes clusters, and etcd was one of the blockers. So we have been solving this problem very actively with Google engineers. We are making progress on improving the internal data storage and the compaction, as I mentioned before. I think the whole system will be much more efficient in the next few releases. We are pretty confident that etcd will do well for Kubernetes clusters of more than 10,000 nodes.

ADAM GLICK: It's been fantastic having you on the show. Thanks for coming, Xiang.

XIANG LI: Thank you for hosting me.

ADAM GLICK: You can find Xiang Li on Twitter @xiangli0227. You can find the etcd project at etcd.io.

[MUSIC PLAYING]

ADAM GLICK: Thanks for listening. As always, if you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes. Until next time, stay safe, and take care.

ADAM GLICK: Catch you next week.

[MUSIC PLAYING]
