#138 February 16, 2021

Multi-Cluster Services, with Jeremy Olmsted-Thompson

Hosts: Craig Box, Tim Hockin

This week we talk multi-cluster services with Jeremy Olmsted-Thompson, co-chair of the Kubernetes Multicluster SIG, and tech lead on the Google Kubernetes Engine platform team. Guest host Tim Hockin shows us the way.


CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box, with my very special guest host, Tim Hockin.


I have with me here one of the creators of Kubernetes, who is still knee-deep in the community, reviewing KEPs, driving new designs. So of course, today, I want to talk to you, Tim, about "Star Wars."

TIM HOCKIN: Excellent.

CRAIG BOX: Is "The Mandalorian" the best thing since "The Empire Strikes Back?"

TIM HOCKIN: It probably is. It really is the truest expression of "Star Wars" that we've seen in quite a long time, I think.

CRAIG BOX: I don't want to go too deep into spoilers, but the first series has a lot less lore and a lot less Jedi in it. And I think it's all the better for it.

TIM HOCKIN: You know, I think the lore around the Jedi is interesting. And while Lucas may suffer in his abilities as a director, he is a great storyteller. And so I appreciate what you're saying, but I didn't mind the Jedi. I actually really liked the idea of this grand republic filled with these noble heroes.

CRAIG BOX: But I like the idea that we're telling a story that's in the same universe, but it's off in a different part, and not everything has to connect back to the core saga of things. That was a really good angle, too-- how can we expand this universe? And I feel some of the things in the second season bring it back to the connection to things like the Jedi a little bit. And I think that was, perhaps, an interesting choice.

TIM HOCKIN: Yeah, absolutely. Just because the Jedi exist and are pretty cool doesn't mean that every single story set in this universe has to be about them. So I really do appreciate this is just an average man trying to make his way through the universe and running into some others. And it's pretty excellently done.

CRAIG BOX: It sounds like they're going to take it to somewhat of an extreme with a few more spinoff TV series. How much "Star Wars" is too much?

TIM HOCKIN: I haven't found it yet. I am a rabid consumer of all the "Star Wars" media. I might have mentioned this in public before, but I love to read the novels, and the comics, and watch the TV shows. And so basically, if it's out there, I'm trying to consume it. I will give Disney props this year for overloading me. And I'm now about three novels behind. But I'm still working on it.

CRAIG BOX: They took all the novels and took them out of canon, didn't they? Did they have a new series of novels in the new universe?

TIM HOCKIN: Oh, yeah, of course. They did. They drew a line at some point, and they said, anything prior to this is not canon. And if you've read them, I'm sorry. They're just legends. But they've got a whole slate of new, wonderful, really great novels that cover backstories for characters like Ahsoka or Captain Phasma.

They build out the universe, too. They add more dimensions to it. They fill in gaps in the timelines. And it's really great. One of the things that's wonderful about "The Mandalorian" is it brings in all these references from these other comics and novels, which if you haven't read them, you won't notice. They won't be out of place. But if you have read them, you'd be like, oh, oh, I know what that is.

CRAIG BOX: Excellent. Now what is the correct order to watch the original six movies? Sorry-- let me rephrase that. There are not an original six movies. There's the first six movies. How dare I call the second three movies original.

TIM HOCKIN: The correct order is, IV, V, II, III, VI.

CRAIG BOX: Interesting.

TIM HOCKIN: Unless you threw in "Rogue One," which you have to put in somewhere in there. But I'm not really sure what the right way to do "Rogue One" is at this point.

CRAIG BOX: I think I agree with that. That is, I think, sometimes called the machete order, where you get rid of Episode I entirely.


CRAIG BOX: I think we don't need too much trade negotiation and Jar Jar Binks really to set the scene.


CRAIG BOX: I remember in preparation for Episode VII coming out, going back and watching the prequels and really wishing I hadn't done that.

TIM HOCKIN: Having kids, I've watched the prequels several times. And episode I is pretty dry. The kids have a hard time getting into it. Except there's Darth Maul, which really is the salvation of the entire film-- and some good space battles.

CRAIG BOX: For all two minutes that he's there.

TIM HOCKIN: Yeah, right. But he's an important character later. So it's important to meet him and to know what drives him. I've watched Episode II I cannot tell you how many times because as a parent of two boys, the fight scenes and the clones are just awesome. They're just fantastic. The kids' imaginations go nuts. And then Episode III, of course, with its epic opening and the space battles, we get a lot of play on those. The kids are a lot less concerned with the quality of the acting and find it much easier to get immersed in the universe.

CRAIG BOX: And are your children conscious of the effort in more recent movies to have more diversity in the cast and especially more lead female characters?

TIM HOCKIN: Yeah, I think at the pre-teen age, those things don't really cross their minds. I will say, I have two boys. And when we are choosing Disney movies, they tend to shy away from the movies that are true love stories or that have a lot of female leads. And we try to bring them back to remember these are great movies too. They didn't have any problem at all with Rey as just an ass-kicking Jedi.

CRAIG BOX: Excellent. Were they shipping for Poe and Finn?

TIM HOCKIN: Poe is a favorite character, for sure, for sure. Finn got done dirty with the whole movies. He could've been such a good character. And they just-- they didn't give him enough to do.

CRAIG BOX: Yes. I know John Boyega has said as much, and it's a bit of a shame.

TIM HOCKIN: Yeah. I like to play as Finn when I play Star Wars Battlefront 2 because he's a pretty awesome character to play.

CRAIG BOX: Well, as much fun as that's been, we should probably get back to talking about Kubernetes at some point. So let's get to the news.


CRAIG BOX: The Istio project has released version 1.9, starting a 2021 push to improve stability and ease of day two operations. Virtual machine integration officially hits beta in this release, as well as the ability to classify metrics based on request or response.

Next week is the first Istio user and developer conference. IstioCon features five days of talks as well as social events and networking opportunities. Registration is free, so sign up now.

TIM HOCKIN: Last week saw the first Chaos Carnival, a conference dedicated to cloud-native chaos engineering. The event was arranged by LitmusChaos and OpenEBS creators MayaData. And the headline news from it was that MayaData is spinning LitmusChaos off into a new company. The new business, called ChaosNative, will allow both companies to focus on their separate communities. LitmusChaos is a CNCF sandbox project.

CRAIG BOX: The Cilium Project has created an online editor which you can use to visually create and verify Kubernetes network policies. You can start with a tutorial and score your policies for their security. It even has an example for “it's always DNS”, because let's face it-- it is. If you want to understand exactly how network policy works, Dominik Tornow has continued his series of systems modeling analysis of Kubernetes features with posts on both Kubernetes networking and network policy specifically.

TIM HOCKIN: Malware can be quite cheeky these days. Some cryptocurrency miners assume that if they can exploit your system, others could too, and look to kill off any other malware containers they find to maximize their chances. A new attack recently seen in the wild by Trend Micro tries to start a privileged container which can be used to gain root access on the host. If it fails, it starts crypto mining while pretending to be NGINX.

CRAIG BOX: Dynatrace has added cloud automation to its software intelligence platform powered by their open source project Keptn, which you can learn about in episode 119. This promises quality checking for applications, remediation of releases that fail on production, and inventory for performance of releases. Support for Kubernetes and cloud logs was also added to the platform.

TIM HOCKIN: Cloud-native application management framework Shipa has released version 1.2. New integrations in this release include full integration with Istio, injection of secrets with Hashicorp Vault, and support for deploying from private registries. Shipa is free for up to five users.

CRAIG BOX: Recent updates to the hosted Kubernetes data services from the clouds-- GKE now defaults new clusters into the release channel system, with the default being their regular channel. Users can opt for faster or slower updates with the rapid or stable channels respectively. Amazon has launched OpenID Connect support for EKS, which lets you use OIDC identities alongside IAM, or replace it completely if you wish. Microsoft's AKS now supports starting and stopping clusters in GA as well as reusing IPs across load balancers.

TIM HOCKIN: VMware's Tanzu Build Service now supports building Windows containers, signed container images, and FIPS compliance. The Tanzu service, now at version 1.1, is based on the open source kpack utility.

CRAIG BOX: Blogger Jeff Geerling has been experimenting with livestreaming on YouTube, producing a new 10-episode Kubernetes 101 series. His blog post explaining the economics is interesting. The series had lots of viewers, with the first episode starting strong, but a large drop-off after. Between ad revenue and sponsorship, he currently only makes one quarter of his consulting rate, but he's proud of the accomplishment. The content now forms the basis of a Kubernetes 101 e-book, which you could buy for $4.99.

TIM HOCKIN: The CNCF are hosting eight events before KubeCon EU. And the call for proposals for all of them closes February 19. If you want to talk about Crossplane, security, Fluentd, Rust, service mesh, edge, AI, or Wasm, get your proposal in soon. If you just want to listen, you can add attendance to one of these days to your KubeCon ticket for $20.

CRAIG BOX: The team at Teleport has been building a cloud-hosted version of the Unified Access Software and wrote a blog post about deploying it to Kubernetes. Moving from on-premises to running this software, it says, required consideration for multi-tenancy, billing, and walking a line between cost and availability. Virag Mody from Teleport has written about this journey, with lessons that are relevant to people running any kind of workloads on Kubernetes.

TIM HOCKIN: Finally, with the upcoming deprecation of pod security policy in Kubernetes, you may be looking for a replacement. Chip Zoller has looked at the two leading candidates: OPA and Gatekeeper by Styra, which recently graduated in the CNCF, and Kyverno, a new CNCF sandbox project contributed by Nirmata. Zoller looks at the pros and cons of each. He perceives Kyverno as being easier but Kubernetes-only, whereas Gatekeeper connects you to the entire ecosystem that OPA supports at the cost of learning its programming model.

CRAIG BOX: And that's the news.


CRAIG BOX: Jeremy Olmsted-Thompson is a tech lead on the GKE platform team and a co-chair of the Kubernetes multi-cluster special interest group. Welcome to the show, Jeremy.

JEREMY OLMSTED-THOMPSON: Thank you for having me.

CRAIG BOX: Where were you when Kubernetes was launched?

JEREMY OLMSTED-THOMPSON: I was working on log analytics at VMware and actually looking into how we can take advantage of containerization to help ease our deployments. We were building a solution and looking into various solutions and then-- boom-- Kubernetes comes out. And hey, look, this thing solves all the problems that we're trying to deal with right now and, more importantly, makes a lot of sense in the way it does it. So that was pretty exciting.

CRAIG BOX: Did it create new problems?

JEREMY OLMSTED-THOMPSON: So we didn't immediately adopt it. I started immediately playing around with it. But it took us a little bit of time to get used to that. We kept going with our plans for a little while. And then I ended up actually leaving VMware to take a little bit of a break from tech and start an agency with a good friend of mine focusing on some nonprofit work for a little while. And that was really fun. Did some mobile apps. But then this opportunity came up at Google to come work on GKE on vSphere, which seemed like a pretty perfect fit. That ended up becoming Anthos GKE on-prem.

CRAIG BOX: What was the process like of building that out? That was one of the first times that Google had produced a product that was to run outside of the constraints of its own environment.

JEREMY OLMSTED-THOMPSON: First of all, it was very cool getting to work so closely with all of these great Kubernetes experts who built it from the ground up and knew everything they were doing. But yeah, it was a new space. So you know, we're building out all kinds of new technology, new practices for working in this new environment but still taking advantage of all the previous learnings that we'd had building GKE on GCP.

CRAIG BOX: Now, someone who's using GKE on-prem is presumably also running it on GCP. So now they'll be in the situation where we're dealing with multiple clusters.


CRAIG BOX: How did you end up becoming the co-chair of the multi-cluster SIG?

JEREMY OLMSTED-THOMPSON: So we started talking to our bigger customers about the next phase of their Kubernetes deployments. A common pattern we saw was at some point, whether it's to control blast radius, manage upgrades, location-- honestly, too many reasons-- customers find that one cluster is no longer enough. And that's because traditionally, in Kubernetes, the cluster is the end of the universe. Kubernetes clusters have a lot of useful information and a lot of capabilities, but they don't know anything about other clusters or, really, themselves from the outside.

A lot of our customers are super capable. And they've built these multi-cluster solutions for tying their clusters together and deploying across clusters. But everyone ends up in a slightly different place that's tailor-made for where you are at the time. And it can be hard to maintain that and grow that. And so we started thinking, hey, everybody has this problem. Maybe we can come up with a common solution.

So I started chatting with community folks and driving some interest around, what if we start with something simple? What if we start with just Services? Everybody needs to connect their Services between clusters. Let's start there. We started driving a lot of conversation and started picking up other things that we could tackle. What are other problems that everybody has that we can address that are clearly defined?

CRAIG BOX: Old-school Kubernetes people may remember with some reverence-- or some scared feelings, perhaps-- the word Federation. There was a thing that was the idea for multi-cluster originally, this tying these clusters together through some sort of federated model. Can you tell me what happened with that idea and how it's progressed through the community so far?

JEREMY OLMSTED-THOMPSON: Federation is still around. It went through a bit of a rewrite to Federation v2. But it's still around and very useful for people who are working on it and maintaining it. But my kind of take is Federation tried to solve all of the multi-cluster problems at once, which is noble and incredibly hard. How do I deploy all my stuff and make it all work? After that, I think, with multi-cluster services, the big difference was not trying to solve all the problems at once-- in fact, trying to specifically solve the narrowest set of problems we possibly could.

CRAIG BOX: So we know what a Service is in Kubernetes - we select a group of pods and direct traffic to it. Now in SIG Multi-Cluster, you're working on a multi-cluster service. What is the multi-cluster service?

JEREMY OLMSTED-THOMPSON: A multi-cluster service, in many ways, is-- or what we've really tried to do-- is make it the multi-cluster equivalent of a Service but behaving in almost exactly the same way that you'd expect within a single cluster.

CRAIG BOX: Well, you picked a good name for it, at least.

JEREMY OLMSTED-THOMPSON: Yeah, not super creative, but easy to understand.

CRAIG BOX: If I have something that runs in a single cluster, I create the Service object there, and it is attached to the API server in the same cluster as the things that it's addressing. Now I have multiple clusters. Where do I create this object?

JEREMY OLMSTED-THOMPSON: This builds on the concept that we've introduced that we've been referring to as namespace sameness. A namespace is clearly part of the identity for a resource in Kubernetes. People are used to that.

And so the statement that we're making builds on an open-source term that we've come up with here in the community: ClusterSet. A ClusterSet is some group of clusters that work together. That's basically the amount of definition we have. But within that, a namespace should have a consistent meaning. We're trying to get more parts of Kubernetes to pick this up. But the idea is that if a Service exists in one namespace in one cluster, and a Service with the same name exists in the same namespace in another cluster, those should be the same service.

And so if we can take that for granted, then a multi-cluster service is actually just a service that you create a ServiceExport resource for with the same name. And then that becomes part of that multi-cluster service. And so if you have this same service in multiple clusters, we kinda merge those together. And consumers get a Service name. Just like they get that cluster.local service name within a cluster, they get a clusterset.local service name in multi-cluster. But behind that, you basically just get a Service.
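As a rough illustration of how little an export contains, here is a minimal Python sketch that builds the manifest and the name a consumer would look up. The API group and version (`multicluster.x-k8s.io/v1alpha1`) and the `svc` label in the DNS name follow the upstream Multi-Cluster Services API proposal as best understood here, and are not stated in the conversation. The service and namespace names are made up.

```python
import json


def service_export(name: str, namespace: str) -> dict:
    """Build a ServiceExport manifest for an existing Service.

    The export carries no spec: it is matched to the Service purely by
    having the same name in the same namespace.
    """
    return {
        # Assumed group/version, taken from the upstream MCS API proposal.
        "apiVersion": "multicluster.x-k8s.io/v1alpha1",
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }


def clusterset_dns_name(name: str, namespace: str) -> str:
    """Name consumers resolve across the ClusterSet (assumed format)."""
    return f"{name}.{namespace}.svc.clusterset.local"


if __name__ == "__main__":
    # "checkout" and "shop" are hypothetical names used for illustration.
    print(json.dumps(service_export("checkout", "shop"), indent=2))
    print(clusterset_dns_name("checkout", "shop"))
```

The printed JSON could be applied to a cluster with any tool that accepts JSON manifests. The point of the sketch is only that the export is identified by name and namespace and nothing else.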

CRAIG BOX: Inside a single cluster, you have Endpoints objects, and then Service objects that refer to them. What are the types of API objects that you've had to create to implement this multi-cluster service system?

JEREMY OLMSTED-THOMPSON: We tried to break this out into two parts. There's the export and the import. Just as our API name isn't super creative, neither are these resources. We have the ServiceExport, which is what you create to export a Service. And that is name-mapped to the Service in the namespace that you want to make available. And we've kept this as simple as possible. So that is literally just a name. You create this resource, and you export it.

On the import side, most consumers won't really interact with this. But we have a ServiceImport that is created. And the ServiceImport mirrors a Service and is backed by Endpoints in that importing cluster. Now, really, the normal way to consume this service is just with that service name, which gets created in that importing cluster at the same time as ServiceImport. So you get a familiar-looking service.namespace.clusterset.local name for that service instead of cluster.local.

CRAIG BOX: Why do I need the ServiceExport object? Why can't the ServiceImport just refer to it by its name, like you mentioned before?

JEREMY OLMSTED-THOMPSON: We wanted to separate the concepts of exporting and importing. You can actually control the export. So with the ServiceImport-- first of all, it can be difficult to understand, is this cluster actually part of the clusters backing the service? Because not every service is going to be deployed in every cluster. Or is this actually importing? And so someone who just wants to consume can just only care about that ServiceImport. And then the decision to create that export is left up to the service owner.

CRAIG BOX: There are a lot of service mesh products that are addressing multi-cluster connectivity. Do you hope that they will build their implementations on top of this common API?

JEREMY OLMSTED-THOMPSON: That's exactly right. In fact, we've been working with a few of these other implementers to try and create some unity and consistency so that you get that same portable Kubernetes experience that you expect regardless of which implementation you go with. And we've started seeing some pretty great adoption already.

CRAIG BOX: It's reasonably easy to point a load balancer, be it a hardware device or a cloud load balancer, at the cluster and at the Service and address all of the backends. How am I handling this in a multi-cluster environment? How do I address this multi-cluster service object I've created?

JEREMY OLMSTED-THOMPSON: Within a cluster, you basically get a ClusterIP. And we call this the ClusterSetIP. So from outside of a cluster, we want to keep it as consistent as possible. There's the Gateway APIs project going on in SIG Network. So we've talked to them about making sure that just like a gateway can point at a single service within a cluster, it can point at a multi-cluster service across clusters. And so you get that kind of same experience building on itself.

CRAIG BOX: So in one or more of the clusters where these services are hosted, do I have a multi-cluster service object that has the list of cluster endpoints, in the same way that I'd have a list of pod Endpoints within a single Service?

JEREMY OLMSTED-THOMPSON: That's exactly right. When you create a ServiceExport, what happens is the other clusters in your ClusterSet get a ServiceImport resource that is backed by endpoints, just like you would see with a Service within a single cluster.
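The export-to-import flow described here can be modeled in a few lines of Python. This is a toy model under stated assumptions, not how any real implementation works: the `Cluster` class, `derive_imports`, and `backing_clusters` are hypothetical names, and the endpoints are plain lists of pod IPs. It shows namespace sameness doing the merging: exports with the same namespace and name in different clusters collapse into one imported service.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for one member cluster of a ClusterSet."""
    name: str
    # (namespace, service name) -> pod IPs backing that Service locally
    endpoints: dict = field(default_factory=dict)
    # (namespace, service name) pairs that have a ServiceExport
    exported: set = field(default_factory=set)


def derive_imports(clusters):
    """Merge exported Services by (namespace, name) across all clusters.

    Returns the view every member cluster would import:
    (namespace, name) -> sorted list of backing pod IPs.
    """
    merged = {}
    for cluster in clusters:
        for key in cluster.exported:
            # An export only contributes backends where the Service exists.
            backends = cluster.endpoints.get(key, [])
            merged.setdefault(key, []).extend(backends)
    return {key: sorted(ips) for key, ips in merged.items()}


def backing_clusters(clusters, key):
    """Names of the clusters that actually back the exported Service."""
    return sorted(
        c.name for c in clusters
        if key in c.exported and c.endpoints.get(key)
    )
```

In this model a cluster with no local copy of the service still sees the merged import, which matches the idea that a consumer only needs to care about the import, while the decision to export stays with the service owner.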

CRAIG BOX: Is this something that I get automatically propagated between all of the clusters? Or do I have to have some external thing create this in each of the clusters? Or does that even matter?

JEREMY OLMSTED-THOMPSON: This is the fun thing. We wanted to build some consistency here and bring to the community the same kind of portability in multi-cluster that they're used to within a single cluster. So we've defined this API and got a lot of people on board. So there's a few different implementations right now working on this API, everywhere from various custom implementations for a single environment to service meshes.

And actually, Google is just releasing our own GKE multi-cluster services implementation for GKE backed by Google's Traffic Director technology. And so with any of these implementations, you have your clusters. You create your ServiceExport. And that ServiceImport is automatically created in all of your member clusters.

CRAIG BOX: You've mentioned that GKE has an implementation of this multi-cluster service API, which is launched this week. So first of all, congratulations.

JEREMY OLMSTED-THOMPSON: Thank you. We're very excited about it.

CRAIG BOX: How am I interacting with this? If I've created these Services, and let's say, I've got clusters in multiple locations. I'm deploying the same thing. I've got these Services with the same names. How am I actually going about creating the exports and imports required and connecting this up to my global load balancing?

JEREMY OLMSTED-THOMPSON: This is something we're really proud of. We tried to make it as simple as possible. You just take your clusters, you register them with our GKE hub-- this is our registry-- and enable our feature. Once our feature is enabled, all of those clusters become linked. And then you simply create a ServiceExport with the same name as the Service that you want to export.

And that Service is automatically propagated to the other clusters that are linked to your environment. And you can start consuming it, just like a cluster.local Service in your cluster. Only this time, these Services span multiple clusters. And the endpoint propagation, health checks, and whatnot are handled by Google's network control plane.

CRAIG BOX: Is this something that applies just to connectivity within the clusters? So if I'm doing a DNS lookup for that clusterset.local name from within one of these clusters, I'm connected to multiple things? Or am I actually able to export this and publish it?

JEREMY OLMSTED-THOMPSON: We've kept this entirely within a cluster right now. So every cluster gets its own unique picture that should be eventually consistent, hopefully, relatively quickly. We didn't want you to take any kind of central dependencies. So as much as possible, each cluster handles its own import to help make it a more consistent, available service from the perspective of each consumer.

CRAIG BOX: But in terms of how I might expose a service in a single cluster through an Ingress, we now have a multi-cluster service. Can I expose that to the internet in a similar fashion?

JEREMY OLMSTED-THOMPSON: Multi-cluster services mostly targets that cross-cluster, trusted east-west traffic. But it complements very nicely our other products in the space, like multi-cluster Ingress, which lets you expose that multi-cluster service to an external consumer as a single service, just like you would the multi-cluster service within a cluster.

CRAIG BOX: In our example where we have multiple clusters in multiple locations, all of which are running a replica of the service, if I'm calling into this multi-cluster service, am I going to favor my local copy of the service? Or is that something that I might want to or not want to in certain cases?

JEREMY OLMSTED-THOMPSON: Multi-cluster services, to be as consistent with the existing in-cluster Service experience, really builds on the same data plane, which means building on kube-proxy. And kube-proxy doesn't have intelligence around weighting and whatnot. So you get even traffic spreading. If you want to favor a location more than others, then creating more pods in that location is the answer. The idea is that if you don't want traffic to always go to all the back ends, then that may not be a fungible service. Maybe you're really describing an east service and a west service.
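The arithmetic behind "create more pods in that location" is simple enough to sketch. Assuming traffic is spread evenly over every backing pod regardless of cluster, each cluster's expected share is just its fraction of the total pod count. The function name and the cluster names below are hypothetical.

```python
def traffic_share(pods_per_cluster: dict) -> dict:
    """Expected fraction of requests landing in each cluster.

    Assumes every pod behind the multi-cluster service is equally likely
    to be picked, with no locality preference or weighting.
    """
    total = sum(pods_per_cluster.values())
    if total == 0:
        raise ValueError("the service has no backing pods")
    return {cluster: pods / total for cluster, pods in pods_per_cluster.items()}


if __name__ == "__main__":
    # Three pods in "east" and one in "west": east takes three quarters.
    print(traffic_share({"east": 3, "west": 1}))
```

Under that assumption, a caller in the west cluster still sends three of every four requests east, which is why a service that should stay local is better modeled as separate east and west services.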

CRAIG BOX: How do you think the availability of multi-cluster service is going to improve availability of people's services?

JEREMY OLMSTED-THOMPSON: My hope is that people can take advantage of how easy it will now be to expand a service across clusters to really add that extra level of redundancy. It can simplify a lot of operations, too, when you think about truly safe upgrades, or when you want to change some cluster-scoped resources. If you have a service spanning multiple clusters, you're no longer forced to do it all at once. You can pick a cluster, upgrade it, see how it works, and continue from there. If you wanted to do a blue-green upgrade, spin up a new cluster, migrate your workloads, tear down the old one, now that's much easier than it was before.

CRAIG BOX: The service having gone generally available on GKE, I'm interested in how your API design progresses through the community. This is implemented as CRDs, which is fantastic in terms of being able to decouple the deployment from Kubernetes releases and move quickly. How do you see multi-cluster services moving through your SIG?

JEREMY OLMSTED-THOMPSON: It's going very well. A bunch of other implementers are adopting it right now. We are progressing through the graduation beta to GA in open source with tons of support and great feedback from all kinds of different implementations serving different audiences. Yeah, it's going really well. And I think starting with our own CRDs really helped us explore and understand the problem space.

CRAIG BOX: As you mentioned up front, this was a single thing that you could do. You could say, we'd take a Service, make it multi-cluster. Do we now start thinking about how we do multi-cluster deployments? What's the next step after this?

JEREMY OLMSTED-THOMPSON: There's some discussion going on, really, around that right now. I think there's people looking into what the next type of deployer could look like.

And another immediate problem we're tackling right now is creating some form of consistent cluster ID, some kind of consistent cluster identification to solve the problem that we talked about at the start. A cluster doesn't know anything about itself from the outside world.

CRAIG BOX: When you have deployment systems like the Cluster API, for example, that by way of provisioning clusters have some idea of them, do you think there is a space for this? Or do we start getting into a turtles all the way down problem?

JEREMY OLMSTED-THOMPSON: I think there's definitely a space. And actually, the current ID concept that's circulating has been brought up with the cluster API folks as something that would be very useful. From my perspective, you know, cluster API is about creating clusters and provisioning those clusters. And cluster ID is about identifying those clusters, and it doesn't necessarily have to care how the clusters got there.

CRAIG BOX: So which direction would you like to see multi-cluster evolve in next?

JEREMY OLMSTED-THOMPSON: That's a tough question. That's a little up in the air, and it will be driven a lot by what people do with multi-cluster services. In a lot of ways, we're opening some new doors. And we're going to see some new needs develop. As soon as we make it so easy to deploy across clusters, I think we're immediately going to start seeing new requests.

CRAIG BOX: Well, I look forward to seeing what you come up with next. And thank you very much for joining us today.

JEREMY OLMSTED-THOMPSON: Thank you for having me.

CRAIG BOX: You can find Jeremy on Twitter and GitHub as JeremyOT.


CRAIG BOX: Tim, thank you very much for helping us out with the show today.

TIM HOCKIN: No problem, Craig. It was fun.

CRAIG BOX: If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod or reach us by email at kubernetespodcast@google.com.

TIM HOCKIN: You can also check out the website kubernetespodcast.com, where you will find transcripts and show notes as well as links to subscribe.

CRAIG BOX: I'll be back with another guest host next week. So until then, thanks for listening.

TIM HOCKIN: Live long and prosper.