#32 December 4, 2018

MetalLB, with David Anderson

Hosts: Craig Box, Adam Glick

If you’re running on-prem, and you say set up a Service type=LoadBalancer, what happens? Does your cluster call your NOC and have them order you a Juniper router? MetalLB is a popular answer to that question. Your hosts discuss load balancing with MetalLB’s author, Google Cloud SRE David Anderson.


News of the week

CRAIG BOX: Hi and welcome to the "Kubernetes Podcast from Google." I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.


CRAIG BOX: Has the Christmas music started in Seattle?

ADAM GLICK: Oh, oh yes, it has. My personal favorite yesterday as we were out shopping for Hanukkah candles, and I'm in the Hanukkah section, and the guy looks at me, and he goes, Merry Christmas.

CRAIG BOX: What about your famous Hanukkah songs?

ADAM GLICK: There is the Adam Sandler.


ADAM GLICK: You don't recall this?

CRAIG BOX: I do remember that. It took me a while. But--

ADAM GLICK: Yeah. That's probably the best I can give you. There's no Mariah Carey one for us. So.

Are you preparing for the glorious days of Christmas?

CRAIG BOX: We do make a point of going out and getting a Christmas tree every year. So sometime this week we'll be wandering down to the Christmas tree shop.

One of the things I've always liked about New York at Christmastime-- I've been there in December a couple of times-- and there's just like the florists on every block basically have a wall of Christmas trees to select from. I'm going to have to walk a little bit further than that. And last year we actually picked out a Christmas tree and then brought it home in a cab, which is a thing you can do in London.

ADAM GLICK: In the cab?

CRAIG BOX: Yeah. I have-- there's a sort of homeware shop, like a Home Depot thing, called Homebase here in the UK. It's probably about 15 minutes' walk away. I have actually walked a Christmas tree home from there in the past.

ADAM GLICK: We started looking at Christmas trees. And we got our first Christmas tree ornament yesterday as well. So slowly building up until our home turns into a winter wonderland.

CRAIG BOX: Brilliant.

ADAM GLICK: Let's get to the news.


CRAIG BOX: Kubernetes 1.13 has just been released. Highlights include the graduation of the kubeadm bootstrap tool to GA, alongside support for the container storage interface, also in GA. A new feature in beta is a dry-run mode in the API server. DryRun lets you test a request, including admission control and server side validation, without committing it to the system, basically doing everything but saving the object. This powers a new kubectl diff command and allows you to validate exactly what will change when you submit an object to the API server.
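To make the "everything but saving the object" idea concrete, here is a minimal Python sketch of a dry-run request and a diff built on top of it. It is a toy model, not the Kubernetes API server: the `ToyApiServer` class, its `apply` and `diff` methods, and the `replicas` defaulting rule are all hypothetical names invented for illustration.

```python
import copy


class ToyApiServer:
    """Toy stand-in for an API server (hypothetical, not Kubernetes code)."""

    def __init__(self):
        self.store = {}  # name -> persisted object

    def _admit(self, obj):
        # Hypothetical validation and defaulting, standing in for
        # admission control and server-side validation.
        if "name" not in obj:
            raise ValueError("object needs a name")
        result = copy.deepcopy(obj)
        result.setdefault("replicas", 1)
        return result

    def apply(self, obj, dry_run=False):
        # Run the full request path, but skip persistence on a dry run.
        result = self._admit(obj)
        if not dry_run:
            self.store[result["name"]] = result
        return result

    def diff(self, obj):
        """Return {field: (current, proposed)} for fields that would change."""
        proposed = self.apply(obj, dry_run=True)
        current = self.store.get(proposed["name"], {})
        keys = sorted(set(current) | set(proposed))
        return {
            k: (current.get(k), proposed.get(k))
            for k in keys
            if current.get(k) != proposed.get(k)
        }


server = ToyApiServer()
server.apply({"name": "web", "image": "nginx:1.14"})
changes = server.diff({"name": "web", "image": "nginx:1.15"})
```

In this sketch `changes` reports only the `image` field, and the stored object is left untouched, because the diff goes through the same admission path as a real request and stops just before the write.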

Other noteworthy changes include support for plugins in kubectl, general availability of the GCE regional persistent disk, and removal of support for etcd version 2 now that version 3 is the default. Look out for that one if you're upgrading.

If you're running a previous version of Kubernetes, be aware a critical vulnerability in the handling of API server requests has been published. A specially-crafted request could be used to gain privileges on an aggregated API server or the kubelet on a node. It is patched in the latest versions of 1.10 through 1.12, as well as the 1.13 release.

ADAM GLICK: Indeed's Hiring Lab says Kubernetes is the most popular skill in tech. Kubernetes had the fastest growth in job searches, rising 173% from a year before. Golang, the language Kubernetes is written in, was the top programming language with 81% year-on-year growth in searches.

CRAIG BOX: Congratulations to the Envoy community, graduating to a top level project in the CNCF. Envoy, a modern proxy server, has rapidly risen to a leadership position in Cloud Native as a universal data plane powering load balancing and service mesh technologies. By graduating to top level, Envoy has demonstrated it has a healthy community of committers from diverse vendors, including Lyft, Google, and Apple, as well as a well-defined governance model. Make sure you catch next week's episode where we speak to Matt Klein, the original author of Envoy.

ADAM GLICK: Last week was Amazon Web Services' annual re:Invent conference in Las Vegas, and they launched a number of new services. Their first announcement in the container space was Firecracker, an open-source virtualization technology for serverless workloads. Firecracker is a fork of Google's Chrome OS Virtual Machine Monitor and influenced by Google's NoVM research project, as well as Intel's Clear Containers.

Compared to QEMU, which aims to emulate all devices an x86 computer might have, Firecracker emulates a small handful of devices, promising a safer, faster, and lower resource usage model VM. No BIOS, no PCI, just the minimum you could conceivably need to host a network service. Amazon says Firecracker was built to power Lambda and Fargate workloads.

Firecracker has experimental integration with containerd, and the Kata Containers project is looking at ways that it can integrate Firecracker as an alternative back end. If you want to experiment with Firecracker, Alex Glikson from Carnegie Mellon University has written up how to get it working with nested virtualization on Google Compute Engine VMs. And you can find that link in the show notes.

CRAIG BOX: Next up was AWS App Mesh, otherwise known as AWS discount Istio. App Mesh brings a hosted service mesh built on the Envoy proxy for services running on Amazon ECS, EKS, and Kubernetes on EC2. At launch it has basic support for a subset of the traffic management functions provided by Envoy. Support for the rest of those functions and observability features are due before the GA of the service some time in 2019. And mutual TLS security is on the roadmap only for after GA.

App Mesh is conceptually similar to Istio in that it is a control plane for the Envoy proxy. Unfortunately, that's about where the comparison ends. Istio is fully open source and supports people running in all Kubernetes environments, cloud and on-prem, as well as on VMs and other environments such as Consul and Nomad.

Istio has the support from multiple major vendors, including Google, IBM, VMware, and Red Hat, and an API which has been evolved due to the experience of many. Multi-cluster support is already available. I personally consider it a shame that AWS did not join the Istio community and find a way to use the Istio APIs, even if they chose to implement them on their own multi-tenant control plane.

ADAM GLICK: In related news, Google is rolling out Istio on GKE in beta this week, where you can click Enable Istio in your GKE cluster to get the full experience, including traffic management, observability, and security.

CRAIG BOX: Amazon staffers also took to Twitter to remind people they still care about EKS, with launches such as in-place upgrades announced on slides, but with no detail pages or documentation at the time of recording.

ADAM GLICK: Agones, the open source game server built on Kubernetes, released version 0.6.0 this week. This release provides a number of improvements around scaling and the use of cluster autoscaler, as well as features around node control and health checks. If you want to learn more about Agones, check out episode 26 with its founders, Mark Mandel and Cyril Tovena.

CRAIG BOX: And that's the news.


ADAM GLICK: David Anderson is a site reliability engineer with Google Cloud, focusing on Kubernetes Engine, and the author of MetalLB, a network load-balancing implementation for Kubernetes using standard routing protocols. Welcome to the show, Dave.

DAVID ANDERSON: Thanks for having me on.

CRAIG BOX: I see that you've just ticked over 10 years at Google. Congratulations.

DAVID ANDERSON: Yeah. Thank you. It's been a long while.

CRAIG BOX: Have you been in SRE the whole time you've been here?

DAVID ANDERSON: No, I've actually bounced between regular software engineering roles and site reliability engineering multiple times. But I keep coming back to SRE. I have the itch.

CRAIG BOX: What project have you worked on in that time?

DAVID ANDERSON: Pretty much you name it, I've done it. I started in web search infrastructure, then did a stint in security, then cluster infrastructure-- managing tens of thousands of machines, then networking, and now all the way up in the cloud.

CRAIG BOX: What was the impetus to move to the cloud team?

DAVID ANDERSON: It was actually just-- I was just moving to Seattle, and the Google Seattle office has a huge cloud presence. And it seemed like a fun place to go. It seemed to be where everyone was going.

ADAM GLICK: We mentioned that you created MetalLB. Can you explain for folks what MetalLB is?

DAVID ANDERSON: Sure. So as you said earlier, MetalLB is a load balancer implementation for bare metal clusters. So what that means is if you've used Kubernetes, you know that there is a load balancer object. In bare metal clusters, those objects do not work out of the box. They do nothing. And so you can't really-- there are no good options to get traffic into your Kubernetes cluster.

So MetalLB solves that and tries to give people an easy way to get services exposed outside of their Kubernetes cluster.

CRAIG BOX: It feels like a strange itch for someone working on cloud to scratch. What exactly was it that brought you to needing this project?

DAVID ANDERSON: Right. Yeah. I may work on cloud, but I majored in embedded systems-- and this was 10-plus years ago now-- and so I love touching hardware. And I sort of miss that in the cloud.

If you talk to people in the industry, they'll tell you there's sort of two kinds of people. There are people who want to use the cloud, and people who want to build the cloud. And I'm definitely in the latter half of the community.

So I have a bunch of machines at home, and I run a Kubernetes cluster at home. And so that's how the itch developed is I just set up a Kubernetes cluster and looked at how to get it working, and discovered that there were huge swaths of Kubernetes that don't work out of the box if you're not on a cloud provider.

ADAM GLICK: Was this just because Kubernetes grew up on the cloud? Was this something that the original authors didn't really contemplate?

DAVID ANDERSON: Not so much didn't contemplate. Kubernetes by design wants to be extremely flexible. And so they targeted the things that are easier to get started with. Right?

If you have a cloud provider, you know the environment you're running in. So it's very easy to plug in all those pieces.

What they did do-- and I thank them for the foresight-- is they at least had very clean extension points for all these things. So they tell you these pieces do not work out of the box. But here is a very clean place for you to plug in your own thing.

CRAIG BOX: What is it that you had to implement in order to be able to provide this functionality in Kubernetes?

DAVID ANDERSON: Like most things in Kubernetes, MetalLB takes the form of a controller, which looks in your Kubernetes cluster for service objects that are lacking an IP address and external connectivity, and tries to provision them for you. So what that looks like concretely is you run a little deployment in your cluster. And you configure it and tell it what IP addresses it can use to make services available outside the cluster.

And then it just-- like the rest of Kubernetes, just does its thing in the background. And you basically never have to interact with it ever again. Once you've installed it, your cluster just starts working exactly like a cloud provider would where if you create a service, it just starts working.
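The controller pattern described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not MetalLB's code: services are plain dictionaries, and the `external_ip` field, the `reconcile` and `used_addresses` helpers, and the three-address pool are all hypothetical.

```python
import ipaddress


def used_addresses(services):
    """Collect the addresses already recorded on service objects."""
    return {s["external_ip"] for s in services if s.get("external_ip")}


def reconcile(services, pool):
    """Give each LoadBalancer service lacking an address a free one from pool."""
    used = used_addresses(services)
    free = [str(ip) for ip in pool if str(ip) not in used]
    for svc in services:
        if svc["type"] != "LoadBalancer" or svc.get("external_ip"):
            continue  # not our concern, or already provisioned
        if not free:
            break  # pool exhausted; remaining services stay pending
        svc["external_ip"] = free.pop(0)
    return services


# Hypothetical pool of three addresses the operator has set aside.
pool = [ipaddress.ip_address("192.168.1.240") + i for i in range(3)]

services = [
    {"name": "web", "type": "LoadBalancer", "external_ip": "192.168.1.240"},
    {"name": "db", "type": "ClusterIP"},
    {"name": "api", "type": "LoadBalancer"},
]
reconcile(services, pool)
```

Note that the sketch keeps no state of its own: the set of used addresses is recomputed from the service objects on every pass, so running `reconcile` a second time changes nothing.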

CRAIG BOX: Let's have a look then at a couple of sample configurations where you might use this. If I've got a router on my home network that's just doing NAT from a single IP address, and I've got a rack of Raspberry Pis. I've got Kubernetes installed on those. I'd like to get some traffic into there. How can MetalLB help me do that?

DAVID ANDERSON: Yeah. So the case of Raspberry Pis on your home network is exactly why I wrote MetalLB in the first place. So in MetalLB's configuration you would use what it calls Layer 2 mode, which is the simplest possible configuration. The only configuration you have to give it is what IP addresses it can use such that it doesn't conflict with anything else on your network. And then you just start creating services. And it should-- fingers crossed-- make them available to your home network.

Now the catch is, unlike a cloud provider, I can't give you public IP addresses, because I don't have any of those to give out.


DAVID ANDERSON: So it will only be available on your home network unless you take additional steps beyond MetalLB.
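Since the one thing you have to get right in this mode is a range of addresses that nothing else on the network uses, a quick sanity check can help. The Python sketch below is an assumption-laden illustration: the "start-end" range string, the `expand_range` and `conflicts` helpers, and the sample lease list are all made up for the example, and it says nothing about how MetalLB itself parses its configuration.

```python
import ipaddress


def expand_range(text):
    """Expand a 'start-end' string into a list of IP address objects."""
    start_s, end_s = text.split("-")
    start = ipaddress.ip_address(start_s.strip())
    end = ipaddress.ip_address(end_s.strip())
    if end < start:
        raise ValueError("range end precedes start")
    return [start + i for i in range(int(end) - int(start) + 1)]


def conflicts(pool, in_use):
    """Return the pool addresses that something else already uses."""
    taken = {ipaddress.ip_address(a) for a in in_use}
    return sorted(ip for ip in pool if ip in taken)


# Hypothetical home network: a candidate pool and the router's known leases.
pool = expand_range("192.168.1.240-192.168.1.250")
dhcp_leases = ["192.168.1.10", "192.168.1.245"]
clashes = conflicts(pool, dhcp_leases)
```

Here `clashes` flags 192.168.1.245, which suggests either shrinking the pool or moving the router's DHCP range before handing the addresses over.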

CRAIG BOX: And if I had a rack of Compaq ProLiant servers and Cisco 2600 routers back from 1997, what would I be doing in that situation?

DAVID ANDERSON: Right. So that's where MetalLB's other operating mode comes in, which is BGP mode. So I don't know if everyone knows what BGP is. But it's the standard routing protocol that knits the internet together.

So anytime you would visit a data center, there is very likely BGP running through and through.

ADAM GLICK: It's Border Gateway Protocol?

DAVID ANDERSON: That is correct. Yes. So in that mode you need to have a router that can speak BGP and configure MetalLB to connect to that router.

The huge benefit you get in that configuration is unlike Layer 2 mode, you actually have load balancing. In the Layer 2 mode, due to the limitations of the protocols MetalLB uses, you cannot have load balancing. You just get failover. So a single machine is serving your service at any one time.

In BGP mode, if you have 100 machines, all 100 of them can be providing your service. So that is what I would suggest for anyone with serious enough hardware. And to be clear, BGP is available on relatively small-- sort of small business type systems as well. You don't have to go all the way up to the enterprise data center gear to get that.
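The difference between failover and load balancing can be illustrated with a small Python simulation. Everything here is a simplification for illustration: picking the first node alphabetically stands in for whatever election Layer 2 mode really uses, hashing a flow tuple with SHA-256 stands in for a router's vendor-specific multipath hashing, and the function names, node names, and flows are hypothetical.

```python
import hashlib


def layer2_owner(healthy_nodes):
    """Failover: exactly one node answers for the service IP at a time."""
    return sorted(healthy_nodes)[0] if healthy_nodes else None


def ecmp_next_hop(flow, healthy_nodes):
    """Hash a flow tuple onto one of the nodes advertising the route."""
    nodes = sorted(healthy_nodes)
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = {"node-a", "node-b", "node-c"}

# 200 made-up client flows: (source IP, source port, dest IP, dest port, proto).
flows = [
    ("10.0.0.%d" % i, 40000 + i, "192.168.1.240", 80, "tcp")
    for i in range(200)
]

l2_targets = {layer2_owner(nodes) for _ in flows}
bgp_targets = {ecmp_next_hop(f, nodes) for f in flows}
```

In the simulation every flow lands on a single node in the failover case, and traffic moves to another node only when that one drops out, while the hashed case spreads the same flows across all three nodes.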

CRAIG BOX: Is there a place for interoperability with something like Calico, where we have-- it uses BGP between the nodes on the network to exchange what the internal IP addresses are, like the pod address space that would be available to each node?

DAVID ANDERSON: Yeah, absolutely. So interoperability can be a little bit exciting, because this is sort of the flip side of-- it's great that Kubernetes is so flexible. On the other hand, Kubernetes is so flexible that it's very hard to predict what kind of environment you're going to find yourself in and how to work nicely within that environment.

So for things like Calico, which use BGP, the MetalLB website has a bunch of documentation on how you have to be a little bit careful integrating the two so that they play nice together. But in general they can interoperate gracefully with each other. And we're working with the Calico folks in particular to make that integration even smoother.

ADAM GLICK: In terms of configuration and installation, where is this installed? Does it sit in front of the cluster, kind of as a proxy in front of it? Or does it go on each node?

DAVID ANDERSON: MetalLB runs entirely inside the cluster. So it's a completely self-hosted solution. You don't need to bring any other things, aside from a BGP-enabled router if you're going down that route.

So it runs one component that is cluster-wide. So it runs on every single node in your cluster. And then there is one global controller that sort of supervises them and makes address allocation decisions.

But it runs entirely within the cluster, similar to how Heptio Contour runs inside your cluster to provide ingress services. MetalLB is very similar, but for load balancers.

ADAM GLICK: Does it require any CRDs?

DAVID ANDERSON: It does not currently. The primary reason for that is that when I started developing MetalLB, CRDs were not generally available in Kubernetes yet. So it has a very simple YAML configuration file that you push into the cluster. There is a bug open somewhere to convert that into CRDs. But that will take a little bit more effort to get there.

CRAIG BOX: But most of the actual state data is saved in the service object, or as an endpoint or something like that. So it doesn't actually need to persist data itself? Would that be fair?

DAVID ANDERSON: Yes, absolutely. MetalLB, aside from its configuration, is-- the components are completely stateless. So it does not need any kind of configuration store or anything. It just relies entirely on Kubernetes storing its service configuration on behalf of MetalLB.

CRAIG BOX: What was the hardest part of writing MetalLB?

DAVID ANDERSON: The hardest part is something that I'm still struggling with today, which is that it is very hard to test on bare metal, partially just because it takes a lot of effort to create clusters on bare metal. And so it's very hard to automate. And also just there is-- if you compare it to, say, Google Cloud, on Google Cloud there are relatively few configurations to explore. On bare metal, if you can imagine it, someone has created a configuration that runs in weird and wonderful ways.

So there is a lot more variability. And it's hard to get even one single test case set up correctly. So that is the main challenge is finding a way to not use our users as test subjects.

CRAIG BOX: Would you say, though, that the options are running a virtualization tool of some sort-- be it something commercial like VMware or KVM-- and then knowing that you deploy your cluster on top of something which is then somewhat standard, you're always deploying on top of KVM, it should-- relatively speaking-- always be the same, versus running on the metal natively, but then having to deal with the inconsistencies.

DAVID ANDERSON: So that is definitely one aspect to it. As soon as you drop off the virtualization systems and onto the bare metal, you introduce a whole host of new challenges. The biggest challenge, though, is just the sheer variety of network configurations.

So do your machines-- be they VMs or bare metal-- do they have one network interface? Do they have 20? Do you have a very complex network topology? Which Kubernetes network add-on are you using? Because each one changes the behavior of machines in subtle ways, ways that don't matter to most pods, because all they care about is that I can talk to other pods and I can get to the internet. But for something like MetalLB, those little variations have huge repercussions.

CRAIG BOX: What is the community of people who are running Kubernetes without cloud like?

DAVID ANDERSON: I would say it's finding itself is the best way I could put it. It's a community of people who are very interested in running Kubernetes on bare metal, but feel that there is not very much support for it out there.

I would sort of divide them into two categories. There is the enterprise users who want to run in their own data centers for whatever reasons. They have a pretty good support environment in the form of Red Hat and those big enterprise providers.

And then there is the long tail of small companies and individual people who want to run on bare metal. They have very little support, and so are sort of left to their own devices to figure out what to plug into their cluster to make it work. So that latter category is the people that I'm most interested in, because I'm just one person. I definitely cannot compete with Red Hat and OpenShift in terms of providing service to big enterprise customers.

ADAM GLICK: The docs say that MetalLB is in beta right now. What does that mean for people who want to run in production?

DAVID ANDERSON: That means you should be a little bit careful. With that said, when I first launched it I marked it as alpha. And within two days someone reported that they were running it in production with 200 machines, and they were very happy.

So it has, in fact, been running in production for dozens of people for a long time now. The beta is mostly a mechanism to protect myself from people having too high expectations.

What beta means to me is that things like the configuration file format might still change. There might be some new features coming down the pipeline. I don't consider it done yet. But in terms of stability, it should be very mature at this point.

ADAM GLICK: Speaking of new features, what's next for MetalLB?

DAVID ANDERSON: What is next is a combination of more of the same and some new nebulous things. More of the same meaning more network protocols. There are dozens more network protocols out there that a few people here and there really would like to use.


DAVID ANDERSON: OSPF is the number one request. Yes. There is OSPF. There's also IPv6 support, which is somewhat incomplete at this point.

CRAIG BOX: Is there much you can do with IPv6 while Kubernetes doesn't natively support it?

DAVID ANDERSON: So Kubernetes itself does support IPv6 natively if you run an IPv6 only cluster. So you have to choose currently between IPv4 only or IPv6 only. And so in practice everyone chooses IPv4 only, because most people running Kubernetes are trying to sell something to people. And you want to sell to as many people as possible. And today that means IPv4.

There is work in the pipeline to make Kubernetes truly dual stack. So we really want MetalLB to be ready as soon as that lands to support that.

And then the other sort of more out there ideas are supporting better load balancing, because right now there are limitations. Using standard routing protocols imposes some limitations in the quality of the load balancing that you get. There are projects such as Katran-- I think that's from Facebook-- that implement a second layer of load balancing to eliminate those downsides.

It would be fantastic to have MetalLB just automatically use those additional load balancers to improve the experience. Those are more nebulous features than just more protocols. But that's the direction I would like to go is to continue to keep MetalLB simple to use, but make it better without much investment.

ADAM GLICK: If people want to help contribute to the project, where can they go?

DAVID ANDERSON: Off the website there is a link to GitHub. There are a whole bunch of issues open. I try to keep those tagged appropriately for you-- good first issues, bugs, enhancements, and so forth. So that's a very good place to go.

Otherwise we are on Slack. So the Kubernetes Slack has a MetalLB channel. So drop in there and come talk about what you would like to do. And we can help you get oriented.

CRAIG BOX: I'd say, for a project predominantly driven by one person, being yourself, it's very well-documented and seems very easy to get into for beginners.

DAVID ANDERSON: Thank you. That is one of my personal hobby horses that I like things being well-documented. It's still not as good as I would like. But I have made an effort, yes.

CRAIG BOX: Well, if you do want to find out more about MetalLB, you can go to MetalLB.Universe.TF, which stands for TensorFlow. And find Dave on Twitter @Dave_UniverseTF.

I believe we'll also be able to find you on the floor at KubeCon Seattle?

DAVID ANDERSON: That's the plan, yes. I'm not speaking at KubeCon. But I will be wandering around, and probably I will be at the Google booth and just generally walking around. So come find me there.

CRAIG BOX: Brilliant. So thank you very much for talking to us today, Dave.

DAVID ANDERSON: Thank you for having me on.

CRAIG BOX: Thank you, as always, for listening. If you enjoyed the show, we love it when you help spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @Kubernetespod or reach us by email at kubernetespodcast@google.com.

ADAM GLICK: You can also check us out on our website and catch our transcripts at kubernetespodcast.com. Until next time, take care.

CRAIG BOX: See you next week.