#133 January 12, 2021
Thomas Graf is the inventor of Cilium and the co-founder of Isovalent. Cilium is a container networking plugin built on top of eBPF, bringing modern SDN technologies to accelerate your pods. Adam and Craig also discuss the many uses of Christmas trees.
Do you have something cool to share? Some questions? Let us know:
ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.
CRAIG BOX: And I'm Craig Box.
ADAM GLICK: Welcome back, Craig.
CRAIG BOX: Thank you, Adam. Happy new year.
ADAM GLICK: Happy new year to you as well. It is great to be kicking off another year. And as we've put 2020 in the rearview mirror, any parts of the holidays that you're leaving behind?
CRAIG BOX: Longtime listeners of the show may remember that every now and then in my walks around the neighborhood in the June, July time frame, I would find a Christmas tree just sitting on the side of the road, left unloved for many months, all the color drained out, and so on. And this must now be the time of year when those are born-- when they're first left out.
We have a two-week window in January in which the council will come around and collect Christmas trees. And for whatever reason, our date was right at the beginning of that. So the 5th of January was the date on which everyone was supposed to put their Christmas tree out.
That feels a little bit too early if you want to keep your tree up. I know that a lot of the English heritage properties, for example, have decided that the traditional way was to have it up all through January. And good on them if they have enough space to make that happen. But there have been giant piles of Christmas trees around the neighborhood.
ADAM GLICK: Really?
CRAIG BOX: And no sooner were they there than they were gone. But there are a few stragglers, my own included, I should point out. I left it up a little bit longer, and it's outside. It might be collected. Or maybe June, July, August, maybe it'll still be out there.
ADAM GLICK: Well, if you're looking for something else that you can do with the tree, I've seen a number of interesting possibilities out there. I'll throw one in the show notes about a new thing that I'd heard about this year, which is eating your old Christmas tree.
CRAIG BOX: Oh, yes?
ADAM GLICK: So apparently if you want to take the needles off of it and you want to prepare the tree, there are ways that you can bake and work your Christmas tree into your new year's dining. So if you want to try something new and different, that's probably something most of us-- at least, for me, I've never tried eating a Christmas tree.
CRAIG BOX: You could take the trunk and whittle something out of it.
ADAM GLICK: [CHUCKLES] Indeed. It is interesting to see the different trees that you see out in the neighborhood. I've seen some very, very nice trees and some that look like they've been worked over a little bit. I think some people have some larger ornaments than others.
CRAIG BOX: I hear that the more common choice in America is the artificial Christmas tree. You don't see many of them out on the side of the road.
ADAM GLICK: Nope. There's a large number of articles about which is more environmentally friendly, the artificial tree or a regular, real tree. But that, of course, comes down to how many years you reuse it for. But about 80% of the Christmas trees used in America are actually artificial, from what I was reading. So America has made that shift.
CRAIG BOX: I have seen a couple of them outside in pots. I had friends in New Zealand who would keep one in the backyard for the rest of the year and bring it inside at Christmas. But I do think the natural wish of a tree is to continue to grow. And unless it's some sort of Christmas bonsai, I don't know how you'll be able to keep it at the same height year after year.
ADAM GLICK: Well, you could just keep cutting off the top, just reusing it.
CRAIG BOX: You have to reshape it. Should we get to the news?
ADAM GLICK: Let's get to the news.
ADAM GLICK: Google Cloud has reiterated its support for the CNCF with a $3 million grant to fund the running of the Kubernetes infrastructure. These credits will allow Kubernetes to deal with the 2,300 monthly pull requests that trigger 400,000 integration test runs using 300,000 compute core hours per year. This continues Google's support for the Kubernetes infrastructure, having previously given $9 million of credits over three years.
CRAIG BOX: AWS has announced Managed Prometheus and Grafana services in partnership with Grafana Labs. The Prometheus service uses the Cortex project developed at Weaveworks and now shepherded by Grafana Labs and will be billed per metric and by query time when out of preview. The Grafana service is charged by active users and comes with an option to upgrade to Grafana Enterprise.
ADAM GLICK: Container security companies continue to be hot acquisition targets. This week, Red Hat kicked off the new year with the announcement of the intention to acquire StackRox, a Bay Area Kubernetes security company with around 60 employees. This follows the acquisition of Twistlock by Palo Alto Networks, Cisco's acquisition of Portshift, and VMware's purchase of Octarine all last year.
Red Hat stated that once the acquisition is complete, they plan to open source the StackRox technology. No price was disclosed, but analysts expect it to be over $100 million given that StackRox had raised $65 million in several funding rounds. Red Hat also recently announced the general availability of Windows Server nodes and containers in OpenShift 4.6.
CRAIG BOX: The CNCF has released their annual report for 2020. In the five years since its inception, the CNCF has grown to manage over 70 projects with over 103,000 contributors in 177 different countries. Membership grew by 150 organizations to 519 in total, while the end user community added 145 new member companies.
Event attendance was up by 90% with the move to online events for both KubeCon EU and North America. And five projects graduated, bringing the total to 14. Congratulations to the CNCF and to our whole community on a fantastic year.
ADAM GLICK: The KubeCon North America transparency report has been released. Key details include almost 16,000 virtual attendees, of which 2/3 were first-timers. The reach expanded, with people joining from 137 countries across six continents-- still no one from Antarctica, we note. The four-day event had 213 sessions, with 52% of keynote speakers identifying as women or gender nonconforming, and 33% of attendees participating in one or more project office hours. The next KubeCon EU will also be virtual this May, while the next KubeCon North America is currently planned to be a hybrid event with both online and in-person elements in October of this year.
CRAIG BOX: Rancher has announced Harvester, a new project to manage hyper-converged infrastructure on bare metal with no Kubernetes knowledge needed. The project, in alpha, targets the vSphere or Nutanix user who still thinks in disk images and VLANs rather than containers. Under the hood, it combines Kubernetes and projects like KubeVirt for VMs and Longhorn for storage. An ISO install is available for bare metal. Or, if you already have some Kubernetes, a Helm chart has been published as well.
ADAM GLICK: The Kubernetes blog has continued its traditional series of posts diving deep into new features of a release with several posts about 1.20. The new blogs cover pod impersonation and short-lived volumes in the Container Storage Interface, the general availability of third-party device metrics, and more granular control of storage volume permissions changes.
CRAIG BOX: The Sonobuoy project from VMware has released version 0.20. Commonly used to validate the compliance of Kubernetes clusters, Sonobuoy has announced its intention to become a more generic cluster testing tool and has been working on architectural changes, as well as planning Windows support. Meanwhile, VMware's Ingress project, Contour, announced the results of a security audit, with vendor Cure53 calling it one of the most mature and well-structured projects they have encountered in their CNCF-sponsored work.
ADAM GLICK: Lyft has announced Pulse, a statistics gathering solution for Envoy Mobile. Pulse lets you export metrics from a mobile app like you might do for a microservice, allowing you to monitor things like taps on a button or app crashes. The time series data can then be uploaded to your Prometheus environment using their StatsD-based metrics collector.
CRAIG BOX: Multicloud control plane Crossplane has hit version 1.0. Developed by Upbound, the authors of Rook, Crossplane integrates provider solutions for managing resources on top of Kubernetes or falls back to Terraform if those aren't available. 1.0 comes with stability features, such as v1 APIs, leader election, and Prometheus metrics for all binaries.
ADAM GLICK: Dell Technologies has introduced Project Karavi, a tool for extending support for their storage technologies past the basic operations provided by the Container Storage Interface. Karavi, Greek for "ship," launches with observability features and aims to grow to include enterprise table stakes, like encryption and replication.
CRAIG BOX: If you want to run your own self-managed Kubernetes clusters on Microsoft Azure, the newly announced Cluster API Provider for Azure, or CAPZ, is your tool. It lets you use the Kubernetes cluster API to create and manage machines, which will be deployed on Azure's IaaS. Microsoft has said that it is ready to replace the AKS Engine tool previously recommended for this task and suggests you don't look too closely at the version numbers.
ADAM GLICK: The latest project journey from the CNCF has been released, this time for Vitess, a cloud-native database system. Originally an internal YouTube product to handle scaling MySQL for massive amounts of storage, Vitess helps deploy, scale, and manage large SQL clusters. Since joining the CNCF, the project has had a 40% increase in contributing companies, a 78% increase in individual contributors, and a 114% increase in documentation commits. The project was donated to the CNCF in February 2018, and you can learn more about it by listening to Episode 81.
CRAIG BOX: VMware Tanzu GemFire was formerly Pivotal Cloud Cache built on Pivotal GemFire, which is based on Apache Geode. Confused yet? If you like the sound of an in-memory database for fast response and high consistency adapted to the ephemeral nature of Kubernetes, then GemFire might be the thing for you. It's now packaged up with Helm and ready to run and supported on your Tanzu Kubernetes Grid.
ADAM GLICK: Finally, a Kubernetes Security Essentials course is now available from the Linux Foundation for people interested in studying for the Certified Kubernetes Security Specialist exam. The course is a 30-hour, self-paced training covering security concerns for cloud production environments, including hardening systems and clusters, securing the container supply chain, monitoring events, logging security actions, and more.
CRAIG BOX: And that's the news.
ADAM GLICK: Thomas Graf is the co-founder of Cilium and the CTO and co-founder of Isovalent, the company behind Cilium. Before that, Thomas worked at Red Hat as a Linux kernel developer on networking, security, and eBPF. Welcome to the show, Thomas.
THOMAS GRAF: Thank you very much. It's awesome to be on.
CRAIG BOX: If I read this right, you've been working on Linux networking since 1998. How many times has it been reinvented since then?
THOMAS GRAF: That's an excellent question. As part of my career, probably three or four times. Like when I joined, ipchains was just about to be replaced by iptables. It was the very early days for software networking, but it was already exciting. Back then, it had felt like a revolution. And now we have been revolutionizing that a couple of times at this point.
CRAIG BOX: So for the younger-school Unix nerds out there, describe, perhaps, ipchains and iptables and how a packet traverses through the networking stack on Linux at a very high level.
THOMAS GRAF: It was the days where firewalls, which typically segment networks-- deciding who can talk to whom-- would be implemented in hardware, using hardware firewalls. You would buy a box to put a firewall in place. And we in the Linux community figured, oh, we can do this as well. We can make a Linux box act as a firewall, so I can build my firewall at home. That was the starting point for ipchains. So you could use a Linux box and have firewall rules about who can talk to whom on the network.
ADAM GLICK: Is this the first virtual network device?
THOMAS GRAF: I think those were really the early days of software-defined networking and software-based networking.
CRAIG BOX: So what's the difference between the chain and the table?
THOMAS GRAF: "Chain," I think, explains it really well. It's a chain of rules. I think for those of you who have been configuring a hardware firewall, it's a huge, long list of sequential rules, who is allowed to talk to who else. ipchains was a way to create the same set of rules in Linux, basically one rule set.
It quickly became clear that it's very difficult to express complex rule sets with just one list of rules. So tables were created, where you could jump around and have multiple tables and have multiple rule sets-- on a very high level, the difference between ipchains and iptables.
CRAIG BOX: So a table contains chains?
THOMAS GRAF: Correct. Yes, exactly.
CRAIG BOX: That was state of the art in Linux for, perhaps, 15, 20 years?
THOMAS GRAF: I think iptables is about 20 years old at this point, yes.
CRAIG BOX: So what didn't it do?
THOMAS GRAF: Fundamentally, iptables is built for a hardware- and device-based world, right? It's doing exactly the same thing as a hardware firewall, just in software. And that has a lot of consequences.
The biggest one is that processing a linear list of rules is easy for hardware to do efficiently. It's incredibly hard for software to do efficiently. This basic fundamental design flaw that was baked in early is really limiting further use of iptables in its form.
And the main difference there is that when iptables was created, we were still using dial-up modems. So network speeds were very, very different from what we see today. We're pushing millions of packets per second now. We were pushing a couple of thousand packets per second at most back then.
CRAIG BOX: Can't we make the network hardware in our Linux box do this for us?
THOMAS GRAF: We could, and there are people doing this. Like we are seeing SmartNICs evolve that can do this. But in the end, I think performance is one angle. And you will always be a little bit faster in hardware if you know what you're doing.
I think the really appealing side of any software-based and any software-defined aspect is that you have full flexibility and programmability. You don't necessarily need to know the building blocks that you're going to need to solve a problem. You can solve future problems without knowing about them when you build the framework. That's the foundation of why programmability in general is so successful and so powerful and why software is taking over, not just in networking, why software is taking over so many hardware-dominated fields.
ADAM GLICK: And an iptable is basically a lookup table. Is that a fair analogy?
THOMAS GRAF: I will call it a list of rules that are processed for each packet that is traversing through the Linux kernel, where each packet has to look at every rule and see if that rule applies and, if so, execute an action.
ADAM GLICK: Does it literally do a sequential pass through each of the rules? It's like a giant, embedded set of if-statements, so to speak?
THOMAS GRAF: Exactly. If you look at the typical firewall rule set, it will be, is this going to port 22? If so, then accept. Is this going to port 80? If so, then accept. That's a basic rule. Over the years, we have been able to create a fair amount of more sophisticated rules that are a little bit better. But principally, it all comes down to that type of processing.
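That first-match, sequential pass can be sketched as a toy model. This is purely illustrative Python, not real iptables or kernel code; the rule shapes, the packet dictionary, and the `evaluate` helper are all made up for this sketch:

```python
# Toy model of iptables-style processing: every packet walks the rule
# list from the top, and the first matching rule decides the verdict.
# Cost is O(number of rules) for every single packet.

def evaluate(rules, packet):
    """Return the verdict of the first matching rule, or the default policy."""
    for match, verdict in rules:
        if match(packet):
            return verdict
    return "DROP"  # default policy when nothing matched

rules = [
    (lambda p: p["dport"] == 22, "ACCEPT"),  # is this going to port 22? accept
    (lambda p: p["dport"] == 80, "ACCEPT"),  # is this going to port 80? accept
]

print(evaluate(rules, {"dport": 22}))   # ACCEPT
print(evaluate(rules, {"dport": 443}))  # DROP (fell through every rule)
```

With a handful of rules this is fine; with the tens of thousands of rules a large Kubernetes cluster can generate, the linear walk per packet is exactly the design flaw being described.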
CRAIG BOX: Does this sound like something that we could improve if we had a programming language that we could apply to the situation?
THOMAS GRAF: Exactly. That's the perfect transition into eBPF or programmable networking.
CRAIG BOX: So when did the "BPF" come into the picture?
THOMAS GRAF: Strictly speaking, BPF is very old. It's older than I am. BPF has been defined and created by the founders of the internet. It was designed as a packet filter very, very early on to say, if I'm only interested in seeing or processing a subset of the network packets, I need some language to express this filter. That was BPF, Berkeley Packet Filter. What we're talking about now is eBPF, which is significantly advanced compared to the roots of BPF.
ADAM GLICK: What has changed? What does the e stand for, and what is different about it?
THOMAS GRAF: The difference is the e stands for "extended BPF." It is still a programming language, and it is possible to express every BPF program in eBPF. eBPF is fundamentally more powerful. It can interact with the Linux kernel, so it can actually call into a Linux kernel functionality.
It is 64-bit. That's a big step. It has many more eBPF program types. So BPF was really at the socket level. And it was used, for example, for tcpdump.
eBPF programs can be attached to trace points, networking. There are eBPF-based LSMs. eBPF can be used to trace user space applications using uprobes. It's like comparing a very early programming language with C++. Even C++ will have some roots in an older language. But from a skill perspective, it's fundamentally different.
I think, from that perspective, eBPF is exactly the same. It takes a system that is very hard to change, the Linux kernel, makes it programmable, and allows for innovation without requiring changes to kernel source code.
ADAM GLICK: So with this change, are we asking people to update what they do and stop writing iptables and start writing things in eBPF? Or is this something that augments it or uses it in certain scenarios?
THOMAS GRAF: It's both. You can, for example, use eBPF to implement iptables' intent if you want to. That has one big downside, though, because you are not fundamentally changing the assumption that the user will be writing a sequential list of rules. Yes, you can use eBPF to implement a sequential list of rules, but that's not going to be the most efficient manner.
What we see is, instead of just reimplementing iptables, we are seeing more efficient ways found to implement user intent, such as Kubernetes resources, services, network policy, and so on. So instead of assuming that you're starting at a sequential list of rules, such as with iptables, you are tying eBPF directly to higher-level abstractions, such as Kubernetes.
And I also want to mention eBPF is not networking only. We are seeing eBPF revolutionize tracing, profiling. There's an excellent Linux security module based on eBPF that is currently changing how runtime security is being done. There is seccomp that is being used by many, many, many users. eBPF is changing how kernel development is being done. It's not specific to iptables.
CRAIG BOX: We spoke to Leonardo Di Donato in Episode 91 who works on Falco, which uses eBPF for observability. Is there anything it can't do? Are we at the point where we stop doing new development in the kernel itself and do all the new work in eBPF?
THOMAS GRAF: There's a lot of discussion around, is eBPF more secure than native kernel development? A lot of people have the assumption that a kernel-level developer will write perfect code, and if a piece of kernel code is merged into the Linux kernel, it's bug-free. That's, I think, the assumption that is out there. That's not true.
CRAIG BOX: No?
THOMAS GRAF: No way, right?
CRAIG BOX: Don't say it. I feel heartbroken.
THOMAS GRAF: I've been doing kernel development for long enough that I can say that, but it's not true. So I think there is discussion about if something can be done in eBPF instead of native code, and it can be done at the same efficiency, it is more secure. And that will benefit everybody. Getting to the question, is everything possible? Not quite.
I think there are limitations to eBPF because it's a validated and verified language. So you can't load arbitrary eBPF code into the Linux kernel. There are limitations. For example, an eBPF program must be guaranteed to run to completion. It can't, for example, just spin forever because that would deadlock the entire machine.
THOMAS GRAF: You can write in BPF, but it's not what almost everybody does. eBPF has been invented by Linux kernel developers. So yes, we have a higher-level language.
CRAIG BOX: Is it C?
THOMAS GRAF: It's C, yeah. That's a very high-level language.
CRAIG BOX: I knew it.
THOMAS GRAF: eBPF is not a general purpose programming language. It's specific to the Linux kernel, and it allows it to interact with the Linux kernel. So you can treat the kernel as an SDK. The existing kernel functionality that is available can be used by eBPF programs. That's a main difference to, let's say, WASM or other programming languages.
CRAIG BOX: So why isn't it called KernelScript?
THOMAS GRAF: I think it would actually be a good name. I wouldn't be opposed to that at all. One of the requirements for eBPF to be merged into the Linux kernel, I think around 2014, was that it has to replace BPF because the kernel community was not willing to maintain two virtual machines. It was specifically written to replace BPF. That's why it has the name eBPF as well.
CRAIG BOX: eBPF has got to the point where it supports its own conference. The eBPF Summit 2020 was held back in October. What kind of things are discussed at an eBPF conference?
THOMAS GRAF: It was an awesome event. It was the first time we held it. And we had an awesome community presence. We had talks on all levels-- like, awesome introduction-level talks from people like Liz Rice, Brendan Gregg-- all the way into deep types of Linux kernel eBPF verifier topics by the eBPF maintainers. It had a great span of topics for everybody.
We were anticipating a couple of hundred people to show up. And then we were completely overrun, and over 2,000 people signed up. And we kind of last minute made sure that we could actually host the whole crowd. So it was a fantastic event.
ADAM GLICK: You're also the creator of the Cilium project. What is Cilium?
THOMAS GRAF: To keep it quick, Cilium is a CNI plugin, which means it will help you connect Kubernetes pods. It will also enforce network policy to do security. And it will give you the visibility so that you can troubleshoot your networking layer. So you can, for example, answer the question, is it DNS?
CRAIG BOX: But the answer is yes.
THOMAS GRAF: Yes.
CRAIG BOX: It's like the opposite of Betteridge's law of headlines.
THOMAS GRAF: It's always DNS. But in the corner case of when it's not DNS, we can also answer that question.
CRAIG BOX: And it's fair to say Cilium is built on top of eBPF?
THOMAS GRAF: What's foundationally different from pretty much everything that's out there is that eBPF is the reason we created Cilium. We didn't create Cilium and then realize, oh, we also need to support eBPF. Cilium has been designed from scratch to leverage eBPF.
ADAM GLICK: Where did you get the idea for Cilium?
THOMAS GRAF: It's a long story. I'm trying to keep it short, though. I was part of the software-defined networking movement as well. I worked on Open vSwitch, which was the most successful SDN open source project.
I and others saw a ton of value in what SDN brought to the open source community. But it felt like it only fulfilled half of the potential that the shift offered. It brought a lot of functionality that was previously done in networking hardware to software. But it still preserved this device- and machine-based model, which made sense during the virtualization age, where a virtual machine is still a machine.
Today, we're talking containers and services. And a container is not a machine. A container is not a device. Open vSwitch and other SDNs, they're still treating things they connect as devices and machines.
And we felt that that's fundamentally wrong. eBPF offers us the ability to go much further and actually treat applications the way they want to be treated, as applications, and then connect them. It's actually quite similar to how service mesh looks at this same principle as well.
CRAIG BOX: Is the internet at least still a series of tubes?
THOMAS GRAF: Absolutely. By the way, we're still doing, for example, BGP. So some of the internet's roots are still very present even in the most cloud-native-specific worlds.
ADAM GLICK: Can you explain how BGP, the Border Gateway Protocol, interacts with eBPF?
THOMAS GRAF: One of the most powerful aspects of eBPF is that it can interact with the Linux kernel natively, which means Linux can already do BGP. So we didn't have to reinvent BGP or even add support for BGP. Cilium can interact with BGP using standard open source software such as BIRD or FRR.
For those of you who are not familiar with BGP at all, BGP is basically responsible for distributing routes in a network. How can I reach somebody? And BGP will make sure that this information is distributed across an entire network. So this is typically something that you need less of if you're in the cloud and something that you need more of if you maintain your own data center or if you operate the entire internet, of course.
CRAIG BOX: Is eBPF something that people who use Cilium should ever think about, or is it simply just an implementation detail that happens to make it a really fast way to do networking?
THOMAS GRAF: It's absolutely an implementation detail. So in order to use Cilium or any of the other eBPF-based projects, you don't need to understand eBPF at all. At the same time, eBPF is the enabler and being seen as something very powerful, an incredible shift. I think this is why there's so much interest in learning and understanding eBPF.
In the end, eBPF is a kernel-level technology. It's meant to be consumed by kernel developers. So there's no need to ever really understand it to benefit from Cilium or, let's say, BCC or bpftrace or the other eBPF projects.
ADAM GLICK: Was Cilium a company first or an idea first?
THOMAS GRAF: Definitely an idea first. I think in the beginning, we didn't even think about a company at all. We started the company about one year later.
Cilium as a project spawned around the same time as Docker became more popular, when containers started to come around. The paper for Borg just came out earlier as well. So it was very clear what will come down the line, and it was very clear that the existing projects that existed in the open source world would not be great fits. So it felt very natural just to start a project.
We literally open sourced it from the first commit. So we didn't write something for a year and then open source it. We started from scratch using eBPF and used it to its full extent completely open. And then about one year later, it was very clear that, oh, a lot of users are catching on. There's enough traction on this. Why don't we start a company around it?
CRAIG BOX: Earlier, you described Cilium as a CNI, which is a relatively new concept in Kubernetes terms, a plugin which lets you define a network interface for a container. How did Cilium start out, given that CNI didn't exist at the time?
THOMAS GRAF: We were starting out as a Mesos and libnetwork/Docker plugin before Kubernetes existed. So that was the early days. And fortunately, all the design decisions that we made were not specific to any of these environments. They were made on the assumption that containers would be run and that eventually those containers would get connected to more legacy workloads, like VMs and metal machines.
CRAIG BOX: What needs to happen to those external machines to be able to connect them to the environment running in the containers?
THOMAS GRAF: This is a relatively recent feature. So we have just recently announced this in our latest release. We have added the ability for Cilium to not only run as a CNI in Kubernetes, but to actually run it in agent mode inside of a VM.
So for example, you can host VMs in a cloud provider setting and then run Cilium on the VM. And it will connect to the mesh or to the Cilium network space and make the VMs available in the Kubernetes cluster very similar to if that VM would run as a pod. So it will have connectivity to other pods running in the Kubernetes cluster.
You can define network policy rules. You will have visibility, and so on. All the networking value that we provide in Kubernetes will then basically extend to VMs or, of course, metal machines running the Cilium agent.
ADAM GLICK: Some of the benefits that you mentioned around monitoring and providing policy and control for network sound a lot like the benefits that we hear about from service meshes. Is this something that is consumed and used by service meshes, or is it a different technology that is providing some of the same functionality?
THOMAS GRAF: There was a meeting just recently with some of the Istio folks on extracting the visibility that Hubble has-- Hubble is the visibility layer of Cilium-- and feeding that into Istio metrics. I think what's wonderful is that all the service mesh projects that are out there are basically operating on very similar assumptions as Cilium: treating applications as applications and not as machines, focusing on visibility.
Sometimes this is called tracing. Sometimes this is called flow visibility. There's different names, but it all comes down to a huge desire to provide deep visibility of what's going on at the connectivity layer. And whether we call that networking or service mesh doesn't actually matter that much. There is a desire to understand what's going on.
CRAIG BOX: You published a blog post in the past about accelerating Envoy and getting packets into and out of service mesh quicker using Cilium. How does that work?
THOMAS GRAF: Service mesh as it is done today requires a sidecar proxy to be deployed inside of a Kubernetes pod in order for the service mesh to gain control and visibility. Typically, that is Envoy, which runs in user space, so it operates on a socket basis: it opens sockets and communicates via sockets.
In order to get Envoy into the picture, there needs to be some redirection happening at the network level, so that all the communication in and out of a pod actually flows through this Envoy proxy. For a typical connection that goes from A to B, which would normally traverse two TCP stacks-- one on the sending node, one on the receiving node-- this means that you now have two traversals on each side, so four in total. You went from two to four traversals of the TCP stack.
To some extent, that's kind of the easiest way to solve this problem. At the same time, the sidecar proxy is always guaranteed to run on the same node as the application pod. And if you're running on the same node, you will have no loss at the network level. So why are you using TCP and IP and all of those complicated network protocols? So what Cilium can do is basically shortcut this and directly connect the sockets of the application with the sidecar proxy, basically taking out the entire complexity of network protocols when an application pod talks to a sidecar proxy.
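The idea of exchanging bytes between two local endpoints without any TCP/IP processing in between can be illustrated with an ordinary Unix socket pair. This is only an analogy: Cilium does the equivalent transparently in the kernel (its sockops/sockmap acceleration), whereas this sketch just shows two directly connected sockets with no network protocol involved:

```python
import socket

# Two connected in-kernel endpoints with no TCP/IP stack between them --
# analogous in spirit to splicing an application socket directly to the
# sidecar proxy's socket instead of looping through the network stack.
app, proxy = socket.socketpair()

app.sendall(b"GET / HTTP/1.1\r\n")  # "application" writes to its socket
data = proxy.recv(1024)             # "proxy" reads it directly

print(data.decode())
app.close()
proxy.close()
```

In the real datapath the application and Envoy each still think they own a normal socket; the shortcut is invisible to both.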
CRAIG BOX: So taking that to its logical extreme, why isn't Envoy implemented as an eBPF program?
THOMAS GRAF: Maybe it will. I think there are a lot of people that would like us to do this. To some extent, we already have, for example, deep integration with Envoy where some of the Envoy functionality, such as extracting some of the visibility that only the kernel has, we do that in eBPF and give that to Envoy. So I think the two worlds are already colliding and being merged together.
It would be too easy to just say, we can just take Envoy and run that as eBPF. It's a little bit more complex. But I think over the next months and years, we'll see that Envoy and eBPF will be seen as a logical combination. Whatever is the best technology, whatever is the best tool at hand, will be used to implement whatever is needed.
ADAM GLICK: Are you or someone else in the Cilium project working with the folks in Envoy to try and come up with that resolution? Or are you letting the user space and the people out there make the decisions on which is the best way to move forward for them?
THOMAS GRAF: We are actually regular contributors to Envoy ourselves. We've spoken at EnvoyCon several times. There's lots of collaboration going on both ways. There's currently a KEP in the Kubernetes repo around moving kube-proxy to xDS, which is the configuration protocol that Envoy pretty much shaped. And from there, the natural next step is: OK, let's implement xDS with Envoy and eBPF.
So I think there is a lot of movement in that direction. There is not always a lot of overlap between the people who write eBPF for Cilium -- typically kernel developers -- and Envoy's C++ programmers, so sometimes a little bit of bridging is needed, because the skill sets are so different.
CRAIG BOX: You haven't even gotten any Go in there.
THOMAS GRAF: The Cilium agent is written in Go, so that adds the third language. There's something nice to add there: we actually wrote a Go extension framework for Envoy where you can write Go programs to parse protocols that run as part of Envoy. So you can literally take an existing Go protocol parser, such as an HTTP parser, and run it as part of the Envoy binary to parse protocols.
CRAIG BOX: And you can put a chicken inside a duck inside a turkey.
THOMAS GRAF: Exactly.
CRAIG BOX: Now, we spoke to Antonin Bas from Project Antrea, which is another CNI plugin, based on Open vSwitch, which you said you worked on before. We don't want to start a feud, but he said that eBPF was out of contention for them because it didn't run on Windows. What do you think about that?
THOMAS GRAF: Absolutely right. Right now, eBPF has not been ported to Windows. There is, again, lots of discussion. And I think it will happen. The question is when.
There was a tweet a couple of months back about Microsoft working on an eBPF-based tracing solution. So there is definitely something eBPF-related going on inside Microsoft. But right now, eBPF is not available on Windows.
CRAIG BOX: If we go back to our rebranding before and think of eBPF as a KernelScript, is there a Windows equivalent to that?
THOMAS GRAF: I'm not very familiar with the Windows kernel. In general, there are ways of dynamically loading extensions. I don't think there is anything equivalent to what eBPF is right now.
ADAM GLICK: With Cilium, you've announced a number of partnerships. Who are some of the folks that you're partnering with, and what are the use cases and the benefits that they're looking for as they use and implement Cilium?
THOMAS GRAF: Yeah, we work with an amazing set of customers and users doing a variety of things. Some are looking at Cilium as the most scalable, most advanced CNI. There are hyperscalers out there running thousands and thousands of services, and they see eBPF replacing the iptables-based kube-proxy and making Kubernetes Services scalable.
We are seeing users-- for example, Palantir spoke at eBPF Summit about how they use Cilium for very extensive network security profiling and filtering. Datadog spoke about how they leverage Cilium to do multi-cloud, and about looking at eBPF and Cilium to do edge-based load balancing. Datadog is using Cilium as a CNI.
So I think we have that side of the world. But we also have the typical financials. Capital One, for example, spoke at eBPF Summit about how they use Cilium and eBPF -- not just Cilium's use of eBPF; they actually also do other eBPF-related things themselves. So I think we have a fascinating set of users.
CRAIG BOX: Now, back in August, Google announced that Cilium was underpinning a new dataplane for GKE. What does Cilium bring above and beyond version 1 of the GKE dataplane?
THOMAS GRAF: Google is one of the cloud providers using Cilium. And I think, for Google, the major aspect that eBPF and Cilium bring to the table is visibility. The aspect of troubleshooting and providing visibility, in particular for network policy, but also just network troubleshooting, the type of visibility that eBPF can bring and that is then leveraged by Cilium is significantly more advanced compared to anything that's out there.
To give a simple example: in the iptables world, you're typically operating at the level of byte counters and packet counters. You can count how many times a rule was hit. With eBPF, we have full programmability.
For example, if we drop a packet for policy reasons, we can expose and write as much metadata as we want -- for example, feed that into Logstash or into some log files, and so on. I think that's a fundamentally more advanced system, and that was one of the reasons that made it an easy pick for Google to bet on Cilium.
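The contrast can be sketched with a toy Python model -- purely illustrative, not Cilium code, with invented class and field names: an iptables-style datapath can only bump a counter when a policy rule fires, while a programmable datapath can attach arbitrary metadata to the same event and ship it to a log pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class CounterDatapath:
    """iptables-style accounting: a policy drop only increments a counter."""
    drops: int = 0

    def on_policy_drop(self, packet: dict) -> None:
        self.drops += 1  # all you can learn later: "the rule was hit N times"

@dataclass
class ProgrammableDatapath:
    """eBPF-style accounting: a policy drop can emit a structured event."""
    events: list = field(default_factory=list)

    def on_policy_drop(self, packet: dict) -> None:
        # Arbitrary metadata can be recorded and fed to a log pipeline.
        self.events.append({
            "verdict": "DROPPED",
            "reason": "policy-denied",
            "src": packet["src"],
            "dst": packet["dst"],
            "dst_port": packet["dst_port"],
        })
```

With the second shape, a troubleshooting tool can answer "who was dropped, talking to whom, and why" instead of just "how many times".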
The second one is, with more and more traditional enterprises deploying Kubernetes, the requirements on the networking side are increasing rapidly. And eBPF, because of its programmability, allows us to innovate, which means instead of going back and doing kernel changes, we can write eBPF programs to solve the very unique new problems that currently appear in the cloud-native space as more and more enterprises come in.
ADAM GLICK: You mentioned Hubble before -- a name that seems an allusion to the Space Telescope, and thus to observability and being able to have visibility. What is Hubble, and how does it relate to Cilium?
THOMAS GRAF: Hubble is the observability layer of Cilium. So it runs on top of Cilium and uses all the powers of eBPF to extract visibility. So it, for example, provides network-based metrics. It will, for example, tell you which pods are currently experiencing DNS failures, or how many packets have been forwarded, how many policy violations have happened.
You can export flow logs. It will track everything that's going on on your network. It has extensive aggregation, so you can choose how much data you want. There is a CLI where you can query what has been going on -- a very tcpdump-style interface where you can see everything on the network layer, not at the level of a single interface or machine, but across an entire Kubernetes cluster.
So you can come in and ask the question, in my entire cluster, how many drops have been happening in the last five minutes? Show me the drops that have happened. Show me all the DNS requests that have happened from this pod, and so on. So it's trying to give the visibility that is needed from a networking perspective.
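The kinds of cluster-wide questions described above can be illustrated with a toy Python sketch. This is hypothetical: the flow-record shape and function names are invented for the example and are not Hubble's actual API or CLI.

```python
from datetime import datetime, timedelta

# Hypothetical flow records of the rough shape a flow-log tool aggregates.
flows = [
    {"time": datetime(2021, 1, 12, 10, 0), "src_pod": "frontend-1",
     "dst_pod": "kube-dns", "verdict": "DROPPED", "l7": "DNS"},
    {"time": datetime(2021, 1, 12, 10, 3), "src_pod": "frontend-1",
     "dst_pod": "backend-2", "verdict": "FORWARDED", "l7": "HTTP"},
]

def drops_since(flows, now, minutes=5):
    """Cluster-wide: which drops happened in the last N minutes?"""
    cutoff = now - timedelta(minutes=minutes)
    return [f for f in flows if f["verdict"] == "DROPPED" and f["time"] >= cutoff]

def dns_requests_from(flows, pod):
    """All DNS requests originating from one pod."""
    return [f for f in flows if f["src_pod"] == pod and f["l7"] == "DNS"]
```

The point is that the query scope is the whole cluster's flow data, not a single interface's packet capture.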
CRAIG BOX: Does it have a page where you can answer the question, is it DNS, that always just says yes?
THOMAS GRAF: It doesn't always say yes, but we literally have a dashboard that basically answers exactly this question.
CRAIG BOX: Cilium 1.9 was released in November. What's new in that release?
THOMAS GRAF: The biggest feature we added is Maglev support. The networking geeks will say, yeah, finally, it happened. For everybody else: what is Maglev? Maglev is the ability to do consistent hashing when doing load balancing.
So if you run a load balancer and you run that in front of a service, you depend on that load balancer to steer packets to a number of replicas sitting in the back. But what happens if the load balancer fails or dies? Typically, you would expect all the connections that have been handled by the load balancer to be dropped as well.
With Maglev and consistent hashing, one load balancer in a set of load balancers will always make the same decisions as all the other load balancers, which means that if one of the load balancers fails, one of the other ones can take over. This unlocks the ability to use Cilium as a load balancing layer into your Kubernetes cluster, relying only on something as simple as ECMP to actually get the packets to the boxes running Cilium. So this allows Cilium to become not only the CNI, but the load balancing layer as well.
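The core idea can be sketched in a few lines of Python. This is a simplified illustration of Maglev-style lookup-table construction, not Cilium's eBPF implementation; the hash function, table size, and backend names are arbitrary choices for the example. Because the table is derived deterministically from the backend list, every load balancer that builds it independently ends up with the same table, so a flow hashes to the same backend no matter which box receives it.

```python
import hashlib

def _h(s: str, seed: int) -> int:
    """Deterministic 64-bit hash (stand-in for Maglev's two hash functions)."""
    return int.from_bytes(hashlib.sha256(f"{seed}:{s}".encode()).digest()[:8], "big")

def maglev_table(backends, size=13):
    """Build a Maglev lookup table; size should be a prime > len(backends).
    Each backend derives a permutation of slots from its name, and backends
    take turns claiming their next preferred free slot until the table fills."""
    offset = {b: _h(b, 0) % size for b in backends}
    skip = {b: _h(b, 1) % (size - 1) + 1 for b in backends}
    nxt = {b: 0 for b in backends}
    table = [None] * size
    filled = 0
    while filled < size:
        for b in backends:
            while True:  # find this backend's next unclaimed preferred slot
                slot = (offset[b] + nxt[b] * skip[b]) % size
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == size:
                break
    return table

def pick_backend(table, flow_key: str):
    """Any node with the same table steers this flow to the same backend."""
    return table[_h(flow_key, 2) % len(table)]
```

Production implementations use a much larger prime table size (tens of thousands of slots) so that backends are spread very evenly and a backend change only remaps a small fraction of slots.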
ADAM GLICK: You help found a company called Isovalent. What inspired the creation of a new company?
THOMAS GRAF: I didn't actually intend to found a company when we created the Cilium project. I'm not a typical Silicon Valley software engineer either. I have never worked in Silicon Valley before starting the company. So starting a company felt quite remote to me in the beginning. I didn't even have an idea how to do it.
But then, with the success of Cilium and seeing how quickly it got adopted, I was approached: why don't you found a company? All the potential is there. And this was the time when I joined forces with my now-co-founder, Dan Wendlandt, who has startup experience. He was employee number five at a company called Nicira, which was acquired by VMware.
He was looking to start a company, and he found Cilium and saw, OK, this is exactly what I wanted to do. Oh, crap, Thomas is actually doing this already. So we sat together and said, let's just found a company together -- that's the logical choice. That's how we started the company.
ADAM GLICK: Did you consider naming the company Cilium?
THOMAS GRAF: Yes. The reason why it's not called Cilium is because, around the same time, Docker as a company was struggling with the fact that the company was called the same as the open source project, like Docker the company, Docker the open source project. And based on that experience, it felt risky to call the company the same as the open source project. I think these days, we might look at the situation slightly differently. But it was definitely a very safe call based on what happened to Docker back then.
ADAM GLICK: These days, would it be Cilium Labs?
THOMAS GRAF: We actually considered calling it "Silly Corp" at some point. [CHUCKLES]
ADAM GLICK: How did you choose the names?
THOMAS GRAF: It had to sound interesting, it had to be unique, and we wanted it to have some meaning. So cilia -- the little hairs in your ears -- reminded us of microservices being connected together, because you have many of them, and they're connected below the surface. So Cilium being below the surface, connecting everything together and creating this large mesh, made a lot of sense. And we liked the name Cilium.
CRAIG BOX: And what's an Isovalent?
THOMAS GRAF: We liked the chemical property isovalent. There's a bit of a story here. We were actually called Covalent in the very beginning, because covalent bonds make up diamonds. And we are a networking and network security company, and a diamond is something that represents strong security, right?
So Covalent was our first name. But then, for trademark reasons, we couldn't use Covalent. So we had to rename, and Isovalent was close enough and still had a chemical property that we liked. So we renamed ourselves to Isovalent.
CRAIG BOX: Cilium sounds like it should be an element, but it isn't.
THOMAS GRAF: It is not.
CRAIG BOX: You recently took a $29 million investment led by Google and Andreessen Horowitz. What will you spend it on?
THOMAS GRAF: We're not going to change what we're doing on the Cilium side. We will continue what we have been doing on the open source and engineering side. What we will change is we will build up sales and marketing. We have been completely in stealth mode until recently. So we have not been talking about what we are doing as a company.
That is changing now. We have seen enough success that allowed us to make the decision that we're now more public about what we are doing as a company. So with that, we also launched Cilium Enterprise, the enterprise distribution of the open source project.
CRAIG BOX: So 22 years on, do you think you've finally fixed networking?
THOMAS GRAF: I'm definitely hoping that Cilium will be around forever. Reality will likely be that in 22 years, we'll probably have something very, very different. And I hope to be working on that exciting new topic in 22 years.
ADAM GLICK: Thank you for joining us today, Thomas. It was great to have you on the show.
THOMAS GRAF: Thank you very much.
ADAM GLICK: You can find Thomas on Twitter at @tgraf__ with two underscores, and you can find Cilium on the web at cilium.io.
ADAM GLICK: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod or reach us by email at email@example.com.
CRAIG BOX: Check out our website at kubernetespodcast.com, where we keep our transcripts and our show notes and where you can find links to subscribe. Until next time, take care.
ADAM GLICK: Catch you next week.