Kubernetes Podcast from Google: Episode 217 - Cilium and eBPF, with Bill Mulligan

#217 January 23, 2024

Cilium and eBPF, with Bill Mulligan

Hosts: Abdel Sghiouar, Kaslin Fields

Guest is Bill Mulligan. Bill is Community Pollinator at Isovalent working on Cilium and eBPF. We learned how to properly pronounce Isovalent and what it actually means. We also spoke in depth about eBPF, Cilium, network function in Kubernetes and more.

Do you have something cool to share? Some questions? Let us know:

News of the week

The Kubernetes legacy Linux package repositories are going away in January 2024

Kubernetes 1.29 is now available on GKE in the Rapid Channel

The VMware Tanzu Application Catalog is fully compliant with SLSA Level 3

AWS extended support for Kubernetes minor versions pricing update

The Kubernetes Contributor Summit Paris CFP is Open, closes Feb 4th

KubeCon and CloudNativeCon EU 2024 co-located events agenda is live

The Cloud Native Glossary is now available in French

Blixt, a new experimental load balancer based on the Gateway API and eBPF

Links from the interview

Bill Mulligan:

Covalent bonds on Wikipedia

Isovalent Hybridization on Wikipedia

Isovalent company site

BPF - Berkeley Packet Filter

eBPF project site

Fast by Friday: Why eBPF is Essential - Brendan Gregg

Cilium Certified Associate (CCA)

CCA Study Guide from Isovalent on GitHub

Istio Certified Associate (ICA)

Certified Kubernetes Administrator (CKA)

Certified Kubernetes Application Developer (CKAD)

Kubernetes and Cloud Native Associate (KCNA)

Resources to prepare for the CCA certification

Isovalent library

The World of Cilium

Cisco acquired Isovalent

Developing eBPF Apps in Java

BGP in eBPF

Transcript


KASLIN FIELDS: Hello, and welcome to the Kubernetes Podcast from Google. I'm your host, Kaslin Fields.

ABDEL SGHIOUAR: And I'm Abdel Sghiouar.

[MUSIC PLAYING]

KASLIN FIELDS: This week, we spoke to Bill Mulligan. Bill is a Community Pollinator at Isovalent and works on Cilium and eBPF. We learned how to properly pronounce Isovalent and what it actually means. We also spoke in depth about eBPF, Cilium, network function in Kubernetes, and more. But first, let's get to the news.

ABDEL SGHIOUAR: The legacy Linux package repositories, apt.kubernetes.io and yum.kubernetes.io (AKA packages.cloud.google.com), have been frozen since September 13, 2023 and are going away in January 2024. If you run your own Kubernetes clusters, you must make sure your clusters are set up to pull from the active community-owned repository, pkgs.k8s.io. Make sure to check out the blog from the Kubernetes community in the show notes to learn more.
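For self-managed clusters on Debian or Ubuntu, the migration mostly comes down to pointing apt at pkgs.k8s.io instead of the legacy hosts. The Go sketch below builds a new-style source line and flags old ones. The keyring path and the choice of v1.29 are assumptions, and the URL layout follows the per-minor-version scheme from the Kubernetes install docs, so check it against the official guide before relying on it.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyHosts are the frozen repository hosts mentioned in the episode.
var legacyHosts = []string{"apt.kubernetes.io", "yum.kubernetes.io", "packages.cloud.google.com"}

// aptSourceLine builds an apt source entry for the community-owned
// pkgs.k8s.io repository. Each Kubernetes minor version has its own
// repository path, so the minor version is part of the URL.
// Assumption: keyring is wherever you stored the repository signing key.
func aptSourceLine(minor, keyring string) string {
	return fmt.Sprintf("deb [signed-by=%s] https://pkgs.k8s.io/core:/stable:/%s/deb/ /", keyring, minor)
}

// usesLegacyRepo reports whether a source line still points at a legacy host.
func usesLegacyRepo(line string) bool {
	for _, h := range legacyHosts {
		if strings.Contains(line, h) {
			return true
		}
	}
	return false
}

func main() {
	old := "deb https://apt.kubernetes.io/ kubernetes-xenial main"
	fmt.Println("legacy:", usesLegacyRepo(old))
	fmt.Println(aptSourceLine("v1.29", "/etc/apt/keyrings/kubernetes-apt-keyring.gpg"))
}
```

Because the new repositories are split per minor version, moving a cluster from v1.29 to a later minor also means updating this line.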

KASLIN FIELDS: Kubernetes 1.29 is now available on GKE in the Rapid channel. Google Cloud customers can start experimenting with the new version of Kubernetes to find any backward compatibility issues that might occur.

ABDEL SGHIOUAR: VMware announced that the Tanzu Application Catalog is fully compliant with Supply-chain Levels for Software Artifacts (SLSA, pronounced "salsa") Level 3. The catalog is the enterprise version of the Bitnami Application Catalog, available on the Tanzu platform. AWS announced the pricing for extended support of Kubernetes minor versions on EKS. Starting April 1, 2024, EKS clusters running versions in the extended support window will be charged $0.60 per cluster per hour.

KASLIN FIELDS: Clusters in standard support remain at $0.10 per cluster per hour. Extended support is a feature of EKS that gives customers an extra 12 months of support for Kubernetes minor versions. In other words, if you run in the extended support window of EKS, you will pay more for the support provided by AWS.
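As a rough worked example of what that difference means, here is the arithmetic for one cluster over an average month. It assumes about 730 hours per month (365 × 24 / 12) and counts only the per-cluster fee quoted above, not nodes or other EKS charges. Prices are kept in integer cents to avoid floating-point rounding.

```go
package main

import "fmt"

// Hourly EKS per-cluster prices quoted in the episode, in US cents.
const (
	standardCentsPerHour = 10 // standard support
	extendedCentsPerHour = 60 // extended support, from April 1, 2024
)

// monthlyCostCents returns the per-cluster fee for the given number of hours.
func monthlyCostCents(centsPerHour, hours int) int {
	return centsPerHour * hours
}

func main() {
	const hours = 730 // assumption: an average month of roughly 730 hours
	std := monthlyCostCents(standardCentsPerHour, hours)
	ext := monthlyCostCents(extendedCentsPerHour, hours)
	fmt.Printf("standard:   $%d.%02d\n", std/100, std%100)
	fmt.Printf("extended:   $%d.%02d\n", ext/100, ext%100)
	fmt.Printf("difference: $%d.%02d\n", (ext-std)/100, (ext-std)%100)
}
```

Under those assumptions, one cluster costs about $73 a month in standard support and about $438 a month in extended support, a difference of roughly $365 per cluster per month.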

ABDEL SGHIOUAR: The Kubernetes Contributor Summit Paris will happen on March 19, 2024. The CFP for planned sessions is open to contributors and will close on Sunday, February 4.

KASLIN FIELDS: The agenda for KubeCon and CloudNativeCon co-located events is live. You can start building your agenda today for the 15 events happening on Tuesday, March 19th, 2024 in Paris.

ABDEL SGHIOUAR: The Cloud Native Glossary is now available in French. The glossary is a CNCF-led project that aims to explain cloud native concepts in simple language, and it is translated into 12 languages besides English.

KASLIN FIELDS: Blixt, which means "lightning" in Swedish, however it may be pronounced, is the name of an experimental layer 4 load balancer for Kubernetes. The project originated at Kong and was donated to Kubernetes SIG Network in 2022. The control plane is built on the Gateway API using Go. The data plane uses eBPF and is written in Rust. There's a new blog post about Blixt from the Kubernetes community, which you can find the link to in the show notes. And that's the news.

ABDEL SGHIOUAR: Today, I'm with Bill Mulligan. Bill is a Community Builder at Isovalent, which I'm going to count on you to tell me how to pronounce. You are also a committer to Cilium, and you run a very popular newsletter on LinkedIn, which I'm subscribed to. It's called "eCHO News," a biweekly publication on everything eBPF and Cilium. Welcome to the show, Bill.

BILL MULLIGAN: Thanks for having me.

ABDEL SGHIOUAR: We were chatting before we started the recording, and you are from Wisconsin, right?

BILL MULLIGAN: Yeah, that's correct.

ABDEL SGHIOUAR: Tell us something fun, something people don't know about you.

BILL MULLIGAN: Yeah, so the big stereotype about Wisconsin is that it's like the biggest producer of cheese in America, and I guess the fun facts that I have is at my high school, we had Drive your Tractor to School Day. And my uncle owned 8,000 cows, so all the stereotypes are true.

ABDEL SGHIOUAR: OK, sure. I didn't say anything. You said it. All right. So I've never been to Wisconsin. I hope I can visit it at some point. All I know about Wisconsin comes from American stand-up comedians making fun of it.

BILL MULLIGAN: Yeah, pretty much.

ABDEL SGHIOUAR: And cheese is one of them, so. All right, so cheese aside, let's get going. We're here today to talk about eBPF, which I'm actually very excited to learn about because it has been a big subject of discussion and debate in the cloud native community. One of the reasons I wanted to have you on the show is to have somebody who could really help us make sense of all these technologies. But before we get going, tell us a little bit about how you got into cloud native. Where did it all start?

BILL MULLIGAN: Yeah, so my background is probably a little bit different from a lot of people in the industry. I actually did my undergrad in biochemistry, and then I did a master's in social science. And I would say I kind of accidentally fell into cloud native. I was looking for a job after grad school and just applying to a bunch of different startups, because those are the places that employ people who have no other skills.

And so I got into this four-person company that was building a machine learning platform on Kubernetes, and I was really excited because I thought Kubernetes was some great Greek island and that's where the company office was going to be. But then on the first day at work, they sat me down with my Windows laptop and said, you're going to set up a Kubernetes cluster. This was, like, Kubernetes 1.8 or something, and I'd never seen the command line before. It took me eight hours, but I had a working Kubernetes cluster by the end of the day.

ABDEL SGHIOUAR: Nice, nice. It sounds like your origin story is kind of similar to mine. For my first job, the interview was: set up a ticketing system based on open tickets, which is an open source ticketing system, and then OpenERP, which people today call Odoo, which is an ERP thing. So it was like, hey, sit down. If, by the end of the day, you manage to figure this out, you have a job. So yeah, it was interesting. Then you fell into it, and how did you start contributing to open source? What was your-- what do you do with open source stuff?

BILL MULLIGAN: Yeah, so I think this is also funny for me. My first contribution was actually just fixing a broken link in the Kubernetes documentation, and Kubernetes SIG Docs does a lot of great work getting people into the community, even if they don't stick around, like me. I made that contribution, and then I think I made one other contribution to Kubernetes but haven't been back since.

But now I'm a maintainer of another project in the cloud native ecosystem, so from the first small change of just fixing a broken link to a maintainer. And I think it's great when people start small with open source. You don't need to try to boil the whole cloud native ocean.

ABDEL SGHIOUAR: Yeah, and you certainly do not need to start contributing code. Documentation is a very good way to get started.

BILL MULLIGAN: Yeah, absolutely.

ABDEL SGHIOUAR: I would even go further and say it's a very welcome contribution because I don't think a lot of developers actually like writing documentation, so if you are able to help out, people will appreciate that.

BILL MULLIGAN: Yeah, definitely.

ABDEL SGHIOUAR: Awesome, awesome. All right. So I guess you worked at a bunch of companies before Isovalent, where you are right now, which is well known for, A, having a very hard name to pronounce-- and probably no one understands what it means-- and, B, pushing very hard on Cilium and eBPF. So let's start at the beginning. How do you pronounce the name of your company?

BILL MULLIGAN: I'm actually very impressed with your pronunciation. I would say Isovalent, which is very close to yours. I've also heard eeso-valent and other variations, but yeah, Isovalent.

ABDEL SGHIOUAR: OK. I guess probably the people who would say eeso-valence are maybe French speakers because I-S-O in French is written iso, which is the standard, like the iso plus a number. But I speak two other languages, so that's probably why. And I think I heard it pronounced before in a conference, so that's probably how I know it. So what does actually "isovalent" mean, the word?

BILL MULLIGAN: Yeah, so the company actually wasn't originally named Isovalent. The original name of the company was Covalent, and this comes from chemistry, actually, when you have covalent bonds, and this is bonds between two atoms that are really strong. And actually, if you look at the original Cilium logo, you kind of see this covalent bond in the logo.

Now, the project logo and the company name have been updated, and isovalent is very similar. You can also have isovalent bonds, so that's where the name comes from too. So it's this chemistry background of how we connect things together, and that plays into everything Cilium does today, which is really connecting our cloud native infrastructure.

ABDEL SGHIOUAR: Makes a lot of sense. All right. That means the people who are going to transcribe our episode will have to get it right, and we will have to go do some digging to find some chemistry Wikipedia pages to add to the show notes so people understand what you're talking about. Cool.

And so since I joined cloud native, eBPF has always been in the picture, and the only thing I know is that eBPF comes from something that was called BPF before, without the E, which stands for Berkeley Packet Filtering, which was supposed to be a replacement for iptables. I might be wrong. So what is eBPF? Help us get a 101 level of understanding.

BILL MULLIGAN: eBPF is a Linux Kernel technology, and what it allows you to do is run, essentially, specific programs in the Linux Kernel when certain events happen. That's the very low-level explanation of what eBPF is.

Taking it up a level, the analogy a lot of people like to use is that eBPF is to the Kernel what JavaScript is to the browser, and you kind of get that example, right? Before, we had static web pages. Then JavaScript comes along, and suddenly we can have interactive elements that respond to user input. And that's a little bit what eBPF is doing in the Kernel. An event happens, and then you can run a program that changes what's happening in the Kernel.

And then taking it up one more level, a really more abstract analogy that I sometimes like to use is eBPF is like the cloud native app store. So cell phones originally, when you had your brick phone, came with whatever software or programs were delivered by the device manufacturer. Then Apple came along with the iPhone, and there was an app store. And you could download new programs to add functionality on the fly for whatever you needed.

So you're like, OK, I need to navigate around where I am. So I download Google Maps. I need to order food. I download Deliveroo. And so you can add new functionality to this existing hardware as you need it on the fly, and that's what eBPF allows you to do. You're adding new programs, new functionality to the Linux Kernel when you need them, and you're able to run them, update them, and do this all in a safe, secure, and performant manner.
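To make the "run a program when an event happens" idea concrete, here is a toy model in Go. It is not eBPF and uses no kernel APIs: `Kernel`, `Attach`, `Fire`, and the hook name `sys_enter_openat` (loosely modeled on a real syscall tracepoint) are hypothetical stand-ins. The sketch only illustrates two points from the analogy: new behavior can be attached on the fly, and it runs only when its event fires.

```go
package main

import "fmt"

// Kernel is a toy stand-in for the Linux Kernel. It does NOT use eBPF; it
// only models programs attached to named hook points that run on events.
type Kernel struct {
	hooks map[string][]func(event string) // hook name -> attached programs
}

func NewKernel() *Kernel {
	return &Kernel{hooks: make(map[string][]func(string))}
}

// Attach adds a program to a hook at runtime, like installing an app,
// without rebuilding the "Kernel".
func (k *Kernel) Attach(hook string, prog func(event string)) {
	k.hooks[hook] = append(k.hooks[hook], prog)
}

// Fire simulates an event (for example a syscall) and runs every program
// attached to that hook. It returns how many programs ran; a hook with
// nothing attached runs nothing.
func (k *Kernel) Fire(hook, event string) int {
	progs := k.hooks[hook]
	for _, p := range progs {
		p(event)
	}
	return len(progs)
}

func main() {
	k := NewKernel()
	opens := 0
	// Hypothetical hook name, loosely modeled on a syscall tracepoint.
	k.Attach("sys_enter_openat", func(event string) { opens++ })
	k.Fire("sys_enter_openat", "/etc/hosts")
	k.Fire("sys_enter_read", "fd=3") // no program attached, so nothing runs
	fmt.Println("openat events seen:", opens)
}
```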

ABDEL SGHIOUAR: Awesome. I really like this analogy, actually. I really like the analogy of the browser-- JavaScript is to the browser what eBPF is to the Kernel. But let me get this in my own words. So when you say "events," that's essentially-- in Kernel terminology, that's syscalls, right?

BILL MULLIGAN: Yeah.

ABDEL SGHIOUAR: Or that's things that the programs or apps running on the Kernel are trying to do.

BILL MULLIGAN: Yeah, exactly, so syscalls could be one example or different trace points within the Kernel, so whenever different things happen within the Kernel, yeah.

ABDEL SGHIOUAR: Got it. And then is eBPF more the framework and the runtime sandbox-type thing, or is it just one of those things?

BILL MULLIGAN: eBPF is actually made up of a lot of different components, and I would recommend checking out ebpf.io. It has an explanation of what eBPF is. But essentially, walking through what it is, first you're going to have your eBPF program, and eBPF programs are written by developers.

These are then going to be loaded into the Kernel, usually by a library. For example, Cilium has an eBPF Go library that we use. The Kernel then goes through a process of verification, and this verification step is really important for eBPF because it essentially ensures the safety of the Linux Kernel. The Kernel is the critical component of your operating system, so you want to make sure a program doesn't crash or stall it. The verifier makes sure that programs run to completion and don't access memory they aren't supposed to; essentially, that these programs are going to be safe for your Linux Kernel to run.

Then it's going to be compiled to the native architecture, so it runs essentially as fast as natively compiled Kernel code, and then it's loaded into your Kernel. So then it's a sandboxed program running in the Kernel.
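The verification step can be illustrated with a deliberately tiny verifier over a made-up instruction set. This is an assumption-heavy sketch, not the Linux verifier. The real verifier simulates every execution path, tracks register and pointer types, and has accepted bounded loops since Linux 5.3. This toy simply rejects any backward jump to guarantee termination, checks load offsets against a fixed memory size, and requires the program to end with an exit.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is one instruction in a made-up, eBPF-flavored toy instruction set.
type Op struct {
	Kind string // "load", "jump", or "exit"
	Arg  int    // load: memory offset; jump: target instruction index
}

// verify is a toy verifier. It rejects programs that might not terminate
// (any backward or out-of-range jump), programs that read outside a memory
// region of memSize bytes, and programs whose last instruction is not exit.
func verify(prog []Op, memSize int) error {
	if len(prog) == 0 || prog[len(prog)-1].Kind != "exit" {
		return errors.New("program must end with exit")
	}
	for i, op := range prog {
		switch op.Kind {
		case "load":
			if op.Arg < 0 || op.Arg >= memSize {
				return fmt.Errorf("insn %d: out-of-bounds load at offset %d", i, op.Arg)
			}
		case "jump":
			if op.Arg <= i || op.Arg >= len(prog) {
				return fmt.Errorf("insn %d: invalid or backward jump to %d", i, op.Arg)
			}
		case "exit":
			// Nothing to check.
		default:
			return fmt.Errorf("insn %d: unknown op %q", i, op.Kind)
		}
	}
	return nil
}

func main() {
	ok := []Op{{"load", 0}, {"jump", 3}, {"load", 8}, {"exit", 0}}
	loop := []Op{{"load", 0}, {"jump", 0}, {"exit", 0}}
	fmt.Println("ok program:", verify(ok, 16))
	fmt.Println("looping program:", verify(loop, 16))
}
```

Only programs that pass `verify` would be handed on to compilation and loading in this model, which mirrors the order described above: verify first, then compile and load.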

But the great thing about eBPF is that it provides a really low-overhead way of doing this, because the program isn't something that's running all the time in the background. It only runs when a specific event that triggers it happens, so you have this really low-overhead way of doing new things in the Kernel. And since it's running in the Kernel, it also has access to a lot of things that aren't available to programs running in user space.

ABDEL SGHIOUAR: Got it. Yeah, so that's a very, very good explanation in the sense that basically whatever the Kernel executes, you can hook up kind of like a program to it and then make the program do certain things for you, either modify things or maybe even observability. I know that one of the features of Cilium is observability, which relies on eBPF. And then you can go some-- you can get some super fine-grained observability that you cannot get from, let's say, the HTTP stack or the TCP stack or whatever. OK, so that's cool.

So that's basically explaining what eBPF is, but what is eBPF today, in 2024? I guess my question is, help people make sense of it. As a developer, should I care, or shouldn't I? Will I ever have to use eBPF? Will I ever have to use-- [? a reason to care. ?] There is a library that uses eBPF. Is this something that people have to be exposed to, or, let's say, I don't know, 99% of people shouldn't care?

BILL MULLIGAN: So I would say it's both, like any good answer. It depends. So what eBPF is in 2024-- and maybe I'll start with, first, why are people talking about it so much, and why has it become a hot technology right now?

ABDEL SGHIOUAR: Cool.

BILL MULLIGAN: And when we're thinking about the cloud native revolution, things are happening a lot faster and are a lot more dynamic, and we need to design infrastructure for a different scale and a different speed than we did previously. It's kind of like the whole pets-to-cattle analogy too. We're moving from things that we maintain individually to groups of things.

And so what eBPF allows us to do is add new functionality, because what people may not realize about the Linux Kernel is that it's a 30-year-old technology, and it's accumulated a lot of things in 30 years. The other thing is that Linux is deployed on literally billions of devices worldwide, so if you want to make a change upstream, you have to get it accepted by the whole community, and it has to be acceptable to run on those billions of devices.

And so since this is such a critical component of basically all modern computing, it takes a long time to bring things into production, and so if you have an idea of, hey, I'd like to add this new functionality to the Linux Kernel, first you have to convince the Linux Kernel community that it's a good idea, get it merged upstream. And then once it's merged upstream, it's in the latest Linux Kernel release, and to get it into a long-term supported release by your vendor that you're using or that's approved by compliance, that may take, actually, a couple of years.

So this new functionality isn't something that you can have tomorrow. It can take literally a couple of years to actually reach end users, and eBPF really changes this whole paradigm. You can sit down, write an eBPF program that adds this new functionality, and load it into your Kernel on the fly. The eBPF verifier ensures that it's not going to crash the Kernel, and once you have this new functionality in your Kernel, you're able to change essentially what the Linux Kernel is doing.

And Brendan Gregg, who popularized a lot of the work around eBPF, gave a talk at the last eBPF Summit called "Fast by Friday," where he describes being in meetings with customers who were asking questions about the performance of their systems. He could write an eBPF program during the meeting and show them the actual data by the end of it.

That's the difference, days, hours, rather than years. And I think that's why eBPF is becoming so important and interesting right now, because in the cloud native world, things move quickly, and eBPF allows us to keep up with that speed of change. And I guess the real question is whether you should care about it. Should everybody go start programming eBPF programs tomorrow? And this, I think, also depends too, like any good answer, but I think the answer for most people is actually probably no.

And I think this may be a little bit controversial. People like to get their hands on the newest, latest technology, but it's the same thing we saw with Linux Kernel development before: a lot of specialized companies, like Google, employ a team of Linux Kernel developers so that they can make the modifications they need and maybe even maintain their own Kernel.

But most companies are going to take either a regular release from one of the distributions or run something from their vendor of choice, and that's because they don't have this expertise to be able to do that. But they're still using Linux as this underlying technology that they're relying on, and I think we're going to see the same thing with eBPF.

People that are really specialized in this are going to be writing eBPF programs. It's going to be providing a lot of functionality, a lot of benefits in terms of performance, scalability, time to market. But in terms of actually programming eBPF, most companies aren't going to have to worry about that.

And I actually think, in some ways, that's a good thing because you get the advantage of this technology without having to become an expert in it.

ABDEL SGHIOUAR: Got it.

BILL MULLIGAN: Right. And that's kind of what we like about advanced technologies. There's the quote, "Any sufficiently advanced technology is indistinguishable from magic." And I think that's what eBPF is going to provide to a lot of people, this magical experience of getting this new functionality in the Linux Kernel, providing something like faster networking, better observability, more secure systems, essentially tomorrow for free. And that's a great benefit, and they don't have to know how it works all under the hood. But they still get the benefits of this technology.

ABDEL SGHIOUAR: Got it. That's good, and the example I would give here-- and I'm realizing I'm going to give a product pitch, but whatever-- is GKE Dataplane V2, which we talked about on Twitter, the new functionality where we replaced the old CNI we had in GKE with Cilium. We were able to instantly give people network policy logging, because before, you couldn't debug network policies since there were no logs. So we just added this new functionality, and most people shouldn't care that it's there. They just need to know, OK, you annotate your network policy, and then, boom, we can generate logs for you.

And that helps with debugging or troubleshooting whether network policies are actually working or not, because I know one of the big things in Kubernetes is that no one really likes how network policies are written. They are a real pain to write and understand. So that's a good way to describe it, in the sense that a lot of people will just have to install something or enable something, or in your analogy, "download an app from the marketplace," and then, boom, they have the functionality without even having to care about it.

BILL MULLIGAN: Yeah, exactly. It's the same way you don't care what language the app on your phone is written in. People aren't going to care whether something is written with iptables or eBPF. They're going to look for the actual end-user benefits. Does this give me faster networking? Is this easier to debug? And I think projects and products that leverage eBPF are eventually going to win out because of the benefits that come with eBPF itself.

ABDEL SGHIOUAR: Awesome. Cool. OK, let's move forward. The other thing your company does is Cilium, right? So I went and snooped around a little bit on the Isovalent website and clicked on the products, and there's a long list of things that Cilium can do. In my head-- and again, maybe I'm too dumb to follow-- Cilium is a CNI. That's how I know it. So what is Cilium today? What can people do with Cilium?

BILL MULLIGAN: Yeah, so if you go to cilium.io, you'll see the headline. It says "eBPF-powered networking, observability, and security." And that's what it does: networking, security, and observability, which are obviously all really broad buckets. But I think if we dive into each of those, we can trace the history and development of the Cilium project and understand how it came to be and why it is the way it is today.

So you're right. Cilium originally started just as a CNI, so a way to connect containers together, and that was back in 2015. I think one of the things that a lot of people maybe don't realize about Cilium is that it's actually quite an old project at this point. It's going to be going on nine years now?

ABDEL SGHIOUAR: So it's one year younger than Kubernetes itself.

BILL MULLIGAN: Yeah, exactly, so it's been around for a long time and has seen a lot of developments. And I think the project has really grown with the needs of the cloud native industry as a whole. It began as this connectivity layer, connecting containers together. Then the next thing people started to ask for was, how do we secure this? This is where network policy comes in. If all of-- in a world of microservices, everything's flowing over the network, so we need to be able to secure it. That's where the first part of security comes in too.

Then the next part is around observability. If all of our traffic is flowing over the network, what happens when things break if we need to debug something? And this is where Hubble was launched, and Hubble provides you network observability. Basically, what it does is it attaches onto Cilium, pulls all the information off of Cilium so that you're able to see things like a service map. You're able to pull network metrics out of there and have observability of, is this thing connected or is it not?
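A service map is essentially an aggregation of individual flow records into edges between workloads. The sketch below assumes a hypothetical, stripped-down `Flow` type (Hubble's real flow records carry much more, such as L3/L4/L7 fields and identities) and hypothetical service names. It counts forwarded and dropped flows per source/destination pair, which is the kind of data that answers "is this thing connected or is it not?"

```go
package main

import (
	"fmt"
	"sort"
)

// Flow is a hypothetical, heavily simplified flow record.
type Flow struct {
	Src, Dst string
	Dropped  bool
}

// Edge is one arrow in the service map.
type Edge struct{ Src, Dst string }

// EdgeStats counts forwarded and dropped flows on one edge.
type EdgeStats struct{ Forwarded, Dropped int }

// buildServiceMap aggregates individual flows into per-edge counts.
func buildServiceMap(flows []Flow) map[Edge]EdgeStats {
	m := make(map[Edge]EdgeStats)
	for _, f := range flows {
		e := Edge{f.Src, f.Dst}
		s := m[e]
		if f.Dropped {
			s.Dropped++
		} else {
			s.Forwarded++
		}
		m[e] = s
	}
	return m
}

func main() {
	flows := []Flow{
		{"frontend", "checkout", false},
		{"frontend", "checkout", false},
		{"frontend", "payments", true},
	}
	m := buildServiceMap(flows)
	edges := make([]Edge, 0, len(m))
	for e := range m {
		edges = append(edges, e)
	}
	// Sort for stable output, since Go map iteration order is random.
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].Src != edges[j].Src {
			return edges[i].Src < edges[j].Src
		}
		return edges[i].Dst < edges[j].Dst
	})
	for _, e := range edges {
		s := m[e]
		fmt.Printf("%s -> %s: forwarded=%d dropped=%d\n", e.Src, e.Dst, s.Forwarded, s.Dropped)
	}
}
```

An edge with only dropped flows, like `frontend -> payments` here, is the sort of thing you would look at first when debugging connectivity.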

And then as we see the development of the industry from single clusters to multiclusters, Cilium added things like multicluster mesh, being able to provide a consistent layer to tie everything together. And then I think now we've gotten to a point in the industry where cloud native really isn't just about Kubernetes. It's about being able to extend to things outside of Kubernetes too, and so this is when we started to get things like Cilium Mesh, so providing transit gateways to virtual machines, bare metal machines. And then I think "mesh" is a very overloaded term, but there's also Cilium Service Mesh, which kind of does this exact same thing, connecting, securing, and observing everything.

And I think a lot of people tie service mesh with these terms too, but I think, different from a lot of service meshes, Cilium kind of goes from the bottom up, from layer three, layer four, controlling all of the network packets, up towards layer seven. And I think that's the thing that's exciting for me about Cilium Service Mesh is really connecting, securing, and observing everything that's happening. And in, I guess, the world of distributed computing, everything's going over the network, so it really is the most critical layer in your infrastructure stack today.

ABDEL SGHIOUAR: Yeah. I think it's important to acknowledge, as you rightfully mentioned, that service mesh is an overloaded term, and that's because probably-- because Istio was the first project, open source project, to introduce this concept of service mesh, in people's heads it just stuck with how Istio does things. But that's not necessarily the case for other service meshes.

I remember looking at Linkerd. I looked also at not Cilium, but I looked at the other one. There is one which is open source, very popular. At some point, HashiCorp released their Consul service mesh. So I was looking at those, and I was realizing, OK, well, the way-- the service mesh architecture in Istio is very specific to Istio, and that's not necessarily how all the other tools doing service mesh work.

BILL MULLIGAN: Yeah, I would say this is actually a really funny thing about Cilium Service Mesh. When we were thinking about it, we looked around: what does a service mesh mean today? What are all the functionalities? And the Cilium project itself, before we launched the official Cilium Service Mesh, already had about 80% of the functionality of a service mesh: connecting things together, connecting multiple clusters together, providing that observability component, and providing some type of network security too. There were only a couple of things around mutual authentication that were actually added to make it the official Cilium Service Mesh.

ABDEL SGHIOUAR: Got it.

BILL MULLIGAN: I'm kind of biased here. I've written some articles arguing that, in the future, we're going to see the whole service mesh terminology just kind of disappear, because these are things we were already doing before: connecting things together, observing how they work, providing security for them. These are all part of the networking layer itself. I don't think it needs to be a whole separate category. It's part of what we do. We're already doing it.

I think we originally came up with the service mesh term to help us understand these things in the cloud native world, but as we've advanced so much in the past eight or nine years, we've learned a lot, and we're able to take those learnings and integrate them back.

And I think this is a really important thing if we want to move the industry forward, because at this point Kubernetes has a lot of mainstream adoption, and the new use cases are about connecting back to the rest of our IT infrastructure. A big part of that is educating and helping other people understand how their world connects to the cloud native world, and we need to be able to translate terminology from cloud native back into the rest of our infrastructure.

Some people in the networking world-- these are people who are working with hardware switches, like Cisco switches, and doing things with BGP. How do they connect to the cloud native world? Well, we need to explain things in their terminology, and I think we're going to see a lot of that going forward.

ABDEL SGHIOUAR: I don't think what you mentioned, that service mesh will disappear as a term because it will probably become a native feature of Kubernetes itself, or maybe go even further, is either a hot take or a biased thing. I think Kubernetes itself is progressing in a way where a lot of things that used to be a big deal a few years ago are now just available in the platform, and no one cares about them.

And that's, in my opinion, one of the best things about Kubernetes as a project: it's able to provide you with an abstraction layer over what you used to care about. I don't know, people used to care about jobs, and the Job API was a big deal a few years ago. Now it's just there. No one cares. It's whatever. You just write a Job. You write a cron, and it just works.

So in that sense, yeah, service mesh no longer being a big deal, and its functionality just being part of the platform, is, as you said, probably going to mean that no one will care about service mesh at all in the future. And in that sense, I might consider myself a visionary, because I wrote and gave a "You probably don't need a service mesh" talk a couple of times, which is about explaining to people that, yeah, your CISO read a white paper that says "service mesh is a big deal," but that's not how you should reason or think about it. You shouldn't implement it just because everybody is doing it. You should implement it because you know what it does.

Cool. Well, that's just a long rant. Anyway, I want to move further to the next part of my question. So eBPF and Cilium-- how are they connected? And what's the relationship between these two things?

BILL MULLIGAN: It's similar to what we were just talking about, this abstraction layer. I would say that Cilium, in some ways, is an abstraction layer on eBPF. eBPF is all these programs running in the Kernel, but you need a way to manage them to do what you want to do. And rather than writing bytecode or programming an eBPF program, Cilium allows you to write things in YAML.

And so you can say, connect these two containers. This container shouldn't talk to that container. Tell me what our network traffic looks like. You put that in your YAML definitions rather than having to write the eBPF programs. So essentially, there is a Cilium agent that's installed as a DaemonSet on every single node in your cluster, and it's responsible for loading and unloading the eBPF programs that handle the networking, the observability, and the security in your cluster.
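To make "declare the intent in YAML instead of writing eBPF programs" a little more concrete, here is a minimal Python sketch of the kind of label-based allow rule a Cilium network policy expresses. It is a conceptual model only, under stated assumptions: the `Rule` class, the `matches` and `is_allowed` functions, and the labels are hypothetical, the model is simplified to "deny unless some rule allows", and Cilium itself enforces policy in the kernel through eBPF rather than evaluating anything like this in Python.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: allow pods matching from_labels to reach pods matching to_labels."""
    to_labels: dict
    from_labels: dict
    ports: frozenset = frozenset()  # empty set means "any port"


def matches(selector: dict, labels: dict) -> bool:
    # A selector matches when every key/value pair it names is present on the pod.
    return all(labels.get(key) == value for key, value in selector.items())


def is_allowed(rules, src_labels: dict, dst_labels: dict, port: int) -> bool:
    """Simplified default-deny model: traffic passes only if some rule explicitly allows it."""
    for rule in rules:
        if (matches(rule.to_labels, dst_labels)
                and matches(rule.from_labels, src_labels)
                and (not rule.ports or port in rule.ports)):
            return True
    return False


# "frontend may talk to backend on port 8080", and nothing else.
policy = [Rule(to_labels={"app": "backend"},
               from_labels={"app": "frontend"},
               ports=frozenset({8080}))]

print(is_allowed(policy, {"app": "frontend"}, {"app": "backend"}, 8080))  # True
print(is_allowed(policy, {"app": "batch"}, {"app": "backend"}, 8080))     # False
```

The point of the sketch is the shape of the input: the user only states which workloads, identified by labels, may talk on which ports, and the machinery that enforces it stays out of sight.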

ABDEL SGHIOUAR: Got it. Got it. So essentially, Cilium uses eBPF for part of the features it's able to provide.

BILL MULLIGAN: Yeah, essentially all of the features are-- the vast majority of the features in Cilium are done through eBPF.

ABDEL SGHIOUAR: Got it. So does that mean that people would be able to probably-- if they still want to have the option to write their own programs and run them as part of Cilium, is there an API to extend Cilium itself?

BILL MULLIGAN: You can run programs on nodes that are also running Cilium if you want to do different things or you want to collect different metrics than Cilium is doing, and we see people doing that right now. So I think one thing that's kind of interesting-- Cilium was one of the first applications leveraging eBPF, and we're starting to see more and more of them.

And one of the interesting questions in the eBPF ecosystem now is how we manage all of these eBPF programs in a sensible and reasonable way. Similar to how Kubernetes manages the life cycle of all the containers that suddenly appeared on our infrastructure, I think we're going to have to figure out how to deal with the whole life cycle and, let's say, orchestration of the eBPF programs running on your Linux nodes.

The kernel community is still discussing how this is going to happen, and I don't think this is a finished discussion. We're kind of at the start, not the end. But I think this is a really interesting area for the whole eBPF community right now.

ABDEL SGHIOUAR: Got it, yeah. Now at least I can confidently say I understand what eBPF and Cilium are, so thank you for all these explanations. That's amazing. So Cilium is now a CNCF project, right?

BILL MULLIGAN: Yep.

ABDEL SGHIOUAR: At what level? Has it graduated from sandbox?

BILL MULLIGAN: Yeah, it just graduated last year.

ABDEL SGHIOUAR: Graduated last year, cool. So is Isovalent still one of the biggest contributors to it, or are there other people? Because I know that there are other companies also offering managed stuff around Cilium, right?

BILL MULLIGAN: Yeah, Isovalent is the main contributor, and I actually have the annual report from Cilium. Every year we do a summary of what's happened in the community, and we can link it in the show notes too. And the top five contributing companies by number of PRs were Isovalent, Independent, Red Hat, VMware, and Google.

ABDEL SGHIOUAR: OK, cool. Yeah, because I remember that we mentioned the report in our last episode, in the news portion, because I saw it on the CNCF website's blog, but I didn't read the details. OK, so you are still one of the main contributors.

OK, so then we talked about eBPF. We talked about Cilium. We talked about Kubernetes and the ecosystem and all this cloud native stuff. Then Isovalent is the company that provides a managed offering around this, right?

BILL MULLIGAN: So we provide an enterprise distribution of Cilium.

ABDEL SGHIOUAR: OK, enterprise distribution. So it's like the freemium model. You have the open source, which you can use, but then if you need more features or technical support, you can go to Isovalent.

BILL MULLIGAN: Yeah, exactly.

ABDEL SGHIOUAR: Cool. And one of the things that I have seen-- and I also replied to one of the posts on LinkedIn-- was the CCA, which is the Cilium Certified Associate. For those of you who don't know, I see certifications like Pokemon. I just want to have them all. So what's that? Are you involved in that project?

BILL MULLIGAN: Yeah, so I've been involved in that project too. This is a new certification from the Linux Foundation similar to the CKA, CKAD, KCNA if you want to keep on going with the alphabet soup here.

ABDEL SGHIOUAR: Yes, the ICA, the Istio one.

BILL MULLIGAN: Yeah, exactly. So the CCA is a new certification from the Linux Foundation focused specifically on Cilium, and it's an entry-level certification basically to test your knowledge, like, do you know what Cilium is, do you know like the different features, and things like that. And so I'll just walk through the general like components of the exam so people can learn more about it.

So the first part is installation and configuration. The second part is the architecture of Cilium. Then it goes into network policy-- so how do you secure your network-- service mesh, network observability with Hubble, cluster mesh, a basic understanding of eBPF, and things like BGP and external networking. So it covers a wide range of different topics.

ABDEL SGHIOUAR: Cool. It's worth mentioning that, as we're recording this, the exam is in beta. You have to sign up to get it. I replied to Chris, the CTO of the CNCF, to say "I want to be in," and I guess there was a conversation going on on LinkedIn between you, me, and somebody else from Isovalent where we shared some resources on how to prepare for it, which we'll make sure to include in the show notes, right?

BILL MULLIGAN: Yep.

ABDEL SGHIOUAR: Anything else you want to mention about the certification? Is it worth people's time? Should people go look at it?

BILL MULLIGAN: Yeah, it was put together by a lot of people across the community, so there are lots of great questions. I know a lot of experts were involved. I think a lot of it was driven by my colleague, Nico, who does a lot of great content around Cilium. Shadrach on my team also came up with a CCA study guide that you can find on GitHub, so I'd recommend checking that out. I'm sure it'll be in the show notes.

And if you want to catch them all, I guess, for certs around Cilium, I'd also recommend going to the Isovalent website. We actually have a lot of labs-- these are hands-on demo environments-- going through essentially all the different features of Cilium. People really love them, and a lot of people are trying to collect them all. My colleague, Raphael, also created a whole World of Cilium experience that takes you to the different parts of the project, where you can do labs and track your progress through the whole World of Cilium. So there are a lot of cool resources to check out if you want to learn more about the project after this podcast.

ABDEL SGHIOUAR: Yeah. Awesome, we'll make sure to include all the links. Beyond the Pokemon joke, I see certifications as a great way to learn things, stay on top of the game, and test your knowledge, so I think it's worth it. Just make sure to check it out. It's in beta.

And also, it's worth mentioning that when certifications are being developed by the Linux Foundation, they do actually use the community to help write questions, so if you are interested, there are ways to actually get involved in these things. Cool. So we're getting close to almost the end of our conversation here, but I still wanted to mention some news from, I think, late last year. You got acquired by Cisco.

BILL MULLIGAN: We did.

ABDEL SGHIOUAR: And it's funny that you mentioned earlier that you're trying to bridge the gap between cloud native networking and then the Cisco switch-based hardware networking world. I'm coming from that world, by the way, FYI. That's what I did in a previous life. So what's that? What does that mean, actually, for Isovalent?

BILL MULLIGAN: I think it makes me more excited about the future for Cilium. In the cloud native world, Cilium has become the default option for all the major public cloud providers, basically every single Kubernetes distribution, and literally hundreds of end users of all different scales across every single industry. Cilium has become the standard in the cloud native world.

I think this acquisition by Cisco really signifies that Cilium is becoming the future of networking. I think Cisco, as this networking giant, saying, we think Cilium is the future, I think is a really big statement for where the project is heading to. And it's kind of looking around like, who hasn't said Cilium is the future for networking? And there's not that many people left. So it makes me really excited about the future of the project.

In terms of what does it mean for Isovalent, it means we have more resources, more people, more of everything to continue investing in the Cilium project, so I'm super excited about where the project is going. I know there's a lot of people over at Cisco, like Stephen Augustus, heading up the open source program office there, and they're looking to do a lot more in the cloud native community. And so I think this is a really exciting time to come in there and really build the momentum even more behind the Cilium project.

ABDEL SGHIOUAR: Yeah, as we were talking, I was actually thinking about this. Before advocacy, I was doing consulting, and a big part of any migration project to the cloud, when you work with major companies that already have their on-premise systems and are looking either to extend into the cloud or to migrate completely, is what we call the foundational work: connecting the cloud to on-premise.

And I remember from that time, when I didn't even know Cilium existed, a lot of those discussions were tailored toward the old school, if I dare say it this way. It was VPNs and routers and stuff like that. And I see Cisco acquiring a cloud native company as a very good indicator of the maturity of the project and probably also an indicator of the future, if I dare make a prediction, that networking will just become easier.

Because yeah, one of the things about Cilium is that it's YAML file-based, so you just write a YAML file. And then you guys wrote code that translates that YAML file into low-level IPs and TCP sessions and firewalls and all that stuff that 99% of developers shouldn't even care about. So going forward, being able to interconnect traditional networking and cloud native networking is, in my opinion, a super-exciting future to look forward to.

BILL MULLIGAN: Yeah, I think that's the future that we're looking at too, and it kind of goes back to what I was talking about before: how can we make sure that we bring the rest of it into, or at least connect it to, the cloud native world? And I think this positions us, and the Cilium project as a whole, very well to be this bridge from cloud native to the rest of our IT infrastructure.

And I think eBPF is becoming a key part of this too. My colleague, Dan Finneran, is doing a lot of fun stuff. I don't know if you've seen on Twitter. He recently did something about BGP in eBPF.

ABDEL SGHIOUAR: Oh, yes. I think I saw it, yeah.

BILL MULLIGAN: Yeah, and I think we're going to see a lot more fun stuff like that coming out to make sure that, yeah, the networking just works, rather than you debugging it late at night.

ABDEL SGHIOUAR: Yeah. And actually, another thing I wanted to mention around eBPF: I don't think anybody should be wondering how mature this technology is because, as we said, it's existed for a while. But if anybody is still wondering whether this thing should be a thing, whether you should care, the way I like to describe it is this: whenever the Java community starts caring about a technology, you know it's serious.

Just for context, a friend of mine called [? Mohammad ?] [? Abuleil ?] and another person-- I forgot his name-- are actually writing a Java library for eBPF, so they're making it possible to use eBPF from Java. And once Java cares, you know it's a big deal, so yeah. This is not a joke. This is serious.

Because if there is any constant in technology, it's Java. It's the one thing that has existed since most people my age, your age, and the people around us got into the technology space. Java was the first thing we heard about. So yeah, if Java cares, that's a safe bet in terms of technology [? that cares, ?] I guess.

BILL MULLIGAN: Yeah, absolutely.

ABDEL SGHIOUAR: Awesome. It was a pleasure talking to you. Thank you for all the explanations.

BILL MULLIGAN: Yeah, and thank you for having me. It was really a fun conversation with you.

ABDEL SGHIOUAR: Yeah, you helped us explain eBPF and Cilium, so next time somebody asks me what eBPF is, I will just point them to this episode and go like, go listen to what Bill has to say.

BILL MULLIGAN: Oh, perfect. Thank you.

ABDEL SGHIOUAR: Awesome. Thank you very much, Bill.

BILL MULLIGAN: Yeah, thank you.

KASLIN FIELDS: Thank you so much for that interview, Abdel. It's always great to hear from Bill. He used to work for the CNCF if I recall correctly, so I met him through that. So it's always good to hear from him.

ABDEL SGHIOUAR: Yeah, I think he had some marketing chair position within the CNCF--

KASLIN FIELDS: Something like that.

ABDEL SGHIOUAR: --or was working-- yeah, yeah, I remember that from reading the LinkedIn profile.

KASLIN FIELDS: Regardless, Bill is awesome, and you should totally follow the stuff that he does because, in my experience, he finds good things to be involved in.

ABDEL SGHIOUAR: Definitely.

KASLIN FIELDS: And we heard a little bit about that background in your interview as well. I didn't know that he had a background in biotech, which is something that I strongly considered going into and then ended up in tech through startups and doing Kubernetes. Love it.

ABDEL SGHIOUAR: Yeah, it's quite interesting. We were debating before the interview whether we should make the joke about Wisconsin and where he's from. And he was like, well, my uncle has cows. I was like, well, we should definitely talk about that then.

KASLIN FIELDS: You've got to. You've got to talk about it.

ABDEL SGHIOUAR: You got to talk about it, exactly.

KASLIN FIELDS: My family actually raises cattle as well.

ABDEL SGHIOUAR: Really? Wow.

KASLIN FIELDS: Yeah, in Virginia just as kind of a small-time thing. But every time I go home-- last time I went home, my dad actually smoked a brisket that was from one of the cows that my uncle raised, so that was pretty cool.

ABDEL SGHIOUAR: Wow. All right. Maybe that's actually my retirement project, so let's see.

KASLIN FIELDS: Good luck with that. I'm not doing that. Back to Cilium, though, and eBPF, something really interesting that I learned about Cilium from this is that it's only a year younger than Kubernetes itself, which I did not know.

ABDEL SGHIOUAR: Yeah, that's actually why in the interview I brought up this point of, what is Cilium beyond just the CNI? Because it was introduced as the CNI in 2015, so one year after Kubernetes, but then it just kept growing. So that's why I made sure to talk about-- OK, talk about Cilium outside of the CNI, and that's when we started talking about the service mesh and observability and all that stuff, right.

KASLIN FIELDS: That's really impressive, and I'm surprised that it's that old. I can't believe that I didn't know that before now, but.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: But it sounds like it never worked in a realm that wasn't containers. With things from that early, like 2015-ish and earlier, a lot of those types of tools worked for things before containers started to become a thing, but not Cilium. It sounds like Cilium was geared toward containers the whole time.

ABDEL SGHIOUAR: Yeah, it was pretty much a Kubernetes and containers thing. I think, over time, especially with the acquisition by Cisco, they'll probably start building stuff that-- like in the interview, Bill talked about connecting your on-premise systems to your cloud native systems. On-premise, you don't necessarily have containers in Kubernetes, so you might need to run stuff on VMs and things like that. So that will probably be the future, because they're focusing quite a lot on this multicluster mesh kind of thing, right?

KASLIN FIELDS: Yeah, he mentioned that extensibility, and I think he used the terminology of like Kubernetes dependencies, which-- I have all sorts of things I want to talk about there. But he mentioned that that was kind of the direction that Cilium was going in, which I find really interesting because I think that is a big trend, and I think you mentioned that in the interview as well, that Kubernetes over time has gathered up a lot of these functionalities that were outside of it but kind of didn't make sense outside of it over time.

So it's definitely expanded to some degree, but I think that, going forward, it'll continue to be a big thing around like the things Kubernetes interacts with. That is often really the hard part of Kubernetes. The stuff within Kubernetes-- if it has the functionality that you need, that's great, and there's a lot that it does. But at some point, you go beyond the realm of Kubernetes itself, and you need some additional tools to help you out. And so Cilium is kind of focused in that space, which makes a lot of sense.

ABDEL SGHIOUAR: Yeah. Before advocacy, when I was in consulting, Cilium was the number-one tool that customers were asking about all the time, way before we even introduced it to GKE in the form of Dataplane Version 2.

KASLIN FIELDS: Interesting.

ABDEL SGHIOUAR: So way before that was the case, people were asking us, can we actually buy Cilium enterprise or buy the actual enterprise support from Isovalent and just run it ourselves? And we do support it-- if you want to run it, you can-- because people wanted those extra functionalities.

KASLIN FIELDS: Very interesting. And this kind of extension of Kubernetes: you get to a certain point in Kubernetes, and you're like, I need these things, and Kubernetes doesn't provide them. I used to talk about observability and a few security things, like mTLS and stuff. These are the kinds of things where I would say, here are service meshes, and they kind of fill these gaps. And you talked about the relationship between Cilium and service mesh a bit.

And I'm not sure that I totally understood it, so being new to Cilium, which-- now I'm like, how have I avoided Cilium for this long? And I need to fix that. But so it sounds like Cilium is kind of service mesh-adjacent. Does it have its own service mesh thing? What is going on here exactly?

ABDEL SGHIOUAR: So the easiest way to think about it is the following-- it has service mesh, but it's not service mesh that you might know if you think about service mesh in the context of Istio.

KASLIN FIELDS: Service mesh is a broad field, huh?

ABDEL SGHIOUAR: Correct. One of the conversations was about the fact that it's an overloaded term because the term "service mesh" was introduced by Istio based on how Istio was architected to work in the first place. But the term itself is generic, and it can mean whatever you want it to mean.

So when Cilium came into the picture-- and to their credit, they're not trying to compete. They're not saying, oh, we're better than Istio or we can do what Istio does. They're saying, we do service mesh the way we think it should be done, which is pretty cool.

KASLIN FIELDS: Overloaded term.

ABDEL SGHIOUAR: Exactly, exactly.

KASLIN FIELDS: I remember in, like, 2018-ish, when service mesh was the big thing that everyone was talking about, trying to explain it was always a challenge, and it sounds like it's kind of like sidecar. I just put out a blog post on sidecar, which is a term that we use very frequently in the community, but it's just kind of a colloquial term, honestly. It doesn't have a specific meaning in the technology. It's just how we describe the pattern.

ABDEL SGHIOUAR: Correct.

KASLIN FIELDS: And service mesh is kind of similar. It's the pattern, I guess, of extending Kubernetes in--

ABDEL SGHIOUAR: Correct.

KASLIN FIELDS: --really in a broad definition of ways.

ABDEL SGHIOUAR: Yeah, I think really you can think about service mesh as-- again, whether it's based on sidecars or not-- because Cilium is not. It's eBPF-based, so there are technically no sidecars. But you can think about it this way: one way is to take the term "service mesh" and look at the "mesh." What does a "mesh" mean?

It just literally means that all your pods inside the cluster-- and potentially across clusters, in a multicluster setup-- know about each other, and they can route traffic between them all the time. In a non-service mesh scenario, we just use Services, so the ClusterIP: you have that virtual IP that you send requests to, and then it gets translated to a pod IP. In the service mesh world, the pods know where all the other pods are, so that's one way of thinking about it.
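As a rough illustration of the non-mesh path described here, this Python sketch models a Service's virtual IP being rewritten to one concrete backend pod IP, a destination NAT that on a real node is done by kube-proxy rules or by Cilium's eBPF datapath. Everything in it is an assumption for illustration: the `ServiceTable` class and the addresses are hypothetical, the port is kept unchanged although a real Service can map to a different target port, and round robin stands in for whatever backend-selection scheme a real implementation uses.

```python
import itertools


class ServiceTable:
    """Hypothetical table mapping a Service's virtual IP:port to the pod IPs backing it."""

    def __init__(self):
        self._backends = {}  # (virtual_ip, port) -> list of pod IPs
        self._cursors = {}   # (virtual_ip, port) -> round-robin iterator over those pod IPs

    def set_backends(self, virtual_ip, port, pod_ips):
        key = (virtual_ip, port)
        self._backends[key] = list(pod_ips)
        self._cursors[key] = itertools.cycle(self._backends[key])

    def translate(self, dst_ip, dst_port):
        """Rewrite a destination that is a Service address to one concrete pod IP."""
        key = (dst_ip, dst_port)
        if key not in self._backends:
            return dst_ip, dst_port  # not a Service address: leave the packet untouched
        if not self._backends[key]:
            raise LookupError(f"no ready endpoints for {dst_ip}:{dst_port}")
        return next(self._cursors[key]), dst_port


table = ServiceTable()
table.set_backends("10.96.0.10", 80, ["10.0.1.5", "10.0.2.7"])
print(table.translate("10.96.0.10", 80))  # ('10.0.1.5', 80)
print(table.translate("10.96.0.10", 80))  # ('10.0.2.7', 80)
print(table.translate("10.0.3.9", 443))   # ('10.0.3.9', 443)
```

The client only ever knows the virtual IP; the knowledge of where the pods actually are lives in the translation table, which is the contrast Abdel draws with a mesh where workloads know about each other directly.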

And then the second way is that Kubernetes networking is pretty dumb. It's pretty basic. It's IP-based, so it's IP and port. Service mesh adds functionality for observability and for more fine-grained network controls based on identities instead of just a basic firewall, and stuff like that. So yeah, again, I think we probably need to do a whole episode just talking about service mesh.
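The difference between an IP-and-port firewall and identity-based controls shows up as soon as a pod is rescheduled and gets a new IP. This hypothetical Python sketch allows the same frontend-to-backend flow both ways: the IP rule silently stops matching after the move, while the label-based identity rule keeps working. The `Pod` type, the rule sets, the labels, and the addresses are all made up for illustration, and this is not how Cilium or any mesh actually implements identity (Cilium, for example, derives security identities from pod labels and enforces them in its eBPF datapath).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict


def ip_rule_allows(allowed_flows, src: Pod, dst: Pod, port: int) -> bool:
    # Classic firewall: match only on (source IP, destination IP, port).
    return (src.ip, dst.ip, port) in allowed_flows


def identity_rule_allows(allowed_identities, src: Pod, dst: Pod, port: int) -> bool:
    # Identity-based: match on what the workloads are (their "app" label), not where they run.
    return (src.labels.get("app"), dst.labels.get("app"), port) in allowed_identities


frontend = Pod("frontend-abc", "10.0.1.5", {"app": "frontend"})
backend = Pod("backend-xyz", "10.0.2.7", {"app": "backend"})

ip_rules = {("10.0.1.5", "10.0.2.7", 8080)}
identity_rules = {("frontend", "backend", 8080)}

print(ip_rule_allows(ip_rules, frontend, backend, 8080))              # True
print(identity_rule_allows(identity_rules, frontend, backend, 8080))  # True

# The backend pod is rescheduled to another node and receives a new IP.
backend = Pod("backend-def", "10.0.3.9", {"app": "backend"})

print(ip_rule_allows(ip_rules, frontend, backend, 8080))              # False
print(identity_rule_allows(identity_rules, frontend, backend, 8080))  # True
```

Because pod IPs in Kubernetes are short-lived, rules keyed on workload identity survive churn that would otherwise require constantly rewriting IP-based rules.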

KASLIN FIELDS: Yeah, there have been several on the podcast before, but we should probably update some of that because it's been a while. I'd say that service mesh is not as big of a hot topic as it used to be, but it's definitely still around. And people are definitely still using it a lot.

ABDEL SGHIOUAR: Yeah, we should do some service mesh in 2024.

KASLIN FIELDS: That would be interesting, yeah. I have some ideas for that. We'll talk about that later. But--

ABDEL SGHIOUAR: Yeah, sure.

KASLIN FIELDS: --a couple more things quickly here. One thing that I thought was cool was that Cilium essentially works by using DaemonSets in Kubernetes.

ABDEL SGHIOUAR: Yeah, for the agent.

KASLIN FIELDS: Yeah, it's, I think-- I kind of hesitate to use this term, but kind of a best practice in Kubernetes. A lot of the time, folks want to run things on the nodes in Kubernetes, and they run them directly on the node to have root access to it-- especially the kinds of things that Cilium does, the kind of system-level observability into your applications.

And they'll run that directly on the node, which I tend to see as an anti-pattern because you don't want to be messing around with the nodes of the Kubernetes cluster. As much as you can, you want to stay within Kubernetes. So I think it's really cool that Cilium is eBPF, so it's really at the kernel level, but it's still staying within the abstraction of Kubernetes as well.

ABDEL SGHIOUAR: Yeah, yeah, that's very interesting. I had to go after the interview and do some snooping around the documentation to understand the architecture and stuff, but yeah, that's a very valid observation. It's interesting that they do the agent-- it's not only the agent for Cilium itself. It's even the agent for Hubble, which is the observability stack of Cilium. That also runs as a DaemonSet. I have tried this on GKE before. It works, actually. It's pretty cool. It gives you this like nice interface, and it tells you what's talking to what and the amount of traffic and all those things.

KASLIN FIELDS: I don't know if it works on GKE Autopilot. I know at the beginning of GKE Autopilot--

ABDEL SGHIOUAR: Ah, that's a valid point.

KASLIN FIELDS: --it did not fully work because Autopilot puts in several security barriers and forces you into best practices. And one of the best practices that Autopilot tries to force you into is keeping things at the Kubernetes abstraction layer, and so there are some things that are closer to the operating system, things that would require root access or access to your nodes that could be dangerous, that Autopilot doesn't allow you to do. And eBPF is one of those things that kind of skirts that line.

Bill talked a little bit about some of the security features of eBPF to protect your kernel, but one of the biggest complaints that I hear-- reasons not to use eBPF that I hear-- is concerns about security, because anything that runs things at the kernel level is, of course, going to be a little bit scary security-wise.

ABDEL SGHIOUAR: Yeah, of course. Yeah, so I tried Hubble on Dataplane Version 2, which is Cilium-based, but on Standard mode, not Autopilot mode. So I don't know if it works on Autopilot mode. I would assume not because I think the way Autopilot works is that we have special partnerships with companies that have tools that-- they call them security partners or something.

Certain things that require privileged containers work, but we have to explicitly allow them. It's kind of the opposite of how Standard mode works. I don't know about Cilium. I think we'll have to do some digging to figure that out.

KASLIN FIELDS: Yeah, that's always kind of the tradeoff is, how much abstraction do you want versus how much control do you want?

ABDEL SGHIOUAR: Yeah, correct.

KASLIN FIELDS: Autopilot is one of those abstraction layers that takes away a bit more of that control, but it makes things a little bit easier to do and to do right, theoretically. So they try to put in some good guidelines, but Cilium will be one of those interesting cases that's like-- we'll have to look into it.

Last thing-- from the beginning of the interview, what I learned about eBPF is this: the way that I would likely describe it to folks is that it's a good way to be able to run things at the kernel level without having to get them into open source. I thought that was a really cool use case that Bill described, and I feel like that's kind of the core of what it allows you to accomplish. What do you think, Abdel?

ABDEL SGHIOUAR: Yeah, I think that's a very good way of putting it. I remember that part from the conversation. Yeah, if you want to extend Linux functionalities, it will take you ages to get that accepted by the community and contributed upstream, and eBPF allows you to do that relatively quickly. So that's a very good way of describing it.

KASLIN FIELDS: I wonder if Linux maintainers have seen any impact on their workload from the introduction of eBPF.

ABDEL SGHIOUAR: I guess to a large extent we can draw a lot of analogies between how Kubernetes works and how the Linux kernel works, in the sense that both Kubernetes and Linux are trying to keep things under control, and they don't want to just start accepting every sort of customization people ask for. So what they do is build an abstraction layer. They say, as long as your thing talks this API, we don't care.

And I think Linux works a little bit like that in the sense that here is eBPF. You can extend it by yourself and write code inside this thing. It's your problem. We're just going to give you the API, and that's it.

KASLIN FIELDS: Yep. There have been a lot of parallels drawn between Kubernetes itself and the Linux Kernel itself over time.

ABDEL SGHIOUAR: They're the two biggest open source projects, right?

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: Two very successful open source projects, so.

KASLIN FIELDS: Yeah, so enormous.

ABDEL SGHIOUAR: Exactly.

KASLIN FIELDS: Anyway, thank you very much, Abdel.

ABDEL SGHIOUAR: Thank you.

KASLIN FIELDS: Yeah. We'll see you all next time.

That brings us to the end of another episode. If you enjoyed this show, please help us spread the word, and tell a friend. If you have any feedback for us, you can find us on social media at kubernetespod or reach us by email at <kubernetespodcast@google.com>.

You can also check out the website at kubernetespodcast.com, where you'll find transcripts and show notes and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]
