Kubernetes Podcast from Google: Episode 198 - Breaking Kubernetes For Fun and Profit, with David Flanagan

#198 March 27, 2023

Breaking Kubernetes For Fun and Profit, with David Flanagan

Hosts: Abdel Sghiouar, Kaslin Fields

David Flanagan is a developer, educator and technology enthusiast with a special interest in Kubernetes and Cloud Native technologies. David is the founder of Rawkode Academy, an online platform that aims to teach Kubernetes to developers.

One of the popular shows on Rawkode is Klustered, where David invites people to fix broken Kubernetes clusters, learn a thing or two, and have a laugh.

Do you have something cool to share? Some questions? Let us know:

News of the week

Istio Ambient Mesh merged into the main branch

Kubernetes 1.27 changes and removals

k8s.gcr.io to registry.k8s.io redirect

Preview support for pod sandbox on Azure Kubernetes Services

Kata Containers

Docker apologizes for its handling of the Free Team deprecation

Schedule for CNCF-hosted and colocated events is up

Kubernetes WithOut Kubelet

CrowdStrike Discovers First-Ever Dero Cryptojacking Campaign Targeting Kubernetes

Links from the interview

David Flanagan

How Spotify Accidentally Deleted All its Kube Clusters with No User Impact - David Xia

You probably DON’T need a service mesh

Klustered episode with Abdel and Marek

Docker first release at PyCon 2013

KubeHuddle 2023 Toronto

Kubernetes Failure Stories

Kubelet runOnce flag

Events worth checking

DeveloperWeek Europe 2023 (April 26-27) (Link gives 100€ discount)

Transcript


ABDEL SGHIOUAR: Hi, and welcome to the "Kubernetes Podcast" from Google. I'm Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: This week, we spoke to David Flanagan, the founder of Rawkode Academy. We discussed David's background and career, and current work at Rawkode.

KASLIN FIELDS: We also spoke about "Klustered," a show where David invites people to fix broken Kubernetes clusters.

ABDEL SGHIOUAR: But before, let's get to the news.

[MUSIC PLAYING]

KASLIN FIELDS: The Istio ambient mesh codebase was merged into Istio's main branch. The new data plane mode without sidecars was introduced as an experimental branch in 2022. This is a significant milestone for ambient mesh, paving the way to releasing ambient in Istio 1.18.

ABDEL SGHIOUAR: The Kubernetes Project published the list of API removals and major changes coming to 1.27. The list of changes is long, so we will leave a link to the article in the show notes.

KASLIN FIELDS: Starting the week of March 20, traffic toward k8s.gcr.io is being redirected to registry.k8s.io. New Kubernetes images will only be published to the new registry. If you own anything that references k8s.gcr.io, you should update those references to registry.k8s.io. Check the show notes for more details.
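The reference update described in that news item can be sketched as a small helper. This is a minimal illustration, not an official migration tool: the function name and the sample image references are hypothetical, and it assumes the registry host is the first path segment of the image reference.

```python
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def migrate_image_reference(image: str) -> str:
    """Swap the legacy registry host for the new one, leaving other images untouched."""
    # Split off the first path segment, which is assumed to be the registry host.
    host, sep, rest = image.partition("/")
    if sep and host == OLD_REGISTRY:
        return f"{NEW_REGISTRY}/{rest}"
    # Anything else (other registries, bare image names) is returned unchanged.
    return image


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    for ref in ["k8s.gcr.io/pause:3.9", "docker.io/library/nginx:1.25"]:
        print(ref, "->", migrate_image_reference(ref))
```

Matching on the whole host segment, rather than doing a plain string replace, avoids rewriting an unrelated host that merely starts with the same characters.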

ABDEL SGHIOUAR: Microsoft announced a preview release of Pod Sandboxing in AKS-- Azure Kubernetes Service. The sandboxing feature leverages Kata Containers to provide hypervisor-based isolation for pods.

KASLIN FIELDS: Docker announced and has since apologized because they are sunsetting Free Team subscriptions. The announcement spilled a lot of ink on the internet, with many open source maintainers expressing unhappiness with how the communication was handled. Docker has since published a blog post in which they clarified the impact of this change. If you are in the 2% of Docker users impacted, you should check the link in the show notes for more details.

ABDEL SGHIOUAR: The schedule for CNCF-hosted and co-located events for KubeCon Europe 2023 in Amsterdam has been published. We are also hosting a Container Day at KubeCon at our office in Amsterdam. We will leave links in the show notes for details.

KASLIN FIELDS: The Kubernetes community introduced KWOK. KWOK stands for Kubernetes WithOut Kubelet. This new tool allows you to create clusters with thousands of nodes in seconds. It's built for various use cases, like learning, development, or testing, and works with any Kubernetes API-compliant tool, like Helm and kubectl. Check out more information in the show notes.

ABDEL SGHIOUAR: In security news, CrowdStrike said they started to observe the first ever Dero cryptojacking operation targeting Kubernetes clusters. Dero is a decentralized cryptocurrency aiming to compete with Monero. The attack targets clusters with anonymous API access enabled and non-standard ports. For more info on this and how to protect your clusters, check the show notes.

KASLIN FIELDS: And that's the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Hi, everyone. This is Abdel with the "Kubernetes Podcast" by Google. And today, I have the pleasure to interview David.

David is a developer, educator, and technology enthusiast with a special interest for Kubernetes and cloud-native technologies. He is the founder of Rawkode Academy, an online platform aiming at teaching Kubernetes to developers. And he's also involved in a lot of things.

And we will get a discussion. We'll get an opportunity to discuss with him about everything he's doing. Welcome to the show, David.

DAVID FLANAGAN: Yeah, awesome. I'm really excited to be here. Thank you for having me.

ABDEL SGHIOUAR: Thank you for being with us. We decided to title the show "Breaking Kubernetes for Profit and Fun, or for Fun and Profits." That's because I had the pleasure to be with you on "Klustered," one of your live streams, right? And we will get to talk about that a little bit, but I thought it was a lot of fun.

DAVID FLANAGAN: Yeah, I think there's something very interesting about having something that is very, very much intentionally, maliciously broken and trying to work backwards from it. But like you said, we'll talk about that very, very soon.

ABDEL SGHIOUAR: Yeah, yeah. So all right, let's get started then with the question we ask everybody on the show. Can you introduce yourself?

DAVID FLANAGAN: Definitely. Of course. So my name is David. My accent is mostly Scottish, although I've been working remotely and internationally now for so long that sometimes, my wife will say to me, what even is that voice? So I'm not sure what my accent is anymore, but I am Scottish, and I'm still based in Scotland.

I've been a developer for around 20 years now. I started off in the early 2000s working in Perl. And I worked with everything in between, moving into DevOps in my mid-career, Kubernetes six years ago, and developer advocacy for the last five or six years. I really enjoy teaching.

One of the things I found out earlier in my career is that dev tooling is more interesting to me than product. And helping other people be successful is more important to me than selling that product. So being able to be a developer advocate in the DevOps, and cloud-native, and Kubernetes space is just like the perfect job for me. And I'm very lucky that for the last six months, I've been able to do that for myself with my own company, the Rawkode Academy.

ABDEL SGHIOUAR: Nice. I actually appreciate, and I have a lot of respect to people who have patience for teaching people.

[LAUGHTER]

It's true that's what we do in developer advocacy is mostly teaching, but I think you are taking more of an educational aspect to it, which is quite interesting. I tried being an actual teacher, and I just hated every single moment of it.

DAVID FLANAGAN: Yeah. I don't know if I'd be good as an actual teacher, but I think developer advocacy is one of those really sweet positions, right? We get to be part marketing, part support, part developer, part sales.

It's just this really wonderful role that encompasses everything. And you get to inspire, and educate, and support people all at the same time with really cool technologies. It's been an absolutely wild ride doing DevRel, but I love it.

ABDEL SGHIOUAR: Nice, nice. Let me go back a little bit to your background because you have been around, you have been doing stuff for a very long time. You have a pretty impressive background, but you started as a developer, right?

DAVID FLANAGAN: Yeah, that's right. In the early 2000s, I left school and was working at a bank scanning paper on OS/2 machines. It was not every bit as wonderful as I sometimes try to remember it.

And I started going to college. And I think what I found out early for me is I'm really just not good in standard schooling education. It didn't work for me. And in fact, I failed standard grade computing.

I was terrible at it. But what got me into computers and working well with computers was there was one summer I got into Telnet Talkers and MUDs. I don't know if you remember these, but these were really old-school text-based chat rooms.

Telnet Talkers ran on the Telnet protocol, where you'd just say hello. And someone else would respond and say hello back. And it was amazing.

And then I got into MUDs, which were like RPGs, but text-based on the same protocol. So you would say enter room. And it'd give you a description, and you would say, oh, pick up shard, and other weird stuff like this.

And then I got really curious about-- I know you're wondering, where's he going with this weird story about old-school chat rooms? But I wanted to make changes to one. I said I wonder how you do this? And I just got the source code.

This was open source, although we didn't call it open source back then, but anyone old enough to remember-- there was Freshmeat, and there was-- what was the one with all the source? I'll come back to it a bit later. But we had open source, but it wasn't the same as it is today.

And I downloaded this Telnet Talker code base. It was all written in C. It was pretty gnarly. There was no such thing as tests because we didn't write tests 20 years ago.

ABDEL SGHIOUAR: Of course.

DAVID FLANAGAN: And I started just reverse engineering it. And then that's when I realized that me sat in a classroom with 30 other people and somebody talking at me for an hour was never going to capture my attention. I was never going to learn that way.

I very much learn through doing, through experimenting, through breaking. I've always had this recurrent theme of I'll take something that doesn't work the way I want it to and make it work the way I want it to do. And I got really good at programming in C over that summer holiday.

So I ended up going to college. I didn't last very long. I dropped back out.

And just through sheer luck, I started applying for developer jobs while working in this bank. And eventually, a company took a chance on me. And actually, I spoke with Adam, who was the person that hired me at the time.

I was like, why did you give me a chance back then? I said I knew Perl, came to the interview. They asked me the most basic Perl question.

And I remember it vividly. What does the my keyword do? And I just bluffed it and got it wrong, but they still gave me the job.

And he told me many years later, oh, you were talking about the computers you had in your house. And I had six at a time. And I was talking about reverse engineering Telnet things. And he's just like, you took a leap of faith.

And I owe a great deal of gratitude to Adam for that because I've been a developer ever since. And it was a really wonderful job, too. It was a very small startup in Scotland that is still a company today, in fact.

I think most horse racing tracks in the UK still use the system that I helped build, the ticketing system, which is wonderful. But because it was a small company, I was writing C code for the backend daemon. I was writing C# for these handheld devices to scan tickets at the entryway to the racecourse.

I was working on a CRM system that was all written in PHP, and then the bespoke reporting system that they primarily focused on written in Perl. So I got to work on all this wonderful stuff even to the point where we had to build an EPOS system, like a touch screen point of sale system.

And I was like, hey, I know enough about Linux. And I started experimenting with Qt. And I wrote a UI. And I had to modify the kernel to have a shiny image when it boots up with the company's logo.

And all of this-- there was no instructions back then. We didn't have Twitter. We didn't have GitHub documentation. We didn't have YouTube for tutorials.

You learned by man pages, and breaking, and experimenting, and just hoping that eventually you would work it out. So yeah, great first role for me. I learned a lot. And I just took that experience into every other position, eventually working for a company called TeamRock, where I actually was their director of development, which is a big fancy title for just writing too much code and helping other people do it.

But they were-- I'll try to pivot this towards Kubernetes now for you, right? But they were an old school media company. They owned magazines-- "Metal Hammer" magazine, "Classic Rock" magazine, "Prog" rock magazine. I'm a complete metal head, so it was a dream job for me.

And we were trying to turn this old-school media company into a 21st century digital transformation project. So we got all their magazines online. And we had this TeamRock homepage that had all the stories with interviews with bands and everything, but we also took that as an opportunity to do a cloud migration.

So this was the first time this company had ever been on AWS. And we were doing virtual machines, and learning all the tooling that had to go with that, like Puppet, and Chef, and SaltStack, and Ansible. And it was crazy, but very good for experimentation.

ABDEL SGHIOUAR: Nice.

DAVID FLANAGAN: Their biggest challenge was scale. Now, let me try and summarize their challenge. Most websites have very predictable scale.

You can see this website has good traffic between 9:00 AM and 10:00 AM because people want to get to work. And then they go read something. This website has the same scale because it's shopping. They scale up at Black Friday, they shut back down, done.

The news world is really different, and particularly niche news, like metal news and rock news because a new album dropping can cause your website to go from 200 requests per second to 20,000 requests per second. And that's quite a large differential rate. You have to react really quickly.

Sometimes, album release dates can be predicted because you know when they're coming out. And maybe you know when the interview is coming. And you can scale ahead.

But the other thing-- and our doomsday scenario that we always try to plan for-- was, well, what if Lemmy from Motorhead were to die? And unfortunately, he did die. And we had to scale up for this, but that news is completely unpredictable.

That's where our site goes from two VMs or four VMs or eight VMs to even hundreds of VMs very, very quickly. And that was really hard to do for us, for infrastructure. We were, for a while, baking golden images because if you try to scale up a VM, and use cloud-init, and provision everything just in time, it can be very slow.

And then you're scaling. You have to scale over the odds, and then cut back very quickly. Golden images were like, OK. At least we can start giving people 200 response codes a lot quicker.

But really, the problem was VM technology. And I was very, very lucky to be at a small Python conference when Solomon presented Docker. And I was like, whoa.

ABDEL SGHIOUAR: Oh, wow.

DAVID FLANAGAN: This is cool. So this was very early Docker. I think we shipped our first Docker container in production at Docker 0.4, 0.5.

ABDEL SGHIOUAR: OK.

DAVID FLANAGAN: The Dockerfile had just been released. Before that, there was no Dockerfile. It was tarballs all the way, but the Dockerfile just came out.

We started to build in containers. We were deploying them to the VMs. And things were great. We were able to scale up and handle much more traffic than we ever imagined.

In fact, a funny story is that our biggest scaling point or [? talent ?] after that was AWS because when you ask for an ELB, actually, they provision it based on your previous scale. So if you're trying to answer unpredicted scale or unprecedented scale, you have to phone them and say, hey, scale this up for me, which is obviously not going to work for us.

ABDEL SGHIOUAR: That's how cloud is supposed to work, right?

DAVID FLANAGAN: No, exactly. We're promised unlimited scale. And it doesn't quite work out that way.

ABDEL SGHIOUAR: Yeah, we had a couple of discussions actually with some guests about this. There is this myth in cloud that people think it's unlimited, but it's unlimited until you hit a limit, basically. And you have to call somebody.

Well, that's actually pretty impressive. You jumped a little bit ahead because I had a question about that TeamRock experience you had there. And you summarized this beautifully.

When you are working in the news world, you cannot predict what's going to happen. And especially the death of a famous person, right? And you were an early adopter of Docker, 2014. That's when Docker actually was announced the first time at PyCon, I think, in the US, if I'm not mistaken.

DAVID FLANAGAN: It was announced in 2013. And then we got our first container into production probably middle of 2014 after a lot of experimentation and just trying to make sure we could do what we had to do with it.

ABDEL SGHIOUAR: Nice.

DAVID FLANAGAN: But yeah, it worked well.

ABDEL SGHIOUAR: Nice. So then let me ask you a question because there are a bunch of people that will be listening to this show that come to the show with a preconceived notion that we're going to talk Kubernetes and Docker and stuff. And there are people who are starting their careers in this world of cloud-native and Kubernetes, but you were there before Kubernetes existed and before Docker existed.

And I am pretty much the same. I have the same experience. I actually did VM-based stuff and had to deal with scaling and everything. So does it feel to you that as an industry, generally speaking, we are just creating abstraction layers to try to make our life easier, but then once you have more abstraction layers, you have more problems? Or do you think stuff like Docker are really transformational in the way people think they are?

DAVID FLANAGAN: I think Docker is definitely transformational. I don't think Kubernetes is.

ABDEL SGHIOUAR: Hot take.

DAVID FLANAGAN: I think it's a good set of tooling on top of Docker, but I think Docker was transformational. We had cgroups, and we had the concept of a container before Docker, but Docker is the one that got it into people's hands. It's the one that brought a good and strong developer experience.

Just look at a Dockerfile. Anybody can pick up a Dockerfile and piece it together. They know what it's doing. There's quirks, right?

Oh, why are we chaining commands together? Why does change ownership have this weird flag syntax on a copy? Yeah, these all have reasons.

And if you've ever built a 12-gig Docker image, you know why the change ownership flag is there rather than doing that chown itself in a RUN. But Docker just did change everything, especially if you worked with VMs. Even before Vagrant, right?
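The point about the ownership flag can be illustrated with a deliberately simplified model of image layers. It is a sketch under assumptions, not a description of Docker internals: it assumes every file whose ownership changes in a later step is stored again in full in that step's layer, and the `Layer` class, helper names, paths, and the 6 GB payload are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    instruction: str  # the Dockerfile instruction that produced the layer
    size_gb: float    # data stored in this layer (simplified model)


def image_size_gb(layers: List[Layer]) -> float:
    """Total image size is modelled as the sum of all layer sizes."""
    return sum(layer.size_gb for layer in layers)


def copy_then_chown(payload_gb: float) -> List[Layer]:
    """COPY the files, then change ownership in a separate RUN step."""
    return [
        Layer("COPY app/ /app", payload_gb),
        # Assumption: changing ownership rewrites every file into a new layer.
        Layer("RUN chown -R app:app /app", payload_gb),
    ]


def copy_with_chown_flag(payload_gb: float) -> List[Layer]:
    """Set ownership while copying, so the files are stored once."""
    return [Layer("COPY --chown=app:app app/ /app", payload_gb)]


if __name__ == "__main__":
    print("separate chown:", image_size_gb(copy_then_chown(6.0)), "GB")
    print("--chown flag:  ", image_size_gb(copy_with_chown_flag(6.0)), "GB")
```

Under this model, a 6 GB payload becomes a 12 GB image when ownership is changed in its own step, and stays at 6 GB when the flag is used on the copy.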

Vagrant actually changed a lot as well. Building and working with virtual machines was really, really painful. Just because we're talking about transformational tools, and Docker is definitely up there for me, another one is Puppet, right?

ABDEL SGHIOUAR: Yeah.

DAVID FLANAGAN: I worked in bare metal from 2003, 2002. And our idea of rolling out a new machine to a new racecourse was for me to drive to the racecourse, and sit there with a keyboard, and run a bunch of scripts that I maybe brought on a USB key. But then there's changes from the last time that hadn't run yet.

So then I'd have to make those ad hoc, and then run the commands, hope I got a working system. But there was no real concept of automation there, other than bash scripting itself. And then when something went wrong, I was the one that got woken up at 3 o'clock in the morning, not to open my laptop and fix something, but to jump in my car and drive seven hours down to the south of England to the racecourse, get on-site at 4:00 AM, 7:00 AM, whatever time I got there, and then sit in front of a computer again and fix it, and then drive home.

It was not pleasant. So I adopted Puppet really, really early as well. And it's just because you see something. You understand the problems. And then it just immediately resonates with you. And I love technology like that.

ABDEL SGHIOUAR: Nice.

DAVID FLANAGAN: Puppet was great. We automated our infrastructure. I could wake up at 3:00 AM in the morning, delete log files from anywhere and go back to sleep. And Docker helped us handle scale that we generally just didn't know how to answer for.

And Kubernetes brings us other stuff-- not transformational, but Kubernetes is also a project that I've been heavily involved in for a number of years. And it's changing the way that we build platforms, changing the way that we scale our infrastructure, and just yeah, really, really cool.

ABDEL SGHIOUAR: Nice. Man, you're talking. And I am just remembering stuff. You're bringing me back years-- specifically, the stuff about bare metal.

I remember very well walking into a data center, opening the rack, and putting a KVM, and having to decide which server do you want to connect to. You have to select the right one with the right combination of keys. And this is a funny story.

I remember we had a customer in one of the companies that I was working for. They had their rack with a KVM. And the password to the database was a Post-It note inside the KVM.

DAVID FLANAGAN: Hey, we've all done that, right? Come on.

ABDEL SGHIOUAR: Yes.

DAVID FLANAGAN: Of course.

ABDEL SGHIOUAR: Actually, the last episode with Emily, we were talking specifically about this, and about like the online password managers that get hacked every other week. And Emily said that in some Slack channel she is on, she said that some people wrote-- I cannot believe I'm recommending this, but probably writing stuff on a Post-It note under your keyboard is more safe than having it online.

[LAUGHTER]

So it's quite interesting how we're even shifting our recommendations to how people should do things, right?

DAVID FLANAGAN: Yeah. It's probably safer to have all your passwords in a book or a Post-It than LastPass these days, right?

ABDEL SGHIOUAR: Yeah, well, that's one of them, right? All right. So you've been doing a lot of bare metal stuff. You worked also in a bare metal-focused company for a while-- Equinix, right?

DAVID FLANAGAN: I did. So I did TeamRock for five years. That was wonderful.

It really got me into cloud-native and Kubernetes because we had that level of scale, or at least unpredictable scale that required a lot more experimentation. I can look back at my career. I've always been experimenting, so even if I was in a simple scale, maybe I'd still be playing with Docker and Kubernetes. I'm not sure.

But I did consultancy for a while after that. And I did a lot of public speaking at local user groups in Scotland, and moving up and down to London. And I thought-- no, I hated it at first. I think we all go through this journey, right?

I'm sure you've done enough talks, but the first one you hate, you're never going to do it again. The second one you hate, you're never going to do it again. And then at some point, you start to enjoy it.

And there's a whole journey that you go-- I still get really nervous. I still hate it, but it's just so rewarding as well, being able to take experience and knowledge and distill it in a way that other people can learn from it. And that's when I moved on to DevRel.

I did two years at InfluxData working with InfluxDB, did some time at Equinix Metal, more recently Pulumi. And I just went full-time in doing it for myself, but it was great going to Equinix Metal and being back working with bare metal, but with cloud best practices as well. And I still think Equinix Metal's a great product, and I use it as much as I can. But being able to do a Terraform or a Pulumi and get a real server with real cores and real RAM is just awesome.

ABDEL SGHIOUAR: Yeah. I tried Equinix Metal. It was not called-- it was a company that Equinix bought at some point.

DAVID FLANAGAN: Packet, yeah.

ABDEL SGHIOUAR: Yes. Packet Labs, I think it was called, or something like that.

DAVID FLANAGAN: Yep.

ABDEL SGHIOUAR: And I tried it around the time when Anthos was released because it was the easiest way you could get hard bare metal servers that you can try things on. And that's pretty much what multicloud is all about, or at least what Anthos is all about. And I was pretty impressed, actually, by how good it was. It was a really good product because at the time-- and this is not very-- it's probably a couple of years ago. But even a couple of years ago, I think that they were the only one doing bare metal with that level of automation, right?

DAVID FLANAGAN: Yeah, their automation is now spun out to be a CNCF project of its own called Tinkerbell. So anybody can really go and take that stack. If you know enough about bare metal, it's not that difficult. It's like a DHCP server, an iPXE server, and some automation all glued together that does really magic things.

But I really strongly believe that hybrid architectures, a mixture of bare metal and virtualization, are really important for people at a certain level of scale. You can do really well with virtualization on its own at the start. But at some point, you're going to need to bring in bare metal because you need the bigger disks.

You need the IOPS. You need the machine learning models. Whatever that use case is, there's definitely enough of them out there.

ABDEL SGHIOUAR: Yeah. Yeah, yeah, yeah. When I tried this product, it was pretty cool. And then you talked a little bit about your journey towards cloud-native and Kubernetes.

And then a few years back, you started Rawkode. And the first time, actually, we talked was when you invited me to "Klustered." So can we talk a little bit? What is "Klustered?" Can you explain to people what it is?

DAVID FLANAGAN: "Klustered" is a masochistic game show live stream that runs on my YouTube channel. And the entire premise started off with me spinning up some bare metal Kubernetes clusters, and giving them to people, and saying break them however you want. As long as the machine turns on, anything goes.

And the whole mission and plan was we would sit and try and get those Kubernetes clusters working again. So I had a little bit of hubris or overconfidence in my Kubernetes operating skills. Haven't done it for a number of years, but it's just been so much fun because people are cruel.

That's the number 1 lesson is that people can be very, very cruel when you give them a cluster and tell them to break it. But it's also through trial and tribulation. And it's through failure, right?

The only way you ever learn anything is through mistakes, through failure. Otherwise, you've stopped learning. And "Klustered" is a unique environment where you're forcing the situation of going into the unknown, and not just fixing the problems and learning to see what's broken, but getting into someone's head and seeing how they debug. Like you, right? You are so calm and--

ABDEL SGHIOUAR: Methodological, yeah.

DAVID FLANAGAN: Yeah, yeah, yeah. That's the word. Let's go with that one.

Just the way that you step through the process-- and we've had other people that have been more of the chaotic good, where it's just, ah! We're going to go and do this. We're going to do this.

And then you get the people that are more calm. But getting into that mindset, seeing how people approach problems, how they work backwards, the tools that they use has just been the-- I have learned more from doing "Klustered" than I've done from anything else in my entire career. And that was me going into Kubernetes, as someone who knew Kubernetes very well, but I'm coming out of it 50, 60 episodes later, and over 100 broken clusters with the knowledge of those 50, 60 guests that joined me, or at least some of their knowledge. And that's just amazing to me.

ABDEL SGHIOUAR: Yeah. Yeah, there is a certain level of experience you gain when you do hardware. This is at least was my experience because I'm coming from the hardware background, right?

And when you do a lot of troubleshooting and diagnostics, which I bet you had to do back in the days of TeamRock, and migrating from virtual machines to containers, et cetera, you build in a certain-- I don't know if the word is correct, but you build the logic that you can use. You have methodology in your head. And also, when you have an understanding of the entire stack, and not only the application layer, right?

You know the network. You know the blah, blah. Then you can actually go through those layers and try to fix, or try to see at each layer what could be broken.

DAVID FLANAGAN: Yeah. The cloud adoption-- if we look at it, we've had AWS Cloud for what? Since 2008?

ABDEL SGHIOUAR: Yeah.

DAVID FLANAGAN: That's a long time-- 15 years. And there's now people that are seasoned developers with 5, or 10 years, or 15 years experience who have only ever known cloud. That's amazing for them, right? Fantastic. Do they know what BGP is?

ABDEL SGHIOUAR: Oh, yeah. Probably not.

DAVID FLANAGAN: Probably not. Do they need to know? Maybe, right? It depends.

So yeah. Some things you learn. Some things you have to be shown. And "Klustered" gives us a little bit of both, I think. I remember in your episode, actually, you were in an episode with Marek, right?

ABDEL SGHIOUAR: Yes, yes, yes.

DAVID FLANAGAN: Marek, yeah.

ABDEL SGHIOUAR: Yes.

DAVID FLANAGAN: That was the episode where I had no idea that this flag existed on the kubelet, but the runOnce flag--

ABDEL SGHIOUAR: Yes, I do remember that.

DAVID FLANAGAN: Yeah. And I remember this because I spent a lot of time going through the GitHub repository to find the history of why the flag even existed. And to me, it's these little nuggets of knowledge that I just think are so much fun because now I get to go on a stage and tell people all about "Klustered," and then all these little bits of knowledge, too. And the runOnce flag was actually added to the kubelet so that we could have just a kubelet CLI on a host with the static manifest directory, start it up once, and then move the kubelet into the container itself.

ABDEL SGHIOUAR: Yes, yes. I remember that very well. I was also surprised to learn. Because when you know Kubernetes, you know that it's designed to just keep running containers when they fail. Or if a pod crashes, it should just restart.

And it was a big surprise when I saw this flag, which just ran the pod once. And if it crashes, it doesn't do anything. So it was quite interesting.

I remember that in the episode, I gave you a little bit of a hard time with networking. So there was some DNS stuff. And there were some networking things that I have screwed up. I don't remember details.

DAVID FLANAGAN: To be honest, looking back at all of the episodes of "Klustered" now, there are a few things that just strike fear into even the most seasoned developer. And networking is one of them. There's no need anymore for people to really think about what the conntrack table is, but it's still a big part of the networking stack in a Linux machine.

There's not a lot of need for people to understand the difference between iptables and nftables, but they still exist. And depending on the kernel, you have to interact with them slightly differently. eBPF is obviously changing this, but networking scares people.

DNS scares people. etcd scares people. There's common breaks and themes that people-- they know enough to survive and do just this, but they're not really pushing their deep technical knowledge of it. Again, this is why "Klustered" exists, right?

Is "Klustered" there for someone who's never used Kubernetes before to come and learn Kubernetes? No, no, no. Definitely not. "Klustered" exists for the people that have been in the cloud-native and Kubernetes space for two years, for three years, for four years. And they want to push that knowledge deeper and deeper and deeper, and handle and learn from failures.

There's a great website. And we've seen a couple of talks about this at KubeCon over the years. And it's the k8s.af, which is the Kubernetes failures horror stories. I think it's by Henning Jacobs. I'm not sure if I'm remembering his name.

ABDEL SGHIOUAR: We'll find a link and add it to the show.

DAVID FLANAGAN: But he's done talks on their scaling failures. And other people have shared their scaling failures. And it's like, yeah. This is the best way.

Written postmortems is the best way to predict future outages for your organization. And "Klustered" is an evolution of that. Let's make it more engaging, more fun, and get smart people in front of a camera and say, hey, go fix the broken thing. I love that.

ABDEL SGHIOUAR: That's nice. Can I ask you a question. You mentioned something that I relate to earlier. You said people, developers are afraid of networking. Why do you think that's the case?

DAVID FLANAGAN: Because unfortunately, the tooling to debug networking has not-- well, until recently-- improved in the last 20 years. If you have a networking problem, you have to get low level into the kernel looking at conntrack, looking at iptables chain rules.

ABDEL SGHIOUAR: Oh, yeah.

DAVID FLANAGAN: Really, who remembers how to switch the chain, and look at this, and see what's happening? They're just not really skills that stick in your head because they're things that you use so infrequently, typically. Obviously, network engineers are in it day in and day out.

Another networking problem is the tooling: tcpdump. Who likes running tcpdump? Nobody likes running tcpdump.

Fortunately, eBPF again is changing this. eBPF is just an absolute superpower that I try and encourage everybody to get comfortable with because if we look at what Hubble and Cilium are doing for the networking stack, you can visualize your traffic and understand why packets are dropping or being rejected in any layer-- layer 4, layer 7. It works with both, depending on the protocol.

And yeah, it's getting easier. But still, if you put someone on a terminal on a Linux machine and say debug the network, it can be a huge challenge. Because the tooling really just isn't there to support you.

ABDEL SGHIOUAR: Yeah, that's actually a very valid point. I think also, the other reason is because developers are, generally speaking, trained to use abstraction. So when you use a framework, you're essentially abstracting away a bunch of things that you don't want to care about the implementation. And networking definitely cannot be abstracted.

It's a very detailed implementation of a certain way an operating system works. And you have to understand how it works in order to debug it, right? And then when you have to do networking in a distributed environment like Kubernetes, it's even harder because of the nature of how the system works.

DAVID FLANAGAN: Yeah, exactly.

ABDEL SGHIOUAR: Yeah. So we will have a note in the show for people to go watch "Klustered." You told me on Twitter this is coming back in a new format?

DAVID FLANAGAN: It is. Yeah, we're changing things up a little bit. So it's been difficult to find people to join me on "Klustered." So the first thing I'll say is if anybody wants to have some fun fixing a broken Kubernetes cluster on a camera on a live stream, then definitely reach out to me because I would love to have you.

But also, the other thing that people-- let me back up a little bit. We've had now 50, 60 episodes. We've seen a lot, right?

And people are getting a lot more creative with the way that they break the cluster to the point where now, some of the breaks aren't really real-world things that could happen. Now, they're enjoyable to watch. Who doesn't love a remote cron trigger running eBPF probes, dropping all your traffic, right?

But is that going to happen in your production environment? No. Is it funny? Yes.

Am I learning something? Questionable. Right?

So the new version of "Klustered," which we'll be bringing out in just a couple of months, close to KubeCon, is changing it in a way that we have a real production-like environment. Now, traditional "Klustered"-- you've got a Kubernetes cluster with kubeadm in it and a fake workload that I wrote that has no logging that speaks to Postgres, which has no state because it's all done via init containers.

And that worked great for 50, 60 episodes. But we're not using enough of the CNCF tools out there that people are using in production. We want to use Cilium more. We want to use Velero more. We want to use Vitess more.

We want to bring in all of these stateful components and microservices, and actually have networking policies that aren't contrived, so that when people look at the cluster, it looks like their cluster. They have 20 services, 100 services, whatever that might be just to open up new ways for breaks to come in that require more intuition and debugging, rather than just looking for malicious intent, if that makes sense.

ABDEL SGHIOUAR: Yeah, yeah. That's definitely a beautiful way to put it. I would say, I would add to that that if somebody wants to go back on the show, just go look at some postmortems of how people have had issues scaling Kubernetes because then they can learn from that and try to implement it in an actual cluster. It's not the basic stuff that has been done in the-- I wouldn't call it basic, but not the same stuff that people faced in the last 50, 60 episodes, but more, as you said, production-like environments, right?

DAVID FLANAGAN: Exactly, yeah. Some of my favorite talks at KubeCon over the years have been the-- I deleted my production cluster, and I'm going to show you how. Those kind of stories, I think, are great because they show how simple misconfigurations can actually have serious consequences on your production environment.

And the one from some former colleagues of mine at Influx-- they had a GitOps pipeline. And there was a typo in a git diff. And they showed the git diff. And you couldn't really tell that it was a typo just because it was hidden through whitespace changes and a whole bunch of other stuff.

But that was enough to delete an entire namespace on the production cluster, which deleted all the workloads that went with it. So it's so easily done. And the team at Monzo has done talks like this. The team at Skyscanner has done talks like this. I love talks like that.

ABDEL SGHIOUAR: Yeah, I think I agree with you in the sense that those stories about how I broke something, or maybe those stories of how we tried certain [? code ?] that didn't work for us-- you probably don't need the service mesh talk I gave a couple of times. Those are quite interesting because the idea there is you're learning from somebody who has done this before. And then they tell you, yeah, we tried it.

It didn't work. Or we tried it, and we screwed this up this way. So I see value for 101 talks. I see value for people sharing stories, but I do as well like the war stories, basically.

DAVID FLANAGAN: Yeah. They are generally, to me, the best way for anybody to learn, and expand their knowledge, and gain experience from other people without necessarily having to make the same mistakes. I give a talk now at conferences called "What I Learned Fixing all these Broken Clusters."

And the key message of the talk is not here's tips and tricks to help you operate Kubernetes. The key moral of the talk is, well, everybody and everyone that's been on "Klustered" has been comfortable saying I don't know. And actually, that's the true superpower.

And we actually need to encourage people. We have this hero culture in technology. When projects are going wrong and bugs are annoying customers, there's always that one developer in your organization that swoops in, and fixes it, like, wee, they know everything.

But actually, they just know more because they've made more mistakes than everybody else. And we should normalize saying I don't know, and it won't happen again. We all learned through failure.

ABDEL SGHIOUAR: Yeah. The best way to not look like a crazy person is just say I don't know. It's OK to say that. So cool.

So we will add the link to your "Klustered" and to Rawkode because Rawkode is more than just "Klustered." "Klustered" is one show, but you have also other live streams where you teach people other technologies, right?

DAVID FLANAGAN: Yes. We have almost 400 episodes now on the YouTube channel. I've covered pretty much every CNCF project. And the way that the format of my channel works is that I want to experiment with technology, so I reach out to the maintainer and I say, hey, come and join me for 90 minutes.

And I think that format is great because one, you get to see someone new to technology playing with it, which is me. I am very good at playing the dummy because I am a dummy. And then whenever I have mistakes, or I want to know-- I'm a very curious person. Oh, why is this flag called that? Why is the project called this?

Why does it even exist? Then the maintainer's sat there right next to me. And I say, hey, why don't you answer this question? And then we get all this flavor, history, and context about the technology as we demo it as a newcomer. And I think the format works well. And I hope people get a lot of value from seeing all of these many, many, many, many, many technologies in the cloud-native ecosystem.

ABDEL SGHIOUAR: Nice. And as an added benefit, I would say you get to hear somebody talking in a Scottish accent. So you get to learn a little bit of Scottish, I guess.

DAVID FLANAGAN: Exactly. Right, yeah. Definitely.

ABDEL SGHIOUAR: So next time you are at Glasgow, you're not going to be surprised. Hey, I've heard this before. So you are also running a conference called KubeHuddle.

DAVID FLANAGAN: That's right. So last year, I went to KubeCon in Barcelona. It was great, right?

KubeCon is always good, but it's so big now. We're talking 8,000 people plus. I don't even know what it's going to be in a couple of months' time.

So it's always so busy. There's so many tracks-- 12 tracks, 20 tracks. I've lost count. The sponsors showcase-- a lot of us are working when we're at KubeCon.

So it's just always very fast-paced. You don't get a lot of time to really sit and connect with people and talk about technology. And I just thought it would be really cool if we had some smaller community-focused Kubernetes conferences.

So after KubeCon last year, I booked a venue within three weeks. And I managed to put a conference together in four months. And we had the first KubeHuddle in Edinburgh in Scotland early October last year.

And it was amazing. We had 200 people. We had 30 speakers, lots of great conversations. And the socials were fun.

Everybody was enjoying themselves. And Marino, who I'm sure hopefully some people who are listening are familiar with-- Marino Wijay-- decided, hey, can I do this, but in Canada? And I was like, yeah, let's do it. So we've got a team together.

And the KubeHuddle Toronto will be happening in May 2023. So the second edition of KubeHuddle. And we already have interest for a Berlin one next year. So obviously, what started as just a bit of frustration and trying to get some people together for a conversation has resonated with other people. And now we're seeing hopefully more editions of KubeHuddle pop up all over the world.

ABDEL SGHIOUAR: Nice. I think we should probably talk about bringing the show to Stockholm.

DAVID FLANAGAN: Yeah, we should definitely get a KubeHuddle Stockholm. I love coming to Sweden.

ABDEL SGHIOUAR: We tried doing a KCD last year, but it didn't really go anywhere. Surprisingly, there has never been a Kubernetes-focused conference in Sweden so far. So we tried KCD, but things didn't work out. So I think we should probably look at something else.

DAVID FLANAGAN: Yeah, definitely. I think that's a wonderful idea.

ABDEL SGHIOUAR: Nice. We will leave a link to the conference in the comments of the show. I like actually one thing you said when you started talking about KubeHuddle. You said last year, which was obviously, KubeCon Barcelona was three years ago. But I think like me, you just ignored the two years of COVID. You consider that they didn't happen.

DAVID FLANAGAN: Oh, what? Wait. Oh, no, no. You're right. Barcelona was-- it was--

ABDEL SGHIOUAR: Valencia.

DAVID FLANAGAN: Valencia?

ABDEL SGHIOUAR: Yeah, last year was Valencia. OK. So now, OK.

DAVID FLANAGAN: Yeah, yeah. The Spanish KubeCon-- the other Spanish KubeCon.

ABDEL SGHIOUAR: Yes. Because the one before the COVID was Barcelona. And the last was--

DAVID FLANAGAN: Yeah, that's right. Yeah, sorry. Too many KubeCons now.

But it's still a great event. I still encourage everybody to go. It's a fantastic opportunity, but I think we need smaller spaces for people to build Kubernetes and get local people talking to each other, work out what other organizations within your country and your city are doing with Kubernetes. Yeah, definitely a lot of value there.

ABDEL SGHIOUAR: Yeah. This is something that actually, we had again with Emily the last episode. We were talking about the CloudNativeSecurityCon, which is still under CNCF, but the idea was creating something smaller with a set of people that have the same mindset. Just meet and talk because KubeCon is huge. And as you said before the show when we were chatting, you don't get a chance to talk to everyone you want to talk to because there's just way too many people in there.

DAVID FLANAGAN: Yeah, I don't get to speak to everyone I want to talk to. The people I do get to speak with, I don't really feel that I get long enough to chat with them. So I was like, hey, how are you doing?

So good to see you. OK, next person. And the wheels are always in motion at KubeCon because it is so big. And it is so long. We're all tired by the end of every day. And yeah, smaller events are definitely going to be hopefully more prominent in the next couple of years.

ABDEL SGHIOUAR: Nice, nice. Well, cool. David, it was fantastic chatting with you.

DAVID FLANAGAN: The pleasure was definitely all mine. Thank you so much.

ABDEL SGHIOUAR: Do you have anything you want to add, anything you want to share with the audience?

DAVID FLANAGAN: I just encourage people to break Kubernetes clusters and fix them. Hopefully, do it on my live stream. Share your knowledge with everyone else. We're all in this together. And yeah, stay safe.

ABDEL SGHIOUAR: Thank you very much, David. Thanks for joining us. You can find David @rawkode on Twitter or at rawkode.academy. That's the website of your show. Thank you very much, David.

DAVID FLANAGAN: Thank you.

[MUSIC PLAYING]

ABDEL SGHIOUAR: That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod or reach us by email at kubernetespodcast@google.com.

You can also check the website at kubernetespodcast.com, where you will find transcripts, and show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll talk to you next time.

[MUSIC PLAYING]
