Kubernetes Podcast from Google: Episode 266 - Kubernetes at Uber, with Lucy Sweet

#266 May 13, 2026

Kubernetes at Uber, with Lucy Sweet

Hosts: Abdel Sghiouar, Kaslin Fields

Our guest is Lucy Sweet, a Staff Software Engineer at Uber and a co-lead for the Kubernetes Node Lifecycle Working Group. Imagine trying to move millions of compute cores and thousands of microservices to a brand-new platform, all without dropping a single user request, ride, or delivery. Sounds like an absolute logistical nightmare, right? Well, today we are sitting down with someone who actually lived to tell the tale: Lucy. In this episode, we are diving deep into Uber’s monumental infrastructure journey: moving away from their in-house system to Kubernetes. We’ll be unpacking the reality of running at this scale, why it’s always DNS, and why building things for fun is worth it.

Transcript

ABDEL SGHIOUAR: Hi and welcome to the "Kubernetes Podcast from Google." I'm your host, Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Imagine trying to move millions of compute cores and thousands of microservices to a brand-new platform, all without dropping a single user request, ride, or delivery. Sounds like an absolute logistical nightmare, right? Well, today we're sitting down with someone who actually lived to tell the tale.

Lucy Sweet is a Staff Software engineer at Uber and the lead for the Kubernetes Node Lifecycle Working Group. In this episode, we're diving deep into Uber's monumental infrastructure journey, moving away from their in-house system to Kubernetes. We'll be unpacking the reality of running this at scale, why it's always DNS, and why building things for fun is worth it.

KASLIN FIELDS: But first, let's get to the news. Broadcom announced they're donating Velero to the CNCF at the sandbox level. Velero is a Kubernetes-native backup, restore, and migration tool. It traces its origins to Heptio, which was founded by former Google engineers Joe Beda and Craig McLuckie and acquired by VMware and eventually Broadcom.

ABDEL SGHIOUAR: The CNCF released the KubeCon and CloudNativeCon Amsterdam 2026 transparency reports. This edition became the largest event in CNCF history, with over 13,500 attendees, 46% of which were visiting KubeCon for the first time, representing 100 countries and over 3,000 organizations.

KASLIN FIELDS: The call for proposals for KubeCon and CloudNativeCon North America 2026 is open and will close on May 31, 2026. The event will take place in Salt Lake City, Utah, November 9 to 12.

ABDEL SGHIOUAR: OpenChoreo released version 1.0 to the CNCF sandbox. The project originated as the open-source counterpart to WSO2's commercial Choreo SaaS platform and is designed to give engineering teams a complete foundation for running workloads on Kubernetes without requiring them to build it themselves. It includes a Backstage-powered developer portal, built-in CI/CD, GitOps workflows, observability, and what the project calls a programmable control plane.

KASLIN FIELDS: And that's the news.

ABDEL SGHIOUAR: Today I'm talking to Lucy Sweet. Lucy is an engineer at Uber. She is part of the team responsible for building and maintaining most or nearly all the platform infrastructure used by engineers globally at Uber. Lucy is also a Kubernetes Node Lifecycle Working Group lead. Welcome to the show, Lucy.

LUCY SWEET: Thanks. And thanks for having me.

ABDEL SGHIOUAR: So I'm very happy to be here. Took us quite a bit of time to get this together so we can talk to you folks. This is part of our series interviewing end users of Kubernetes. This is a feedback we got for the show. And we want to talk to you a little bit about your Uber journey and your personal journey through moving to Kubernetes, right?

LUCY SWEET: Mm-hmm.

ABDEL SGHIOUAR: So you did a talk a while back. I think it was KubeCon during COVID time, if I remember correctly.

LUCY SWEET: Oh, wow. That's like a lifetime ago at this point. Jesus.

ABDEL SGHIOUAR: Yes, feels like. Feels like ages.

LUCY SWEET: Yeah. Was I even alive back then? My word.

ABDEL SGHIOUAR: Yeah. And part of your talk was talking about the story of how Uber migrated millions of cores to Kubernetes. Can you talk a bit about that? What's the triggering points to converge to Kubernetes?

LUCY SWEET: Yeah, absolutely. So we originally, at Uber, have been running separate stateful, stateless, and batch compute platforms. So we have a stateless platform called Up, a stateful one called Odin. And then there's a batch one, as well, and so on. And none of these used to use Kubernetes.

The stateless one was built on a system called Peloton, which came into existence literally a decade ago plus. And it came into existence around the same time as Kubernetes. It is a bit different, though. Peloton was built on Mesos and was a bit more workflow based rather than reconciliation based.

And we happily used that for many, many years. But over time, Mesos started to be less and less actively developed and maintained. And also, we saw where the wind was going. We saw that everyone was converging on this one platform. And that brings you a lot of benefits.

You know, when you're on the same platform as everyone else, you get a lot of network effects. And you don't have to build solutions to common problems that everyone else has also already had. So back about three years ago now, we decided to move our stateless compute fleet to Kubernetes from Peloton.

And that process all kicked off just, actually, as I started to join Uber. So today we've got our whole stateless fleet on Kubernetes, and I'm sure we'll talk a bit more about that as well. And we're just starting to move our stateful compute stack over as well. Because we had such a good experience with the stateless one.

We were like, OK, well, why not go further? Why not do more? [LAUGHS]

ABDEL SGHIOUAR: Nice. And so I was reading some of your publications on the engineering blog of Uber. And one thing that I found very, very interesting and very cool, actually, to read is the story of the migration, so migrating millions of cores and, I assume, hundreds or thousands of microservices without downtime.

I mean, at the scale of Uber, I assume it's a very complex thing. That's essentially like if Google decided, oh, we're going to move from Borg to Kubernetes while keeping Google alive, right?

LUCY SWEET: When are you guys going to do that?

ABDEL SGHIOUAR: That's a very good question. I am in Bergen at Cloud Native Bergen, and somebody literally asked me the same question today.

[LAUGHTER]

Like, when are you guys moving to Kubernetes? So what's, in your mind, have been the key to success for that specific migration, migrating without downtime? What have been some lesson-learned patterns, things that you have learned through that journey?

LUCY SWEET: Mm-hmm. Well, one of the first things I think that we had to pay our technical debt on was that we didn't have this property called portability. So a lot of our services were dependent on, oh, I have to run on this machine. And if I'm not on this machine, then everything breaks.

Or oh, I have to bind to this static network port. So if someone else tries to bind to that network port, everything goes wrong, right? And this, before the migration, was all over Uber. And it was everywhere. It took over a year, from memory, of basically pestering people to undo this.

The way that we tried to do it was-- we normally with every migration and, in fact, every upgrade of every service at Uber do something called make before break. So we try and start new containers of the application on whatever new stack we're running or if there's a new version upgrade with the new container image on the same stack. And then we only start destroying the old version when that new version has come up fully and when all of the health checks and everything's passing, when end-to-end tests have passed, these sorts of things.

And we used that pattern to actually find services who we thought were not portable. Because what we can do is we can say, OK, I'm going to try and spawn a container of your service on K8s cluster. And let's see if you come up. Let's see if your service immediately-- does it exit 1?

Do the health probes pass? Does the end-to-end test pass? If they don't, we just back off. We take the new version on K8s away, and then they end up on a long tail. And eventually, you can keep trying on that long tail automatically. But eventually, it gets to a human.

And a human has to go to the service owner and say, hey, folks, why does your application not work on K8s? What crazy stuff have you done this time? That process, yeah, took over a year to get to. But that portability is so important, and we keep using it now.

So for example, we used portability once we got it to migrate to K8s. But more recently, we also looked at, hey, you know, our entire fleet is on AMD64 arch CPUs. We want to start using ARM. ARM's cool. OK. Well, because we already have this portability trait and we have this make-before-break trait, we can kind of detach the service owners.

We can just start trying to spawn them make-before-break style on ARM CPUs. And if it breaks, not a problem. We'll just take the new containers away. If it doesn't break, awesome. We can put them on ARM. We can do that without actually having to talk to service owners at all, really. We only talk to them if we need to move them and their service won't.
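
The make-before-break loop described above can be sketched roughly as follows. This is a minimal simulation, not Uber's tooling: the `Service` class, the `checks` callables (standing in for "does it exit 1?", health probes, and end-to-end tests), and the stack names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Service:
    name: str
    # Hypothetical checks standing in for exit codes, health probes, and end-to-end tests.
    checks: List[Callable[[str], bool]] = field(default_factory=list)
    running_on: str = "old-stack"


def make_before_break(service: Service, target: str) -> bool:
    """Start the service on `target`, verify it, and only then retire the old copy.

    Returns True if the service moved, False if we backed off.
    """
    # "Make": the new copy is started alongside the old one (simulated by running the checks).
    new_copy_healthy = all(check(target) for check in service.checks)
    if not new_copy_healthy:
        # Back off: the new copy is taken away; the old one was never touched.
        return False
    # "Break": the old copy goes away only after every check has passed.
    service.running_on = target
    return True


def migrate(services: List[Service], target: str) -> List[str]:
    """Try every service once; return the names that land on the long tail."""
    long_tail = []
    for svc in services:
        if not make_before_break(svc, target):
            long_tail.append(svc.name)
    return long_tail


portable = Service("rides-api", checks=[lambda target: True])
host_bound = Service("legacy-cache", checks=[lambda target: target == "old-stack"])
print(migrate([portable, host_bound], "k8s"))
```

The same `migrate` call could be pointed at a different target such as an ARM pool, which is the reuse described in the interview: the service owner is only involved when a name shows up on the long tail.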

ABDEL SGHIOUAR: Got it.

LUCY SWEET: But at least for me, yeah. It's very easy to lose portability in a company if you're not always looking out for it, just because of Hyrum's law. You know? If you give someone a way to depend on a host, at scale--

ABDEL SGHIOUAR: They will do it.

LUCY SWEET: --someone will depend on a host.

ABDEL SGHIOUAR: Yes.

LUCY SWEET: Seen this far too many times. [LAUGHS] I'm guilty of this sometimes as well.

ABDEL SGHIOUAR: It's quite interesting, actually, this whole concept of portability. I'm curious about something. I understand the concept of spinning up the service and trying to see if it works and passes testing. But how do you handle it from the traffic point of view? Is that a green-blue, blue-green deployment kind of thing?

LUCY SWEET: So network traffic can be always a bit of fun. So most applications at Uber have something called a canary. So we send 1% of the traffic to the canary slash new version. And then the idea is that if that messes up, if that fails, we can always retry.

Obviously, there are error cases where it's not just a binary, it errored, it didn't. But most of the time, that's good enough. And that's solvable. We do provide customization for our users. But honestly, most users don't have to touch that.

At least when we build our platforms, what we're trying to do is build them into a way where-- as a user, you just want to write code and move on with your life. Our end users who build the Uber app, they shouldn't have to care about, oh, what machine I'm on, what my network traffic looks like, what my scaling looks like. It's just, I write application.

It runs. The end, hopefully. And that's where we come in. And that's where we have to abstract all of these problems away.
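
As a rough illustration of the canary idea above, the sketch below sends about 1% of requests to a canary and retries on the stable version when the canary fails. The function names and the random-sampling approach are assumptions for illustration; the interview does not say how Uber's routing layer selects canary traffic.

```python
import random


def pick_backend(canary_fraction: float, rng: random.Random) -> str:
    """Return "canary" for roughly `canary_fraction` of calls, otherwise "stable"."""
    return "canary" if rng.random() < canary_fraction else "stable"


def handle(rng: random.Random, canary_ok: bool, canary_fraction: float = 0.01) -> str:
    """Route one request; if it hit a failing canary, retry it on the stable version."""
    backend = pick_backend(canary_fraction, rng)
    if backend == "canary" and not canary_ok:
        return "stable (retried)"
    return backend


rng = random.Random(0)  # seeded so the demo is repeatable
served = [handle(rng, canary_ok=True) for _ in range(100_000)]
print(served.count("canary") / len(served))  # close to 0.01
```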

ABDEL SGHIOUAR: Interesting. So this is a question that just popped up in my head. And it's not related strictly to Uber. This is actually a feature we have on GKE. And I'm curious about your opinion, especially because you're working on the Node Lifecycle Working Group, right?

So GKE has a feature called blue-green node pools. So what it allows you to do, basically, is bring up node pools under a new Kubernetes version, test if the migration works, and if not, fall back. It's blue-green, but for the infrastructure side. And to me, I've been in the industry for 15 years, so blue-green is application.

It's not infrastructure. So what's your thoughts? I'm just curious. What do you think about this?

LUCY SWEET: So obviously, I haven't-- we don't use GKE at Uber. I know. Don't boo too much.

ABDEL SGHIOUAR: Yeah, that's fine. It's completely fine.

LUCY SWEET: But one of the really tough things, I think, especially with node upgrades, is, can you effectively test whether a node is in a ready state for an application without risking disruption? That can be, in my opinion, really tough.

Even if you have blue-green for the nodes themselves, if you place an application on a new node and that node is for some reason not compatible with that application, then you could cause problems, right? And those problems may lead to actual disruption. This is actually one of the things we've been discussing in the Node Lifecycle Working Group, is this idea, as well, not just of this readiness, but also whether a node is ready and schedulable for a workload can actually depend on the workload itself.

And obviously, in K8s, we have the node is ready and the node isn't ready. But that's a very binary signal. Some nodes are ready for some things at some times and may not ever be ready for other things other times. And this has become especially relevant with AI/ML accelerated workloads, where I must be on this node because this node has this TPU or GPU.

And I must not be on this node because of that. So blue-green is a great place to start, especially with the current K8s stuff. I really want us to push further in K8s. Eventually, in the long term, I would love us to be in a position where we could express node readiness and node upgrades for a given node based on the workloads that could then be placed on that node.

So you could say, hey, I'm ready for this type of workload. I'm not necessarily ready for this right now. Or I need to be in maintenance for this, but not necessarily this, this granularity. But it's always fun because we always have to balance this against the fact that we can't make Kubernetes infinitely complex.
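
To make the contrast with today's single ready/not-ready signal concrete, here is a purely hypothetical model of workload-dependent readiness. Nothing here is a Kubernetes API: the `ready_for` and `in_maintenance_for` fields and the workload class names are invented to illustrate the idea being discussed.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    name: str
    ready: bool = True  # the single binary signal that exists today
    # Hypothetical: workload classes this node is currently prepared to run.
    ready_for: Set[str] = field(default_factory=set)
    # Hypothetical: workload classes for which the node is under maintenance.
    in_maintenance_for: Set[str] = field(default_factory=set)


def can_schedule(node: Node, workload_class: str) -> bool:
    """A node accepts a workload only if it is ready for that specific class."""
    return (
        node.ready
        and workload_class in node.ready_for
        and workload_class not in node.in_maintenance_for
    )


gpu_node = Node(
    "gpu-1",
    ready_for={"stateless", "gpu-training"},
    in_maintenance_for={"gpu-training"},  # e.g. an accelerator driver upgrade in progress
)
print(can_schedule(gpu_node, "stateless"), can_schedule(gpu_node, "gpu-training"))
```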

ABDEL SGHIOUAR: Yeah, yeah. Of course.

LUCY SWEET: We can't turn Kubernetes into a big Turing machine, as funny as that would be.

ABDEL SGHIOUAR: Yeah. And if I'm reading a little bit into your thoughts, I think that one of the challenges with node upgrades specifically is, are you catching any regressions? Is the application performance going to be the same in the new version?

Are there any bugs in Kubernetes itself, right? So it's super interesting.

LUCY SWEET: Don't worry. Kubernetes never has any bugs.

ABDEL SGHIOUAR: No, it doesn't. No, of course not. It's completely bug free.

LUCY SWEET: Lies and slander.

[LAUGHTER]

ABDEL SGHIOUAR: So back to the topic of Uber. In the talk that you have done, the video I watched, you talked about this classification of services. I believe it was tier five to tier zero, right? Can you talk a little bit about that?

How do you go about classifying-- I guess it's easy. It's like, the thing that is business critical is tier zero. And the thing that no one cares about is tier five, right?

LUCY SWEET: Yeah, basically. So yeah, at Uber, we have these six service tiers to indicate criticality. So tier five is-- we have an internal foosball league in Denmark, and the service that runs that-- because, of course, we overengineered it into a service-- is tier five. You know how engineers are.

In tier four or tier three, you might find internal tools that are useful. But you can live without them with pain. Tier two, you're looking at something that's customer facing, but maybe isn't the core trip flow. So it could be a promotion system or redeeming that sort of thing.

Tier one, you're looking at things that touch what we call the trip flow. So that's the minimum stuff you need to call an Uber. The car arrives. You get in. You get to your destination. You get out. If your service is needed for that trip flow to run, it's tier one.

And then tier zero is the infrastructure services that support tier one, so the actual stateless compute platform itself. And we use these a lot at Uber. So migrations, we normally work tier by tier. We start in tier five, and we work our way down through to tier zero.

And yeah, that can be useful for us. And I think it's actually one of the things that I think is quite valuable. And nearly anyone could get this by just labeling even their Kubernetes deployments, in my opinion. Understanding what workloads in your organization are more important and less important can help both during incident response, because you can know, oh, this is down, and that's really, really bad, versus this thing's down, who cares, and especially during a capacity crunch.

And on capacity, you can actually start to do very interesting things if you, for example, are OK to expect maybe some downtime of the less-important tiered workloads. So for example, we've been thinking about what would happen if you put high-tier workloads on spot instances, for example. Maybe they'll go down every now and then.

But if it's a foosball league, who cares? But if we didn't have this tiering system, that would be really, really tough because we can't put the core trip flow on a spot instance and pray. I mean, we could. But I would get in trouble, apparently. Apparently, this is not good engineering practice.
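
A small sketch of how tier labels could drive both migration order and spot-instance placement, as described above. The label key `example.com/tier`, the service names, and the spot cutoff at tier 4 are assumptions made for this example; the interview only says that deployments could be labeled with a tier.

```python
from typing import Dict, List

# Hypothetical label key; any consistent key on the Deployment would do.
TIER_LABEL = "example.com/tier"


def migration_order(deployments: Dict[str, Dict[str, str]]) -> List[str]:
    """Order deployments least critical first: tier 5 down to tier 0."""
    return sorted(deployments, key=lambda name: (-int(deployments[name][TIER_LABEL]), name))


def spot_eligible(labels: Dict[str, str], min_tier: int = 4) -> bool:
    """Assumed policy: only tiers at or above `min_tier` may tolerate spot interruptions."""
    return int(labels[TIER_LABEL]) >= min_tier


deployments = {
    "compute-platform": {TIER_LABEL: "0"},
    "trip-flow": {TIER_LABEL: "1"},
    "promotions": {TIER_LABEL: "2"},
    "foosball-league": {TIER_LABEL: "5"},
}
print(migration_order(deployments))
print([name for name, labels in deployments.items() if spot_eligible(labels)])
```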

[LAUGHTER]

ABDEL SGHIOUAR: Yeah. And as I joke, always, when I do my talks, it's like, if you go out to all your users and ask them, "Can you tell me how you would, in your head, classify your service?", they will all say it's mission critical. Everybody thinks their services are the most important.

Here is a curveball question. Where is DNS on this tiering system?

LUCY SWEET: Oh. [LAUGHTER] DNS is tier zero. Because you know how it goes. Any incident, is it DNS? It's always DNS. The AWS outage we all had recently the fun of experiencing-- Uber doesn't run on AWS.

But don't worry, a lot of our suppliers do anyway. So we were definitely involved. That was a fun DNS one. I think I've read the mini postmortem on that. And, of course--

ABDEL SGHIOUAR: It's always DNS, right?

LUCY SWEET: I have a haiku of that on my wall in my home. "It's not DNS. There's no way it's DNS. It was DNS."

ABDEL SGHIOUAR: Yeah, exactly. So I do have a question about-- this, I think, is coming from the talk as well. When you had Peloton, you had a native snapshotting functionality, which my understanding was, basically, it allows failing containers to be snapshotted and stored so that engineers can debug them later. Right?

And the ephemeral nature of Kubernetes makes it slightly harder to do something like this. So how did you solve it in Kubernetes, if "solved it" is the right term to use here?

LUCY SWEET: Yeah. In Peloton, we had this thing called container snapshotting. And what that let you do is, yeah, if a container fails, you get an entire snapshot of the container at the point it failed. And engineers love this because they go in. They read internal Java logs, whatever.

They read the state of the file system. Great fun. Kubernetes doesn't natively support this because Kubernetes is clean and blows up the container very quickly after the pod goes away. So what we did is we added a little sidecar to all of our containers. And this sidecar, all it does is, when the container exits, it stops the container being deleted and very quickly jumps onto the file system and uploads it all to a thing called TerraBlob, which you can just basically think of as S3.

It's like Uber's S3. That then sits in TerraBlob. The sidecar exits. And then the pod goes away like normal. But we've now captured the snapshot of what the user wanted. And so users got to keep that feature. Because this is one of the big things about this migration.

We did not want to be in a position where we were taking features away from users. They want to focus on their code. We don't want to be an annoyance to them. If they have something, it's very hard to take things away once you've given it to people, very hard.
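
The snapshot step of that sidecar might look roughly like the sketch below. It only covers "archive the failed container's files and put them somewhere durable"; holding the pod open until the upload finishes is not modeled. A local directory stands in for TerraBlob, and the function names, paths, and the choice to skip clean exits are all assumptions.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List, Optional


def snapshot_on_exit(exit_code: int, container_root: Path, blob_dir: Path, pod_name: str) -> Optional[Path]:
    """Archive the failed container's files; return the archive path, or None on a clean exit."""
    if exit_code == 0:
        return None  # assumption: only failing containers are snapshotted
    blob_dir.mkdir(parents=True, exist_ok=True)
    archive = blob_dir / f"{pod_name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps host-specific path prefixes out of the archive.
        tar.add(str(container_root), arcname=pod_name)
    return archive


def demo() -> List[str]:
    """Simulate one failed container and list what the snapshot captured."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "container"
        root.mkdir()
        (root / "app.log").write_text("java.lang.OutOfMemoryError\n")
        archive = snapshot_on_exit(1, root, Path(tmp) / "blobs", "pod-123")
        with tarfile.open(str(archive)) as tar:
            return sorted(tar.getnames())


print(demo())
```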

ABDEL SGHIOUAR: Yeah. Because if you are executing such a large-scale migration, you want to at least try to have a feature parity between the existing platform and the new platform, right?

LUCY SWEET: Yeah.

ABDEL SGHIOUAR: I mean, that's, in theory, what most organizations are trying to do. The practice is slightly more difficult sometimes.

LUCY SWEET: Look, I'm speaking from theory, OK, definitely not reality. [LAUGHTER]

ABDEL SGHIOUAR: I mean, there is this joke we have at Google, which is there is no such thing as a yellow banana. It's either a green banana or a brown banana. So it's either alpha or deprecated. So stable doesn't exist.

LUCY SWEET: Isn't there a comic from [? Gumex, ?] as well, where it shows the two paths at Google? Path one: deprecated, don't even think about it. Path two: under construction.

ABDEL SGHIOUAR: Yes, yes. Pretty much. Yeah. So one tool will have a deprecated banner, and the new one that is replacing it will have an alpha banner. That's it.

LUCY SWEET: Oh, don't worry. This feels very real to me as well. And I don't even work at Google.

[LAUGHTER]

ABDEL SGHIOUAR: All right, so I do have another interesting question. I mean, you run hundreds of millions of cores, a lot of cores. We launched recently the 65,000 nodes in GKE. EKS replied by launching 100k.

I was testing the 65k nodes. A kubectl get pods takes five minutes, as you can imagine. So what is that like? And how is the debugging [? inside ?] for Uber? How do you solve these kinds of large-scale Kubernetes knowledge problems?

LUCY SWEET: Yeah. So we measure in CPU cores because we think it's a nicer way than measuring in hosts and nodes because they can all be different sizes. So right now, we have just over 5 million cores on our stateless fleet. But then we spread those.

So at Uber, we have over 200 Kubernetes clusters right now. And the idea is that these are broken up into availability zones that we have inside the company, but then our users don't actually see this complexity. So from the stateless compute platform side and from the stateful compute platform side, they hit deploy.

And they just see that pods have come up somewhere. And then we abstract away all the networking-needs-to-reach-you, your-logs-need-to-reach-you sort of thing, so that the users don't really have to care. Debugging K8s itself, though, when that goes wrong, can be fun.

The first thing we have to do is find what cluster the workload's even on. We have a platform called Grail, and we use Grail a lot inside Uber. Grail is basically a very big distributed graph database that is in memory and represents the current state of the world across the whole company. And the powerful thing about Grail is it has associations between things.

So there is an association from a Kubernetes pod to an Uber service and from an Uber service to maybe a networking group. And you can craft these very powerful queries, where you say, look up Uber services that meet these parameters. Then follow to the Kubernetes pods that are associated with them.

ABDEL SGHIOUAR: Oh, interesting.

LUCY SWEET: And then do this, then do this. We've posted a blog post about this before. And it's one of the most powerful features, I find, inside Uber. It's the ability to effectively have not just disparate Kubernetes clusters, but even disparate things that aren't even K8s, like proprietary networking groups on the one hand, internal service definitions, Kafka queues, whatever, and associate all of these together in one place, where you can query once and jump between these technologies.

This is one of the coolest things, I think, we have here. You know, we have a query catalog. There's thousands of queries that people have written there. You can do really powerful things for debugging, like, show me every pod that has this property, or show me every pod that's asking for a GPU.

And the service is also part of this team because you can follow through. And so, yeah, a lot of the debugging at Uber, from our end, as platform engineers, where we actually have to touch K8s, we use Grail heavily for it. But from a user POV, we try and correlate and bring together and centralize all of this for them.

So their logs get shipped off to our central monitoring system, uMonitor, which is built on M3 and all these other things, and the metrics do as well. So as a user, the cluster is just a detail. Users don't normally care. But yeah, it has been fun seeing people really push the boundaries on how many nodes you can get in a Kubernetes cluster. It's like a new arms race.
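
The "find things, then follow associations" style of query described for Grail can be illustrated with a toy in-memory graph. This is not Grail's API or query language; the `Graph` class, the relation names, and the sample nodes are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class Graph:
    """Toy graph: nodes carry properties, and named associations link node IDs."""

    def __init__(self) -> None:
        self.props: Dict[str, Dict[str, str]] = {}
        self.edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, node_id: str, **props: str) -> None:
        self.props[node_id] = props

    def associate(self, src: str, relation: str, dst: str) -> None:
        self.edges[(src, relation)].add(dst)

    def find(self, **wanted: str) -> List[str]:
        """Return IDs of nodes whose properties match every key/value in `wanted`."""
        return sorted(
            node for node, props in self.props.items()
            if all(props.get(key) == value for key, value in wanted.items())
        )

    def follow(self, nodes: Iterable[str], relation: str) -> List[str]:
        """Jump from each node along `relation` and collect the targets."""
        found: Set[str] = set()
        for node in nodes:
            found |= self.edges.get((node, relation), set())
        return sorted(found)


g = Graph()
g.add("service:rides", kind="service", team="marketplace")
g.add("pod:rides-1", kind="pod", cluster="zone-a-07", gpu="false")
g.add("pod:rides-2", kind="pod", cluster="zone-b-02", gpu="true")
g.associate("service:rides", "runs_as", "pod:rides-1")
g.associate("service:rides", "runs_as", "pod:rides-2")

# "Look up services that meet these parameters, then follow to their pods."
pods = g.follow(g.find(kind="service", team="marketplace"), "runs_as")
print([(pod, g.props[pod]["cluster"]) for pod in pods])
```

Chaining `find` and `follow` is what answers the first debugging question from the interview, namely which cluster a workload is even on, without the caller knowing the cluster layout in advance.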

ABDEL SGHIOUAR: Yes, yes. I don't know where it's going to end. We'll see. I like this idea of being able to correlate across not only applications, but infrastructure and dependencies and stuff. I find that very powerful. Because basically, if you're an SRE being awakened at 3:00 AM in the morning to fix an incident, the last thing you want to do is have to stay on Slack and ping people to tell them, what is your database, right?

LUCY SWEET: Yeah.

ABDEL SGHIOUAR: So in one of your-- I think it was in the video as well. You talked about, during the migration, the biggest amount of work was on the 10% unique services or special services. Basically, we started by discussing this at the beginning, which is like, everybody has the same problem.

So 90% of all the applications are exactly the same. So what was the complexity there? What makes those 10% special?

LUCY SWEET: It depends on the service. But normally, they have done something that is critical now to their application-- they've built on top of it-- but is simply not something that can be supported portably. So one of the things we saw before was beyond static network ports, which is one issue.

We also saw people doing things like, oh, I'll just put these files on the host. And then when my application comes back, I'll just read them later. You know? It's not like I'll get moved to a different host. That's crazy.

ABDEL SGHIOUAR: No, of course not.

LUCY SWEET: That would never happen. The file system is global and consistent and infinite. That's how it works, right?

ABDEL SGHIOUAR: Yes.

LUCY SWEET: So with these guys in particular, this can be a huge challenge. The other one, as well, was there were some applications that were maybe not as stable as we hoped they would be. So for example, we found some applications on the stateless compute fleet that maybe weren't as stateless as they claimed to be.

Maybe if you took down too many of their pods at the same time, they had a big incident. And it caused a lot of issues. And so for those ones, at least in the interim, we wanted to migrate them. And we didn't want to wait for them to fix these issues.

So there were initially quite a lot of conditions of, if this service has this label, which means be really slow on rollouts, then one pod at a time globally, no more please. And that can take a lot of engineering time. Because fundamentally, an engineer in our team is spending time to write this path for one, two, three services?

It's not really a place where we can scale our impact in the way we want. It's why I am so much into pushing to make sure that we do not regress back to this and that we maintain this portability. Because we want to scale our efforts, and we can't scale our efforts if we're spending a lot of our time dealing with these special cases, these different services that maybe have these unique requirements.

So some of them are always going to exist. But it's all, in my head, a job of minimization, as few special things as we possibly can so that we can spend time making the 90% of people happy and not spend our time just getting the 10% to a basic running state.

ABDEL SGHIOUAR: Yeah, yeah. I did a little bit of work in the consulting space, and that's definitely, probably, one of where you spend most of your time. Everybody has the same problems, but then there is these unique use cases that are usually the ones that take most of the time when you're doing any sort of migration or any sort of architecting.

So speaking of unique, I want to jump ahead to-- so we talked a lot about stateless compute. So that kind of was the first phase, I think, of your migration. And there is, of course, batch and stateful workloads. So that would assume everything that has to do with data and large language model and AI.

So where are you on that? Not what's next. Where are you on that journey basically?

LUCY SWEET: Yeah. So right now, obviously-- all of our stateless fleet now is on K8s. And we're really starting to leverage K8s features, which is really, really cool. So now we're looking at stateful and batch. So right now, the main thing we're looking at is, over the next year or so, how much of our stateful fleet can we get onto Kubernetes?

But the problems here are different because stateful workloads have unique challenges and unique architectural issues. One of the big ones is you cannot move a stateful application in the way that you would move a stateless one. You can't, oh, just spawn it up here, and then shut down the other one immediately, and you're good. What could possibly go wrong?

Especially because at Uber, we have locally attached disks to a lot of our hosts. So the place that data is tenanted is very important to us. We don't have much in the way of Network Attached Storage for reasons. And so one of the things we found, actually, is that there are gaps in Kubernetes that have been a challenge for us to work around when we've designed our stateful migration.

So one of them, for example, is eviction in Kubernetes right now is not that mature. We have the eviction API, where you can create an eviction object against a pod. And either, at that point in time, it will be accepted, and the pod will go away if it's within the PDB-- the Pod Disruption Budget-- or it will 429 and that's it.

It's just a point-in-time decision, right? But that's not really expressive enough. Because as someone who runs a stateful platform, I want to be able to say, hey, I want to evict this workload. And I need to check things that maybe Kubernetes might not know about before doing that.

I might need to check, do I have capacity to put it somewhere else? What's the status of the data on disk? These sorts of things. And with the existing system, you can't really do that. You could maybe add a finalizer to the pod. But when pods are [? in Terminating, ?] a lot of stuff has already happened, especially around networking.

So a finalizer doesn't really solve for that. So one of the things we've been looking at, which solves this for, hopefully, more people, as well, in the future in the Node Lifecycle Working Group, is this concept of eviction requests. And what an eviction request does is it lets you say, I would like to get rid of this pod at some point in the future.

And so as someone who wants to get rid of the pod, you create that eviction request. And then you have stakeholders called interceptors. And interceptors annotate the pod with, hey, I want to be told when someone wants to evict this pod. And then one by one, on the eviction request, they appear in a list.

They are signaled to, hey, you can do your thing now. And when you've done your thing, just say that you're completed. And we will move on to the next interceptor.
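
The difference between today's point-in-time eviction and the eviction request idea can be sketched as below. This is a simplified model based only on the description in this conversation: the class and field names are hypothetical, the proposal was still being worked on at the time, and the real API may look different. The status codes follow the interview's "429 or 201" shorthand.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ACCEPTED = 201           # the interview's shorthand for "eviction accepted"
TOO_MANY_REQUESTS = 429  # eviction refused because of the Pod Disruption Budget


def try_evict_now(healthy_pods: int, min_available: int) -> int:
    """Point-in-time decision: allowed only if the budget still holds after the eviction."""
    return ACCEPTED if healthy_pods - 1 >= min_available else TOO_MANY_REQUESTS


@dataclass
class EvictionRequest:
    """Hypothetical model: 'I would like this pod gone at some point in the future.'"""
    pod: str
    reason: str
    interceptors: List[str]  # stakeholders, in the order they are signaled
    completed: List[str] = field(default_factory=list)

    def active_interceptor(self) -> Optional[str]:
        """The interceptor whose turn it is, or None when all have finished."""
        remaining = [name for name in self.interceptors if name not in self.completed]
        return remaining[0] if remaining else None

    def mark_completed(self, name: str) -> None:
        """Called by an interceptor once its own work (for example moving data) is done."""
        if name != self.active_interceptor():
            raise ValueError(f"{name} is not the active interceptor")
        self.completed.append(name)

    def ready_to_evict(self) -> bool:
        return self.active_interceptor() is None


print(try_evict_now(healthy_pods=2, min_available=2))  # 429: the answer is simply "no"

request = EvictionRequest("db-0", "node maintenance", ["capacity-check", "data-mover"])
request.mark_completed("capacity-check")
print(request.active_interceptor(), request.ready_to_evict())
request.mark_completed("data-mover")  # may happen minutes, hours, or a day later
print(request.ready_to_evict())
```

Because the request is an object with state rather than a one-shot call, the current interceptor and the reason for the eviction stay visible while the work is in progress.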

ABDEL SGHIOUAR: Got it.

LUCY SWEET: And that now is a much more powerful verb. Because instead of a point-in-time decision of, Can I evict you right now? 429 or 201, that's it, you can say, I would like to evict this pod at some point. Please do your business logic. Please move your data.

Maybe it will take you a few minutes. Maybe it will take you a few hours. Maybe it will take you a day if you've got a huge piece of data. And this allows you to do effectively-- you can defer pod eviction to a point in time where it makes sense to your business.

If the pod needs to be evicted now because it's a bad host, you can progress it more quickly. You can also just understand and visualize this a lot easier in Kubernetes. Because right now, obviously, eviction objects don't really exist. You create one, and it blows up with the pod.

It's special. But with this, you can actually model, oh, this pod, someone's trying to evict it because of XYZ, ABC. And the status right now is that this interceptor, maybe one that's moving the data, is copying the data to a new replica. And then once that's done, it will do this, this, this. And then it will go away.

That, I think, is a lot easier, as well, for users of K8s to see and understand, especially when we're modeling it as resources inside the cluster. So right now, we're hoping-- we didn't hit the release for December, unfortunately, but we're hoping that next release we can get at least a KEP in and maybe a nice alpha. Within Uber, we're pushing this really hard.

We've already managed to get a working reference implementation of [? the KEP ?] online and rolling. And we're actually going to use that in our migration, starting in about January, as a production part of it.

ABDEL SGHIOUAR: Nice.

LUCY SWEET: Because outside of just Kubernetes itself, it's a really good primitive. And it's really powerful. And in my opinion, it's not too complex as well.

ABDEL SGHIOUAR: I can see how that would be super useful. So a follow-up question. Will that mean, also, that you can have some sort of signal inside the container that the application can intercept and do something with it, like close connections or commit data to disk? You see what I mean?

LUCY SWEET: Yeah, yeah. What you could do is you could just have an interceptor as the last one. So interceptors have to be able to look at the cluster. But you can do that with service accounts, right? You can have an interceptor on the pod that the pod itself's container is looking for.

And when it comes up, that could be the signal to the application of, hey, close your connections, shut yourself down, or get yourself ready to shut down, I should say, because you shouldn't shut yourself down because then the pod goes away anyway, and then progress. But yeah, all you need to be an interceptor is you need access to the cluster to see eviction request objects.

And you need to make sure, on the pod, you're patched in as, hey, I'm an interceptor for this pod.
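As a rough sketch of the application side, the Go snippet below shows a process acting as the last interceptor for its own pod. Everything here is an assumption made for illustration: the `App` type, the `OnInterceptorTurn` hook, and the `reportComplete` callback are hypothetical, and watching the cluster for eviction request objects is left out. The point is the ordering: stop taking new work, let in-flight work finish, report completion, and keep the process running.

```go
package main

import (
	"fmt"
	"sync"
)

// App is a hypothetical application that also acts as the last interceptor
// for its own pod.
type App struct {
	mu       sync.Mutex
	draining bool
	inFlight sync.WaitGroup
}

// Handle runs one unit of work and returns false if the app is draining
// and therefore refuses new work.
func (a *App) Handle(work func()) bool {
	a.mu.Lock()
	if a.draining {
		a.mu.Unlock()
		return false
	}
	a.inFlight.Add(1)
	a.mu.Unlock()

	defer a.inFlight.Done()
	work()
	return true
}

// OnInterceptorTurn is a hypothetical hook, called once it is this
// interceptor's turn on the eviction request. It gets the app ready to shut
// down but does not exit the process.
func (a *App) OnInterceptorTurn(reportComplete func()) {
	a.mu.Lock()
	a.draining = true // refuse new work from now on
	a.mu.Unlock()

	a.inFlight.Wait() // let in-flight work finish
	reportComplete()  // tell the eviction request to progress
}

func main() {
	app := &App{}
	fmt.Println("accepted before drain:", app.Handle(func() {}))
	app.OnInterceptorTurn(func() {
		fmt.Println("reported completion; process keeps running")
	})
	fmt.Println("accepted after drain:", app.Handle(func() {}))
}
```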

ABDEL SGHIOUAR: Yeah. That's certainly better than-- well, I'm going to tell you this. This is, I think, a funny story. I worked on a project a long time ago, which by long time ago, I mean 2018, so not that long ago.

LUCY SWEET: I hadn't even graduated in 2018. So for me, this is a lifetime ago.

ABDEL SGHIOUAR: [LAUGHS] So what this project was, basically, is a customer that had a Node.js app with a known memory leak issue. And they debugged it down to the very simple fact that, after a certain number of requests served from that particular app, the memory leak issue happens.

So what was the solution? The solution was, we're going to put a sidecar that counts how many requests the pod is processing. And after the critical number of requests, we kill the pod.

[LAUGHTER]

So I hope they moved away from this. But that was one of the funniest implementations I've seen.

LUCY SWEET: If your viewers could see my face right now--

ABDEL SGHIOUAR: Yeah, yeah.

LUCY SWEET: If your viewers could see my face.

ABDEL SGHIOUAR: Every time I tell this story, people go, what?

LUCY SWEET: I mean, I've definitely seen some fun implementations in the past. One time, I saw a team who wanted to do custom control of node readiness. And their solution was not node readiness gates. Their solution was just to crash loop the kubelet until they were ready.

ABDEL SGHIOUAR: Oh, of course.

LUCY SWEET: That was an interesting implementation approach.

[LAUGHTER]

I mean, if it works, it works, right?

ABDEL SGHIOUAR: I mean, that is technically [? a ?] readiness if you always reply 400 until you are ready, right?

LUCY SWEET: Yeah, exactly. This is the next level. So yeah, next feature in Kubernetes, the kubelet crash loops itself unless it's ready.

ABDEL SGHIOUAR: Yeah.

LUCY SWEET: No. Maybe not.

ABDEL SGHIOUAR: So speaking of fun implementations, you are working on something. It's an LLM wrapper. I'm reading what you wrote. So then you tell me. So it's an LLM wrapper that pretends to be Kubernetes API object, works with kubectl, hallucinates objects into existence when you try and get them. My question is, why?

[LAUGHTER]

LUCY SWEET: Under the Fifth Amendment of the United States Constitution, I don't have to answer that.

ABDEL SGHIOUAR: No, you're not American. Come on.

LUCY SWEET: OK, yeah. Fair enough. No. Damn it. So this is what I would call a moment of weakness at 7 o'clock in my house, when I realize maybe that I could do something. And I don't stop to think about, should I do it? So what this is, it's a Go binary that's hooked up to an LLM.

And the LLM's instructions are, you are a Kubernetes API server. You must respond with fully formatted JSON responses to HTTP requests. You will be given the URL and the payload from the request. You must not add any extra commentary or the JSON parser will break.

Then you run that as an HTTP server, and you pipe it all back to the LLM. And then you make a kubeconfig that points your kubectl directly at it. And you can actually go further. I tried with kubectl. That was very funny because I did a patch against a deployment that didn't exist. And the LLM hallucinated it into existence.
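The shape of such a wrapper can be sketched in a few lines of Go. This is not Lucy's code: the `LLM` function type, `fakeLLM`, `respond`, and the exact prompt wording are assumptions made for illustration, and the model call is stubbed so the example runs offline. It only shows the idea of forwarding the request method, URL, and payload to a model and relaying whatever JSON comes back.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// systemPrompt paraphrases the instructions described in the interview.
const systemPrompt = "You are a Kubernetes API server. Respond only with fully formatted JSON."

// LLM is a hypothetical stand-in for whatever model client is used.
type LLM func(system, user string) (string, error)

// buildUserPrompt hands the request method, URL, and payload to the model.
func buildUserPrompt(method, url string, body []byte) string {
	return fmt.Sprintf("%s %s\n\n%s", method, url, body)
}

// respond asks the model for an answer and rejects anything that is not JSON.
func respond(llm LLM, method, url string, body []byte) (int, string) {
	out, err := llm(systemPrompt, buildUserPrompt(method, url, body))
	if err != nil || !json.Valid([]byte(out)) {
		return http.StatusBadGateway,
			`{"kind":"Status","status":"Failure","message":"model did not return valid JSON"}`
	}
	return http.StatusOK, out
}

// handler relays every incoming HTTP request to the model.
func handler(llm LLM) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		code, out := respond(llm, r.Method, r.URL.String(), body)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		io.WriteString(w, out)
	})
}

// fakeLLM lets the sketch run without a real model behind it.
func fakeLLM(system, user string) (string, error) {
	return `{"kind":"Status","apiVersion":"v1","status":"Success"}`, nil
}

func main() {
	srv := httptest.NewServer(handler(fakeLLM))
	defer srv.Close()

	resp, err := http.Post(srv.URL+"/apis/apps/v1/namespaces/default/deployments",
		"application/json", strings.NewReader(`{"kind":"Deployment"}`))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(b))
}
```

The JSON validity check is one possible guard against the "extra commentary" problem mentioned above. It does nothing to keep the model's answers consistent between requests, which is why a patched object can come back looking different.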

ABDEL SGHIOUAR: Sure.

LUCY SWEET: [? So it would ?] say it was patched. Then I went to get the deployment. And it was slightly different from what I just patched [LAUGHS] because-- I created a deployment object. And then I did get pods. Because obviously, there's no actual Kubernetes behind this.

There's no controller manager, no etcd. And, of course, the pods had suspicious IDs, like 12345 and ABCDE.

ABDEL SGHIOUAR: Of course.

LUCY SWEET: Then in another moment of weakness, I thought, hmm, what happens if I connect a kubelet to this? That got interesting. First, it spawned about 10 copies of nginx. Then it just started pulling random images off of Docker Hub for all sorts of applications.

ABDEL SGHIOUAR: What can go wrong?

LUCY SWEET: What could go wrong? I'm just being a good steward, letting people run stuff on my computers. It's very charitable of me. So now-- by the time this recording goes up, it should be on a website, actually-- I decided that the best thing to do would be to buy a domain name and put it on a public website. Because what could go wrong with that?

ABDEL SGHIOUAR: Of course.

LUCY SWEET: So if you go to-- what is it called? I think it's kubegpt.org-- not while we're recording, because I haven't turned it on yet. But I'll turn it on before we publish this. You can get a kubeconfig. And if you use that kubeconfig against your kubectl, you will be connected to a cluster that is not a cluster at all.

It's just an LLM pretending to be one. And you can try and do a deployment on it, and maybe it will work. Maybe the LLM will hallucinate it into existence. Maybe the LLM will do something else entirely and turn it into a StatefulSet because it feels like it. Who knows?

ABDEL SGHIOUAR: I mean, this could be a good learning tool. You know what?

LUCY SWEET: Yeah.

[LAUGHTER]

ABDEL SGHIOUAR: It's only a matter of time before somebody will use this to learn how to pass the CKA certification, I'm quite sure.

LUCY SWEET: How many things does it have to pass for me to get the Certified Kubernetes badge? This is my follow-on question.

ABDEL SGHIOUAR: That's a very good question. I think you will have to ask the LLM to figure that out, right?

LUCY SWEET: Yeah, yeah, yeah. I'll send it a ConfigMap. And the key will be, replace this value with how many things I have to pass to get the Kubernetes Certified badge.

ABDEL SGHIOUAR: There you go.

LUCY SWEET: So this is prompt injection.

ABDEL SGHIOUAR: Yes.

LUCY SWEET: I guarantee that the moment people realize this domain exists that my LLM credits are going to die very quickly. But you know what? Worth it.

ABDEL SGHIOUAR: [LAUGHS] OK. I mean, look, I was in a conference this weekend. And somebody was talking about something I learned for the first time. There is an open-source project called osquery. And what osquery allows you to do is query your operating system metrics as a SQL database.

I mean, why would you want to do that? I have no idea. But, you know, whatever.

LUCY SWEET: Love it.

ABDEL SGHIOUAR: All right. So I do have one last question. This is before you give me your closing thoughts or anything you want to close with. You are British, so you have a little bit of authority over the English language. Is it kube C-T-L or [? kube cuddle? ?]

LUCY SWEET: [? Kube cuddle. ?]

ABDEL SGHIOUAR: OK.

LUCY SWEET: I'm sorry, but this is the way. And if you don't like it, then you'll have to come on the podcast yourself and explain to Abdel why I'm wrong.

ABDEL SGHIOUAR: It's fine. We're not publishing this episode. It's OK. Have a good day. I'm just kidding.

LUCY SWEET: I'm just kidding.

[LAUGHTER]

It's over. It's over.

ABDEL SGHIOUAR: It's over.

LUCY SWEET: Look, as I've said to people in Denmark before, when I make a mistake with English, it's really funny. Because English is my native language, but it's not there. So I just go, actually, as the native speaker, that is completely OK.

And you guys can't say anything back about it because you all learned it in school. I learned this when I was born.

[LAUGHTER]

ABDEL SGHIOUAR: Well, I mean, in the same context, I could also make mistakes and say, you know, English is not my native language, so I'm sorry. [LAUGHS] It can go both ways.

LUCY SWEET: I should try doing that as well.

ABDEL SGHIOUAR: So you say, oh, by the way, I have been living in Denmark for a very long time. I forgot about English.

LUCY SWEET: Yeah. Yeah, exactly. I've got too used to really weird words for things and speaking from my throat.

ABDEL SGHIOUAR: [LAUGHS] Awesome. So, Lucy, this has been a fun discussion. I learned a lot. Any closing thoughts? Anything you're excited about? What's going on? Are we going to see you at KubeCon Europe?

LUCY SWEET: Oh, absolutely, you are. I've got two talks at KubeCon US this year.

ABDEL SGHIOUAR: Awesome.

LUCY SWEET: I'm going to be on an AI/ML panel with a few folks from Google, of all places, among others. And then I also am going to be doing a talk with Sandeep from [? Gen, ?] where we're going to be breaking into a Kubernetes cluster live on stage and doing privilege escalation and all these fun things.

ABDEL SGHIOUAR: Awesome.

LUCY SWEET: So both of those should be great fun. And outside of that, I'm looking forward to when we hit 10 million cores on K8s. I want the eight-figure number. We need to get the stateful fleet on there. And then we can get there. I believe.

ABDEL SGHIOUAR: All right, so this is your open invitation when you hit that number to come back on the show and tell us all the fun stuff you have learned.

LUCY SWEET: I'll bring cake.

ABDEL SGHIOUAR: All right, cool. Sounds good. I'll be coming to Aarhus to celebrate that in this case. Because last time I was in Aarhus, there was cake for the 10-year anniversary of Kubernetes, right?

LUCY SWEET: Yeah, absolutely. And if you want to be physically here for cake, just a heads up. You could just join Uber-- uber.com/careers.

ABDEL SGHIOUAR: Awesome. Yeah, that's great. We're also going to make sure that the links to your talks-- because this is going to air after KubeCon North America, so we'll make sure that we include the recordings. And there's also your LinkedIn profile. We'll have that. And then there's also lucy.sh-- I really like the domain--

LUCY SWEET: Thank you.

ABDEL SGHIOUAR: --that you can go check to follow up on what Lucy is up to, upcoming talks, talks that have been done already, et cetera, et cetera. Awesome. Thank you so much, Lucy.

LUCY SWEET: Thank you so much. It was lovely to talk to you, as ever. Look forward to seeing you. You're not at KubeCon this year, are you?

ABDEL SGHIOUAR: No, not North America. I'm going to be in Morocco.

LUCY SWEET: Oh, Abdel.

ABDEL SGHIOUAR: It's warmer and better food.

LUCY SWEET: Abdel. Listen, I need you to collectively boo right now. All together.

ABDEL SGHIOUAR: Yes. But I will see you in Europe for sure.

LUCY SWEET: Oh, absolutely. Look forward to it.

ABDEL SGHIOUAR: Awesome. Thank you. That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at @KubernetesPod or reach us by email at kubernetespodcast@google.com.

You can also check out the website at kubernetespodcast.com, where you will find transcripts and show notes and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]
