#229 June 25, 2024

AI/ML in Kubernetes, with Maciej Szulik, Clayton Coleman, and Dawn Chen

Hosts: Abdel Sghiouar, Kaslin Fields

In this episode, we talk to three active leaders who have been around since the very beginning of Kubernetes. We explore how Kubernetes has changed since its inception, with a particular focus on current efforts in open source Kubernetes to support AI/ML-style workloads.

Maciej Szulik currently holds a seat on the Kubernetes Steering Committee. He also leads the Special Interest Groups responsible for kubectl and for workload and batch controllers. Maciej has been contributing to Kubernetes since the early days, jumping from one area to another wherever help was needed. He authored the first version of audit and helped shape its current one, and has touched multiple other places in apimachinery. He was also responsible for designing and implementing the Job and CronJob controllers. In kubectl he was responsible for the plugin mechanism and several major refactors to simplify the code. In May 2024 he joined the ranks of Production Readiness Review (PRR) approvers, helping ensure high production standards for future Kubernetes releases.

Clayton Coleman is a long-time Kubernetes contributor, having helped launch Kubernetes as open source, being on the bootstrap steering committee, and working across a number of SIGs to make Kubernetes a reliable and powerful foundation for workloads. At Red Hat he led OpenShift’s pivot onto Kubernetes and its growth across on-premise, edge, and into cloud. At Google he is now focused on enabling the next generation of key workloads, especially AI/ML in Kubernetes and on GKE.

Dawn Chen has been a Principal Software Engineer at Google Cloud since May 2007. Dawn worked on Kubernetes before the project was even founded as an open source project. She has been one of the tech leads on both Kubernetes and GKE, and founded SIG Node from scratch. She also led the Anthos platform team for the last 4 years, focusing mainly on the core infrastructure. Prior to Kubernetes, she was one of the tech leads for Google's internal container infrastructure, Borg, for about 7 years. Outside of work, she is a wife, a mother of a 16-year-old boy, and a good friend. She enjoys reading, cooking, hiking, and traveling.


ABDEL SGHIOUAR: Hi, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: It's a pretty light news week, but this episode is very long, so buckle in and make sure to stay until the end to hear us banter about all sorts of random stuff.

KASLIN FIELDS: Welcome to the fourth and last episode of our four-part special series for the Kubernetes 10-year anniversary. We dedicated this episode to AI/ML and spoke to three contributors who are leading work in that area-- Clayton Coleman, Dawn Chen, and Maciej Szulik. Even if you're not particularly interested in AI workloads, we talked quite a bit about infrastructure considerations for these workloads that are applicable in a variety of circumstances, so make sure you check it out. But first, let's get to the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Code freeze for Kubernetes 1.31 release is on July 9th, 2024. If you are a Kubernetes contributor writing code for an enhancement going into 1.31, make sure you get your code done by that deadline.

KASLIN FIELDS: And that's the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Hello, and welcome to a new episode in our 10-year anniversary special. Today, we are talking to Maciej Szulik. Maciej is currently on the Kubernetes Steering Committee. He is also leading the special interest groups responsible for kubectl, workload, and batch controllers. Maciej has been contributing to Kubernetes since its early days, jumping from one area to another where help was needed. He authored the first version of audit and helped shape its current one, and has touched multiple other places in API machinery.

He was also responsible for designing and implementing the Job and CronJob controllers. In kubectl, he was responsible for the plugin mechanism and several major refactors to simplify the code. In 2024, he joined the ranks of Production Readiness Review, or PRR, approvers-- which we're going to talk about-- helping ensure high quality standards for every future Kubernetes release. Welcome to the show, Maciej.

MACIEJ SZULIK: I'm glad to be on this podcast.

ABDEL SGHIOUAR: How am I doing with pronouncing your name?

MACIEJ SZULIK: Very well, very well. Awesome.

ABDEL SGHIOUAR: Good. So this is the last episode in our 10-year anniversary special. And of course, we couldn't do a 10-year anniversary special without dedicating an episode to HPC, and AI, and ML, because that's all the rage these days, right?

MACIEJ SZULIK: Correct. AI is the new hype these days. Definitely a lot of movement, a lot of places. On our end, yes, we actually started out a little bit shy-- I actually looked it up before this call. The Batch Working Group was initiated two years ago. I can't believe it's already been two years.

So a lot has happened. I will admit that I'm very grateful to the group-- the entire Batch Working Group-- for the improvements that have been made over the past two years. A lot of the stuff that the folks who are part of the group have pushed forward-- especially the batch APIs, so we're talking primarily about the Job and CronJob controllers-- a lot has been pushed.

A lot of those topics actually go all the way back to the early days of Kubernetes when, as you mentioned, myself and Eric Tune were writing the first versions of the Job controller, and we were already aware of some of the limitations. But because nobody had the time to actually push those features forward and improve the capabilities of the Job controller, a lot of those features-- or in a lot of cases, limitations-- were lingering for quite some time.

And primarily, I would call out Aldo and [? Michal ?] for doing an amazing job pushing those unsolved problems forward. I will admit that it wasn't easy. There were a lot of bumps on the road, but in the end, we made it successful. And a lot of the users today are probably not even aware of those early limitations, and I'm happy to talk about them a little bit later.

ABDEL SGHIOUAR: I would like to talk about what those limitations are. Because it's interesting that the group started two years ago, before the AI hype was even a thing.

MACIEJ SZULIK: I mean, if you look back at HPC-- and there's a lot of folks who are part of the Batch Working Group, a lot of people from the university world, who have been doing HPC and some sort of AI/ML-type workloads for a decade or even longer. So it's not a new thing. It just so happened that over the past-- I don't know-- 12 months, a little bit less, it actually hit the headlines and the news cycle.

But it was actually a thing that was in development for quite a while. And I will admit-- this was back a couple-- probably even 10 years ago. When I was part of the PyCon program committee, I remember when Python started shifting away from web development-- all those Django and Flask frameworks and those topics-- and the majority of the topics that were showing up for the program committee to review were slowly shaping up around HPC, machine learning, and all those areas. So it's not a brand new topic; it was just lingering there in the background.

There are some areas that usually take a little bit longer to catch up. And like I said, the limitations of kube that we initially had-- for example, with the Job primarily, the biggest hurdle with the Job controller was that, for the Job to complete-- to execute-- it had to keep the actual pods running the code around in the cluster for as long as the job itself was running. That was fine if you had a job with 100 pods, or fewer, running. But if you're running a major machine learning framework, where you're running thousands of pods or even more in a job, and you have those thousands of pods lingering around until the job finishes, which can take days or even weeks, you're wasting a lot of the resources in the Kubernetes cluster.

ABDEL SGHIOUAR: Ah, I see.

MACIEJ SZULIK: And when we initially wrote the Job controller, we were aware of that problem. But because that was the first installment, the first version, we decided to accept that kind of limitation. We wanted to give the users something, to put it into their hands so that they could test it, verify what's working and what's not, see what kind of feedback we would receive, and, based on that feedback, address it. So the rewrite that Aldo was doing over the past two years-- well, we actually finished that up roughly a year ago-- changed how the Job controller calculates the finished pods, keeping that information as each pod finishes and properly updating those counters so that we don't have to keep those pods lingering around.

So previously, if you had a job of 100 pods, you would have to keep the 100 pods running until the job finished. Now, every time a pod finishes in a job, it gets properly accounted for, and it can be removed from the cluster, releasing the resources. So there are a lot of those small improvements-- very often behind the scenes, but not always, because there has also been a lot of work around allowing people to retry, or primarily to say whether a particular problem in a job is retryable and whether it should be retried.

There was also the expansion of the Job API with-- and if you look back in the Kubernetes issues, this was one of the very first issues shortly after we created the Job controller-- the indexed job, which basically allows you to assign a specific index to every pod. We were aware that kind of functionality would be useful to Job users, but again, we didn't have the capacity to push that kind of functionality forward.

So that's where the Batch Working Group again helped. They stepped up and implemented that kind of functionality. So currently, there are a lot of users of indexed jobs. We're actually evolving the indexed job. In 1.31, there is something called elastic indexed jobs, which allows you to define that only certain indexes within a job can be finished or not.
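For listeners who want to see what the indexed Job feature looks like from the pod's side, here is a minimal, hedged Go sketch of a worker that picks its shard of work from the completion index the Job injects into each pod. The shard names and selection logic are purely illustrative placeholders.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Indexed Jobs expose each pod's index through the
	// JOB_COMPLETION_INDEX environment variable (it is also available
	// as the batch.kubernetes.io/job-completion-index pod annotation).
	raw := os.Getenv("JOB_COMPLETION_INDEX")
	idx, err := strconv.Atoi(raw)
	if err != nil {
		fmt.Fprintf(os.Stderr, "not running as an indexed Job pod: %v\n", err)
		os.Exit(1)
	}

	// Hypothetical work items; in a real job each index would map to a
	// shard of the dataset, a file range, a model partition, and so on.
	shards := []string{"shard-a", "shard-b", "shard-c", "shard-d"}
	fmt.Printf("pod index %d processing %s\n", idx, shards[idx%len(shards)])
}
```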

So there have been a lot of changes like that over the past-- I would even say maybe more than two years. The official working group started in early 2022. I remember we were already presenting for the Batch Working Group at KubeCon Europe in 2022. And if I remember correctly, that was in Valencia.

ABDEL SGHIOUAR: Valencia-- yes, 2022.

MACIEJ SZULIK: Yeah, so we were already asking around whether we could present the working group even before it was officially accepted by the steering committee back then. But yeah, the work has been going on for even a little bit longer. And I will admit that the Job API-- the Job and CronJob API-- which, a fun fact, I'm not sure how many users are aware of this: CronJob, when we initially wrote it, was called ScheduledJob.

Eventually, over the next two or three releases, we figured out that ScheduledJob might be a little bit confusing to a lot of the users because of the kube-scheduler, which is responsible for placing the pods on your Kubernetes cluster. And we wanted to bring it closer to the Unix world, where a lot of people are familiar with the cron daemon responsible for scheduling-- well, scheduling is again the wrong word, but--

ABDEL SGHIOUAR: Running processes.

MACIEJ SZULIK: --ensuring that a particular task runs at a specified point in time, or repeating those executions. So that's when we decided to rename ScheduledJob to CronJob, which was, on its own, a very interesting and very involved task, because a lot of users by then had actually started using ScheduledJob heavily.

And we wanted to make sure that the transition is pretty smooth, so we were actually allowing them to use both names at the same time. There's a lot of API machinery, and that was the time when I spent a lot of the time with the API machinery, with Clayton, adding and expanding the API machinery to be able to support both the CronJob as a resource name and the ScheduledJob.

So we had the capability-- and that capability exists even today in the kube code-- to read both CronJobs and ScheduledJobs. But in response, we would always return information to the user about the CronJob, to slowly move users through that transition. But I will admit that that was an interesting time.

ABDEL SGHIOUAR: I have a question for you, and this was not even planned for the talk, but I just thought about it. Particularly for CronJobs on Kubernetes-- the example you gave earlier-- if you run a job that will end up running 1,000 or 2,000 pods, isn't the churn rate a limiting factor in Kubernetes itself, in how fast those pods can get spun up, or is that not a problem?

MACIEJ SZULIK: Well, you as the job author have the ability to decide how much-- how big a parallelism you can allow for your particular execution. So I'm not sure how familiar you are with the Job API, but basically, the Job API has two main configuration options. The first one is how many pods of your particular action you want to run in total, and the other one allows you to define how many are executed at any point in time.

That's called the parallelism. So you can define that, oh, my job is a 100-pod action or task. But for whatever reason-- I don't know-- you know that at this point in time, you can have only at most 5, whether that's your quota limitation or your resource usage. You know that at this point in time, you can only allow 5, and there will always be 5 running and no more, plus or minus some skew when we are closing a previous one and starting a new one.

But for example, if you're running some kind of backup task, or some heavy computation overnight, or over weekend when you know that resource usage will be very small in the cluster, you can go a little bit crazy and bump that limit to 20 or maybe even 50 pods running in parallel. So I would say that's the limitation. And even if you decide to go full on board with completions 100 and parallelism at 100, when we are actually creating the pods for you, we're not creating all of them at once. We're actually batching the creation of the pods to ensure we're not overwhelming the API server with those creations.
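As a rough illustration of the two knobs Maciej describes, here is a hedged Go sketch that builds a Job object with completions set to 100 and parallelism capped at 5 using the upstream k8s.io/api types. The name and image are placeholders, and in practice this would usually be written as a YAML manifest rather than Go.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	completions := int32(100) // the job is "done" after 100 successful pods
	parallelism := int32(5)   // but never run more than 5 pods at once

	job := batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "example-batch-task"},
		Spec: batchv1.JobSpec{
			Completions: &completions,
			Parallelism: &parallelism,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "worker",
						Image: "example.com/worker:latest", // placeholder image
					}},
				},
			},
		},
	}

	fmt.Printf("job %s: completions=%d parallelism=%d\n",
		job.Name, *job.Spec.Completions, *job.Spec.Parallelism)
}
```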

ABDEL SGHIOUAR: Got it. So that was actually going to be my follow-up question. So there is some logic in the controller to make sure that pods are created at a specific rate, if you want, or--

MACIEJ SZULIK: It's not like there is a specific rate, but we're just ensuring that we're slowly adding more. And if we see that there is a problem with creating those pods, we will actually back off and slow down the creations, or stop for a little bit longer. So it's not immediately running full on. It's slowly getting up to speed to whatever is allowed in the cluster. Because obviously, you can run into stuff like quota on resources, or anything can happen in a live cluster.

ABDEL SGHIOUAR: Got it. So then I'll pause here and go back to talk a little bit about your experience with all of this. What's your background? And how did you end up working on these HPC/batch kinds of workloads on Kubernetes?

MACIEJ SZULIK: Since the early days, I was involved in the controllers-- like I mentioned, the Job and CronJob controllers, that was one of the first major tasks. It's funny, because for the Kubernetes 10-year anniversary, I was looking at my contributor card, and I also remember that someone mentioned this to me-- that my actual first PR-- pull request-- to the Kubernetes code was bumping or updating to a newer version of Cobra.

Cobra is the library that we use for writing kubectl. So all the nice things with regards to how we parse flags, arguments, that is provided by the Cobra library. And then slowly, from that, I transitioned to writing the Job and CronJob controller. And that was because in my previous jobs, I was writing something very similar.
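To make that concrete, here is a minimal, hedged sketch of a Cobra command of the kind kubectl is built from; the subcommand name and flag here are invented for illustration and are not real kubectl commands.

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	var namespace string // flag value parsed by Cobra

	cmd := &cobra.Command{
		Use:   "greet [name]", // hypothetical subcommand
		Short: "Print a greeting, kubectl-style",
		Args:  cobra.MaximumNArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			who := "world"
			if len(args) > 0 {
				who = args[0]
			}
			fmt.Printf("hello %s (namespace %s)\n", who, namespace)
			return nil
		},
	}

	// Flag and argument parsing is exactly what Cobra handles for kubectl.
	cmd.Flags().StringVarP(&namespace, "namespace", "n", "default", "target namespace")

	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```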

The only difference is that back then, it was behind the curtain. I could not share anything about what I was doing or anything like that. So when I had the opportunity to work on the Job and CronJob, I just jumped in and slowly started working on it. Initially, the goal was to just write something-- like I mentioned, ScheduledJob. We didn't even think about dividing that into those two controllers.

We started with, oh, we want to have the ability to run backups, or basically some kind of periodic task. And then as we started talking with Brian Grant, Eric Tune, and a couple of other folks-- Dario-- we realized that we actually wanted to do the Job API first and then build ScheduledJob, or CronJob, later on top of the Job API. Because, first of all, we looked at what Borg already had on the Google side.

And if you look in the paper, there are clearly two primary types of workloads. There is the continuous workload, so think Deployment and ReplicaSet. And there's run-to-completion. So basically, in our case, that's a Job type of workload. That's how we started working on that.

That's my first love in Kubernetes, I would say-- the Batch API-- because both the Job and CronJob controllers and their APIs are so dear to my heart. And that's why I've always kept a very close eye on what's going on in both of the controllers. Even though by now, the Job controller has been rewritten by Michal, Aldo, and a couple of others so much that it does not resemble what I wrote originally.

The CronJob controller was rewritten from scratch by [? Ali. ?] Because for a very long time, the CronJob controller was written using the initial, primitive polling mechanism. So basically, instead of listening to what kind of changes are happening in the cluster and reacting to those changes, the first version of the CronJob controller was iterating over all CronJobs in the cluster every five minutes or so and then reacting to the changes, eventually triggering the creation of a job from that CronJob, which was very limiting.

Again, a similar problem as with the Job: it had a very significant impact on performance. And we knew for a very long time that we needed to rewrite the CronJob controller to use the shared informers and the mechanism where we are listening to changes that are happening to the CronJob resource-- and to the Job resource-- and reacting to them constantly.
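As a rough sketch of the pattern being described, this hedged client-go snippet sets up a shared informer on CronJobs and reacts to add/update/delete events instead of polling the whole list on a timer. The kubeconfig handling, error handling, and the actual reconcile logic are deliberately minimal; a real controller would use work queues and in-cluster config.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig in the default home location, just for the sketch.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	cronJobs := factory.Batch().V1().CronJobs().Informer()

	// React to changes as they happen instead of listing every CronJob periodically.
	cronJobs.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			cj := obj.(*batchv1.CronJob)
			fmt.Println("cronjob added:", cj.Namespace+"/"+cj.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			cj := newObj.(*batchv1.CronJob)
			fmt.Println("cronjob updated:", cj.Namespace+"/"+cj.Name)
		},
		DeleteFunc: func(obj interface{}) {
			fmt.Println("cronjob deleted")
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // a real controller would run its worker loops here
}
```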

When we did that-- [? Ali ?] did the majority of the rewrite a couple of years back, like three years-- I remember he was presenting this topic during one of the NA-- North America-- KubeCons, and that was still a virtual one. The performance gains were pretty significant. So I would encourage everyone who is interested in that history to look up my name and the KubeCon NA presentations where that was discussed.

ABDEL SGHIOUAR: We will find it and add it to the show notes for the episode, too.

MACIEJ SZULIK: Yeah, that would be perfect.

ABDEL SGHIOUAR: Yeah, we'll do that.

MACIEJ SZULIK: So that naturally-- eventually, over the years, I got involved and became part of SIG Apps. So that's the special interest group that is responsible for all the workload and batch controllers. I became the chair and the lead, and that has been the case for three, four, five years by now, I guess. I can't remember.

Probably something that we could look up if someone is interested. And when Aldo and Abdullah pinged me about the ability to create the Batch Working Group, I was-- oh, yeah, that's definitely something. Because I know that there are limitations in the Batch API. I can name you at least two or three that I would like to see resolved, but I just don't have the time myself-- the capacity-- to be able to work on them, but I can guide people.

There are even presentations from back in-- I think I did a presentation back in Copenhagen, so that's 2018, KubeCon Europe. I did a presentation about writing controllers, and during that presentation, I asked people-- I'm like, who is interested in rewriting the CronJob controller to the new pattern using the shared informers and so forth? I think it took me another year or two to actually find a volunteer, work with him, and be able to push this idea forward.

I will admit that I'm planning to do a refresh of that talk, because until now, there are two presentations that I did that I still see a lot of people looking up, and I've used them multiple times as a reference. One was how to write kubectl code, so basically a SIG CLI tour, because like you mentioned, there have been major refactors in kubectl.

We tried to rewrite kubectl such that it has a specific form. There is an initiation step where we create the command. Then we do validation of all the parameters that the user provides. And eventually, we actually invoke the command. The idea is to be able to build commands on top of one another.

For example, wait is one of my perfect examples, where we are actually reusing the wait implementation in the delete command, where we're waiting for the deletion to finish. And I did a presentation on how to write and read through the kubectl code-- that was one. And the other one was how to write Kubernetes controllers.
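The command structure described above roughly follows kubectl's convention of an options struct with separate complete, validate, and run steps. The sketch below is a hedged, simplified illustration of that shape with invented names, not actual kubectl code.

```go
package main

import (
	"errors"
	"fmt"
)

// ExampleOptions is a hypothetical options struct in the shape kubectl
// commands tend to use: fields get filled in from flags and arguments.
type ExampleOptions struct {
	ResourceName string
	Timeout      int
}

// Complete fills in defaults and derives values from the raw arguments.
func (o *ExampleOptions) Complete(args []string) error {
	if len(args) > 0 {
		o.ResourceName = args[0]
	}
	if o.Timeout == 0 {
		o.Timeout = 30
	}
	return nil
}

// Validate checks the user-provided parameters before doing any work.
func (o *ExampleOptions) Validate() error {
	if o.ResourceName == "" {
		return errors.New("a resource name is required")
	}
	return nil
}

// Run performs the actual command once everything is validated.
func (o *ExampleOptions) Run() error {
	fmt.Printf("waiting up to %ds for %s\n", o.Timeout, o.ResourceName)
	return nil
}

func main() {
	o := &ExampleOptions{}
	if err := o.Complete([]string{"my-deployment"}); err != nil {
		panic(err)
	}
	if err := o.Validate(); err != nil {
		panic(err)
	}
	if err := o.Run(); err != nil {
		panic(err)
	}
}
```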

And I'm actually planning a refresh of the latter later this year in KubeCon North America. Fingers crossed so that my proposal gets accepted. But this time around, I'm planning to play around with the idea.

Because currently there are multiple tools that you can use to write Kubernetes controllers. There's not only the built-in shared informers and client-go, but there's also controller-runtime, there's Kubebuilder, there's the Operator Framework, and all the other tools. And I think it's important to remind people and for those people to have those references. Because as you're probably aware, even though we are 10 years into this game, there are more and more people, especially newcomers, who are coming to kube and are struggling.

Because for those of us who have been with the project for 10 years, a lot of those topics come naturally. And I catch myself very frequently-- for me, a lot of those discussion topics just come naturally, and I know where to look, even though I'm not 100% sure I can find the right piece of code. But because I know more or less what I'm looking for, even if the particular piece of code has moved, I can quickly navigate the code and find that place.

For example, what I mentioned-- cohabitating resources, which is what allowed us to have CronJob and ScheduledJob. I could easily find where that is if I were interested in API machinery and how to do it. But for a lot of the people, the Kubernetes code today is very intimidating, because of the amount of work that has been put in over the past 10 years by so many amazing people who put sweat and tears into making this project what it is today.

ABDEL SGHIOUAR: Nice. So then I'll bring us back to the original topic of the conversation, which is AI and ML. And since you've been involved in quite a lot for a very long time, I think you'd be the perfect person to help us and the audience make sense of all the terms. So HPC, AI/ML, and batch-- what do these terms actually mean?

MACIEJ SZULIK: So let's start with the simplest one, AI/ML, because that's probably on people's minds a lot these days. So AI is artificial intelligence, and ML is machine learning. And for a lot of people, those are probably synonyms. And actually, that's because AI these days is nothing more than much better machine learning from a couple of years ago.

Batch is a pretty broad topic, because it comes from the ability to take several resources and try to place them on the cluster at once within a single unit of invocation, or, coming from the Unix environment, where batching usually means packing things together and executing them as a single unit. And lastly, HPC-- I will admit that that one has been popping up back and forth.

And I sometimes have a hard time figuring out what it means, but basically, it's about high-performance computing. So if you again go back to machine learning, the idea of machine learning is to put our computers to 100% use and just analyze whatever data you have.

And HPC is just that. It's actually using the full power and the full performance of your computer-- whether you're looking at images and trying to find similarities between them, or whether you're looking at text generation-- to digest all that information into a binary representation that you can then use, whether that's for predicting what kind of output a user requests, or whether the output will be text or an image, or something more sophisticated like video or audio.

ABDEL SGHIOUAR: Yeah. And just to add to that, we have to say that HPC has been around for a very long time in our field, in the tech industry. It's just that today it probably means something broader than before, when it used to mean very, very expensive, very proprietary hardware, essentially.

MACIEJ SZULIK: Yeah, correct. I mean, that's one of the gains that we have with the Kubernetes, and that is also one of the topics that has been popping up frequently during the Batch Working Group meetings. We've been lucky to have some folks from the university areas. So those are the folks that actually have been doing HPC for multiple decades, I would even say, before AI started to become--

ABDEL SGHIOUAR: A thing.

MACIEJ SZULIK: A thing, famous, and the hype. They've been doing that for a very long time. And they actually built a lot of the tooling. But in most cases, the tooling was very highly dedicated to the environment that they had. And what we're currently trying to do is combine their dedicated pieces of software with the genericness-- let's call it that-- of Kubernetes and the ability to run any kind of workload on Kubernetes.

And there are also two recent additions to Kubernetes. We created two brand-new working groups in the past couple of months. One is devoted to ensuring that the serving workloads-- so once you have the model built--

ABDEL SGHIOUAR: You have the inference part.

MACIEJ SZULIK: Exactly. You want to make sure that the serving part is also working properly, and there are multiple approaches to how to do it best. So that's one thing. But the other side of that equation is how to actually use graphics processing units-- GPUs-- to perform that. And it's not only for batch, because there's a different usage pattern for batch and for serving workloads. So there's a separate group that is specifically focused on allowing users to efficiently use GPUs. Because a lot of the primitives that we currently have in Kubernetes were only focused on CPU, memory, and a couple of other primitives like quota and whatnot.

But we didn't originally write kube such that it would support GPUs, especially since they are used a little bit differently than CPUs are. So the working group is currently working on making sure that that enablement and dynamic resource allocation-- that's the whole enhancement that we're building around-- is possible in the Kubernetes cluster, and that cluster operators can offer those resources and users can come and actually use them efficiently.

ABDEL SGHIOUAR: Nice. Yeah, that's actually pretty interesting. That was going to be my next question, how all of this is impacting Kubernetes. And I was not aware that there were two new-- I was aware of the inference one, the serving one-- the new working group, but I was not aware of the--

MACIEJ SZULIK: Device management--

ABDEL SGHIOUAR: Yeah, for GPUs. I mean, for GPUs, I think so far, Kubernetes, one of the few things you could do is basically influence the scheduling decision by asking the scheduler to put your pod on a node that has a GPU, right?

MACIEJ SZULIK: Yes, but that's very high level, because I'm not an expert in that area. I will say that there are people who are so much smarter than I am in the GPU-- in the device-- area. But basically, GPUs can be divided into various pieces. And you can use either a fraction of a GPU or--

ABDEL SGHIOUAR: Like multi-slicing. Yeah, you can slice a GPU.

MACIEJ SZULIK: Exactly. You can slice a GPU to whatever. Theoretically, you can think about it-- oh, well, I'll just figure out that my GPU will be divided into 10 equal pieces. But actually, there will be people who would want to divide it into 16 or 32 pieces.

There will be people who will be using more, and there will be people who will be using less. And the ability to modify that usage-- and by that, there's a little bit of dynamicity needed to be able to pick and choose what you want. Whereas with CPUs, we can only say, oh, yeah, I have 10 millicores, and you have to divide it statically. Whereas with GPUs, the dynamic side is much more important.

And that's the big thing that the Device Working Group is doing currently. And they are building on the DRA, which if you look at it, the DRA, or Dynamic Resource Allocation, the topic has been in the works for over a year now. But it was slowly moving, and it just so happens that the current hype pushed a lot more power and a lot more people towards that area to be able to push and focus Kubernetes into proper usage of those resources.
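For reference, the pre-DRA approach Abdel alluded to earlier, requesting a whole device as an extended resource so the scheduler places the pod on a GPU node, looks roughly like this hedged Go sketch. The nvidia.com/gpu resource name is the one commonly exposed by NVIDIA's device plugin; the pod name and image are placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-worker"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "example.com/trainer:latest", // placeholder image
				Resources: corev1.ResourceRequirements{
					// Extended resources like GPUs are requested as whole
					// devices; dynamic resource allocation (DRA) is what adds
					// finer-grained, more dynamic sharing on top of this.
					Limits: corev1.ResourceList{
						corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("1"),
					},
				},
			}},
		},
	}

	fmt.Println("pod", pod.Name, "requests", pod.Spec.Containers[0].Resources.Limits)
}
```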

ABDEL SGHIOUAR: Nice, nice. Well, I think we should probably add this to our list of topics to explore, because those two new working groups sound interesting. So before I get to the question that we, of course, ask every guest-- what do you see for the future of Kubernetes-- can you explain to us what PRR is? Because I think we have never discussed this on the show.

MACIEJ SZULIK: Yeah, so Kubernetes is trying to-- first, we've been around for 10 years, and we want to make sure that the people who are using the Kubernetes project always have the best experience. So we need to make sure that the stability is there, the functionality is there. Especially with new features, you're trying to ensure that they fulfill the requirements for the cluster administrator to be aware of what's going on in the cluster-- so you can think about monitoring-- but also for the users: how to divide the responsibilities, and also, if I'm rolling out a new change, how to monitor it and what kind of signals I should be looking for that would tell me the new functionality is not working correctly.

How do you turn this new functionality on or off? What is the upgrade path? Are there any requirements with regard to system-level components-- whether you need to have something in place to be able to use a particular functionality or not?

So Production Readiness Review is actually a part of every enhancement document. So Kubernetes Enhancement Proposals-- KEPs for short-- are documents that describe your new feature, your new functionality. And as part of that document, there is a section towards the end which goes through all those questions, where you're thinking about what kind of additional storage requirements there will be.

If you're adding new fields, what is the maximum potential size that the new field adds to the existing APIs? If you're creating a new API, what is the size of the new API? What kind of monitoring requirements do you have? You can call out specific metrics.

What is the acceptable SLI for those metrics? So at what level should you start to worry that your functionality is not working correctly, whether you should start thinking about rolling back or you can actually keep working on it, or maybe at what point you should decide to actually turn it off and report a particular problem with the functionality? And we're helping the authors of every enhancement ensure that production readiness-- so that your feature will keep working in full-fledged clusters.

Because oftentimes, there will be people who are thinking, oh, I want to add only this particular functionality. And it's a simple thing. But if you start thinking about and looking at the interactions that this particular feature will have with different components of the cluster, it suddenly doesn't look as simple. So that additional set of eyes looking at it from a different angle is a critical piece of ensuring that we maintain high standards for future Kubernetes releases.

ABDEL SGHIOUAR: Nice. Thank you. That's a really good explanation. So my last question is, where do you see the future of Kubernetes?

MACIEJ SZULIK: I think the biggest challenge that we're currently seeing is ensuring that the community has the potential for further growth. The biggest problem that I've noticed myself over the past two or three years is that, with the layoffs happening across the IT sector, we've lost several contributors to the project, whether that's to the main Kubernetes or to a lot of the projects that are built around the Kubernetes project.

So I would say that's the biggest risk, and that's the biggest investment that the project has to continue to make. Because on one hand, we need to make sure that the experience for new people coming to help work on Kubernetes is simple, and that picking up any kind of task-- like, for example, going through the enhancement process-- is achievable and not overwhelming, whether that's through mentoring or helping people understand how the process works.

But we also need to ensure that the ease of use is there, because it has been mentioned several times in the past that there are so many options currently in Kubernetes-- which I believe is probably a problem for any medium-size to large project-- that the bar to entry for every user is so high.

So probably that will be the focus-- the ease of use, the friendliness for both the users-- and by users, I mean both the actual consumers of the platform and the cluster administrators-- but also for the developers to be able to sometimes fix a simple thing rather than just command [? in. ?]

ABDEL SGHIOUAR: Yeah, nice. Well, thank you very much. Before we leave, we just have to acknowledge the fact that your audio quality is good because of your son, who gave you a gaming headset, right?

MACIEJ SZULIK: Correct. Yes, my oldest son. I did reach out to him. I needed a proper cable mic, and he was like, oh, yeah, I have a very good gaming headset. And he was like, you can definitely use that one. Like, thank you.

ABDEL SGHIOUAR: Awesome. I didn't want to leave before we acknowledged that. [CHUCKLING] All right, thank you very much, Maciej. Thanks for your time.

MACIEJ SZULIK: Thank you.

[MUSIC PLAYING]

KASLIN FIELDS: Hello. I am excited today to be talking with Clayton Coleman. Clayton is a long-time Kubernetes contributor, having helped launch Kubernetes as open source, being on the bootstrap steering committee, and working across a number of SIGs to make Kubernetes a reliable and powerful foundation for workloads. At Red Hat, he led OpenShift's pivot into Kubernetes and its growth across on-premise, edge, and into cloud. At Google, he's now focused on enabling the next generation of key workloads, especially AI/ML in Kubernetes and on GKE. Welcome to the show, Clayton.

CLAYTON COLEMAN: Thank you, Kaslin. It's a pleasure to be here.

KASLIN FIELDS: And this, for everyone's awareness, is the episode that is so nice, we're doing it twice, because [CHUCKLING] we actually recorded this interview once and then had technical difficulties. So I didn't know that you were also a tester. Is testing one of your passions?

[CHUCKLING]

CLAYTON COLEMAN: Actually, when I graduated, I graduated from college in the winter of 2001, and it was at the depths of the dotcom crash, and there was no one hiring. Except I went to work for IBM actually out of college as a tester, and I was such an annoying tester because I would not only find issues but also file bugs at the same time. They were like, we really need to get this guy out of test. He's super annoying. We'll just make him fix the bugs that he introduces. So I became a developer.

KASLIN FIELDS: And still today good at finding those issues. Perhaps we do need to file an issue with our recording software. But thank you so much for being on again and for testing that. And testing is one of my personal passions, so I certainly appreciate that.

And I also started out in QA, so we have that in common, which I didn't know last time. So we learned something new from this recording already. But today, we are here to talk about AI/ML and HPC on Kubernetes. And you've been here since the very beginnings of Kubernetes before 1.0 when it was mostly Google and Red Hat kind of working together to get it going. So could you tell me a little bit about your early days with Kubernetes?

CLAYTON COLEMAN: So I said I started at IBM. A lot of us left IBM about that time to go work at this really small mid-- well, not that small, maybe 4,000 people, but a small open source company that was also in the Raleigh area called Red Hat. And I was working on OpenShift, and it was Platform as a Service.

That was when Heroku, and EngineYard, and dotCloud were all trying to make it. And so OpenShift was an open source Platform as a Service with this really cool idea of containers. And one of the things that we really noted was that there were two problems, one of which was that nobody wants simple things.

Red Hat has been working with enterprises for a really long time with Red Hat Enterprise Linux and supporting production workloads with-- alongside the growth of cloud, Linux became huge. And the reality is that everybody says that they want simple, but once they get past the point where they need simple, they then need complex. And everybody's complex was different.

So we were seeing that PaaS was too simple, and we got a call out of the blue at Red Hat from some folks at Google working on this weird project called Seven of Nine, which was their code name for it. They wanted to do something with containers and Docker. We kind of listened, and we were like, well, that's interesting.

And then we didn't hear anything from Google for a while. They were like, well, we don't know if we're going to open source it. And then about six months later, right about this time of year, just a little bit before the DockerCon where Kubernetes was announced, we got a call-- hey, are you still in? And we decided at the last minute: well, if we're going to go big with containers, it's going to be a lot bigger than PaaS, so this seems like the right opportunity.

And so I've been contributing to Kubernetes ever since then. And it's great having the 10-year celebration because it's been a pretty crazy journey as we've all-- how do I say this-- jointly recognized that even though people say they want simple things, they really just want to get what they need to get done, which is often quite complex, and then they want it to just keep working. And so a lot of the last 10 years has been like, Kubernetes is neither simple nor does it try to be too complex.

And that's a really hard place to be in the middle of because everybody comes to it with a different set of assumptions. And so that's actually-- that is the most interesting and difficult part, which is something that everybody hates. Because they have to use it because it's so widely supported. It does so much of what they need. It's been an interesting 10 years, and I couldn't imagine doing it any other way.

KASLIN FIELDS: That is true. Enterprise kind of has this entropy to it, doesn't it?

CLAYTON COLEMAN: Absolutely.

KASLIN FIELDS: Nothing stays simple for long in enterprise.

CLAYTON COLEMAN: And it's actually-- this has been a-- I think a group of us have always kind of chatted about this. Everybody needs a platform. I think Kelsey had a great quote about this-- is everybody needs a platform, but it's really important that people build their own platform. And part of the reason for that is everybody's requirements are just a little different. You need a different set of tools.

Your teams have different set of experiences. You're building different workloads. And so you can't get 100% agreement on the right way to do canary rollouts or the right way to run microservice workloads. And the best you can do is offer something that most people will choose over building it themselves. And I think that's actually a worse-is-better kind of approach.

And I think as we're getting into AI, that's the classic-- everybody is so excited about AI, and then you realize that it's just doing this big statistical average of every human on the internet, which is kind of-- if you think about it, you're like, well, humans are, in aggregate, not all that good at telling you the right answer every time. So as I've been getting into AI, I've really appreciated that worse-is-better mindset.

KASLIN FIELDS: Worse-is-better mindset, that's quite the way to frame it.

[BOTH CHUCKLING]

I think we're all figuring out how to think of AI and what it's good at. That's a question that I'd like to ask that is not on our list, actually. What do you feel like AI is really good at this stage?

CLAYTON COLEMAN: I think this is the exciting part about AI, is there's such an enormous amount of hype-- it's such a cool and amazing experience. Because pretty much up until now, you have to translate what you want to the computer. And there have been various chat bots over the years. There's been ELIZA bot where you say things and it just asks you questions back, which works really well because humans love to talk about themselves.

But I think that natural language interface, it's still really early. LLMs, as they are now, are just the start. But we can actually now imagine user interfaces where you just type things and stuff happens. And it's not perfect. I think there's an interesting tension, which is, how good does something need to be to be reliable?

And different use cases actually have different needs. So a really great example would be, if you have the time, you can run a whole bunch of-- you can ask the AI a whole bunch of times, and you'll eventually converge. That's what a lot of people find value in chat bots-- is maybe the first answer isn't perfectly correct, but you can keep refining until you get something-- you talked it through. It was your rubber ducky. That's a very valuable use case. I've seen super useful translation of different types of programming languages between each other, which is just a time saver.

If you see an example-- and you could go manually do it, but if you can have the chat bot or the code completion take a chunk and then generate it out, overall, you're getting a net improvement. Even simple things like looking at pictures, you can speed up a lot of repetitive actions. And I think that's actually the real superpower of today's AI is, it's not that it can give you something precisely, but it can automate doing things that you can do but have better uses of your time.

And that's what we've been doing with computers for 60 years is-- of course I could learn long division and linear algebra, or I could write a program and then everybody can benefit from it. I think we're still in that early phase. The surprising thing for me is how many people are finding productive uses for this. There's a little bit of the cynicism creeps in, which is this is overhyped.

It's absolutely overhyped, but there are people in production today in very traditional businesses doing very traditional, boring things that are using LLM under the covers, and that's awesome. And I saw that in Kubernetes as well. Early on, probably a lot earlier than people would have expected, your credit cards were running on top of Kubernetes somewhere.

Transaction processing was done on Kubernetes. And you don't really see this as a consumer. Sometimes it's scary, but it was always scary. You just didn't know about the details of it. You have to hide some of these problems, and you have to trust that other people are going to put the due diligence in. And that's how we can help-- helping people run LLM workloads.

KASLIN FIELDS: I have occasionally talked to some folks who worked on credit card systems in the early days of the Kubernetes era, and that is always terrifying. Maybe we can find someone to have on the show one of these days.

CLAYTON COLEMAN: We had to go debug one of those ones, and I got to look at-- I think it was we had to go look at a bug on a system that had been running for a few years on Kubernetes 1.1 code. It was running in a production enterprise environment. I was actually surprised.

KASLIN FIELDS: Scary.

CLAYTON COLEMAN: The early version of Kubernetes was pretty simple. We hadn't gone and added all the extra features. And I think that was actually a really strong point of Kubernetes in the early days, which is if you can make something that's simple enough that you can go back and understand that it does a basic thing well-- I actually like to think about that design principle.

When we think of-- if everybody was going to go run inference workloads-- like, inference is the new web app. I got to say that at KubeCon. Tim quoted me on it. I felt very proud that I've been quoted for something.

But I think it's pretty reasonable that most of our applications in the next few years, at some point in these big web-facing workloads, in enterprises, in personal life, something you do is going to touch an LLM. There's going to be a lot of it. It's very compute intensive.

And there's going to be so much of it running. It's actually not that complicated. What we can do is, how do we go find those simple things that help most people run inference workloads? And Kubernetes is actually pretty good at that, helping you run lots of workloads and standardizing it.

I think there are some real opportunities for Kubernetes to streamline that, make it more reliable, make it more secure. How do you make sure that it's easy for people running lots of workloads to keep those workloads apart? You don't want data from one part ending up somewhere it shouldn't. That's something Kubernetes can help with.

KASLIN FIELDS: I do want to dive into that. I know that you're involved with the brand new Working Group Serving. And we'll dive into what that means in a moment. But first, I want to address terminology. So we've used the words inference, we've used the word serving, High Performance Computing-- HPC-- AI.

ML is very common. I don't think we've really talked about it. But we have all of these different terms for these workloads, and I feel like they're often used interchangeably. Do you think that they can really be used interchangeably? How do we use these terms in the community, and what do they mean to you?

CLAYTON COLEMAN: I will say that over the last year, I've struggled with figuring out what-- there's a lot of new terminology emerging, and there's some old terminology we're trying to map. So terminology is a hard problem, and it's very important. So I will say, from my perspective, one of the things that Kubernetes started with was helping make stateless web applications work well. And that was a need people hadn't handled.

It wasn't just the one Java monolith that everybody contributed to. In 2012, 2013, we had 50 different programming languages, 150 different language frameworks, and Rails was big, and all these new Java frameworks. Like, oh, no, don't forget about Java. We're still cool and we're really fast. You just have to learn Java to use us.

All of those workloads were trying to run together. So the early days of Kubernetes were about those stateless web applications. The term-- I can't say with authority that it's what we should all use for all of this-- but we tended to use the word serving to describe a workload that expected to run for a very long time-- indefinitely-- and to take traffic.

And conversely, the dual of serving is batch, where you say what I want, and the computer, at some point in the future, will respond and give you back an answer or move a bunch of data someplace and tell you that it succeeded. And so Kubernetes definitely focused on serving. And actually, I think Kubernetes is a really great system for serving. It doesn't solve all the problems, obviously. But more than any other system in the world, Kubernetes runs serving workloads.

I think maybe the most number of workloads, serving workloads, run on Kubernetes, but I couldn't prove it to you. And batch was important, but it wasn't the most important problem people had. So we knew that there would be batch, so we wanted Kubernetes to support both serving and batch. In the last few years, batch has become very important. There's a lot of mature batch systems, especially in the high-performance computing area.

There's very popular things, like Slurm, that are kind of orthogonal to Kubernetes. And then you see a bunch of people who are doing a lot of serving suddenly have all these big data workloads alongside their web apps. Well, the natural thing is to add those to Kubernetes. So batch has gotten better on Kubernetes, but it's by far not the only way that you can run batch.

And now with AI/ML, we have two big workloads. We have training, which is your development process where you go do a bunch of complex computation. And you have inference where you take the model and you ask it a question. And even inference can actually be both batch and serving.

So traditionally, inference was a lot of batch. You'd make a model, and then you'd gather up all your day's data, or look at your data lake and build a big thing. And you'd send it off as a job and get back some data that you'd put in a database somewhere. The big change with LLM is if chat-- if natural language is a human interaction, then suddenly, all of the systems that we've built for humans clicking around, we actually now want to wire those up to LLMs.

So suddenly, especially with large language models, inference has really become an online thing, very interactive, and so that requires low latency. And if you need low latency, you might as well just keep the thing running all the time. And so we have inference that is more batch-focused, and we have inference that's more serving-focused. And so sometimes, inference and serving get overused. I tend to say online inference or real-time inference, and then there's maybe offline inference, or fungible, or low-latency, or batch inference. That's kind of the difference. But like all things, I don't think you can get 10 people in a room and get one answer.

KASLIN FIELDS: Yeah, I wrote a blog post about this recently, which you reviewed. Thank you very much for that. [CHUCKLING]

CLAYTON COLEMAN: You're quite welcome.

KASLIN FIELDS: I am very much of the opinion that I think serving is very subtly different from inference, and like you're saying, that inference also covers some batch kind of workloads. But training is very much in the world of stateful workloads, where it's very resource intensive and very long running, so it tends to be run as batches as well. So I feel like training is kind of its own area of use cases, but inference covers a couple of different ones. So I like that.

CLAYTON COLEMAN: As an industry, we've never come up with an official name for the stuff that services are made out of. Sometimes you say services, but what does service mean? We talk about web apps, but web apps are an older term. There's API front ends, but sometimes you can have API back ends. So I think I've certainly struggled as we're trying to clearly communicate what we are and what we aren't in these new working groups that we are putting together. Terminology really matters.

KASLIN FIELDS: And speaking of terminology, to dive into those working groups further, let's go over the Kubernetes terminology here for a second. So a SIG, or Special Interest Group, is one of the groups that work is split up into in Kubernetes, and they're the highest level of groups. So we have SIGs like SIG Networking, SIG K8s Infra, SIG API Machinery-- all of these very high-level concepts.

And that's what SIG stands for-- Special Interest Group. I don't know if I said that. And then SIGs usually break down their work further into subprojects. And working groups are this concept beside all of that. They are topical groups, I would say.

They're spun up when there's something going on that the Kubernetes community thinks that we need to focus on that maybe doesn't fit neatly into a SIG, or it's really important but might not live forever. So the SIGs are things that are going to be important to Kubernetes basically forever. Like, networking is always going to be a part of Kubernetes, so we always need to have a group focused on it.

But sometimes, we have these areas come up where we think, we really need to focus on this, we're not sure what it's going to look like in the long term, but let's spin up a group to figure that out. So I think that's where working groups come in. They don't have subprojects, but they're kind of treated like SIGs in that they have their own meetings and they have their own space within the project, though they're usually kind of related.

CLAYTON COLEMAN: Yeah, and a really important distinction is that ownership-- a SIG owns code. And so working groups are composed of people maybe from multiple SIGs creating code, but ultimately, for the project to have that control over what is Kubernetes, what are we going to support, what are we going to maintain, the mechanism is, SIGs still have to take ownership.

So a SIG sponsors-- multiple SIGs might sponsor a working group. The working group can definitely generate code, but ultimately, somebody at the end of the day is going to need to support that over time, and so the SIG is fulfilling that responsibility. So the working group's job is to help bring together SIG members around key problems. But they don't get a free pass. Somebody still has to pick up the pieces when the working group is done.

KASLIN FIELDS: That's a very important component I had not thought to explain, so thank you for that. I'm going to use that when I explain working groups from now on. [CHUCKLING] So you are part of the founding group, I suppose, of Working Group Serving. And we also in the project have Working Group Batch, which has existed for a bit.

We were talking about how Kubernetes has been focused on serving, or web server, or stateless type workloads for a long time. And one thing that I've noticed when I would talk to folks in the HPC, or High Performance Computing, community is Kubernetes has been a strong pick for those types of workloads for a long time. But what I've heard from them is that the scheduler is not as fully featured as they need.

So sometimes they end up implementing their own custom schedulers but still using Kubernetes for some of the infrastructure management underneath. So it makes sense to me that we would spin up a working group batch to address those types of workloads. And on this episode, we're going to talk to the head of Working Group Batch as well. But you are in Working Group Serving, so do you want to talk a little bit about Working Group Serving and Working Group Batch, and how the beginnings of Working Group Serving are going?

CLAYTON COLEMAN: Absolutely. And Kubernetes-- I've been around for a long time. When we were just getting started, we didn't think in terms-- we were focused on stateless. We knew we would want to do stateful eventually. And we wanted to make sure there was something to support batch, but it wasn't the biggest focus for the vast majority of people adopting Kubernetes.

And so the SIGs were actually set up around what we thought about at the time. And then over the years, we've tried to figure out how we evolve the project. And so working groups-- the early working groups, we weren't quite clear on what they were. Now, we're starting to really lean into that.

One of the most successful working groups, I think, as a model is Working Group Batch. It was a big, ambiguous problem that crossed a lot of domains. The folks involved identified some key points of what Kubernetes should do best at, which is bringing the ecosystem together, providing a great foundation so that even if the foundation itself is somewhat arbitrary-- some of the benefit of Kubernetes is you just don't have to argue about how to implement this basic stuff. You can go argue about how to implement the higher level stuff.

And so the folks in Working Group Batch looked at the problem space. They started a number of things to improve scheduling based on some of that feedback. They looked at things-- a big part about batch is the ability to queue-- to say, well, I need this to run, but because I've already given it to you and I know it's going to take some time, sometimes I can wait longer.

And so that queuing concept was really missing from Kubernetes. And we said, well, we don't want Kubernetes to be this big monolithic thing that keeps shoving everything into the base. We wanted clean layering. And so the Kueue project was actually created to try to find the common primitives among a number of these above Kubernetes batch orchestration systems so that you could say, hey, I have this job, if somebody else comes along who's more important than me and they submit their job, when the resources become available, maybe that other person, their job runs first versus my lower priority job.

So that idea of a queue and the idea of prioritization was not something that was really appropriate for us to add to the Kubernetes scheduler, which is about pods and workloads. And so that time-fungible aspect, the Kueue project was really successful at that. And I'm excited to listen to the podcast and hear the batch team talk about it.
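To make the queueing idea concrete, here is a minimal sketch of handing a Job to Kueue using the official Kubernetes Python client. It assumes Kueue is installed in the cluster and that a LocalQueue named team-a-queue and a PriorityClass named low-priority already exist; those names, the image, and the namespace are placeholders rather than anything from the conversation.

```python
# Minimal sketch: submit a suspended Job and let Kueue decide when it runs.
# Assumes Kueue is installed and a LocalQueue "team-a-queue" exists (hypothetical).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="sample-training-job",
        labels={"kueue.x-k8s.io/queue-name": "team-a-queue"},  # hand the Job to Kueue
    ),
    spec=client.V1JobSpec(
        suspend=True,  # created suspended; Kueue unsuspends it once quota is available
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                priority_class_name="low-priority",  # assumed PriorityClass; higher-priority work can be admitted first
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="registry.example.com/trainer:latest",  # placeholder image
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "8Gi"}
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

The Job sits suspended in the queue; Kueue admits and unsuspends it when resources become available, and a higher-priority workload submitted to the same queue can be admitted ahead of it, which is exactly the "someone more important runs first" behavior described above.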

We used Working Group Batch as a model because batches-- if we say Kubernetes was about-- there's these two big classes of workloads-- those that run forever and those that don't-- well, maybe in the last 10 years, we've actually taken our eye off of, well, what's the next step for web applications for backends?

And AI is a great opportunity because it's novel. You've got these new use cases that need a lot of compute. They're kind of like web applications. Inference is simpler than web apps. There's only a few things. Everybody kind of has similar problems.

It needs hardware accelerators for the biggest stuff. That's something kube doesn't support well. So even though we wouldn't actually go implement all that support, we can provide requirements. We can say, well, what do all of the people running inference need when they're serving workloads on top of Kubernetes?

But we don't want to just end up and walk away from this and say, oh, cool, we made accelerated large language model inference on top of Kubernetes. Awesome, we're done. We'd like to, I think, continue some of the key challenges that-- when we first did Kubernetes 10 years ago, we put in services, and deployments, and stateful sets, and pod disruption budgets, and horizontal pod autoscaling.

Those are really great primitives. They're not that powerful, but what they do is they allow a broad range of workloads to work. There's a lot of problems that people in Kubernetes have today that involve lots of dense workloads running together, where there aren't great tools to let Kubernetes manage that density for you, because Kubernetes doesn't know.

And so one of my hopes with Working Group Serving is we've got a focus. We want to make that work well, but we want to leave behind improvements that make all serving workloads on Kubernetes better. So disruption budget is a great example. It's a really simple construct. You can say, I don't want Kubernetes to allow anyone to go take away one of my pods if it's below this number.

That's the simplest possible thing. It's so simple that a lot of stateful workloads use it and have to work around its limitations, which then causes operational impacts because they say, you can't take away any of my pods. And Kubernetes really wasn't designed to communicate why you can't upgrade a node because there's one pod. We had to go build that stuff.
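As a concrete sketch of the disruption budget construct described here, this is roughly what creating one looks like with the Kubernetes Python client; the name, namespace, and label selector are hypothetical placeholders.

```python
# Minimal sketch of a PodDisruptionBudget: "don't let voluntary disruptions
# (like node drains) take me below this number of ready pods."
from kubernetes import client, config

config.load_kube_config()

pdb = client.V1PodDisruptionBudget(
    api_version="policy/v1",
    kind="PodDisruptionBudget",
    metadata=client.V1ObjectMeta(name="inference-pdb"),  # placeholder name
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=3,  # evictions are refused if they would drop ready pods below 3
        selector=client.V1LabelSelector(match_labels={"app": "inference-server"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(
    namespace="default", body=pdb
)
```

Note that this only expresses a pod count, which is exactly the limitation discussed next: the budget says nothing about latency or any higher-level guarantee, so operators can end up blocked on upgrades without knowing why.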

I think there's some stuff coming that we can do that says things like-- just as one example-- this workload has some really strong guarantees as long as you keep those higher level guarantees. Not just remove one of my pods. But for inference, everything's about latency.

As long as you keep my traffic, my latency distribution for responses at about this, do whatever you want. And that actually, I think, is a real opportunity. Because if we have that higher-level guarantee, it would work for LLMs, but it might work for web applications. And so if you remove an instance and your latency is outside your target, we just stop taking upgrades.

But that also means that somebody can look at that and say, oh, well, all I have to do is go add more instances to let you upgrade? That could be something where platform teams and Kubernetes as a Service offerings could add new capabilities that let you do preemptive maintenance. So I'm actually pretty excited about-- we want to make a use case work well and leave behind a better foundation for others.

KASLIN FIELDS: And that is something that I've heard commonly across the Kubernetes Contributor community is that AI workloads are special in that they have a lot of focus on them, and there's a lot of hype around them, and so everyone has to focus on them. But they're really not that special in that they have these components of a workload that are common to other use cases that Kubernetes has been working to solve anyway. So this is really just a great excuse for the project to focus on some areas where maybe it hasn't gotten to focus as much in the past. And it's going to have benefits across the entire system of Kubernetes that can benefit all types of workloads, not just AI ones, but AI is the excuse to do it.

CLAYTON COLEMAN: And I think that you always have to have a reason to make a change. Sometimes "I can make something work 1% better" isn't enough. I think if you have a really concrete example-- I think I love that example that you gave because a lot of what we benefit from in Kubernetes-- and this is another important part, is at the end of the day, it's not just about this one workload. It's about giving people enough of a common foundation that other workloads can exist so you don't have to come back and ask Kubernetes.

So we often try to design things that are as open-ended as possible. And this gets back to that whole too-simple story, which is, there's a really hard balance to strike between, what are the things that somebody who's familiar with the problem would be like, yep, that's the right way to do it, finding several sets of people who agree just enough-- because if several different sets of people in different use cases agree that this is something that's valuable, it's probably got legs.

The challenge then is that that's a new construct and concept that you have to expose to users. And I'm always sensitive to-- I still see this 10 years in. There's people out there who, the first time they hit Kubernetes, it's like being hit with a tsunami of concepts. You gotta learn pods, and services, and workloads, and the controller pattern, and spec, and status, and watching, and nodes, and disruption budgets.

And so I think there are things that we should think about while we're doing this. How do we balance between adding so much flexibility for accelerators and supporting all this new hardware? What do end users actually want most of the time? They're mostly going to be running these classes of workload.

Let's make sure that the support that we add is no more complex than is needed. And that's super hard. I don't think we've always succeeded at it, and we'll always bias a little bit more towards flexibility than simplicity because that's why people use Kubernetes.

KASLIN FIELDS: Wonderful. Kubernetes is a platform for building platforms.

CLAYTON COLEMAN: That's right.

KASLIN FIELDS: And so to close this up here, is there anything else that you'd like to highlight about the work that Working Group Serving is doing that our audience out there might be able to help you all with or would be good for folks to know about?

CLAYTON COLEMAN: This is a personal appeal. It's often hard-- I know-- when you're building something and trying to get something done on top of Kubernetes to take the time to come to working group meetings, to listen in. It takes time out of everyone's lives. The only way that we will be able to make progress on this project called Kubernetes and this effort is when people listening to this say, oh, I'm running inference on Kubernetes, I've actually got some pretty strong opinions about what I need.

And sometimes those are lower level opinions. I can't always fix all of them. But what we really need are the use cases, like, this is important, and that there are people out there who know far more about running inference on Kubernetes than I do. My job-- and part of the reason why this podcast is so important is, as a community, if you're out there and you're hoping that somebody is going to make your future life easier, that person exists.

That person is me. Come to Working Group Serving. Just give us a use case. Give us 10, 15 minutes of your time. Write a blog post and share it with us. That kind of feedback-- what do you want out of inference, and accelerated inference, and large language models on top of Kubernetes-- will help not just folks today, but the next 10 years of people building things on top of Kubernetes. And only you have the power to make sure that we implement the right thing.

KASLIN FIELDS: If we don't hear your use cases, then it's possible that we could miss something that you could have told us about. And we need to know those examples in order to build the right things. So if you want to see the right things built for your use case, come on down to Working Group Serving.

CLAYTON COLEMAN: Please!

KASLIN FIELDS: Thank you so much. Yes, [CHUCKLING] please, please do. Join the Slack. Join the meetings. Come say hello. Just send us a little abstract about the use case that you're building. Even that would just really make a difference, I think.

So thank you so much for being on today, Clayton. I really enjoyed learning about the early days of Kubernetes and about the future of serving workloads, whatever that may mean, on Kubernetes.

CLAYTON COLEMAN: Thank you very much, Kaslin. It was a pleasure to be here.

[MUSIC PLAYING]

KASLIN FIELDS: I am very excited to now be speaking with Dawn Chen. Dawn has been a principal software engineer at Google Cloud since May 2007. Dawn has worked on an open source project called Kubernetes before the project was even founded, really. She's been one of the tech leads in both Kubernetes and GKE and founded SIG Node from scratch.

She also led the Anthos platform team for the last four years and mainly focuses on the core infrastructure. Prior to Kubernetes, she was one of the tech leads for Google's internal container infrastructure, Borg, for about seven years. Outside of that work, she is a wife, a mother of a 16-year-old boy, and a wonderful friend. She enjoys reading, cooking, hiking and traveling. Welcome to the show, Dawn. [CHUCKLING]

DAWN CHEN: Hello, Kaslin, and thank you for inviting me to this podcast. Yeah, I'd love to talk to the community and talk to you, especially at this special moment, Kubernetes's 10th anniversary. I feel there are so many things to share. And I also want to engage with the community even more-- a larger, even wider community-- through all kinds of channels, including this podcast.

KASLIN FIELDS: I am so excited to be speaking with you. You also got to speak at the KuberTENes Birthday Bash at the Google Bayview campus in California very recently, on Kubernetes's birthday on June 6. And you're one of those folks who has been working on it because you were one of the engineers on Borg, and so you were involved with Kubernetes in the very early days, like you said in your bio, even before it was a project really, because you were already working on Borg. So can you tell me a little bit about those early days of Kubernetes and working on Borg-- what all of that was like?

DAWN CHEN: Yeah. Even recently at the birthday, I shared some of the story. I mentioned that very early node agent, which is kind of like today's kubelet, which runs on every single Kubernetes node. I worked on that agent before Kubernetes was founded, and that agent could support the pod spec-- which back then we called the container manifest-- and it could also support the Borg API and the Omega API.

So we actually thought at that time about open sourcing that agent-- have a single agent, a common component, that supports three different types of API. But of course, that project was killed by ourselves, even by myself, because it was too complicated. It was just like a beast.

And we killed it to make way for the real project-- the real open source project, Kubernetes. When we founded Kubernetes 10 years ago, the early days after that were so exciting, and there were intense collaborations across many companies-- it immediately got attention from the media and from the industry. Back then, our team was really small but passionate. We had a bunch of folks-- I think maybe just five of us from the Borg team and two or three people from Google Cloud-- but we were passionate.

We shared the same vision to transform how applications are deployed and managed. And I remember, after we gave the first talk about it at DockerCon in 2014-- Eric Brewer announced it, that we were going to do this kind of thing--

And then we immediately got attention from Red Hat, the OpenShift folks, and IBM, and many other companies. And of course, we also got a lot of attention from Docker back then. From there on, there were many long hours, heated debates, and also plenty of pizza. And I really mean pizza, because back then there were so many meetups and hangouts together.

People would drop you a message and want to meet you and talk about this idea, even if you had no idea where they were. And so that's why-- I remember that summertime back then, I ate the most pizza of my entire life.

[LAUGHTER]

Drank the most Coke and iced tea. What I remember most is the shared sense of purpose. We really believed that what we built in Borg-- container technology and container orchestration-- helped Google, and that by open sourcing it, we could help more people, change the way things are built, and set the standard-- the API standard and all those kinds of things. So that's what we were thinking, yeah.

KASLIN FIELDS: That's so awesome, and I didn't know about the agent piece of it, where you considered open sourcing just an agent from Borg. I would imagine that would have had a lot of trouble catching on if it was just a single agent rather than this full distributed system like Kubernetes ended up being.

DAWN CHEN: Totally. So that's part of it-- actually, I recently shared this with Tim Hockin. I said, hey, Kubernetes triggered us to kill our own project. And that's a hard kind of problem. It's really hard to kill your own project-- one that you initiated, you started, you put a couple of years' effort into.

But then we saw the big picture. We saw something even better-- a more community-friendly, easier-to-collaborate-on project like Kubernetes, written in the Go language instead of C++. It would have much, much more value, have a much bigger impact, and could leverage the community. You can pick up more talented folks from all over the world. So, those kinds of things. So yeah, I feel proud that we did that.

KASLIN FIELDS: That's so awesome. And as you said, you were an engineer on Borg for, like, seven years. And the way that you described it, I hadn't really thought about before, honestly-- that Borg is not part of Google Cloud. It's part of Google proper, isn't it? You have the way that Google runs its own data centers and runs its containers across the whole company. So even though Kubernetes, which came from it, is very cloud-oriented, Borg itself, of course, is part of the core business of Google and not part of the cloud.

DAWN CHEN: Totally. Even when I first joined Google back in 2007-- and that's almost the same time Google started getting serious about the cloud business-- in the Linux kernel, the cgroup container technology was just a term.

It was in some email. [CHUCKLING] Nothing really serious or finalized. So in any case, my first project-- I was tracking processes using, back then, a kernel module. Track the processes and then figure out which group of processes belongs to one job, which is a container, right?

So it used that old technology to track the CPU and memory usage. That's the earlier form of how we did this task management and provided a feedback loop to Borg. But at the same time, Google proposed the cgroup technology to the Linux kernel. And when I first heard that proposal, I said, why don't we do this for cloud?
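As a present-day illustration of the kind of per-job CPU and memory accounting being described here, this small sketch reads the cgroup v2 accounting files for one cgroup. The cgroup path is a hypothetical placeholder and assumes a host using the unified cgroup v2 hierarchy.

```python
# Minimal sketch: read CPU and memory accounting for one cgroup (cgroup v2).
# The path below is a placeholder; on a Kubernetes node it would be a pod's or
# container's cgroup directory under /sys/fs/cgroup.
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/kubepods.slice/example-pod.slice")  # hypothetical path


def read_memory_bytes(cgroup: Path) -> int:
    # memory.current holds the cgroup's current memory usage in bytes
    return int((cgroup / "memory.current").read_text())


def read_cpu_usage_usec(cgroup: Path) -> int:
    # cpu.stat contains lines like "usage_usec 1234567"
    for line in (cgroup / "cpu.stat").read_text().splitlines():
        key, value = line.split()
        if key == "usage_usec":
            return int(value)
    raise KeyError("usage_usec not found in cpu.stat")


if __name__ == "__main__":
    print("memory bytes:", read_memory_bytes(CGROUP))
    print("cpu usec:", read_cpu_usage_usec(CGROUP))
```

This is the same basic feedback-loop idea, just expressed against today's kernel interface rather than the pre-cgroup kernel module described above.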

At that time, that was definitely the question I asked Paul Menage, who was my tech lead back then. He is the one who, together with Rohit Seth, proposed the cgroup technology-- the memory cgroup technology-- to the Linux kernel. It totally changed this whole thing. And then he just laughed. He said, oh, this hasn't happened yet, this is too early.

And he also said, oh, there's the security problem. Of course, even today, the security problem is a pain. That's why we have some other technologies. Even Kubernetes supports VM-based or hypervisor-based runtimes to provide stronger security.

So those are all the kinds of things I just wanted to share. This is why, from day one, literally, we were thinking about this. And, like, I had just joined Google, and when I heard this one, I said, why don't we do this?

KASLIN FIELDS: Yeah, that's so cool. You started out not only just from the beginning of Kubernetes, not just from the beginning of containers, but from the beginning of cgroups, [CHUCKLING] the fundamental technology in the Linux kernel that is behind part of containers.

So one thing I want to get straight on this timeline here is, you were there when they were starting to come up with the ideas that would later become containers. Where was Borg at when you joined Google? Was it something that had already been established as a project, or was it just starting up as you were joining?

DAWN CHEN: Actually, Borg had already started as a project. It was already built, but the Borg initiative was trying to build for Google's internal batch workloads. Back then, Google had two systems to do this cluster management job, or orchestration. One was for the batch workloads, and it was called Work Queue.

The other one was for the production services-- the latency-sensitive workloads-- and it was called BabySitter. We had the two systems in parallel. And the cluster, actually, was the data center. It was also dedicated-- like at many other companies even today-- dedicated to different teams, different product areas. They divided it into multiple clusters, and certain clusters might be dedicated to batch workloads, certain ones to production services-- long-running services jobs. It wasn't shared.

So when Borg was first built, it was for batch workloads. But the batch workload side had already built something similar, so they didn't immediately see the value from Borg. Instead, it was the products and services running services jobs-- latency-sensitive services jobs-- those teams, like web search-- and if I remember correctly, it was the web search infrastructure team-- that saw the value.

Because the original BabySitter statically allocated-- placed those jobs on dedicated machines. So when a machine needed maintenance, when a machine needed repair, they needed to statically change those configurations, just like at many traditional companies before we had Kubernetes.

So they saw the value-- the dynamic placement when a machine has a problem or a failure. And also, we saw the potential around resource starvation and bin-packing issues. They could move a workload from one machine to another, from a bad machine to a good machine, and all those kinds of things.

So they deployed Borg using that, but they didn't really have the feedback loop-- what I mentioned, that my first job was tracking processes to provide a feedback loop. And cgroups were still in the working stage-- the proposal stage, the wild imagination stage.

So they didn't have any of those kinds of things. That's why, when I first joined, Borg was only rolled out to one or two clusters, and only for that one type of job. Utilization was not a thing, performance was not a thing, efficiency was not a thing. Until 2008, when there was kind of a crunch at Google internally because of the economy and all those kinds of things. And also, Google couldn't get enough hardware.

KASLIN FIELDS: Yep, common problem today, too. [CHUCKLING]

DAWN CHEN: Yeah, so we had the hardware resource starvation issue. All of a sudden, sharing-- resource sharing, cluster sharing-- became a thing. It just totally changed our fortune, and that was the momentum. And Borg internally became the standard.

At the same time, we quickly built those feedback loops, so then we could share resources much better. We could guarantee the latency-sensitive jobs' resource requirements and how often they could access the CPU and memory. But at the same time, we could also let the batch workloads utilize the slack on the machines.

So then the utilization went from 10%, 15%-- like the lowest-- to the highest, 70, 75. That's how we turned things around. And after that, we continued. We saw the value of a quick feedback loop.

And then we evolved this with Borg. So all of those-- CPU, memory, disk I/O, and all those kinds of things-- could evolve. The innovation was just booming all over the place-- including the control plane.

KASLIN FIELDS: Yeah, so many concepts.

[CHUCKLING]

DAWN CHEN: And also, from the control plane perspective, they evolved how to accelerate scheduling and parallelize it, all those kinds of things. And then we enabled hierarchical scaling for that-- for something like what you mentioned today, the topic of HPC workloads and MapReduce workloads, all those kinds of things. That worked very well for us.

KASLIN FIELDS: I had no idea that the early days of Borg were so focused. I mean, it makes sense that they were focused a lot on using the resources efficiently. That's kind of always been part of the concept of distributed systems-- making sure that we're distributing our applications and our workloads across the hardware that we have efficiently.

But I didn't know that it had a focus on batch workloads, which is really interesting because I was just talking with Clayton Coleman, who was at Red Hat when Kubernetes started up. And in the early days of Kubernetes, they were very focused on stateless workloads and web applications and those types of things, and batch workloads kind of took a back seat for a long time.

But Kubernetes always felt like it made a lot of sense with batch workloads because it's this way to connect your applications with the hardware that they're going to use. So it was always kind of there. And I've met a lot of folks in the HPC space and AI space from before the current era who used Kubernetes because it made a lot of sense for their batch workloads and AI-type workloads. So it's very interesting to hear that is in the roots of it in Borg.

So let's dive a little bit into what's going on right now. We've talked about a number of different terms. Let's start with the terminology as I did in my interview with Clayton as well. We've talked about batch workloads on Borg. We've talked about HPC a little bit. We're going to talk a little bit more about AI today. There's a whole bunch of different terms in play-- inference, serving, all sorts of things. So what do you think of the AI terminology right now, and what do you use to describe things?

DAWN CHEN: Just like what you mentioned, in the earlier times, we were more focused-- Kubernetes was more focused-- on stateless. But actually, initially, from day one, when we started, we wanted to support all the workloads. We did.

KASLIN FIELDS: Of course.

[CHUCKLING]

DAWN CHEN: But Kubernetes, if you really think about it, even today-- back then, people had just started to realize, thanks to Docker, how important container technology was. I think Docker did an extremely good job of making that technology accessible to everybody, and of figuring out how to package those kinds of things, and that fundamentally changed this whole thing.

But still, Kubernetes-- which came out around the first DockerCon-- is really complicated. Even today, it's still complicated. It's really difficult conceptually and logically, and it's just really hard compared to Docker, even compared to Docker Swarm. So people started from a single node running their workload.

Then there's actually a natural migration, a mind shift, from Docker to Docker Swarm, but that's not enough. That cannot handle this large-scale problem. So that's why we had to introduce the Kubernetes abstractions and concepts from the top.

So then we were thinking about what kinds of things we could support. One of the things we tried to make people think about-- which I disagreed with from day one, but it turned out to be the right thing back then-- is, many people would say, oh, think about it: your pod can be pushed, it can be drained. Your container can be drained.

Your node is cattle, your pod is cattle, not a pet. When you think about stateless workloads, that's perfect. But because, from day one, I was thinking about supporting all the workloads, I said, no, no, no, no, what are you talking about?

[CHUCKLING]

Once you have data-- once you support a database-- even if you can remount the volume, you still have disruption when you drain the node, when you do this. Obviously, there's high availability achieved by replication, and ReplicaSets give us those kinds of things.

But you still have the disruption. And for that time, it was the right call. So we started from stateless and then later expanded to support StatefulSets. And then later, we paid attention to the batch workloads. You can see that progression. Later, we introduced the higher-level abstractions-- Job and CronJob and all those kinds of things.
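For reference, the Job and CronJob abstractions mentioned here look roughly like this when created with the Kubernetes Python client; the names, the schedule, and the image are placeholder values, not anything from the conversation.

```python
# Minimal sketch: a CronJob that runs a batch Job on a schedule.
# All names, the image, and the schedule are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()

cron_job = client.V1CronJob(
    api_version="batch/v1",
    kind="CronJob",
    metadata=client.V1ObjectMeta(name="nightly-report"),
    spec=client.V1CronJobSpec(
        schedule="0 2 * * *",  # every day at 02:00
        job_template=client.V1JobTemplateSpec(
            spec=client.V1JobSpec(
                backoff_limit=3,  # retry a failed Job's pods up to 3 times
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(
                        restart_policy="Never",
                        containers=[
                            client.V1Container(
                                name="report",
                                image="registry.example.com/report:latest",
                            )
                        ],
                    )
                ),
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_cron_job(namespace="default", body=cron_job)
```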

And then also-- well, not from day one, but a long time back. Actually, at almost the same time we introduced those things, we also saw the needs around AI and machine learning workloads. And in SIG Node, I have the luxury of always getting to engage with people who want to run different types of workloads.

So that's why, from the node side, we had to focus on how to enable GPU support, even TPU support. We built the device plugin API to enable those kinds of things. And then we also needed resource topology awareness.

On the node side, make sure that when an HPC workload or AI/ML workload lands on the node, it can access those resources easily, and guarantee those special requirements. Then we even expanded that to enable not just node-level topology and NUMA awareness, but also topology-aware scheduling at the cluster level.

So in all of that, you can see we started from small and very simple, really simple, stateless workloads, and expanded to StatefulSets, with all the CPU and memory management to guarantee resource access. At the same time, we expanded to batch workloads to make sure we had the concepts to capture their scaling and parallelization requirements. And we also expanded to support special devices-- like GPUs, TPUs-- and to make sure the performance is guaranteed. That's why we have CPU management, memory management, the special quality-of-service classes, and also NUMA affinity, all those kinds of things.
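As one concrete example of the device plugin path described here, a workload asks for an accelerator as an extended resource on its container. The following is a minimal sketch with the Kubernetes Python client, assuming a device plugin (such as NVIDIA's) is installed and advertises the nvidia.com/gpu resource; the pod name, image, and namespace are placeholders.

```python
# Minimal sketch: a Pod requesting one GPU via the extended resource exposed
# by a device plugin (here nvidia.com/gpu, assuming the NVIDIA plugin is installed).
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="gpu-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # GPUs are requested in whole units via limits; the scheduler
                    # only places the pod on a node advertising this resource.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```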

KASLIN FIELDS: I love the way that you look at this. And I think it's a way that a lot of folks maybe struggle to wrap their heads around. A lot of folks will think about use cases from a higher-level perspective of, here's what I'm trying to achieve, here's what I'm trying to build. But from the perspective of someone who's building Kubernetes itself, and especially someone who has focused on how the nodes themselves are designed in Kubernetes, as you have for so long, the real consideration here is not what you're trying to achieve.

It's the requirements that your workload has. And so the words that we use need to be helpful in determining what the requirements are that a system like Kubernetes needs to provide to those workloads. So as someone who works with the node deeply, of course, you think about what different types of hardware are on that node and how you connect the workload with those resources that it uses.

And I'm wondering, what kinds of work are you doing today? Are you specifically focusing on AI workloads much? Are you still in the world of the node? What are you doing today?

DAWN CHEN: I always have my mission on the data plane, which includes the node, network, and storage. Because fundamentally, those user workloads run on the data plane, on the node. But the node is my base camp.

The reason I pay extra attention to the node-- I'm not limiting myself to the node only, but I pay extra attention to it-- is that it always gives me early engagement with users, with the vendors-- the device vendors, like NVIDIA, Intel, AMD, all of those-- and also early engagement with the Linux kernel folks.

And also even the new technologies around monitoring. I'm still focusing on the data plane, with additional focus on the node. But recently, I've been looking into how Kubernetes can evolve to support multi-host inference jobs, and how it can evolve to support some of those large distributed applications and frameworks, like Ray Serve, all those kinds of things.

I'm looking into those and working with a lot of experts. I'm not an expert, but I'm just trying to make sure I listen to them and throw out some ideas-- maybe stupid and maybe late-- but just to try to motivate and encourage those conversations, healthy debate, healthy conversation.

So then we can evolve Kubernetes. I think one of the critical differences today compared to before is that, in the past, even when we couldn't support a lot of those workloads, from Borg we had some experience-- we had even pushed a lot of performance limits. There were a lot of boundaries.

We had pushed performance to new limits, but we kind of understood it. Now everything has changed so fast. Generative AI-- AI pushes everything to new limits, new boundaries. We just don't know. We don't understand yet.

And so we need to understand. I need to keep telling myself, OK, forget about everything. And it's not really forgetting, but you have to decompress everything in your mind.

Don't take those kinds of things for granted, because there's a new type. Everything is new. You need to understand fundamentally what it is-- the difference, the new type of workload, what the new challenge is. Then you can help them and evolve Kubernetes.

Because I truly believe that if we can do a good job, we are evolving Kubernetes to support more workloads. That's the common interest for the industry. You can always start another orchestration system, but there's a cost to starting new-- even for a company outside Google.

They've already spent the time to move to Kubernetes. There's a cost to moving from Kubernetes to something else. So evolving Kubernetes is the common interest. And also, I feel like being an owner, being part of this, is something I can continue to contribute to if we work together to do things better together. That's kind of where that came from.

KASLIN FIELDS: Perfect. And I think that is exemplary of a lot of what the Kubernetes community is feeling right now, is the fact that AI is the hype thing is neither here nor there. What's exciting about it for the Kubernetes community and the infrastructure community is that it is a workload that uses so many pieces that have been put in place over the years in these infrastructure systems, and it just pushes them to new heights.

And so we're just evolving on the things that we were already working on really, and using that expertise in infrastructure, in the data plane, in the node, in all of those pieces working together underneath to support this new level of workload in batch and serving, and all of these different areas. So it's really just an evolution of the work that you've already been doing for a long time. Wonderful.

DAWN CHEN: It's an honor. I'm just lucky. I evolved with Borg and also Kubernetes, and now AI and all those kinds of things-- I just feel so lucky.

KASLIN FIELDS: Yeah, I feel inspired to go read the Borg papers again. So thank you so much, Dawn, for being on. I definitely learned a ton from that. And I think we'll wrap it up there. Are there any last things that you want to mention about AI or the history of Kubernetes for our listeners? I think we covered a lot.

[CHUCKLING]

DAWN CHEN: We have. But mostly-- I think Kubernetes could achieve such success today mostly because of the community and all that collaborative contribution, that collaboration. The fundamental thing is that, I think, this industry has the same set of problems to manage.

No matter whether you are doing closed source or open source, there's one set of problems that is common: the infrastructure-- how to build reliable, elastic, and efficient infrastructure so the applications can run on top of it and worry about nothing.

So we need a shared mindset-- to work together. And we want more talent, the younger generation, to join this force. And I always feel like working on the infrastructure is amazing because-- sometimes I feel like we are the kingmakers.

A lot of people can quickly iterate, not worry about the hardware, or the infrastructure, or how to use the network. So we are trying to evolve steadily, to keep moving forward, but quietly.

The people who always get most of the attention are those shiny projects running on top of the infrastructure. But infrastructure people are actually supposed to be quiet, hidden. So that's why I keep saying this is kind of amazing for me.

But you always have to evolve. There's new hardware, new devices coming out, new technology, new requirements coming out. So you have to keep marching forward. Sometimes you actually have to march out in front, because you have to lead in a certain way-- because once you break that boundary, then the infrastructure enables more technology and more advanced things. So that's amazing. So I'm really hoping more people join Kubernetes, and work together, and evolve Kubernetes together.

KASLIN FIELDS: Yeah, I think that's a beautiful message to close on. Infrastructure is always going to be there. It's going to be at the core of what we do in computing, and so it's a great place to get involved and learn. So we hope to see more of you in the Kubernetes project as contributors. Hope to see you soon. And thank you so much, Dawn.

DAWN CHEN: Thank you.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Kaslin, we did it.

KASLIN FIELDS: We did it.

ABDEL SGHIOUAR: Four episodes, one month.

[CHUCKLING]

That was a lot of work.

KASLIN FIELDS: It was quite the adventure. We developed some new systems along this journey for running a series of episodes, like planning them all in advance. And we had one that was off our usual bi-weekly schedule, and it was kind of an adventure. I don't know if it's quite a month, but very close.

ABDEL SGHIOUAR: Very close. I mean, just the fact that it was-- 3 plus 2 plus 3 plus 2-- so that's 10 guests in total-- or 11, I think.

KASLIN FIELDS: Yeah, and we still have a long list of folks that we didn't get to talk to. So looking forward to the rest of our year's schedule as well.

ABDEL SGHIOUAR: Yes. We have some really cool stuff coming up, so let's not spoil it. But I recorded one episode today, which is coming up, and there are some other interesting things happening.

KASLIN FIELDS: And I've got one I'm recording very soon as well, so look forward to that. It's someone that I'm very excited to talk to. But let's talk about Kubernetes-- Kubernetes and AI/ML in this case.

ABDEL SGHIOUAR: Yeah, that was actually super interesting to plan those episodes because Kubernetes has been going through quite a shift in terms of supporting AI and ML workloads.

KASLIN FIELDS: And I think another interesting thing that we got from this is that a lot of the considerations around AI/ML workloads have kind of been there from the beginning in Kubernetes, but a lot of it hasn't been as focused on. We talked a lot in Clayton's interview and in Dawn's interview about how-- what I learned from Dawn is that batch was actually a primary focus of Borg originally.

So in the inspiration for Kubernetes, batch was kind of the main focus. And I've learned from talking to folks in the HPC space-- it just makes sense-- Kubernetes and AI/ML workloads. But the way that Kubernetes has been designed in the world of web applications has been more focused on those types of applications, which have been more common traditionally over Kubernetes's lifetime. But now we're seeing this shift, and it feels like actually it's going back to its roots to me.

ABDEL SGHIOUAR: Yeah, I think Maciej also mentioned that because Maciej mentioned that batch was in the white paper for Borg-- that the white paper of Borg actually mentioned the fact that Borg supported both long-running applications and batch type workloads. So it kind of makes sense that that made it into Kubernetes because that's the inspiration where Kubernetes came from, right?

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: But I think that a lot of times, people don't realize this, so here is one thing. OpenAI, they have actually a blog where they write about how they use Kubernetes for training GPT. So there is a blog. It's called scaling--

KASLIN FIELDS: Something I need to go read.

ABDEL SGHIOUAR: --yeah, it's called Scaling Kubernetes to 7,500 Nodes. So there is already a blog about that.

KASLIN FIELDS: 7,500-- that's-- all right.

ABDEL SGHIOUAR: That's nothing. GKE can support 15,000 nodes, so--

KASLIN FIELDS: Yeah, we hear that a lot working for Google.

ABDEL SGHIOUAR: And then there were also a bunch of other-- I saw articles right and left about this. Some of these super popular AI frameworks used today, like Ray, for example, has a Kubernetes operator. So yeah, it's taking Kubernetes in a very interesting direction, I guess.

KASLIN FIELDS: Yeah, so it's this shift that's happening with a foundation, I would say. We've got the fundamental concepts in Kubernetes already, and there's a lot of perfecting that needs to go into it. And also, AI/ML workloads these days are, of course, quite different from what they were 10 years ago when Kubernetes was created, or even longer ago when Borg was designing how it handled batch workloads. So there's batch, and there's also the serving components. And LLMs are also quite the different evolution of AI workloads. So there's a lot of details, I think, for the Kubernetes community to dig into, and it seems like they are doing so with enthusiasm.

ABDEL SGHIOUAR: Yeah, and in my discussion with Maciej, we talked about the Job Controller and how that Job Controller got rewritten at some point. I think it was a complete rewrite to support some corner use cases. So yeah, it was quite interesting. I learned a lot.

And then there are these new working groups that we chatted about with Maciej, which I went ahead and added to our backlog of topics to eventually explore at some point. So you have Working Group Serving, which is for inference, and Working Group Devices, which is for accelerators-- for GPUs and TPUs.

KASLIN FIELDS: And Working Group Batch, of course, has been around for some time. There's also a couple of other working groups that have been spun up recently. So I did really enjoy the focus on working groups in this episode. Because I feel like that's something the general public isn't really going to know about-- how Kubernetes itself works, contributing to it and how the project functions-- that we have these working groups where, when something comes up, we have a way to spin up a group that's going to focus on it. So I love to see that happening, and I love that we have a chance to talk about it.

ABDEL SGHIOUAR: Yeah. No, it's pretty interesting. So no, the episodes were cool. I think Maciej, of course, has very long experience, has been around for a very long time.

KASLIN FIELDS: All three of them. This feels like another folks who've been around since the beginning episode. They all have an element of that, don't they?

ABDEL SGHIOUAR: Yeah. Clayton, I remember particularly that I spent two hours discussing with him at KubeCon Paris this year. We were supposed to sit down for half an hour, and then the conversation dragged along into some super deep stuff, and we ended up talking for two hours. I think at some point, somebody had to kick us out because they were closing the venue.

KASLIN FIELDS: And Dawn being an engineer on Borg before Kubernetes was even thought of.

ABDEL SGHIOUAR: Of course. Yes.

KASLIN FIELDS: A very interesting perspective to get there, and I learned some things from that.

ABDEL SGHIOUAR: Exactly. Yeah, it was cool. We did it. We did four episodes.

KASLIN FIELDS: Thanks for sticking with us, folks. If you haven't listened to all four of them, would recommend. I think we covered a good variety of topics, and I think each episode has something a little bit different to offer.

ABDEL SGHIOUAR: Exactly. And I think we're going to go back to our regular schedule, which is once every two weeks.

KASLIN FIELDS: Better. Because we've learned quite a bit about setting up process around planning our episodes.

ABDEL SGHIOUAR: Exactly. I like the way we work now because we can work slightly independently from each other for-- at least for the guests. There's little dependency on bringing people, finding them, scheduling them and all that stuff.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: So yeah, and I had to do all of this while traveling, which is challenging. So now I have to carry this extra 1 kilo-- I have a microphone with me all the time.

KASLIN FIELDS: Yeah, I don't know how you do that.

ABDEL SGHIOUAR: I travel mostly to run, eat and sometimes there are conferences happening. This is what I tell people. There are two key priorities--

KASLIN FIELDS: And do podcasts, I guess.

ABDEL SGHIOUAR: Yeah, but that's just-- I happen to be in hotel rooms a lot, so. All right, thank you very much for listening.

[MUSIC PLAYING]

KASLIN FIELDS: That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media @Kubernetespod or reach us by email at <kubernetespodcast@google.com>.

You can also check out the website at kubernetespodcast.com where you'll find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]