#263 December 16, 2025

Kubernetes AI Conformance, with Janet Kuo

Host: Kaslin Fields

Janet Kuo, Staff Software Engineer at Google, explains the new Kubernetes AI Conformance program.

Do you have something cool to share? Some questions? Let us know:

Intro

News of the week

KASLIN FIELDS: Hello, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Kaslin Fields.

[MUSIC PLAYING]

KASLIN FIELDS: We've had a bit of a break between our last episode and this one due to KubeCon. KubeCon North America 2025, in Atlanta, was a blast, as usual, and I recorded two interviews on site. The first of those interviews is today's episode, an interview with Janet Kuo about the new Kubernetes AI Conformance Program.

Janet is a staff software engineer at Google Cloud, where she focuses on Google Kubernetes Engine. She's been one of the primary leaders as the AI conformance program has been established over the course of 2025.

There was a ton of news at KubeCon North America 2025, including the launch of the AI conformance program, which our interview is about. So get ready for a substantial news update before we dive into that interview. But first, for 2025, there will be one more episode after this one, the Kubernetes 1.35 release episode.

The release episode will likely come out on Monday, December 22, though the release itself is expected to be out on Wednesday, December 17. Couldn't quite make the episode and the release line up perfectly this time, but we'll close out the year with info on the latest and greatest in open-source Kubernetes.

[MUSIC PLAYING]

But first, let's get to the news. The Kubernetes Project is preparing for the 1.35 release planned for December 17, 2025. This version will see in-place updates for pod resources graduate to general availability, allowing users to adjust CPU and memory without restarting pods. Additionally, image volumes are expected to be enabled by default.
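As a reference for what in-place resizing looks like, here's a minimal sketch of a pod spec opting into it; the pod and container names are hypothetical placeholders, and the feature's exact availability depends on your cluster version.

```yaml
# Hypothetical pod opting its container into in-place resizes.
# With restartPolicy: NotRequired, a later change to this container's
# CPU or memory is applied without restarting the container.
apiVersion: v1
kind: Pod
metadata:
  name: resizable-demo          # placeholder name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
```

In recent versions, the resize itself is applied through the pod's resize subresource, for example with kubectl patch and the --subresource resize flag.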

Helm, the package manager for Kubernetes, has released version 4, featuring updates such as support for WebAssembly-based plugins, which further extend its capabilities for managing Kubernetes applications.

Google Cloud successfully demonstrated and operated a Kubernetes cluster with 130,000 nodes within Google Kubernetes Engine, setting a new public record for this scale. This achievement involved re-architecting the Kubernetes control plane and replacing etcd with a custom Spanner-based storage system, showcasing GKE's readiness for large-scale AI and data workloads.

Microsoft continues its significant investments in Azure Kubernetes Service, or AKS, focusing on enhancing reliability, performance, and security for AI-native workloads. AKS has already met the requirements for the new Kubernetes AI Conformance Program, which we'll talk about in this episode.

Furthermore, Microsoft is transitioning away from Ingress NGINX, as we all should be, given its pending deprecation and removal. Maintenance for Ingress NGINX in AKS ends in March 2026, and Microsoft is shifting towards the Gateway API as the future of application connectivity in AKS.
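For reference, migrating an Ingress rule to the Gateway API generally means expressing it as an HTTPRoute attached to a Gateway. Here's a minimal sketch; the Gateway name, hostname, and backend Service are hypothetical placeholders, not anything AKS-specific.

```yaml
# Hypothetical HTTPRoute standing in for a former Ingress rule.
# "example-gateway", "app.example.com", and "web-service" are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
  - name: example-gateway      # a Gateway provisioned by your platform
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: web-service        # the Service formerly behind the Ingress
      port: 80
```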

AKS also announced AKS Automatic Managed System Node Pools, where core cluster components run on Microsoft-owned infrastructure, so you no longer provision, patch, or scale system nodes.

AWS has rolled out new managed features for Amazon Elastic Kubernetes Service, or EKS, aiming to simplify workload orchestration and resource management on AWS.

These capabilities include enhanced integrations with other AWS services and controllers offered as managed services, allowing users to focus on software development rather than infrastructure management.

The 10th Anniversary KubeCon CloudNativeCon North America in Atlanta celebrated a decade of cloud-native progress and focused heavily on the industry's shift from cloud native to AI native. Key themes included the rising importance of inference, the potential of WebAssembly, and increasing complexity in observability and infrastructure.

The Cloud Native Computing Foundation announced the launch of the Certified Kubernetes AI Conformance Program. That's what this episode is going to be about today. This community-led initiative aims to establish and validate standards for reliably and consistently running AI workloads on Kubernetes, addressing the growing demand for AI infrastructure.

The Linux Foundation has officially established the Agentic AI Foundation, or AAIF. This new foundation provides a neutral, open foundation to ensure Agentic AI evolves transparently and collaboratively. Founding projects in the AAIF include Anthropic's Model Context Protocol, or MCP, Block's goose, and OpenAI's AGENTS.md.

KServe, an open-source Kubernetes-based platform designed for deploying, serving, and scaling predictive and large language models, has joined the Cloud Native Computing Foundation. OpenFGA, an authorization engine for complex access control, and Lima, providing secure, isolated environments for cloud-native and AI workloads, have both been accepted as CNCF incubating projects.

The Lima Project has also released version 2.0, bringing new features specifically designed for secure AI workflows, expanding its focus beyond containers to include artificial intelligence. The Cloud Native Computing Foundation has announced the schedule for KubeCon CloudNativeCon Europe 2026, to be held March 23rd to 26th, 2026, in Amsterdam, the Netherlands. And that's the news.

[MUSIC PLAYING]

Hello, and welcome. We are live on the show floor at KubeCon CloudNativeCon North America 2025 in Atlanta, and I am speaking with--

JANET KUO: Janet.

[LAUGHS]

JANET KUO: Hi, everyone. I'm Janet Kuo from Google.

KASLIN FIELDS: --and, Janet, I'm really excited to have you today. I saw you in the Keynote earlier, which was incredible, where you were talking about the AI Conformance Program for Kubernetes.

So I think a lot of people don't know about Kubernetes conformance to begin with, so could you tell us a little bit, first, about the concept of conformance in Kubernetes?

JANET KUO: Yes. Kubernetes Conformance is a program where a platform must pass a set of tests to say it's Kubernetes conformant. It makes sure that every platform provides a similar experience for running workloads on Kubernetes.

KASLIN FIELDS: And now we're announcing this AI Conformance Program. So something that Chris pointed out earlier in the keynotes today, that I really liked, was that the whole idea of the original conformance program for Kubernetes is to make sure that Kubernetes everywhere works the same way.

So it becomes the standard for the industry to use because, no matter which environment you're running Kubernetes on, you know that your workloads are going to work in pretty much the same way because it has that AI-conformance certification-- or the regular-conformance certification. And now we're talking about AI conformance. So what's different about AI conformance versus regular Kubernetes conformance?

JANET KUO: Yeah. So first, you need to be Kubernetes conformant before you can be Kubernetes AI conformant, so it's a superset of Kubernetes conformance. And the reason we created the Kubernetes AI Conformance Program is that we're starting to see AI workloads running on Kubernetes that have different requirements, for example, different networking requirements, or different accelerators and how they run.

And then we see an opportunity for us to, again, bring conformance to the AI space to make sure that AI workloads run the same anywhere, just like all workloads run the same anywhere on a Kubernetes-conformant cluster.

KASLIN FIELDS: And I think a lot of folks these days are, hopefully-- I hope, at least-- just Kubernetes admins, where you're taking some Kubernetes service, someone else created the Kubernetes cluster for you, and you're administering it using kubectl and those kinds of things. But conformance, we're talking about how the cluster itself is deployed and how the networking underneath works.

And when I talk to folks about AI workloads, a big thing that comes up is Kubernetes is really well known for running stateless workloads really well. And AI workloads are often stateful. They often have these really strict hardware requirements.

So AI conformance, it sounds like, is about creating the Kubernetes cluster in such a way that you can run those types of workloads, get those kinds of hardware reservations, and have the kinds of networking and low latency that you need for AI workloads.

JANET KUO: Yeah. So the conformance is mainly about the platforms. We are not enforcing how people run AI workloads; it's more about what capabilities or guarantees the platform should provide, so that when customers or users want to run an AI workload, they know what to expect from the platform.

And we also want to help the industry and the ecosystem come up with new standards out of the conformance so that we have a common way to execute. For example, we want to have a standard for what accelerator metrics look like or how accelerators are exposed. And, for example, we are adding the DRA API to the conformance program so that every platform will support the DRA API, and you can request accelerator resources through it.

KASLIN FIELDS: That's another good thing for us to talk about. So DRA is Dynamic Resource Allocation. And that's one of the, I think, big, highlighted features that Kubernetes has created over the last year or two, specifically to serve AI workloads.

And the idea behind dynamic resource allocation is that you are making it easier to control the specific types of hardware that your workloads are using, and to better understand the specific types of hardware that are available in your cluster. Could you talk a little bit about how DRA works and what its core use cases really are?

JANET KUO: Yeah. So DRA is really useful when you want to specify really sophisticated or fine-grained requirements when asking for accelerators. You're not just asking for a single, simple count.

You're asking for this much memory, that number of GPUs, or TPUs, or different kinds of accelerators. You have special requirements. And DRA really gives you the flexibility to specify that without having to figure out the right place or the right accelerator yourself.
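To make that concrete, here's a rough sketch of what such a fine-grained request can look like with DRA, assuming a driver that publishes a DeviceClass named gpu.example.com with a memory capacity; the driver, class, and capacity names are hypothetical, and the field layout varies across resource.k8s.io API versions.

```yaml
# Hypothetical ResourceClaimTemplate asking for one device from the
# "gpu.example.com" class with at least 40Gi of device memory.
# Field names follow the resource.k8s.io/v1beta1 DRA API and may
# differ in other versions; the CEL selector keys are assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: big-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
```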

KASLIN FIELDS: And Kubernetes, traditionally, is very well known for, you define your workload, and Kubernetes figures it out from there. And something I point out to folks a lot is that, in the AI world, you really care about what kinds of hardware are underneath there. So dynamic resource allocation is a feature that makes that clearer, getting you all the way down to the specific type of hardware that you really need.

And you gave a demo this morning of an AI-conformant Kubernetes cluster running a workload. Could you tell me a little bit about what you showed in that demo?

JANET KUO: Yeah, sure. In the demo, I wanted to highlight a few of the requirements that we put in the AI conformance program. For example, the platform must support the DRA API, so I used the DRA API in my inference workload to show that it can easily get the accelerator resources it needs to run.
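Wiring an inference pod to a claim like that might look roughly like the following sketch; the names and image are placeholders rather than what was shown on stage.

```yaml
# Hypothetical inference pod consuming the "big-gpu" template above.
# The pod owns a claim generated from the template, and the container
# references that claim by name instead of using extended resources.
apiVersion: v1
kind: Pod
metadata:
  name: inference-server        # placeholder name
spec:
  containers:
  - name: model
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: big-gpu
```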

I also showed some other requirements. For example, the platform needs to support a monitoring system, and I should be able to autoscale my pods based on custom metrics. So I showed how to autoscale based on the number of requests reported by my inference workload.
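An autoscaling setup along those lines could be sketched with a HorizontalPodAutoscaler on a custom per-pod metric; the Deployment and metric names here are assumptions, and serving such a metric requires a custom metrics adapter backed by the platform's monitoring system.

```yaml
# Hypothetical HPA scaling an inference Deployment on request rate.
# "inference-server" and "inference_requests_per_second" are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: inference_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
```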

Finally, I showed that I can also get performance metrics out of the accelerator on the platform, so this really showcases what you can do with AI-conformant clusters. And this is just the beginning of the program, so we welcome everyone to come tell us what you want and what you need, and help us shape the future of AI on Kubernetes.

KASLIN FIELDS: I think that's a great message here. And AI conformance in Kubernetes has been created, thus far, through a working group in the open-source project, right? What do you think is in the cards for the future of this program, or for the types of features that conformance requires?

JANET KUO: Yes. So like I mentioned, a lot of things are not standardized yet, so we hope to bring people from the community together to figure out what we want to standardize on and what standards we want to have.

For example, I was just talking with other folks from the community. We think we need to standardize the metrics so that it's easier for users to figure out how to operate their applications; otherwise, they have to figure out different metrics for different accelerators or different workloads, which is just too tedious.

And then we also want to standardize things like DRA attributes so that it's easy for you to get the information you need without figuring out what attributes are exposed by different accelerators. So basically, standardizing the things that users rely on for AI workloads will make them simple. And we want to provide a consistent experience, and make it easy and affordable for everyone.

KASLIN FIELDS: And a key feature of working groups in open-source Kubernetes is that they typically don't own any code. Any code they have will then be owned by the stakeholder special-interest groups within Kubernetes, which are long-running bodies, like subcommunities within the contributor community.

And, occasionally, a working group will turn into a special-interest group itself. But, for example, this metrics discussion that's happening in the community-- is that going to be part of a different SIG going forward, do you think? Or will the working group continue to work this stuff out for a while? How do you see that playing out in the group?

JANET KUO: So the working group is currently sponsored by SIG Architecture, so we're going to get feedback from SIG Architecture. And we are also getting help from SIG Testing, because we want to introduce automated testing next year so that every platform has a simpler way to get certified.

And then we're going to work in the working group until it matures. After that, maybe we'll give the code back to SIG Testing or SIG Architecture. And potentially, we can have it merged with Kubernetes conformance.

Maybe one day Kubernetes AI conformance is just Kubernetes conformance, but that's a very bold goal. And for now, we just want to make sure that we are doing everything that covers the community needs.

KASLIN FIELDS: This is a really fundamental step, I think, in the transformation of Kubernetes in this AI environment that we are coming into. So thank you so much, Janet, for sharing a little bit with me today about AI conformance.

JANET KUO: Thank you for having me.

[MUSIC PLAYING]

KASLIN FIELDS: That brings us to the end of another episode. If you enjoyed the show, please help us spread the word, and tell a friend. If you have any feedback for us, you can find us on social media @Kubernetespod, or reach us by email at Kubernetespodcast@google.com.

You can also check out the website at Kubernetespodcast.com, where you'll find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.