#207 September 4, 2023

Kubernetes 1.28 with Grace Nguyen

Hosts: Abdel Sghiouar, Kaslin Fields

Our guest is Grace Nguyen, Kubernetes 1.28 release lead and student at the University of Waterloo. Grace had to juggle exams and community work to bring Kubernetes 1.28 to life. We will get to know Grace and learn what work went into the release, where the theme came from, and what's special about it.

Do you have something cool to share? Some questions? Let us know:

News of the week

Links from the post-interview chat

KASLIN FIELDS: Hello, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Kaslin Fields, and--

ABDEL SGHIOUAR: I am Abdel Sghiouar.

[MUSIC PLAYING]

KASLIN FIELDS: This week, we chatted with Grace Nguyen, Kubernetes 1.28 release lead and student at the University of Waterloo. Grace had to juggle exams and community work to bring Kubernetes 1.28 to life. We'll get to know Grace and learn what work went into the release, where the theme came from, and what's so special about 1.28. But first, let's get to the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Docker Desktop 4.22 is live. The new version promises faster startup, less resource consumption, optimized memory usage, and better CPU utilization.

KASLIN FIELDS: The CNCF announced the creation of the End User Technical Advisory Board. The new body will be composed of representatives from End User members who are highly engaged in the community and have chaired End User groups or have served on the Technical Oversight Committee. The initial selection process will start at the end of August, with votes planned for September and the launch of the group in October.

ABDEL SGHIOUAR: The Go community released version 1.21 of the popular programming language. The new version comes with changes to the programming language itself, new standard libraries, improved performance, and an experimental port for WASI, the WebAssembly System Interface. More details can be found in the link in the show notes.

KASLIN FIELDS: Configu raised a $3 million pre-seed round for its configuration-as-code platform. The Tel Aviv-based startup aims to solve fragmented application configuration, feature flags, and secrets management by providing a single place to store, manage, and connect environments and systems. The solution follows an open-core model, with an open-source version of the tooling and an enterprise pay-as-you-go cloud-hosted version. And that's the news.

[MUSIC PLAYING]

Hello, and welcome to the show, Grace. Grace Nguyen is a computer engineering student at the University of Waterloo. She's worked on the release team for more than two years and recently led the release of Kubernetes 1.28. Welcome to the show, Grace.

GRACE NGUYEN: So excited to be here.

KASLIN FIELDS: Yay. We're excited to have you. The release episodes are always some of my favorites. I love learning about the new things that are going on. And for you this time, this is an especially exciting time for you, I imagine. The release was around a bit of a busy time, it sounds like. You just finished your exams.

GRACE NGUYEN: The release came out on Tuesday the 15th, and my last exam, which is computer security, was on Wednesday the 16th.

KASLIN FIELDS: Oh my goodness.

GRACE NGUYEN: It was adorable. We had our fun little release party, and we hit the button. And then I hung out for a little bit, and then I had to go study for my exam.

KASLIN FIELDS: Wow. Previously on the podcast, we interviewed Leonard Pahlke, who is also a release lead and was also a student. So I feel like there's a trend here of release leads also being students, which I think is wonderful and exciting.

GRACE NGUYEN: For sure.

KASLIN FIELDS: As I said already, you were the 1.28 release lead. You're a student at the University of Waterloo. Anything else that you'd like to tell us about yourself?

GRACE NGUYEN: I also love to dabble in the SIG security side of things. So they are a great SIG. If you're into security, they are super nice.

KASLIN FIELDS: So that was a fun final exam for you then.

GRACE NGUYEN: Absolutely. My favorite course.

KASLIN FIELDS: Well, that's good. I've heard a lot of good things about SIG security being a very welcoming place and a good place to learn new things. Have you been involved with them long?

GRACE NGUYEN: A little bit since I started, but I am helping with the security self-assessment, so that should be exciting.

KASLIN FIELDS: Yeah, that's definitely exciting. So folks that are listening who are interested in security and might be interested in contributing to the project, you might want to check that out. It sounds like they've got a cool project going on right now. But let's dive into the 1.28 release. And the release theme this time is Planternetes. I love the logo. It's got Monsteras and all kinds of lovely plants in it. What was your inspiration for the theme this time?

GRACE NGUYEN: For sure. So I'm a big houseplants fanatic. I love having plants around me all over the house. And it's also summer in the northern hemisphere, so it really has that spring-summer energy. And I thought it was a really good analogy to the release team but also the bigger open-source ecosystem. We all play a small but critical part in it. And together, we build this wonderful thing.

KASLIN FIELDS: I noticed that in the release blog, it starts out with a little blurb that I thought was kind of a general introduction to how the community works. And then it kind of wrapped the Planternetes theme into it by talking about how we all work together to build this thing together. And I thought that was really nicely done. So I recommend reading the release blog.

GRACE NGUYEN: A lot of work has gone into that. So yeah, please do.

KASLIN FIELDS: And speaking of the release blog, there are a lot of new features in this release that are outlined in much more detail in the release blog. But would you like to give us a little overview of the release?

GRACE NGUYEN: Yeah, so the thing that I heard the most buzz about this release is the API awareness of sidecar containers. So the sidecar pattern has existed for a long time within Kubernetes, but there hasn't been an official way to implement it. So we have init containers, which are the things that start and die before your pod comes alive. And now if you specify in your init container, restart policy always, this init container is going to transform into a sidecar container.

What this means is that your sidecar container is going to live throughout your pod life cycle. And a couple really cool implications from this is-- folks in service mesh told me that they're excited about this-- you can use it to pull updated secrets into your pod. But one that resonated with me is logging. So with the sidecar container, you can have logging of before your pod started and after it dies, which is great.

KASLIN FIELDS: Oh, interesting. And before this update, folks were doing sidecar containers, like you said, but it wasn't something that was officially recognized by the API. There was no real way to create an object within Kubernetes that's like, this is basically a sidecar container. So it sounds like what we've done is we've updated the API to be able to say this init container, it's restarting all the way, so that essentially makes it a sidecar container. So it's not like there's something called a sidecar container in the API.

GRACE NGUYEN: Right.

KASLIN FIELDS: But I guess the API server has a little bit more awareness of how these sidecar containers should work.

GRACE NGUYEN: Right. It's just a different flavor to the init container.
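
A minimal sketch of what that looks like in a pod spec, written here as a Python dictionary that mirrors the manifest. The part that matters is the restartPolicy field set to Always on the init container. The pod name, container names, and images are made up for illustration, and since the feature is alpha in 1.28, the sketch assumes the SidecarContainers feature gate is enabled on the cluster.

```python
import json


def build_pod_with_sidecar() -> dict:
    """Return a pod manifest whose init container acts as a sidecar.

    All names and images below are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-log-sidecar"},
        "spec": {
            "initContainers": [
                {
                    "name": "log-shipper",
                    "image": "example.com/log-shipper:1.0",
                    # The switch discussed above: an init container with
                    # restartPolicy Always keeps running alongside the pod.
                    "restartPolicy": "Always",
                }
            ],
            "containers": [
                {"name": "app", "image": "example.com/app:1.0"},
            ],
        },
    }


def sidecar_names(pod: dict) -> list:
    """List the init containers that are marked to run as sidecars."""
    init_containers = pod["spec"].get("initContainers", [])
    return [c["name"] for c in init_containers if c.get("restartPolicy") == "Always"]


if __name__ == "__main__":
    pod = build_pod_with_sidecar()
    print(json.dumps(pod, indent=2))
    print(sidecar_names(pod))
```

The printed JSON can be saved to a file and applied like any other manifest. An init container without the restartPolicy field keeps the classic run-to-completion behavior.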

KASLIN FIELDS: Yeah, interesting. And like you mentioned, the logging thing of the sidecar container now being able to take logs as the container is spinning up and as it's dying as well.

GRACE NGUYEN: As the pod that it's sidecaring is coming up, it can kind of log what's going on. And so that will help with debugging.

KASLIN FIELDS: Is that a change with the way that the API is aware of it now? Or were they able to do that before? Do you know?

GRACE NGUYEN: So before with the init container, it can log your startup, but it will just die once your pod has started because it's an init container. And so now the kubelet will actually only wait for the init container to get started instead of being complete, if you specify that you want it to be a sidecar.

KASLIN FIELDS: So it'll definitely make the sidecar logging capabilities a bit smoother, it sounds like.

GRACE NGUYEN: Yeah.

KASLIN FIELDS: Excellent. I've definitely heard some hype about that in the service mesh space as well. So I'm excited about that one. There's a bunch of other new features though.

GRACE NGUYEN: So many.

KASLIN FIELDS: Tell us a little bit more.

GRACE NGUYEN: So one other alpha feature I'm excited about is there's another step in this transition to community-owned infrastructure. So one of the big moves in 1.27 was the transition in the registry. And so now we take a step further, and we are offering folks a way to opt in to community infrastructure for the packages.

KASLIN FIELDS: Interesting. I don't know if folks out there listening are very aware of this shift to community infrastructure, but there's a lot of things within the Kubernetes project that have been supported by the various companies that support the open-source project.

And there's been this shift recently to try to move more of the infrastructure to more Kubernetes and CNCF-owned things. So this is another piece of that where packages can now be part of community infrastructure. Would you describe a little bit more about how that works?

GRACE NGUYEN: Yeah, so you can opt in. So there's a way for you to opt in. We actually have a specific feature blog for this feature that folks can look into. But essentially, it's not automatic. You have to opt in. And otherwise, our Google-hosted repositories will continue to work the same way they do. So it's not a breaking change, but you can opt into this new cool thing that we're trying out, and give us feedback.

KASLIN FIELDS: Yeah, and this also reminds me of another thing that I would like to remind our audience about at every opportunity, which is the registry change.

GRACE NGUYEN: A little late now. But yes, please.

KASLIN FIELDS: Yeah, but please. If you haven't changed things from k8s.gcr.io, I think it was before, to registry.k8s.io, make sure you do that because that's a very important part of this community-owned infrastructure change because, obviously, k8s.gcr.io is a Google-owned repository that all of the Kubernetes images used to be hosted on. And now they're hosted on registry.k8s.io, which is more community owned and will be much more stable.
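
For anyone auditing manifests for the old host, the change is a prefix swap on the image reference. The helper below is a hypothetical sketch, not an official migration tool: it rewrites references that start with k8s.gcr.io/ and leaves everything else untouched.

```python
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def migrate_image(image: str) -> str:
    """Rewrite an image reference from the old registry host to the new one.

    Only a leading 'k8s.gcr.io/' is replaced; any other image is returned as-is.
    """
    prefix = OLD_REGISTRY + "/"
    if image.startswith(prefix):
        return NEW_REGISTRY + "/" + image[len(prefix):]
    return image


if __name__ == "__main__":
    # The image names are examples only.
    for image in ["k8s.gcr.io/pause:3.9", "example.com/app:1.0"]:
        print(image, "->", migrate_image(image))
```

Matching on the full host plus slash avoids touching images that merely contain the old host name somewhere in their path.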

So if you haven't moved things, please move them. We're just getting started with the new features of this 1.28 release, though. One that I noticed in the release blog that I was excited about is ever since I've been talking about Kubernetes, from the earliest days, I remember people asking questions about the versions, because the versions of Kubernetes come out so fast.

So there can sometimes be skew between the control plane, like the brains of Kubernetes, and the nodes, where all of your workloads are running. And traditionally, the way that we explain that is that if you are upgrading your Kubernetes clusters, there can be up to two minor versions of skew between the control plane and the nodes. But that's changing in this release.

GRACE NGUYEN: Yes, we put it at the top of the release blog because we are also excited about this. Yes, there's feedback within the community that upgrading Kubernetes is not the easiest thing.

KASLIN FIELDS: Really?

GRACE NGUYEN: And so now we are allowing a three instead of two-version skew between your control plane and your nodes. Hopefully, that makes it easier to upgrade your control planes first and then your nodes.
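
To make the skew rule concrete: in a version such as 1.28.3, the second number (28) is the minor version and the third (3) is the patch version, and the rule is about how many minor versions older than the control plane the kubelet on a node may be. The sketch below uses hypothetical helper names and assumes both sides share major version 1. It illustrates the rule and is not code from the Kubernetes project.

```python
def minor_version(version: str) -> int:
    """Extract the minor number from a version like 'v1.28.3' or '1.28'."""
    parts = version.lstrip("v").split(".")
    return int(parts[1])


def kubelet_skew_ok(control_plane: str, kubelet: str, max_skew: int = 3) -> bool:
    """Check the node-to-control-plane minor version skew.

    The kubelet may not be newer than the control plane, and may be at most
    max_skew minor versions older (three as of 1.28, two before that).
    """
    skew = minor_version(control_plane) - minor_version(kubelet)
    return 0 <= skew <= max_skew


if __name__ == "__main__":
    print(kubelet_skew_ok("1.28.0", "1.25.4"))  # three minors older
    print(kubelet_skew_ok("1.28.0", "1.24.9"))  # four minors older
    print(kubelet_skew_ok("1.27.3", "1.28.0"))  # node newer than control plane
```

Patch versions are ignored on purpose: the skew rule counts minor versions only.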

KASLIN FIELDS: It is important, though, to note that it is still minor versions. You can't have your control plane and your nodes a whole major version away. And a minor version would be like 1.27 to 1.28, right?

GRACE NGUYEN: Yeah.

KASLIN FIELDS: And a patch version is 1.28.1, for example.

GRACE NGUYEN: Yeah.

KASLIN FIELDS: So one more dot gets you the patch versions, and those aren't what we're talking about here. The control plane and the nodes can be up to three minor versions apart. So like a 1.28 control plane with 1.25 nodes would be OK.

GRACE NGUYEN: Yeah.

KASLIN FIELDS: Another one that I noticed in there that I'm excited about is recovery from non-graceful node shutdown. And this is mainly a feature that's targeted at stateful workloads, it sounds like.

GRACE NGUYEN: For scenarios where a node fails for some reason that is abnormal. Also this happens on a node that runs Windows but also--

KASLIN FIELDS: Windows is its own--

GRACE NGUYEN: Your OS is broken or your hardware fails, and it is not immediately recognized. This is a way to kind of airlift the persistent volumes into new, healthy nodes, where new pods are spawned.

KASLIN FIELDS: Interesting. So does it move the volumes, or does it move the workloads, or does it move both?

GRACE NGUYEN: I'm not 100% sure.

KASLIN FIELDS: I think, according to the blog-- that's OK. We have talked about on the show before that the release lead's job is to have a general overview over everything. So I don't expect you to know all of the details of every one of these. But according to the blog, it sounds like it handles moving the workloads, but there's still an issue with the volumes, that they may not be moved.

GRACE NGUYEN: The workload. It's for pods that are part of the stateful set, so migrating pods, helping the pods clean up better. But what happens with the volumes there?

KASLIN FIELDS: We are literally reading the blog post as we're speaking to you. It says in the blog post that there's a problem with storage. If there are volumes used by the pods, existing volume attachments will not be disassociated from the original, and now shut-down, node. So the persistent volumes used by those pods cannot be attached to a different healthy node. As a result, the application running on an affected stateful set may not be able to function properly.

If the original shutdown node does come up, then the pods will be deleted by its kubelet, and new pods can be created on a different running node. If the original node doesn't come up, which is kind of the situation that we're trying to address here, those pods would be stuck in a terminating status on the shutdown node forever. So there's definitely some more detail to read into there on the volume.

GRACE NGUYEN: For sure. So I think the gist of it is that this helps the transitions of the pods from being stuck in this dying node into migrating over to a healthy node.
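
As far as the Kubernetes documentation describes the mechanism, an operator who has confirmed that a node is really down marks it with an out-of-service taint, after which the stuck pods are force deleted and their volumes can be detached and reattached elsewhere. The sketch below only builds that taint as data. The helper names are hypothetical, and the value nodeshutdown follows the example used in the documentation.

```python
OUT_OF_SERVICE_KEY = "node.kubernetes.io/out-of-service"


def out_of_service_taint() -> dict:
    """Return the taint that marks a node as out of service."""
    return {"key": OUT_OF_SERVICE_KEY, "value": "nodeshutdown", "effect": "NoExecute"}


def with_out_of_service_taint(node_taints: list) -> list:
    """Return a new taint list with the out-of-service taint present once."""
    if any(t.get("key") == OUT_OF_SERVICE_KEY for t in node_taints):
        return list(node_taints)
    return list(node_taints) + [out_of_service_taint()]


if __name__ == "__main__":
    # A hypothetical node that already carries an unrelated taint.
    existing = [{"key": "example.com/dedicated", "value": "db", "effect": "NoSchedule"}]
    print(with_out_of_service_taint(existing))
```

On the command line, the equivalent is kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute. The taint is meant to be removed again once the pods have moved and the node has been recovered.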

KASLIN FIELDS: Right. That's a little bit confusingly worded. That is a good point, though. The whole point of this is that it's moving those pods from being stuck on the down node to this new node. So maybe this partially addresses the volume thing. I would recommend that our listeners go and check out the links in the blog post so that you can learn a bit more about that one. It's definitely exciting either way, though, because we're taking stateful sets running on Kubernetes.

And when there is a non-graceful node shutdown, we're doing a little bit more to make sure that those workloads can keep running smoothly. If that's important to you, definitely check out the details. And another thing that I noticed in the blog is that there are a couple new things for jobs.

GRACE NGUYEN: Yeah, so two alpha features for jobs that I want to point out. First is pod replacement policy for jobs. So there is now a new field where you can specify if you want the control plane to have specific behaviors with the previous pod termination.

So if you want new pods to spawn as soon as it's being terminated or only once the existing pods are fully terminated, that's something you can specify. And then the other thing there is we have another alpha feature for job retry backoff limit. How many times do you want it to retry before it gives up?
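
A rough sketch of how the two fields might appear on a Job, again as a Python dictionary that mirrors the manifest. It assumes the backoff feature mentioned here is the per-index backoff limit for Indexed Jobs (backoffLimitPerIndex) and that both alpha feature gates, JobPodReplacementPolicy and JobBackoffLimitPerIndex, are enabled. The job name, image, and numbers are placeholders.

```python
def build_job() -> dict:
    """Return a Job manifest that uses both 1.28 alpha fields."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "example-indexed-job"},
        "spec": {
            "completions": 3,
            "parallelism": 3,
            # Per-index backoff applies to Indexed Jobs.
            "completionMode": "Indexed",
            # "Failed": create a replacement only once the old pod has fully
            # terminated. "TerminatingOrFailed" replaces as soon as the old
            # pod starts terminating.
            "podReplacementPolicy": "Failed",
            # Each index may be retried once before it counts as failed.
            "backoffLimitPerIndex": 1,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "worker", "image": "example.com/worker:1.0"},
                    ],
                }
            },
        },
    }


def uses_per_index_backoff(job: dict) -> bool:
    """True if the Job is Indexed and sets a per-index backoff limit."""
    spec = job["spec"]
    return spec.get("completionMode") == "Indexed" and "backoffLimitPerIndex" in spec


if __name__ == "__main__":
    job = build_job()
    print(job["spec"]["podReplacementPolicy"], uses_per_index_backoff(job))
```

Waiting for full termination matters most when pods hold scarce resources such as accelerators, since a replacement started too early would briefly double the demand.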

KASLIN FIELDS: These are really core bits of functionality for job-type workloads running in Kubernetes, so I'm really excited about these new alpha features. And I've also been learning a lot about jobs myself, personally, recently because of batch workloads, which are closely related to high-performance computing, and AI/ML workloads.

So if you're interested in how AI-type workloads run on Kubernetes, looking into these new features of jobs might be really interesting to you. One other thing that I actually didn't take a note on and we didn't really talk about before this was that I noticed a couple of updates in 1.28 about admission webhooks.

GRACE NGUYEN: Oh, the CEL stuff.

KASLIN FIELDS: There's matching conditions for admission webhooks. And there's also validating admission policies. For when you're creating a CRD, instead of using an admission webhook, you can use this new validating admission policies thing.

GRACE NGUYEN: To me, it's almost a regex situation. There's this language that you can use to specify what you want to allow.

KASLIN FIELDS: I feel like most people in tech hate regex, but I actually love regex.

GRACE NGUYEN: It's really powerful.

KASLIN FIELDS: It's really useful sometimes. And sometimes that's just what you need is you need to be able to find strings that match strings. These are a couple more interesting bits about admission webhooks. So if you do a lot with admission webhooks within Kubernetes, you might want to check out the new features in this release, specifically the validating admission policies, graduating to beta, and the matching conditions for admission webhooks. I think we've covered a good amount of the release with that.
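
To give a feel for the language in question (CEL, the Common Expression Language), here is a small sketch of a validating admission policy and a webhook match condition, both as Python dictionaries that mirror the manifests. The policy name, the replica limit, and the helper names are illustrative assumptions. The v1beta1 API version reflects the beta status in 1.28, and a policy only takes effect once a ValidatingAdmissionPolicyBinding points at it.

```python
def build_replica_limit_policy(max_replicas: int) -> dict:
    """Return a ValidatingAdmissionPolicy that caps Deployment replicas."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1beta1",
        "kind": "ValidatingAdmissionPolicy",
        "metadata": {"name": "replica-limit.example.com"},
        "spec": {
            "failurePolicy": "Fail",
            "matchConstraints": {
                "resourceRules": [
                    {
                        "apiGroups": ["apps"],
                        "apiVersions": ["v1"],
                        "operations": ["CREATE", "UPDATE"],
                        "resources": ["deployments"],
                    }
                ]
            },
            # The CEL expression must evaluate to true for the request
            # to be admitted.
            "validations": [
                {"expression": f"object.spec.replicas <= {max_replicas}"},
            ],
        },
    }


def lease_exclusion_match_condition() -> dict:
    """Return a webhook matchConditions entry that skips Lease requests."""
    return {
        "name": "exclude-leases",
        "expression": '!(request.resource.group == "coordination.k8s.io" '
                      '&& request.resource.resource == "leases")',
    }


if __name__ == "__main__":
    policy = build_replica_limit_policy(5)
    print(policy["spec"]["validations"][0]["expression"])
    print(lease_exclusion_match_condition()["expression"])
```

The policy is evaluated inside the API server, so no webhook call is made for it. A match condition works the other way around: it filters which requests are sent to an existing webhook in the first place.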

GRACE NGUYEN: So many features. We have 45 total this release.

KASLIN FIELDS: So many. Do you know how that compares to other releases?

GRACE NGUYEN: I think it's about the same.

KASLIN FIELDS: OK, yeah. Still it feels like so many every time.

GRACE NGUYEN: It is a lot, yeah.

KASLIN FIELDS: And normally, with each release, there's also a deprecations and removals blog post. But I haven't seen one of those this time.

GRACE NGUYEN: Yeah, those usually come out in the middle of the release, actually. And we didn't have one because the deprecations and removals we have are quite minor. And Kaslin, you told me those are storage items moving out of tree.

KASLIN FIELDS: Yeah. So I was looking at the deprecations and removals this time. There's only one removal and two deprecations, and they're all about storage plugins, basically. So originally, with Kubernetes history--

GRACE NGUYEN: Back in the day.

KASLIN FIELDS: Yeah. I love doing Kubernetes history lessons. Long, long ago in a galaxy far, far away. I love to say that. The storage plugins for Kubernetes, if you were running a stateful workload, you needed access to storage, there would be all of these different plugins and drivers for working with different types of storage. So at the time, actually I was working at NetApp. So we were very concerned about working with storage plugins and making sure that there were storage plugins to work with our storage.

And the way that was done originally was to create the plugin, and it would be part of Kubernetes, essentially. You get Kubernetes, you get all of these storage plugins built in with it. Over time, we've found that that's not the best way to do these, because things change and shift. And the responsibility for maintaining those storage plugins is generally not with folks who are regular contributors to the project, but with folks who are part of the companies or part of the groups that are making these storage solutions.

So there are all kinds of problems with these storage plugins being part of Kubernetes itself. So over time, we've moved more toward this model of moving those out of tree, as we say. If they're a part of Kubernetes as a project, they're in tree. And if they are outside, they're something that you can add to Kubernetes, then they're out of tree. So all of these deprecations and removals are just that kind of storage stuff that's happening, it looks like. So it's very exciting to me that this release is not exciting in terms of deprecations and removals.

GRACE NGUYEN: So we're keeping Kubernetes boring.

KASLIN FIELDS: Yeah. That's how we prefer it to be. I'm sure lots of folks out there will be relieved to hear that this is a pretty unexciting release in terms of deprecations and removals. So as we said, definitely check out the release blog. You don't need to check out a deprecations and removals blog, because those are just at the bottom of the release blog, and they're pretty minor. And there are still lots of blog posts coming out about the individual features, right?

GRACE NGUYEN: For sure, yeah. Throughout this month, there will be more coming out.

KASLIN FIELDS: Excellent. So that's kind of an overview of the release in general. We covered a lot of different pieces of it. What is the most exciting part of the release for you this time?

GRACE NGUYEN: Honestly, I'm just happy that it came out on time. Right before my last finals. But like all releases, it's such a big effort. And the last two weeks before its release, there's always this fear that we're not going to make it because x number of docs has not been reviewed. And so that was a very real fear this release as well.

KASLIN FIELDS: Anyone out there who has been involved with releasing anything can probably relate to that stress. And it has happened a few times that we've had to delay things a little bit, but it's really exciting that you all hit your target goal.

So the release came out on time, which is very exciting, around the time of your exams, which sounds horrifying to me. But it sounds like it worked out well for you. It's got all of these new features. Basically no deprecations and removals that folks should have to worry a lot about.

So it seems like a relatively low-risk one to upgrade to in terms of deprecations and removals. There's all sorts of other things to consider there. But is there anything in particular that you'd like to point out for folks out there who may be listening and thinking, should I upgrade to 1.28 right now?

GRACE NGUYEN: I feel like the sidecar stuff is really exciting.

KASLIN FIELDS: That's true.

GRACE NGUYEN: And it's one of those patterns that's just so ingrained in the way we operate Kubernetes. So it might be cool to see the sidecar container existence.

KASLIN FIELDS: And if you're relatively new to Kubernetes or you haven't been using sidecar as much, this might be an opportunity for you to pick up a new tool to make your workloads run better in Kubernetes by having these sidecars that can do things, like we were saying, like logging, providing some additional security functionality, that you can add to your applications without having to change the source code of those applications. You can do it outside of them in these sidecars.

GRACE NGUYEN: Yeah. And I don't know if folks know, but especially for features that are in alpha or beta, if you have feedback about how they work or you would like something different or you don't like something, come on into the enhancements repository inside the Kubernetes GitHub org and leave a comment. That's why they're going through these different phases. We want to get community feedback.

KASLIN FIELDS: That is a great point that I feel like we don't talk about enough. But Kubernetes is open source, folks. If you are a user of Kubernetes who has never contributed before, but you have a lot of opinions about how it works, it's out there for you to make comments on and share your experiences about. And we would love to hear from you. So this is a, like I said, kind of a relatively low-risk release in terms of the number of deprecations and removals and the severity of the deprecations and removals.

And there's some really exciting stuff in here with the sidecars. And if you're running stateful workloads, potentially the non-graceful node shutdown, definitely check out 1.28. And if folks want to learn more about the release, there's a webinar coming up that you all are going to be doing about the release. When is that happening?

GRACE NGUYEN: That is happening on Wednesday, September 6 at noon EDT, AKA 9:00 AM PT. I believe we've made a mistake in the release blog that I will put a PR out for shortly. But September 6.

KASLIN FIELDS: Yeah, it looks like that was posted wrong in the release blog. But you heard it here first, folks, or maybe not first. But hopefully, you heard it here. And if you heard this in time, Wednesday, September 6 at, you said, noon.

GRACE NGUYEN: Noon ET, Eastern Time, and 9:00 AM Pacific Time.

KASLIN FIELDS: You can learn more about the release blog from the release team. Who all is going to be on that webinar? Is that going to be you? More folks?

GRACE NGUYEN: Me, the Enhancement lead, Atharva, and our Comms lead, Brad.

KASLIN FIELDS: Excellent.

GRACE NGUYEN: Yeah.

KASLIN FIELDS: Atharva was the what lead again?

GRACE NGUYEN: Enhancement.

KASLIN FIELDS: Wow. He's doing so much good stuff. I've worked with Atharva in the open-source community. It's so good to hear about that.

GRACE NGUYEN: He's great, yeah.

KASLIN FIELDS: Excellent. So check out the webinar for more information. And one other thing that I wanted to mention. 1.28 is done now, so that means that the planning for 1.29 is ramping up, right?

GRACE NGUYEN: It is in progress. We have chosen a lead so far. So Priyanka will be leading 1.29. And our wonderful 1.27 lead, Xander, will come back and become our emeritus advisor.

KASLIN FIELDS: Excellent. And folks might not be aware of the concept of emeritus. I don't know how common that is everywhere. But within the release team and a lot of different places throughout the open-source project of Kubernetes, we have this concept of emeritus, where someone who has been a leader in the past and has moved on from that role provides guidance to the person who is currently in the lead position. So Xander we've interviewed before for 1.27. He'll be the emeritus lead.

And I'm really excited about Priyanka Saggu, who will be the 1.29 release lead. She also just became a tech lead of the Special Interest Group for Contributor Experience, which I am a co-chair of. So I've been working more closely with her lately, and she's been doing lots of great work. So I'm excited to see the 1.29 release. And with that ramping up, there will also be a shadow application opening up soon, right?

GRACE NGUYEN: That is right. So we've just chosen the lead for now. And within the next couple weeks, keep an eye out for the 1.29 shadow applications. We were talking about this, Kaslin, but it's such a great way to get involved in the community. And folks from the release team go on to join all sorts of different SIGs within Kubernetes.

KASLIN FIELDS: Yeah. One common question I get from folks who are starting out contributing to Kubernetes is I started out over here, but I'm really interested in this other thing. Can I go do that? We strongly encourage cross-pollination within the Kubernetes community. If there's something you're interested in, definitely go for it.

The release team is a bit unique in that it has these very well-defined leadership and shadow roles. So there will be specific leads that will be chosen for the different parts of the release, like Communications and Bug Triage.

GRACE NGUYEN: Oh, actually we're merging our Bug Triage and CI Signal team. So they're going to be one team in 1.29, so that should be interesting.

KASLIN FIELDS: So CI Signal and Bug Triage are merging. What did those two do in the past, and how is that merger happening?

GRACE NGUYEN: Yeah, so CI Signal is the folks that look at tests. And if one fails, they flag it or pull in folks. And those are also the folks who give signals before our cut. And then the Bug Triage folks are the ones that keep an eye on the k/k boards and just see if there are bugs open and PRs that need to be closed before we release. And with the shift from the Google Sheets-- do you remember those days, Kaslin? It was not nice.

So we had this giant Google Sheet for Enhancements, and everyone operated on the same sheet. So we don't do that anymore, thankfully. But that has made it really easy to track bugs because the GitHub project board is native to GitHub. So we can make filters and those sorts of things. And so with that, the Bug Triage team is now being merged with the CI Signal team.

KASLIN FIELDS: Yeah, the GitHub project boards have been getting more handy in all sorts of ways for me lately. Sounds like those two pieces work really well together. We've got your CI of continuous integration of all of the changes that are going into the release and checking to make sure that that's all going well, and then the Bug Triage of making sure that all of the bugs that are going into the release are going well. So they kind of work really well together. It makes sense to merge them.

GRACE NGUYEN: We're calling them the Release Signal team, which I think is a good name. They give signal for how well our release is doing.

KASLIN FIELDS: So that's a change to the roles that help to make this release happen. If you are interested in getting involved with the release, this shadow application that we've been talking about will be your opportunity. Because these roles are so well defined and because the release is so visible and such an interesting part of the project, a lot of people are interested in getting involved with it.

So there is a process to apply to become a shadow. And not everyone gets selected. I know lots of folks who have applied multiple times and got it eventually or eventually gave up. But sometimes, you have to apply a couple of times because there's just so much interest in it.

GRACE NGUYEN: Yeah, we do keep an eye out for folks who have applied a couple times.

KASLIN FIELDS: Yeah. So if you're interested, definitely get started applying. Don't be intimidated if you don't get it the first time. Apply again. And learn a little bit more about the different roles that go into the release so that you can apply for a specific shadow position. Because when you apply to be a shadow, you'll ask to be part of a specific area of the release, like Comms or Release Shadow. What did we call it again?

GRACE NGUYEN: Release shadow.

KASLIN FIELDS: Release shadow.

GRACE NGUYEN: And Release Signal.

KASLIN FIELDS: Signal. There we go.

GRACE NGUYEN: Yeah.

KASLIN FIELDS: Excellent. So I hope folks out there enjoy learning about 1.28 and are excited about 1.29. Thank you so much, Grace, for your work on 1.28 and for being here with us today.

GRACE NGUYEN: Thank you for having me.

KASLIN FIELDS: And you can also find Grace on Twitter, @gracenng. Cool. Thank you so much for being here.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Well, Kaslin, thank you very much for the interview. That was pretty cool.

KASLIN FIELDS: Yeah, always cool to talk about the new release.

ABDEL SGHIOUAR: Yeah, it's actually very interesting that the community keeps bringing in people who are still students. So we had a chat with Leonard and now Grace.

KASLIN FIELDS: Yeah. I love seeing students doing awesome things in the community.

ABDEL SGHIOUAR: Yeah, it's pretty cool to have a very early start to get to understand how things work, actually. It's very interesting.

KASLIN FIELDS: In the real world.

ABDEL SGHIOUAR: Exactly. So yeah, let's chat a little bit about this. On the surface, it sounds like a boring release, but there's actually a lot of interesting things going on in 1.28.

KASLIN FIELDS: Yeah, you would think a release that doesn't even have a deprecations and removals blog would not be that exciting, but there's quite a lot of things going on in here.

ABDEL SGHIOUAR: Yeah, I would argue there is one of the most important features coming, which we'll talk about in a bit. But let's start with the theme. How do you pronounce that name? What is the theme?

KASLIN FIELDS: Planternetes. That's how I would say it.

ABDEL SGHIOUAR: It's like a word game.

KASLIN FIELDS: Yeah. Fun fact. I was just working on some graphic design for the Contributor Summit that's going to be happening in Chicago for KubeCon this year. And the logo that I made for that I referred to as Pizzanetes.

ABDEL SGHIOUAR: Yes, I saw that one. Yeah.

KASLIN FIELDS: Yeah, so when I saw Planternetes, I was like, yes.

ABDEL SGHIOUAR: You felt validated.

KASLIN FIELDS: You get me.

ABDEL SGHIOUAR: Yes.

KASLIN FIELDS: It's a community thing.

ABDEL SGHIOUAR: Yeah, it's pretty cool. All right. So let's go through some of the interesting features, starting with I think the one that a lot of people are writing about, the native sidecar containers.

KASLIN FIELDS: Yes. I've definitely heard about that, but I've never dove into it. So this was very interesting to hear a little bit more about.

ABDEL SGHIOUAR: I think we covered it during the episode with Xander. The KEP, the Kubernetes Enhancement Proposal. I don't remember the name. It was seven something, which is the most voted-for KEP slash issue on GitHub.

KASLIN FIELDS: Yeah, that's right.

ABDEL SGHIOUAR: Right. It came--

KASLIN FIELDS: We did talk about that, didn't we?

ABDEL SGHIOUAR: Yeah, and I think we chatted quickly about how important this is, but we didn't really know how it's going to end up being resolved. And now it is resolved in a very interesting way.

KASLIN FIELDS: Yeah. I was kind of hoping for something that would use the term "sidecar" so it's more clear that this is meant to address this. But it's just a change to init container, so I feel like that's going to be a little confusing for folks.

ABDEL SGHIOUAR: Correct. So let's try to clear the confusion. There are a bunch of articles and a bunch of Reddit threads that we found that we're going to have in the show notes, so people can dive more into it.

But effectively, the change adds a field to init containers, the restartPolicy field. If it's set to Always, then Kubernetes will treat that init container as a sidecar, although it's technically an init container. So that's the change, right? As a reminder, I think we have to go a little bit back.

The problem with sidecars in Kubernetes is that there is no native way to define the order in which containers start. So if you have something like a sidecar that does logging or metrics, logging as in logs, not login as in log in, or sending metrics somewhere, or maybe an Envoy sidecar in the case of service mesh, you might end up in a situation where your application will be blocked from accessing the network until the sidecar becomes available.

KASLIN FIELDS: Yeah, that's been a long, ongoing issue.

ABDEL SGHIOUAR: Yes. The other problem also is that if you have a job with a sidecar, then your job might end processing, so it may terminate, but the sidecar will not because there was no native way of understanding, OK, well, when the main container finishes its processing, also kill the sidecar. And the pod gets stuck in some sort of weird state, right?

KASLIN FIELDS: I've never actually looked into how people have been implementing these, but now I'm curious.

ABDEL SGHIOUAR: So there have been a couple of ways. One of them was to add dependencies using health check probes, right? So you make your main container health check the sidecar container. But then this requires you to actually sometimes even implement logic in the code for the termination part, right? So to say, if the health check for the sidecar container fails, or the other way around, make the sidecar fail or terminate when the health check for the main container-- basically you create a dependency between the containers, right, yourself. That's essentially how people solved it, right?

KASLIN FIELDS: You'd have to do it manually, and you'd do it with health checks, right?

ABDEL SGHIOUAR: Correct.

KASLIN FIELDS: That makes sense.

ABDEL SGHIOUAR: And so now what they did is they added this field, which means when you set the field on an init container, the kubelet treats it as both an init container and a sidecar. And the advantage of init containers is that the kubelet already has the logic to start the init containers first, wait for them to be ready, then start the main container of the application, right?
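
For readers following along in text, here is a minimal sketch of what that field looks like in a pod manifest, written as a Python dictionary that mirrors the YAML. The pod name, container names, and images are hypothetical; only the placement of `restartPolicy: Always` on an init container comes from the discussion. In 1.28 this is an alpha feature, so it would also need the `SidecarContainers` feature gate enabled on the cluster.

```python
import json


def build_pod_with_sidecar() -> dict:
    """Return a Pod manifest whose log shipper runs as a native sidecar.

    All names and images are hypothetical, for illustration only.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-sidecar"},
        "spec": {
            "initContainers": [
                {
                    "name": "log-shipper",
                    "image": "example.com/log-shipper:latest",
                    # restartPolicy: Always on an init container is what
                    # marks it as a sidecar: it starts before the main
                    # containers and keeps running alongside them.
                    "restartPolicy": "Always",
                }
            ],
            "containers": [
                {"name": "app", "image": "example.com/app:latest"},
            ],
        },
    }


def sidecar_names(pod: dict) -> list:
    """List the init containers that are marked as sidecars."""
    init_containers = pod["spec"].get("initContainers", [])
    return [c["name"] for c in init_containers if c.get("restartPolicy") == "Always"]


if __name__ == "__main__":
    print(json.dumps(build_pod_with_sidecar(), indent=2))
```

An init container without the field keeps the usual behavior: it runs to completion before the next container starts.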

KASLIN FIELDS: So it makes sense that it's combining the two concepts.

ABDEL SGHIOUAR: From what I understand, reading the Reddit thread, as it was explained by Tim Hockin, it's basically a non-invasive way to try to come to a better solution without basically introducing a whole new schema with sidecars. That's essentially the TLDR, right?

KASLIN FIELDS: Yeah. Bloat of the Kubernetes API is an issue.

ABDEL SGHIOUAR: Exactly. I mean, to a large extent, I understand it, because designing a new type, which would be sidecars, I am for the idea of putting types between specs, which means a new schema, basically, in the pod spec, which would be for the sidecar, would be, I guess, a massive undertaking because you would have to redesign the whole thing from scratch. So I guess it's a better solution than how it used to be before, but it's probably the beginning of something better, bigger. We'll see.

KASLIN FIELDS: Yeah, we'll see how folks use it and how actual users respond to it.

ABDEL SGHIOUAR: There were actually a couple of interesting articles. We're going to add them to the show notes. There is one by Buoyant-- I hope I'm pronouncing that right-- that explains this in more detail. And then there was actually also a link in the Istio documentation that explains how Istio with sidecars, with this new feature, will be better. So then yeah, if you're listening to this, you can just go read those.

KASLIN FIELDS: And of course, the thread with Tim Hockin.

ABDEL SGHIOUAR: Yes, the Reddit thread.

KASLIN FIELDS: I still need to catch up on as well.

ABDEL SGHIOUAR: Yeah, yeah, yeah. Yeah, then moving on. I guess the community-owned package repositories.

KASLIN FIELDS: Yes, that is, of course, a huge thing that we've talked about on the show before.

ABDEL SGHIOUAR: Yes, so that's not to be confused with the registry because the registry is for containers. This is for the Debian and RPM packages for some components that Kubernetes needs. So the community is basically moving from Google-owned repositories to community-owned repositories. That's the tl;dr there.

KASLIN FIELDS: Yeah, kind of an across-the-project thing. Anywhere where they can move it to community-owned things, they're trying to do that.

ABDEL SGHIOUAR: Yeah, it's certainly a theme.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: Overarching theme.

ABDEL SGHIOUAR: Yeah, I think that the package repositories in particular were problematic in the past because releasing stuff on Google-owned repositories required somebody from Google, like a Google employee, to be involved. So now they removed the dependency, which is good.

KASLIN FIELDS: Yeah, and well, we've talked about all of the issues with the registry before.

ABDEL SGHIOUAR: Exactly. I don't think we need to rehash that.

KASLIN FIELDS: All sorts of issues for the project with that, so make sure that you're using the new one.

ABDEL SGHIOUAR: Yes. Actually the guide is quite-- the page that described this change is quite extensive. It shows how you can migrate already, how you can fall back, if you need to. And the Google repositories will remain in place for the foreseeable future. They're not going to disappear any time soon.
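
As a rough illustration of the migration, the sketch below builds the apt source line for the community-owned repository at pkgs.k8s.io, which is split per minor version. The helper name and the keyring path are assumptions for this example; the migration page linked in the show notes is the authority on the exact steps.

```python
def apt_source_line(
    minor_version: str,
    keyring: str = "/etc/apt/keyrings/kubernetes-apt-keyring.gpg",
) -> str:
    """Build a Debian apt source line for the community-owned repository.

    minor_version is a string such as "1.28". The keyring path is an
    assumed location; use wherever you stored the repository signing key.
    """
    major, minor = minor_version.split(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"expected MAJOR.MINOR, got {minor_version!r}")
    # Each Kubernetes minor version has its own repository path.
    url = f"https://pkgs.k8s.io/core:/stable:/v{major}.{minor}/deb/"
    return f"deb [signed-by={keyring}] {url} /"


if __name__ == "__main__":
    print(apt_source_line("1.28"))
```

Because the repository path includes the minor version, moving to a newer minor means updating this line as well.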

KASLIN FIELDS: Yeah, but definitely check out the documentation if you want all of the juicy details on why those changes are happening and how it's going to help the community to move those. I was also really excited about the skew change because I've been confused by that basically my whole life in Kubernetes of the two minor version thing. And now it's three minor versions, so that's good. But we were talking about it before the show, actually. And you looked up the documentation.

And I had always been told that for some reason for the skew, the minor version is not what we normally think of as the minor version. Because normally, major version is 1. Minor version is 28, in this case, 1.28. And then the patch version is going to be the dot 1. But for some reason, I've always been told that for the skew version, the patch version is what they mean by minor version, which does not make any logical sense now that we think about it.

ABDEL SGHIOUAR: Yeah, we looked up the documentation. It's actually the minor, right?

KASLIN FIELDS: Yeah, we looked up the documentation. And it does, indeed, say pretty clearly that it is like 28. And so it's 28 to 25, thereabouts.

ABDEL SGHIOUAR: Exactly. So they're expanding support from n minus 2 to n minus 3, essentially.
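
To make the n minus 3 rule concrete, here is a small sketch that compares the minor versions of the control plane and a node. The function names are made up for this example, and it only models the control plane to node relationship discussed here, not every component in the official skew policy.

```python
def parse_minor(version: str) -> int:
    """Extract the minor number from a version such as 'v1.28.1' or '1.25'."""
    parts = version.lstrip("v").split(".")
    return int(parts[1])


def node_skew_ok(control_plane: str, node: str, max_skew: int = 3) -> bool:
    """Return True if the node is within the allowed minor-version skew.

    Node components may be older than the control plane by up to
    max_skew minor versions, but never newer. Patch versions are ignored.
    """
    skew = parse_minor(control_plane) - parse_minor(node)
    return 0 <= skew <= max_skew


if __name__ == "__main__":
    for node_version in ("1.28.0", "1.25.3", "1.24.9"):
        print(node_version, node_skew_ok("1.28.1", node_version))
```

Passing `max_skew=2` reproduces the older n minus 2 rule, under which a 1.25 node with a 1.28 control plane would be out of policy.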

KASLIN FIELDS: Yes.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: So excuse my confusion.

ABDEL SGHIOUAR: Yeah, I'm wondering, or probably speculating, I think what's coming would be the CLI as well because the Kubernetes CLI supports the same skew policy, right?

KASLIN FIELDS: I hadn't thought about separating those two out. Yeah, I kind of figured they would just go along together.

ABDEL SGHIOUAR: Yeah, it's all about the client. I was wondering the same when I was hearing the interview. Can you use kubectl 1.24 with 1.28, which are four versions away from each other, right?

KASLIN FIELDS: Yeah, that is a good question.

ABDEL SGHIOUAR: We'll see. I'm suspecting they will extend this policy also to the CLI because it doesn't make any sense that it's only between the control plane nodes and the node components, right?

KASLIN FIELDS: Yeah, I mean, I guess the kubectl is a separate installed component, so they will have to tell us explicitly if that is changing. We'll have to keep an eye out for that.

ABDEL SGHIOUAR: We'll probably hear from the CLI SIG.

KASLIN FIELDS: Yeah, I haven't heard anything on that, but we'll keep an eye out.

ABDEL SGHIOUAR: Yeah. Yeah, and there was a bunch of interesting things. I think, for me, the non-graceful node shutdown, that's also another confusing one.

KASLIN FIELDS: Yes. We were a little bit confused about it in the interview, and we've looked it up a little bit, right? Well, you have.

ABDEL SGHIOUAR: Yeah, I mean, I guess I did some research. It's confusing in its wording. But basically what it is, essentially it's a new capability or a new flag that allows you to taint a node with a specific taint, which would force the node or force the scheduler, in this case, to evacuate the pods or reschedule them on a new node and reattach the volumes when a node falls in a non-graceful shutdown state, which is basically the node just disappeared, and we don't know why, right?

KASLIN FIELDS: So after the node shuts down non-gracefully, you would set this taint, and then it would move everything that was on that node?

ABDEL SGHIOUAR: Correct. So you would set the taint. Yeah. So if the feature is on, because you have to have the feature on in the API server, right?

KASLIN FIELDS: OK, yeah.

ABDEL SGHIOUAR: Then you would go and taint. My understanding of why they implemented this way, why it's not automatic, right, it's because as an admin, you want to check that your workloads are not impacted, right? You don't want the cluster to just suddenly move stuff around because the node disappeared.

KASLIN FIELDS: Yeah, and there's also the existing timeouts of retrying things Kubernetes has to deal with. So you'd want to do this manually, probably.

ABDEL SGHIOUAR: Exactly. So it's a capability that you can enable manually to basically force the pod to be rescheduled somewhere else and then force the volumes to be reattached. And this is, of course, only-- I think it's only useful or important in the context of stateful workloads, because for stateless workloads, you don't care.
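
A minimal sketch of that manual step, modeled on a node manifest held as a Python dictionary. The taint key `node.kubernetes.io/out-of-service` with the `NoExecute` effect is the one the upstream documentation describes for this feature; the `nodeshutdown` value follows the documentation's example, and the helper and node name are hypothetical. In practice an admin would apply the taint with kubectl or a client library after confirming the node is really down.

```python
import copy

# Taint described in the upstream docs for non-graceful node shutdown.
OUT_OF_SERVICE_TAINT = {
    "key": "node.kubernetes.io/out-of-service",
    "value": "nodeshutdown",
    "effect": "NoExecute",
}


def mark_out_of_service(node: dict) -> dict:
    """Return a copy of a node manifest with the out-of-service taint added.

    The input is left untouched, and the taint is only added once.
    """
    updated = copy.deepcopy(node)
    taints = updated.setdefault("spec", {}).setdefault("taints", [])
    if not any(t.get("key") == OUT_OF_SERVICE_TAINT["key"] for t in taints):
        taints.append(dict(OUT_OF_SERVICE_TAINT))
    return updated


if __name__ == "__main__":
    node = {"apiVersion": "v1", "kind": "Node", "metadata": {"name": "worker-1"}}
    print(mark_out_of_service(node)["spec"]["taints"])
```

Once the node is recovered, the taint has to be removed by hand as well, which fits the manual, admin-driven design described here.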

KASLIN FIELDS: Yeah. Very interested in the various things going on in the project right now to try to make stateful workloads smoother.

ABDEL SGHIOUAR: Yes, more native.

KASLIN FIELDS: Kubernetes. Yeah.

ABDEL SGHIOUAR: Yeah, it's going in that direction.

KASLIN FIELDS: Yeah, the way that I like to think about stateful workloads on Kubernetes is really about dependencies. A lot of folks think about it as a storage thing. But I think really a stateful workload is just one that has a bunch of dependencies outside of Kubernetes most likely.

So Kubernetes has some weirdness with making sure that your workloads can work with all of those other things outside of Kubernetes and making sure that all of that goes smoothly, even on failure states. So I'm excited about the changes that are going on to the project to help those types of workloads because a lot of that was not known when Kubernetes was created.

ABDEL SGHIOUAR: Yeah, yeah, I mean, it's all--

KASLIN FIELDS: They kind of focused on a simpler case with stateless.

ABDEL SGHIOUAR: I mean, it's kind of implemented in a fail-safe mechanism, which is give people the capability to do these migrations without doing it in an automated way. So you don't end up with a situation where your workloads moved, and you don't know why. Now you know that they're going to be in a state of non-graceful shutdown until you forcefully, manually move them.

KASLIN FIELDS: Yeah, which is very helpful for certain types of workloads, especially when they're connected to other things outside of the cluster. And it doesn't know where they went.

ABDEL SGHIOUAR: Exactly.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: Yeah, and a bunch of other things.

KASLIN FIELDS: There's so much.

ABDEL SGHIOUAR: There's a lot of interesting stuff. I think we can talk quickly about the pod replacement policy. Maybe that one is like-- it's kind of confusing. So I think the overarching theme of this release is confusing because you have to really read stuff over and over again to understand what's going on.

KASLIN FIELDS: Yeah, also the first iteration of docs doesn't always get it from the perspective of someone who needs to use it, I would say, because the people who build something are not necessarily the people who use it.

ABDEL SGHIOUAR: True. So actually I think this is probably going to be a recommendation for myself because when I was reading the Reddit thread about the sidecar stuff, somebody who is an author of that page went onto Reddit and said I'm one of the authors. Please let me know what's going on. And then people were giving them feedback. And they were updating the documentation as they were receiving feedback on Reddit. So maybe going forward, just post stuff on Reddit, and let people tell you if they understand what you meant to explain to them.

KASLIN FIELDS: Maybe that should be part of the release process.

ABDEL SGHIOUAR: Yes, post stuff on Reddit.

KASLIN FIELDS: Post this new feature on Reddit to get feedback.

ABDEL SGHIOUAR: Yeah, because I think that what a lot of people probably don't realize is you write stuff in a certain language, which you understand if you are a native speaker, English speaker. But then it doesn't resonate the same way to somebody who doesn't speak English natively, right?

KASLIN FIELDS: Another fun fact about the community. There actually used to be a-- I think it was called SIG Usability. No, SIG Usability still exists. I think it is SIG Usability. But anyway, there was a group within Kubernetes that was essentially user research. I went to a session at a KubeCon where this SIG showed off some of their findings where they've done user studies.

So you would think that a group like that would do these kinds of things for each release. That SIG actually, or working group or whatever it was, doesn't actually exist anymore. So it's kind of hard to get the user perspective in on these things. So if you have feedback to share, share it on Slack, share it on Reddit threads you see. The folks building this stuff need it.

ABDEL SGHIOUAR: Yeah, tell people how they could fix and improve the documentation. I think it's important.

KASLIN FIELDS: Yeah, and the folks who work on this, of course, care about it a lot.

ABDEL SGHIOUAR: Of course. Of course. Yeah, I mean, it was even a shout out from Grace on the interview. I think it was the first time we interviewed a release lead that was, hey, please tell us on Slack what you think. Give us feedback. Yeah, that was pod replacement policy. I think pretty straightforward. It's basically telling the kubelet how fast do you want it to bring a new pod when an existing pod dies.

Because you might have situations where pod names overlap or where you want the pod to be terminated, AKA not in READY state. You want it to disappear before the new pod comes to replace it, essentially. This is an alpha feature, by the way. Just FYI.
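
For reference, a sketch of where this setting lives: it is the `podReplacementPolicy` field on a Job spec, shown here as a Python dictionary mirroring the YAML. The Job name and image are hypothetical. With `Failed`, a replacement pod is only created once the old pod has fully terminated; `TerminatingOrFailed` allows the replacement as soon as the old pod starts terminating. As an alpha feature in 1.28, it would need the `JobPodReplacementPolicy` feature gate enabled.

```python
ALLOWED_POLICIES = ("TerminatingOrFailed", "Failed")


def build_job(replacement_policy: str = "Failed") -> dict:
    """Return a Job manifest with podReplacementPolicy set.

    The Job name and image are hypothetical, for illustration only.
    """
    if replacement_policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown podReplacementPolicy: {replacement_policy!r}")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "example-job"},
        "spec": {
            # "Failed": wait until the old pod is fully gone before
            # creating its replacement.
            "podReplacementPolicy": replacement_policy,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "worker", "image": "example.com/worker:latest"},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(build_job()["spec"]["podReplacementPolicy"])
```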

KASLIN FIELDS: Yeah. And also jobs is so relevant right now.

ABDEL SGHIOUAR: Yes. Yeah, it's for jobs. Yeah, correct.

KASLIN FIELDS: Yeah. Jobs is a style of workload that's very important for AI/ML, like I said in the episode. Being able to do batch processing and AI/ML workloads are a lot of work to spread across machines, so jobs come in handy.

ABDEL SGHIOUAR: Yeah. Yeah, I mean, I just realized for such a boring release, we spent 17 minutes discussing this. There's a lot of stuff going on.

KASLIN FIELDS: Thanks for sticking with us.

ABDEL SGHIOUAR: Yeah, I hope this was helpful to explain some of these changes that are coming. Yeah, I think I would like for us to find some time to go back and do Twitter Spaces, and then we can discuss this in more detail with people because I think having other people's perspective on these changes is important.

KASLIN FIELDS: And I was going to say if you have opinions on these changes, if you want to give some feedback about 1.28 or about the podcast, please let us know. You know where to find us. We say it at the end of every episode. We would love to hear from you too.

ABDEL SGHIOUAR: Yes. And also, if you want to get involved, at the time of reading this, I think the team will be assembling 1.21. So there are opportunities to shadow or to help or to be involved.

KASLIN FIELDS: 1.29.

ABDEL SGHIOUAR: 1.29. Sorry. Thank you.

KASLIN FIELDS: Yes.

ABDEL SGHIOUAR: Yeah, numbers.

KASLIN FIELDS: There will be shadow opportunities. Keep an eye out for that.

ABDEL SGHIOUAR: Yeah, so keep an eye out. Look at the links in the show notes. Read the docs. And yeah, that's it. Thank you very much for the time, Kaslin.

KASLIN FIELDS: Yeah, thank you, Abdel.

[MUSIC PLAYING]

That brings us to the end of another episode. If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media @kubernetespod or reach us by email at <kubernetespodcast@google.com>. You can also check out the website at kubernetespodcast.com, where you'll find transcripts and show notes and links to subscribe. Please consider rating us on your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]