#131 December 8, 2020
The final — and raddest — Kubernetes release of 2020 is 1.20. This week, Craig and Adam talk to its release team lead, Jeremy Rickard from VMware. Jeremy talks about migrating to newer Kubernetes versions, sooner or later; what was added, what was deprecated, and what that really means; and what happens when you Google your own name.
Do you have something cool to share? Some questions? Let us know:
ADAM GLICK: Hi, and welcome to the "Kubernetes Podcast from Google." I'm Adam Glick.
CRAIG BOX: And I'm Craig Box.
CRAIG BOX: Welcome back, my friend. How was your hiatus?
ADAM GLICK: Oh, it was lovely. Got to spend time with a small group of family, given that most of us are quarantining. So we couldn't really get the full family together. But it was nice to unplug and also spend a little time doing some of the things I haven't had time to do, like play a couple of games and read a book-- so not bad. How about you?
CRAIG BOX: I've been under lockdown for most of mine. So I'm well overdue a haircut at this point.
ADAM GLICK: Comes through on the podcast.
CRAIG BOX: It does. Tell me about the book.
ADAM GLICK: I read "Ready Player Two." For those that have read "Ready Player One," it's--
CRAIG BOX: --a sequel?
ADAM GLICK: Yes, indeed. It's more of the same. If you love his formula and his writing, you'll probably enjoy it. I enjoyed reading through it. It's a little more focused on pop culture, music, and movies as opposed to video games, as the first one was-- but a really interesting view. And for some of those things that I have some knowledge and experience with, I was pretty impressed at the depth of the research that he put into the things that happened in it without giving anything away. It's a fun read. And if you grew up in the '80s and '90s or are familiar with the pop culture of that time, it will be comfort food in written form.
CRAIG BOX: If you read the first book, is it possible that you could be giving anything away?
ADAM GLICK: Well, you can always give away the ending. He's not much of a mystery writer. It's more of a fun kind of taking you down the path.
CRAIG BOX: I've heard it's more a rehash of the first book.
ADAM GLICK: I'd say the plots are different. I mean, you've got a different enemy in this one. You've got a different set of challenges to overcome, a different set of pop culture references to work through. If you like the formula of the first one, the second one doesn't deviate much from it.
CRAIG BOX: It sure does sound like the same formula.
ADAM GLICK: It is.
CRAIG BOX: And that's perfect.
ADAM GLICK: [LAUGHING] If you liked the first one, check out the second. If you didn't, probably a pass.
CRAIG BOX: Well, do check it out. But in the meantime, shall we get to the news?
ADAM GLICK: Let's get to the news.
CRAIG BOX: Kubernetes 1.20, the final release for 2020, is out. Dubbed "the raddest release", a lot of feature work has landed in 1.20. There are 16 enhancements entering alpha, with 15 graduating to beta, and 11 to stable. Highlights include volume snapshot operations going GA and kubectl debug moving to beta. Major revisions also landed for IPv6 dual-stack support and cron jobs in preparation for their moving to new stages in the upcoming release.
ADAM GLICK: With the news of Dockershim support in the kubelet being deprecated in Kubernetes 1.20, a bit of concern ran through the community last week. You'll hear more about it in the interview later in the show. But for anyone who needs continuing support, Mirantis and Docker have agreed to partner to maintain the shim code standalone outside of Kubernetes. This means that Mirantis will provide a commercially supported Docker engine version and that it will be CRI-compliant.
The two companies stated they plan to offer a product that will work just like before, just with the Dockershim being external to the Kubernetes open source release. They also stated that they are targeting this new shim to pass all conformance tests so people can continue to use their existing workflows and tools. The shim will ship with Docker desktop and in Mirantis Kubernetes Engine.
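If you're not sure whether a cluster is affected, the kubelet reports its runtime in each node's status; a quick check might look like this (a sketch, assuming kubectl access to a live cluster):

```shell
# The CONTAINER-RUNTIME column shows docker:// or containerd:// per node
kubectl get nodes -o wide

# Or read the runtime version field straight out of node status
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```

Nodes reporting a docker:// runtime are the ones that will eventually need either the standalone shim or a switch to a CRI runtime such as containerd.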
CRAIG BOX: etcd, the storage engine underpinning Kubernetes, has graduated to a top-level CNCF project. The move comes after two years in incubation and reflects the project's health across maintainership, infrastructure, security, and governance. Learn more about the history of etcd in our interview with its co-creator, Xiang Li, in episode 95.
ADAM GLICK: Pushkar Joglekar of CNCF SIG Security has announced the release of a cloud native security white paper. The paper talks through how to think about cloud native security broadly, and not just with Kubernetes. It's built around a seven-step runtime acronym for recommended practices, with the R standing, conveniently, for Read the white paper. More details, including what the other six letters stand for, can be found in the associated blog post.
CRAIG BOX: Version 1.8 of the Istio service mesh is out, bringing improvements to installation and upgrade, especially in the areas of multicluster and VM-based workloads. A new DNS proxy mode simplifies name resolution for services outside the Kubernetes environment but inside the mesh. And new tooling makes it easier to debug and file bug reports from the command line.
ADAM GLICK: Kuma, another Envoy-based service mesh, is now generally available, with its maintainers at Kong announcing that they have released version 1.0. Meanwhile, Linkerd service mesh posted a blog answering the frequently asked question of why they chose not to use Envoy as their proxy.
CRAIG BOX: The last of the big provider conferences for the year, AWS re:Invent, was bin-packed with container news. In 2021, Amazon's Elastic Container and Kubernetes Services will be able to run in your data center in the form of ECS and EKS Anywhere. Powering the latter is the EKS Distro, a public Kubernetes distro pegged at the same component versions used by EKS and now open source.
The cloud version of EKS gained add-ons, a new console, and spot instance support. Serverless platform Lambda gained the ability to use container images. Proton, a new templating system for container deployment, was announced. And the public container registry pre-announced in November is now available.
ADAM GLICK: If you want your Kubernetes anywhere right now, Google Cloud has released Anthos on bare metal to general availability. With this version, you can bring your own OS running on select versions of Red Hat Enterprise Linux, CentOS, or Ubuntu. Deployments can be done in standalone configurations for scenarios like edge computing with as few as two nodes required, as well as in multi-cluster configurations, offering centralized control and automation.
CRAIG BOX: Consolidation continues in the cloud native space with IBM announcing that it is buying Instana, an application performance management and observability start-up. IBM plans to make the Instana product available on multiple clouds and on premises, and it even calls out support for IBM Z-series mainframes. The purchase price wasn't disclosed.
ADAM GLICK: Opstrace has launched early access to its open source observability platform. The platform looks to provide the power of Prometheus and Grafana with the ease of use of a hosted service like Datadog. The project exposes a Prometheus API that's backed by the Cortex project. You can point your existing Prometheus, Fluentd, or Promtail instances at Opstrace to get going. A quick-start guide is also provided in the launch announcement.
CRAIG BOX: Weaveworks is now bringing its signature GitOps model to any Kubernetes cluster with the release of Weave Kubernetes Platform 2.4. The new release lets you manage existing clusters as well as clusters provisioned by their tooling and adds multi-tenancy and application portability with team workspaces. Both features, of course, are powered by Git.
ADAM GLICK: Spectro Cloud, which launched in March of this year, has announced their Kubernetes as a service is now available in private clouds on premises and on bare metal. The new release is meant to bring their managed services to customers wherever they need to run.
CRAIG BOX: Now that Kubernetes 1.20 is out, you might notice that its API reference docs are substantially easier to read. That's down to Philippe Martin, who just completed a project to improve them in Google Season of Docs. Resources are now categorized under headings such as workloads and services, reflecting how they're actually used by operators. You can read all about how this work was done in the post to the Kubernetes blog.
ADAM GLICK: The CNCF is running its latest cloud native survey in Mandarin to help get a broader set of feedback and to continue to gain insights into the growth of cloud native technologies in China and beyond. The survey is now live. And you can find a link to the survey in the show notes.
CRAIG BOX: Finally, way back in episode 32, we talked to David Anderson, the author of MetalLB and then an SRE at Google Cloud. Now at Tailscale, Dave has used his experience building for, managing fleets of, and operating services on Kubernetes to write an interesting treatise on what he would do if he could start it all over again. Pods would become mutable. All resources would be version-controlled. VMs would become first-class citizens. And the system would be secure by default. A boy can dream. Unfortunately, he also wants it to be IPv6 only, which will probably ensure that this future never actually comes to pass.
ADAM GLICK: And that's the news.
CRAIG BOX: Jeremy Rickard is a staff software engineer at VMware working on an internal Kubernetes platform. Before that, he worked at Microsoft, where he contributed to Service Catalog and Virtual Kubelet, and helped maintain the CNAB spec and the CNAB tool Porter. Welcome back to the show, Jeremy.
JEREMY RICKARD: Thank you.
ADAM GLICK: You're only the second guest who's come on this podcast twice. And when you were last on the show with Ralph Squillace back in July of 2019, we were talking about the CNAB pieces. Since then, you've moved both projects and companies. But through it all, you've stayed part of the Kubernetes community. Can you share what it's like, as a community contributor, to make such a big change and how it's affected the contributions to the projects you were working on?
JEREMY RICKARD: Sure. There's an old Kelsey Hightower tweet where he said, "Same team, different companies." For me, that's been really true for my Kubernetes contributions. When I made the decision to come back to VMware, I specifically made sure that doing open source upstream contributions would still be OK. And VMware is very supportive of that, both in terms of doing things that are related to your day to day job, but also just any kind of personal contributions that you want to do that align with things that you want to do for personal growth.
It's been really good for me along the Kubernetes lines. Microsoft, with CNAB, specifically Porter, the tool that I was working on before I left, is also pretty open to community contributions. I've maintained my maintainer status on Porter. I've helped do reviews and some of those things since leaving. VMware's, again, been supportive of those things. It's been pretty good, I think.
CRAIG BOX: Did you move location as part of the move between companies?
JEREMY RICKARD: I didn't. I've been in the same spot for quite a while. Before going to Microsoft, I was at VMware. So this is my boomerang opportunity. I actually worked in my basement for VMware. And then I went to work for Microsoft remotely and stayed in my basement. And now I've come back. And I'm pretty much in the same spot.
CRAIG BOX: It's been a wee while since we've seen you, Jeremy. And I have to say that you're a lot beardier than you were last time. Is that a lockdown thing? Or is that a Colorado mountain man thing?
JEREMY RICKARD: It's both. I think it comes and goes. I'll get to the point where it's too long, and I just can't take it anymore. And I shave it off. And then it comes back. And I grow it back because I realized that I really don't like my face totally clean-shaven. I've kind of vacillated back and forth between having a long beard and a short beard, long beard, short beard.
ADAM GLICK: What's the appropriate length? Where, between 5 o'clock shadow and sea captain, would you think is appropriate?
CRAIG BOX: ZZ Top.
JEREMY RICKARD: Yeah, probably not ZZ Top. Maybe a good inch or two is proper, for me at least. Beyond that, it gets like, how do I part it? How do I style it? There's too much upkeep for it.
CRAIG BOX: Well in the more clean-shaven times back in episode 61 when you were working on CNAB-- first of all, how's Ralph?
JEREMY RICKARD: Ralph's great. One of my biggest regrets of leaving is not having that day to day interaction with Ralph on Slack or Teams. Ralph's a great person. If anybody hasn't followed him on Twitter, he's a great person to follow for humorous bits, funny and wise takes on tech, and good wine recommendations.
CRAIG BOX: You mentioned you're still a maintainer on that project.
JEREMY RICKARD: Mm-hmm.
CRAIG BOX: What did the hand-off process look like? Did you have someone else come on from Microsoft to take on day to day responsibility when you left?
JEREMY RICKARD: Yeah, a little bit of that. We had another person joining the team. And he was very focused on doing Porter things that were specific to Azure. Porter is a generic CNAB building tool. It's pretty opinionated. And it has this ability for you to provide mix-ins, or little pluggable pieces of functionality. And his original scope for coming onto the team was to build some of these things. He just kind of picked up some of my additional maintainer responsibilities.
Also, some people in the community have come by. Porter's grown from being a thing that Carolyn Van Slyck and I worked on together as an initial piece to having a much broader community-based contribution set.
ADAM GLICK: I'm sure you're not the first person who has been an open source contributor and changed roles. But can you share any insights and tips on how that transition works and best ways to make sure that projects move forward even if what your contribution or your part of that contribution shifts?
JEREMY RICKARD: It's not unlike changing jobs in general. When you're leaving and making the change like that, having some sort of transition in mind and a plan to make sure that it's successful is pretty important. As I was making the change from Microsoft to VMware, identifying things that were outstanding that I needed to finish and get those things over the line was pretty important. Documenting anything, opening PRs to the repo or issues on the repo to capture that stuff, assuming that I was going to go away forever, was pretty important. And I think that's a good strategy for people to pick up when they're making that kind of change.
CRAIG BOX: So it was almost a year ago that you made the move to VMware.
JEREMY RICKARD: Yeah, almost a year ago.
CRAIG BOX: We said up at the front, you're working on an internal Kubernetes platform. In that time, VMware has made a lot of noise around their external platform, Tanzu.
JEREMY RICKARD: Mm-hmm.
CRAIG BOX: What's the difference between the platform you work on and what the public see?
JEREMY RICKARD: The platform I work on is really suited and targeted towards the VMware SaaS team. VMware is going through this transition from shipping mostly on-premises software to doing more subscription SaaS-based things. The big one, where the genesis of this project came from, was supporting the VMware Cloud on AWS project, running your VMware stack on AWS infrastructure, part of the VMware Cloud. Ideally, this thing could eventually be on Google infrastructure, or on Azure infrastructure.
As part of that, we started identifying-- originally, this team was really chartered with developing a lot of shared services-- so things like billing, identity management. We needed to figure out where to run those things. So the team that I'm on now is actually the team I was on before. We originally decided to run Mesos and ran Marathon as our container scheduler.
CRAIG BOX: Sensible choices, once upon a time.
JEREMY RICKARD: Yeah, this was 2016. Kubernetes hadn't really fully emerged. The writing was on the wall. The container wars were still going on. We started making the transition to Kubernetes. And that's what really got me interested in upstream things. I think the community around it was pretty vibrant, and really interesting, and welcoming. I was really interested in Service Catalog. We were thinking about using it internally at VMware for the offering we were building out.
The platform we have now-- your question earlier was, how does it relate to Tanzu? Because this thing predates the Heptio acquisition and a lot of those things, and it's been in production for about two years now, it's based off of kops deploying into AWS. Because our partnership was with-- the VMware cloud thing was with AWS. So the logical choice was to run our infrastructure in AWS. Kops was a really great tool at the time for building a lot of that out.
We've deployed those things. And it's been in production for about two years in a variety of environments. We're deployed in a commercial environment, the AWS GovCloud to support government customers. It's got a lot of restrictions and things we've built on top of kops along the years in terms of policy and things like that.
Looking forward, we have a desire to be able to run in multi-cloud environments. And kops isn't super great for that. We are looking at using the Tanzu products, specifically TKG and a lot of the cluster API stuff, to move our deployment methodologies to that, probably not like lifting and shifting all of our existing deployments and customers to that, but looking forward, dogfooding our own stuff and using that to free us up a little bit from the lock-in we have to AWS with kops.
CRAIG BOX: If you were a customer who was running an early Kubernetes installation, perhaps from a vendor who's moved on or no longer maintains it, how would you recommend they go about generalizing a move between an old Kubernetes distribution and a supported more modern one?
JEREMY RICKARD: We actually have similar problems. We're going through a process right now to do PCI accreditation. So we basically had to rebuild a lot of the infrastructure for that, and put in place different security controls in AWS, and flip the deployment topology we had and the CI/CD systems. And migrating people from that cluster to another cluster was pretty difficult.
A lot of people don't treat Kubernetes like cattle. They treat it like pets. That's definitely true for a lot of our tenants. One of the interesting things coming back, and working on this, and getting a lot of interaction with customers is that it's really apparent that a lot of developers and a lot of people that are building services, they're building great business value, but they don't really understand Kubernetes. And they don't really want to understand Kubernetes. There's a whole spectrum, I think.
There are people, for instance-- we just upgraded to Kubernetes 1.16. We are not in support for our version right now, getting there. But doing the 1.16 upgrade is pretty impactful for people. Going from deprecated Deployment API versions to supported ones, even though they've been around for a while, was still challenging for people. They didn't really understand, why do I need to do this? What changes do I need to make to do this? When should I do this? Also, they're using Helm 2, which is now deprecated.
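To make that concrete: the 1.16 upgrade Jeremy mentions removed the old beta Deployment endpoints, so manifests had to move from extensions/v1beta1 to apps/v1, which also makes spec.selector required. A minimal sketch of the change, with hypothetical names:

```yaml
# Before: accepted until 1.16, then rejected by the API server
# apiVersion: extensions/v1beta1
# kind: Deployment

# After: the supported API group; apps/v1 also requires spec.selector
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app        # hypothetical name
spec:
  replicas: 2
  selector:                # required in apps/v1
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: example/app:1.0   # hypothetical image
```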
CRAIG BOX: Should talk to your friends at Microsoft about that.
JEREMY RICKARD: Using quite old versions of Helm at that, and working them through that process-- if you're going to make that transition, I think, realizing that doing a cutover is probably a good idea, getting into a flow where you can do that move more seamlessly, and thinking about all of the pain points, the things that lock you into that cluster, and figuring out how to work around them and abstract them out of your process. Make your deployments a little less sticky is really the advice I would give for that. Keep up on updates. And make sure you're moving as fast as you can to pick up these new versions just to reduce some of the pain that you'll see down the road.
CRAIG BOX: And so that migration is something you're actively looking into at the moment?
JEREMY RICKARD: Right now we're going through some proof of concepting efforts to figure out what it looks like for us to adapt our CI/CD pipelines to use TKG, Tanzu Kubernetes Grid. There are a lot of changes that go along with that. It changes a bit of how we deploy things now. So we've got a lot of processes, and run books, and accreditations too built around that existing system. So there's a whole lot of things that we have to change. Our end goal is to move to using that instead of kops as our main cluster deployment technology.
ADAM GLICK: Is there any interesting pull that you feel as you're working internally on 1.16 and, in the open source community, working on the release of 1.20?
JEREMY RICKARD: The most difficult part of it is seeing new features come in that are really interesting that I want to use that are alpha that we're not going to be able to pick up for a while until we get to later versions of things. And I see that across releases. Having been on the release team since 1.17, I see things like the GA of Ingress coming along. It'd be really cool to use some of those new things that are defined in the Ingress spec, to be able to use kubectl debug in our clusters. And we can't do that, because we're running those older ones.
CRAIG BOX: Well, congratulations on the release of Kubernetes 1.20, which has just come out today.
JEREMY RICKARD: Thanks.
CRAIG BOX: You've had various roles in release teams, as you say, since 1.17. What's the sequence of roles that you had to get to being release team lead today?
JEREMY RICKARD: So I've actually been on the release team as far back as 1.12. I shadowed bug triage in 1.12, which was way, way long ago. And then just due to various changes of work and other things, I didn't reapply to shadow any roles until 1.16. I applied to shadow enhancements in 1.16. Didn't get picked. It's pretty competitive.
Questions I've gotten from people is, how do I get onto the release team? A lot of people apply for a very small number of roles comparatively. I applied for enhancements in 1.16. I did the same thing in 1.17. So I came back to the 1.17 release as an enhancement shadow.
In general, the release team is made up of the lead and then sub-teams-- so like enhancements, or bug triage, CI signal. Each one of those is focused on a very specific function or phase of the release. So enhancements is front-loaded. They're tracking all the things that are going to land in the release. CI signal is checking Testgrid to make sure that the jobs are healthy and that we look like we're in a good spot.
In addition to those leads, you have shadows. And the shadows are kind of like apprentices or people that are learning the role, A, kind of figuring out if it's for them, or B, do I want to do this again in the future or move on to something else, and really learning the process so that they can maybe step up to take on that subsequent leadership role. I did the shadow role with Bob Killen in 1.17. And then Bob tapped me to be the enhancements lead for 1.18.
I got to step up that responsibility, make sure we were executing 1.18 correctly, and identifying all of the things that were going to go in, tracking them, making sure that we had docs and stuff aligned with that. And then I was kind of at a crossroads at the end of 1.18 figuring out what I wanted to do. In SIG Release, there's another role called the release manager associate, which is kind of like a shadow to the release managers, the people that are doing the actual mechanics of cutting the branches, and publishing the binaries, and whatnot.
There's a lot of code and tooling that's been built for that, and processes. And it's kind of an important but not super flashy job. But it's very critical to the success of the project. I expressed interest in that. But I also got pinged by Taylor, who was going to be the 1.19 lead. And he said, hey, you did a great job in enhancements. Do you want to come and shadow me for release lead?
I thought about it for a little while. And I just thought that that would be a really cool opportunity. You get to see a lot more things. Having done bug triage, I saw that world. And having done enhancements twice, I saw that world pretty well. Those gave me a little bit of a view into the other things, but I didn't get a lot of deep dive into CI signal-- figuring out how you triage a failure or a flaky test grid-- or how comms worked, or any of that sort of stuff. So I thought that it would be an interesting job.
Bob Killen had done release lead shadow in 1.18. And he was going to come back for 1.19. So I assumed that Bob would step up and be the 1.20 lead. And maybe I would come back and shadow again, or maybe I'd move on to something else. But in the end, Bob wanted to do some other things. So I stepped up to be the 1.20 lead.
It's been a really interesting journey. I think in a meeting, a release team meeting, I think, Rob Kielty, who's the CI signal lead this time around, made the comment that the release team's like a Montessori school of Kubernetes. Because you get to see a lot of different things, kind of experiment. Like, am I interested in this CI signal stuff? Do I want to go move into SIG testing later? Am I interested in any specific area or any specific SIG, like seeing the enhancements come through? Do I want to go help out SIG node later on?
CRAIG BOX: Is there finger painting?
JEREMY RICKARD: There's a lot of fingerpainting. There's also Excel spreadsheets to work with and all kinds of great, fun things that you would want Montessori kids to do. I think it's a pretty apt description. You get exposure to a lot of different things. You get exposure to a lot of different people, get exposure to a lot of different processes, and see where things are maybe not the best, where you can make improvements later on.
Having gone through the enhancements process, I also stepped up to be one of the sub-project owners for enhancements, which is like a sub-project in SIG architecture. And we're really thinking about, how do we make that process better, both from how you're defining KEPs-- so Kubernetes Enhancement Proposals-- if you want to make change to Kubernetes, it goes through the process of the KEP. How do we make those things a little bit more transparent? How do we make the release process a little bit better along the way?
It's been really interesting. Because I think I've gotten to figure out where I'm most interested in helping out going forward. Stepping off of the 1.20 release team now, I have a couple of areas that I'd like to focus on going forward.
ADAM GLICK: Speaking of the 1.20 release, is there a theme for the release?
JEREMY RICKARD: If you look back through the previous releases, 1.18 was a bit quirky. The chilliest release was 1.17. It's one of the fun tasks of the release lead is to come up with the unofficial theme. If I look at the content that's going in right now, starting in 1.19, there was a push towards not having permanent beta anymore. So there's a lot of things that have been in beta forever. Starting in 1.19, there's a clock on those things.
You either need to graduate-- so move to stable-- or undergo some major change. You can't just sit in the state that it's in right now. And we see a lot of stability in this release, stability in terms of alpha things promoting up to beta, or beta things undergoing pretty major changes. Cron jobs is one that's pretty interesting. It's been around for a long time-- 1.4, I think, as scheduled jobs. Then it became cron jobs in beta.
It's still in beta. But there's a new cron job controller coming in with 1.20. And all of that's to really support getting it to stable probably in 1.22. Another one, CRI, that's been in alpha forever. And it's promoting to beta in 1.20. Seeing some of these long-standing things has been really, really cool.
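For anyone who wants to try the rewritten controller on a 1.20 cluster, it ships disabled behind a feature gate, so opting in means flipping it on the controller manager (a sketch; the gate name in the 1.20 release notes is CronJobControllerV2):

```shell
# Opt in to the rewritten cron job controller in 1.20 (off by default at alpha)
kube-controller-manager --feature-gates=CronJobControllerV2=true ...
```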
Another interesting aspect of the release is that we track these enhancements. And that's really what makes up the content of the release. In 1.17, which is the corollary to this release-- end of the year, sure, overlapping with KubeCon and stuff-- there were only 22 enhancements that made it into the release. But in this one, there were 44-- actually 45, because we had one that was in the release but not tracked, going back to those enhancement process improvements.
So it's been really interesting to see how the community has stepped up and contributed a huge amount of content, both in terms of new features-- there are 11, maybe more than that, brand-new alpha features or alpha things that have undergone a pretty major rewrite. Like the dual-stack IPv4/IPv6 stuff has undergone a really major change while staying in alpha. It's a really major change coming in that, eventually, we'll promote up to beta and stable. Things like kubectl debug promoting from alpha to beta-- we just have a ton of stuff.
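As a taste of what that beta promotion enables, kubectl debug can attach an ephemeral container to a running pod, or spin up a debugging pod on a node (a sketch with hypothetical pod and node names, assuming the relevant feature gates are enabled on the cluster):

```shell
# Attach a throwaway busybox container to a running pod -- useful when the
# pod's own image is distroless and has no shell
kubectl debug -it my-pod --image=busybox --target=my-container

# Or debug a node via a privileged pod with the host filesystem at /host
kubectl debug node/my-node -it --image=busybox
```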
And I'm not sure if it's because the 1.19 release was pretty protracted and long. There was a really, really extended code freeze period as we tried to get CI Signal healthy and just deal with the events of the world. But it was nice to see just so much focus on getting stuff ready to go with the early part of the release. One of the challenges with the release is people try to land content.
So it starts with, here's my KEP. I need to get the KEP into a proper state before enhancements freeze comes along. And then I need to get things ready and merge before code freeze comes along. Interestingly, this time around, we had a lot of things that were ready to go, or almost ready to go, at the beginning of the release. And maybe that had something to do with the prolonged release.
The IPv6 dual stack thing definitely had something to do with that. Talking to Lachie and Kal, they knew they were going to redo that functionality. But they didn't want to try to land it at the end of 1.19. When we try to land those major changes at the end of the release, it's really stressful. Because this breaks CI Signal. Is this going to make things unstable or hard to use, even at the alpha level? So they kind of sat on that for the 1.19 release. And then they were ready to say, hey, at the beginning of 1.20, here's this big feature. We're ready to land this.
And I think that was really, really helpful for the release team, but also gave a lot of time for soaking and testing of that stuff going forward. We actually had one enhancement this time that I think will be in that same kind of boat. They were really, really close. And it was enhancement 1769, memory manager. So it's about making NUMA memory available to the pods in an easy-to-use way.
So they got really close to code freeze and were still waiting for an API review. So they proactively asked for an exception. So when we get to code freeze, there's this process of saying, I'm not going to make code freeze. Can I maybe land this three or four days later? So they applied for that exception early, and we granted it.
And they still didn't quite get it over the line. And rather than try to shove that in, give them a couple more extra days, they made the decision that it would be better to land that in 1.21. So they've taken the rest of this time to work through some of the comments that came up during the API review, to work through some more testing, and just get it to a better state overall. I think it'll be one of the first things that lands in 1.21 once we open that stuff back up.
ADAM GLICK: Some of the previous releases have been fit and finish releases really focused on stability and performance with not a ton of new features. And this really seems to buck that trend, being a lot more feature-rich. Do you think that's a function of the no permanent beta change? Or is there a shift that you've seen in terms of the focus on the group? Or is that just a randomness of time of all things just happened to come together for this release?
CRAIG BOX: A Christmas miracle.
JEREMY RICKARD: It's definitely a combination of all of those things. I definitely think that we're seeing more things either updating themselves in beta or making a lot of strides to get to stable. Definitely, the work that's happening with the CRI promotion to beta, the new version that's going on there, I think it's been a long time coming. And I think just the push there is going to make that happen, same thing for cron jobs. We'll probably see more of this in 1.21. Because there are a lot of things moved into deprecation starting in 1.19. And their clock has kind of started.
It'll be interesting to see things like pod security policies, where there hasn't really been a strong replacement identified yet. I don't think there's agreement yet on what a new version of PSP should look like. But it'll be interesting to see, in 1.21 and 1.22, how much effort goes into those. I definitely think we'll see more of that. This is a feature-rich release.
There are a number of new alpha things. If you look at the content that comes out in this release, there are a number of new features. But if you group together the beta and the stable things, there's more beta and stable things than there are new features coming in the release. So still, there's a large focus on getting to that stability point.
CRAIG BOX: One of the changes in 1.20 is deprecation around container runtimes, which caused a little, tiny news cycle last week, where people were of the opinion that Kubernetes 1.20 is deprecating Docker. That sounded like a kind of a big deal. It turns out it's nothing like that, of course. What exactly is happening? And why did that get out there? Why wasn't that communicated in a different way?
JEREMY RICKARD: The summary of what's happening there is that, in the kubelet, we use the Container Runtime Interface to talk to runtimes, things like CRI-O and containerd. There was also a legacy branch that used the Docker daemon, basically, as the runtime. And that was done through something called the Dockershim. So the Dockershim was a shim to make the Docker daemon appear as a Container Runtime Interface implementation, because the daemon doesn't implement those APIs itself.
What's happening in 1.20 is that that's being deprecated. It causes a lot of problems maintenance-wise, because you have to make changes to the actual CRI flow in those implementations. But then you also have to come back into the kubelet and make changes to the Dockershim, because it lives in the kubelet code. Well, what's happening in 1.20 is, alongside that graduation of CRI to beta from alpha, they are deprecating the Dockershim. So eventually, it'll go away.
In 1.20, there's not really any change. You'll start to see some log messages when the node starts up, when kubelet starts up. You'll see a message saying, you shouldn't use the Dockershim anymore. At that point, it's going to get removed at some release in the future, probably not until 1.23, maybe later than that. What it really means is that you should consider an alternate container runtime, something like CRI-O or containerd. There's other options that you can pick from too. Those two are the biggest ones.
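As a rough sketch of what checking for that might look like: the kubelet reports its runtime in each node's `status.nodeInfo.containerRuntimeVersion` field, so an administrator can spot nodes that still depend on the Dockershim by looking for a `docker://` prefix. The node data below is made up for illustration; in practice you would feed in the output of `kubectl get nodes -o json`.

```python
# Hypothetical sketch: flag nodes whose kubelet still reports Docker as
# its runtime, i.e. nodes relying on the deprecated Dockershim.
# The sample JSON mimics the shape of `kubectl get nodes -o json` output.
import json

sample_nodes = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a"},
     "status": {"nodeInfo": {"containerRuntimeVersion": "docker://19.3.13"}}},
    {"metadata": {"name": "node-b"},
     "status": {"nodeInfo": {"containerRuntimeVersion": "containerd://1.4.1"}}}
  ]
}
""")

def nodes_on_dockershim(node_list):
    """Return names of nodes whose kubelet reports a docker:// runtime."""
    return [
        item["metadata"]["name"]
        for item in node_list["items"]
        if item["status"]["nodeInfo"]["containerRuntimeVersion"].startswith("docker://")
    ]

print(nodes_on_dockershim(sample_nodes))
```

Nodes that show up in that list are the ones to migrate to containerd or CRI-O before the Dockershim is eventually removed.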
Other changes you'll see-- like, if you're trying to mount the Docker socket into a pod, that won't be available directly anymore. You'll have to do some different things there. This all came about, the flurry that was on Twitter, and Hacker News, and Reddit, because the release note was written. And the release note said, we're deprecating this piece.
When somebody's writing a feature, or they're working on a pull request that comes into kubernetes/kubernetes, one of the fields in the issue template is the release note. And we use those to automatically build the release notes for the release. When I, as an engineer, am working on a feature, I may have context in my head that makes the note less clear to others. I might write a release note that has some context baked into it.
I actually had this conversation with some of the people on my team. Because they were also taken aback, like, whoa, 1.20's coming. We're using Notary. And that only works with Docker directly. How do we handle this going forward? I think there were some really great threads on Twitter. Kat Cosgrove had a really, really succinct, really great explanation of what that means to you going forward.
There was also a lot of hand waving and people projecting their anger into these places about this coming. Because again, the release note wording itself probably wasn't as clear as it could have been. How do we fix that going forward, I think, is a really important question. There's two pieces specifically for this one.
Going back to something I mentioned earlier, the whole release process comes along initially from enhancements collection. Everything that's going to go into Kubernetes that's making a change goes through the enhancements process. And you write a Kubernetes Enhancement Proposal for that. And there is one for this deprecation. There should also be an issue, Kubernetes enhancements issue, to go along with that KEP so that it can be tracked. And that's really the thing that we use to track things from release to release.
There wasn't one for this. So it kind of fell off of our radar. And we didn't really follow it much until quite late in the release. Because we discovered, hey, there's this thing. It's in the release notes now. Where'd it come from? Oh, well, this is happening.
Again, a lot of these things are human-driven. Humans make mistakes. Humans miss things-- areas for process improvements. That's one angle. We could have caught this earlier. And we could have spun up some extra communications about that.
Then I think, on the other end, as contributors, we can work more closely with technical writers and docs people from SIG Docs to make better explanations for these release notes, maybe not as technical, maybe translated into less-- unpacked, I guess, pull the context out of that, and turn it into something that's a little more consumable. Stephen Augustus opened an issue in the Kubernetes community repo last week in response to this. As all this flurry was happening on Twitter, a bunch of people came together in the SIG Contribex channel on Slack and said, hey, how do we address this? They worked really, really closely. It was a group effort.
There were several people. Kat Cosgrove came in. And Jorge Castro was involved, and Bob Killen. Several other people came together to write two blog posts that went out before the release to give a lot of context around what was actually happening and try to defuse some of this stuff. I think it would be better to do more of that in the future as we identify these things. And Stephen Augustus has opened this issue in kubernetes/community. It's number 5344.
CRAIG BOX: I love how prepared you are. Let's just put that out there. The show notes practically write themselves with you.
JEREMY RICKARD: I went through, and like-- oh, I got to find these links. Yeah, so Stephen opened this issue. And it's got a lot of feedback so far about how we maybe handle these things. Can the release team be a little bit more involved? And it really depends on us identifying these things early. Can the SIGs take a little bit more proactive role in identifying these breaking changes? Can we work with comms people or SIG docs to make sure that we have a good story going forward?
Definitely take a look at that issue if you have any feedback or comments. I don't think it'll be closed anytime soon-- a great place to go and add your thoughts and feedback. How did the experience of learning that Docker was going to be deprecated impact you? And then how did you feel when you learned it wasn't actually being deprecated, that it's just a change coming into the runtimes?
ADAM GLICK: We talked before about the "no permanent beta" piece driving faster feature releases. Do you also expect to see an acceleration of deprecations?
JEREMY RICKARD: I do. You know, one of the changes that went into 1.19 was that things are getting marked, when they're introduced, with a deprecation date in mind. Another really great feature that came with 1.19 was telling you that you're using deprecated things. Before 1.16, when people got to that 1.16 change, that didn't really exist. So you didn't know, oh, I'm using v1beta1 Deployments, and I shouldn't be.
You'll now get that back through a kubectl call. So when you make a call to the API or with kubectl, a header comes back. And the tool can spit that out for you. There's also a metric that is generated that tells you, oh, this deprecated API is being used.
I definitely think that because of that new policy of no perma beta, we're definitely going to see either a ramp-up in deprecations-- things like PSPs are probably going to go away unless somebody steps up with a great alternative to it-- or a lot of effort's going to be put into making them advance, like we're seeing with cron jobs in this release.
CRAIG BOX: On the topic of things going away, it's a perennial question. You're going to be expecting it. But every release team lead I speak to, I ask about a feature that I want, you want, everybody wants, sidecar containers, the ability to mark containers in the pod in a particular order and say, this one starts before that one. Now it's a little unfair for me to ask about this issue, because the enhancement proposal was actually closed in October. Tell us a little bit about what's happening with it.
JEREMY RICKARD: There's going to be a new version of that enhancement proposal. The way that enhancements work for Kubernetes is that somebody comes along, and they write a KEP. It's got a whole bunch of stuff in it that talks about what this feature will be and what this feature won't be, helps get agreement, defines things like end-to-end tests-- everything that you really want to have in the definition of this new feature coming in.
What happened with that one, I think, is that nobody wanted to say no originally. People like to be inclusive. And people want to help contributors bring these new things that they want to use. And that one, I think, got into a state where there was a lot of work done for it. Some of that work wasn't going to be feasible with the way that the kubelet works now, technical debt from the past, maybe not picking up all the assumptions that are necessary for this feature to work.
What's happening there now is that people are collecting use cases for this. And they're trying to work through what a new version of this feature will look like. So I think we'll see a new version of the KEP going forward. That one has been open for a while-- kind of near and dear to my heart also, because I want that feature. But in 1.18, when I was the enhancements lead, we saw this one, probably for, like, the third release in a row, getting real close to code freeze. Like, it hadn't gotten all the reviews that it needed. It hadn't gotten over that line.
And bless the author who had been pushing that thing along, been rebasing the PR from release to release. I think he was very frustrated, because it hadn't gotten to where it needed to be. Behind the scenes, one of the jobs the release team does is facilitate and coordinate between SIGs. And that feature in particular is kind of a crosscutting thing. It kind of lands in SIG node, but it lands in other places too. Because it has ties into other groups.
We got some movement on it. Reviews started happening. And that's when some of the issues started popping up. It became pretty clear that, in its current implementation, it wasn't going to go forward. But I definitely think that it has spurred a lot of people on to work on that feature. And I think we'll see it in the future.
ADAM GLICK: Is there anything else that didn't make it into 1.20 that you're really looking forward to?
JEREMY RICKARD: I didn't track as closely the things that missed the deadline. I felt personally bad about the memory manager one that we granted the exception to. And that's number 1769, if you want to go look at it. Through no fault of anybody-- just bandwidth is hard. Getting reviews of stuff is hard. They had been pushing this feature along. And they proactively asked for an exception request. And it just didn't make it. I'm excited to see that one go in.
Another one I think will be interesting is moving Kubernetes to distroless images. That one also didn't quite make the cut this time. That'll be an interesting one from a security standpoint. One of the problems that we have is that we're running quite old versions of things. And when we run on our platform, when we run in these more restrictive security environments, we have to fix critical vulnerabilities really fast, within 30 days.
And for some of these older versions, that's not going to happen. Maybe that's not going to happen with patch releases on other things. So as we shrink these containers down and have less of an OS footprint in them, I think it is just a great security thing going forward. So I'm excited about that one hopefully landing in 1.21.
CRAIG BOX: One of the last tasks of a release team lead is to nominate and help prepare the successor. Now you've nominated Nabarun Pal from Clarisights to be the release team lead for 1.21. What have you written in the proverbial envelope?
JEREMY RICKARD: I'm pretty excited to see Nabarun step up for a number of reasons. I shadowed with him on enhancements in 1.17. So we've been on the release team together for quite a while now. He was one of my shadows for 1.18. And then I picked him to step up to lead 1.19 for enhancements. So I'm getting to do that again.
When I became the release lead, I looked across people that had led teams before. And I had a great set of shadows this time. Dan, and Savita, and Nabarun were all super helpful. One of the things we tried to do this time was to make things a little bit more inclusive time-zone-wise. So we ended up making a meeting that was more Asia-Pacific-friendly.
It was still like OK for people on the East Coast of the US, not great for me or people in California. But we had a number of people on the release team this time that were in Europe and in India. So it's also not fair to make them do these meetings when it's convenient for us.
So Nabarun was super critical in helping make sure that happened. He facilitated those time zone meetings. He helped iterate on how we can more asynchronously collect some of these updates. If you've ever watched any of the release team meetings, a lot of it is, here's the status, here's the status, here's the status, here's the status.
CRAIG BOX: You're really selling it to me. I'm going to go catch up on the back catalog as soon as I can.
JEREMY RICKARD: We're trying to move through those things and make it a little more asynchronous. It'll be really interesting to see Nabarun lead this from outside of the US. If you look back at the release team leads, it's really been US-centric going back as far as I can look back in GitHub. It'll be really cool, I think, to have a lead outside of the US. Somebody leading from the Asia-Pacific region will be really cool for the project.
One of the interesting things I've told him is that, coming in as the release lead, I definitely didn't prioritize some things as much as I should have. And I've definitely given him that advice, pay attention to the release note stuff ahead of time. Because we got into a tight spot at the end where we hadn't quite gotten to where we needed to be. So it was a lot of catch-up. Same thing for comms-- I think there's a lot of proactiveness that needs to happen there.
Also, watching the exception requests come in this time, one of the hardest things you have as the lead is the job of saying no. I like to not shut people out. And I like to help people be successful. And we had a number of exception requests at code freeze that we ended up having to say no to from risk and just the lateness of the requests coming in. And it feels terrible doing that.
You've seen these people dump a ton of stuff into things, put their heart and soul into getting this feature across the line. And then you have to say no. So I think that was the biggest advice I gave him, like, don't be afraid to say no. And make sure that you're making the best decision for the release, not necessarily the best decision for an individual feature's health.
ADAM GLICK: Finally, as a coder, you've written code that runs in a lot of different places. One of the most unique ones that I've heard about was a fighter jet. Was it there as a passenger?
CRAIG BOX: In-flight entertainment?
JEREMY RICKARD: It could be in-flight entertainment. In a previous life, for a long time, I worked as a defense contractor. I worked for a company called Lockheed Martin. They do a lot of stuff across different areas of defense. The group that I worked for did a lot of ground station things, like air mission planning. When you hear about no-fly zones being set up over places, a lot of the technology and the software that's used for that was written by the team that I used to be on.
One of the things we were doing was trying to figure out how to improve the experience of calling in surveillance missions. You want to fly a plane over somewhere, and take a picture of something, and send it back. How do you make that process better? Because before, it was very manual. And you would say, hey, fly over this area. Take a picture. Send it to me. No, that's not what I want. Fly over here.
We ended up building-- this is, like, 2004, 2005-- so a ton of awesome, cutting-edge technologies like SOAP, and WSDLs, and Enterprise Java. We basically built this application that ran on an early version of a ruggedized tablet with pens. F-16s run really old technology. So this was kind of like an add-on thing. It was basically a kneeboard computer that would allow the person to say, here's a box. Go take a picture inside that box. Send it back to me.
And the person could send it. And they could say, no, circle the little area. Go take a picture of that. So it would do communication back and forth over an IP radio. So it was really cool. We got to write this piece of software that ran in the jet, this piece of software that ran on the ground, figure out communications between them, figure out what the latency is for that.
One of the problems of trying to do these kinds of communications over an IP radio that may be going in and out is, how do you make these things discoverable too? Problems we have now are like, I have all these microservices popping up, or ephemeral things. Like, the jet's ephemeral. It's going to fly in and out of an area. How do I pop that into the system when it comes within range?
It was really interesting, I think. It gave me the chance to do a lot of different, weird things that I never thought I would work on when I was going through college. We called it John Maddening. Because when you'd watch football games in the past, they would draw out the plays they were going to do. I never thought, in college, that I would be sitting there watching a plane fly around Arizona, drawing pictures, and going back and forth. It was a fun, interesting thing.
One of the things that's really stuck out to me recently was seeing that a U-2, I think, is flying with Kubernetes on board. And they were able to do software updates to it very, very quickly. Having worked in defense before, that definitely is a change. Because you did not update those things quickly at all. Getting approval--
ADAM GLICK: In the air?
JEREMY RICKARD: Yeah. Getting any of that stuff to happen was very difficult. It's interesting to see how the technology keeps coming and going. You have these things developed for use cases that you don't really expect. And then you see it pop up in this other way.
CRAIG BOX: How do you feel about the fact that you had to go to a specialist company to develop software to run on an F-16, and now every Kubernetes contributor can effectively claim the same thing?
JEREMY RICKARD: It's a big change. At the time, early 2000s, open source was not a big thing in the government world. I worked on other things before that that were even harder to get open source things into. But it was really interesting to see the change start to happen then. We were using, again, like Enterprise Java stuff in this project before the F-16 thing.
It all ran in classified environments. And it didn't have connections to the internet. How do you solve those air gap problems? I'm doing the same thing again now for my internal dev platform. How do you bring in these bits? And how do you attest that it is what it is? It's very difficult. I don't think, at the time, that it was very fun. There were a lot of challenges and problems, just frustrations around that whole process.
It's really cool to see how that's changed. Thinking back to the times that I was in that environment, it's just really interesting to see the evolution of open source technologies in things like Kubernetes and the pervasiveness that you have with those things now.
Can I ask one question?
ADAM GLICK: Sure.
JEREMY RICKARD: Earlier, you said that only two people had been on twice. Who was the other one?
ADAM GLICK: The other person who has been on twice is Paris Pittman. She was on our very first show about community. And then after a year, we had her back on to talk about an update on the community.
CRAIG BOX: After two years.
JEREMY RICKARD: I'm in very good company then. I feel honored.
CRAIG BOX: I've been meaning to ask, are you related to Matt Rickard?
JEREMY RICKARD: I'm not. I discovered him when I started at Microsoft and started doing more in open source. I pinged him one day. And I'm like, hey, we have the same last name. Are we related? And we're not. I don't think, at that point, I had ever really met anybody with my last name outside of my immediate family. So it was really interesting to see, oh, somebody's working in this space, and we have the same last name.
Kind of related: when the internet was becoming a thing, right after college, I googled my name. And I'm like, oh, I wonder what presence I'm going to find for myself. And the very first result is a guy named Jeremy Rickard who is a mathematician in the UK. So I emailed him. And I said, I was googling my name, and I saw that we have the same name. I also like math. He replied--
ADAM GLICK: What a great icebreaker there.
CRAIG BOX: I'd like it back, please.
JEREMY RICKARD: He replied, and he quoted my email. The first thing I said was, I was googling my name. And the first thing he said was, as one does. And that's always stuck with me as a great use of the internet, and communication, and how small the world is.
ADAM GLICK: Jeremy, it's been great having you on the show. Thanks for coming back.
JEREMY RICKARD: Thanks for having me. I've really enjoyed this.
ADAM GLICK: You can find Jeremy Rickard on Twitter @jrrickard.
CRAIG BOX: You can find Matt Rickard on episode number six.
ADAM GLICK: Thanks for listening. As always, if you've enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @kubernetespod or reach us by email at email@example.com.
CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe. Until next time, take care.
ADAM GLICK: Catch you next week.