Kubernetes Podcast from Google: Episode 71 - containerd, with Derek McGowan

#71 September 18, 2019

Hosts: Craig Box, Adam Glick

containerd was born from community desire for a core, standalone runtime to act as a piece of plumbing that applications like Kubernetes could use. It sits between command line tools like Docker, which it was spun out from, and lower-level runtimes like runC or gVisor, which execute the container’s code. This week’s guest is Derek McGowan, a Software Engineer at Docker and a containerd maintainer-d.

Along with the news of the week, Adam and Craig discuss the many Vancouvers.

ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.

CRAIG BOX: And I'm Craig Box.

[MUSIC PLAYING]

ADAM GLICK: Craig, I've been finishing up our travels here on the West Coast, and I'm learning some of the amazingly confusing geography of the Americas here. As it turns out, I'm based in Seattle, and you can drive about three hours north or south, and either way, you end up in Vancouver-- either Vancouver, Washington, or Vancouver, BC.

CRAIG BOX: Lovely.

ADAM GLICK: If you drive south, you will also run into South Bend, Washington, which is curiously located about an hour north of North Bend, Oregon. And both of those are separate from Bend, Oregon, which is a completely different place.

CRAIG BOX: Are these places named for a geographical shape of some description?

ADAM GLICK: I suspect they might be named for bends in the river and multiple rivers that are there. We also came across one of the greatest-named cities I've ever run across, which is Cosmopolis.

CRAIG BOX: Oh, really? Sounds like a board game.

ADAM GLICK: I was like, oh my god, that must have been named in the '60s. But actually, it goes back to the 1800s.

CRAIG BOX: Well, speaking of Vancouver, I actually have a connection to Vancouver. And George Vancouver, for whom the city was named, is buried in a cemetery around where I live in the UK.

ADAM GLICK: Interesting.

CRAIG BOX: Yeah.

ADAM GLICK: Wow. Connected the dots, there.

CRAIG BOX: Yes.

ADAM GLICK: I also went to a gallery in Astoria, Oregon, and ran into a guy who is doing a gallery show of artwork. And it turns out the artist they were showing was a bunch of art that had been painted for the covers of albums. And so I got to see some of the cover art that-- you see the album cover, and you don't realize that the actual piece of art is much, much larger.

CRAIG BOX: Right.

ADAM GLICK: There's much more to it. So they had some stuff from Kansas, Warrant, AC/DC, The Bullet Boys. And it turned out the owner used to be in the music scene. He's like, oh, yeah, I used to be in a ska band in Detroit. I used to do a radio show back in Cleveland that did ska music, amongst other things. And it turns out that I'd been to their shows. And just one of those kind of weird quirks to run into somebody that you'd connected with long, long ago.

CRAIG BOX: Indeed.

ADAM GLICK: How are your travels going?

CRAIG BOX: Well, speaking of AC/DC, I am in Sydney, Australia this week. One of those great things about hosting a show like this is occasionally, someone puts the dots together and says, ah, I saw that you were coming to speak at this summit, and I love the show. Would you come and have a chat with us? So by the time you hear this, I will have both done a presentation for the Cloud Summit in Sydney and then also had a chat with a couple of people who reached out via the podcast. So thank you very much.

If you're interested and you do hear that either Adam or I are on the road somewhere, please do get in touch with us. We love to catch up with all the audience. And I made sure my bag was full of stickers.

ADAM GLICK: Awesome. Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: Istio 1.3 is out, bringing with it a raft of usability improvements and technology previews. The mesh now captures all inbound traffic by default without the need to configure ports individually. Protocol detection is enabled by default on outbound traffic and can be optionally enabled in preview for inbound traffic, letting you test this feature before it also becomes default in a future release.

In the CLI, a single command can now add services to the mesh for Kubernetes services or external services running on a VM. And a Describe command will now tell you if any changes need to be made to your Kubernetes objects to enable Istio features.

Additionally, a preview of Mixerless telemetry allows you to send telemetry data from the Envoy proxies directly rather than via Istio's Mixer component, which improves throughput and reduces latency.

Istio now has over 400 contributors from over 300 companies, giving it a community completely unmatched by any other service mesh product.

ADAM GLICK: Google Cloud held an event in New York City this week where they announced a significant set of updates to their Anthos offering. The hybrid and multi-cloud Kubernetes platform from Google has added a beta of Anthos Service Mesh, a managed Istio service. They announced the beta of Cloud Run for Anthos, a managed serverless offering, as well. Cloud Run for Anthos is compatible with Google Cloud's managed Cloud Run service, allowing customers who want to move their serverless applications into a hybrid or multi-cloud environment an easy way to make that happen.

Google Cloud also stated that Anthos Config Management has added binary authorization to ensure that containers and code launched into Anthos environments had not been altered. The ecosystem of partners continues to grow, as they announced a new list of partners, including Atos, Cognizant, Deloitte, HCL, Infosys, TCS, and Wipro.

CRAIG BOX: The Cloud Native Application Bundle, the topic of episode 61 of this show, has reached 1.0, its final draft. The core specification has also joined the Joint Development Foundation, home to specifications including the AV1 codec and the GraphQL query language. CNAB tools, including Duffle and Porter, are now being updated for the final spec. And work continues on getting the registry and security specs updated.

ADAM GLICK: The CNCF has opened up nominations for their annual community awards. These awards fall into three categories of Top Ambassador, Top Committer, and "Chop Wood, Carry Water" award. Nominations are due by October 2nd, at which time voting will begin. The winners will be announced at KubeCon San Diego this November. If you're curious about which ambassador to support, don't forget that the handsome voice of Craig Box that comes to you each week on this podcast is one of the CNCF ambassadors you can vote for. And I'm sure he'd be honored to have your vote.

The CNCF also released a case study this week on how Bloomberg has achieved close to 90% hardware utilization using Kubernetes.

CRAIG BOX: On the topic of utilization, Gajus Kuizinas shared his learnings that the shape of the node you use for your cluster is important. Unable to get an answer to his question of what size his nodes should be, he set out to optimize his cluster and see the results. Gajus learned that bin packing is efficient, but only if you have a machine that has the resources to be able to do so.

Using 1 vCPU machines can allow for fine-grained cost control, but at the expense of overhead and a lack of ability to move things around without high startup and shutdown times. In his case, he found that using 16 vCPU machines gave him the right mix of control and cost management. But his story is a good reminder that the economics of Kubernetes have many facets.

ADAM GLICK: The Kubernetes IoT Edge working group has published a white paper entitled "Edge Security Challenges." Authors Kilton Hopkins, Jono Bergquist, Bernhard Ortner, Moritz Kroger, and Steve Wong made the white paper 1.0 this past week and have posted it to GitHub for people to read and comment on.

CRAIG BOX: Cruise Automation has released Isopod, an expressive DSL framework for Kubernetes configuration without any YAML in sight. Instead of static replacements done by templating tools like Helm or Kustomize, Isopod lets you use a Python-based language which has been extended with support for the Kubernetes API server, HTTP requests, Vault secrets management, and more. All these built-ins include mock implementations for unit testing.

Objects are created and managed against the Kubernetes API server directly using protobufs, without a YAML intermediary. Since adopting Isopod, Cruise have seen a 60% reduction in code size due to reuse and 80% faster rollout, due to parallelism.

In related news, Pulumi, another infrastructure-as-code offering with a broad base of object support, including the three major cloud vendors, recently reached 1.0.

ADAM GLICK: StackRox wants you to know about five RBAC mistakes you should avoid. Their post focused on the mistakes of granting cluster admin unnecessarily, improper use of role aggregation, duplicate role grants, unused roles, and granting of missing roles. These boil down to good housekeeping by cleaning up things that aren't used and simplifying what you're doing to avoid confusion and mistakes. A good set of rules, to be sure, and a fine reminder that Occam's Razor is also a good operations principle.

CRAIG BOX: A new feature being worked on for Red Hat OpenShift 4.2 is support for air-gapped, or offline, installations where you don't have connectivity to the internet. You'll need to swap a lot of floppy disks to install those container images. If this is important to you, it is available to test in their nightly builds.

ADAM GLICK: If you prefer your images online, Red Hat has also announced Quay 3.1. This release brings a number of features for repository mirroring both in Quay and with external sources, as well as operator-based deployment. Accessing archived repositories and more storage options, such as cloud provider blob storage, were also added in this release.

CRAIG BOX: Updates to Azure Kubernetes Service this week bring Scale Sets and the Standard Load Balancer to general availability. They also fixed an issue with consistent and recurring crash-and-reboot looping of worker nodes introduced by a buggy upstream kernel.

ADAM GLICK: Over at Amazon, they've added cluster tagging to EKS nodes and the ability to assign IAM permissions to Kubernetes service accounts for new clusters running 1.13 or above.

Additionally, Abisheck Ray, a software engineer at Amazon, has been writing a blog on learning AWS. Recently, he posted on Fargate performance versus Lambda and about when you would want to use Lambda, ECS, or Fargate. He's done some interesting benchmarking and provides some conclusions and recommendations for when you should use each of these services in his posting.

CRAIG BOX: Kong, makers of an API gateway built on NGINX, have released Kuma, a service mesh which manages a fleet of Envoy proxies. If this sounds familiar to you, then you're not alone. Their "versus others" page in their docs calls out differences which are actually features that Istio has, and offers no obvious reason why they didn't just join the Istio community. Kong seem to be taking the investment made in Envoy by the Istio teams and claiming it's easier to use. In light of the major usability improvements and 300-company community in Istio we announced earlier, we wish them all the best with that.

ADAM GLICK: Google Cloud has announced the alpha of Cloud Dataproc for Kubernetes. This brings a managed Hadoop service into GKE and makes it available for hybrid use with Anthos. It gives users of Dataproc a single way to manage clusters running on either YARN or Kubernetes. Spark is the first open source big data software included in this alpha, but Google Cloud are teasing others, releasing an open source operator for running Apache Flink on Kubernetes.

CRAIG BOX: The cat and mouse game between security researchers and security vendors continues. Mark "Antitree" Manning from NCC Group discusses ways to bypass syscall blocking in Falco, originally by Sysdig, and now a CNCF sandbox project. Falco's rules are attuned to observe and report rather than block, so the post isn't an exposé or a bug report, but rather, a good example of how security research is performed and what goes into the assessment of an environment.

ADAM GLICK: Finally, Rafay Systems landed an $8 million series A round of funding this past week as part of their expansion into SaaS Kubernetes services. The 26-person company originally had $4.1 million in seed funding. And their new SaaS offering is an automation framework that's designed to simplify running Kubernetes. NTT DOCOMO was announced as a customer, as well as one of the investors in their latest funding round.

CRAIG BOX: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Derek McGowan is a software engineer at Docker and maintainer of containerd. Welcome to the show, Derek.

DEREK MCGOWAN: Hello. I'm glad to be here.

ADAM GLICK: To start out with, can you explain what containerd is to those who might not be familiar with it?

DEREK MCGOWAN: Containerd is a container runtime that's designed, really, to be simple, cleanly built, stable, and tightly scoped. The word "container runtime" is used quite a bit to refer to different levels of the stack. So think of containerd as managing all the resources related to containers.

So this could be your actual containers as well as the tasks that are being run inside the container. They could be the artifacts that are pulled down from a registry. They could be the copy-on-write file system objects. Containerd manages all those so that the higher-level callers of containerd don't have to worry about that.

ADAM GLICK: Can you give some examples of what those higher-level callers might be and what the stack is as you think about it?

DEREK MCGOWAN: Containerd is sitting, really, right below Kubernetes or the Kubernetes kubelet, as well as Docker. So Docker will call into containerd in order to do all of the container management. Kubernetes calls into containerd through the kubelet, through the container runtime interface. This delegates the responsibility of pulling containers, creating containers, tearing down the containers, and other resources related to the actual containers that are running.

And then, below containerd, we really have a bunch of runtimes. The most common one is runC. So runC is the reference implementation from the OCI, the Open Container Initiative. There are also common sandbox runtimes such as gVisor, Kata Containers, and Firecracker. And for implementations on Windows, it calls directly into the HCS shim layer.

ADAM GLICK: How did you get involved with containerd?

DEREK MCGOWAN: I've been at Docker for quite a while, now. And originally, I was working on stuff related to the registry-- so on the registry API as well as the new registry implementations. Then I moved into working on some of the Docker engine back end, like some of the image storage. I implemented the overlay2 driver.

And it was around that time that containerd's scope had increased: instead of just managing container resources, it was also managing the resources related to images. So I joined the containerd team, really, to focus on building out that image storage as well as the client that Docker and Kubernetes would end up interfacing with.

ADAM GLICK: Where did containerd originally come from?

DEREK MCGOWAN: Containerd originally was designed to interface directly with runC so that Docker itself didn't have to keep a bunch of state related to the containers. So it decoupled Docker from the underlying containers. So for example, runC is very low level. RunC itself doesn't act as a long-lived parent of containers. So Docker was doing that, previously. So containerd was originally created to create a boundary between Docker and runC so that containerd could own the actual container resources. And then from there, containerd grew to include image resources, as well, so that containerd could be a full-fledged container runtime that could be used by Kubernetes without having to go through Docker at all.

ADAM GLICK: A lot of people are familiar with interacting with their containers through tools like Docker. You've got all the different ways you can see which images are loaded, which ones have been downloaded, which ones are currently running, and access that. How does containerd give people access to that kind of information?

DEREK MCGOWAN: Containerd was certainly designed in such a way where we expect it to be invisible to the end users. But the users want visibility into everything that's running. So whether or not it's tooling that they're using to-- in their production environments, the debug tooling is really important.

So today, we have a tool called ctr, which is used for debugging containerd. It's not a tool that we support. And it's not a tool we even plan to support in the future. But what we do expect is that we'll see better integrations with higher level tooling, such as Docker, so that you can use your existing workflows in order to see everything that containerd is doing.

As well as we'll probably see new tooling come along that's maybe only talking to containerd. Maybe it's based on ctr. Maybe it's something new. But I think we're looking forward to see what comes out of that. And we certainly want users to use what they're familiar with, which is Docker, so that they can see their containers just using their normal docker ps, docker images commands.

ADAM GLICK: Where does the logging and monitoring information go? Does that interface directly with Prometheus? Or does it just expose it by an API for others that want to consume it as they pull it out of that API?

DEREK MCGOWAN: Yeah, so this was something we added to containerd, which is per container metrics. And we actually expose those through Prometheus. So if you're using Prometheus, you can use the Prometheus end point to query statistics about individual containers.

The actual logs in containerd-- containerd doesn't handle the logs at all. It actually just passes through FIFOs that the caller would create, pass down to containerd, such that the caller can have direct access to the standard out or any of the outputs from the container without having containerd sit in the middle.
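
The FIFO hand-off described above can be illustrated with a minimal sketch. This is not containerd's code or API: it assumes a POSIX system, uses an ordinary child process in place of a container task, and the function name and FIFO file name are hypothetical. The point is only that the caller creates the FIFO, hands the write end down, and reads the output itself with nothing in the middle.

```python
import os
import subprocess
import sys
import tempfile


def run_with_caller_owned_fifo(argv):
    """Run argv with its stdout attached to a FIFO that the caller created.

    Illustrative only (POSIX): a child process stands in for a container
    task, and the caller reads the task's output straight from the FIFO.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "stdout.fifo")  # hypothetical name
    os.mkfifo(fifo_path)
    try:
        # Open the read end non-blocking first, so that opening the write
        # end below does not block waiting for a reader to appear.
        read_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
        write_fd = os.open(fifo_path, os.O_WRONLY)
        os.set_blocking(read_fd, True)  # reads should wait for data

        # "Pass down" the write end: the child's stdout is the FIFO.
        proc = subprocess.Popen(argv, stdout=write_fd)
        os.close(write_fd)  # the child keeps its own copy of the write end

        chunks = []
        with os.fdopen(read_fd, "rb") as reader:
            # read() returns b"" once every writer has closed the FIFO.
            for chunk in iter(lambda: reader.read(4096), b""):
                chunks.append(chunk)
        proc.wait()
        return b"".join(chunks)
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)


if __name__ == "__main__":
    output = run_with_caller_owned_fifo(
        [sys.executable, "-c", "print('hello from the task')"]
    )
    print(output.decode(), end="")
```

Because the caller owns both the FIFO and the read loop, it decides where the output goes (a file, a log shipper, a terminal) without the intermediary having to buffer or interpret it.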

ADAM GLICK: Recently, the project has grown quite a bit. It graduated from the CNCF.

DEREK MCGOWAN: Yeah.

ADAM GLICK: Congratulations. How did it feel to achieve that milestone?

DEREK MCGOWAN: For us, I'm not going to say it didn't feel like anything. It certainly felt good to get recognition that the project's growing and the community around it. But really, containerd's been steadily growing over the past three years, since we joined CNCF back in-- I think it's 2017? So there was at no point a sharp inflection point that we hit. And graduation certainly doesn't reflect some sort of inflection point. It's just recognition of what we've seen over the years of containerd growing, becoming more mature, and actually getting adoption out in the community.

So containerd's been adopted by quite a few cloud providers today. So after graduation, we started to see more excitement from even more projects than just cloud providers, but other platforms, as well-- so for example, k3s, some other projects have come to us and started integrating with containerd.

ADAM GLICK: So not just cloud providers, but some of the installed Kubernetes versions.

DEREK MCGOWAN: Yeah. Certainly, the cloud providers were some of the earliest adopters. And they really helped containerd grow to the point where it is today and gain the stability. So a team within Google working on CRI put a lot of work into getting containerd usable by Kubernetes so that some of the other maintainers could just focus on the core of containerd being stable and well-designed.

ADAM GLICK: You mentioned the CRI. For the container runtime interface, how does that interact with containerd?

DEREK MCGOWAN: CRI is an interface that defines higher-level actions around a container, such as pulling, creating a pod, starting a container. And the actual core of containerd is much lower-level than that. We have a bunch of resources, and we have interfaces around those resources. So we actually have a CRI plugin that implements the CRI gRPC interface and maps that down into containerd as well as manages the CNI networks and everything else that's needed to implement that interface.
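
The mapping described here, where higher-level actions are translated into lower-level resource operations, can be sketched roughly as follows. Every class and method name below is hypothetical and chosen for illustration; none of it is the real CRI gRPC API or containerd's client, and networking is reduced to a dictionary entry.

```python
class CoreServices:
    """Stand-in for low-level, resource-oriented services (hypothetical names)."""

    def __init__(self):
        self.calls = []        # ordered log of low-level operations
        self.images = set()    # image references that have been fetched
        self.containers = {}   # container id -> image reference
        self.running = set()   # container ids that have a started task

    def fetch_image(self, ref):
        self.calls.append(("fetch_image", ref))
        self.images.add(ref)

    def prepare_snapshot(self, key, image_ref):
        if image_ref not in self.images:
            raise LookupError(f"image not pulled: {image_ref}")
        self.calls.append(("prepare_snapshot", key))

    def create_container(self, container_id, image_ref):
        self.calls.append(("create_container", container_id))
        self.containers[container_id] = image_ref

    def start_task(self, container_id):
        if container_id not in self.containers:
            raise LookupError(f"unknown container: {container_id}")
        self.calls.append(("start_task", container_id))
        self.running.add(container_id)


class CriStylePlugin:
    """Translates higher-level, CRI-like actions into core operations."""

    def __init__(self, core):
        self.core = core
        self.pod_networks = {}  # pod id -> network name; stands in for CNI setup

    def pull_image(self, ref):
        self.core.fetch_image(ref)

    def run_pod_sandbox(self, pod_id):
        # Networking is handled in the plugin layer, not in the core.
        self.pod_networks[pod_id] = f"net-{pod_id}"
        return pod_id

    def create_container(self, pod_id, name, image_ref):
        if pod_id not in self.pod_networks:
            raise LookupError(f"unknown pod sandbox: {pod_id}")
        container_id = f"{pod_id}/{name}"
        # One high-level action fans out into several low-level ones.
        self.core.prepare_snapshot(container_id, image_ref)
        self.core.create_container(container_id, image_ref)
        return container_id

    def start_container(self, container_id):
        self.core.start_task(container_id)
```

A caller would use it in the order pull, sandbox, create, start; the `calls` log on the core then shows the lower-level operations each higher-level action turned into.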

ADAM GLICK: So the APIs that go along with that have stabilized, certainly, as it's become a graduated project. Have you seen that increase the proliferation?

DEREK MCGOWAN: CRI itself hasn't seen too many changes since it was first announced in 2016. So the scope of CRI-- we used that to help define what the scope of containerd was going to be. So we knew that any of the capabilities that were needed from CRI is what we would try to map into containerd to make sure that containerd had these abilities.

So CRI was first announced in 2016. And even though CRI isn't itself considered a stable API other than for specific Kubernetes releases, it hasn't really changed much. So that's really helped us from a containerd perspective to focus on just having a stable release and being able to use those releases across multiple versions of Kubernetes. But if the CRI changes, we'll need corresponding changes in containerd in order to make use of those.

ADAM GLICK: Why was containerd originally made as an open source project?

DEREK MCGOWAN: I think the best answer to that is, we were coming from Docker, which was open source. So the plan was never to create something brand new and then throw it to the community, but rather to take some of the stuff we've learned from Docker and-- as a community, as a whole-- build a new open source project that focused on those lower-level stable bits that the community was really asking for.

ADAM GLICK: I know you mentioned before a little bit about how it interfaces in the stack with other technologies. How should people think about it in relation to some of those? You mentioned that things like Firecracker, gVisor sit below it, versus other technologies like Kata Containers, rkt, or CRI-O. How should people think about that ecosystem? Because I know that ecosystem has grown a lot recently.

DEREK MCGOWAN: About a year ago, when we came out with containerd 1.2, we stabilized the containerd shim API. So the shim API is a gRPC-like interface for doing low-level commands to containers. So this would be on the level of create a new container, start the container, exec a new process inside the container, get the metrics about containers-- much lower level.

So having that lower level interface allowed for some of these sandbox projects that are really focused on having a secure runtime, such as gVisor or Firecracker or Kata Containers, which usually run inside a VM. So it gave an interface for them to implement that can be easily used in containerd. Previously, they would try to look like runC.

But the challenge there is, runC is really designed to interface with the Linux kernel. So even its CLI is designed in such a way where it expects that it's talking directly to the Linux kernel. So trying to mimic some of those behaviors when you have a VM-- that's difficult. So having an API that they can implement makes that much easier to interface with the higher-level runtime.

A good example of that would just be getting metrics. So when runC gets metrics, it's just going to expect that the caller can get the metrics directly from the cgroup. But if you're a VM and you don't want to just get metrics from the cgroup of the VM process, you actually want to be able to get metrics that the VM is keeping track of. So we provide an API that makes it much easier for those VM runtimes to tell the runtime exactly what the resource utilization is.
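
The metrics example can be made concrete with a small sketch of the idea, under stated assumptions: the class names, method names, and the shape of the stats dictionary are all hypothetical, and this is not the actual containerd shim API. It only shows why an interface helps: the caller asks every runtime the same question, and each runtime answers from wherever it really tracks usage.

```python
from abc import ABC, abstractmethod


class Shim(ABC):
    """Low-level, per-container operations a runtime implements (hypothetical)."""

    def __init__(self):
        self.state = {}  # container id -> "created" or "running"

    def create(self, container_id):
        self.state[container_id] = "created"

    def start(self, container_id):
        if self.state.get(container_id) != "created":
            raise RuntimeError(f"{container_id} has not been created")
        self.state[container_id] = "running"

    @abstractmethod
    def stats(self, container_id):
        """Return resource usage as a dict, from wherever this runtime tracks it."""


class CgroupShim(Shim):
    """A runC-style runtime: usage comes from host-side cgroup accounting."""

    def __init__(self, read_cgroup):
        super().__init__()
        self._read_cgroup = read_cgroup  # callable: container id -> dict

    def stats(self, container_id):
        return dict(self._read_cgroup(container_id), source="host cgroup")


class VmShim(Shim):
    """A VM-based runtime: usage is whatever the guest itself reports."""

    def __init__(self, ask_guest):
        super().__init__()
        self._ask_guest = ask_guest  # callable: container id -> dict

    def stats(self, container_id):
        return dict(self._ask_guest(container_id), source="guest report")


def collect_stats(shims):
    """Caller-side code: identical for every runtime behind the interface."""
    return {cid: shim.stats(cid) for cid, shim in shims.items()}
```

With this shape, the caller never needs to know whether a number came from a cgroup on the host or from inside a VM; that decision lives entirely in the runtime's own implementation.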

ADAM GLICK: The CNCF had a couple of directly container-related projects, most notably rkt, which recently has been somewhat deprecated as the focus has become containerd. Can you talk about how those two projects work together within the same foundation and how those technologies work together?

DEREK MCGOWAN: Containerd and rkt were actually added at the same time. But they were added at much different phases of their lifecycle. So containerd was pretty new when it was added to CNCF. rkt had actually been around for a while.

And I think rkt has a pretty interesting history. It was announced back in late 2014 along with a new definition for containers. So rkt was a little lower level tooling than what Docker had. And a lot of people found that useful, having that lower level way to interact with containers.

So after that, that's really what prompted the creation of the open container initiative. So since rkt was defining a new type of image, basically open container initiative was getting together so that everybody was working on the same definition of what a container is and what an image is. So it was really rkt that was the catalyst for that move towards standardization of containers.

ADAM GLICK: That's pretty cool, how projects have different life cycles, but they all impact each other in terms of moving the community forward and building the best technology for everyone to use.

DEREK MCGOWAN: Yeah. And really, out of that, rkt, to me, is almost more comparable with runC. It's a little higher level than runC. But rkt was directly managing containers. And at the time, Docker's runtime was tightly coupled inside of Docker. So that ended up getting split out. So there was a library called libcontainer inside of Docker that was split out to open container initiative. And then runC was created from that.

So really, runC was created in order to solve that problem that rkt was trying to solve. And then runC was the reference implementation in a neutral place at that point. Even though it was split out from Docker's core, it was something that was community driven.

ADAM GLICK: Earlier, you mentioned Windows containers. And we had the good fortune to speak previously with Patrick Lang about his work on those. How does containerd work with Windows containers?

DEREK MCGOWAN: Today, containerd has a snapshotter for Windows. So this will handle managing the file system so when you pull a Windows container, those containers or those images will actually get unpacked into containerd snapshotter. And then we previously had a Windows runtime, as well.

But after we came out with 1.2 and we stabilized the shim API, we were actually able to make use of that API so that the Windows team has actually implemented that API in the HCS shim code base. So now, it's decoupled a little bit more, even, than it was before. So the first version of the Windows integration was really tightly coupled, whereas now the Windows implementation for HCS shim looks very similar to what gVisor and all of those have done.

And then a lot of the ongoing work today is a little higher up than the core. So in the CRI plug-in, getting all of that to work with the networking and everything related to Windows, as well as I know there's some work even in some of the Kubernetes SIGs in order to complete that work. Some of it is we're waiting for some of the changes to come out of the SIGs to handle issues such as multi-platform. So on Windows, you can actually run Windows containers and Linux containers, which is an assumption that's not currently made in Kubernetes, today, around having individual nodes be able to run completely different platforms.

ADAM GLICK: As you've stabilized the containerd shim API, are there any places where you've found people using it that surprised you?

DEREK MCGOWAN: I don't know if we've had anybody come to us with something we would say is surprising. We've certainly tried to do some interesting stuff with it, even trying to get Wasm-style containers working.

ADAM GLICK: Can you define Wasm containers for us?

DEREK MCGOWAN: Using WebAssembly.

ADAM GLICK: Mm-hmm.

DEREK MCGOWAN: So I expect to see more stuff like that coming along. Luckily, since we're a pretty open community, usually stuff doesn't surprise us. People come to us pretty early on when they're thinking about something, and we try to guide them through. It gets pretty complex down at that level. So normally, we don't have people come in and have built a whole new shim coming to us. Normally, we're interacting with people pretty early on.

ADAM GLICK: When people get involved with you, about how long does it take for them to start consuming that and really build it into the projects that they want to use it with?

DEREK MCGOWAN: I think Firecracker is probably a good example of that. They were building that around the same time that we stabilized the 1.2 shim. We knew that they were building something. We didn't know exactly what they were building until they could announce it publicly.

But I think that was probably a couple months where we knew they were building something. And then by the time we'd heard about it, they already had something that was working, which was pretty amazing to see. When you define an API and you stabilize it, it really gives the community an opportunity to go and build something and prove that it works on their own.

ADAM GLICK: That is pretty cool. What's next?

DEREK MCGOWAN: We already discussed the Windows support. A lot of that work is ongoing. We have some parts in the core which have stabilized. So for example, you're able to pull Windows images down. You can run them if you're using containerd directly. So getting that work plumbed all the way up through CRI is probably the next big feature coming out of containerd.

Some of the cool work that's going on and that I hope to see completed is some ideas around resource sharing. So serverless is one of the hot topics today. And one of the difficult parts with serverless is start up speed. So the time that it would take you to, say, commission a new node, and actually get workloads running on it is a really important metric.

And some of the ways in order to bootstrap the nodes in a way where it's ready to run containers-- just doing a normal registry pull and unpack can be slow in an environment where you want something to feel almost instant.

ADAM GLICK: Totally.

DEREK MCGOWAN: So there's a lot of work going on around trying to be able to do resource sharing among the clusters so that, say, if you have shared storage, you can utilize that. So if you bring up a new node, it can take advantage of shared storage, since most of what containers use is read-only storage, especially for serverless environments. There's not a lot of file system write operations that need to go on inside the containers.

So the containers really should be able to start up pretty quickly, because what the container actually needs in order to start up a process is not that much. But today, the process has quite a bit of bootstrapping that goes on in order to basically pull down a full image, which is usually as small an operating system image as possible. But even the smallest images are pretty large when you're thinking about trying to boot something up in milliseconds.
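
The intuition behind sharing read-only content can be shown with a toy model. This is not how containerd's snapshotters are implemented: plain dictionaries stand in for file system layers, and the function name is hypothetical. It assumes only what was said above, that image content is mostly read-only, so each container needs just a small private writable layer on top of layers that many containers (or nodes) can share without copying.

```python
from collections import ChainMap
from types import MappingProxyType


def start_container(image_layers):
    """Build a container's file view without copying any image content.

    image_layers is ordered base first. Returns (view, upper), where upper
    is the container's private writable layer and view resolves reads
    through upper and then through the shared layers, topmost first.
    """
    upper = {}
    # Wrap shared layers so they cannot be modified through the view.
    read_only = [MappingProxyType(layer) for layer in reversed(image_layers)]
    # ChainMap searches its maps left to right and writes only to the first.
    return ChainMap(upper, *read_only), upper
```

Two containers started from the same layers see the same files, and a write in one lands only in that container's own `upper` dictionary. Starting a container costs one empty dictionary rather than a copy of the image, which is the property that matters when start-up time is measured in milliseconds.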

ADAM GLICK: Sure. Any sense on timeline for when those things might make it into the project?

DEREK MCGOWAN: Oh, I don't give timelines.

[LAUGHTER]

ADAM GLICK: Wise man.

DEREK MCGOWAN: Yeah. I can just say that that's being worked on. And I know that there's a few big players working on it. So I think they'll motivate the timeline. But there are some changes we want to make to the core. I want to get those into the next release. So I would say you'll see more stuff in the next couple containerd releases that really make this possible.

ADAM GLICK: What's the release cadence, generally?

DEREK MCGOWAN: Well, we've tried to do about two releases a year. Since the last release, it's actually been quite a while in terms of the major releases. So we had 1.0 in late 2017. And then we actually followed on with 1.1 maybe three months after that. And there was a big addition that we made in 1.1, as that's when we pulled in the CRI plugin as built in. Six months later, we had the 1.2 release, which stabilized the shim API.

And since then, a lot of work has been going on with Windows. We were thinking the next release would be the Windows release, but there's a lot of work that's been going on there. So the next release we have coming out is 1.3. And this has a lot of the changes that have gone in. There's no huge interface changes, but there's a lot of changes that will impact some of the callers.

ADAM GLICK: Under the covers work?

DEREK MCGOWAN: Yeah, there's a lot of under the covers work. There's a few interfaces that make integration easier for tools such as BuildKit, which uses containerd directly. But what's really important to us is to make containerd something easy to integrate. So a lot of the features in 1.3 are related to making containerd easier to integrate with rather than any really big features, which is what Windows support would be.

ADAM GLICK: What area would you most like help from the community on?

DEREK MCGOWAN: I think the best help we get from the community is really related to issues. So containerd is one of those projects where we've kept it pretty tight in the scope. So we don't really have new features coming in. But we do have a lot of users who are using it, whether passively or just through their platform, or they're using it directly.

So our number one goal with containerd is that it's stable. So the best way to be stable is when you have users who are using it, putting their workloads on it, and then reporting issues when they have those. So using containerd in your Kubernetes cluster helps that to happen-- helps have more workloads, which helps us have more stability.

The other place where we really like to see the community get involved is especially in our plugin ecosystem. So we mentioned some of these lower-level runtimes. So there's a lot of things you can do with runtimes. And whether you want to build your own runtime or contribute to some of these other runtimes, some of the work that I mentioned-- doing the resource sharing-- we have plugins that are related to managing the file systems that you can help out with as well as anything related to tooling or integrating into higher-level systems.

Normally, the path for contributors to containerd is, you're going to get involved by integrating with containerd through some plugin. And then a subset of those contributors may be interested in looking at the core of containerd. But luckily, containerd is designed in such a way that as the number of users grows, the core should stay roughly the same size. So we don't have to, for example, scale out the number of core developers in order to match growth within the community, just the ecosystem around it.

ADAM GLICK: Awesome design for scalability.

DEREK MCGOWAN: Yeah, that definitely helps.

ADAM GLICK: If people want to help with plugins, for instance, where should they go to get started?

DEREK MCGOWAN: We have documentation related to each of our individual interfaces. So if you want to build a snapshotter, for example, you can look at our snapshotter interface, and we also have examples for doing proxy plugins. So you can basically create your own plugin for containerd without even restarting containerd. For proxy plugins, you just configure containerd to use your plugin.

And for runtimes, it's actually even easier, where you just specify which runtime you want to use as the name of the runtime when you're creating a container. And it will call out to your runtime, since the shims are actually separate binaries that they run, and then they expose an interface that containerd will use. So containerd doesn't need to know about those ahead of time, even before it starts up. So you can look at those interfaces. It's a good place to get started for building plugins.

ADAM GLICK: That's pretty cool. Really appreciate you coming on the show today, Derek.

DEREK MCGOWAN: Yeah, thanks for having me.

ADAM GLICK: You can find Derek McGowan on Twitter @DerekMcGowan, and you can find the containerd project at @containerd.

[MUSIC PLAYING]

Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes. Until next time, take care.

ADAM GLICK: Catch you next week.

[MUSIC PLAYING]
