Kubernetes Podcast from Google: Episode 159 - Talos, with Andrew Rynhard

#159 August 19, 2021

Talos, with Andrew Rynhard

Hosts: Craig Box, Jimmy Moore

Kubernetes lets us manage our infrastructure declaratively, so why do we still manage the underlying OS with a myriad of different text files? And why allow shell and SSH access to a machine that should be immutable? So asked Andrew Rynhard before creating Talos, a Linux distribution built for Kubernetes. He’s now CTO of Talos Systems, a company founded to take it to market.

Chatter of the week

40 years of the IBM PC
DONKEY.BAS
Commodore 64
- Wheel of Fortune
- Little Computer People
- C64 vs IBM advertising
- 6502 and derivative CPUs: the C64 used a 6510
- Bender

News of the week

Transcript

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box, with my very special guest host Jimmy Moore.

[THEME MUSIC]

CRAIG BOX: Last week, we celebrated 40 years since the original launch of the IBM PC, Model 5150. If anyone hasn't had the pleasure-- I'm not 100% sure I've touched one myself-- there is a link to a great emulator, which you can find in the show notes. Along with the standard text-based DOS software and so on, there is a game that I encourage everyone to have a go at playing. It's called "Donkey."

Jimmy, you had a go at it there. How would you describe this game?

JIMMY MOORE: It's kind of like a one-player back and forth, like Pong, except nobody else is throwing the ball back. You've just got to go left and right. Actually, it's one button. It kind of feels very much like the iPhone, one button.

CRAIG BOX: Yeah. Yeah, 40 years later nothing's really changed.

JIMMY MOORE: [LAUGHS] Nothing's really changed, you know. I mean, it's amazing, 40 years. I'm just a little over 40, so I can relate to this PC. I imagine it's now waking up with mysterious aches and pains being over 40.

CRAIG BOX: It amused me to find out that someone's actually ported "Donkey" to the iPhone and the Apple Watch. So if you do want to play a modern recreation, that's available to you. Did you know who wrote that game?

JIMMY MOORE: I do not. Who?

CRAIG BOX: It was written by the one and only Bill Gates.

JIMMY MOORE: You're kidding. It turns out he really did invent computers.

CRAIG BOX: At least in this case, yes. There was not a lot you could do with the original IBM PC. It had BASIC built-in, and it had a cassette tape player that you could plug in. Not many people would have used that, I wouldn't think.

JIMMY MOORE: No, I saw that. And I thought it was a music player, actually. I didn't realize it was a disk drive.

CRAIG BOX: You grew up with a Commodore. You would have had a little bit of experience with this kind of software.

JIMMY MOORE: Yeah, I had a Commodore 64. Grandma got us all a Commodore 64 because the computer age, you know, 1985 or whatever it was. We had those big floppy disks-- what I actually consider floppy disks, because they were actually floppy.

And I played all sorts of games. Well, just two. I played "Wheel of Fortune" on my Commodore 64, and then this game that I called "Terrence." It was this man in the house, kind of like "The Sims." And we did a little research and found out the actual name of it.

CRAIG BOX: Yes, it was called "Little Computer People."

JIMMY MOORE: That's right. It was a game you couldn't win or lose, which kind of was great for me, because I was not very competitive, and I hate losing. So it was great. I just kind of fed this fish and helped Terrence around the house.

CRAIG BOX: If only you had had "Donkey" at that time. You can never lose that.

JIMMY MOORE: No. Perfect. I would have loved that.

CRAIG BOX: Now, you were voted, of course, Most Likely to be a Game Show Host. Is that in part because of your love of "Wheel of Fortune"?

JIMMY MOORE: Oh, yeah. In fact, I always did the voice. And I announced the, "nope, there's no E. Try again." I always said that, even to myself. Because there was no voices coming out of the Commodore 64, right? It was just, like, probably seven tones that were coming out of the speakers.

CRAIG BOX: For its time, the Commodore 64 had very advanced sound. It had an analog synthesizer, which can actually do-- I think it's three voices of notes and one voice of noise, a square wave sound. So by comparison to the IBM PC, which came out, I want to say-- if the PC came out in '81, I think the Commodore 64 came out '82, '83, and it was substantially more advanced. And their advertising at the time emphasized how much better it was, and how much cheaper it was also, especially when it was kitted out with all the bits and pieces.

I could bore you for hours about how it's a shame that Commodore threw itself down the drain, and how everything we have today is effectively based on that 5150 PC from 40 years ago.

JIMMY MOORE: Yeah, I didn't realize how substantially different it was. What was Commodore actually built on, platform-wise? Is it anything we would recognize today?

CRAIG BOX: It was built on the 6502 CPU, which was used in things like the Nintendo and was also used in Bender from "Futurama." A little bit of an Easter egg that the show creators threw in.

JIMMY MOORE: That's awesome. Yeah, it's nice to know the origin story.

CRAIG BOX: All right. Well, let's continue with the origin story of the Kubernetes ecosystem, and let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: Chaos Engineering platform, LitmusChaos, has released version 2.0. The new release brings a centralized management approach for managing Chaos across environments, both Kubernetes clusters and cloud instances, and the ability to define workflows to stitch together multiple experiments as part of a complex scenario. You can learn a little about the history of Litmus in episode 56.

JIMMY MOORE: The CNCF has released the results of a sponsored security audit of SPIRE, the runtime for the SPIFFE Identity Project. SPIRE is a toolchain of APIs for establishing trust between software systems and can issue SPIFFE IDs and verification documents to them. You may then go on to use those documents to establish trust between two endpoints.

No severe or critical flaws were found, with vendor Cure53 judging the project to be "quite mature". SPIRE was discussed in episode 45.

CRAIG BOX: Bovine is a new utility for running single-node Rancher clusters. For new users, it aims to be a stepping stone to experimenting with Rancher and Kubernetes. For advanced users, it lets you test new versions of the platform. Bovine is written in Rust by Rancher's Nick Gerace, who has also recently launched a website with resources for cloud-native Rust.

JIMMY MOORE: Google's Monitoring Suite has uptime checks for URLs, VMs, or App Engine instances, and has now added support for GKE load balancer services. As the underlying pods for a service change, the uptime check changes with it, allowing you to quickly correlate a service with an uptime failure. You can also set an alert policy on your uptime check and jump straight to the associated GKE dashboard with Notify.

CRAIG BOX: Finally, the fall '21 term is open for LFX, offering full-time three-month internships to developers interested in getting involved in open source. 12 CNCF projects have offered up 30 ideas of fun things to work on, including projects in both chaos engineering and Rust. Applications close August 22, so get in quick or put a reminder in your calendar for the next term.

JIMMY MOORE: And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Andrew Rynhard is the co-founder and CTO of Talos Systems who are building an operating system for distributed systems. Welcome to the show, Andrew.

ANDREW RYNHARD: Hello. Thank you for having me.

CRAIG BOX: People say that Kubernetes is the operating system of the cloud. Is that trite? What even is an operating system?

ANDREW RYNHARD: I have heard this before. I think it's fair. I mean, if we're being really pedantic about things, I wouldn't call it an operating system. But it is a kernel of sorts, right? It's a kernel for distributed systems. That's the way that I personally see it.

And I think it's true. It's really standardized how we look at a cluster and gives us all of the primitives and constructs for us to get our application onto this kernel. Really, that's what Talos is about. It's about getting rid of the operating system and giving you a focus on Kubernetes.

CRAIG BOX: When did Docker become something you were aware of and started playing around with?

ANDREW RYNHARD: This was probably when Docker was still beta. I was very picky about how I would provision my servers. And at the time, I wasn't provisioning for anything professional. I was just into Linux.

I felt like any time I SSH-ed onto a box, it was tainted. I did something wrong if I had to SSH onto it and fix something. And so when containers came about, when Docker was beta back in, maybe, 2013 or 2014, I just became in love with it, because this now meant I don't actually really have to touch the file system of the operating system.

I could just get everything onto containers, and I can iterate on deploying the application without actually messing up the host itself. So I really became fascinated with this idea and just fell in love with containers instantly.

CRAIG BOX: Where did that mentality come from, though, if you think about what you were doing on your desktop at the time? There was definitely no immutability there.

ANDREW RYNHARD: It wasn't my desktop, necessarily. I was still learning Linux, and, you know, I was just deploying things like applications I would write. Because I was also learning Golang at the time. And so I'd write fun little applications and learn how to deploy them.

I actually didn't go to school for computer science. I went to school for physics. But my love and passion was actually for computer science. And so I constantly found myself coding and figuring out how to deploy Linux in the most optimal way instead of doing my homework.

CRAIG BOX: When did you discover Linux then? Was that something that was necessary as part of your education? Or is that just something that came through from culture at the time?

ANDREW RYNHARD: I didn't really grow up with computers. We never really had one. What I did have, though, was an Android phone. And I found out you can root it. And you can make the CPU faster. And I just loved it.

So I burned a lot of phones. They all ran very, very hot. But I wrote little Bash scripts, compiled Android, compiled the Linux kernel. And that's how I learned Linux, really, is just getting into a rooted Android phone and figuring out, what's a kernel? What is Linux? And just digging into that and learning everything that I possibly can on my own about it.

CRAIG BOX: That sounds like a good background for someone who would eventually go on to build their own Linux distribution. What was it that inspired you to take on that challenge?

ANDREW RYNHARD: So what really inspired me was just this curiosity. I wanted to understand what makes Debian different from Ubuntu, or CentOS, or Red Hat? And I was just fascinated with this.

And so I started going down the rabbit hole of just figuring out, really, what is Linux? And I found out that it's actually really simple. If you look at a project called Linux from Scratch, it actually takes you through the process of building a Linux distribution completely from scratch. You start with the host operating system, you bootstrap everything that you need to even build all your tooling to build your operating system.

So you combine that, really, with this experience in Kubernetes. I had this passion for Linux and just trying to figure out how Linux distributions are really put together. And then in the real world-- fast forward some years after playing around with containers and whatnot-- I had a job where my job was to maintain Kubernetes clusters.

And I found that maintaining the operating system and Kubernetes was really two different jobs. But they had very similar crossover-- user management, securing it, hardening it, so on and so forth. And I really just wanted to get rid of the operating system.

And so I put those two things together. And that's where Talos was kind of born.

CRAIG BOX: Talos wasn't the first name of the project. What was it called to start with?

ANDREW RYNHARD: I really hate this name. But I'll say it. It means "distributed" in Greek, and it's Dianemo. I'm not even sure I'm pronouncing it properly. And it did not roll off the tongue. And so very quickly I realized I'm going to have to rename this project.

CRAIG BOX: It's not a very dia-nemic name.

ANDREW RYNHARD: [LAUGHS] No, it's not. And so I very quickly realized that this is not going to catch on if it has such a horrendous name. And so I did some research and found Talos. It's a robot-- you've got the whole Greek origin. It was a robot that would circle an island three times a day and protect it from pirates or invaders.

And so there's that security aspect as well. And there's the whole nautical theme. And Talos was also a robot that would go throughout the lands and enforce laws.

And if you look at how we've designed Talos, it's very much like Kubernetes. It's controller-based. And so it goes out, and it enforces what you want on the host itself. It's got its own little controllers that run right there on the host process. There's really a lot of parallels with Talos the robot and how we've designed the operating system.

CRAIG BOX: Was the goal of the project to solve the problem that you had with the fact you were trying to administer both Kubernetes and Linux at the same time? Or was this just a passion project?

ANDREW RYNHARD: It was a lot of both. I was always looking for a reason to really dig into this and really build my own Linux distribution. Yes. I found maintaining and managing Kubernetes clusters and having to worry about both, it just had so much overhead. I wanted to get rid of it.

And so that was my goal-- get rid of the operating system and allow me to focus on just Kubernetes. And so that's where things like take out Bash and SSH-- and the motivation there really was, don't let a human be the cause of error. Again, the file system is completely read-only.

And so if we just eliminate the element of humans, we're probably going to be in a better situation to maintain this going forward. And so really, it was just about getting humans off of the boxes and not making individual machines snowflakes, because that's what I had seen.

CRAIG BOX: You know, Andrew, that's what a robot would say.

ANDREW RYNHARD: That's exactly what a robot would say. [LAUGHS]

CRAIG BOX: I'm beginning to worry about you.

ANDREW RYNHARD: Yeah, that's where a lot of these features came from, just trying to get rid of the operating system, really.

CRAIG BOX: You're looking obviously at Kubernetes and how it is a declarative system. It's powered by controllers. You define what you want, you send it off to an API server.

How do you apply that same model to one or more hosts running this operating system?

ANDREW RYNHARD: In the design of Talos, we're really not trying to depend on etcd as much as possible. That kind of makes things a little bit limited. And really, what it comes down to is Talos is driven by its configuration file that you can apply to each machine.

Under the hood, this machine configuration file, as we call it, is kind of torn apart, and different pieces of it are handled by different controllers. For example, the whole networking stack is broken up into multiple controllers that one controls hostname, another controls routes and setting up the addresses on interfaces and stuff like this. And this is all driven from the configuration file.

And so Talos itself, how you look at it, it's kind of a basis on which you can apply even more complex orchestration on top of. And so we're building tooling on top of this.

You have an operating system that has an API. All of a sudden, this opens up all kinds of doors and possibilities for the next layer of tooling on top of it. In the same way that we had Ansible, which worked over SSH and all of this, we need tooling that can work with API-driven operating systems. And so applying configuration files could be something-- this is things that we're working on, where you actually submit this to a system, and it rolls it out for you in a controlled manner so that things don't just blow up on you.
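As an illustration of the controller split described in this answer, here is a minimal Python sketch. It is not Talos code: the `MachineConfig` fields, the three controller classes, and the in-memory `state` dictionary standing in for the host are all assumptions invented for the example. The only idea taken from the conversation is that one configuration file is divided up, and each controller enforces its own piece.

```python
from dataclasses import dataclass, field


@dataclass
class MachineConfig:
    """Hypothetical, heavily simplified machine configuration."""
    hostname: str
    addresses: dict = field(default_factory=dict)  # interface name -> CIDR address
    routes: list = field(default_factory=list)     # (destination, gateway) pairs


class HostnameController:
    """Owns only the hostname portion of the configuration."""

    def reconcile(self, config, state):
        if state.get("hostname") != config.hostname:
            state["hostname"] = config.hostname
            return True
        return False


class AddressController:
    """Owns only the interface addresses."""

    def reconcile(self, config, state):
        if state.get("addresses") != config.addresses:
            state["addresses"] = dict(config.addresses)
            return True
        return False


class RouteController:
    """Owns only the routing table."""

    def reconcile(self, config, state):
        if state.get("routes") != config.routes:
            state["routes"] = list(config.routes)
            return True
        return False


def apply_config(config, state, controllers):
    """Run every controller once; return the names of those that changed state."""
    return [type(c).__name__ for c in controllers if c.reconcile(config, state)]


controllers = [HostnameController(), AddressController(), RouteController()]
state = {}  # stands in for the real host
config = MachineConfig(
    hostname="node-1",
    addresses={"eth0": "10.0.0.5/24"},
    routes=[("0.0.0.0/0", "10.0.0.1")],
)

first = apply_config(config, state, controllers)   # every controller acts
second = apply_config(config, state, controllers)  # already converged, nothing to do
```

Applying the same configuration twice changes nothing the second time, which is the property that makes a controlled, repeatable rollout possible.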

CRAIG BOX: How do we deal, then, with the bootstrapping of the system, where some of the things that you're talking about are setting hostnames and network configuration, and in order to get an API server up that you're able to then send that machine configuration to, you have to do those things, at least in some fashion. How do we solve that chicken and egg problem?

ANDREW RYNHARD: We have sort of a two-pronged approach to this. In the very early boot stages, we try to do DHCP, and just get what we can, get enough of a networking stack up and running so that we can pull a configuration file from the cloud provider or from somewhere hosted within your data center.

If that's not possible, you can supply that via kernel args as well. You can say, set up this interface for me in such and such a way. And Talos will honor that and get that up and running and pull its configuration file.

This is really just the first time Talos boots. What we then do is we just honor the configuration file going forward from there. And so DHCP could play a very, very small part in your bootstrapping process, but it doesn't necessarily need to.
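To make the kernel-argument path concrete, here is a small Python sketch of reading a configuration URL off a kernel command line. The parameter names `talos.config` and `ip`, the example URL, and both helper functions are assumptions chosen for illustration; the exact parameters Talos honors should be checked against its documentation.

```python
import shlex


def parse_kernel_args(cmdline):
    """Split a kernel command line into key -> value; bare flags map to None."""
    args = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")  # split on the first "=" only
        args[key] = value if sep else None
    return args


def config_source(cmdline):
    """Return the configuration URL named on the command line, or None."""
    return parse_kernel_args(cmdline).get("talos.config")


# Hypothetical command line: a static network setup plus a configuration URL.
cmdline = (
    "console=ttyS0 "
    "ip=10.0.0.5::10.0.0.1:255.255.255.0::eth0:off "
    "talos.config=https://config.example.internal/node-1.yaml"
)

url = config_source(cmdline)
```

In this sketch, a machine with no DHCP server available would still know both how to bring up its interface and where to fetch its configuration from.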

CRAIG BOX: The project started as a Kubernetes operating system. Is it useful for people who are not running Kubernetes? Or are the two projects tightly coupled?

ANDREW RYNHARD: Right now, they are tightly coupled. We set out in the beginning to solve this problem of Kubernetes. It is the de facto distributed system or orchestration tool right now, whatever you want to call it. And so we're solving that problem right now. But we are in the process of breaking apart Talos into two distinct pieces, one of them being called COSI. It's called the Common Operating System Interface. In the same way that we have CNI and CRI and CSI, we want to provide an API to the operating system.

And so we're breaking apart everything into this controller-based operating system. And then Talos could just become pure upstream COSI plus a Kubernetes plugin. And then substitute the Kubernetes plugin for whatever distributed system or tooling or orchestration tooling that you want, Nomad from HashiCorp, for example, and write your own plugin. It interacts over the COSI APIs, and now it is a Nomad-specific operating system. But currently today, as it stands, we are very tightly coupled with Kubernetes.

CRAIG BOX: All of the other deployment systems of Kubernetes basically boil down to, SSH into a machine, or have the machine pull some kind of program which starts the process of building Kubernetes. Talos, of course, as an immutable system, with this machine configuration it's going to have to do something different. How do I go from zero to Kubernetes?

ANDREW RYNHARD: There's really two approaches to this. In places like the cloud, we have a metadata endpoint from which we can pull the configuration file. And when you run Talos in the cloud, that's how it operates. You just basically set the user data, and that drives the state of the machine.

We used to have the notion of an init node. It's something that we're moving away from. And really, it is just some setting within the configuration file which says, you are a machine that's responsible for bootstrapping etcd. We're kind of going away from that. We also have an API. It's called talosctl bootstrap. And it is responsible for initiating the bootstrapping of etcd.

The second approach is, instead of it pulling it from the cloud provider or from some HTTP endpoint, Talos can actually spin up in what's called Maintenance mode. It realizes that there's no current installation of Talos, and I have no configuration file. It's going to sit there and wait for its configuration file to be pushed into it. And so that's another way that you could bootstrap Talos.
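The two approaches can be summarized as a small decision function. This is a simplified Python sketch rather than the actual boot logic: the `BootMode` names and the two inputs (`installed`, `config_url`) are hypothetical, and a real boot sequence has more states than this.

```python
from enum import Enum


class BootMode(Enum):
    RUN_INSTALLED = "run the installed system"
    FETCH_CONFIG = "pull the configuration from an endpoint"
    MAINTENANCE = "wait for a configuration to be pushed"


def choose_boot_mode(installed, config_url):
    """Pick a boot path from what the machine knows at startup."""
    if installed:
        return BootMode.RUN_INSTALLED
    if config_url:
        # Cloud metadata endpoint or some other HTTP location.
        return BootMode.FETCH_CONFIG
    # No installation and nowhere to pull from: sit and wait.
    return BootMode.MAINTENANCE


cloud_first_boot = choose_boot_mode(False, "http://metadata.example/user-data")
bare_metal_first_boot = choose_boot_mode(False, None)
later_boot = choose_boot_mode(True, None)
```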

CRAIG BOX: So if I push a configuration file to a machine that says, I would now like you to be node number one of the Kubernetes cluster, what's it going to do with that?

ANDREW RYNHARD: If it is a control plane node, it's going to start up etcd. And it's going to try to discover any other peers via the Kubernetes API. We actually pull the Kubernetes service endpoints, which end up being the node IPs anyways.

And we pull those. And so we can discover the other peers that need to be part of the etcd cluster. We bootstrap etcd. We pull all the pure upstream Kubernetes components, and we start and run them. Very early on in the process, the API is up and running. And you can get all kinds of information if things don't run, for example static pods start messing up, or whatever.

CRAIG BOX: And all these things are running as containers?

ANDREW RYNHARD: Yes. All of it runs as containers except for Talos. There are a few components to Talos that actually run in containers. But we're moving more towards a single binary to preserve resources. But yes, we run all of the upstream Kubernetes containers.

Even the kubelet itself, which is not upstream, we maintain it. But we do pull pure upstream kubelet, and we run that in a container as well.

CRAIG BOX: So how small does that mean you can make the core operating system?

ANDREW RYNHARD: The core operating system sits around 50 megabytes. It's a Squashfs. And everything else is pulled.

CRAIG BOX: Another system that people talk about when they talk about small Kubernetes is k3s. They have a different approach, which is basically compiling all the things that run on top of the operating system into a single binary, and then you can run that anywhere. They do have, as a sort of a side project, a k3OS.

But if you think about the k3s model, that single binary versus how things are done in Talos, can you sort of contrast those two ideas for me?

ANDREW RYNHARD: I'd probably approach it from looking at it from a k3OS point of view. They don't use systemd. They use some other init system. But you still have SSH. You still have Bash. You still have all of these traditional tools that you would use. They've slimmed down Kubernetes.

We've gone the opposite. We've slimmed down the OS and give you pure upstream Kubernetes. And we actually have a blog on this in-- we did some comparisons in resource usage.

The differences were practically negligible in resource usage-- CPU and memory. I think the place where the case may be made more for k3s is that it is smaller. It does bootstrap a little bit quicker. But Talos can still bootstrap a full-blown vanilla Kubernetes cluster within a couple of minutes. That's pretty fast, faster than most cloud providers.

Yeah, we've really taken the approach of slimming down the operating system and delivering unadulterated, pure, vanilla upstream Kubernetes.

CRAIG BOX: You've said a few times that there's no shell, and there's no SSH. How do I debug a system like that?

ANDREW RYNHARD: I asked the same question. When I created an immutable operating system that was just enough to run Kubernetes, I remember very vividly just having the kubelet running, and I had no idea why it was failing to run. Because it was literally just a very small PID 1 running the kubelet.

And so I said, well, I'm kind of in a situation here. I don't want humans to hop onto it. It's immutable. So really, if they could, what would they do? And I also want to keep this super, super small. So I don't want to add all these utilities.

What can I possibly do? And so that was when I decided to put an API in front of the system. Really, at the end of the day, we just need this information to be able to figure out what's going on.

I remember the first time we went to KubeCon, I was kind of nervous about telling people that we don't have Bash and SSH, because they were going to get their pitchforks and drive us out of the hall. But actually, people were very happy about this idea.

And for the people that weren't and kind of had trouble accepting this, I just asked them, you know, what are you trying to get when you SSH on the machine? Well, I need to get routes. I need to be able to look at traffic. This and this and that.

Well, if it's over an API, does it really matter whether it's over SSH or whatnot? Oh, I guess not. And so they kind of settled in and said, yeah, I could live with that. And so that's where the API was born.

CRAIG BOX: One of the things that people might need access for is to debug the debugger, effectively. You've said there that you've got your kubelet running, and what do you do in that case?

Then, effectively you'd need to have something that has access to the kernel. And the modern answer for that, of course, is eBPF. Is there a method to get straight to the kernel with your API to be able to debug when things aren't working at a very low level?

ANDREW RYNHARD: Not currently. Not at the kernel level. The best that we could really do is something like dmesg, getting the kernel logs. And that actually goes pretty far.

But we are working on finding out a use case for eBPF. And that is part of the COSI project itself as well.

CRAIG BOX: You could effectively look at the serial port output from the machine. You can have your dmesg output pipe to that.

ANDREW RYNHARD: Right. Exactly. We even have an API for grabbing the kernel logs.

CRAIG BOX: Kubernetes itself doesn't have a concept of users. It sort of inherits them from an external system. Would it make sense to synchronize these two and have users on the Linux environment relate to users in the Kubernetes environment?

ANDREW RYNHARD: I have never really thought of that, if I'm being honest. And my initial reaction to this is no. Because I am very much of the mind state that we should not be concerned with the nodes anymore. We really shouldn't.

They are simply more compute, more resources, to a much bigger computer that is spread across multiple physical machines. And that is the angle that we should take when we think of these distributed systems, I think. The more that we have to worry ourselves--

CRAIG BOX: When you think about the fact that that has become the case, Kubernetes had to inherit a lot of things with the idea that it would run on top of more traditional Unix-like systems. Are there changes that you think will come in Kubernetes as the operating system diminishes and becomes more minimal?

ANDREW RYNHARD: Absolutely. Just to give you an example, the kubelet itself actually shells out or forks out to actual Unix utilities, like find in some cases, NFS, stuff like this. And so it does make some assumptions that it has access to Linux distributions.

Even in some cases, there are container storage interfaces which actually nsenter into PID 1 and actually try to install packages. They look to see if they can use Yum, APT, whatever. And they actually try to install utilities that are required by that container storage interface.

And I think this whole way of thinking will go away as we get these types of operating systems such as Talos.

CRAIG BOX: Is there anything saying that a Linux system needs to have things like users and-- we're doing these things because of this tradition and going back all the way to the foundations of Unix back in the '60s. How do we react to this from the ground up to provide only what it needs, instead of retrofitting one thing onto the other?

Are we just putting APIs on top of this layer cake of things? Or is there a point where we should be re-architecting everything?

ANDREW RYNHARD: Yes. We should be re-architecting everything. And that is exactly what we're doing with Talos. Talos actually doesn't run systemd. We run an init system that we wrote called machined. And it actually has the API.

So we've re-imagined everything from the ground up. There is very little traditional user space within Talos. I think the only thing that we really run that's traditional is udev.

We tried writing udev in Go, but that became a mess very, very quick. It was fun. Learned a lot. But not something we want to do. We run udev, and I think that that's about the only user space that is from the traditional operating system-- or Linux distribution.

CRAIG BOX: Are there parts of the kernel space that you would change if you could?

ANDREW RYNHARD: Yes. I think /proc. It's a minefield. There's so many things that you can get from it and do with it.

Imagine being able to have RBAC-based API for specific things under proc-- not just necessarily mounting up proc and being root user, but you actually have different things under proc which are exposed as an API, and you can now have RBAC. In the same way that we have the verbs in Kubernetes, you can list them, get them, modify them. I think that would be a really powerful system.
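The idea of Kubernetes-style verbs over `/proc` can be sketched in a few lines of Python. Nothing like this exists in the kernel or, as far as this conversation says, in Talos today; the role names, the rule table, and `is_allowed` are all invented to show what such a check might look like.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: role -> list of (path pattern, allowed verbs).
RULES = {
    "monitoring": [
        ("/proc/meminfo", {"get"}),
        ("/proc/net/*", {"get", "list"}),
    ],
    "network-admin": [
        ("/proc/sys/net/*", {"get", "list", "modify"}),
    ],
}


def is_allowed(role, verb, path):
    """Return True if any rule for the role grants the verb on the path."""
    return any(
        verb in verbs and fnmatchcase(path, pattern)
        for pattern, verbs in RULES.get(role, [])
    )


can_read_memory = is_allowed("monitoring", "get", "/proc/meminfo")
can_change_memory = is_allowed("monitoring", "modify", "/proc/meminfo")
can_enable_forwarding = is_allowed(
    "network-admin", "modify", "/proc/sys/net/ipv4/ip_forward"
)
```

The contrast with mounting `/proc` as root is that each caller only ever sees the verbs and paths its role was granted.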

But other than that, I'm really excited about some of the work with eBPF. We're trying to find ways to work that in. We're actually using it in a more unique way. We're actually writing eBPF programs to get hooks within the kernel, events like a disk was discovered.

So I plug in a USB stick, I can actually publish this as an event within the COSI runtime. And then now I can have a controller which watches for this disk to be mounted. And maybe it matches some rule that says, when a disk of this type is mounted, format it and mount it up in this place. Or maybe when an ACPI event happens, or someone pushes a power button, or the whatever. We can actually publish this as events in the same way that we have events in Kubernetes.

Look at Kubernetes. The thing that really makes it powerful is that you're able to write these really powerful controllers that watch for very specific things, and they can react to a live, moving system. We're trying to bring that down to the kernel. And so eBPF is really exciting to us because, again, we can get hooks into the kernel and publish what is happening in the kernel as events. And you can now have controllers that can react off of these things instead of probing the system.
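The publish-and-react pattern described here can be sketched without any eBPF at all. In the Python below, `EventBus`, the event kinds, the attribute names, and the mount path are hypothetical; in the design being discussed, the events would come from hooks in the kernel rather than from a direct `publish` call.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str         # e.g. "disk.discovered"
    attributes: dict  # free-form details about what happened


class EventBus:
    """Routes published events to the controllers subscribed to that kind."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, kind, handler):
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event):
        """Deliver the event and collect each controller's decision."""
        return [handler(event) for handler in self._handlers.get(event.kind, [])]


def disk_controller(event):
    """React to a newly discovered disk according to a simple rule."""
    device = event.attributes["device"]
    if event.attributes.get("bus") == "usb":
        return f"reject {device}"
    return f"format and mount {device} at /var/mnt/data"


bus = EventBus()
bus.subscribe("disk.discovered", disk_controller)

usb_result = bus.publish(Event("disk.discovered", {"device": "/dev/sdb", "bus": "usb"}))
nvme_result = bus.publish(Event("disk.discovered", {"device": "/dev/nvme1n1", "bus": "pcie"}))
ignored = bus.publish(Event("acpi.power_button", {}))  # no controller subscribed
```

The controller never polls the system; it only runs when an event arrives, which is the difference from probing that Andrew points to.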

CRAIG BOX: Could I do it with admission control on USB sticks, for example?

ANDREW RYNHARD: Yeah, that could be it. Maybe you can have something that says, nope, I don't want to allow any USB sticks. I'm going to automatically unmount it, or something like that.

CRAIG BOX: You were working on Talos back in 2017, 2018. Talos Systems, the company, was announced in June 2019. Talk me through how that came about.

ANDREW RYNHARD: I remember. It was a Thursday evening. I was probably a couple of years into developing Talos. I wouldn't say I was tired. I was just ready to see if the project could be successful or not.

And so I said, you know what, I'm going to share this with the world. And so it was late evening on a Thursday night. And I said, if this doesn't go, I'm going to wake up, and I'm going to say, OK, I'm done.

Because this was a lot of work. I learned a lot about Linux, yes, great. I'll be a better operations engineer because of it.

CRAIG BOX: It's just a hobby. It won't be big and professional like Linux.

ANDREW RYNHARD: [LAUGHS] Nothing. I remember posting it to Reddit. I basically gave people a little bit of background on why I did it. Went to bed. I woke up.

And all of a sudden, the post was on the front page of Hacker News. I have people writing to me, emailing me, asking me. It got 500 stars overnight on GitHub.

This was like every open source developer's dream to wake up, and you have all of a sudden 500 stars. Wow, that's the clout that we have, right?

CRAIG BOX: People are paying attention.

ANDREW RYNHARD: Yes, exactly. And so I felt the excitement around the project. Our CEO Steve Francis, he put me in touch with a guy named Said Ziouani. And Said actually helped found Ansible, which eventually got bought by Red Hat. And Said put me in touch with Tim Gerla.

We all got together, got a small amount of money put together for seed funding. And that's where the project was founded. We thought at the time, you know, CoreOS was on its way out. People weren't very happy with what happened with that project. And so we saw an opportunity. We raised $1 million to start with and started the company.

CRAIG BOX: How many employees did the company have at the time? Was it just keeping you afloat, or were you able to bring people on at that point?

ANDREW RYNHARD: At the time, it was just two engineers and our CEO.

CRAIG BOX: And you mentioned there both Tim Gerla and Steve Francis. Tim was the founding CEO, and Steve the CEO now. You also worked for him at LogicMonitor in the past, I understand.

ANDREW RYNHARD: That's correct. Steve is actually-- he's the one who has helped me really get to where I'm at. He's given me opportunities. We met each other on the jiu jitsu mats. We do Brazilian jiu jitsu together, and we would beat each other up.

CRAIG BOX: Mm-hmm?

ANDREW RYNHARD: We got to talking. I remember at the time, I was very, very broke and living in Santa Barbara, and really struggling. And I told him, you know, I don't have any education in this, or at least official education. I'm just completely self-driven.

And he's like, well, let me get you in touch with my operations team. I got a bunch of interviews, ended up doing well, and got the job. And that's where I started Kubernetes.

CRAIG BOX: You must have made a definite impression on him, because he followed you to this next venture.

ANDREW RYNHARD: Yes, I like to think that I did.

CRAIG BOX: Hopefully not through breaking any bones.

ANDREW RYNHARD: No. Maybe us choking each other out every now and then. But that's about it.

CRAIG BOX: Is the jiu jitsu something you've been able to keep up?

ANDREW RYNHARD: Oh, yeah, absolutely. That is my therapy. That is what helps me in my leadership skills, I very strongly believe. That is, you become a better martial artist, you become a better leader. I'm not worried about confrontation. I'm not worried about asserting dominance. Things like this, I think martial arts is a really great teacher. And it's very useful for leaders.

One of the things I've learned from jiu jitsu is to just kind of leave your ego at the door. You step through those doors, and you're rolling with every type of personality that you can think of, from CEOs to gangsters. And everyone is equal on those mats. Everyone is equal. The math nerd could be choking you out. And you could turn around and choke out a gangster.

It's the great equalizer. And so you really have to leave your ego at the door. That's something that I try to bring into my leadership.

CRAIG BOX: You obviously got the background as a technical leader building the project out. How much did you want to be involved in the company leadership?

ANDREW RYNHARD: Not at all, actually. I'm very much a person who likes to work behind the scenes. And I was very shy growing up. I can't stand talking in front of big audiences and this and that. I just want to be left alone to do my jiu jitsu and work on my technology.

I'm actually starting to really like the role that I'm learning. It's challenging me to become a different person, a better person than I was before I started this-- much more outgoing, able to talk to people better. And so just really picking up those soft skills, which, I think, translates into all kinds of things in life. I'm really thankful that I've gotten the opportunity to learn. And now I'd say I'm pretty excited about doing it.

CRAIG BOX: You work in a fully remote team. That will have come in quite handy over the course of the last couple of years.

ANDREW RYNHARD: Yeah. We actually made the decision to be fully remote even before COVID. And it just happened to work out very well for us. We saw, especially at Steve's previous company, LogicMonitor-- we're in Santa Barbara, and so finding talent was a little bit difficult. Santa Barbara is a little bit smaller.

And so it just made sense. There's so much talent out there in the world, and we didn't want to be constrained to the city that we live in. Our team has been fully remote even before all of this pandemic.

CRAIG BOX: Having built this out as an open-source project, having got some attention on the internet beforehand, how did you find the initial team of engineers?

ANDREW RYNHARD: So the initial team was me and a buddy that were working on Kubernetes at our job together. And I just said, hey, do you want to start this company with me? And he said yeah, after a little bit of convincing.

CRAIG BOX: You didn't have to choke him out or anything?

ANDREW RYNHARD: No. [LAUGHS] Just me and Steve. That's how me and Steve hash out our problems. But everyone else, yeah, they don't do jiu jitsu. So we don't subject them to that.

CRAIG BOX: You've got a good technical foundation for the system here. What was the business case that you made to people at that early stage?

ANDREW RYNHARD: I don't think that I personally had a really good business case. I think the technology really speaks for itself a lot. And that is something that we're still developing, if I'm being honest.

The business case, though, as we see it today, really is getting back to the original problems I was trying to solve-- less maintenance, less overhead to maintain Kubernetes clusters, being able to run the cluster consistently no matter where you run. Talos runs on GCP, AWS, DigitalOcean, Azure, on prem, Raspberry Pis. And it's the same exact image. It does not get modified in any way, other than some maybe kernel args that tell it that it's running in AWS or GCP.

And so you get a very, very strong consistency story. And so you can manage Kubernetes clusters the same way, regardless of where you want to run them. And you don't have as much operational overhead. And so that's the business case as we see it.
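
As a rough sketch of how one unmodified image could learn where it is running from kernel arguments alone, the snippet below parses a kernel command line string and reads a platform key. The key name `talos.platform` and the `metal` default are assumptions made for this example, and the parser is deliberately simple: it does not handle quoted values that contain spaces.

```python
from typing import Dict


def parse_kernel_args(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into key/value pairs.

    Flags without '=' (e.g. 'quiet') map to an empty string.
    """
    args: Dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else ""
    return args


def detect_platform(cmdline: str, default: str = "metal") -> str:
    """Return the platform named on the command line, or the default.

    The 'talos.platform' key is an assumption for illustration.
    """
    return parse_kernel_args(cmdline).get("talos.platform", default)


# The same image, booted with different arguments:
print(detect_platform("console=ttyS0 talos.platform=aws"))
print(detect_platform("console=tty0 quiet"))
```

On a running Linux machine the string would typically be read from `/proc/cmdline`; it is passed in here so the example stays self-contained.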

CRAIG BOX: A lot of people consume Kubernetes by going to a cloud vendor-- GCP, different cloud-- picking their automatic Kubernetes deployment and management service. And then they kind of get whatever operating system comes with it. Each of those vendors has their own operating system of choice. And not all of them allow changing the operating system underneath. How are you going to get to those people without going through those services?

ANDREW RYNHARD: I think it goes back to consistency. There's actually a lot of data out there that suggests that people are running Kubernetes not just in one place. They're actually running it on prem, they're running it in the cloud, and in some cases in multiple clouds.

Having different tooling to set that all up, having different operating systems under the hood means that there's a lot of movement. There's a lot of room for error. There's a lot of variation.

Going back to even just the idea about Talos being immutable, you need consistency across your clusters, and especially what your cluster is sitting on top of. And so when you have this consistency, I think that it just makes a better story, all in all, for everyone involved-- security engineers, operations engineers. It's a simpler mental model for everybody.

And so I think for the people that are running Kubernetes in multiple places, which is more often the case than not, you're going to want consistency. And so that's where Talos comes into play.

CRAIG BOX: Some vendors distribute a node image that goes along with their Kubernetes environment. Others launch with a bring-your-own-node system. It's possible to think of the bring-your-own-node as being a quicker time-to-market decision. But do you think users should have that level of control? Do you think the node should be something that is provided separately from the control planes that run the system?

ANDREW RYNHARD: I think everyone's going to always want that level of control. They like to have that option. But again, going back to our personal philosophy at Talos is, we should stop worrying about the node.

Again, don't worry about the node. We're worried about the cluster, and the behavior of the cluster, and the security of the cluster, and how it's configured. We shouldn't care about the nodes anymore. They should be simply looked at as more compute into a larger system.

And so when you come from that angle, I shouldn't really have to care about the operating system. But again, there are still container storage interfaces that are worried about the operating system. And so until we get away from this, until the tooling actually stops making assumptions that we're running on these more traditional-style Linux distributions, we're going to have to have those options around. Because, again, different container storage interfaces may not be approved on some bespoke operating systems.

CRAIG BOX: Do you think, for vendors who are still in the mindset of saying this particular driver is certified against these enterprise kernel versions or operating systems, do you think that's something that you'll have to go and do through partnership, or do you think the model will naturally resolve itself over time or through some of these interfaces?

ANDREW RYNHARD: I don't see that changing from the traditional way anytime soon. These big players like Dell and HP and all of that, I think they're still going to want these, this is an operating system that's certified with our systems. I don't see that going away anytime soon at all.

In fact, we have seen people ask if we're certified, because they're afraid to run us because it would void any kind of warranty from their vendor. So I think that's still a problem. And that's probably going to be around for a while.

CRAIG BOX: The first principle listed on the Talos website is that Talos is intended to be operated in a distributed manner. Is there ever a use case for running it on a single machine?

ANDREW RYNHARD: Yes. So in the beginning stages, Talos was very much designed-- our whole thing was, we want to give you production quality HA Kubernetes clusters. And very quickly, when people start seeing, wow, it's only 50 megabytes? I could fit this thing on a Raspberry Pi easily. I love this idea.

So they wanted to run it on these smaller machines, and they wanted to run it in single-node configurations. And so we put a lot of work in making it capable of running in single-node configuration. So that perfectly well works today.

CRAIG BOX: Back at KubeCon, you announced the COSI project, which you talked a little bit about. Can you tell me about COSI and why you wanted to standardize this interface?

ANDREW RYNHARD: Really, we're trying to ask ourselves, how can we make API-driven Linux the de facto standard? That always comes down to having something that the industry agrees on, some specification. I remember when containers were just a Docker thing. And then OCI came along, and now all of a sudden we're using OCI for all kinds of things.

Our motivation with COSI is to really standardize how we look at an API-driven Linux distribution, so that we don't have what we have today, where we have Yum, we have DNF, we have Pacman, we have APT. We just want to give people one consistent way of doing an operating system, not have this sprawl that we have currently.

And so if we can agree on a core set of APIs, then sure, your particular flavor of this API-driven Linux distribution may have, maybe, an extra set of APIs that may be specific to you. But the core of it should be common enough. We're all going to need the ability to list files or get how much disk space we have left. All of these things are going to be common.

And if we have an API that's consistent, then that just means more interoperability. I can use this with that and that with this. And I think that's just better for our industry, really. We're not trying to make Talos its own proprietary thing. Everything that we do is 100% open source. And so we're very much working in the spirit of, how do we just do better for our industry?

We're developing this core set of APIs that we're hoping other people who want to develop systems such as Talos can agree with us on.
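
To make the "core set of APIs" idea concrete, here is a minimal sketch of what a shared interface for the two operations mentioned above (listing files and checking free disk space) might look like. It is not the COSI specification: `CoreNodeAPI`, `LocalNode`, and `report` are hypothetical names, and a real API would presumably be served over the network rather than called in-process.

```python
import os
import shutil
from typing import List, Protocol


class CoreNodeAPI(Protocol):
    """Hypothetical common surface every API-driven OS would expose."""

    def list_files(self, path: str) -> List[str]: ...

    def free_disk_bytes(self, path: str) -> int: ...


class LocalNode:
    """One possible implementation, backed by the local filesystem."""

    def list_files(self, path: str) -> List[str]:
        return sorted(os.listdir(path))

    def free_disk_bytes(self, path: str) -> int:
        return shutil.disk_usage(path).free


def report(node: CoreNodeAPI, path: str) -> str:
    """Tooling written against the interface works with any implementation."""
    entries = node.list_files(path)
    free = node.free_disk_bytes(path)
    return f"{len(entries)} entries, {free} bytes free"


print(report(LocalNode(), "."))
```

A distribution could add its own extra methods on top, while tools such as `report` depend only on the shared core.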

CRAIG BOX: Could you imagine that being a binary that just gets bundled in with the kernel on the base layer by someone like Red Hat as part of their operating system?

ANDREW RYNHARD: Yeah. And that actually was another motivation. What if we can have the APIs that we currently have with Talos and all of the things that we love about it, but we can actually just throw that onto traditional operating systems and all of a sudden you're good? What if you could take upstream Ubuntu, put it on a SquashFS, throw in COSI binaries, and now you have an API in front of it?

That could be really great for us in particular, too. There's people that want to run Ubuntu because they want to run Ubuntu, but they love the idea of an API-driven operating system. And so COSI could enable that in theory. Just run it as a container.

If you really want to get it like Talos, throw away SSH keys and uninstall Bash, if you really want to make it as close to that as possible. We go the extra mile. But in theory, you can get close to it.

CRAIG BOX: Finally, you were talking to me today on a quite intense audio setup. Do you want to plug your SoundCloud?

ANDREW RYNHARD: [LAUGHS] I don't have a SoundCloud. I'm not sure I want to get into doing podcasts. But again, being that person who doesn't love the public eye-- but I'm learning to. It's really challenging for me to get into that mindset.

So I'm at least prepared for it. I bought myself a nice microphone. I've been into music ever since I was young. I'm an audiophile. I love high quality sound.

And so yeah, I got myself a Blue Baby Bottle. I'm running this all through Logic Pro right now. And it's all mixed from my voice, equalizers, compressors, noise gates, all these different things. I'm kind of a nerd when it comes to audio.

CRAIG BOX: Do you have a music production background? Do you play instruments?

ANDREW RYNHARD: I actually used to make rap beats for people in the Bay Area. And I would mix them and produce them for people.

CRAIG BOX: Wow.

ANDREW RYNHARD: That was my passion when I was younger.

CRAIG BOX: I have been challenged to a rap battle by one member of the Kubernetes community. He will be listening. He will know who he is. And unfortunately, I just didn't have the time. But if a backing track was to arrive, I would be hard pressed to say no at some point in the future when the moment takes me.

ANDREW RYNHARD: Well, if you guys need a beat, maybe I can make one. It's been a while, but I could dust off the keyboards.

CRAIG BOX: Well, that would be fantastic. Thank you very much for joining us today, Andrew.

ANDREW RYNHARD: Thank you for having me.

CRAIG BOX: You can find Andrew on Twitter @andrewrynhard. You can find Talos at talos.dev. And you can find Talos Systems at talos-systems.com.

[MUSIC PLAYING]

CRAIG BOX: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @kubernetespod or reach us by email at kubernetespodcast@google.com.

JIMMY MOORE: You can also check out our website at kubernetespodcast.com, where you'll find transcripts and show notes, as well as links to subscribe. Until next time, take care.

CRAIG BOX: See you later.

[THEME MUSIC]
