#142 March 17, 2021

Tinkerbell, with Gianluca Arbezzano

Hosts: Craig Box, Vic Iglesias

If you’d like something more tangible than a virtual cloud instance, there’s always (still!) bare metal. Tinkerbell is a project from Equinix Metal to manage bare metal servers at scale, and Gianluca Arbezzano is one of its maintainers. We talk stacks, racks and MACs.

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box with my very special guest host, Vic Iglesias.

[MUSIC PLAYING]

CRAIG BOX: Welcome back Vic, how are you doing?

VIC IGLESIAS: Doing really well.

CRAIG BOX: It's been a long time since we've had you on the show. It was July 2018, episode number 11. Can't go too much further back than that.

VIC IGLESIAS: Seems like things have changed a bit in our world.

CRAIG BOX: It's hard to tell, it all just feels the same to me. You're in Santa Barbara, I have to ask you-- have you seen any members of the British royal family around?

VIC IGLESIAS: You know, right now, given how things are going, I've kind of tried to avoid those celebrities, in particular, but we have had some nice weather, so I've had occasion to get out and about, but no interactions yet, I'll say.

CRAIG BOX: Do you often see people out hunting for famous people?

VIC IGLESIAS: I would say yes. What I've been doing more in the hunting space is looking for Pokémon via the Pokémon GO app, which has been a good activity to get ourselves outside. And then, when we're inside-- which we've been doing a lot this year, as you can imagine-- we're playing the Pokémon card game which also gives us that experience, which has been nice.

CRAIG BOX: Have the people who make Pokémon GO made any affordances for social distancing?

VIC IGLESIAS: Yes, absolutely. You used to have to walk around and go to places a lot more, which obviously now you don't want to encourage. So you've had the ability to do remote battles and things like that, and battle people that are not exactly next to you or within your area. So, that's been nice to have a little bit more ability to engage with it while we're a bit sedentary.

CRAIG BOX: I know it's always sunny in California, but why so warm at this time of year?

VIC IGLESIAS: Well you see, I'm in Santa Barbara, California. We have a band of temperatures basically between 68 and 72 degrees all the time. So, really, being outdoors is kind of the obvious thing to do every day, but in our pandemic times, it's been a little harder to find that time to go out into the wilderness where we should be kind of distancing from folks.

So we've been doing a lot more in our homes, which maybe for my kids has been Pokémon GO, but for myself, I've been doing a little bit more video gaming, been on Google Stadia quite a bit earlier in the pandemic, recently got an Xbox and really just enjoying playing soccer and not getting tired by being in a virtual world.

CRAIG BOX: Good, well, I think that's a good summary of everything that's happened since July 2018, so let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: GitOps project Flux has been promoted from the CNCF sandbox to the incubation phase this week. Flux has 14 maintainers from five organizations, with over 1,800 contributors to the project. It's currently undergoing a rewrite to use the Kubernetes extensibility APIs with a V2 due out in the coming months. The CNCF now has 20 projects in incubation, with 15 having graduated.

VIC IGLESIAS: NetApp has expanded their storage platform into Kubernetes with Astra, a fully-managed, application-aware data management service. Astra supports data protection via snapshots, disaster recovery with remote backups, and portability and migration with active clones. Its service is GA on GCP, with other clouds and on-prem to follow soon.

CRAIG BOX: Fairwinds has released Saffire-- with two Fs-- an open-source project to solve a very particular problem that you suspect must have bit them very hard once upon a time. Saffire is aware of container registries that host the same content and if an image can't be pulled from a registry, it will patch the Kubernetes object to suggest a different one. This allows you to use multiple registries without having to run a local pull through cache.

VIC IGLESIAS: You should, of course, make sure all the images in your registries are signed. And Dan Lorenc, container security guy and guest of episode 39, wants to help you. He's built a very simple tool called Cosign, which lets you store signatures next to the containers in a registry, with the signing performed as a step in your CI process. While somewhat of a proof of concept, the tool forms part of a larger Linux Foundation project to improve the security of your software supply chain.

CRAIG BOX: Komodor, with a K, is a new platform for troubleshooting applications running on Kubernetes. It's built to show how deploys or other changes affect an overall system, showing changes and alerts on the timeline, which help to identify what may have triggered an alert. The platform has entered open beta this week, although gated behind a form, and they are sending out swag to people providing product feedback.

VIC IGLESIAS: Oracle has announced fully private Kubernetes clusters in Oracle Cloud Infrastructure container engine for Kubernetes-- a service with two acronyms and four too many words. Nodes and load balancers could already be part of your private cloud network, and in this release, which is generally available this week, it adds the ability for the Kubernetes API endpoint to be on the private network, as well.

CRAIG BOX: Linkerd, the service mesh that really wants you to know that its diet is totally paying off, has launched version 2.10. Many control plane components have moved to extensions, meaning you can run it on as little as 200 megabytes, in case you're dialing in from a time before the JVM and you think that's a lot of memory. Other enhancements include support for TCP and multi-cluster services, and marking ports as exempt from protocol detection.

VIC IGLESIAS: And now, the Kubernetes Podcast is proud to present the Money Section, with soundtrack contributed by our guest from episode 127, David Pait.

[BACKGROUND MUSIC STARTS]

CRAIG BOX: Talking of diets, the slimmed down Docker, focusing entirely on developer teams, has taken $23 million in Series B funding. Pay no attention to the fact that they were up to Series E before moving the enterprise business to Mirantis in 2019. Docker stated their annual recurring revenue has grown 170% year on year. The round was led by Tribe Capital.

VIC IGLESIAS: Cloud Native security vendor Aqua has raised $135 million in a Series E round at a $1 billion valuation led by ION Crossover Partners. Aqua values their open source credentials, saying that adoption has more than doubled in 2020, with Trivy, their vulnerability scanner, selected as the default scanner for Harbor Registry, GitLab, and the CNCF's own Artifact Hub.

CRAIG BOX: Two weeks ago, we talked to Kamil Potrec from Snyk, and in what must, of course, be a complete coincidence, Snyk raised a whopping $300 million in their Series E round, valuing the company at $4.7 billion. Led by Accel and Tiger Global, and with 13 other funds participating, Snyk says that more than 27 million developers use their tools worldwide.

VIC IGLESIAS: Tetrate, the networking company co-founded by Istio's first PM, Varun Talwar, has raised a $40 million Series B round. They will use the money to improve their service bridge platform and hire go-to-market teams worldwide. Tetrate's round was led by Sapphire Ventures.

[BACKGROUND MUSIC ENDS]

CRAIG BOX: Finally, last week we spoke to Daniel Mangum from the Crossplane team, who explained his love of compilers and the lower levels of computing. If that wasn't proof enough, he wrote a blog post comparing Crossplane to the LLVM compiler suite and how it relates to the Kubernetes cluster API. Check it out if you like visual aids with your podcasts.

VIC IGLESIAS: And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Gianluca Arbezzano is a principal engineer at Equinix Metal. He is a maintainer on the Tinkerbell project, a CNCF ambassador, and a Docker Captain. Welcome to the show, Gianluca.

GIANLUCA ARBEZZANO: Thank you for having me.

CRAIG BOX: You got into tech through that venerable web technology, PHP. Tell me about that.

GIANLUCA ARBEZZANO: Yeah, I presume that's how a lot of people started. A lot of my friends and people around me were looking for a website and I was spending a good amount of time with that sort of stuff, so I thought, maybe I can do something. So I just started with PHP. I did some WordPress websites and from there it was a lot of fun, so I kept digging into the topic. And I was studying computer programming at high school, so I was already becoming a developer. That's how I got started.

At some point, I thought maybe it's time to learn a language that is a compiled one, because I was looking for new challenges. I picked up Go, because back then, it was popular and Docker was on the rise, HashiCorp was using Go, so I thought, maybe I want to use those tools, so why not?

CRAIG BOX: PHP gets given a bit of a bad time but it was, like you say, the way a lot of people got into technology. Do you think that, as a modern programming language, there's still room for it?

GIANLUCA ARBEZZANO: I think so. I also have to say that I didn't really look much at web development since I stopped developing with PHP. So I know there are way more languages, like Elixir, that are really popular, and obviously all the JavaScript frameworks and libraries that you can install on your laptop can do the same.

So it's probably biased by me not having much experience in that field for something that is not PHP. But I think so, and also the language changed so drastically, because I know they added types and strict checks. It's a different language compared with the one I used, so I can't really tell.

CRAIG BOX: The getting started experience was very simple, though. You'd just upload a file, you'd have HTML and PHP code all in the same file, you'd drop it on a web server, and you were live. I think that some of that's lost with the whole art of compiling and having to deal with runtimes and things. Do you think there's an easy way for younger people, perhaps, to get started with programming?

GIANLUCA ARBEZZANO: I think it's still a good way to start, and as you pointed out, if you want to resolve a website or a file on the internet, you upload it, and you're good to go. You don't need to think too much about all the compilation and the worst case scenarios, or all the troubles that you are putting yourself into. That is usually something you don't have to think about when you start, so it's good that you can have something simple to pick up and just run and see the benefit of actually controlling a CPU in one way or another.

CRAIG BOX: In your career, you've gone between software engineering, programming, and SRE, or DevOps. How did you move between those two things and do you think that they are fluid and you should be able to move backwards and forwards between them as you please?

GIANLUCA ARBEZZANO: I think it really depends on where you want to go, obviously, as well as a lot of other stuff. You are the one driving the boat, at least for me. I'm lucky to be able to do it, or to feel that I can do it. As I said, all the changes I made in my career were driven by curiosity, or the need for a different challenge. So when I saw that websites and PHP weren't giving me what I was looking for anymore, because I was looking for a new way of developing, I decided to move to Go.

And moving to Go, I ended up mastering a lot of the tools that today are DevOps tools. I think the day I decided to switch from a developer to a more DevOps-oriented career was when I understood the power I had as a developer in the infrastructure. So when the cloud providers started to rise in popularity, I saw that I was able to manage and spin up servers quickly with a bunch of APIs.

And at that time, a lot of people were in the mindset of, I'm a system administrator, and I need to learn how to develop. But I was already kind of a developer, somehow, so I just decided to have fun with servers, and it ended up being a good challenge. I joined a company in Turin that still today helps other companies to move to the cloud.

So for me, it was a good challenge because I had the opportunity to see a lot of use cases and to actually see the benefit of cloud computing very early in my career. As I said, I'm born and raised in the cloud.

CRAIG BOX: Turin, probably, is quite a high altitude as well, so a good place to be.

GIANLUCA ARBEZZANO: That's true, I'm closer to the cloud.

CRAIG BOX: It's great how your example is driving a boat rather than driving a car.

[LAUGHING]

A lot of nice lakes around where you live.

GIANLUCA ARBEZZANO: Definitely.

CRAIG BOX: Now, another thing about Italians and cloud, you worked at InfluxDB for three years. We spoke with Leonardo Di Donato in episode 91, did you work with him?

GIANLUCA ARBEZZANO: I also shared a flat with him for two years, so--

CRAIG BOX: Wow, it's a small world there? It's like, every Italian in cloud native has done a stint at InfluxDB at some point?

GIANLUCA ARBEZZANO: When I joined, I was one of the early employees in Europe, so I was feeling a bit lonely at some point, so I decided to propose to Leonardo and Lorenzo to join me because I trust them as developers. We are good friends. So for me it was a great opportunity to feel less lonely, as well.

CRAIG BOX: What were you working on there?

GIANLUCA ARBEZZANO: I got hired as an SRE. They have a SaaS offering and back then, they were looking to bring a bit of order to it. As you can imagine, InfluxDB had very high growth, and the SaaS software grew with that, not under good control. So I was doing automation. Back then the orchestrator was custom developed in Go, so no Kubernetes, not many DevOps tools.

The manager who started the SaaS was coming from CoreOS, so he brought a lot of the approaches that the company had--a good amount of code that was doing something like Kubernetes does for containers, but for EC2.

CRAIG BOX: Did you end up with Kubernetes there?

GIANLUCA ARBEZZANO: We moved, around when I left, to Kubernetes. So now, the version 2 of Influx Cloud runs on Kubernetes.

CRAIG BOX: InfluxDB is a time series database. Do you see a lot of the use cases for it in monitoring and observability?

GIANLUCA ARBEZZANO: It's definitely a thing. We learn quickly how to organize and store logs and events, but do we use them in an effective way? I think it's still an open question. So InfluxDB definitely helps to make order on the time series, and it is now trying to make good use of those metrics. We generate so many metrics compared with what we used to, with microservices and distributed systems, that they are key-value now.

CRAIG BOX: Is it implicitly easy to monitor time series data of a service that's running a time series database?

GIANLUCA ARBEZZANO: It depends on how it is written, almost like the application. So I don't think there are many differences, it's mainly that you have a chicken and egg problem, so you have to trust your system. I think what was really complicated at InfluxDB was the fact that we were using InfluxDB to monitor ourselves. So all the bottlenecks that we saw in the product were on us as a user, as well.

So a lot of the limitations and the workarounds that we encountered were super important because we were able to see them and fix them for the customer. But it was also a pain for us as a user, because you need to run 1,000s of EC2 at scale, and you have that limitation, and that beta doesn't work. So it's a good challenge, it's a mindset that you develop. So being the guinea pig of yourself, is the challenge-- I think it's a good one.

CRAIG BOX: What do you think that experience at InfluxDB did to your feelings about cloud as a career?

GIANLUCA ARBEZZANO: I think when you have the opportunity to look at an AWS bill for a SaaS like Influx Cloud, you start to see an opportunity, or at least a misunderstanding of the dream of cloud computing.

CRAIG BOX: I should point out other clouds are available.

GIANLUCA ARBEZZANO: Yeah, that's true. I don't have experience at the same scale as I did for AWS. But there are different ways to think about hybrid cloud or multi cloud than the obvious one that we see around.

CRAIG BOX: When you start running something at that kind of scale and where the bill is a substantial part of your company's outgoings, perhaps, there have been a lot of high profile people who have said, hey, I should be running these computers myself. Someone else is making overheads here. This is something I can do. Dropbox is a good example. They decided to move away from cloud and start running their own data centers. Do you think that there is a tipping point where that becomes true? And if so, where is that tipping point?

GIANLUCA ARBEZZANO: The answer is always in the middle. There are workloads or responsibilities that it is better to outsource, and others that it's better to manage by yourself. Equinix Metal, which is the company that I work for today, invests a lot of money in custom hardware and specialized hardware. You can gain a lot of flexibility and huge performance if you go deep down the stack. And this is something that if you get locked in to a cloud provider, you probably can't think about.

So the secret here, I presume-- it's not really a secret-- but the good practice here is to be as flexible and open minded as possible. There are situations where the cloud provider is needed because you need velocity and you don't care about the infrastructure, because you have more important things to care about, or somebody else is taking care of the actual spending, so what's the point? When you onboard full responsibility, you need to avoid lock-in in any case, whatever it means.

CRAIG BOX: Let's step the clock back a little bit. Equinix Metal, when you joined, was known as Packet. I know my experience of it was when people wanted to run VMware workloads and things that didn't necessarily run in a hypervised, hyperscale cloud environment. What was Packet good at, and why did you decide to join them?

GIANLUCA ARBEZZANO: Their commitment to open source and their community mindset definitely caught my attention, because it is the same mindset that I have personally. So that's definitely something important. I was also thrilled to write code for cloud computing. This is something that I think is a good step for me in my career because as I said before, I worked as a developer. I moved to a more ops person.

And I moved to SRE. That is the bridge between how you figure out what's going on in a system without knowing the system. So how do you help developers to write debuggable and understandable code? And I also contributed to Kubernetes and other tools like that. I like the feeling that cloud gives me where I can write code that in some way is related to something that physically I can visualize. So it's not too abstract. For me, it was a good company and with a good culture, so I decided to pick up the baton.

CRAIG BOX: It's interesting because when you think about the term "cloud," you talk about something that you can see. But cloud by its graphical representation is the blob on the diagram where you don't know what everything is. When cloud computing became the thing that people were aware of 10 to 12 years ago, I would say the way that I described it to people was it is sort of pay as you go, utility computing, but virtualization was an important part of it.

Now we're starting to talk about environments that don't have virtualization, that are running bare metal. When is something cloud, and when is something not cloud?

GIANLUCA ARBEZZANO: The way I define cloud is whatever is driven by an API. So that's mainly my definition. I don't really see virtualization as a key part of a cloud stack. And even before, when I was a heavy user of GCP or AWS, the fact that EC2 by itself was a virtualized server for me didn't really change much, because I had SSH access, I was able to use an API. So for me, the abstraction was enough to not care about the underlying details. That's my answer to that, I presume.

CRAIG BOX: If I'm a system administrator and I publish an API which says, get an instance, and then when you call that API, I leave that hanging for a month while I go down to the shop and buy a Dell box and come and plug it back in and everything, am I a cloud?

GIANLUCA ARBEZZANO: I think you are a cloud. Back in the day when I used to work at Corley, a company in Turin, they actually scaled with a cloud provider via fax. So they had to send the fax, and that was the cloud provider. So you definitely have to choose a good cloud provider, but that's enough for me to think about cloud.

CRAIG BOX: While you were in the process of joining Packet, they were in the process of being acquired by Equinix. People who have heard of Equinix have generally heard of them in the context of running networks and being a location that people do peering at. Why do you think Equinix wanted to acquire Packet?

GIANLUCA ARBEZZANO: When you run the number of data centers that Equinix has, 200 and counting around the globe, there is the opportunity to simplify, or find a different way of selling that space. Equinix is a data center company. So in theory, you knock on the door and say, OK, I want that room. I want power and data, and you pay for that.

There is another customer that may knock on the door and say, I want that room, and I want that room full of servers with power and data connection. And that's probably the traditional way data center companies operate. When you have so many data centers it's slightly possible that you start to look at another way to sell compute or to sell your space. And I suppose that having that API layer in front of computing, it's a different way to sell and use compute power.

CRAIG BOX: A way that you can put an API layer in front of your computer is the Tinkerbell project, which you work on. Tinkerbell lets you provision and manage bare metal in a cloud native Kubernetes-centric way. How did Tinkerbell come about?

GIANLUCA ARBEZZANO: Probably a similar story to the Kubernetes one for Google. At Equinix Metal, or Packet back then, we developed automation to do provisioning and much more around the automation of data centers. And at some point, they realized it was a cool project. And today, it serves all the provisioning and all the hardware that you can have at Equinix Metal. So Tinkerbell is that tool that we use internally with new concepts.

You can think about the internal Tinkerbell that we have at Equinix Metal as a blob of Bash scripts that are not really reusable, hard to package and ship and debug and troubleshoot. So at some point, they thought, oh, maybe we can generalize this concept. And in the process of generalizing it, we can bring it to open source so we can validate with other community members what we are doing.

And another opportunity, when you open source a tool, is that it makes collaboration easier. It's way easier to reach out to other companies or other entities and say, we are working on this open source tool, and we want to make it better, do you want to do it with us? This collaboration is important now that we have many architectures, many cloud providers, and many vendors that are doing servers.

CRAIG BOX: I've been working in cloud for quite some time now. So I'm not 100% sure what servers look like. Have they changed substantially in the last 20 years? Are they still pizza boxes that sit in the rack and connect to multiple power and network sources?

GIANLUCA ARBEZZANO: As I said, I was also born and raised in the cloud, so I'm not the right person to look back at the history of servers. And I haven't seen many servers in a data center, because I joined Equinix Metal when the pandemic was already happening. I've never had the opportunity to enter a data center and look around, so I can't really tell. But I know that there are many specifications for usable hardware. So I'm not sure if it's a pizza box. I presume it is like a pizza box with a bunch of stuff inside that companies can glue together.

So Facebook has the Open Compute program, and Equinix Metal uses and keeps developing the Open19 standard with LinkedIn, which is a way to build building blocks for data centers. So you can pre-cable racks and design how you hook all those cables in a reusable way. It's a world that I for sure have to go deeper into. But yeah. I think the case is a pizza case for me. Not much different.

CRAIG BOX: As an Italian, do you approve of putting pizza in boxes, or should it be eaten fresh straight out of the oven?

GIANLUCA ARBEZZANO: I like fresh pizzas, but as long as there is no pineapple around it, I'm fine.

CRAIG BOX: Because these are physical things that take up space in the room, we presumably have to put them in the room and then have services that we can call to turn them off or connect them. What is the input to Tinkerbell? Do you give it a room full of computers that are already networked, or do you need to think about the setup before you start thinking about the software?

GIANLUCA ARBEZZANO: Tinkerbell is a stack, so it's designed to be replaced [in parts]. You don't have to think ahead too much about what you want. For example, I have 10 NUCs over here in my home network, and I can pilot them with Tinkerbell. So even if I don't have a fully operational data center with redundancy and who knows what, I can use Tinkerbell for my home lab.

CRAIG BOX: But do those 10 servers need to be on and waiting all the time, or is there some automation that lets me turn them on when they're needed?

GIANLUCA ARBEZZANO: This is a good question that I'm actually trying to answer for myself. Proper servers have BMCs inside that are boards -- you can think about that as a computer inside a bigger computer. And the small computer is the BMC. It controls the main one. The BMC is way smaller, so it consumes less power, so it can stay up all day, all the time. And it can be used to switch on and off the main one and to monitor the proper server.

So this is how in a proper data center, you manage to have servers that are all powered off, and you power them on when needed. Because obviously, for cost saving and for safety, you can't leave them running forever. So you have BMCs, and you can reach out to the BMCs via various protocols, and you can say, turn my server on, turn my server off, or PXE boot it, or start it from disk and so on. So this is how you pilot it.

How does it work for a computer that is not a server, like a NUC? That's a good question. It depends on your NUC. I have here in front of me a board. You can't see, but it's there. It's a NUC that I disassembled. I started to learn the spec for that NUC, and I found that there are pins that you can use to control the board.

And control means that you can check the status of the board if it's on and off. And you can switch it on and off. So that's it. So I hooked that to a Raspberry Pi that is my BMC-like software. So I can leave the NUC off all the time, and I can reach out to the Raspberry Pi to turn it on and off. So this is kind of what you can do if you want to play with a home lab that doesn't require you to buy an expensive server.
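The BMC role described here (check power status, switch the machine on or off, choose PXE or disk for the next boot) can be pictured with a small Python sketch. Everything in it is hypothetical: `ToyBMC` and its methods are illustration names, not a real BMC protocol or a Tinkerbell API, and the one-time boot override is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class PowerState(Enum):
    OFF = "off"
    ON = "on"


@dataclass
class ToyBMC:
    """Hypothetical stand-in for a BMC, or a Raspberry Pi wired to the power pins."""

    power: PowerState = PowerState.OFF
    next_boot: str = "disk"  # either "disk" or "pxe"

    def status(self) -> str:
        """Report whether the main machine is on or off."""
        return self.power.value

    def force_pxe(self) -> None:
        """Ask for the next boot to go over the network."""
        self.next_boot = "pxe"

    def power_on(self) -> str:
        """Power the main machine on and return the device it boots from."""
        self.power = PowerState.ON
        booted_from = self.next_boot
        # Assumption: the override applies once, then falls back to disk.
        self.next_boot = "disk"
        return booted_from

    def power_off(self) -> None:
        self.power = PowerState.OFF


bmc = ToyBMC()
bmc.force_pxe()
first_boot = bmc.power_on()   # network boot, because PXE was forced
bmc.power_off()
second_boot = bmc.power_on()  # back to the disk
```

The small controller stays reachable while the main machine is off, which is what lets the servers sit powered down until they are needed.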

CRAIG BOX: So we now have our baseboard management controller, our BMC, which is able to accept commands from the network -- effectively, it's a computer inside your computer, if you want, to be able to turn it on and off and control it. You mentioned there PXE booting. I'm never 100% sure what that stands for-- Pre-boot execution environment? Well done. Had to dredge that one up from the bottom of my memory.

GIANLUCA ARBEZZANO: [CHUCKLES] Even a laptop works in this way. So even the NUC has the same feature. So they go over a bunch of boot devices. It may be your hard drive, it may be your USB stick. And if it can find none of them, it usually does PXE boot mode.

CRAIG BOX: Not the floppy disk.

GIANLUCA ARBEZZANO: After the floppy disk.

CRAIG BOX: After the floppy disk.

GIANLUCA ARBEZZANO: Usually PXE boot is the last one you have to try. So let me tell you the story of a brand new server. Amazon rings your door. They come with a box, and there is a server inside. So you unpack it, and you hook it to the network, and you hook it to the power. When it boots, there is nothing inside the disks. You maybe don't have any disks. So what does it do? It tries to boot from the hard drive. It doesn't find one. It goes to the USB. It doesn't find a USB. It goes to the floppy disk. There is no drive for the floppy disk.
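The fall-through just described can be written as a few lines of Python. This is only a sketch of the order given in the conversation (hard drive, USB, floppy, then PXE); real firmware lets you reorder these, and the names `BOOT_ORDER` and `pick_boot_device` are made up for the example.

```python
from typing import Iterable, Optional

# Boot order as described in the conversation; firmware settings may differ.
BOOT_ORDER = ["hard drive", "usb", "floppy", "pxe"]


def pick_boot_device(bootable: Iterable[str]) -> Optional[str]:
    """Return the first device in BOOT_ORDER that the machine can boot from."""
    available = set(bootable)
    for device in BOOT_ORDER:
        if device in available:
            return device
    return None  # nothing bootable at all


# A brand new server: no disks, no USB stick, no floppy, only the network.
fresh_server = pick_boot_device(["pxe"])

# The same server after an operating system was written to its hard drive.
provisioned_server = pick_boot_device(["hard drive", "pxe"])
```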

So it goes down to the PXE boot. And when the PXE boot starts, PXE does a DHCP request to the network. The DHCP is one of the pieces of the stack that Tinkerbell provides. We'll speak about them later, I suppose, but in theory DHCP replies and says, OK, I know you via MAC address. So you have to run these little scripts. And usually, the scripts contain the way you set up an IP for the server. It can come directly from DHCP, or it can be a static one if you assign one to the MAC address, whatever. So the server has an IP now.
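The "I know you via MAC address" step amounts to a lookup keyed by MAC. The Python sketch below only illustrates that idea: the table, the addresses, the script name, and `handle_dhcp_request` are all invented for the example and do not reflect how the Tinkerbell DHCP service is implemented.

```python
from typing import Dict, Optional

# Hypothetical hardware records keyed by MAC address.
KNOWN_HARDWARE: Dict[str, Dict[str, str]] = {
    "aa:bb:cc:dd:ee:01": {"ip": "192.168.1.10", "boot_script": "boot-installer"},
}


def handle_dhcp_request(mac: str) -> Optional[Dict[str, str]]:
    """Answer a PXE client we recognize; stay silent for unknown machines."""
    record = KNOWN_HARDWARE.get(mac.lower())
    if record is None:
        return None
    # The reply carries the address to configure and the script to run next.
    return {"ip": record["ip"], "boot_script": record["boot_script"]}


known_reply = handle_dhcp_request("AA:BB:CC:DD:EE:01")
unknown_reply = handle_dhcp_request("aa:bb:cc:dd:ee:ff")
```

Keying on the MAC address is what allows a static IP to be assigned to one specific machine.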

In that script, it receives how to boot itself. We serve an operating system that runs in memory, a very small one based on Alpine. And from there, there is a kernel, there is an init ramdisk, so your server can boot an operating system. And you're almost done there. You have an operating system. Obviously, Tinkerbell gives you more capability. It's like when we started the in-memory operating system, the OS itself starts a worker, an agent inside, that asks Tinkerbell if there is anything to do. And usually the work that the Tink worker has to do is to provision an operating system.

So from there, it executes a bunch of actions that are Docker containers that download a root FS, download a kernel for your Ubuntu server, and flash everything on the hard drive. So from there, you can reboot, and now the server does the booting process again. But this time, it can find an operating system on the hard drive. So it starts Ubuntu or whatever you installed. This is the lifecycle of a server, at least for the creation phase.
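The worker loop described here, asking the server for work and then running a list of actions in order, might look roughly like the Python sketch below. It is a toy model under stated assumptions: in Tinkerbell the actions are containers and the server speaks gRPC, whereas here the actions are plain functions and `ToyTinkServer` is an in-memory stand-in. All names are hypothetical.

```python
from typing import Callable, Dict, List

Disk = Dict[str, str]
Action = Callable[[Disk], None]


def download_rootfs(disk: Disk) -> None:
    disk["rootfs"] = "ubuntu"


def download_kernel(disk: Disk) -> None:
    disk["kernel"] = "vmlinuz"


def flash_disk(disk: Disk) -> None:
    # The disk is only bootable once both pieces have been downloaded.
    ready = {"rootfs", "kernel"} <= set(disk)
    disk["bootable"] = "yes" if ready else "no"


class ToyTinkServer:
    """In-memory stand-in for the server that stores workflows."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[Action]] = {}

    def create_workflow(self, worker_id: str, actions: List[Action]) -> None:
        self._pending[worker_id] = list(actions)

    def get_work(self, worker_id: str) -> List[Action]:
        # Hand the workflow out once; afterwards there is nothing left to do.
        return self._pending.pop(worker_id, [])


def run_worker(server: ToyTinkServer, worker_id: str, disk: Disk) -> List[str]:
    """Ask the server for work and run each action in order."""
    executed = []
    for action in server.get_work(worker_id):
        action(disk)
        executed.append(action.__name__)
    return executed


server = ToyTinkServer()
server.create_workflow("nuc-01", [download_rootfs, download_kernel, flash_disk])
disk: Disk = {}
executed = run_worker(server, "nuc-01", disk)
```

After the run, `disk` is marked bootable, which corresponds to the point where the server can reboot and find an operating system on its hard drive.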

There is a deprovisioning that works almost in the same way. So you issue to the BMC a request to reboot your server, and you force the PXE boot environment. So from there, it's the same dance. The server restarts, but it jumps over the hard drive and goes directly to PXE boot. So PXE starts the operating system in memory. The operating system in memory starts the Tink worker that asks the Tink server what to do, and that's it.

CRAIG BOX: Now you've mentioned before that Tinkerbell is a stack of microservices, and there are a few of them that you will have gone through in order to provision and deprovision the servers here. Tell me a little bit about the microservices that make up Tinkerbell.

GIANLUCA ARBEZZANO: There are a couple of them. They are all open source and you can find them in the repository. But there are three that are the control plane of all the stacks. So there is Tink server, that provides a gRPC API. The Tink server stores hardware information, and it stores workflows, so what you want to run. And you can issue commands like start a workflow - there are provisioning or deprovisioning workflows.

There is a Tink worker that is the agent that runs on the in-memory operating system environment and interacts with the Tink server asking for work to do. And the Tink worker relies on containers, and Docker as a runtime to take a list of actions and run them on the hardware that you want to provision.

There is a Tink CLI that is a command line interface that you can use to interact with Tink server. There is Boots. It is a DHCP server. So you run it on your network. It's waiting for requests from hardware that has to be provisioned. And what it does, it also serves PXE scripts that boot the operating system-- the in-memory operating system. So not the one you want, but the one that gets used to provision the one you desire.
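Editor's note: as an illustration of the kind of PXE script a boot server might hand out, here is a sketch that renders a basic iPXE script pointing at a kernel and an init ramdisk. It is not the script Boots actually serves. `render_ipxe_script` and any URLs passed to it are hypothetical; only the `dhcp`, `kernel`, `initrd`, and `boot` commands are standard iPXE.

```python
def render_ipxe_script(kernel_url: str, initrd_url: str,
                       kernel_args: str = "") -> str:
    """Return an iPXE script that boots the given kernel and initrd."""
    lines = [
        "#!ipxe",                                       # marks the file as an iPXE script
        "dhcp",                                         # configure the network interface
        f"kernel {kernel_url} {kernel_args}".rstrip(),  # fetch kernel, pass arguments
        f"initrd {initrd_url}",                         # fetch the init ramdisk
        "boot",                                         # start the in-memory OS
    ]
    return "\n".join(lines) + "\n"
```

A server built this way would choose the kernel and initrd URLs per machine, so the same mechanism can point different hardware at different in-memory operating systems.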

CRAIG BOX: Right.

GIANLUCA ARBEZZANO: And we serve, at this stage, two different in-memory operating systems. One is called OSIE, and the other one is called Hook. OSIE is the one that today Equinix Metal runs. Hook is the new one that we developed using LinuxKit.

The last one is Hegel. That is a metadata server. You can think about that as the equivalent of how all the cloud providers have a metadata server. So when you are inside the server, you can reach out to a particular web server getting information from the server that you are in.

Hegel does the same. So when you are inside the hardware that you want to provision, you can reach out and get information about the IP of the server, or the user data that you want to run, and so on. So those are the pieces of the stack.
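Editor's note: the metadata-server idea can be sketched as a lookup that returns a machine's own record when that machine asks for it. This is a toy model, not Hegel's API. The `HARDWARE` table, its fields, and the choice to identify the caller by source IP address are assumptions made for the example.

```python
import json
from typing import Dict, Optional

# Hypothetical hardware records, keyed by the address each machine calls from.
HARDWARE: Dict[str, Dict[str, str]] = {
    "192.168.1.5": {
        "hostname": "node-a",
        "ip": "192.168.1.5",
        "user_data": "#!/bin/sh\necho hello",
    },
}


def metadata_for(caller_ip: str) -> Optional[str]:
    """Return the caller's own record as JSON, or None if it is unknown."""
    record = HARDWARE.get(caller_ip)
    if record is None:
        return None
    return json.dumps(record, sort_keys=True)
```

Because the lookup is keyed on who is asking, a machine only ever sees its own metadata, which is the property the cloud-provider comparison relies on.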

CRAIG BOX: The control plane services? They all run on top of Kubernetes?

GIANLUCA ARBEZZANO: They are not designed to run strictly on Kubernetes, but they are binaries, so you can run them everywhere.

CRAIG BOX: You mentioned before that the actions that run on the machines run in containers. Is there anything specific to the Kubernetes control loop interface? For example, the asking what's going on? Is that using Kubernetes CRDs or anything behind the scenes?

GIANLUCA ARBEZZANO: The architecture is very similar. So you can think about the Tink server as the API server, and the Tink worker as the kubelet, somewhat. And Docker is a runtime binary. The topology is the same for the services like Kubernetes, but there is no similar concept to a CRD yet.

We did some experimentation about creating an operator to bridge the Kubernetes API with the Tinkerbell API, and I think at some point, it will come in a solid state. We also have a Cluster API implementation for Tinkerbell. So there is obviously a lot that we can do with the Kubernetes community, but on its own, Tinkerbell is its own beast, I'd say.

CRAIG BOX: I'm not sure that you mentioned the best-named microservice of the lot, which is "PB&J".

GIANLUCA ARBEZZANO: I didn't mention PB&J too much because it's not yet hooked into the services that we ship as part of Tinkerbell, mainly because it controls BMCs. Not many in our community have BMCs in their lab yet. We are learning that there are two different spectrums of the community around bare metal. There are people like me that have a bunch of NUCs or Raspberry Pis and they want to play with them. And there are gigantic companies like Equinix Metal that want to use that software for data centers.

So there is a bit of back and forth there, but it can run for both use cases. And obviously the fact that it comes from the operational experience that Equinix Metal and Packet developed, it's a good sign that it's a solid tool.

CRAIG BOX: I was going to say that there's not really a connection or a theme between all of the names here. But then you mentioned you've got a new one called Hook, so I can at least start seeing a Peter Pan theme emerging in the names.

GIANLUCA ARBEZZANO: You definitely spotted it - good one. So yeah. I don't know how it will end, but the naming is going well.

CRAIG BOX: Now, you've been talking about NUCs, or Intel's Next Unit of Computing, which are X86 devices. You've also mentioned Raspberry Pis, but maybe you were mentioning that in terms of being the BMC for one of those NUCs. If I've got a room full of Raspberry Pis, can I use Tinkerbell to control them the same way as if they were X86 PCs?

GIANLUCA ARBEZZANO: I have to say that we don't have a full story around Raspberry Pis yet. But all the binaries are compiled for Raspberry Pis, so you can use them. PXE supports Arm, so you can technically run that. The two operating systems we provide, Hook and OSIE – Hook doesn't yet have multi-arch support. We are hooking it up because we use LinuxKit and LinuxKit supports multiple architectures.

CRAIG BOX: Hooking it up. I see what you did there.

GIANLUCA ARBEZZANO: Sorry, it will come. OSIE can run on Arm, but it's too big to run on Raspberry Pi. We are almost there, and we are after a Raspberry Pi setup as well. So it's just a matter of time.

CRAIG BOX: You've stated a couple of times, you're born and bred in the cloud. A lot of us work in cloud and Kubernetes. While it takes from Borg, which was a "runs on physical machines" environment, it is very much thought of as a cloud thing. Do you see Tinkerbell as something that you would use to provision a cloud environment?

GIANLUCA ARBEZZANO: We open sourced Tinkerbell, as I said, because Equinix Metal used that concept to provision data centers and manage the automation of data centers. And Equinix Metal and Packet are a cloud company. So we expect that private and public cloud providers -- now the difference is very tiny -- will be able to use Tinkerbell as a way to provision bare metal in a fast and reliable way, and to support a proper cloud provider expectation, let's say.

CRAIG BOX: Very unfortunately, OVH had a major fire in one of their data centers last week. If you were recovering from something like that using Tinkerbell, do you think that would get you up and running again quicker than what the rest of the industry is doing today?

GIANLUCA ARBEZZANO: I don't think Tinkerbell on its own can do much when it comes to fire protection. You have to plan and care about how you distribute your workload, and how you take backups, and so on. So I think the OVH experience has raised a lot of questions to me. Obviously they started to rise as soon as I knew that all the people were safe, and the fire was under control, or whatever.

But I think it's a good experience, and it should raise a lot of questions for the people that are using cloud providers, and for cloud providers themselves. But I think a lot of them are already doing a good job trying to figure out all the possible scenarios. But sometimes you don't have them all under control.

CRAIG BOX: Everyone who's running a cloud service today has some provisioning service. Would you like to see Tinkerbell become something that's more widely adopted? Or do you think this is just a thing that suits your particular need, and you're putting out there for people to run perhaps more in their home labs?

GIANLUCA ARBEZZANO: No, we definitely don't want to make Tinkerbell 100% tied to home labs. We are not working with that in mind. But we know that a lot of people, or even a lot of companies that are not that big, have something that looks like a home lab, as a data center. So it's definitely a community that we want to support. But we are going after much bigger use cases, more realistic data centers.

So it's definitely something that I hope will get adoption for high scale environments. And we already have a couple of community members that come from hardware vendors or hypervisor companies that are looking at how they can use Tinkerbell. Because it's a hard job to provide a reliable API to do bare metal provisioning, because the feeling you get from the metal is different from the one you get from a hypervisor.

CRAIG BOX: It's a lot colder to the touch.

GIANLUCA ARBEZZANO: Yeah, that's one. And it's also unpredictably slow sometimes. So you send a message, and you never know if it got there, to the destination, or what it's doing.

CRAIG BOX: The guy's off ordering a server from Dell, and he'll be back in a week's time.

GIANLUCA ARBEZZANO: That's not the cloud experience we want to provide with Tinkerbell, just to be clear. [CHUCKLES]

CRAIG BOX: Tinkerbell joined the CNCF last year. It was announced at KubeCon North America in November. What's happened to the project since then?

GIANLUCA ARBEZZANO: As you can imagine, when you take a piece of code that runs for like five, six, seven years at scale, inside a company, privately, it looks different compared with the one you expect to see from an open source project. There is a lot of work ongoing around making it more general and easier to figure out for people that are new to the project.

And this is a simplification in concept and usability that Equinix Metal is looking for themselves. And having more people helping us figuring out all the bits and having different expectations is a very good way to improve the user experience for us. So definitely, that's an ongoing effort. We also saw many other companies and people looking at how they do bare metal provisioning. So that's very important.

We started to work more closely with the Kubernetes community as well. We started to do biweekly contributor calls. So if you want to join us, you're free to jump in, where we can speak about the evolution of the project and how people are using it or looking at it and so on. We also started to use Artifact Hub-- that is a project developed by the CNCF-- to ship all our reusable actions.

So I mentioned to you before that Tinkerbell uses actions that are Docker containers. And the reason why we did that was to make them reusable and easier to move and ship. Equinix Metal is writing more actions that are translated from Bash scripts to Docker containers. In order to ship those containers, we are using Artifact Hub to do that.

Ideally, we want to extend Artifact Hub to support workflows as well. And it means that moving forward, we expect to be able to reuse ProxMox installation or vSphere installation workflows, or OpenStack workflows-- OpenShift workflows, mainly, more than OpenStack, I suppose. Ideally we want to build a community that can help each other doing better bare metal provisioning.

CRAIG BOX: When you are open sourcing something that powers your company, as Tinkerbell does at Equinix Metal, there is often, like you say, a bunch of Bash scripts and things you might say, hey, I want to do this better for the public version. And I guess there's a couple of different approaches I see. One is to just open source it as it is, and then evolve in the public, and keep tracking that.

Another one, for example, was what the Backstage team, who we spoke to recently, did, which is release a new version, possibly written in a different programming language, and then do all of the upstream work with the idea of eventually moving the internal stack to that. And they may or may not do that because they've already got a completely working version themselves. How did you decide which of those paths to take?

GIANLUCA ARBEZZANO: I have to admit that I wasn't there when the decision was made. But if I have to look back and try and reconnect all the dots, the fact that we had different services helped us a lot because we were able to say "the DHCP server is good and can be open sourced as it is" because it's well scoped and so on. So we open sourced that. Same for Hegel.

And I think the part that we wrote from scratch is mainly related to the Tink server, so the API and the CLI, and the concept of workflows, which wasn't there but got developed with the idea that Bash scripts don't scale as well as Docker images and a proper programming language, I'd say.

CRAIG BOX: You can always just run your Bash script inside a Docker image. That'll make it scale.

GIANLUCA ARBEZZANO: Yeah. And that's what helps Equinix Metal to move forward without having to rewrite all their-- or without having to throw away their operational experience. Worst case, you just package your Bash script inside the Docker container, and you hope for the best.

CRAIG BOX: Now that Tinkerbell is open source, and you're participating in a broader community, what sort of things are you looking for the community to contribute? Or how can people who are playing around with Tinkerbell help the project?

GIANLUCA ARBEZZANO: We have many different channels, like the Slack channel, and we have our mailing list for contributors. But what I think we need help with is the user experience because as I mentioned before, the experience we have, and the one that got developed in Equinix Metal was very tied to the Equinix Metal mindset, or what they had at that particular point in time. But now we have the opportunity to take all that operational experience at scale and make it more user friendly. So that's definitely a good way to help.

And even more, just play with it and let us know what you can do, because as you can imagine, we've only just explored a fraction of the possibilities that bare metal provisioning has.

CRAIG BOX: You want people to file off the rough edges of the metal?

GIANLUCA ARBEZZANO: Yeah, that's it. Just break the metal if you can.

CRAIG BOX: Finally, when you're not tinkering with your home lab, I hear you can be found in your garden.

GIANLUCA ARBEZZANO: Yeah. I developed the desire to have a good garden because my grandparents did that all the years since I was born. So for me it was a great way to keep going and help them. But also because I work from home. I moved to Dublin for a couple of years, and I got back when I started to work at InfluxData. And now it's like maybe five years that I've been working from home.

So I was really looking for a way to get out of my bed and do something that was not typing on a keyboard. So having a garden, it's a very good way to do something different, and use your hands more.

CRAIG BOX: Well, it feels like everyone's been working at home for five years. Do you have any specialties? Anything that you're really good at growing?

GIANLUCA ARBEZZANO: My family keeps tomato seeds every year from the previous year. We have like 40 year things.

CRAIG BOX: A real heirloom tomato.

GIANLUCA ARBEZZANO: Yeah. Now I feel the pressure of not losing seeds every year. It's a bit depressing, but I think that's a specialty I have to develop.

CRAIG BOX: Does it make a really nice pizza sauce?

GIANLUCA ARBEZZANO: Yeah. Every year.

CRAIG BOX: Great. Well, thank you very much for joining us today, Gianluca.

GIANLUCA ARBEZZANO: Thank you for having me.

CRAIG BOX: You can find Gianluca on Twitter at @gianarb, or on the web at gianarb.it. You can find Tinkerbell at Tinkerbell.org.

[MUSIC PLAYING]

CRAIG BOX: Thank you, Vic, for helping out with the show today.

VIC IGLESIAS: Thank you so much for the invitation, Craig. It was great to hear and see you again. I hope all is well on your end.

CRAIG BOX: If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

VIC IGLESIAS: You can also check out the website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe.

CRAIG BOX: I'll be back with another guest host next week. So until then, thanks for listening.

[MUSIC PLAYING]