#36 January 15, 2019

Rook, with Jared Watts

Hosts: Craig Box, Adam Glick

Rook is a cloud native storage orchestrator and a controller for storage systems such as Ceph. Jared Watts has been working on Rook since the start, first at Quantum, and then at Upbound. He talks to Craig and Adam about storage, chess, and premium-rate telephone numbers.

Does anyone actually read the show notes? Turns out a few of you do. Thank you for listening and reading!


CRAIG BOX: Hi and welcome to the "Kubernetes Podcast from Google." I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.


CRAIG BOX: How are you feeling, Adam?

ADAM GLICK: I'm doing relatively well. Had a nice weekend and got a chance to dig into a couple of new things. Decided to finally check out Tabletop Simulator. For those that don't know, it's basically a thing to build board games in. So there's every board game you can think of that people, basically, made, you can play with it. But by far, the best feature of it is the fact that of all the buttons you have to use as a player, one of them is the Flip the Table button.


ADAM GLICK: Which, at any point where you're just like, I'm done with this, you just hit it and just flip the table. Rage quit.

CRAIG BOX: It is a shame that kids of today have to use a Tabletop Simulator. I have-- [KNOCKING ON WOOD]

ADAM GLICK: You have an actual table.

CRAIG BOX: --a tabletop right in front of me.

ADAM GLICK: Nice. So between that, and I finished watching my latest Netflix binge watch of a show called "Happy!", which actually came from Syfy, which I can only describe as somewhat surreal. It is an entertaining show, though I love the first episodes more than the end. But it's basically, if you don't know it, a guy who's a detective turned kind of rage revenge killer who gets visited by a blue unicorn that only he can see and guides him to do things.

CRAIG BOX: Yes. It's his daughter's [imaginary] friend. Actually, caught a couple of episodes of that show on a plane once. I thought, ooh, I might keep up with that. But I have not yet. How does it resolve, without giving too much away?

ADAM GLICK: Without any spoilers, because I hate when people do that, what I will say is you do get to a conclusion with it, if you like the surreal. I mean, it is from Syfy, so you know what you're getting into. It does get more weird. I first was like, OK, he can see his kid's imaginary friend, and they're going to leave it at that. But it does get more surreal, a little more weird. There's some content that probably isn't right for viewers of all ages. I'll just put it that way.

CRAIG BOX: Definitely don't watch it on an aeroplane.


I do wonder about that. Sometimes you get the content warnings in front of things that say this has been edited for content, or something. And you're generally wearing headphones, but there are some things where I'm worried about what happens if someone walks past while I'm watching this on an aeroplane. I'm a respectable member of the aeroplane community.

ADAM GLICK: It does take much more of a Tarantino-esque kind of view on violence, so it is fairly violent. It is entertaining. The main character, he's a stereotype, but he is a very entertaining stereotype. Anyways, that's what I've been checking up. Shall we get to the news?

CRAIG BOX: Let's get to the news.


ADAM GLICK: The CNCF has formally announced the first ever Kubernetes Day is going to be held in Bengaluru, previously known as Bangalore, India, on Saturday, March 23. Kubernetes Day is a single day, single track event that brings together local and international experts to engage people interested in Kubernetes and the cloud-native technologies. The event is targeted at regions with large numbers of community members who can't necessarily travel to the KubeCon events. The CNCF expects over 1,000 attendees at the first Kubernetes Day, and a registration link can be found in the show notes.

CRAIG BOX: Google launched vertical pod autoscaling to Google Kubernetes Engine in beta. While horizontal pod autoscaling decides on the number of replicas a pod should have, vertical pod autoscaling decides what the CPU and memory allocation of an individual pod should be based on watching the code that runs inside it.

It can operate in three modes, showing recommendations only, applying the recommendation for new pods as they are created, or applying a rolling update to a deployment to change all the existing pods. VerticalPodAutoscaler builds on an open-source API developed by SIG Autoscaling but is a Google-specific implementation of the controller, which integrates with the GKE cluster autoscaler to ensure the nodes of the correct size and shape are prewarmed to run your newly scaled applications.
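
The three modes described above can be sketched as a small manifest builder. This is an illustration under stated assumptions, not GKE's documented configuration: it assumes the open-source VerticalPodAutoscaler API's `updatePolicy.updateMode` field with the values `Off`, `Initial`, and `Auto`, and the `autoscaling.k8s.io/v1` API version (earlier releases used beta versions). The helper `vpa_manifest`, the behaviour keys, and the deployment name `my-app` are hypothetical.

```python
import json

# Mapping from the behaviours described in the episode to the open-source
# VPA updateMode values (an assumption; check your cluster's API version).
MODES = {
    "recommend_only": "Off",     # only surface recommendations
    "new_pods_only": "Initial",  # apply the recommendation at pod creation
    "rolling_update": "Auto",    # also update pods that already exist
}


def vpa_manifest(deployment: str, behaviour: str) -> dict:
    """Build a VerticalPodAutoscaler manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "updatePolicy": {"updateMode": MODES[behaviour]},
        },
    }


if __name__ == "__main__":
    # Print a recommendation-only manifest as JSON, which kubectl also accepts.
    print(json.dumps(vpa_manifest("my-app", "recommend_only"), indent=2))
```

Starting in the recommendation-only mode is a low-risk way to see what the autoscaler would change before letting it restart any pods.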

ADAM GLICK: Have you built functions on the AWS Lambda platform? You can now move them directly to the cross platform Knative framework using KLR, spelled with a K. That's K-L-R, standing for the Knative Lambda Runtime. KLR is a set of build templates that can be used to run a lambda function in a Kubernetes cluster installed with Knative. Environments supported are Go, Node.js 4 and 10, Python 2 and 3, and Ruby is upcoming. KLR is built by TriggerMesh, which you can learn more about by going back and listening to episode 28.

CRAIG BOX: Meet krew, also with a K, the plugin manager for kubectl. krew is a package manager for kubectl plugins. Conceptually similar to apt or brew, krew lets you install over 20 plugins for kubectl and keep them up-to-date. krew was built by Luk Burchard, a Google Cloud intern in 2018, and his mentor, Ahmet Alp Balkan. It is implemented itself as a kubectl plugin and works with 1.12 or higher.

ADAM GLICK: The CNCF has posted the first in a series of postings on monitoring Kubernetes, written by Sean Porter from Sensu. He covers the challenges in monitoring Kubernetes and the main data sources used to help solve them. Porter's concerns are that applications are constantly moving, there are many small pieces to monitor, and the ephemeral nature of stateless microservices makes tagging and labeling all the more critical to keep track of all the parts of your application.

His four data sources to watch include the cluster nodes through the Prometheus node exporter, the metrics of the Kubernetes master processes, cAdvisor, built into the kubelet for container metrics, and kube-state-metrics for application metrics. All in all, it's a nice introduction to data sources and things to monitor as you look at smooth operation of your cluster in production.

CRAIG BOX: The Istio team has issued an update on their progress towards a 1.1 release. Dan Ciruli, our guest in episode 15, says that the flood of production adoption of Istio 1.0 necessitated a series of patch releases, and thus pushed back the 1.1 release date. Themes of the upcoming release include scalability to thousands of services, multi-cluster meshes, and smooth upgrades. Dan says that the team has begun the release qualification process, with the first release candidate due later this month and a final build sometime in February.

ADAM GLICK: Finally, Stefan Bueringer has written a guide on managing Kubernetes using the Open Policy Agent framework, a CNCF sandbox project for cloud-native policy management. This implementation offers some advantages over Kubernetes RBAC, including the ability to explicitly deny permissions, which he lays out in his blog post. Integration is implemented using the Kubernetes Policy Controller, originally developed by Microsoft Azure, and currently being donated to the CNCF.

CRAIG BOX: And that's the news.


ADAM GLICK: Jared Watts is a founding engineer with Upbound and a core contributor to the Rook project. Welcome to the show, Jared.

JARED WATTS: Thank you very much for having me today.

ADAM GLICK: You want to share a little bit about your background and how you got started in the world of storage?

JARED WATTS: Sure, absolutely. So I started my career at Microsoft as an intern quite a long time ago. And I was working on Windows Server and management applications for Windows Server. And back then, I actually was a little intimidated by open-source projects. I didn't understand a lot of the build systems and the project tooling that a lot of those communities were using, and I was very much in a Microsoft-focused environment.

But things have changed a lot over the past 15 years. I'm much more involved now in open source. GitHub made it a lot more social, a lot more accessible. And some of the storage projects that we've started actually started at our first startup effort at a company called Symform, which was doing peer-to-peer storage across the public internet.

ADAM GLICK: Is that how you got your start in open source, kind of moving from the proprietary world to more of the open world?

JARED WATTS: Yes. My start with open source was at that first startup with a project called Mono, which actually was an open-source effort to produce the .NET framework.

ADAM GLICK: Yeah, I remember.

JARED WATTS: And so we were using that to run a lot of our live services, our servers, in the centralized data store for that peer-to-peer storage network.

CRAIG BOX: 15 years ago was Windows Server 2003, which I would put to you was the best version of Windows Server, peak Windows.

JARED WATTS: I came in on the tail end of Server 2003 R2. And then the first one that I really shipped was Windows Server 2008.

CRAIG BOX: What was your path from server management to the world of storage?

JARED WATTS: Great question. So my first involvement in storage was at the startup that I left Microsoft for, a company called Symform. And we were doing a very non-traditional storage architecture, where it was a peer-to-peer network that was using untrusted storage across the public internet. You could contribute some of your hard drive and in return, get free cloud storage from the rest of that peer-to-peer network. So that was my first foray into storage, and I've been in storage technology ever since.

CRAIG BOX: I remember that, when it was called Kazaa.

ADAM GLICK: I think that's a different form of storage.

JARED WATTS: Peer-to-peer, as a protocol and architecture, has a lot of uses.


ADAM GLICK: That was the most genteel answer I can imagine.

CRAIG BOX: How did you end up working at Quantum?

JARED WATTS: That first startup, Symform, ended up getting acquired by Quantum Corporation. And so we stayed on there and started some interesting internal storage projects there, including what ended up being the Rook project.

ADAM GLICK: For those that aren't familiar, can you explain what Rook is and what it does?

JARED WATTS: Yeah. So Rook could be thought of as a cloud-native storage orchestrator. And so what that means is that it extends Kubernetes with new custom types and custom controllers. And the whole purpose of those is to provide software automation to automate the deployment, bootstrapping, scaling, upgrading, disaster recovery, all sorts of administrative tasks for storage solutions, specifically.
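
The "custom types and custom controllers" idea above can be illustrated with a toy reconcile loop: a declared desired state is compared with what is actually running, and the controller acts to close the gap. This is a simplified sketch of the general controller pattern, not Rook's implementation; the names `StorageClusterSpec`, `ObservedState`, and `reconcile` are hypothetical, and the cluster is simulated in memory rather than driven through the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class StorageClusterSpec:
    """Desired state, as a user might declare it in a custom resource (hypothetical)."""
    storage_nodes: int


@dataclass
class ObservedState:
    """What is actually running in the cluster (simulated in memory here)."""
    running_nodes: list = field(default_factory=list)


def reconcile(spec: StorageClusterSpec, observed: ObservedState) -> list:
    """Compare desired and observed state, act on the difference, return the actions."""
    actions = []
    # Scale up: start storage daemons until the desired count is reached.
    while len(observed.running_nodes) < spec.storage_nodes:
        name = f"storage-{len(observed.running_nodes)}"
        observed.running_nodes.append(name)
        actions.append(f"start {name}")
    # Scale down: stop surplus daemons, newest first.
    while len(observed.running_nodes) > spec.storage_nodes:
        name = observed.running_nodes.pop()
        actions.append(f"stop {name}")
    return actions


if __name__ == "__main__":
    state = ObservedState()
    print(reconcile(StorageClusterSpec(storage_nodes=3), state))
    print(reconcile(StorageClusterSpec(storage_nodes=2), state))
```

Running the loop again with an unchanged spec returns no actions, which is the property that lets a controller be re-run safely after every change or failure.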

CRAIG BOX: Kubernetes comes from the cloud world, and my experience with storage is basically call an API and some is delivered to me. What's it like for people who are running in an on-premises environment?

JARED WATTS: That's actually an environment where Rook tends to do very well. So when you're running in an on-premise environment and you're managing all of your own hardware, turning all of those disks and storage devices across all of your hardware into a useful consumable storage system by stateful workloads and applications in Kubernetes, is really where Rook shines.

CRAIG BOX: What was the state of storage before Rook?

JARED WATTS: Storage in Kubernetes, in general, has come a really long way, I think. There's some amazing progress from the general Kubernetes community and ecosystem, including a lot of the work from the special interest group for storage, the SIG Storage folks. Some amazing stuff there.

So basically I think, really, what Kubernetes has done an amazing job at, in terms of storage, is the volume abstraction. So your stateful applications, they need to store their application state somewhere, and they don't really need to know exactly what the underlying storage system is. Kubernetes has abstracted that away in the form of a volume. And so all your container needs to do is be able to write to a local file system or block device and Kubernetes handles the rest of that for you and abstracts away whatever the underlying storage actually is.

CRAIG BOX: So what is the underlying storage of Rook?

JARED WATTS: The underlying storage of Rook is a number of different storage technologies. So Rook, in the sense that it's a storage orchestrator, does that orchestration and deployment and management for a few different storage technologies. The first storage technology that Rook started with was a distributed storage software called Ceph.

CRAIG BOX: With a C.

JARED WATTS: Exactly. C-E-P-H, cephalopod. And that is the storage technology that is the most stable with Rook and the most production deployments, et cetera. Rook also does that same sort of orchestration and deployments and management for a couple of other storage technologies, like CockroachDB, Minio, and NFS, and Apache Cassandra, as well.

ADAM GLICK: How would you compare that to something like Gluster File System?

JARED WATTS: Yeah. So Gluster is in the same vein as Ceph, in the sense that it's a distributed storage system, where it's scalable. You can add more hardware to it, and that overall storage cluster and capacity will increase.

A big difference I see between Ceph and Gluster is that Ceph, at its core, is object-based. It is object storage through and through. And different presentations of storage, such as block storage and file storage, are built on top of that, whereas Gluster, at its core, is file storage through and through. It can also put other presentations of storage, such as block and object, on top of its core file storage, but that's the big difference between the two of them.

CRAIG BOX: So if we consider the comparison between blob storage, like Google Cloud Storage, and then block storage, like the persistent disks, it sounds like Rook provides that abstraction using blob storage, using Ceph underneath, and then provides volumes to Kubernetes in block format?

JARED WATTS: Yeah. That's a good way to look at it.

CRAIG BOX: Is that true?

JARED WATTS: Yeah. So something really key to remember here is that Rook itself is not on the data path. So when your application wants to write or read bytes to the underlying storage, either through the S3 object storage interface or a shared file system or raw block device, it's actually going to do those data write and read operations directly with the underlying storage, like Ceph or Gluster or CockroachDB. So Rook does the management and ongoing health monitoring and configuration of those storage systems. But the actual data path is going to be the storage solutions that Rook has orchestrated and deployed into your Kubernetes clusters.

ADAM GLICK: How does Rook handle the ability to provide stateful data? And so if you've got a set of data, if it's providing volumes and a container goes away, or let's say you have a storage corruption, how is it handling that? Does it provide redundancy in there? Does it rely on the underlying area for redundancy? Does it provide multi-tenancy?

JARED WATTS: Yeah. So it's a bit of a combination of both. So while Rook itself isn't on the read and write data path, this underlying storage system is going to be responsible for redundancy and scrubbing and durability, and all those directly storage-related tasks. But in terms of the integration with Kubernetes, to provide those volumes in a very dynamic orchestrated way, Rook does get on that path.

So when a container requests a volume, that needs to be provisioned with the underlying storage system, and then it needs to be attached and mounted and available on the specific node that that pod has been scheduled to. And Rook does have a hand in that orchestration effort, to make sure that the volume is available for wherever the consuming pod ends up. And if that pod happens to move to another node in the cluster, then Rook will need to do that unmapping, unmounting of the volume, and then remapping and remounting it on the new node that the pod has landed on.

ADAM GLICK: Gotcha. So Rook will actually move data as needed. Or will it move the data, or is it just going to, basically, unmap the volume and then remap the volume on the new node where that pod is spinning up?

JARED WATTS: Yeah. It'll move just the volume abstraction itself. So it's a very quick, very fast operation. No data has to move around or get rebalanced in the cluster for the pod to continue consuming that storage on a new node in the cluster. So it's a very fast operation.
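
The unmap-and-remap behaviour described above can be modelled in a few lines. This is a toy model rather than Rook's code: the `Volume` class, the `move_pod` function, and the node and volume names are hypothetical. The point it shows is that following a rescheduled pod only changes where the volume is attached, while the location of the data in the storage cluster stays the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """A volume whose bytes live in the storage cluster, not on any one node."""
    name: str
    data_location: str                # where the storage system keeps the data
    mounted_on: Optional[str] = None  # node where the volume is currently mapped


def move_pod(volume: Volume, new_node: str) -> list:
    """Follow a rescheduled pod: unmap from the old node, then map on the new one."""
    steps = []
    if volume.mounted_on is not None:
        steps.append(f"unmount and unmap {volume.name} on {volume.mounted_on}")
    volume.mounted_on = new_node
    steps.append(f"map and mount {volume.name} on {new_node}")
    # data_location is deliberately untouched: no data is copied or rebalanced.
    return steps


if __name__ == "__main__":
    vol = Volume(name="pvc-1", data_location="storage-cluster", mounted_on="node-a")
    for step in move_pod(vol, "node-b"):
        print(step)
    print("data still lives in:", vol.data_location)
```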


CRAIG BOX: What was the impetus to develop Rook?

JARED WATTS: When we were first starting to learn about Kubernetes and were talking to a lot of people, one thing we consistently learned and ran into from users of Kubernetes is that a lot of people didn't have a great solution for storage in Kubernetes. They had integrations with external NAS devices or other external storage solutions, but there wasn't really a very portable way to deploy and manage storage within Kubernetes itself.

CRAIG BOX: Is there a place for using Rook in a cloud environment, or is it something that's only used on-premises environments, where you don't have those APIs to just get storage on tap?

JARED WATTS: Yeah. I think Rook does its best, it shines, in on-premise bare metal types of environments, or maybe hybrid environments as well, where you have the raw machinery and you've got Kubernetes, but you don't necessarily have that rich set of cloud provider managed services that can provide you with persistent disks, or databases, or message queues, et cetera, all those different storage solutions that are available when you're running in the public cloud. So I think that it really is a matter of your environment and your use case.

I think there's a great case to be had for using these cloud provider managed services. They're reliable. There's an SLA for them. You pay for them, but you get a quality service in return. So when you're running in the public cloud, I think your application consuming these managed services is great. But then when you're running on-premise, or in maybe more hybrid environments, I think Rook does a great job with providing you with reliable storage directly inside your Kubernetes clusters.

ADAM GLICK: You mentioned hybrid environments there. When you think about people that are spreading their clusters across, possibly, multiple separate data centers, or between the cloud and on-premises, how does Rook help with the storage in those situations?

JARED WATTS: Yeah. So I think that there is more work to do there, definitely. But I think that when you have these environments where you're spreading your data across multiple data centers and you have different constraints, or different setups and availabilities in these different regions and different data centers, you sort of need a little bit of a different solution. So currently, Rook does very well with single-cluster deployments, but there is more work to be done there to kind of handle that orchestration and scheduling and management of storage when you get to environments that are spanning multiple clouds, multiple environments.

CRAIG BOX: If someone is running in a hybrid environment and they have access to cloud provider APIs on some of the machines in some of their clusters, and they're running on-prem and they're using Rook and Ceph to get storage in the on-prem environment, is there a case for running Rook in the cloud? And will it help you port data volumes, perhaps, between those two environments?

JARED WATTS: Yeah. That definitely is possible and does help with that scenario. So what I see right now is that data migration, or data communication across different clouds, is really handled by the underlying storage technologies, such as Ceph. So Rook would be able to deploy Ceph into those varying environments, on-premise and in cloud, and be able to set up the machinery and networking there to be able to share data across those environments. But I think that can be taken even further, and that's definitely something that we want to be looking into, going down the road of being able to manage your portable workloads across all of your various environments in a more seamless way.

CRAIG BOX: What's the story of the name?

JARED WATTS: The Rook name. As mentioned earlier, the Rook project came out of the company called Quantum Corporation, and it was an internal project called Castle. So building on that theme, and wanting to find a cool-sounding name that, first off, was not trademarked and didn't have any prior organizations using it, we wanted to represent the castle chess piece, the rook, with that name and logo.

CRAIG BOX: I think the very first hard drive I ever bought was a Quantum.

JARED WATTS: I hear that a lot. When the Quantum name comes up, people say that they used to own hard drives back in the day from them, or that they know that they're still famous for tape storage.

ADAM GLICK: Rook is currently part of the CNCF. How is it progressing through the CNCF?

JARED WATTS: Yeah, so Rook first became a CNCF project at the sandbox level back in January of 2018. And the growth that we demonstrated throughout the months after that got to-- I think it was about August or so that we made a proposal to graduate Rook from the entry sandbox level to the incubation level.

And to do that, we needed to demonstrate that Rook has a healthy flow of commits, contributors, and maintainer team across multiple organizations, multiple vested organizations, production deployments, user testimonials. All sorts of data needed to be gathered. And that was actually a very fun experience that I got to go through because I got to talk to a whole lot of our Rook users and learn more about their stories and their enthusiasm and get more involved with some of the rest of the community, which was really fun.

CRAIG BOX: And what will it take for Rook to graduate to a top level project in the CNCF?

JARED WATTS: The conditions that need to be met to be a fully graduated project in the CNCF are pretty high. I think the only graduated projects so far are Kubernetes, Prometheus, and I think Envoy graduated just recently as well. So those projects, the level, the bar there, has to be really, really high.

And so what I think we're going to need to do is that we definitely need more production usage to find and sort out those pesky bugs that really only reproduce under stressful situations at scale. The fun bugs to find, right? And I think that Rook also needs a little bit more growth in the developer and contributor community. We have over 100 contributors already on the project, but I would love to have more commits and bigger features being implemented by new people in the community that want to get involved.

CRAIG BOX: You work now for Upbound, which is a company founded to work on Rook. What's the story of Upbound? How was it founded?

JARED WATTS: So Upbound is a startup company in Seattle that was founded by Bassam Tabbara, who is another one of the maintainers on the Rook project, the founder of the Rook project. So we were both at Quantum together. And Bassam left to go start the Upbound venture, I think, in probably late 2017 or so. And I was not too far behind him, joining at Upbound as well in early 2018.

So really, what Upbound wants to accomplish is that Upbound wants to be the platform for multi-cloud environments and multi-cloud deployments with an amazing user experience for managing these multi-cloud environments that people are starting to move to to be able to have choice about cost and provider features and innovation from the various cloud providers and policy constraints, et cetera.

So Upbound really wants to be the platform for managing multi-cloud. Storage is a big part of that, where one of the biggest problems with being able to have portable applications and portable workloads is the storage component of it. So what we're seeing is that there is a whole lot of convergence on the Kubernetes API and the Kubernetes ecosystem, with all the tools and software and projects built around that.

So being able to enable freedom to move across cloud providers, a huge part of that is the data. Data gravity is the big thing: once you're running in a particular environment, how do you move that to another environment freely, or how do you communicate across these various environments with data and communication? So it's difficult.

ADAM GLICK: What parts of the roadmap are you most looking forward to? Obviously that you've got to build a lot of things and a lot more to go, as we've talked about, especially as you expand to hybrid multi-cloud kind of world, the other file systems. What are you most excited about building next for Rook?

JARED WATTS: So one of the features that I'm personally most looking forward to is in the 0.9 release: Ceph is declared stable. So we're making a statement that the API will remain stable, that new features that are added will be done in a backwards-compatible way, and that it's ready to have more production adoption than it already has.

Another thing I'm very excited about in Rook is more storage providers being integrated. We started with just Ceph, but we already have three or four other ones now, with more coming along the way. So building the Rook ecosystem and providing more storage options and solutions to more users in cloud-native environments is something I'm very excited about.

CRAIG BOX: With the Container Storage Interface recently having gone 1.0, and with Rook having now stable support for Ceph, when those building blocks are there, when everything is stable, what things do you think you can do on top of them? Once you've solved the underlying problem, what are the more exciting things that you think you might be able to build next?

JARED WATTS: Yeah. I think that gets back to some more of those interesting scenarios, where you're running in more complicated environments. And you've got redundancy and availability across multiple environments and regions for disaster recovery, and performance from making sure that the data is placed where the user is, closer to where the end consumer of the data is. So I think you start getting into those more interesting scenarios of performance and reliability at a global scale.

CRAIG BOX: And if people want to get involved with making that possible, what should they do?

JARED WATTS: So Rook is an open-source project, under the Apache 2.0 license, on GitHub. So if you go to GitHub, rook/rook, we are always accepting new contributors. We love new features and bug reports, and everything from the community there. So that's a great place to go and participate on a developer level.

And then as a user, you just go to rook.io, the website there, and that has links to everything-- usage and our Slack channel, which we're very, very active on, Twitter, all those various means of getting involved and staying up-to-date with Rook.

CRAIG BOX: Are you worried that a more powerful storage orchestrator called Queen might come along?


JARED WATTS: That worry is absolutely always there, and the various moves that she'd be able to execute would definitely scare me.

ADAM GLICK: I used to play when I was younger, but are you a chess player?

JARED WATTS: Very, very, very casual, and I normally don't necessarily win against people I play against. So that would be just a fun thing.

CRAIG BOX: I was only ever at the sandbox level in chess.

JARED WATTS: Yeah, exactly.


CRAIG BOX: All right, Jared, thank you very much for joining us today.

JARED WATTS: Thank you so much for having me, guys.

ADAM GLICK: You can find Jared on Twitter @jbw976. And you can find his writings on the Rook blog at blog.rook.io.

CRAIG BOX: Were jbw 1 through 975 taken?

JARED WATTS: Yeah. It took a long time with brute force to finally get to the 976. It was accepted and went through.

ADAM GLICK: Out of curiosity, do you have to pay $0.90 a minute to send you Twitter messages?

JARED WATTS: [LAUGHING] Nope. There's no constraints like that.


CRAIG BOX: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @kubernetespod, or reach us by email at kubernetespodcast@google.com.

ADAM GLICK: You can also check out our website at kubernetespodcast.com, where you can find transcripts of all of our shows. Until next time, take care.

CRAIG BOX: See you next week.