Kubernetes Podcast from Google: Episode 182 - Cloud Native Storage, with Alex Chircop

#182 June 10, 2022

Cloud Native Storage, with Alex Chircop

Hosts: Craig Box

As we move further up the stack, we rely on many foundations – including storage. Alex Chircop is co-chair of the CNCF Storage Technical Advisory Group (TAG), as well as founder and CEO of Ondat (formerly StorageOS). Join us to learn why no app is truly stateless, and how data is the new storage.

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm your host Craig Box.

[MUSIC PLAYING]

CRAIG BOX: In December 2019, I bought tickets to see one of my favorite bands, Crowded House, who had re-reformed and were playing in July 2020. Then the big thing which we try not to remember anymore happened. And the show was rescheduled to June 2021. Guess what? More lockdowns, more postponement, this time to June 2022.

Meanwhile, my tickets were returned due to my upcoming relocation to New Zealand. And to add irony to insult, Crowded House had already played in New Zealand in March '21 because of that whole “New Zealand was open while everywhere else in the world was closed” thing. Well, you may recall that I am in Europe for a while after having been to KubeCon. And it happens to be June 2022.

Guess who went to see a real live band on Tuesday? It's been a while. It was odd in a way. Life was all face coverings and social distancing when I left the UK. And now, it's like nothing ever happened. Thousands of people in a room with only the road crew wearing masks to remind us. Frontman Neil Finn caught COVID while the band was on tour in Australia in April, forcing the postponement of a number of dates.

Anyway, the show was great. There are some video snippets on my Twitter as promised. The new lineup includes their original producer on keyboards. And they put a lot of effort into making the old songs sound much more like the studio versions. There were also a handful of new songs which, of course, the audience completely ignored.

Meanwhile, a friend who has just been in Valencia for entirely non-Kubernetes reasons, has come back to London and tested positive after feeling ill for a few days. I don't think that counts as #KubeCOVID, but we do live in interesting times. Stay well, everyone. And let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: The Kubernetes project has published its second Annual Report, providing quantitative and qualitative measures of community health as measured by project milestones and reported by community leaders. The report states there have been 62,000 contributors to Kubernetes with 10,000 added in the last 12 months. It calls out both additions and removals to and from the project, as well as good and bad milestones touching on regressions and burnout. Current initiatives and areas of help wanted are listed for each SIG, and you can read more about those groups in their own sub reports.

Conference season continues with SUSECON this week. A new release of SUSE Linux Enterprise is now SLSA Level 4 compliant, the first piece of software I have heard of that claims to meet this bar. Rancher Desktop has released version 1.4, now offering registry credential helpers, and Longhorn has released version 1.3, now configurable with CRDs.

ContinuousDeliveryCon opens this week with a "State of CD" report. Key findings include that 77% of developers are associated with some DevOps function. Those using containers are twice as likely to have lead times of less than one day compared to those not working with container technologies. Likewise, Kubernetes users are one third more likely to be top performers than those who do not use it. Pat yourself on the back and go and read the report for free, linked in the show notes.

A new project has been founded to handle measurement and allocation of infrastructure and container costs in real-time. OpenCost is a spin off of the Kubecost platform, being its core allocation engine. It also includes a brand new community-driven spec with support from companies as diverse as Google, Adobe, SUSE, and New Relic. The project has been submitted to the CNCF as a sandbox project. Learn more about Kubecost in episode 124.

The CNCF has funded the security audit of the CRI-O project, performed by the Open Source Technology Improvement Fund, Ada Logics, and Chainguard. The single high-severity finding is a denial of service attack on the cluster by way of exhausting resources on a node. Interestingly, the same attack was found to apply to containerd. Both runtimes have a security advisory and patched versions available. This report includes a supply chain review for the first time in a CNCF security audit, performed by Chainguard, which suggests that SLSA level 1 compliance is within reach for the project.

Beware if you use Bitnami's Helm charts. Their repository index has been pruned to only include charts published in the last six months. The index YAML file was 14 megabytes and couldn't be compressed due to the Helm client not supporting this. So it was cut to save on transfer costs. 12,000 chart versions totaling 10 megabytes were removed, which broke a number of people's builds over the weekend. With the comments on the GitHub issue creeping into the hundreds, we'll keep you updated on any eventual fix.

Finally, the CNCF is improving its Code of Conduct processes to be more transparent, to include community representation, and to balance project self-governance with foundation support. This includes an immediate change to bring community members to support CNCF staff and setting up a working group for a four to six-month project to figure out a new structure and process. There will be multiple calls for input from the entire community throughout. And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Alex Chircop is a founder and CEO of Ondat, formerly StorageOS, and a co-chair of the CNCF Storage TAG. Welcome to the show, Alex.

ALEX CHIRCOP: Hi, Craig. It's really great to be here.

CRAIG BOX: Before founding the company you work at now, you were at Goldman Sachs. That was about 10 years ago. Would you recognize the company from 10 years ago today? I know that they're a big Google Cloud customer. I know they've gone through a big cloud native transformation. Is it the same place?

ALEX CHIRCOP: Well, that's actually tricky because it is 10 years since I've been there. But I think Goldman Sachs has always been a very innovative company. And they've always been trying to build interesting environments. In fact, even before I had left Goldman's, we were in the process of building private clouds.

And so, yeah, I think I would recognize some of it because some of the people and some of the colleagues that I used to work with are still there. It was a great set of people. But I think, yeah, it's hard to actually comment first hand given the amount of time that's passed.

CRAIG BOX: Well, I'm very impressed with how the finance industry led the jump to Kubernetes much earlier than you would have thought for what you might think of as stodgy, hundreds-of-years-old companies.

ALEX CHIRCOP: I'm not actually that surprised. I think what we see is service providers and finance companies tend to be slightly ahead of the maturity model in many of the technology adoptions. And that's just because of the size of the environment and the rate of innovation. But also, one of the key things that I think financial services in general wanted to achieve was a quicker time to market, because competition is fierce. And bringing applications and products and services to end users is a key differentiating factor.

And so having the ability to abstract away the infrastructure and provide developers with a way of defining what their application needs to run, and something like Kubernetes just makes it happen, is kind of nirvana. So even 10 years ago, I was working on similar projects at Goldman Sachs, as were most of our peers at the time.

CRAIG BOX: It's always good to speak to a CEO whose first jobs were a programmer and system administrator. Can you walk us through your career to date?

ALEX CHIRCOP: Yeah, I'm a technologist at heart. And I've probably been focusing on infrastructure engineering for 20, 25 years before doing this startup adventure. And I think when I first started, it was doing programming for a really straightforward accountancy product. I remember learning accounts to help me code an accountancy product. I remember those days very fondly.

But I quickly moved into SysAdmin and Unix and found that I really enjoyed learning about kernels, and networking, and how all of that infrastructure hung together, including storage, of course. And then, my first encounters with Linux, which meant receiving this big box of floppies, which I lovingly installed on my PC.

And then, of course, all of the changes that came along with the internet and setting up one of the first internet connections in Malta, which is where I was living at the time. And then, almost after that, focused entirely on financial services and the infrastructure in financial services.

So that included a variety of different roles. And I covered automation and other types of infrastructure areas, not just the Unix and storage, but expanded there and became more focused on delivering what application developers needed out of the infrastructure rather than just pure infrastructure plane.

CRAIG BOX: We're both in London this week. And I think I might be the only person in London who's never actually worked in financial services. Do you think that the industry has changed here in terms of what jobs people expect to get out of school doing tech work?

ALEX CHIRCOP: I think financial services in London is big, just like financial services in New York, I guess. That said, the big tech companies like Google and Amazon and Facebook and Microsoft, et cetera, also have big presences here. But we're also seeing a very big startup culture. And I think there are lots of big and small startups which are based in London and growing throughout Europe, which is an interesting trend from the typical Silicon Valley and West Coast-type startup environments.

CRAIG BOX: We talked to Justin Santa Barbara last week, who said that he had to leave London then go to Silicon Valley to get investors and start his startup. Do you think there's a connection between the two things that you just mentioned? Do you think that the fact that the large tech companies have been in Europe for a while now has perhaps brought a set of people who are willing to take bets on startups as employees, as well as investment opportunity?

ALEX CHIRCOP: I think it's definitely been evolving and changing in the years that I've been running a startup. I can certainly say it's easier to get funding if you are an American entity rather than a London entity. Just the access to the opportunities, I guess, is probably the big differentiator today. And certainly, throughout the funding process, at certain points in the process, I was asked a few times, when are you moving into California? And the answer was, well, I'm not.

CRAIG BOX: There's much less rain there.

ALEX CHIRCOP: There is much less rain there. And I guess if I was at a different stage in my life, maybe that might be a thing. But I've got family, and kids, and everything else. That also matters. But also, there is excellent talent in London and in Europe. And there isn't, honestly, a monopoly of location anymore. And then, I think the pandemic and remote working has probably proven that more than anything else now.

CRAIG BOX: How was it then that you came to be co-founder of a startup?

ALEX CHIRCOP: I was seeing this demand for infrastructure automation. Even when I was at Goldman Sachs, we were working on a number of options to provide developers effectively with a way of being declarative about their environment, being able to specify what their applications needed to run in terms of compute, and networking, and security, and the topology, and all of that sort of thing.

And then, there was storage. And storage was stuck in these big black boxes locked into data centers, didn't really have APIs, had very proprietary ways of connecting to them and moving to them. But we had an issue with providing the automation. And I came to the realization that we had compute iterating and being fairly commoditized with hypervisors and VMs.

And we had software-defined networking. But storage was still stuck. And I got to that point in my career where I said, I've always wanted to build something. And this is a real thing. This needs to be built. And so we set up StorageOS at the time to solve that problem of providing a software solution that could be deployed anywhere and that could have the APIs that supported the automation that developers needed to power what we saw as the future platforms that were coming up.

CRAIG BOX: Was the “we” there a group of people that you'd worked with before?

ALEX CHIRCOP: It started as a very lonely adventure. And then, I quickly started talking to some of my old colleagues and friends. And we set up the company together after spending a number of months experimenting with algorithms, and actually trying things out, and making sure that things actually worked.

And we first showed off our product at TechCrunch in London, which was a bit of a surreal experience. And we got voted at the time as startup most likely to succeed, which was very flattering. Yeah, it was certainly a team effort because you need diversity of ideas, both in terms of technical experience but also business experience, to build the startup successfully.

CRAIG BOX: You talked about building out a private cloud at Goldman before you started StorageOS. And you also mentioned the declarative model at the time. Containers were really starting to become part of the public consciousness. Kubernetes was about six months old at the time StorageOS was founded. What platform in particular were you targeting with your initial technology?

ALEX CHIRCOP: Initially, it was very early on. And containers were present. But they certainly weren't mainstream. And as you said, Kubernetes was very much in its infancy. So what we were targeting was a software-defined solution, probably, that was going to be deployed as an appliance that developers could access over an API. And then I remember this very clearly, we attended KubeCon in London in 2016, March of 2016, which was a very, very embryonic community.

It was a few months after the very first KubeCon in 2015 in North America. And we were looking around. And we were going, so the software is in containers. And the software is portable. And it needs portable storage. And it's all declarative. And we went, this is the thing. This is where the world is going to go. And straight after that, we turned all of our focus to turn it into a containerized storage product and a containerized solution and never looked back.

CRAIG BOX: Was your API designed to sit on top of a tray of physical disks? Or was it designed to sit on top of a cloud provider?

ALEX CHIRCOP: In the very first days, we were looking at both. We always wanted to be platform-agnostic. And we wanted to be able to use both commodity disks and commodity SSDs that are available, as well as the cloud instances. But even then, in 2015, 2016, cloud was certainly very popular. But again, it wasn't incredibly mainstream in many enterprises yet. So we were still thinking about having to focus on both.

CRAIG BOX: As more and more people move to a place where you can call an API and be given a block of storage, or a virtual disk, or so on, is there more or less demand for storage software?

ALEX CHIRCOP: I think there's more demand now than ever for storage software because what we're seeing is developers have fully embraced Kubernetes. And I know that this is a fairly cliched thing to say. But there's the phenomenon of shift left, where developers are taking over control and defining in a declarative way what they need out of the environment, not just for what started off with, say, testing and security, but now networking, and availability, and data services too.

And so what we're seeing is this move where a lot of so-called stateless applications have moved into the Kubernetes world. But the reality is, and this is not always a popular statement, but it is a fact, no application is actually stateless. Every application is storing state somewhere, whether it's a database, a message queue, a key value store, an object store, or a file system, or whatever.

And the reality is that most of the stateful components of applications are stuck outside Kubernetes today because the automation, and the scaling, and the capabilities of that storage layer are typically still evolving and locked outside of the Kubernetes world. So they're using databases and a VM and then have the rest of their application running in Kubernetes.

But of course, what that means is that they're not getting all of those benefits that they've lovingly applied to their Kubernetes environment. So in Kubernetes, they have a declarative way of doing everything. They have automated security policies. They have automated failovers. They've got automated VIPs. They have automated load balancers, automated healing, and all of the other benefits that Kubernetes brings to the environments.

CRAIG BOX: They barely need to come into work anymore. It's all been automated.

ALEX CHIRCOP: Well, yeah, somebody still needs to look after Kubernetes, I guess. But you know what I mean. The concept is that many enterprises are now at the stage where they end up having two sets of operational run books. They have a set of processes for the stateless parts of their apps and then another set of processes for the stateful set.

And so I think the time is actually ripe for looking at doing the next step in Kubernetes adoption, which is actually bringing the stateful applications into Kubernetes. And we're seeing this not just across storage providers but across a plethora of different database providers. And there's a huge community of operators running stateful applications now in Kubernetes too. So I think it's just the next step of that process as enterprises engage with Kubernetes.

CRAIG BOX: Like you say there, Kubernetes has started out with a big warning label on it saying, only use this for stateless software. And there may or may not be any true stateless software. Kubernetes can store some of that state for you. But let's have a think now about two different personas. First, someone developing a so-called greenfield app today, fully cloud native, perhaps even serverless. And second, someone who is tasked with moving a long-standing enterprise application onto Kubernetes. What does each of those people need to know about storage?

ALEX CHIRCOP: There's probably quite a bit of commonality. I think if somebody is developing a greenfield app, they might have some parts of their application that might be using cloud native storage in a slightly different way. So for example, a lot of greenfield apps have dependencies on object storage, which typically weren't as mainstream in some of the legacy or brownfield systems.

But I think at the end of the day, all of those applications still typically have architectural patterns that depend on a database, or a message queue, or some other function like that. And those databases need to run somewhere. So the question is, how do you automate? And how do you do CI/CD for those parts of your application? What we've seen certainly is the move away from just one type of database.

So we now have lots of different types of databases, lots of different types of message queues, lots of different types of key value stores that embrace the cloud native way. And with my CNCF Storage TAG hat on, we look at cloud native storage as being everything from traditional volumes to API-driven storage like databases, and message queues, and key value stores, and object storage too. But I think what we see is the developers might actually choose different types of databases. But ultimately, there's always going to be a database or a message queue that needs to run there.

CRAIG BOX: That is something I wanted to dig into a little. The Storage TAG, as you mentioned, considers things like databases and message queues as being storage. If you were to take that to some of your previous companies, where you have larger enterprises and teams that work on the different layers, they might think storage is the physical bits in the disks, and so on, that they provide a service to a database team, who then runs something on top of that. Is there a commonality between these two worlds? Or is this now driven by the fact that smaller teams run more pieces of software than they used to?

ALEX CHIRCOP: I think certainly, in the adoption of cloud native, we see organizations going through an evolution, where they might have previously had very dedicated silos that focused entirely on storage, or entirely on databases, or entirely on Unix or networking. And now, we're getting to a point where we have developers and DevOps teams that have a more shared operational and a broader coverage of the technology areas they're looking at.

But I also think that that's because of the automation and the commoditization of a lot of these technologies, which is partly driven by cloud and partly driven by technologies like Kubernetes where, effectively, the developer does now have declarative interfaces to a lot of these things. So I think what you find is that as enterprises adopt these technologies, they end up building the SRE and the DevOps teams. And there are shared teams with broader responsibilities.

And then over time, we're seeing that they end up actually bifurcating a little bit. And you end up with specializations in those groups too. But ultimately, I think the key thing is that developers are probably not that worried anymore about the intricacies of some of the storage environments and what goes into the storage bits and bytes. They're expecting a service definition or a service offering.

And then they want to run their stateful components like databases on that service offering, rather than actually focusing on some of the nuts and bolts and how storage does replication, or data protection, or compression, or whatever services that storage layer is providing. So I think, yeah, developers are getting abstracted away from the storage in much the same way that services and service meshes in Kubernetes are abstracting them away from the bits and bytes of load balancers and physical pieces of networking hardware.

CRAIG BOX: It is always nice to move up the stack, just to have an API where I can say, I store something here and it'll have 11 nines of reliability or whatever the number ends up being. There is a big community, though, around data on Kubernetes. Do you find that people associate with one word more than the other? Are there storage clubs and data clubs? Or are they now the same group?

ALEX CHIRCOP: No, I think as the services move up the stack and as storage moves up the stack, the nomenclature probably changes a little bit and becomes more data-focused because it's more about the data services rather than the nuts and bolts of the underlying storage. So a developer is interested in how to achieve high-availability. They're interested in how to achieve disaster recovery. They're interested in how to achieve failover, and recoveries, and those things. And it's less about the underlying storage technology.

So yeah, the nomenclature, I think, has changed as we move up the stack. And certainly, the Data on Kubernetes community, which is an amazing community with tons of independent contributors as well as vendors from all parts of the spectrum, has brought together not just storage providers but cloud providers and database providers. And we work together to be able to provide real-world solutions and use cases for end users to learn how to do these things for themselves.

CRAIG BOX: You are one of the co-leads of the Technical Advisory Group, or TAG, in the CNCF for storage. You mentioned being at KubeCon very early on in London. How did you get involved formally with the CNCF and lead to that role?

ALEX CHIRCOP: So it was a little bit of an evolution. The CNCF has the TOC, which is ultimately the committee that makes the technical decisions. And that is the main technical decision-making body for the CNCF. And what we saw was that there were parts of the community within the CNCF as the CNCF was growing that were creating their own working groups.

So for example, a number of us in the storage community had gotten together and created a working group to do, initially, some basic stuff like define what cloud native storage actually meant, and what we thought it should mean, and where it should be going, and what the landscape actually looked like. And we started putting some of the initial white papers together.

In parallel to this, of course, the CNCF was growing at a really fast pace. There were more and more projects joining the foundation. There were more projects applying to join the foundation. There were more end user organizations. There were more CNCF members joining. It grew to 400 or 500 members very, very quickly.

And so there was a realization that the TOC was actually being stretched simply because the amount of manpower that was needed was being stretched. But also, we saw so many areas of cloud native computing come into play, whether that was networking, security, storage, runtime, application delivery, and all the different facets. It was actually quite hard to have one group of people that had expertise in everything.

And so around the time, probably sometime in 2018, we started putting together the charters for what was later going to become the CNCF SIGs, which then got renamed to TAGs. And I'll tell that story in a minute. But we started putting together the charter for the SIGs. And the main purpose for them was to effectively act as a scaling capability for the TOC.

We wanted the TOC to be able to have access to a number of professionals in the field that had the expertise in specific areas that could provide unbiased information about the projects that were applying to come into the foundation and help with the project reviews, but also to provide educational material to the end users and continue to put white papers and other articles out.

And I do remember being in Barcelona for KubeCon just about maybe half an hour before my talk about the storage working group when Chris Aniszczyk sent me an email and said, hey, you know the SIGs? They got formally approved this morning. Can you talk about it in your talk? And so the storage SIG at the time was the first SIG created. And I was on the stage at KubeCon updating my slides five minutes before my talk began. And we announced the SIGs, which was crazy and exciting, both at the same time.

And then after that, we had this discussion where we agreed that having CNCF SIGs and Kubernetes SIGs was getting a little confusing, because when we were referring to a SIG, it was hard to differentiate between them. And so we decided to just change the nomenclature to TAG, which is a Technical Advisory Group, instead of a special interest group, mainly because our role is providing technical advice to the TOC.

And so today, there are multiple TAGs. I think there might be eight or nine of them, and we're still carrying on that original charter, where we're providing the expertise and the end user engagements for the TOC. But it's ultimately a voluntary activity that's part of the community. And ultimately, the TOC retains all of the decision-making responsibilities. We're there to help out, effectively.

CRAIG BOX: When we think about the Kubernetes Storage SIG, we think about things like the Container Storage Interface and the work that's going on currently to move drivers to out-of-tree plugins. What interface do you have with that group?

ALEX CHIRCOP: So there is a certain amount of overlap. And certainly within the TAG, there's a good amount of crossover. So for example, my co-chair Xing Yang is also one of the chairs for the Kubernetes SIG. We work in fairly different ways, though.

The Kubernetes SIG is more focused on Kubernetes specifically and the project work that's happening within Kubernetes for things like CSI, but also recently, things like COSI, which is an initiative similar to CSI but applying to object stores, and other activities specific to Kubernetes. Whereas, a TAG is focused on things like the cloud native projects that the CNCF is reviewing and evaluating, as well as providing informational content for end users.

CRAIG BOX: I see your other co-chair is Quinton Hoole. Has he tried to federate anything lately?

[LAUGHTER]

ALEX CHIRCOP: So Quinton was pivotal to actually setting up the SIGs in the first place. And he was one of the original authors for putting all of the charters together. And this was at a time when he actually was a member of the TOC.

He's very much a roving, if that's the right term, resource and focuses his time on creating and starting up initiatives before moving on to the next interesting thing. So he's very much a bit of a game-changer within the CNCF, as are all of the people, in fact, who have been part of the TOC over the years.

CRAIG BOX: Two of the major pieces of output from your group have been white papers on storage and disaster recovery. What was the process of putting those together?

ALEX CHIRCOP: We started with the storage white paper, which was effectively an educational tool to get end users to actually understand what they needed out of their storage system. So it was there to explain how storage systems work at a broad level, but more importantly, how to define the attributes that developers needed for their applications.

And then we defined a number of attributes, like scaling, and performance, and failover, and other capabilities, like durability, for example. And the idea was that not all storage systems are built the same. And each storage system typically is optimized for one or more attributes. And that leads to compromises in other attributes.

So as a simple example, if you're optimizing for scale, and you're distributing data and sharding data across many, many nodes, you're probably taking some compromises on latency. Or perhaps you might have to take a compromise on consistency. And so developers need to understand what those attributes are and how they apply to the storage system, and then understand how some of the systems are built in layers.

So there are so many layers in a storage system today that we tend to have to break away from some of the traditional perspectives of how storage systems were defined. If we look back 10, 15 years ago, we could say, oh, if you have a file system, well, that's great for sharing. And it's great for throughput, but probably not great for latency. And if there was block storage or a SAN, for example, you might think of it as a very fast performance, low-latency.

But the reality is, nowadays, a lot of these systems and a lot of these interfaces are built on multiple different layers. So it's not uncommon, for example, to have a file system built on an object store. So it has the sharing attributes of a file system but the latency attributes of an object store, for example. And so understanding those concepts is really good when a developer is trying to select which different components or which different services to deploy for their application or their stateful workloads.

And then finally, we talked about not just the data interfaces but also things like the management interfaces, so things like CSI, and how Kubernetes can automate those environments and make them declarative, and how some of that applies to the volumes world but also covers things like databases and key value stores, et cetera.

And once we finished that white paper, we decided that we would focus in on some of the specific attributes and delve in a bit deeper there. So the two that we decided to focus on first based on end user feedback was performance and disaster recovery.

And we launched a disaster recovery reference white paper for cloud native. And we covered that off at our recent KubeCon talk as well. And that talks about the differences between native or traditional disaster recovery methods and the automation that you can achieve with a cloud native model.

And also, then in the performance side of things, we talked about how an end user would look at evaluating the performance and the things that they should consider when evaluating the performance of things like volumes or databases within their environments.

And actually, the crux of that particular document was there are lots and lots of gotchas. And it's really, really hard to do performance benchmarking well in any meaningful way. Never trust the vendor-specific benchmarks. And always benchmark your own stuff in your own environments was the TLDR for that document.
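As an illustration of the "benchmark your own stuff in your own environments" advice, and not something taken from the white paper or the interview, here is a minimal Python sketch that times small synchronous writes against a directory of your choosing. The function names, block size, and iteration count are assumptions made for the example; a real evaluation would also need to cover reads, concurrency, caching effects, and much longer runs.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latency(directory, block_size=4096, iterations=200):
    """Return per-write latencies in seconds for write-then-fsync cycles."""
    payload = os.urandom(block_size)
    latencies = []
    # Create the scratch file inside the directory (volume) under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush to the device, not just the page cache
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Summarize latencies (seconds) as count, median, p99, and max in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the mounted volume under test.
    with tempfile.TemporaryDirectory() as target:
        print(summarize(measure_fsync_latency(target, iterations=50)))
```

Pointing `directory` at the mount path of the volume you actually plan to use, from the node and container where the workload will run, is what makes the numbers meaningful. Tail values such as the p99 often say more than the median.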

CRAIG BOX: You obviously have a great overview of the space. And you also run a company in it. And that must give you the chance to say, hey, here are gaps and things that need to be filled. Let's talk a little bit now about Ondat, which is the new name for StorageOS. First of all, is OnDat like UpDog?

[LAUGHTER]

ALEX CHIRCOP: No, so this comes back to some of the decisions we were talking about earlier in the way that developers perceive storage. So what we realized was that developers weren't really looking for a storage system. They were looking for a data platform. They were looking for a way of running their stateful applications and running their data services.

And so we decided to move away from the storage name to more of a data services oriented name. And that was the reason why we rebranded. And we've got to say, it has been very successful to date. We've certainly gotten a lot more developer engagement as a result. And I think it works well with the trends that we're seeing in this cloud native environment today.

CRAIG BOX: That name change obviously reflects changes to your product over time as well. What are those changes? How have you evolved the product since it launched in 2016?

ALEX CHIRCOP: There's been a lot of work in the product in three different areas. On the first side of things, we have the data plane, which provides the data services and the instantiation of a volume. And we've created this mesh that effectively makes your storage application-centric, because I think one of the biggest challenges and one of the big differentiators between traditional storage and cloud native storage is that cloud native storage by definition needs to be application-centric.

So many of the traditional storage solutions are focused on providing storage to a server, or a node, or an operating system, which is kind of OK. But we're now in a world where applications are containerized. They're portable. They're meant to move around nodes. And Kubernetes, by definition, tends to have lots of moving parts, where clusters scale up and scale down, and nodes get replaced on demand, et cetera. So we needed to build a data plane that allowed that portability and allowed volumes and data to move with applications.

And then it struck us that one of the big challenges behind that is, of course, the control plane. We needed to create a control plane that could deal with the rate of change in cloud native systems, which, again, is another differentiator between traditional and cloud native. And so we built our control plane with something we call disaggregated consensus, which is a fancy term for saying that every volume has its own mini brain and is able to make its own placement decisions and failover decisions independent of all of the other volumes.

So this gives you a reduced blast radius when something goes wrong in the cluster. But it also means that you can implement the rate of change that is much more typical in these cloud native environments, where driven by automation or CI/CD, for example, you might be creating and destroying tens of thousands of volumes every day. And then finally, the other thing was, of course, to make it easy to use and to integrate well as a Kubernetes-native product.

So we needed it to not just work with Kubernetes in the basic ways of CSI, but also to provide additional functionality like providing data locality by working with the Kubernetes Scheduler, and optimizing placement decisions, and being able to do topology-aware placement for volumes to make sure that, for example, data is available across availability zones or available across racks on-prem, et cetera, for availability.

So it's three things. It's the Kubernetes function. And it's the scaling from the control plane and the reliability and performance of the data plane that bring the whole thing together.
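To make the idea of topology-aware placement with data locality more concrete, here is a small Python sketch. It is not Ondat's algorithm and does not use the Kubernetes API; the node names, zone names, and the `place_replicas` function are all hypothetical. It simply puts the first replica on the node running the application, when one is given, and spreads the remaining replicas across zones that do not yet hold a copy.

```python
from collections import defaultdict


def place_replicas(nodes, replica_count, preferred_node=None):
    """Pick nodes for volume replicas, spreading them across zones.

    nodes: dict mapping node name -> zone name (hypothetical inventory).
    preferred_node: node running the application, chosen first for locality.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if replica_count > len(nodes):
        raise ValueError("not enough nodes for the requested replicas")

    # Group candidate nodes by zone, in a stable (sorted) order.
    by_zone = defaultdict(list)
    for name in sorted(nodes):
        by_zone[nodes[name]].append(name)

    chosen = []
    if preferred_node in nodes:
        chosen.append(preferred_node)
        by_zone[nodes[preferred_node]].remove(preferred_node)

    # Visit zones round-robin, starting with zones that hold no replica yet.
    used_zones = {nodes[name] for name in chosen}
    zone_order = sorted(by_zone, key=lambda zone: (zone in used_zones, zone))
    while len(chosen) < replica_count:
        for zone in zone_order:
            if len(chosen) == replica_count:
                break
            if by_zone[zone]:
                chosen.append(by_zone[zone].pop(0))
    return chosen


if __name__ == "__main__":
    inventory = {
        "node-a": "zone-1",
        "node-b": "zone-1",
        "node-c": "zone-2",
        "node-d": "zone-3",
    }
    # The application runs on node-b, so the first replica stays local to it.
    print(place_replicas(inventory, 3, preferred_node="node-b"))
```

With this inventory, the three replicas land on node-b, node-c, and node-d, one per zone, so losing any single zone still leaves two copies available.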

CRAIG BOX: I'm not sure you could launch a startup in observability today, to pick just one area, without the product being fully open source. Storage seems like it's a little bit different to me. Which parts of your product offering, if any, are open source? And how do you make that choice?

ALEX CHIRCOP: We're a little different in that we have a closed core and an open ecosystem model. So effectively, the core data plane and control plane is closed. And that is primarily to give us the ability to innovate and iterate very quickly.

And then anything that interacts with the ecosystem — so things like operators, CSI drivers, use cases, and we've also launched a number of open source projects to integrate with cloud providers and integrate with key management systems — all of those type of ecosystem connections are open source. And that is primarily to make it easier to integrate the product to as many different ecosystem components as possible.

CRAIG BOX: In November 2019, you went from being the CTO to the CEO of the company. What was that change like for you personally? What did you have to learn and change during that transition factoring in also that it was right before the pandemic?

ALEX CHIRCOP: Yeah, that was an interesting time. I think when you're a founder, there's always a certain amount of overlap between all of the different roles. Certainly, I had to be a bit more focused on business functions and not just the technology. Although, of course, my first love is technology. In our type of world, building products, which appeals to the engineers, means that it sells to the enterprises too.

So there was a bit of an overlap. But it was certainly an interesting and challenging transition. And like you said, it was just before the pandemic, which, of course, nobody could plan for. And then I had to run a funding round during the pandemic, which, again, was also a bit of a surreal experience.

Trying to pitch to VCs that you've never met and only talked to virtually to hand over millions of dollars because they believe in your company was a very different type of challenge to the ones I had solved previously. But it was certainly a very satisfying role. And there's nothing quite like building something and watching your baby grow and flourish over the years.

CRAIG BOX: Where do you want to take things over the next few years? Do you think there are any great unsolved problems in the ecosystem that could be taken on either by yourself in a commercial sense or by the community at large?

ALEX CHIRCOP: I think we're at an interesting dilemma within the Kubernetes world today. In almost every metric, it's probably safe to say that Kubernetes has crossed the metaphorical chasm. And I think every enterprise everywhere is either building Kubernetes, or working with Kubernetes, or evaluating Kubernetes in some way.

But I think, like many things, the gravity of those existing systems and the existing infrastructure is probably, actually, quite weighty. And it's going to take a while to move things across. So I think the thing that the community needs to focus on now is not just providing raw capability, but also providing the ability to grow within these environments and provide the ability to create integrations that allow the traditional environments to move into the cloud native environments.

So when I was at KubeCon recently, I was sitting in a room, probably about 300 odd people. And somebody asked, how many of you are running more than 50 or 100 Kubernetes clusters? And around 60% to 70% of the room put their hand up.

And so I think what we're seeing now is the next stages of the challenge are going to be around, well, how do you manage that many clusters? And how do you solve for multicluster? How do you solve for multicloud? How do you solve for hybrid clouds? How do you federate those systems?

And when it comes to storage and data, how do you share volumes? And how do you share databases between those different systems? And how do you migrate the data? And how do you replicate the data and protect the data across all of those different systems? I think we're at the beginning of a very exciting journey up ahead, where now we're going to see not just the adoption but the reliefs and anger across a lot of enterprises, which is quite frankly super exciting.

CRAIG BOX: Well, I look forward to seeing that. And thank you very much for joining us today, Alex.

ALEX CHIRCOP: Thank you so much for having me.

CRAIG BOX: You can find Alex on Twitter at chira001, and you can find Ondat at ondat.io.

[MUSIC PLAYING]

CRAIG BOX: Thank you, as always, for listening. If you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod. Or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where we keep transcripts and show notes, as well as the secret link to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. We'll see you next week.

[MUSIC PLAYING]
