#119 September 2, 2020

Keptn, with Alois Reitbauer

Hosts: Craig Box, Adam Glick

Keptn, a control plane for continuous delivery, came out of the need to install Dynatrace’s software in their customers’ environments. Alois Reitbauer is Chief Technical Strategist at Dynatrace, responsible for open source, and a co-chair of the CNCF App Delivery SIG. He talks to your hosts about Keptn, observability after deployment, and how owning a 40-year-old sports car is more “curation” than “operation”.

Do you have something cool to share? Some questions? Let us know:

Chatter of the week

News of the week

ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.

CRAIG BOX: And I'm Craig Box.

[MUSIC PLAYING]

This is episode 119. Unfortunately, Kubernetes 1.19 was released last week-- actually one day after it was due. They couldn't hold it one more week and make it line up. Would've been lovely synchronicity.

ADAM GLICK: Yes, well, it is eventually consistent. And, you know, worse comes to worst, we can just chalk it up to an off by one error.

CRAIG BOX: What's new in the little free library?

ADAM GLICK: It continues to be impressive-- the number of books. "Thinking, Fast and Slow--"

CRAIG BOX: That's a good book.

ADAM GLICK: --showed up this week, which is a great book if you haven't read it. There's lots of really interesting pieces in that. But apparently the free stuff on a curb game in the neighborhood has really been upped.

People, especially at this time, are just kind of giving things away, putting it on the curb, and putting a free sign on it-- as opposed to taking it to thrift stores that might not be open, for instance, or trying to sell it.

CRAIG BOX: Right.

ADAM GLICK: And you're used to seeing, you know, people put out a chair or a desk, things like that. But the game was really upped this week when I walked past a 55 inch plasma HD television.

[LAUGHS]

It had a sign on it that said, "free, works." And that's a whole new category of stuff people are giving away at that point.

CRAIG BOX: Those used to be valuable.

ADAM GLICK: Apparently so. But now everyone, I think, wants something 4k.

CRAIG BOX: I do want to call out the relative sizes of the streets, if you're saying people are leaving mattresses and TVs out there. Occasionally you'll get a book or something on the little tiny streets of London, but I can't imagine-- you could block the entire lane by putting a 55 inch TV across it.

ADAM GLICK: [CHUCKLES] Well, it is on the berm.

CRAIG BOX: One thing I did see in my walks recently-- as you've said, the thrift stores are all closed and a lot of people have been dumping things outside. And there was someone who must really have liked "Sex and the City" back in the day. And in the day, it must have been a while ago, because it was all VHS.

So there were a number of VHS tapes. And unfortunately, one or two of them had been smashed. And it's just-- it's a shame. Someone clearly loved this enough to purchase it and then cared enough to donate it almost to a thrift store.

ADAM GLICK: [CHUCKLES]

Now they'll never know if she gets together with Mr. Big at the end or not. The mystery will be forever wondering.

CRAIG BOX: I'm sure you can get it on Netflix if you need to.

ADAM GLICK: Shall we get to the news?

CRAIG BOX: Let's get to the news.

[MUSIC PLAYING]

ADAM GLICK: Bradley Wong and Matt DeLio from Google Cloud posted this week about Anthos Attached Clusters, a new feature that allows you to use Anthos as the management control plane for non-Anthos Kubernetes clusters-- such as those in Amazon EKS or Azure Kubernetes Service.

The post talks about how you can apply consistent policy control across a variety of clusters all through Anthos and discusses the use of an installable agent that is used to avoid exposing the cluster API to the internet. Support for more Kubernetes distros and services is promised, as well as further features for cluster management.

CRAIG BOX: Speaking of Anthos, Google Cloud announced two new ways to purchase it this week. The first is a pay-as-you-go option that requires no commitment and is billed hourly or monthly.

The second is a term-based subscription model that allows customers to commit to an amount of spend, but provides the flexibility to use that money towards Anthos running whatever they choose. The subscription is available at a 33% discount off the pay-as-you-go rate.

ADAM GLICK: Kubernetes made it to the main Google blog this week. The Keyword celebrated the five year anniversary of GKE with a post interviewing Michelle Au, Janet Kuo, and Purvi Desai-- three engineers working on Kubernetes. If you need to explain GKE to a 5-year-old, this post may be especially useful.

CRAIG BOX: Cloudian-- a maker of on-prem object storage software that uses the S3 API-- announced a Kubernetes operator for such. Their HyperStore platform promises cloud-like storage in your own data center with mirroring to clouds. The operator is available in Beta.

ADAM GLICK: Canonical is fast out of the gate with Kubernetes 1.19, having updated their MicroK8s and Charmed Kubernetes distributions to use the newly released Kubernetes version.

Additionally, MicroK8s added a number of improvements, including ingress support for UDP and TCP, user-provisioned CA support, registry improvements, and updates to many add-ons, including Istio and Prometheus.

Charmed Kubernetes also got additional updates, including support for IPv6 and SR-IOV, Ubuntu 20.04 LTS, and CIS benchmark compliance.

CRAIG BOX: Developer tools company Portainer-- named for a shipping crane-- has launched version 2.0 of their self-titled Portainer CE software. The main new feature is catching up on this new Kubernetes thing, which joins their support for Docker Swarm and Azure Container Instances.

Portainer is a New Zealand company, which you'll quickly learn if you watch the introductory videos. They've also just received a seed round of $1.2 million US, which works out to just over $20 billion Hobbit bucks.

ADAM GLICK: Do you love debugging issues using the kubectl command line? If so, you can skip this piece of news.

Otherwise, you may be interested in a post by Yolan Vloeberghs and Peter Vincken that reviews a number of tools that simplify things you may be using kubectl for. The post covers K9s, Octant, Lens, Kubenav, and Infra.App.

The general takeaway is that these tools can greatly increase your ability to get things done. And picking the right one for your task might save you some time while getting you out of the world of typing kubectl command line switches.

CRAIG BOX: Finally, if you happen to end this week's episode with a new interest in distributed tracing, check out Jonathan Gold's guide to OpenTracing, OpenCensus, and OpenTelemetry. He covers the history and benefits of each, as well as giving an overview to help you understand the space and why these tools are needed.

ADAM GLICK: And that's the news.

[MUSIC PLAYING]

Alois Reitbauer is the chief technical strategist at Dynatrace, with responsibilities including open source and technology research. He's a co-chair of the CNCF App Delivery SIG, a founding member of the W3C Distributed Tracing Working Group, and an early collaborator around the OpenTelemetry project.

Welcome to the show, Alois.

ALOIS REITBAUER: Hello, everyone.

CRAIG BOX: You've worked for Dynatrace for 13 years now. But I know that you've had three different names on your business card.

ALOIS REITBAUER: Yeah, that's true. And also different ways Dynatrace was actually written. I started very early on. In the early days when Dynatrace was really a startup, there was always this joke going around with the small D and the capital T. There is even a video about that very initial writing of Dynatrace.

Dynatrace then at some point got acquired by a company called Compuware. And then eventually the companies split ways again and we became Dynatrace again, this time with the capital D and the lowercase T.

CRAIG BOX: I'm sure that's very important for trademark reasons.

ALOIS REITBAUER: It was back in the day. Like 13 years ago, if you wanted to be a cool company, you didn't start with a capital letter. That's just how you did it.

ADAM GLICK: Camel casing for the win.

[CHUCKLES]

ALOIS REITBAUER: Yes.

CRAIG BOX: Is it safe to assume that Dynatrace is a tracing technology company?

ALOIS REITBAUER: Yeah, so that's how Dynatrace started. So the whole idea behind Dynatrace in the very beginning was to build distributed tracing in a way that you could do it for production environments. 13 years ago, it was actually a very big deal doing it fully automatically in large scale environments.

But then it really evolved in what it could do, moving more and more into the analytics space as well-- analyzing the data. What we basically saw was that we were collecting more and more data, and the data collection itself was not good enough anymore. We had to do more analytics and eventually also build automation on top of it. That's the very gist of the evolution of that space.

ADAM GLICK: Was the product open source?

ALOIS REITBAUER: No, the product never really was open source. Given its history, we just developed it back then as a traditional enterprise type of product with our own technology, with a lot of inventions that we had in there.

And quite frankly, I think it wasn't always all that important back in the day to be open source. Because people were mostly consuming the technology, rather than wanting to actively contribute.

ADAM GLICK: If your background was proprietary enterprise software, how did you get started in open source?

ALOIS REITBAUER: I think we got started-- or I myself got started-- the way a lot of people get started. Usually, when you first get in touch with an open source project, it's because you're using it. Maybe it's a library-- an HTTP library, whatever it is. You started using that library. That's the first way you get in touch with the project.

Then you maybe find a bug in that library and realize it might be a good idea to fix that bug. So you start by contacting the maintainers and say, there is a bug. And they tell you, yeah, OK, fix the bug for the project. We'll have a look at it, and then we'll get it in there.

So that's how we got started. And actually, there are a lot of contributions from the Dynatrace side to open source projects exactly that way, where we're using a lot of libraries internally and contributing back to these libraries.

What my team then did, we centralized this role for handling all the things around open source contributions-- also the legal things, like signing CLAs, ensuring that there's processes, and even making it easy for people to contribute back to open source.

The next level really was the OpenTracing, OpenCensus, OpenTelemetry projects, where we got more actively involved. So back to your question about Dynatrace being a closed source product-- one thing that was always a lot of work, and still is a lot of work in the tracing industry, was building the actual instrumentation. You reverse engineer a lot of frameworks and then build the tracing.

And when that whole movement around OpenTelemetry and OpenCensus came out, we wanted to contribute actively, because we saw the value of all working together. We also saw that by that time, tracing and figuring out how to get that information was kind of a solved problem, but it still cost you a lot of time and money-- reverse engineering the code, putting it back in there.

And we thought back then-- it was Microsoft, Google, lots of others-- why don't we just collaborate on this together. This is where we became an active maintainer and contributed to the OpenTelemetry project.

And then the next step for us was when we then started to open source some of the software we developed internally after re-implementing it. Which I think we're going to talk about today around Keptn where we started our own open source project. Because there we thought, this is the right way how to build that software.

So it was really gradually getting more and more involved depending on the needs that we have and the type of community interactions.

CRAIG BOX: It's obviously easy for someone working in open source or writing their own software to include a library like OpenTelemetry, which can then send out metrics or traces to whatever service it needs to. Do you see enterprise software adopting libraries like this so that they are all observable?

ALOIS REITBAUER: My take on this one is, we had observability to some extent in enterprise software for a very long time. Even if we look at application servers back in the day, you had logs in there, you had JMX metrics, PMI metrics, depending on the software that you were using. It usually was metrics and-- to some extent-- logs, but never traces.

I think traces are just the logical next step to be included in there. And the advantage-- and also the requirement now, in a world where we have these polyglot environments with microservices-- is to have a language-independent standard. Imagine JMX, which is obviously a Java-based standard.

Today's software might be written in multiple languages. So I see that this has been done in the past. It's now just getting more standardized and maybe more ubiquitous across the industry. And also covering more areas-- not just the metrics and logs, but also the tracing data.

ADAM GLICK: Do you see those areas of logging, and monitoring, and tracing, and telemetry coming together or staying separate? I traditionally think of them as having two separate audiences of who uses those tools and how they think about them.

ALOIS REITBAUER: I think the real value comes when you put them all together, especially in a highly distributed environment. What we invested a lot of time in is building a topology model on top of it, so that you understand how things really play together. Because you want to run queries on top of it, like: show me the load on the servers, and what's producing this log statement, for all of the instances of this service that was deployed in a certain version.

And therefore, obviously you need this integrated model on top of it. The reason why it is often handled separately is simply the way they are transported is often done separately. So you can be way more efficient on the metric side than you can be on a tracing and logging side. And different transport mechanisms are sometimes helpful to be there.

And obviously, people traditionally used different tools. But I see them as an integrated combination of data, also with that information about where it's coming from on top.

CRAIG BOX: We have OpenTelemetry, which targets mostly tracing use cases. There's also an OpenMetrics project, which we talked to Richard Hartmann about back in the day. Is there an "OpenLogging" system and would it make sense for all these three things to be built by one group, rather than three disparate groups?

ALOIS REITBAUER: There is some work-- obviously, in the CNCF, there are logging projects. Also, the OpenTelemetry project is looking into logging. It's still at a very early stage. They also have a metrics working group. I think it should be standardized, and I think they should work closely together.

But I think this is also an evolutionary step, especially if you look at the monitoring in the metric space with Prometheus. This has been around for a very long time. So any smaller change that you are making is actually quite hard. So there's more freedom, obviously, in a project like OpenTelemetry.

But I see them eventually converge. Because that's where the big value comes in, where you say, I have this service and I don't know which language it is written in. I don't know how it actually works and which frameworks and data it's using. But I have to standardize information I can get from it that I simply understand when it comes to observability.

CRAIG BOX: You shouldn't need to know any of that to be able to observe it.

ALOIS REITBAUER: Exactly. But as of today, you would need to know, OK, in some cases, which language is it written in, which tooling is available for it. So all these abstractions that you got with containers-- about how it's written-- to some extent go away when you talk about observability and managing this component. Because suddenly you need to know a lot of the inner workings again. And that's where I think the observability movement and the OpenTelemetry movement are super helpful.

ADAM GLICK: Is Dynatrace software that is installed locally into a customer's environment, or is it SaaS?

ALOIS REITBAUER: We still have both deployment models-- so people who run their own monitoring environments but also more and more people who switch to a SaaS based approach. The hard truth is at some point you have an inception problem. Because monitoring software by itself is software again, which needs to be monitored.

ADAM GLICK: [CHUCKLES]

ALOIS REITBAUER: Which then uses software that needs to be monitored. And if you already have 99.9% availability, your monitoring should obviously have higher availability than the application that you're running. Because otherwise you would not know about downtime.

So the requirements are really hard on monitoring systems. And the more I talk to people, they actually don't want to buy monitoring systems. They just want to consume them as a service type of fashion. And we see a lot of people using it, obviously as a service running by us.

But we also have this concept of what we call managed, which is more or less using the software on your own infrastructure for whichever reasons you want to run on your own infrastructure. It might be legal issues why you're running it there. But still getting that SaaS experience.

And more or less, you're giving us the hardware and we are doing the rest. Like a Google Anthos type of approach, where you are taking hardware that's available but then the rest of the stack is managed by a third party provider.

CRAIG BOX: You have software which you want to install on other people's hardware. I understand you had to build out some automation technology to make it possible to do that.

ALOIS REITBAUER: Exactly. Our challenge was-- obviously, for our own SaaS environment, we run numerous clusters, which was a challenge by itself. But these are multi-tenant clusters that you can usually scale pretty well. Once you install in your customers' environments, you have one installation per customer environment, which leads to us having way beyond 2,000 individual installations on hardware that we don't control, where part of the deployment and operations procedures are based on what we want to do but can also be influenced by the customer. That scaling issue was really the reason why we built our own automation layer back when we started with the current generation of the Dynatrace product.

CRAIG BOX: The capital D generation.

ALOIS REITBAUER: The capital D generation. Exactly.

Yeah, that's where we started to build our own automation layer. And what was really important for us, we just didn't want to automate the deployment piece, we also wanted to automate the operations piece. So not just rolling out new software, but also building in capabilities for self-healing, for disabling features if they were not running fine, in a way that the software could do it by itself and didn't require third parties to use it.

CRAIG BOX: That software has been released as open source. It's called Keptn. What exactly is Keptn?

ALOIS REITBAUER: Not exactly that piece of software was released, but more or less a re-implementation of exactly these two concepts was released as Keptn. We had this software, which had what we called Cloud Control and Mission Control-- a couple of internal software components.

But they were very proprietary to the needs of Dynatrace. So they are great pieces of software, but they can only manage Dynatrace clusters, which is not very useful for anybody except Dynatrace. But we had a lot of customers, and also other companies we engaged with, that wanted to have similar concepts implemented, but they couldn't reuse what we had.

So that's when we decided to more or less take what we had built as an internal proprietary solution and re-implement it so that you can use it for pretty much every application that you're running, with a focus on Kubernetes-based cloud native applications. Because that's where we saw most of the people we were working with were headed. And that's what then resulted in Keptn.

And what Keptn essentially is-- the one-sentence description, which might use a bit of jargon-- is a declarative, event-based control plane for continuous delivery and operations. So let's break this down into human-readable bits and pieces.

So let's start with declarative. That's the same as with Kubernetes. You shouldn't have to specify how to deploy something; you just have to specify the end state that you want to have. Like, I want to have an environment with, say, five stages-- pre-production and production environments-- and then everything is built for you behind the scenes.

Event-based means that, rather than writing scripts-- which is what you typically do in CI, and what you find a lot in CD as well, because people use a lot of CI tools for CD purposes-- you just define events and then hook certain things we call services onto these events.

It's like event-based architectures-- which we have in microservices-- just applied to continuous delivery. So there is an event like "test started", and then every tool that provides test capabilities links into this "test started" event. It says, yes, I want to test this software. And you do the same for deployment.

The main advantage is, if you want to add, for example, a security test, or a compliance test, or a chaos test to all of the software that you're running, instead of touching hundreds of scripts, you just have this one service that registers for all of those events and then participates in all of these delivery processes.
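
To make the event model concrete, here is a sketch of what such a "tests finished" notification could look like as a CloudEvents message, the format Keptn uses on its event bus. The event type, tool name, and data fields below are illustrative, not the exact Keptn specification:

```json
{
  "specversion": "1.0",
  "id": "c1f12f4a-0001",
  "type": "sh.keptn.events.tests-finished",
  "source": "jmeter-service",
  "contenttype": "application/json",
  "data": {
    "project": "sockshop",
    "service": "carts",
    "stage": "staging",
    "teststrategy": "performance",
    "result": "pass"
  }
}
```

Any service that registers for this event type-- say, one that evaluates the test results against a quality gate-- picks it up, without the emitting tool needing to know who is listening.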

And then the last component is the control plane. So Keptn by itself comes with a couple of services that it can do deployment testing on Kubernetes based environments. But the problem we were faced with when we started to implement Keptn together with our customers was that people already had tools in place and they were kind of linked together in say, a very one off kind of way. It's like all the same and all different at the same time.

And the reason why we came up with this control plane idea is that it solves a problem that has been previously solved in the networking industry. Software-defined networks are doing exactly the same thing-- you have all the different planes. You have your application plane. That's where you define, more or less, what your network is supposed to look like.

In Keptn, this is what we have in the declarative approach, where you define: these are the events, these are the stages that I want to have, this is the way I want operational procedures to work. And then you have the control plane, which does the execution and translates it down-- in an SDN, to the data plane; in our case, to the delivery plane-- which means it's more or less pushing it to the right tools.

And what the data plane does is move things from one side of the network to the other, which is pretty similar to continuous delivery-- moving something from an artifact server all the way to production. So we saw all those similarities and more or less took those concepts that already existed and applied them to continuous delivery.

ADAM GLICK: Your customers already had tools to deploy their applications. Jenkins is popular for continuous integration. There are options like Spinnaker for continuous delivery and newer tools for GitOps like Argo and Flux. Why create something new in Keptn?

ALOIS REITBAUER: Well, we see people using those tools. And one of the reasons why we've started at the control plane level and not like building a CD tool directly is exactly linking those tools together. Because the reality is that people are not using one tool, but they're using six, seven, eight, nine, 10 of these tools.

The first thing for us: we didn't want to build a CI tool, because there are great CI tools out there. So conceptually, we really start at the continuous delivery stage. We assume that the artifact and the artifact description are available.

So you have a container available, and you have, for example, a Helm chart available. But then usually you pass it on-- not just to one other tool. So, OK, you now want to run the deployment. What's happening?

You're pushing it to a deployment tool. Then you're communicating with a testing tool. Then you're fetching the test results from a monitoring tool. Based on these results, you then might trigger a GitOps tool.

And if everything then goes fine, you're then pushing it to say, a Git repository again for your GitOps based operations. Or you might in between push it to a chaos testing tool, depending on what you want to do. And then you want to roll back that whole thing.

So there is this glue layer on top of all the tools that people are using today that is highly proprietary. And I wrote this blog post where I argued that we build a lot of software just to ship software, but we're not using the best practices we have developed as an industry to build that software. It's not test-driven. We're not using microservice approaches. We are building this big, monolithic pile of glue code on top of everything that, at some point, nobody wants to maintain anymore.

And that's the problem that Keptn is really solving. And we see a lot of people saying, yes, I really like that I can just define how I want to stick things together and then link them to the existing tools that I have. If I already have a deployment tool-- for example, Argo-- Keptn would just use an internal service to work with Argo.

We even started to take those events from this control plane and put them out there as a separate specification. People on the team are now working within the Continuous Delivery Foundation to eventually standardize on these events-- there is an interoperability group within the CDF.

Because at the end of the day, every tool out there should simply understand it. But the hard reality today is that, depending on which deployment tool you're using, you have to code against a wholly different API. Conceptually, though, what you do is say: I want this one artifact to be deployed in one specific stage, in a blue-green fashion, with a 20% increment.

CRAIG BOX: That sounds very much like what you said about OpenTelemetry and OpenCensus-- there should be one standardized way to do this.

ALOIS REITBAUER: Exactly.

ADAM GLICK: One ring to rule them all.

[CHUCKLES]

ALOIS REITBAUER: I wouldn't say one to rule them all-- an agreement on things we usually agree on anyway. And maybe at some point it would mean that every tool speaks the same language, and you might not need something like Keptn. But let's just get started implementing something that provides these capabilities.

CRAIG BOX: We'll never agree on the language. It should be something like Esperanto.

ALOIS REITBAUER: [CHUCKLES]

ADAM GLICK: Would it be safe to say that this is kind of a meta management layer on top of things? That it's taking a look at all the different things that people are doing for CI/CD and bringing them together in a unified place to be able to manage the whole thing, kind of similar to what Kubernetes does with applications that are running underneath it, and all the different pods, and the things that need to be set up so that it works all together. Is that a fair analogy?

ALOIS REITBAUER: Yeah, it's a fair analogy. It's really this meta level. But really focusing on two things-- on the delivery part of the tooling change, but also on the operations part. I think the operations part is one that not a lot of people spend that much time looking at and investing.

And it's always two different tools. You use one tool for delivery. And while we talk about DevOps and not throwing things over the fence, when you look at most tooling demos and best practices, they kind of stop after the deployment was successful.

But usually, software fails later on. So that's why we invested a lot in what we call closed-loop remediation, where as part of your software you ship a remediation file, in which the developer can specify how, in certain situations, you can reconfigure the service to cope with these issues-- whether it's a failure issue, whether it's a load-related issue, whatever it is.
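
For a flavor of what such a remediation file might contain, here is a hypothetical sketch loosely modeled on early Keptn remediation configs; the keys, problem names, and action names are illustrative:

```yaml
# Hypothetical remediation file, shipped alongside the service.
# If a matching problem is detected in production, the control plane
# triggers the listed actions instead of paging an operator.
remediations:
  - name: "Response time degradation"
    actions:
      - action: scaling          # scale the service out ...
        value: +1                # ... by one replica
  - name: "Failure rate increase"
    actions:
      - action: toggle-feature   # disable the offending feature flag
        value: "off"
```

The point is that the remediation knowledge travels with the artifact, written by the developer who knows the service, rather than living in a separate runbook.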

ADAM GLICK: What tools does Keptn work with?

ALOIS REITBAUER: As of today, we work with Jenkins. We work with Argo. GitLab would be another one. And we-- or the community-- are adding integrations with other tools that are out there. Usually that integration layer is actually pretty thin, because we didn't want to re-implement everything that the tools out there can already do.

So a Keptn service, in the simplest case, is just translating these standardized events-- this unified event layer-- into specific API calls of other tools. That's also why I said at some point this service might simply go away. Because if we can eventually agree on a more standardized way of communicating what we want a certain tool to do, there will be fewer requirements for this translation layer.

CRAIG BOX: So it's a giant Rosetta stone for continuous delivery.

ALOIS REITBAUER: You could call it that way.

ADAM GLICK: It's an interface definition.

ALOIS REITBAUER: Yeah, it's an interface definition plus a protocol definition. It's also a declarative definition for the delivery chain and for the operations instructions, plus an implementation.

CRAIG BOX: A lot of things these days are being defined in terms of Kubernetes objects, and custom resources, and so on. How are the declarative pipelines for Keptn defined?

ALOIS REITBAUER: They're defined as what we call a shipyard file-- just staying in the nautical space there-- which more or less defines the event flow. You can put it into a CRD pretty easily if you want to. Keptn and Keptn services are deployed on Kubernetes underneath. So everything runs as a Kubernetes service, where you can just provide a Keptn service as a container and we take care of all the rest.

But we try to hide as much of Kubernetes from the end user as possible. So we define these YAML files and then we manage them internally. But we would not want anybody that's using Keptn to have to do a kubectl apply on the CRD.

So you just use the Keptn API directly, and you push the shipyard file, which specifies the stages, how stages should usually be handled, and how events should be propagated. So you never see what's actually happening underneath. We try to hide as much of this from the user as possible.
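
As a rough illustration of what a shipyard file can look like-- the stage names and keys follow early Keptn examples, but treat this as a sketch rather than the exact schema:

```yaml
# Hypothetical shipyard file: three stages, each with its own
# deployment and test strategy. Keptn builds the environments
# and wires the event flow between them.
stages:
  - name: "dev"
    deployment_strategy: "direct"        # deploy straight in, light tests
    test_strategy: "functional"
  - name: "staging"
    deployment_strategy: "blue_green_service"
    test_strategy: "performance"         # longer load tests here
  - name: "production"
    deployment_strategy: "blue_green_service"
    remediation_strategy: "automated"    # closed-loop remediation enabled
```

Note what is absent: there are no scripts and no tool names-- only the desired end state, which the control plane translates into calls to whichever tools are plugged in.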

CRAIG BOX: If I have an application with a small number of microservices-- think of something like the Hipster Shop or the Istio Bookinfo app-- how would I go about orchestrating the deployment of that using Keptn?

ALOIS REITBAUER: The first step is you would have to agree what you want your environments to look like. The first thing is you might say, I have a dev environment, I have a staging, and I have one or two production environments. That's what you would put in the shipyard file.

And then it would define what you want to happen. Like here, I want a direct deployment-- just some smaller tests. Here I want a longer load test and a blue-green deployment. And here I want to have a canary release and then testing against real users.

And then you define how you usually want services to be propagated-- automatically or manually. So you might say from test to staging, if everything goes fine you're doing it automatically. But to production obviously you want to have final control in there.

There are some components which we always put into those stages, like the quality gates. So we never push something into the next stage if it doesn't work. So you have to specify how to assess the quality of your deployment by having a test and an evaluation.
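The propagation logic described above can be sketched in a few lines. This is a toy illustration of the idea-- an artifact only moves forward if its stage's quality gate passes, and manual stages pause promotion-- not Keptn's actual implementation, and all names here are made up for the example:

```python
# Toy sketch of stage propagation with quality gates.
# Not Keptn's real code; just models "never push a failing build forward."
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    evaluate: Callable[[str], bool]  # runs tests + evaluation for an artifact
    auto_promote: bool = True        # automatic vs. manual promotion


def propagate(artifact: str, stages: List[Stage]) -> List[str]:
    """Return the names of the stages the artifact was promoted through."""
    reached = []
    for stage in stages:
        if not stage.evaluate(artifact):
            break  # quality gate failed: stop here, don't pollute later stages
        reached.append(stage.name)
        if not stage.auto_promote:
            break  # wait for manual approval before the next stage
    return reached


# Example: staging's gate fails, so production is never reached.
pipeline = [
    Stage("dev", evaluate=lambda a: True),
    Stage("staging", evaluate=lambda a: False),
    Stage("production", evaluate=lambda a: True, auto_promote=False),
]
print(propagate("carts:0.9.1", pipeline))  # -> ['dev']
```

The point of the sketch is the early `break`: a failed evaluation halts propagation entirely, which is the behavior the quality gates enforce.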

And also, the way Keptn works by default-- when you deploy something broken, people say, well, we have gates in place so we don't propagate to the next stage, but they usually don't clean up the mess they made in dev. You deploy to dev, you break stuff in dev, you're not propagating to staging. That's great. But do you really clean up dev? No, you usually don't. That's why people end up with dev and dev-stable. So we also do that cleanup.

So you would define that file-- what you want to have in there-- or you can use one of the standard ones if you don't have specific requirements. And then you just create a Keptn project and you onboard your service.

Keptn will then look at your service definition-- like the Helm files that you have-- and then, based on the deployment strategy, it would build out separate deployment files for the individual stages that you're running. So you would write them just once and it would build them across the stages, either by using Helm files or by creating those stages with Argo on the back end, if you wanted to.

And then you just write your code. And whenever you have a new artifact available, you let Keptn know and then let it do its magic. So ideally, as a developer, it becomes totally transparent how the application is shipped. And you don't have to care about it. You just get a notification that it worked or it didn't work.

ADAM GLICK: You mentioned "as a developer." Who's the user that Keptn is built for? Is it built for development teams? Is it built for operations, SRE, DevOps teams? Are there applicable parts to both sides of the house?

ALOIS REITBAUER: Yeah. When we talk about target audiences, we see three different target audiences. For a developer, ideally they don't really have to interact a lot with Keptn. Keptn should just make their life easier because you're just building your artifact and the rest is taken care of for you without you having to know all the details on how things get shipped, how they need to be tested, deployed, and moved from one stage to another.

SREs are usually responsible for defining these stages, defining these processes, and how things should happen. The shipyard files, as I mentioned before, this is usually what the SRE would define and agree on.

By the way, this being in a separate file has been very well received by large enterprise organizations, for reasons that I wasn't aware of before. In many organizations that have compliance requirements, the SREs are actually responsible for taking all those scripts-- which are usually very task-driven definitions of what's happening-- and validating whether they fulfill the requirements for a deployment within the organization. As you remove all these bits and pieces in between, it makes it easier for the SRE to define and validate those processes.

And then the last audience where I think we take most of the heavy lifting away is for-- whether it's an SRE or a DevOps engineer-- it's basically the person who has to maintain all these tools that make up your delivery and operations tool chain-- linking them together, building them. These are usually the people who we interact with the most, because they see the biggest value not having to build this every time a new service comes around or every time they have to change something.

CRAIG BOX: You are one of the co-chairs of the new CNCF SIG for app delivery. How did that SIG get started?

ALOIS REITBAUER: The SIG got started in an initial conversation with Alexis a while back, at KubeCon in Seattle. And back then we were really discussing that Kubernetes took a lot of the complexity of deploying applications away. But as people were moving bigger and bigger workloads onto Kubernetes, different questions were arising. And there was also not necessarily agreement on a lot of technologies, and new projects were popping up.

Maybe starting with one piece of work that the SIG is doing: reviewing projects that come into the CNCF, or that affect the CNCF-- helping them, guiding them. That was one thing.

And back in the day, the TOC just wanted to use SIGs to spread the workload a bit, because more projects were coming in there. Another driver was really starting to create a landscape and start to bring people together to solve the hard problems.

Just at virtual KubeCon Europe, we gave a talk about some of those problems. Like how do you deploy to offline environments? Or how do you ship commercial off-the-shelf applications? Which is actually harder than you might think. Or how do you enforce a chain of custody, ensuring that what you're installing is actually what you should be installing?

And even very simple questions. How do you define what constitutes an application in Kubernetes? You could say, well, I can have manifest files. But that's maybe not what you want to do. There are things like OAM that are emerging.

So as we move to a higher level of abstraction-- which is what a lot of people want to do in the Kubernetes space, obviously to make it easier to build applications-- a lot of these questions come up more on the application side. And delivery is key there.

Another point was just agreeing on how things are done. Like blue-green deployments-- everybody agrees that blue-green deployments and canary releases are great. But still, every tool is doing them slightly differently, also from the implementation perspective. And having an agreement there-- not necessarily one solution, but at least a landscape, and guiding people in the right direction-- is really what SIG App Delivery is looking into.

ADAM GLICK: Finally, I know that not only are you working on some of the most cutting edge technology in the open source world, but you're also helping work on some classic technology-- in particular in the automotive world. How is the work you're doing on your sports car?

ALOIS REITBAUER: Maybe for the listeners here. I got myself a new hobby by buying a 1977 German sports car. And this is also interesting because I think it teaches you also a lot of skills. So I sometimes like to refer to myself as the curator of that car, because there's a lot of decisions that you have to take.

And I see a lot of analogies to the way you build applications. It's not just that one thing that you build. The car is 43 years old now. So you see that the delivery happened once, when the car was built and shipped, but the actual operations and the continuous evolution are something that's still going on.

And this is an interesting experience. You have to take a lot of decisions. It's given me a lot of great time. But you have to think, how do I do this? Do I upgrade this? Do I keep this the way it is right now? Do I really need this new technology over here? Like, it used light bulbs-- do we have to replace them because they no longer exist?

So interestingly, you see a lot of analogies between the way you're building applications and the way you keep maintaining an old-timer sports car.

ADAM GLICK: It's a totally different maintainer challenge. Alois, it's been great having you on the show today. Thanks for joining us.

ALOIS REITBAUER: Yeah. Thanks to you too, Adam and Craig.

ADAM GLICK: You can find Alois Reitbauer on Twitter at @aloisreitbauer, and you can find the Keptn project at keptn.sh.

[MUSIC PLAYING]

ADAM GLICK: Thanks for listening. As always, if you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter at @KubernetesPod or reach us by email at kubernetespodcast@google.com.

CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe. Until next time, take care.

ADAM GLICK: Catch you next week.

[MUSIC PLAYING]