#24 October 10, 2018

Spinnaker, with Steven Kim

Hosts: Craig Box, Adam Glick

Steven Kim is an engineering manager at Google, based in New York City, working on the Spinnaker project. In a companion piece to last week’s episode about CI and CD, Steven talks to Craig and Adam about how Spinnaker evolved from VMs to Kubernetes and support for other cloud native technologies.

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.

[MUSIC PLAYING]

How's it going, Craig?

CRAIG BOX: Man, I think we spend too long on planes.

ADAM GLICK: Yeah, it's been a heavy travel time recently. Where are you at these days?

CRAIG BOX: I'm in Hong Kong. I'm on my way back to London. I hear you're keeping it warm for me.

ADAM GLICK: I am indeed. I have arrived at the next conference here. It's great. Running into some people with the stickers already, and we'll have more to give out. So thank you to all of you that I have run into so far.

CRAIG BOX: I love that. This is the sticker report. We actually did ask people last week to send in an email, and we got a few. So I will pop some stickers in the post.

I may even pop a picture in the show notes of my dad putting his Kubernetes Podcast sticker on the fridge, as promised. What have you been up to on these flights? How do you keep yourself occupied?

ADAM GLICK: I discovered a game called "Evoland 2," which, if you grew up in the era of Game Boys and Nintendo Entertainment Systems, is a wonderful game that parodies many of those classics. The game evolves as you progress through it, so you end up playing through the mechanics of those older games. So if you want a little bit of nostalgia, that one has been keeping me busy.

CRAIG BOX: Can you connect your NES to the seatback entertainment?

ADAM GLICK: I think we'll have to send that one to British Air and see if they can make that available.

CRAIG BOX: I took a couple of those flights in New Zealand where they don't even have a seatback entertainment system.

ADAM GLICK: Oh my gosh.

CRAIG BOX: But interestingly, on the last flight I took in New Zealand, the prime minister from two prime ministers ago, so the George W. Bush of New Zealand, if you will, was on the flight with me, and we shared a moment. I've been taking pictures of the seatback entertainment systems as I've been flying around the world, and on that flight I was taking a picture of the magazine to say, look how great the TV is on this little tiny flight. Do you watch "Last Week Tonight with John Oliver"?

ADAM GLICK: I do.

CRAIG BOX: Do you remember the debate about the New Zealand flag?

ADAM GLICK: I do not.

CRAIG BOX: Years ago, the prime minister of New Zealand wanted to change the flag of New Zealand, and there was an open competition where people crowdsourced ideas. And, as you get with any crowdsourced competition, there were some interesting choices.

ADAM GLICK: What could possibly go wrong?

CRAIG BOX: One of the more famous choices was basically a kiwi shooting lasers out of his eyes. The contest was roundly mocked in the news media, but the laser kiwi flag has become a little bit of an icon, such that I own not one, but two shirts with that emblem on it. And I just happened to be wearing that shirt when I bumped into the prime minister who had tried to change the flag, and he was very amused to see the shirt I was wearing on the plane that day.

ADAM GLICK: I take it they did not change the flag for it, though?

CRAIG BOX: No, unfortunately for reasons we won't get into now, we still have the current flag, but it's a lovely flag, and we fly it with pride.

ADAM GLICK: Excellent. Let's get to the news.

CRAIG BOX: The debate of the week-- is Kubernetes a good platform for personal projects? On the for side is Caleb Doxsey, who wrote a blog post making the case that Kubernetes is no more confusing than the Linux alternatives that preceded it, and that you can run a Kubernetes environment for just $5 a month. Against is Carlos Rodriguez, who wrote a rebuttal to Caleb's post, in which he suggested he'd rather make the trade-off of doing more work in order to get a cheaper deal. Both posts had a lot of healthy discussion on "Hacker News."

Sitting in between is GKE, which this week announced a new tool called kubehost. Kubehost is an open source utility for exposing services that run on a single node cluster during development. By avoiding the need for a load balancer, kubehost can lower the monthly cost of running a web service on a single node GKE cluster by over 50%, and it makes it easy to upgrade to the Google Cloud Load Balancer later.

ADAM GLICK: Cloud Native Buildpacks are the latest project to enter the Sandbox stage with the CNCF. Buildpacks are pluggable, modular tools that translate source code into container-ready artifacts. As with many of the evolutions of technology over the past decades, this is achieved by providing-- wait for it-- a higher level of abstraction compared to a Dockerfile. In doing so, Buildpacks provide a balanced control that minimizes initial time to production, reduces the operational burden on developers, and supports enterprise operators who manage apps at scale. Buildpacks were first conceived by Heroku back in 2011, and since then, they have been adopted by projects such as Cloud Foundry, Knative, GitLab, Deis, and Dokku with two Ks.

CRAIG BOX: Amazon announced a set of CRDs for managing AWS objects using the Kubernetes API. The AWS Service Operator for Kubernetes is currently in alpha and launches with support for six AWS resources, including DynamoDB tables and S3 buckets.

ADAM GLICK: DigitalOcean launched a managed Kubernetes service in early access in May, which we covered all the way back in episode two. This week they have moved it to limited availability.

CRAIG BOX: Platform9 released a new tool, which in honor of my co-host I will call etcd-Adam, for managing etcd clusters. Inspired by the kubeadm tool for managing Kubernetes clusters, etcdadm provides a simple command-line experience to configure, deploy, and operate etcd clusters anywhere with built-in support for recovery and scalability.

ADAM GLICK: Did you hear our interview in episode 21 with Ihor Dvoretskyi about how there are more ways to contribute than just writing code? Ihor, along with Noah Abrahams and Jonas Rosland, has published a non-code contributors guide. It identifies potential areas of contribution outside of traditional software development. Want to write documentation or help with the release team? Check out episodes 5 and 10, respectively, to learn more.

CRAIG BOX: Pulumi are exploring the Kubernetes API in a series of blog posts. This week, they look at the deployment object, how it keeps up with pod and node changes, and how its state changes can be displayed to a user with the open-source kubespy tool.

ADAM GLICK: Ahmet Alp Balkan from Google wrote a post called, How to do health checks of gRPC services running on Kubernetes. Since gRPC health checks are not natively supported in Kubernetes, Ahmet has developed a tool called grpc-health-probe, which speaks the standard gRPC health checking protocol and which you can configure to act as the health check binary for all your gRPC services.

CRAIG BOX: The security gateway software Teleport provides security features like RBAC and audit logging for SSH and, with v3.0 released this week, adds similar support for Kubernetes. Interactive sessions or remote commands launched via kubectl are recorded and can be replayed for compliance, knowledge sharing, or root cause analysis.

ADAM GLICK: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Steven Kim is an engineering manager on the Spinnaker project based in New York City. Welcome to the show, Steven.

STEVEN KIM: Hi, good to be here.

CRAIG BOX: What's the history of the Spinnaker project?

STEVEN KIM: The Spinnaker project was started at Netflix, and shortly after that, Google joined in on the project. One of Netflix's motivations for Spinnaker was that it be extensible, with support for multiple clouds as a proof point of that. So Google joining and contributing to the project was a good proof point. We worked together for about a year before we open sourced it together. That was back in November of 2015.

We launched in an alpha form, and our goal was to immediately identify the key things we needed to address. We knew there was a long list, and we were heads-down in development on that for a good year and a half. Then in the summer of 2017, we, in the OSS project, declared a 1.0, which said-- we're not done.

If anything, we have finally arrived at the start line, where we took on the known barriers, and we feel like we've knocked them out. And this is now worthy of a conversation. And we've been working heads-down on what we think are the priorities since then.

CRAIG BOX: For people who are not familiar with the Spinnaker project, what is the main feature set? What are the reasons people would use it?

STEVEN KIM: Spinnaker is a continuous delivery platform. Its scope of concern starts at the point where you have declared a release candidate and want to progressively roll it out, all the way to production. There are obviously numerous strategies for doing so, but Spinnaker tends to be a bit more opinionated. Its domain is entirely continuous delivery, so we'd probably discourage people from using Spinnaker for things like CI, or as a general automation tool. It's just not what Spinnaker's great at. But in the continuous delivery realm, it has the building blocks and first-class treatment of the things that are particularly important as you do deployments and promotions to production.
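To make the "release candidate to production" scope concrete, here is a minimal Python sketch of a progressive delivery pipeline that pushes one candidate through ordered stages and halts at the first failure. The stage names, checks, and the `run_pipeline` helper are hypothetical illustrations; this is not Spinnaker's pipeline format or API.

```python
from typing import Callable

# Hypothetical stage: a name plus a check that deploys/verifies the candidate.
# This is an illustration only, not Spinnaker's pipeline schema.
Stage = tuple[str, Callable[[str], bool]]


def run_pipeline(candidate: str, stages: list[Stage]) -> list[str]:
    """Promote `candidate` through `stages` in order, halting at the first failure."""
    log = []
    for name, step in stages:
        if not step(candidate):
            log.append(f"{name}: failed, halting promotion")
            break  # later stages (e.g. full production) never run
        log.append(f"{name}: ok")
    return log


if __name__ == "__main__":
    stages = [
        ("deploy-staging", lambda c: True),
        ("integration-tests", lambda c: True),
        ("canary-prod", lambda c: c != "web:1.3.0"),  # pretend 1.3.0 regresses
        ("full-prod", lambda c: True),
    ]
    print(run_pipeline("web:1.3.0", stages))
```

The key property is that promotion is ordered and gated: a failure at the canary stage means the full production stage is never attempted.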

ADAM GLICK: You mentioned you think of it as part of the space of continuous delivery, and there's a number of tools that are out there. One of the ones that I hear people talk a lot about is Jenkins. How do you think about Jenkins and Spinnaker?

STEVEN KIM: Yeah, Jenkins obviously has had a huge amount of success, and it's a wonderful tool. It's very, very good at specific domains and also as a general-purpose automation and orchestration tool. I think we have framed the DevOps and CI/CD space as a dev-centric CI/CD and an op-centric CI/CD. And really it's about who the person or party is that takes primary responsibility and ownership for building a platform and determining how our organization is going to do rollouts.

Now sometimes, typically in earlier stages, you want to empower your development teams to explore and figure out the best way to do that, but I think we all understand that at some point there's a scale where that just doesn't work anymore. And for that matter, the organization has priorities defined, and it has a good idea of what it really wants to encourage out there. We call it op-centric CI/CD, for lack of a better phrase, because it's not really the same ops team it was five or 10 years ago, but it's an organization-- call it release engineering, call it whatever it might be-- that sets up Spinnaker to a point where AppDev teams have to think very minimally about the details of a rollout, and they can be freed to focus on feature development.

CRAIG BOX: So in your mind, do you think of the CI tooling as something that the development team sets up, or would you also think an ops team would set it up on behalf of the developers?

STEVEN KIM: I think that's what we've seen, because Jenkins has made that so successful. What I often see out there is that Jenkins typically scales closer to linearly with the scale the development teams need to operate at, and that is why at some point it becomes unsustainable as you try to extend it to do things it maybe wasn't initially designed to do. We typically see Spinnaker scale more sub-linearly as larger and larger organizations adopt it org-wide.

ADAM GLICK: How do you see those different organizations working? Like obviously how developers work is very different than how ops teams work, and you say this is more of an ops-focused tool so that developers just-- it can be abstracted away from them. What's the feature set that makes that more interesting for ops-focused organizations?

STEVEN KIM: I think the interesting thing is the notion of deployment is-- you can see it as a front door to the runtime platform. And so it really does sit certainly adjacent to the operations, if not a part of operations, if you think about it.

So the reasons that you want to deliberately focus on continuous delivery are things like what happens when something goes wrong? The on-call team might be a part of the AppDev team. But in trying to go ahead and figure out the details of sifting through the platform, the logs, and what happened, we actually want to be very deliberate about how we abstract that and enable the AppDev teams to do that really, really effectively.

CI, on the other hand, is actually entirely a code concern and a development team concern. It's them trying to go ahead and keep a green repo, integrating code often, and making sure that the team is effectively working together as made evident by the code base that's out there. And I think that's one way to go ahead and kind of look at what is the problem we're trying to solve in CI and CD, and whose problem is it, primarily. I think that lends a lot of hints to how we should approach this.

CRAIG BOX: I think that separation of concerns was obviously a lot clearer in the world where we were deploying to virtual machines, and that is where Spinnaker started. It obviously started off with Netflix deploying to AWS and Google deploying to Google Cloud. How did the Kubernetes integration get added to Spinnaker, and what changes did that require in the codebase?

STEVEN KIM: I think from a Google perspective, Kubernetes has long been and continues to be a priority. We really see it as the future of cloud. So when we look at deployments, we also want to be very deliberate about how we go and arm and enable the AppDev teams and organizations at large to go ahead and make deployments with Kubernetes.

I think it's true, and I've heard it expressed multiple times, especially at large scale, that we don't want to pay the cost, at scale and en masse, of the AppDev teams needing to learn the intricacies of Kubernetes manifests, for example. The same goes for the cost of deploying changes, whether binary changes or configuration changes, out to increasingly complex environments: multi-cluster, multi-region, multi-stage, and so forth, with micro-services that have heavy dependency implications. We really want to save the AppDev teams from as much of that as we can. And that certainly means automation, but also the specific things in the problem domain that I mentioned earlier, which we want to address.

ADAM GLICK: Do you think of it as a container or Kubernetes-specific set of technologies, or is it something that's abstracted away from that and just happens to be evolving along with that community?

STEVEN KIM: Even though Spinnaker came out of the VM world, Netflix, having been an early poster-child of cloud, had adopted micro-services from very, very early on. Now we believe that Kubernetes is the best environment to go ahead and run micro-services. It's just very, very native to how that's done.

But the philosophy and the approach that Spinnaker was born out of is indeed micro-services. If you look at how the user interface and the ACLs and everything are segmented, it's clearly built around micro-services. And so the two coming together was really natural: Spinnaker, where it already was, and then adding Kubernetes support to it.

Now we did sort of a first pass at Kubernetes support back in January of 2016, and it did pretty well. It had a good level of abstraction. We got a lot of feedback for it, and we recently took a second swipe at that. We call it the v2 provider, for lack of better naming. It's an open-source project.

It supports manifests natively. And I think it forced a lot of very, very good conversations that, frankly, aside from Spinnaker, the Kubernetes community was working through as well: what does configuration as code mean? What does the manifest actually represent? How do we do very complex multi-region and multi-stage rollouts when every binary change means you have to update 12 different YAML files sitting in a Git repository?
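One common way to avoid hand-editing "12 different YAML files" is to render every target's manifest from a single base, so a binary change touches one value. A minimal Python sketch, assuming a hypothetical Deployment named `web`, a made-up `example/web` image, and three invented stage/region targets; real setups would typically use a templating or overlay tool rather than hand-rolled dicts.

```python
import copy

# Hypothetical base Deployment; only the image and per-target fields vary.
BASE = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "example/web:PLACEHOLDER"}]},
        },
    },
}

# Hypothetical (stage, region, replicas) targets.
TARGETS = [("staging", "us-east1", 1), ("prod", "us-east1", 5), ("prod", "europe-west1", 5)]


def render(image: str) -> dict[str, dict]:
    """Return one manifest per target, all pinned to the same image."""
    manifests = {}
    for stage, region, replicas in TARGETS:
        m = copy.deepcopy(BASE)  # never mutate the shared base
        m["metadata"]["namespace"] = stage
        m["metadata"]["labels"] = {"region": region}
        m["spec"]["replicas"] = replicas
        m["spec"]["template"]["spec"]["containers"][0]["image"] = image
        manifests[f"{stage}/{region}"] = m
    return manifests


if __name__ == "__main__":
    for target, m in render("example/web:1.2.3").items():
        print(target, m["spec"]["replicas"], m["spec"]["template"]["spec"]["containers"][0]["image"])
```

Rendering solves the editing problem but not the one Steven raises next: the rendered files still only describe intent, not what actually happened during each rollout.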

How do we make that sustainable? How do we make it so that when something goes wrong, we can look back across 12 different deployments that were made, that hopefully weren't made with individual VI editors back to Git-- how do we go and manage things like that?

CRAIG BOX: There are a few companies who are starting to look at the idea of GitOps, which almost feels like-- it's what you just said-- it is doing the deployment by way of managing these files and pushing them out everywhere. How would you contrast that to the more UI approach of the Spinnaker system?

STEVEN KIM: I think there are many things that make sense about a GitOps approach, which is config as code, and making sure changes get the proper visibility that Git, as a source control system, can provide for you. There are things that I don't think scale, though. For example, the notion that the config as code you check in, a Kubernetes manifest for example, is the system of record for the state of anything of interest out there is probably not going to work out so well.

Kubernetes is a declarative system. You submit your desires to Kubernetes, and most of the time it'll say yes and come back, but sometimes it's simply unable to, if there's not enough memory to allocate, for example.

And the thing is, Kubernetes will come back immediately and say yes, but you may not know exactly what's going on in there. And for that matter, being a declarative system, there can be situations where it discourages reaching in to try to fix what's going on. For example, internally at Google, in Borg, which has a lot of similarities with Kubernetes, we have the notion of a big red button that basically says, all right, stop everything. Orchestration layer, stop, because you keep stepping on what I'm trying to do.
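The "big red button" is easiest to see against a toy reconciliation loop. In this minimal Python sketch (a hypothetical `Controller` class, not Borg's or Kubernetes' actual API), each pass pushes live state back to the desired state. That is exactly what overwrites an operator's manual fix, unless reconciliation is paused.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy declarative controller; `paused` plays the role of the "big red button"."""
    desired: dict
    live: dict = field(default_factory=dict)
    paused: bool = False

    def reconcile(self) -> list[str]:
        """One pass: push every live field back to its desired value, unless paused."""
        if self.paused:
            return []
        actions = []
        for key, want in self.desired.items():
            if self.live.get(key) != want:
                actions.append(f"set {key}={want}")
                self.live[key] = want
        return actions


if __name__ == "__main__":
    c = Controller(desired={"replicas": 3})
    c.reconcile()            # converges live to desired
    c.live["replicas"] = 1   # operator intervenes by hand
    c.paused = True          # press the button so the loop stops fighting back
    print(c.reconcile(), c.live)
```

Without the pause, the next `reconcile()` would silently undo the manual change, which is the "you keep stepping on what I'm trying to do" problem.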

And that reveals some of the challenges of dealing with declarative systems. So with GitOps, you can ask simple questions, like, well, what happens if a rollout fails? Do you update that manifest in Git and roll it out again? That's a pretty long cycle to pay when something is on fire, right?

What happens if some condition occurs? Autoscaling, for a simple example. Now all of a sudden your manifest no longer matches what's running. We call this configuration drift, and that's one of the ways configuration drift shows up in the industry. And the tooling is rather immature at this point for dealing with those things, whereas in VM land, for example, some of those concerns are addressed in a more mature way.
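Configuration drift like the autoscaling case can be detected by comparing the fields declared in Git against the live object. A minimal Python sketch, assuming both have already been parsed into nested dicts; `find_drift` is a hypothetical helper. Only declared keys are compared, because live objects also carry server-populated fields such as `status`. For simplicity, list-valued fields are compared as a whole.

```python
def find_drift(desired: dict, live: dict, prefix: str = "") -> list[str]:
    """List fields declared in Git whose live value differs.

    Only keys present in `desired` are walked: the live object also has
    server-populated fields (status, defaults) that were never declared.
    """
    drift = []
    for key, want in desired.items():
        path = f"{prefix}{key}"
        have = live.get(key)
        if isinstance(want, dict):
            # Recurse into nested objects; treat a missing/non-dict live value as empty.
            drift.extend(find_drift(want, have if isinstance(have, dict) else {}, path + "."))
        elif have != want:
            drift.append(f"{path}: git={want!r} live={have!r}")
    return drift


if __name__ == "__main__":
    desired = {"metadata": {"labels": {"app": "web"}}, "spec": {"replicas": 3}}
    # e.g. an autoscaler has since scaled the Deployment to 7 replicas
    live = {
        "metadata": {"labels": {"app": "web"}, "uid": "abc"},
        "spec": {"replicas": 7},
        "status": {"readyReplicas": 7},
    }
    print(find_drift(desired, live))  # ['spec.replicas: git=3 live=7']
```

Detecting drift is the easy half. Deciding whether Git or the autoscaler is "right" is the open question Steven describes.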

ADAM GLICK: When do you think is the right time for someone to adopt a continuous delivery methodology versus just kind of doing things manually and moving fast? What's the right time for them to automate? Is it right at the beginning? Is it as they get more mature? Is it once they have ops teams built out to take care of it?

STEVEN KIM: I do think it is immediate. However, thankfully, we have a broad spectrum of continuous delivery tools that you can adopt. So when you're a dev team of one or two, you don't have to run kubectl apply -f by hand as you start rolling out to production. You can use a number of automation tools out there, including Jenkins, Cloud Build, and a number of other wonderful things.

I think the more interesting question is: at what point should you look at what we're calling an op-centric CI/CD platform? I think that's the point where you reach a scale where you have enterprise requirements, such as policy, or deployments you need to do out there, or where you want to scale to a point where the AppDev teams are dealing with complexity. I talked about multi-region and multi-stage deployments, and deployment strategies, from the basic blue-green all the way out to deployments based on automated canary analysis. When things like that are right for you, and you want to focus increasingly on feature development and velocity at scale, that's probably a good time to start looking at taking on something like Spinnaker.
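For reference, the basic blue-green strategy keeps two full environments and flips traffic between them. A minimal Python sketch with a hypothetical `BlueGreenRouter` class and a caller-supplied health check. It models only the routing decision, not the actual Kubernetes service or load balancer change.

```python
from typing import Callable, Optional


class BlueGreenRouter:
    """Toy blue/green model: two environments, traffic points at exactly one."""

    def __init__(self, blue_version: str) -> None:
        self.versions: dict[str, Optional[str]] = {"blue": blue_version, "green": None}
        self.active = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def serving(self) -> Optional[str]:
        return self.versions[self.active]

    def deploy(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Stand up `version` on the idle color; flip traffic only if it is healthy."""
        target = self.idle
        self.versions[target] = version
        if not healthy(version):
            return False  # traffic never left the current color
        self.active = target
        return True

    def rollback(self) -> None:
        """The previous version is still running on the idle color, so just flip back."""
        if self.versions[self.idle] is None:
            raise RuntimeError("nothing to roll back to")
        self.active = self.idle


if __name__ == "__main__":
    r = BlueGreenRouter("v1")
    r.deploy("v2", healthy=lambda v: True)
    print(r.serving())  # v2
    r.rollback()
    print(r.serving())  # v1
```

The appeal is instant rollback; the cost is running two full environments. Canary analysis, discussed next, trades that for gradual, measured exposure.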

CRAIG BOX: New network tooling like service meshes helps with all of those problems you just mentioned-- canary deployment, for example. What work is happening in the Spinnaker project to support Istio or other service mesh products?

STEVEN KIM: Yes, so obviously Spinnaker has current support for Kubernetes services. We are at the early design stages of what Istio support would look like. For canary releases, Istio plays an important role, which is shaping traffic across the different candidates out there. What Spinnaker does, however, is run the canary analysis between control and candidate, and figure out whether the candidate stays within the deviation threshold you find tolerable across the different metrics-- CPU utilization, memory utilization, latency, and response error rates, for example.
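A simplified illustration of judging a candidate against per-metric tolerances follows. This Python sketch uses a hypothetical `judge` function, invented metric names, and a naive relative-deviation rule. It assumes higher values are worse for every metric and reports an overall score as the fraction of metrics that pass. Real canary analysis compares distributions of samples rather than single values, so treat this only as the shape of the idea.

```python
def judge(control: dict[str, float], canary: dict[str, float],
          tolerance: dict[str, float]) -> tuple[dict[str, bool], float]:
    """Pass each metric if the canary's relative increase over control is within tolerance.

    Assumes higher is worse for every metric (CPU, memory, latency, error rate).
    Returns per-metric results and the fraction of metrics that passed.
    """
    results = {}
    for metric, limit in tolerance.items():
        base, cand = control[metric], canary[metric]
        if base == 0:
            deviation = 0.0 if cand == 0 else float("inf")
        else:
            deviation = (cand - base) / base
        results[metric] = deviation <= limit
    score = sum(results.values()) / len(results)
    return results, score


if __name__ == "__main__":
    control = {"cpu": 0.50, "latency_ms": 100.0}
    canary = {"cpu": 0.55, "latency_ms": 150.0}
    print(judge(control, canary, tolerance={"cpu": 0.2, "latency_ms": 0.2}))
```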

CRAIG BOX: Does Spinnaker collect those metrics itself, or does it connect into your monitoring system, which is--

STEVEN KIM: It connects into your monitoring system; currently it works with Stackdriver, Datadog, and Prometheus. It uses that for metrics, it can use either Kubernetes services or Istio for the actual traffic shaping, and then it runs the canary analysis, which is done in Spinnaker. That's the part where we figure out, using Mann-Whitney or a number of other statistical methods, how to get a good representative number of data points to confidently say, yeah, this canary is not looking so good. We should not move forward with this.
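Since Mann-Whitney comes up here, below is a minimal pure-Python sketch of the two-sided Mann-Whitney U test using the normal approximation. It compares samples of one metric from the control and the canary. It illustrates the statistical idea only. It is not Spinnaker's implementation, `mann_whitney_u` is a hypothetical helper, and the latency numbers are made up.

```python
import math


def mann_whitney_u(control: list[float], canary: list[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for the canary sample, p-value). Ties get average
    ranks, but the variance has no tie or continuity correction, so treat
    the p-value as approximate, especially for small samples.
    """
    n1, n2 = len(canary), len(control)
    # Pool both samples, tagging canary values with 0 and control values with 1.
    pooled = sorted([(v, 0) for v in canary] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for the tied run
        i = j + 1
    rank_sum = sum(r for r, (_, tag) in zip(ranks, pooled) if tag == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


if __name__ == "__main__":
    control_latency = [100, 102, 98, 101, 99, 100, 103, 97]
    canary_latency = [130, 128, 135, 131, 129, 133, 132, 134]
    print(mann_whitney_u(control_latency, canary_latency))
```

A small p-value only says the two distributions differ. A judge would also check the direction (here U is at its maximum, so the canary is slower) before deciding the canary should not move forward.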

ADAM GLICK: As you work on Spinnaker and you look at what's coming next, what are you most excited about that's on the roadmap?

STEVEN KIM: I think the future of cloud is looking really great. Kubernetes continues to mature, there are many proof points out there, and we keep learning what we need to focus on next to do well. Projects like Istio continue to build on the cloud-native approach to managing cloud infrastructure and doing deployments, and projects like Knative are pushing the conversation forward in serverless, for example. Those are really, really interesting and exciting things. And so we on the Spinnaker team at Google continue to work closely with the respective Kubernetes, Istio, and Knative teams here, and we hope that collectively we can push the world forward towards better cloud adoption.

CRAIG BOX: Is there a lot of contribution to the Spinnaker codebase from outside Netflix and Google?

STEVEN KIM: I haven't looked at the statistics lately, but I would guess that right now we're probably still at 80% to 85% between Netflix and Google. There are a handful of enterprise users who have spun up Spinnaker teams. Target, for example, has a Spinnaker team, and I think there are a handful of other major adopters of Spinnaker who have a Spinnaker team internally.

And then, of course, the cloud providers themselves, so AWS, Microsoft, Pivotal, all have teams contributing to their respective providers inside Spinnaker. They've gotten to a point where they're looking for user feedback and trying to figure out where to push forward next. I believe AWS just pushed out Fargate support, and they're working on ECS, as well as a number of their other platforms.

ADAM GLICK: So if the people listening want to get involved in Spinnaker, either just starting to use it and learn more about the tool or to actually join in and contribute to it, where are some resources that they could use or places they could go?

STEVEN KIM: The project's primary website is Spinnaker.io, so that's a great place to start. From there you'll find links to our Slack channel, which I think is at some 5,000 members or something like that. We have forums where you can search and trawl if you want, or post questions and introduce yourself.

And also, we have a Spinnaker Summit coming up. We have one every year. And this time it's in October, early October in Seattle. So if that works for you, all the information is on Spinnaker.io.

CRAIG BOX: All right, Steven, thank you so much for joining us.

STEVEN KIM: Thanks very much. It was a pleasure.

CRAIG BOX: Thank you as always for listening. If you enjoyed the show, please do help us spread the word and tell a friend. If you have feedback for us or you missed the sticker call last episode, you can find us on Twitter at @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

ADAM GLICK: You can also check out our website at kubernetespodcast.com. Until next time, take care.

CRAIG BOX: See you later.

[MUSIC PLAYING]