#20 September 12, 2018

Cloud Native Patterns for Ops, with Justin Garrison

Hosts: Craig Box, Adam Glick

Justin Garrison is both a student and a teacher. A senior systems engineer in the media industry, he has boiled his experience and wisdom, as well as that of his co-author Kris Nova, into the book Cloud Native Infrastructure. He talks to Craig and Adam about the Kubernetes community and the process of writing.

Do you have something cool to share? Some questions? Let us know on Twitter @kubernetespod or by email at kubernetespodcast@google.com.

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.

[MUSIC PLAYING]

CRAIG BOX: Adam, they tell me you've had a colorful weekend.

ADAM GLICK: Oh, ho, ho, ho, look at you with the clever puns. Yes. I decided to start playing around with the Hue lighting system. I don't know if you have one of those, but I picked one up and started tinkering with it. And the API is super easy to use.

And much to the chagrin of my wife, I just started flipping lights on and off around the house to play with it and figure some stuff out. So I'm starting to think about what I could build off of it, like dashboards and retro-style displays.

It's a lot of fun. I was pretty impressed.

CRAIG BOX: How many light bulbs do you have?

ADAM GLICK: Oh, my gosh. Right now, I have 13 of them, I think-- lucky 13.

CRAIG BOX: Wow, that's a lot. And they come in a pack of two to start with, don't they?

ADAM GLICK: You can get them in ones. You can get them in four packs.

CRAIG BOX: Or you can clearly buy them in bulk.

ADAM GLICK: Cheaper by the dozen, as they say.

CRAIG BOX: I don't know. Again, I've not seen your house. What's the scale? Is that every light bulb in your house, or is that just your work room, with its 13 light fittings?

ADAM GLICK: It is not, but a lot of our fixtures have three light bulb pieces.

CRAIG BOX: I see.

ADAM GLICK: You know, if you haven't played with it, it's just-- they've done a really nice job for IoT lights. And it plugs in with all the other home automation stuff I have, which pretty much is a clarion call to any hacker out there to go turn my house into a strobe light. But some pretty cool stuff.

CRAIG BOX: And the back end runs on Kubernetes on GKE.

ADAM GLICK: That's right.

CRAIG BOX: I thought that was why we were talking about it!

ADAM GLICK: Indeed. And I believe that we're actually going to get together next week to record a rare in-person show.

CRAIG BOX: Yes, every now and then, Adam's and my paths collide. And next week we are at Google Cloud Next in Tokyo. So if you are a Japanese listener and you have not yet signed up, I think there are still a few tickets available on the website. Feel free to drop by.

I'll give a little talk on the Istio ecosystem. And you'll be able to see Adam and me at the GKE Hybrid Orchestra booth on the show floor.

ADAM GLICK: Yes. And we may even have stickers by that point, so stop by and get a sticker.

CRAIG BOX: I look forward to giving those away.

ADAM GLICK: Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: Cisco Hybrid Cloud Platform for Google Cloud is now generally available. C-H-C-P-F-G-C, with three C's, can be installed in your data center and supported by Cisco. It complements GKE, providing Kubernetes, Istio, and the GCP Service Catalog.

Submissions are now being taken for the Cisco and Google Cloud challenge where you can win buckets of Cloud Credits and tickets to Cisco Live 2019 in Barcelona, by showing off your solution running on the Cisco Hybrid Cloud Platform.

ADAM GLICK: HashiCorp made a number of announcements around the integration of Consul with Kubernetes, to be made available in the coming weeks. This includes an official Helm chart for installing Consul on Kubernetes, automatic bi-directional syncing of Kubernetes services with Consul, and some more Istio-like features for Consul Connect, including pod injection and support for Envoy as the proxy.

HashiCorp also says that an official Vault Helm chart is coming, and confirms that the releases have no impact on its support of its own Nomad container system.

CRAIG BOX: Google Cloud and video game engine company Unity this week announced Open Match, an open-source game matchmaking service that runs on Kubernetes. Matchmaking need not just mean being assigned to 99 substantially lower-ping teenagers who will pwn you at Fortnite. Instead, Open Match supports matching based on latency, wait time, and an arbitrary skill rating.

Once you've matched players, you need to pass that information to a multiplayer game server. And for scaling those, there is Agones, a collaboration between Google Cloud and Ubisoft, which this week has released version 0.4.

ADAM GLICK: In self-driving Kubernetes news, Couchbase, an open-source NoSQL database, introduced version 1.0 of its autonomous Kubernetes operator. One user claims it reduced their administrative overhead by 80%, but there's no word on whether you have to keep your hands on the steering wheel at all times while using it.

CRAIG BOX: AWS has announced that Amazon EKS is now available in Ireland, bringing the available worldwide region count to three. In related news, Google Cloud has just launched a new region in Finland, and GKE is available there from day one, bringing the available region count to 17, or in other words, all of them.

ADAM GLICK: Kubernetes-as-a-service vendor Platform9 has introduced support for Arbitrage in their platform. Arbitrage helps you deal with the confusion of the AWS spot market by finding regions with cheaper nodes when they are available and can still run to meet your target SLA.

Their cost calculation logic for the consistently priced Google Cloud Preemptible VMs should be a lot easier when they implement this feature for other clouds, as promised.

CRAIG BOX: ExternalDNS, a project in the Kubernetes incubator, has released version 0.5.6. ExternalDNS is like kube-dns in that it publishes endpoints for Kubernetes services. But instead of running a DNS server and publishing them internally, it uses provider APIs to enter their information into external DNS services, like Google Cloud DNS or Route 53.

The new release adds support for Alibaba Cloud and using the Istio gateway as a source.

ADAM GLICK: Finally, Red Hat continues its blogging about Kubernetes being the next generation of application servers. Starting a new series on the topic, Ken Finnigan talks about how Kubernetes can be the application server of the future.

The article covers some basics about container usage and their benefits. It might be a good intro, if you're new to the containerized infrastructure space.

CRAIG BOX: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Justin Garrison is a senior systems engineer working in the media industry, which means you'll find his name in IMDB. He's the co-author of a book called, "Cloud Native Infrastructure," enjoys helping people, and learning new things. Welcome, Justin.

JUSTIN GARRISON: Hi, how's it going? Thanks for having me.

CRAIG BOX: So, Justin, I understand this is not your first podcast experience?

JUSTIN GARRISON: No, it's not. Way back in the day, I used to help run a podcast called mintCast, the Linux Mint community podcast, with a couple of people from the community.

CRAIG BOX: So that must have been quite a way back in the day. It was 13 years ago?

JUSTIN GARRISON: Yeah, that was quite a while ago. I actually--

CRAIG BOX: Does that make you a podcast pioneer?

JUSTIN GARRISON: Maybe. I had started recording that podcast before I had ever actually listened to a podcast. I didn't really know what they were, but I thought we'd figure it out.

CRAIG BOX: Fair enough. Well, it's still running today, so you must have done something right.

JUSTIN GARRISON: Yeah, I'm really glad to see the community kept doing it.

CRAIG BOX: So obviously, that's an early experience with both technology and community. Tell us a little bit about how you got involved with technology.

JUSTIN GARRISON: Yeah, I've always just been a tinkerer with things-- take things apart and try to figure out how they work, whether it was technology or not. And really, it led me more and more into figuring things out and learning things.

Someone would be like, hey, you have experience with this thing, so-- become an expert in one thing or another. And that led me through various stages of my career, working with computers and working with software, and just kind of figuring things out as I went.

ADAM GLICK: How did you come to Kubernetes as part of your software journey?

JUSTIN GARRISON: That was just a progression, career-wise: starting in operations, figuring things out and working on them, and doing a lot with automating things with software. I was doing the DevOps scene with config management, helping people manage servers and really make them repeatable and always the same, so there was less of a surprise when they'd go to work.

And that progressed pretty naturally into doing things with containers, and then figuring out that managing more than a dozen or two containers is kind of hard for people. And so looking into different orchestration engines and figuring out how these things could be run better for people to interface with them and manage larger-scale systems.

And so I started off doing a lot with-- not a lot, but looking into Mesosphere and Swarm and Nomad, and then progressed into figuring out Kubernetes and understanding how it worked together. And then just seeing why it was designed certain ways, and then how to extend that and use it.

ADAM GLICK: And as you work with the community, what parts of the project do you get involved with? Obviously, it's a large community with a lot of different pieces. What areas do you like to engage?

JUSTIN GARRISON: It's definitely grown quite big. I tend to stay on the operations side of it: making it useful, and understanding how people manage the system and how they set up Kubernetes in general. You can have it just free range for anyone to play with, but there are best practices for how to extend it when you're actually making things like operators, or controllers, or APIs, and for why it's useful to build some of that stuff in. So it's more on the back end of things, making life easier for the consumers of those products. There are a lot of developers, and a lot of people doing a lot with Helm and all these other things to package applications.

And I've messed with some of that stuff just to understand their use cases, but a lot of the time, I stay on the back side of why you would extend Kubernetes and how to use it to its best-- you know, how it was designed. You don't want to go too far out of the bounds, otherwise you can just make this crazy snowflake of a cluster.

CRAIG BOX: You've codified a lot of your learnings into the book, "Cloud Native Infrastructure," which you wrote with Kris Nova. Tell us a little bit about how you got to writing that.

JUSTIN GARRISON: Yeah, it was really kind of a surprise, I suppose.

CRAIG BOX: Just one day, you woke up and you'd written a book.

JUSTIN GARRISON: It was a lot of work. I do have a blog post outlining a lot of the work that went into it, as far as how much time it took and the process with O'Reilly. But Sebastien, someone else inside the Kubernetes community who has written a few different books, reached out to me and said, you know, my editor is looking for someone else to write a book on this topic.

And really, it was focused around the Cloud Native Computing Foundation, the CNCF projects: figuring out how people should piece them together and what they're good for. So I started down that path because I was involved with the CNCF as well. I'm a CNCF ambassador.

And it sounded really interesting. It was something I had been passionate about. And my direct involvement with Kubernetes was slowing down at the time, so I thought I could take what I'd learned and turn it into something that a lot of other people could learn from.

And going through that process, I definitely talked to people who had written books, and they recommended getting a coauthor. Having Kris coauthor it with me was great. The book is so much better because she was involved and was able to bring insights she had from running large-scale systems.

She was a maintainer of kops at the time, the Kubernetes deployment tool for Amazon. So her insights there, and the patterns she had learned, were just great to have as part of the book in general.

CRAIG BOX: Is this the first book that you've written?

JUSTIN GARRISON: Yes, it is the first book. It was about a year of work to go through the whole thing, and it changed a lot. We had the first half of a draft done, really focusing on the CNCF projects, and realized that it was already out of date, and that the projects really didn't matter.

The technologies themselves weren't important. It was really the patterns around how things work at scale: having declarative APIs, and really making what we coined as infrastructure as software. With infrastructure as code, a directory full of JSON files doesn't help you as much; software running that actually manages your infrastructure helped a lot more.

And that was the pattern we saw from interviewing people at Netflix, and Google, and Amazon. And those patterns were more powerful for people managing this infrastructure.

CRAIG BOX: How has life changed for you as the author of this book? Do people come up to you at conferences and ask you to sign copies?

JUSTIN GARRISON: We have had a couple of signings, thankfully sponsored by CNCF and some other companies, which was fantastic. It's really great to have a physical thing that I made, something very tangible to hand to someone that can really help them learn something.

And it's just great to be able to pour a year of my work into it, condensing that as much as possible into: this is what I think is important, this is how I think people can become better versions of themselves, in operations, as a developer, as an engineer, by understanding these patterns.

And they don't have to do all that work. They just have to do the work of reading the book, and then, hopefully, understand some of that stuff. And that's been great.

CRAIG BOX: Adam, we should get some episodes of this podcast printed on eight track and we should hand them out at conferences. I think that's very tangible.

ADAM GLICK: I'll put that right next to our plan for reel-to-reel distribution and cuneiform tablet shipping. Do you think you'll do a v2 of the book?

JUSTIN GARRISON: I'd love to keep it updated, because a lot of things have changed even in the last year. More things are being learned, and there's a lot more that could be added to extend what's in there. We really distilled a lot of it down to fairly short chapters, so as not to make it too lengthy for people or have it just drone on.

There was no reason to just add words, so we really wanted to condense it as much as possible. But there are new insights on how people are being successful with this infrastructure. And it's not always just Kubernetes. The pattern of having infrastructure as software, and of using cloud environments in general, continuously evolves.

So it would definitely be interesting to keep it up to date.

ADAM GLICK: You've talked about your journey through operations and coming to Kubernetes, and you've also mentioned that you wrote the book thinking about that perspective. And I hear a lot of discussion about operators and also about developers, and how both of them are using Kubernetes.

I'm curious if you think about it from the operator perspective. What are things that people should think about as they're taking on Kubernetes? What are the challenges that they would uniquely run into, and they should be thinking about as they're getting into Kubernetes and using it?

JUSTIN GARRISON: I think operators are mandated differently than engineers, in some cases, since they are managing the base infrastructure underneath where the value to the business is. And it's always hard to balance that role of saying, well, this needs to be stable and can't ever change.

And it's a hard place to be: OK, well, I need to always keep things the same, because I can't change them, but I also need to advance them and make them better for people. So look at Kubernetes as another tool you can use. People are using Bash, they're using config management, they're using all these things they're used to, and whether those are simple or not, everyone's kind of comfortable with their own level of complexity.

Where I've written Bash scripts that are horribly complex, but I understand them, so it's totally fine. And I really don't care how complex it is, because I understand it. And Kubernetes might be a different set of complexities that you're bringing into it, but using that as a tool just to say OK, well, I have to understand some basics here, and understand why I would use this thing and when.

And then once you get comfortable with that level, really figure out how to add value, and don't focus on just the technology and just the lower level. No business, or no one, really gets benefit out of just running Kubernetes. You have to put some applications on it.

You have to get value out of that thing. And the value usually is from faster iterations, from larger scale, from all these things that Kubernetes enables, but the tool itself isn't necessarily valuable, unless, of course, maybe you're selling that to someone else, but even then, you need to be able to iterate because it moves so quickly.

Kubernetes isn't standing still. And if you take all this time, spin it up-- OK, I have this thing now. It's going to change in three to six months, and you need to keep up if you want to keep getting more value out of it.

CRAIG BOX: Do you think that the release cycle is too quick? Do you think we should be looking as a community at nine-month releases, or 18-month LTS releases?

JUSTIN GARRISON: I think it's too quick for people who want to maintain systems in a traditional way. Again, keeping things stable versus making things better or getting new features is definitely in conflict. But as much as possible, when people ask me how they should get started, I usually tell them to just find a hosted solution. Don't run it yourself.

Because you're not going to get as much value. If you need to extend it, and you can't on some hosted platform, maybe you'll get more value out of doing it yourself. But in general, you can go use GKE and guess what? They'll upgrade it for you, and you don't have to worry about all those pieces.

But you can add value on top of it by extending it or figuring out the workflows for people to get stuff into it and make value for whatever you're doing.

ADAM GLICK: You've talked about how you're making people successful. I also know that, at one point, you gave a very interesting talk about the challenges people can have and how Kubernetes adoption could potentially fail. I was wondering if you could share some of your learnings on what it means to make sure people are successful with Kubernetes adoption, because some of those are technical, and I remember you mentioned that some of those pieces might not have been.

JUSTIN GARRISON: Again, it's like, the hardest thing in technology is the people, by far. And it turns out the most important thing in technology is the people. And it's totally worth focusing on the people aspect of it, on how this tool enables people to work better, whether that's better workflows, whether that's better tooling for them, or better just visibility into what's going on.

Because these systems are so abstract: if a server is running 100 instances of this pod, how do you know when something's broken? And there are all these things that Kubernetes enhances and brings back to operators and developers.

And in the book, I just generically always call people engineers, because we're always engineering something. It doesn't matter if you're an operator or a developer, you have some insight that you want to get out of the system and some workflow that you want to improve by using this tool.

And really, that's the key for me that I've learned talking to a lot of people is just focusing on the people, their workflows, and how they get work done, and improving that stuff is a much better focus than just focusing on the technology, and saying, we need Kubernetes because it's Kubernetes and everyone's doing it.

CRAIG BOX: Looking at the landscape of CNCF projects, is there anything that you feel is missing? You talk about the human challenges, are there any projects perhaps around workflow or anything that you think that-- if you could wave a magic wand over the Cloud Native community and introduce another project, what area do you think that project would be in?

JUSTIN GARRISON: It's funny you mention that. I gave a talk at SCALE about that landscape. And I was pointing out that I really think the landscape sucks, because it's not meant for engineers to look at. You shouldn't be playing Cloud Native roulette and picking your projects out of that.

It's really meant for marketing and for being able to see how big the landscape is, because there's value in people getting involved, but it's not for engineers to go out there and pick things out. And really the whole thing is just tools.

You can use all these different tools in different ways. Finding the right tool for the job: my whole talk at SCALE, the SoCal Linux Expo, was about that. And I actually added an icon to the landscape and asked someone to point out which icon I had added.

And it was the "Cloud Native Infrastructure" book. And it was hilarious, because it was like, well, there's the one thing missing from this thing: documentation on how to use this stuff. But treat all that stuff as just tools. You can pick one and figure out if it's the right thing for you. Go to any toolbox and ask, oh, do I need the Phillips screwdriver or a hammer?

They're completely different things. Can you do the same thing with both tools? Probably. You might hurt yourself, but using the right tool for the job is important.

CRAIG BOX: So would you recommend that people who are currently working on tools, down those tools for a while and focus more on documentation?

JUSTIN GARRISON: I would definitely recommend people learn why tools were made and the history around them. You can look at the white paper for, say, the Raft protocol, and understand from the paper why they designed Raft and what trade-offs they made.

And then from there, you can say, OK, well, etcd was built on Raft, so I want to use etcd because it's based on Raft. Or you can look at the gossip protocol and say, oh, well, you know what? The gossip protocol paper makes completely different trade-offs than Raft did, and so I might want to use Consul over etcd because of why those tools were created and the trade-offs they made.

And so understanding the history of some of those tools is more important than just seeing, oh, well, Kafka is a streaming-- you can do things on streams, so that's what I want. Well, there are other things that do things on streams, and they might be a better tool.

And so a lot of times, yeah, focus on trying something out and seeing if it is the right tool, understand the history of that tool, and then figure out if that's the right thing for you going forward, for whatever you're trying to do.

CRAIG BOX: I believe the Raft paper is a lot more accessible than the Paxos paper, for example. Do you think that the implementations of things-- do you think people will adopt them more because the quality of the technical communication is better?

JUSTIN GARRISON: I definitely see the benefit of helping people understand difficult technologies better. There is a tipping point: once there is buy-in on a certain thing, there's kind of this landslide of people who just go that direction because everyone else did. That's a hard place to be, because it could be successful for them, or they could get in trouble.

But I definitely think that, especially for the early adopters, early in that hype curve, it's important to make it as accessible and understandable as possible: what the trade-offs are and why it was designed that way. Which, again, for me, getting into Kubernetes, those were really hard things.

That was actually why I started with Swarm. I started with Swarm because it was like, this is amazing. It was easy to install, it was up and running, and it was just great. And then I had the benefit of actually spending an afternoon with Kelsey Hightower, and he explained so much to me about why certain things were designed the way they were, how to extend them, and what the benefits might be.

And that was really eye opening for me to kind of dig into Kubernetes from then.

ADAM GLICK: What's coming next in Kubernetes that you're excited about?

JUSTIN GARRISON: I'm actually excited about the pace Kubernetes keeps with releases based on time rather than on features, which I think is a great thing for the community and for the project. Because it does force you to manage Kubernetes differently: you can't spend a year to spin up this thing and understand it and get it just right, and then you're like, wow, I'm four versions behind now.

And you can't do that anymore. Treating this stuff even more as something you have to continuously improve, and continually bring changes into, might be really difficult for people's workflows. I totally get that. In those cases, I say, as much as possible, use a hosted solution and just run with it, and add value on top.

But for people that are doing it manually, they need to really figure out how that workflow changes, and understand, or at least enable the operators to make those changes quickly, and be a little comfortable with risk.

Look back at error budgets and SRE principles: when things go down, what do we actually do with that? Not just, OK, let's put out fires all the time, but how do we improve things to make them more reliable and resilient?

CRAIG BOX: After people have read your book, "Cloud Native Infrastructure," what's the next thing you recommend that they do to keep up with what's going on and that community pace?

JUSTIN GARRISON: If it's specifically Kubernetes, the SIGs, the Special Interest Groups, and the working groups are great places to get started, just to get a little focus on something, because you can't take on the full fire hose. It's just way too much.

Otherwise you're going to be in SIG meetings and working groups all the time. And you can't just go on Slack. Last I looked at Slack, there were 32,000 people in there. There's no way you're going to be able to constantly keep up with that.

CRAIG BOX: Is there a curated version of what's going on that you find valuable, personally?

JUSTIN GARRISON: Yeah, actually--

ADAM GLICK: Besides the podcast, because, obviously, that's the best place to start.

JUSTIN GARRISON: If you have a way to listen to these podcasts, it's great. Like the news segments on this podcast?

CRAIG BOX: Yeah, we have eight track players. That's how people keep up with them.

JUSTIN GARRISON: There's actually one on the discuss.kubernetes.io forums-- I forget who it is. Josh does a really good "This Week in Kubernetes," and I get those emails, which are great, because it's just a curated list of-- there's one for development, and kind of just what's going on.

If you really want to know what features are coming way in the future, he has some important pull requests, and kind of what was discussed in some of the SIG sessions, and that's a great place. It's an overview that is a curated list, and I really enjoy those.

But then, yes, if you want a specialty, I think getting involved in one of the SIGs is the way to go, whether that's in Kubernetes specifically, or in the CNCF, which also has its own SIGs for things like serverless and storage and that kind of stuff.

You can get involved in whatever you think will differentiate your business, focus on that thing as a differentiator, get involved and learn more from it, and then also push back and show what your workload might be and how it might be different for you.

CRAIG BOX: And if you want to hear more from the author of the "Last Week in Kubernetes," newsletter, Josh Berkus, you can hear from him on episode 10 of this podcast, which you can download from kubernetespodcast.com, or buy from your favorite eight track store.

ADAM GLICK: Speaking of keeping in touch and keeping up to date on things, if folks listening want to keep in touch with you and follow what you've been doing-- the publications, the talks you've been giving-- what's the best way for people to follow you?

JUSTIN GARRISON: I'm probably most active on Twitter. I have taken a step back from that, fairly recently, just kind of muting some of my notifications. There's too much going on. But I'm Rothgar on Twitter-- R-O-T-H-G-A-R.

And then also, my blog, justingarrison.com, just goes to a Medium blog where I post some things. Lately, it's been more Kubernetes stuff and just Cloud Native things. I help people figure out some of the problems I hear repeatedly, either something I learned or something I hear a lot from the community.

I try to post some of it up there. But again, the project moves so fast and the community moves so fast that some of that stuff goes out of date quickly. So for keeping up with the stream, Twitter is probably closer.

CRAIG BOX: What is a Rothgar?

JUSTIN GARRISON: It's actually the king from Beowulf, the old story. His name is spelled with an H, and I was in high school when I came up with this name and didn't like the silent H. But I actually liked the message of the story, where the king gave his life so his people could survive this dragon, essentially.

And there was a movie at the time called "The 13th Warrior," which was also out. In my high school, we were the Warriors. It was kind of-- I was number 13-- it all kind of made sense in high school, and I just kind of went with it since then.

CRAIG BOX: It's not a Beowulf cluster joke?

JUSTIN GARRISON: No, they do kind of go in tandem, but I didn't learn about Beowulf clusters till much later.

CRAIG BOX: All right, Justin, thank you so much for joining us today.

JUSTIN GARRISON: Thank you.

ADAM GLICK: Take care.

[MUSIC PLAYING]

CRAIG BOX: It only remains for me to thank you for listening. If you'd like to thank us back, a really easy way to help us be discovered by more people is to pick up your podcast player and rate us. Use iTunes or whatever it is that you use.

If you're listening to us on the web, all episodes are this great and you should really subscribe.

ADAM GLICK: If you want to get in touch with us, you can find us on Twitter @kubernetespod, or reach us by email at kubernetespodcast@google.com.

You can find the show notes with all the links at kubernetespodcast.com. Until next time, take care.

CRAIG BOX: See you in Japan.

[MUSIC PLAYING]