#65 August 6, 2019
Ian Coldwater specializes in breaking and hardening Kubernetes, containers, and cloud native infrastructure. A pre-eminent voice in the Kubernetes security community, they are currently a Lead Platform Security Engineer at Heroku. Ian joins Adam and Craig to talk about the offensive and defensive arts.
Do you have something cool to share? Some questions? Let us know:
ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.
CRAIG BOX: And I'm Craig Box.
CRAIG BOX: Well, security is in the news this week. It's the Black Hat and DEF CON conferences in Las Vegas. There's been a set of security updates made to Kubernetes, and we have a great security interview on the show today.
ADAM GLICK: We do indeed. Have you ever been to either of those conferences?
CRAIG BOX: I haven't. I've never even been to Las Vegas.
ADAM GLICK: Never? Wow. Well, someday you'll get that opportunity. I someday would love to go to DEF CON. I don't know if you've seen the things people have posted of their DEF CON badges, but they come up with some really cool kind of programmable badges that people have. And that just always seemed like a super cool idea that I was hoping we could do at a conference sometime.
CRAIG BOX: Yeah, the treasure hunt, or the scavenger hunt, I think that's at DEF CON every year, as well. They've got these crazy things people need to decrypt in order to get clues to move on to the next thing. It sounds like you could basically go and just play that game and completely ignore the conference for the whole week.
ADAM GLICK: Oh my god, it would be my little geeky paradise. I would love that. The Spot the Fed game is also a popular one.
CRAIG BOX: Can you just dress up as a Fed, and--
ADAM GLICK: [LAUGHTER] Try and get spotted? I don't know, you could try.
CRAIG BOX: Do they have decoys?
ADAM GLICK: The only way to know that is to know which ones weren't caught. Here in Seattle, it has been beautiful and sunny, and I've been enjoying the outdoors.
CRAIG BOX: The big blue room, as we call it.
ADAM GLICK: The big blue room, yes. That and playing around with some riddles, having some fun with the Tolkien riddles and other puzzle creation pieces, as I think about building puzzles. I did a puzzle room here on campus the other week, and that inspired me a little bit. Plus, heading down to California next week to see folks.
CRAIG BOX: Lovely. It's always sunny there. Or is that Philadelphia? I get confused.
ADAM GLICK: I hear that's Philadelphia. But only if Danny DeVito will be there. Shall we get to the news?
CRAIG BOX: Mesosphere yesterday announced they have changed their name to D2iQ, symbolizing day two intelligence, and a pivot away from the Mesos project and towards operation of Kubernetes and cloud native services in general.
ADAM GLICK: I wonder if you can run it on Amazon, where it's always day one?
CRAIG BOX: D2iQ-- with a lowercase i-- launched their own Kubernetes distribution, Konvoy-- with a K-- and the requisite set of training and professional services to go along with it. They also continue to support the Mesosphere Kubernetes Engine, as well as the Mesosphere DC/OS. In the open source space, D2iQ released a new beta version of their operator platform KUDO, also with a K. KUDO was named Maestro when it was released last December, but was renamed soon after in a moment of foreshadowing we totally missed.
Ice Cube has been contacted for his thoughts on the changes, but had not responded at the time of recording.
ADAM GLICK: Google Cloud announced the beta availability of their Migrate for Anthos tool, which we covered in episode 48. Migrate for Anthos can now migrate apps from Amazon EC2 and Microsoft Azure VMs, as well as on prem, moving them to containers running on GKE.
Google Cloud also preannounced Google Cloud Game Servers, a hosted product built on Agones, the Kubernetes-based game server hosting platform, which you can learn all about in episode 26.
CRAIG BOX: Following on from their Kubernetes day in India earlier this year, the CNCF have announced their first Kubernetes summits. The summits are two events that are paired together in a region in the same week. Day one is a single-day conference, and day two is for co-located events, similar to the day before a KubeCon.
The speakers will catch an overnight flight between the two cities, and recover on the Wednesday, before repeating the event in the second location. The first pairing announced is Seoul, Korea and Sydney, Australia, to be held on the 9th and 10th and the 12th and 13th of December this year. The CNCF is now looking for speakers who wish to talk at both events. Further summits are being planned for 2020 in Mexico City and Sao Paulo, Bangalore and New Delhi, Tokyo and Singapore, and Tel Aviv.
ADAM GLICK: Security release announcement time. Yet more problems with kubectl cp have been found and fixed, as well as an issue where, if you had permissions to edit custom resources in one namespace, you can edit them in all namespaces. Get your new version soon if these things worry you.
CRAIG BOX: IBM announced the first set of integrations after closing their Red Hat acquisition. First up, they launched OpenShift on IBM Cloud, which they say is a flexible fully managed service. Next, they preannounced that OpenShift would be coming to IBM Z series mainframes and the LinuxONE platform built on them. They are also releasing Cloud Paks, with a k, but no c to be seen. Paks are a way to deploy common IBM technologies in containers, focusing on areas including identity management, logging, monitoring, and security, as well as adding extra maps to Quake and Half-Life.
ADAM GLICK: Meanwhile, Cisco has added Azure Kubernetes Service support for their Cisco Container Platform, joining support for managing GKE and Amazon EKS in their CCP offering.
CRAIG BOX: If you ever thought, "out of all the different deployment choices, I'm going to pick Helm", unfortunately, your choosing days are not yet over. Stephen Acreman from the Kubedex has published another Great Comparison Spreadsheet, this time focusing on how you might choose to deploy your Helm charts. Aside from the Helm tool itself, there are 10 other options for pushing these to your cluster. The good news is he suggests you can probably ignore all of them and just use basic Helm, which he deems the best choice. Helm 3 will address many of the challenges that Acreman describes in his blog, but it remains an alpha, with a beta expected very soon.
ADAM GLICK: Alibaba Cloud has posted a blog about how genetic analysis is being done on their cloud. While the blog doesn't go too far into the particulars, one of the most interesting things they call out is the challenge in sharing data, and how hybrid deployments can help when processing large amounts of sensitive data.
CRAIG BOX: Jenkins sponsor CloudBees has announced the CloudBees Jenkins X distribution, a stable track version of the rapidly evolving CI/CD tooling for cloud native. The distribution will release monthly, with tested and documented components. It's offered as a free download behind a sign-up form, and a commercial support package from CloudBees can be added. We talked a little about Jenkins X in episode 44.
ADAM GLICK: Finally, the TiDB operator has gone 1.0 and been declared generally available. If you aren't familiar with TiDB, it's an open source distributed NewSQL database with MySQL compatibility. In a blog post from sponsor PingCAP this week, they explained that the operator brings ease of deployment and stability to TiDB running in Kubernetes clusters, and they are currently working well with Google Cloud, AWS, and Ali Cloud.
CRAIG BOX: They've also got one of those fashionable lowercase i's.
And that's the news.
ADAM GLICK: Ian Coldwater is a lead platform security engineer at Heroku who specializes in breaking and hardening Kubernetes, containers, and cloud native infrastructure. Welcome to the show, Ian.
IAN COLDWATER: Thank you. I'm excited to be here.
ADAM GLICK: How did you get into security? What was your background?
IAN COLDWATER: I started hacking things when I was a kid, like 11 or 12. I was kind of a bad kid who liked computers and didn't really like rules. So people told me that there were some, like, interesting things you could do with computers that broke rules, and I was like, oh, that sounds amazing, sign me up, you know? I didn't start working in security until later in life. I took time off to raise my family, and started working in tech in my 30s.
I got into working in security because I was doing DevOps work in tech, and was doing work on containers and securing the container infrastructure at the company that I worked at. And the company that I worked for knew that I had some security background and some security interest. So they were like, we're doing this Kubernetes thing, can you break it? And I was like, well, sure, I can try. You know, challenge accepted.
And then as it turns out, I could break it. And I got pretty good at it, so I started doing a lot of public speaking about how to hack and harden Kubernetes. And I've been doing that since. Now I work as a lead platform security engineer at Heroku, doing kind of penetration testing internally and working on our cloud infrastructure. Before that, in between, I was an independent penetration tester specializing in Kubernetes and container and cloud work.
CRAIG BOX: You've mentioned there working in penetration testing. In some of your talks, you've mentioned also being part of a thing called a red team. What exactly is a red team, and is it the same as being a penetration tester, or are there differences?
IAN COLDWATER: That's a topic of ongoing debate in the security community. Often, a red team is considered to be something internal to a company, one that works closely with the blue team, or their security team, to do internal operations from a simulated adversary standpoint, acting as attackers so that the security team can stop them. And the idea is for those to be symbiotic, and to make each other stronger. So red team works for blue and strengthens blue. That's its purpose.
Penetration testing is more general. It is trying to get in, see what avenues an attacker might have to be able to get into that system, and then give recommendations about how people can work to prevent actual attackers from getting in in the future. Red teaming tends to be a little bit more specific to internal teams. Penetration testing can be done by consultants or pretty much anybody else. That's my definition. If you ask 20 different security people, you might get 21 different answers.
CRAIG BOX: We're always hearing about vulnerabilities in software that are discovered by teams. Project Zero from Google are an example of that, people who are actually looking for vulnerabilities in other people's software. When it comes to network and DevOps testing, penetration testing, you're trying to test a service or something that's running on the network. Are you looking for vulnerabilities in the configuration of that particular software, or, knowing what the software is, are you using things like fuzzing techniques to try to break into that software itself? Are both of these angles appropriate ways to try and break into a system?
IAN COLDWATER: Well, when we're talking about Kubernetes specifically, there's two attack surfaces in Kubernetes. There's the Kubernetes as a binary, like as an application, and then there's Kubernetes as an API. So those two have two kinds of different attack surfaces and different ways to go about attacking it. And as a pen tester, you know, you do both. You try to get in in whatever way makes the most sense.
And so you might be attacking the API server using API calls or, like, known attack vectors for that. You might be looking for common misconfigurations, like not putting a proper security policy on or leaving things open that might be opened by default, which happens a lot. You might be looking for software with known security issues. Helm Tiller run without authentication is one of the first things that penetration testers will look for when they're looking for a Kubernetes cluster because it's a pretty easy one.
And also, it's important to just note that containers are only as secure as their run times and their orchestration frameworks and their kernels and their operating systems and everything else, because everything is shared in terms of resources. And so if you have a container that's vulnerable that has a kernel exploit or something else on it, that can be useful for compromising in a cluster context, as well.
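As a concrete example of the kind of "proper security policy" Ian mentions, here is what a restrictive PodSecurityPolicy (the admission mechanism current at the time of this episode) might look like. This is an illustrative sketch, not a recommendation for any particular cluster; the policy name and exact rule choices are placeholders:

```yaml
# Illustrative PodSecurityPolicy sketch; names and rules are placeholders.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example
spec:
  privileged: false                # no privileged containers
  allowPrivilegeEscalation: false
  hostNetwork: false               # don't share the node's network namespace
  hostPID: false
  hostIPC: false
  runAsUser:
    rule: MustRunAsNonRoot         # containers must not run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                         # restrict volume types; note hostPath is omitted
    - configMap
    - secret
    - emptyDir
    - projected
    - downwardAPI
    - persistentVolumeClaim
```

A policy like this only takes effect if the PodSecurityPolicy admission controller is enabled and pods' service accounts are granted `use` on it via RBAC, which is exactly the kind of multi-part configuration that is easy to get wrong.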
CRAIG BOX: Do you find that a penetration tester will normally know that much about the environment? A lot of the things you're mentioning here are saying, hey, Kubernetes, I know it has this, I know it has that. If a third party is hired to do some security research into a system, do they normally get given the information, or do they have to infer, for example, whether it's running Kubernetes?
IAN COLDWATER: There's a couple of different kinds of penetration testing. And the term that gets used is you could be black box testing, white box testing, or kind of gray box testing. In a black box test, you have no idea what's going on in there. In a white box test, they tell you, in some cases in great detail, and might even give testers different kinds of access. It depends what is being tested and what the goal of the company is. Gray boxing is kind of somewhere in between.
I would say the biggest issue for third-party pen testers in a Kubernetes context is actually mostly just that Kubernetes is an emerging thing in the security community that a lot of people are still working on getting greater understandings of it. And so a lot of the time, I think for companies it's more of an issue of, like, hiring somebody who knows what they're doing than it is about, like, whether or not that company tells that tester that they're running Kubernetes or not.
ADAM GLICK: One of the terms that I often hear people talk about in this context is DevSecOps. Can you explain what DevSecOps is?
IAN COLDWATER: DevSecOps is kind of derived from DevOps. And DevOps is more of a culture than a job role. It's the idea of changing the software development lifecycle so that development and operations, instead of being at odds with each other, are working together to improve the efficiency of the software development lifecycle as a whole. DevSecOps is, as opposed to having development and operations working well together and then having security kind of off on the side, it's inserting security into your SDLC.
And a lot of the time, people will talk about DevOps or DevSecOps in terms of tools. So, you know, like oh, well there's automation tools and continuous integration, and that's what DevOps means. And it's more than that. Sometimes when people talk about DevSecOps, they mean putting security tools in an automated fashion into the automated things that you might have set up as part of your SDLC. But I would say, in general, for me, it's more of an idea of having security be integrated into that development lifecycle, and having all of those groups work together in a way that achieves the same goal.
ADAM GLICK: You mentioned having a separate security team. And I've seen organizations kind of approach this in two different ways, of either having a separate security team and a separate group that's focused on defining the security pieces, going and building those, and making it that team's job to harden, or to make that actually part of the engineer's job in what they build, and the pluses and minuses to each. Do you think of DevSecOps as something that's a separate team doing that, or is that something that is integrated into what everyone in the lifecycle are working on?
IAN COLDWATER: I don't think that DevSecOps is a separate team, necessarily, in the same way that I don't think that DevOps actually ought to be a separate team. I think that DevSecOps, like DevOps, means that the engineers need to be thinking about development, operations, and security as part of their work, and creating their work with those in mind from the ground up.
So instead of throwing it over the wall to operations and being like, OK, our sprint's done, it's Friday night, have fun, everybody, then the idea is that people take ownership over the code that they build and are working together with each other and with the higher goal in mind. And I think DevSecOps is that, with security in there, as well.
I do think that engineers should be keeping security in mind as they work, as they code. It is also true that security can be a lot in and of itself, and that engineers already have a lot going on. And so one thing that we used to talk about as a DevOps team is building guardrails, not gates, and having it be so that defaults and the ways that processes are made makes it as easy as possible to do things securely, so that every engineer doesn't have to know every bit about security, but it makes it as easy as possible for them to include security in their work as they go.
ADAM GLICK: I love the idea of the easiest path is the one that's the right path. How do people share knowledge and get trained up? Because as you mentioned, especially as an engineer working on things, there's lots of different asks on kind of what knowledge you have, whether it be performance, whether it be reliability, whether it be the actual business logic that you're writing. How do you help people also understand the security pieces, and not just make it something that's overwhelming?
IAN COLDWATER: Oh, it's hard. I don't know that, you know, most people have managed to crack that code yet. I think helping engineers understand that security is something that is relevant to their work, in fact integral to their work, and not something separate or a blocker or something that will slow them down I think is really important. In terms of understanding the hows as well as the whys, OWASP puts out really good information about how to secure web stuff. There is kind of an emerging discipline of people putting out information about how to harden Kubernetes and container things and cloud infrastructure in general.
Like, the information's out there. I think it's important for people who are doing that kind of security information sharing to have empathy for engineers and where they stand, and be able to explain things in ways that engineers are going to be able to understand, which is not to say that engineers are dumb, just that engineers don't necessarily have exactly the same set of priorities as security people do all the time.
And I think it's important, actually, on the flipside, for engineers to be able to explain to security people what they're doing, because sometimes I think cloud stuff and DevOps stuff move so quickly-- by design-- that I think sometimes security people can be left a little bit lagging. So really, more communication is needed on both sides for greater understanding.
CRAIG BOX: Security contributes a lot to risk, and risk is something that businesses care a lot about, especially as you start getting into public corporations. And so you have a lot of executives who pay a lot of attention to this. Who audits the work that these people do? If we take it as read, this is something that all engineers should be playing a part in in terms of securing a system, is it important to have a team who is responsible for auditing and for checking that the risks are being mitigated in a way that is palatable to the people who make the business decisions?
IAN COLDWATER: You know, risk and compliance is not exactly my wheelhouse, but probably, because certainly that is a separate set of skills than knowing how to get your back end feature out by the end of sprint time. When I'm talking about security, often-- and I will just own that this is my own internal bias as a platform security engineer-- I'm usually talking about application security or infrastructure security, and not as much about the risk and compliance kind of auditing checkboxing things, because it's just not where my experience is. But certainly there are people who are very good at that.
CRAIG BOX: Do you find that the risk and compliance teams are the ones who hire the pen testers?
IAN COLDWATER: No. In my experience, that has not been the case. But it probably varies by company. In my experience-- and obviously, I have not worked at every company on Earth-- the auditing kind of risk and compliance team would be a separate team than the people doing the application penetration testing stuff.
ADAM GLICK: So when you're taking a look at the risk of what you're doing, how do you assess what the impact of a security flaw is - how do you define the risk?
IAN COLDWATER: There's math to this. It's a metric that's published, and it has to do with-- you calculate a few things. You calculate exploitability, how easy it is to exploit. You calculate how much access you would need in order to be able to exploit it. Most critical would be, like, you don't need to have any kind of local user account or authentication at all. Can it be exploited remotely, or do you have to have direct access to it in order to exploit it? And in terms of the risk to the business, how high the risk is technologically, like is it going to knock down the infrastructure of that business, and also the risk to reputation and kind of goodwill, the way that people look at the business.
So there's a few different metrics. And the CVSS score I think is done according to that metric. And as penetration testers, when we are deciding whether something is critical, high, medium, or low, that's the math that we do, too, to be like, how easy is this to exploit, how likely is it to be exploited, and how much of a risk is this to the business, both technologically and in terms of reputation.
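The scoring Ian describes is codified in CVSS. As a rough sketch of the math, assuming the CVSS v3.1 base-score formula and its published metric weights, and covering only the scope-unchanged case, the calculation looks something like this:

```python
import math

# CVSS v3.1 metric weights (scope unchanged), per the published specification.
ATTACK_VECTOR = {"network": 0.85, "adjacent": 0.62, "local": 0.55, "physical": 0.2}
ATTACK_COMPLEXITY = {"low": 0.77, "high": 0.44}
PRIVILEGES_REQUIRED = {"none": 0.85, "low": 0.62, "high": 0.27}
USER_INTERACTION = {"none": 0.85, "required": 0.62}
IMPACT = {"high": 0.56, "low": 0.22, "none": 0.0}

def base_score(av, ac, pr, ui, c, i, a):
    """Simplified CVSS v3.1 base score (scope unchanged only)."""
    # Impact sub-score: how badly confidentiality/integrity/availability suffer.
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    impact = 6.42 * iss
    # Exploitability sub-score: how easy the flaw is to reach and trigger.
    exploitability = (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                      * PRIVILEGES_REQUIRED[pr] * USER_INTERACTION[ui])
    if impact <= 0:
        return 0.0
    # Round up to one decimal place, per the spec's "roundup" function.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10

# A remotely exploitable, no-auth, no-interaction flaw with total impact:
print(base_score("network", "low", "none", "none", "high", "high", "high"))  # 9.8
```

The 9.8 here is the familiar "critical" score attached to many remote code execution CVEs: network attack vector, low complexity, no privileges or user interaction required, and high impact across the board.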
CRAIG BOX: Kubernetes makes it easy to run distributed systems, and that necessitates opening communication protocols between machines that weren't previously there. We used to have to rely on humans to walk up to multiple machines, configure them to do different things. And it's better now we have a central place and we can apply things consistently, but we have to open connections between machines in order to make that possible. Is it fair to say that the more connections there are between systems, the bigger the security area that we need to worry about?
IAN COLDWATER: Absolutely, yeah. And just in general, how many moving parts are there and how are they interacting with one another. Kubernetes has a wide multivariate attack surface because there is so much happening there. There's so many moving parts interacting with each other in different ways. There's things that can be exposed. There's any number of things that you can configure, and therefore any number of things that you can misconfigure. As an attacker, it's a candy store. For me, it's a fun surface, because there's just so many different parts that I can look at. And as a defender, I think it can make it really hard because you have to get all of them right, and you have to get all of them right in relation to one another.
CRAIG BOX: In that case, as an attacker, if you're going to look at a Kubernetes system, what's the first thing you'd think about?
IAN COLDWATER: The first thing that I'm going to think about is what is this running. How old a version is this running? Is it on a cloud? If so, which cloud? Does it have any kind of plugins, or what else is in that cluster? Is it using Helm? Is it using some other kind of third-party component that might have some kind of vulnerability or exploitability to it?
The second thing I'm going to look at is how it's configured. So does it have common misconfigurations on it? So does it have RBAC on it at all? And does it have applied security policy on it? Like, what exactly is on there? And so I would probably go through looking for relatively easy wins. And then the first thing you do as a pen tester is enumerate the system. Figure out what's on it. And from there, you can figure out if there's anything vulnerable or exploitable on there.
And from there, you go about trying to exploit it. So that would probably be the order of operations for me. And you just sort of want to know what's happening in there. And then if you can get anything exploited, if you can get a foothold, if you can move anywhere, within the cluster or outside of it, then where can you move to. And then you kind of start that process over again, figure out what's going on in there, and if you can exploit it, if you can move any further.
ADAM GLICK: So if I take a look at some of the components, something like etcd -- what's the risk if someone gets into etcd?
IAN COLDWATER: The risk if somebody can get access to etcd is quite high, because not only as an attacker can I read all of what's in etcd, which can include useful data like, you know, cloud metadata, passwords, information about state, information about what's happening in the cluster, but also I can write to etcd, because the thing about the loosely coupled distributed system is that it actually goes in multiple directions. And so if I can write to etcd and I can just declare that something is part of what is happening in state, I can do whatever I want. Game over. I control the cluster, end of the day.
CRAIG BOX: Does it matter whether or not etcd is encrypted on the disk?
IAN COLDWATER: It does. It matters a lot. And so if you would like to be able to avoid me as an attacker being able to easily do that, you do want to encrypt your data in etcd at rest. And in general, you want to encrypt your data in Kubernetes in transit and at rest.
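Encryption at rest for etcd is configured on the API server with an EncryptionConfiguration file. A minimal sketch follows; the key name is a placeholder and the secret must be replaced with your own base64-encoded 32-byte key:

```yaml
# Passed to kube-apiserver via --encryption-provider-config.
# The secret below is a placeholder; generate a real key with:
#   head -c 32 /dev/urandom | base64
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets                     # encrypt Secret objects at rest
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>
      - identity: {}                # fallback so existing plaintext data stays readable
```

Note that secrets written before this config is applied stay in plaintext until rewritten, so a rotation pass over existing Secret objects is part of the job.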
CRAIG BOX: If I can cause your cluster to run on one of my containers, is that game over?
IAN COLDWATER: It sort of depends on what's in your container and how protected the rest of your cluster is. If you have your admission control set up such that it prevents a rogue container from being able to see or access parts of the system that it's not supposed to, if you have that admission control on and configured well, that's not necessarily game over. There are certain ways to run pods in clusters that can lead to pretty serious privilege escalations and cause pretty serious problems, most of which can be mitigated by using solid admission control and configuring it well. Not all of it, but most of it.
CRAIG BOX: If I do get a pod admitted to a machine, and then it's running on the same host as a different pod, there are protections that I can add, as well, to prevent me being able to read data from perhaps a different tenant of a system?
IAN COLDWATER: Theoretically, yes.
ADAM GLICK: What would some of those be?
IAN COLDWATER: There are protections that you can use for that. You could use AppArmor or seccomp. And you can use different kinds of admission control; node restriction can help with that. And often, you can use different kinds of isolation technologies that will allow an additional layer of protection. Because containers natively are a single process on a shared system, if you can get access to the host, you can get access to everything else on it.
But there are newer technologies coming about, like gVisor and Firecracker, that do add an extra layer of isolation for containers in relation to the other tenants in the system. Figuring out how to attack and exploit those is an emerging discipline of its own. And I do know people also who will run containers in VMs. I don't know that there's a complete consensus on this, but it's a thing that people are working really hard on figuring out.
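At the time of this episode, seccomp and AppArmor profiles were applied to pods through annotations. A sketch of opting a pod into the runtime defaults, with pod, container, and image names as placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
  annotations:
    # Use the container runtime's default seccomp profile (an alpha annotation
    # at the time; later replaced by securityContext.seccompProfile).
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
    # Use the runtime's default AppArmor profile for the container named "app".
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
    - name: app
      image: nginx:1.17
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
```

The runtime default seccomp profile blocks a number of syscalls that container workloads rarely need but kernel exploits often do, which is why it is a cheap first layer against the neighboring-tenant problem Craig describes.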
ADAM GLICK: What about if someone is able to get access to your service account?
IAN COLDWATER: That depends on the version of Kubernetes that you're running. In older versions of Kubernetes, if you can get access to a service account that has access to higher privileged namespaces, it's entirely game over. In newer versions, that's been improved. But one gotcha to watch out for is that, in general, not all newer security features in newer Kubernetes versions will necessarily come into play if you upgrade in place. If you upgrade in place, then it might be that, to avoid breaking changes, it will keep things like that the way they are. So generally speaking, if you're running Kubernetes and you want to secure it, an important and sort of underrated thing is to have an upgrade plan, and figure out how to do that.
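One common mitigation for the service account risk Ian describes is to stop mounting tokens into pods that don't need to talk to the API server at all. A sketch, with object names as placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-token-example
# Pods using this service account won't get a token mounted
# unless they explicitly opt back in.
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: worker-example
spec:
  serviceAccountName: no-token-example
  # Can also be set per pod, overriding the service account's setting.
  automountServiceAccountToken: false
  containers:
    - name: app
      image: busybox:1.31
      command: ["sleep", "3600"]
```

If an attacker then compromises this pod, there is no bearer token sitting at /var/run/secrets/kubernetes.io/serviceaccount to pivot with.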
CRAIG BOX: In the past, you've stated that Kubernetes is insecure by design. What does that mean, and is it still true today?
IAN COLDWATER: I have said that, and I do stand by it. Not because I think that the creators of Kubernetes are not incredibly smart and doing their best work, but because of the way that Kubernetes has been designed: the creators decided that, in order to gain further adoption and maybe ease of use, it should be open by default, and that the people who are running and operating it should have granular control, according to their needs and according to their use case, to be able to make their own cluster configuration the way that they want it to be.
And that makes sense, because people do have very different needs and very different use cases. If we look at any given Kubernetes cluster, they're all going to be unique and going to be different. The thing about that is that that has historically led to people assuming trust and assuming that Kubernetes is going to be secure out of the box, running it out of the box with the defaults that it came with, and then being unpleasantly surprised when perhaps those defaults were not as secure as they wanted it to be.
And so when I say that it's insecure by design, it isn't that I think they did that on purpose. It's that I think that the idea is a good one. It's that people should be able to control their own cluster config. Just in practice, I think that that has not always necessarily gone the way that the creators maybe intended. And nowadays, I do think that it's improved a lot from the beginning.
I started using Kubernetes much earlier back in the ABAC days, and a lot has been done since then to improve the sets of defaults and the kind of security posture of it. Like for a while, RBAC was not a thing. Then RBAC was, you know, you could turn it on. And then all of these things have been improving with time. And I know that the Kubernetes community and the Kubernetes SIG-AUTH and everybody have been working really hard on improving Kubernetes' security posture. It's just a thing that has been coming with time.
And it's a hard thing, right? Because Kubernetes has enjoyed this incredible rate of adoption. It's gotten so big so fast. And part of that is the ease of use and the way that it has made people be able to get up and running-- that pun is absolutely intended-- relatively quickly. And having really intense secure defaults in the beginning might have made that harder, you know? I will own my own bias as a security person in that I know that my focus on that is not everybody's focus on that. And that's OK.
CRAIG BOX: Given that you can have that focus for security and usability, do you feel that the defaults that are chosen today are correct, or would you like to see them move in one direction or the other?
IAN COLDWATER: I'm about to do a whole talk about this with Duffie Cooley at Black Hat USA in Las Vegas. And well, the first thing I should say, because I know that I should, is that there's not one singular set of Kubernetes defaults: kubeadm has one set of defaults, Kubernetes core has another. Every given public cloud is going to have a different set of defaults. There's not one singular group that can be spoken to in relation to that.
In general, personally, I think it's really important to have defaults that are relatively sane and secure out of the box. And that doesn't necessarily have to be done by the Kubernetes project. And in fact, even if it were done by the Kubernetes project, I'm not sure that it would be adopted by all of the different installers and clouds and different people who you might get Kubernetes from.
But I do think that whoever is setting up Kubernetes in the first place, whoever is setting up that managed cloud config, whoever is rolling their own on bare metal, like, somebody in the beginning of that process should be making those defaults as sane and secure as possible, because the engineers who are going to be getting it eventually are not going to want to think about it. They're going to want to get on with their sprint and get on with their day. And making it as easy as possible for them to do their work without having to worry about it and without having to worry about breaking changes if you throw security at them later I think is the happiest path.
ADAM GLICK: People have said the only completely secure system is the one that's unplugged from the wall. So how should people navigate the security spectrum that this forces them to pick a place on? How should they choose to make the tradeoffs?
IAN COLDWATER: You know, honestly it's going to depend. It's going to depend on the threat model that any given operator is going to be running on. And if you aren't threat modeling-- if you take two things out of this podcast, one of them would be it's really important to secure your defaults because the defaults out of the box are not necessarily as secure as you might think or hope they might be, and do that according to your needs and your use case. The other thing is that threat modeling is really important, because if you're not threat modeling and you're making security decisions, you're really throwing things at the wall and hoping for the best.
You know, what are you trying to protect? What are you trying to protect it from? And if you know that, then making strategic decisions about how you're going to go about trying to protect that from the people you're trying to protect it from can follow more naturally from there than just trying to make decisions in a vacuum. And so for some people, maybe it makes more sense for that business, for that org, to have higher levels of usability. If it's an internal system that's not being exposed anywhere, I think a lot of people feel that way.
If there's not anything that needs to be protected, maybe some people will make the decision that it doesn't matter as much. And for some people, if they're working with data that's subject to HIPAA or GDPR, or things that have very high sensitivity and need higher levels of protection, maybe for them higher levels of security, and sacrificing usability for that, is going to be more important. It's really just going to depend on your needs and your use case.
CRAIG BOX: There are a lot of third-party security platforms available to add to a Kubernetes environment. Do you feel that people who are operating Kubernetes should look to adopt one of these as a standard practice, or only if they're doing something really sensitive?
IAN COLDWATER: This is the part I guess where I say that this is my opinion and not the opinion of my employer, because I am not endorsing a product on behalf of Heroku or anybody else. I think that there are some really good Kubernetes security vendors out there doing really interesting work. And I think that, again, it's going to depend on your threat model and your needs and your use case, but I do think that some of those can offer people things that might be harder to do on their own and that might be useful for them, especially if they are trying to protect sensitive data. And again, people's business decisions about cost of that versus cost of paying their own engineers to do that versus how much they care is up to every individual business.
ADAM GLICK: What is your favorite Kubernetes exploit that you've seen?
IAN COLDWATER: There are so many!
ADAM GLICK: Any of them that stand out as being particularly creative, artful?
CRAIG BOX: Is it bad news for the project that there's many that you have to think, ooh, which of them will I pick?
IAN COLDWATER: Well, you know, I don't expect everybody to be as much of an exploit nerd as I am. Like, there's no reason for most people to be. So it isn't that there are so many exploits on Kubernetes that we should all be running around with our hair on fire so much as just, like, I pay attention to this stuff and I can geek out about it a lot.
I personally am really excited about hostPath, which is not an exploit exactly, but there's so many interesting things that can be done with hostPath, which is a feature, not a vulnerability, that I don't think people talk about very often. Duffie Cooley and I, we're having so much fun with it that we named our Black Hat talk The Path Less Traveled, because there's just so many possibilities of things that you can do with options like hostPath and hostPID that involve being able to get access to the host.
And you can mount the Docker socket with it, or your other container runtime socket. And that doesn't actually require privilege, necessarily, if you're a part of the Docker group, because of exposed Docker sockets. You can run other kinds of things with it that will also allow you write access to the host. So you could just add your SSH key to authorized_keys for root. You can do a lot of fun stuff with it, and I've been having a ton of fun with it.
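As a rough illustration of the pattern Ian is describing (the pod name and image here are hypothetical), a pod spec that mounts the node's Docker socket via hostPath might look like this. The fields are all standard pod-spec fields; no `privileged: true` flag is required.

```yaml
# Hypothetical example: a pod that mounts the node's Docker socket.
# Anyone who can schedule this pod can talk to the container runtime
# on that node, which is effectively root on the host.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - name: shell
    image: alpine:3.10
    command: ["sleep", "3600"]
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
      type: Socket
```

From inside that container, a Docker client pointed at the mounted socket can start new containers on the host, including ones that bind-mount the host's root filesystem.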
CRAIG BOX: So that sounds like something you could shoot yourself in the foot with. Why is that even in there?
IAN COLDWATER: I think there are legitimate use cases for that, and it's not necessarily something that everybody is going to be using in their cluster unless they need to. I would argue strongly that maybe they shouldn't unless they need to. But it is something that is there by default for those use cases. But people really should know about how powerful and how dangerous it can be because it's wild. And I say that as a security professional who looks at these things all day. You know, when I first heard about it, my jaw hit the floor. It was just like, really? You're kidding.
The documentation in Kubernetes, last I looked, described it as a powerful escape hatch, in quotes. And they're not kidding. It is an extremely powerful escape hatch. You can escape a lot with it. Trail of Bits just put out a great blog post about it.
CRAIG BOX: One of the things that people use Kubernetes for is to get more out of fewer machines. And one of the ways they do that is they pack together lots of people's workloads onto machines, where previously, in the past, they might have all had independent machines. There's no such thing as a container in Linux. There are all these constructs that we use to try and emulate a thing. And similarly, there's no such thing as a tenant in Kubernetes. These things are models that we've sort of invented to try and describe hard multi-tenancy in a way that isn't natural in the system. What would you suggest the Kubernetes project did differently to solve some of those problems, if it were to start again from scratch?
IAN COLDWATER: I think that, if we are thinking about the ways that containers treat the Linux file system and the Linux kernel and the capabilities thereof, and the way that Linux treats everything in a file system as a file, and how that might affect multi-tenancy considerations in containers, it goes before the Kubernetes project. I think it goes to people who were first beginning to work on what became container technology, right?
You know, I think if you go far enough back, some of those decisions, I think had people realized what the implications of multi-tenancy might be much farther down the road, I think that the ways that we think about namespaces and isolation in relation to those, and the ways that we think about cgroups and the ways that containers get controlled, especially in relation to one another, might have been a little different. I was not there for those decisions, and I have learned in my time as an engineer that, often, decisions that seem completely inexplicable much later were often pretty reasonable under the circumstances and under the constraints that the engineers at that point were under.
CRAIG BOX: The people who made the internet were all university people sharing with other university people. And they couldn't believe that the internet would one day be full of teenage boys.
IAN COLDWATER: Exactly. And I think that I'm not sure that a lot of people in, like, early container land, or even necessarily who were making early Kubernetes decisions, were thinking that it would be used in production at the scale it eventually reached. And I can backseat drive now. But I think that probably people were being pretty reasonable with the dice that they had to roll at the time, I assume. I could make pronouncements as to what I find less than reasonable. But I wasn't there.
CRAIG BOX: If you could wave a magic wand over the project now and add one feature to it or change one thing, what would that be?
IAN COLDWATER: I think that making it easier to have security defaults that you can opt out of, rather than having to opt into, could be a useful thing. And it's a hard thing, because what constitutes sane defaults for any given use case might be different. But having a kind of more well-documented, more put-out-there set of "here's what we think might be a good idea"-- I think that there are those of us who would try to do that.
And certainly the CIS benchmarks try very hard to do that. But it's hard because there are so many different clouds and configs and different things that can make that different in different cases. But I think, yeah, checkbox opt out rather than opt in by default could be useful.
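To make the opt-out idea concrete, here is a sketch (the pod name and image are hypothetical, but the fields are standard pod-spec fields) of the kind of restrictive baseline a platform could apply to workloads by default, letting teams explicitly relax individual settings where they have a genuine need:

```yaml
# Example of restrictive defaults a platform might apply cluster-wide.
# Each setting denies a capability that most workloads never need;
# teams would opt out per-field rather than opt in.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo
spec:
  automountServiceAccountToken: false   # no API credentials unless requested
  containers:
  - name: app
    image: alpine:3.10
    command: ["sleep", "3600"]
    securityContext:
      runAsNonRoot: true                # refuse to start as UID 0
      allowPrivilegeEscalation: false   # block setuid-style escalation
      readOnlyRootFilesystem: true      # no writes to the container image
      capabilities:
        drop: ["ALL"]                   # shed every Linux capability
```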
ADAM GLICK: Is that one of those things that's just an indication of where something is in its software maturity lifecycle? Software is almost always written by developers, whose job it is to get something done, and the security things are always the things that stand in the way that you have to undo. And so for things that have those, there's always the guide that says, here's how to turn them all off.
IAN COLDWATER: [LAUGHS]
ADAM GLICK: Case in point, any time you're a developer, and you've written something that's working over the network, and it's not working, what's the first thing you do? Turn off all the firewall rules, like just open up the firewall to see if it's all working. And then you'll go set it back later. So if I'm building something as a new developer, don't I want to, hey, make it open. Let's make this thing work. And then we'll worry about locking it down. Is that just like a stage of maturity that people go through?
IAN COLDWATER: You're absolutely right. And I think Kubernetes skipped all of that rigamarole by just turning it off in the first place.
ADAM GLICK: [LAUGHS]
IAN COLDWATER: But as security people, this drives us crazy, right? Because it's like people will just put that up there in their how-to-get-it-up-and-running tutorial. And then they'll just leave it. And then it will just be on show for every script kiddie and their mom to discover. But I know why this happens. And it's a thing. [LAUGHS]
ADAM GLICK: Speaking of which, there's a number of different kinds of hacks that are out there, from hardware hacks to software hacks to social engineering hacks. Which do you think are the hardest to protect from, and which are the most interesting?
IAN COLDWATER: Oh, goodness. Software is probably the easiest to protect from of the three of those, because static analysis or dynamic analysis can often catch software bugs. People are hard, right? The hardest problem in computer science is people. We can try to do awareness about social engineering and try to steer people away from that. But people are very prone to it. And it's a huge vulnerability that is actually pretty hard to protect against.
Developers actually have worse rates of click-throughs than marketing and sales do by the numbers. People have done studies on this. And it's because developers think that they're good at this stuff, and so they don't worry about it. The hubris of like, oh, no, I don't have to stress about this, I got that, actually is the thing that can lead to people's downfall.
And further training about social engineering awareness does not actually help with that necessarily. It's just, like, oh, I had the training. I'm immune. I got this. Because people who are good at social engineering, and I say this as somebody who's done that kind of work before as part of pen-testing engagements, they're not going to send you a misspelled thing about princes in Nigeria. They're going to find out about your interests and your life story and everything else. And they're going to give you something that's well targeted to you. And that's a lot harder not to fall for.
In terms of what's the most interesting, I think hardware stuff is really interesting. I think the work that's being done right now around CPU vulnerabilities and speculative execution and the side channels, the attacks that can come from those, to me, is really fascinating and really important. And I don't know that we've entirely solved that problem yet-- for one thing, because we haven't, and for another thing, because if we're bringing it back to a cloud native context, the cloud is somebody else's computer. And I don't know that we have entirely figured out even the threat modeling, necessarily, around what happens when somebody else's computer gets exploited.
I don't think we've done it yet. And I think there's so much there to be discovered yet from an attacker standpoint, from a research standpoint, because there's whole parts of microarchitectures that nobody's touched yet, or even done research around. We haven't seen the last of those vulnerabilities come down the pike yet. And for us, as people who are working on securing the cloud and these kinds of infrastructures, I really think we need to be thinking on them.
And that doesn't necessarily look like the kinds of threat modeling that we're used to or the kinds of securing that we're used to because it involves things working as designed. It's not a vulnerability that can be patched. And being able to properly secure against it might look like rethinking the ways that we deal with computers.
Moore's law got us to this place. How are we going to be able to interact with computers themselves and secure against these kinds of vulns when everything is built on top of it? I don't know that I have the answer. I know that people are working on answers.
To me, I think the most fascinating thing is: how are we going to do that? This is, I think, maybe the biggest, most existential set of problems that we have as people who are trying to secure the internet right now, and one that we certainly have not all figured out yet. And I don't know that we've even figured out the full implications or what that looks like yet. But I'm excited to try.
CRAIG BOX: A number of people I mentioned your name to before the show were very excited to hear that you'd be joining us, but would also like to know who else you admire in the Kubernetes security community?
IAN COLDWATER: There are such great people doing Kubernetes security work right now. I'm so excited to be able to work with them and talk with them. They're so brilliant. I have been really privileged to work with Duffie Cooley in doing this talk with him at Black Hat. I've learned a ton from him, and that's been awesome.
Brad Geesaman, who did the sort of seminal KubeCon talk, "Hacking and Hardening Kubernetes," does amazing work and is still coming up with amazing things, not all of which I can share. But he blows my mind on the regular. Rory McCune does really awesome work too. He's doing a training at Black Hat about Kubernetes security stuff. And Liz Rice is out there putting out all kinds of great information and doing demos that work a lot better than mine do. And, yeah, there's a lot of really great folks, so those are a few I'd like to shout out.
ADAM GLICK: Ian, it was an absolute pleasure having you on the show. Thanks for coming on.
IAN COLDWATER: Thank you so much. It's been great to be here.
ADAM GLICK: You can find Ian on Twitter @IanColdwater.
ADAM GLICK: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod or reach us by email at firstname.lastname@example.org.
CRAIG BOX: You can also check out our website at kubernetespodcast.com, where you will find transcripts and show notes. Until next time, take care.
ADAM GLICK: Catch you next week.