Kubernetes Podcast from Google: Episode 140 - Security and Snyk, with Kamil Potrec

#140 March 3, 2021

Security and Snyk, with Kamil Potrec

Hosts: Craig Box, Andrew Phillips

Kamil Potrec is a Senior Security Engineer at Snyk, working on security around Kubernetes and cloud platforms. He joins the show to discuss how to think about securing your infrastructure, the different arts (and colors) of offensive and defensive security, and what not to lose sleep over.

Links from the interview

Offensive unit in American Football
Hand-egg
Red and blue teams
Unreal Tournament
Capture the flag
Kubernetes secrets
- Design document
- Encrypting secrets at the application layer
Antivirus software
Tracer-tee
SolarWinds attack
Reflections on Trusting Trust by Ken Thompson
left-pad deleted from NPM
Snyk Open Source
- The open source parts
Snyk vulnerability database
MITRE CVE database
Kubernetes security at Snyk
Deploy only trusted containers to GKE
Application threat modeling
Kubernetes security best practices, including security context, AppArmor, gVisor etc
CVE-2020-8554: man-in-the-middle attack using ExternalIP services
CVE-2020-14386: packet socket vulnerability with user namespaces enabled
- Earlier related work: CVE-2017-7308 and CVE-2016-8655
- Project Zero writeup
Rewrite it in Rust!
Kamil Potrec on LinkedIn

Transcript

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box with my very special guest host, Andrew Phillips.

[MUSIC PLAYING]

CRAIG BOX: Welcome back to the show, Andrew.

ANDREW PHILLIPS: Thank you, Craig.

CRAIG BOX: You were last with us on episode 23. And that was a very long time ago. I want to say October 2018. I'm sure a lot has probably changed since then.

ANDREW PHILLIPS: I was about to say, I thought there was no such thing as pre-COVID time. I have a hard time remembering that indeed, there was a life before we were all working from home.

CRAIG BOX: Yes. Now, you grew up in London. And, funny twist of fate, your mum actually would live around the corner from me were she able to make it back into the country.

ANDREW PHILLIPS: Yes, indeed. Yeah, this is very COVID related, in fact. Her mail is piling up massively in London, close to where you are, Craig. But when I'm home, I usually meet up for a quick chat.

CRAIG BOX: Now, you mentioned the big pile of mail that you may have to deal with when she gets back to the UK. I had a big pile of mail on my doorstep once when I returned from a trip away. And I thought, that's quite funny. I'll take a picture of that. And so, the little outside area, it's the front door before my front door, if you will. Look at me, how posh I am. I have two front doors.

But there's a picture of my bike parked in there, and the mail, and so on. And this is relevant because about a month or so ago, my bike was stolen. And I sent in a report to the police, and they sent something back very, very quickly afterwards saying, oh, you'll never see it again. Bikes just never get returned. Sorry.

And that was reasonably to be expected. But then, they called later on and said, hey, we may have your bike here. Do you have a picture of it? And I had to think, well, I don't think I do. It wasn't really a great bike or anything. It was just to get around to the supermarket. And I went to Google Photos and typed in bike, as you do.

And the best photo of my bike I had was this picture of all my mail piled up inside my front door with the bike facing it. It was a bit of rear-on view. It wasn't really enough to identify it or anything. The bike never came back. But the pile of mail was returned to me in an interesting sense.

ANDREW PHILLIPS: Yes, I remember. Certainly, bike theft, when I lived in Amsterdam, was a very recurring theme. And I never even got as far as you did. I never had the police call me to say that they had found something. I suspect it ended up in multiple canals. Or my multiple bikes ended up in various canals.

And that was always the yearly fun in Amsterdam when they would dredge the canals and extract various bikes from them. And it's never quite clear, of course, how they got in there. Whether it was theft, or maybe alcohol-related incidents, or something like that. But that was always a fun yearly event.

CRAIG BOX: On the topic of alcohol-related events and dredging, I'll leave you with a story from my university days. Just before I went to university, but where I went to school, there was a pub around the corner next to a supermarket. And the story would go that a couple of people would go down to the pub. And then they'd flip a coin as to who had to push the other one home in a shopping trolley.

And they'd get back to the halls of residence at the university. And they'd realize, wow, we have a shopping trolley. What are we going to do with that? And of course, they'd throw it in the lake. So every year, they'd dredge the university lake and find a dozen shopping trolleys at the bottom of it.

But unfortunately, the supermarket closed by the time I got there. Then, of course, they shut the pub down as well. And it's all probably now flats.

ANDREW PHILLIPS: Oh, the way things change.

CRAIG BOX: May we live in interesting times. Should we get to the news?

ANDREW PHILLIPS: Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: Red Hat has released version 4.7 of OpenShift, catching up to Kubernetes 1.20 and bringing iterative improvement in areas like bare metal installation and running Windows containers on vSphere. Kubernetes features like horizontal pod autoscaling on memory and a pod descheduler are now GA. The release also includes a technology preview of OpenShift GitOps, packaging up Argo CD and Tekton pipelines.

ANDREW PHILLIPS: DevSecOps monitoring and governance platform Fairwinds Insights has been updated to 3.0 this week. New features include an improved UI, Prometheus-powered usage reports, and automation rules.

CRAIG BOX: D2iQ, now all in on Kubernetes, has released version 1.0 of Kaptain, with a K, their machine learning platform. Kaptain builds on top of Konvoy, also with a K-- their Kubernetes distribution-- and adds Kubeflow, opinions, and enterprise support. Kaptain was previously known as KUDO for Kubeflow.

ANDREW PHILLIPS: A zero-day vulnerability has been found and fixed in Envoy, where JWT validation can be bypassed. The bug was introduced in January in Envoy 1.17. Older versions are not affected. A new Istio version has also been published with a fix.

CRAIG BOX: If you heard episode 91 with Leonardo Di Donato, you will recall that the Falco tool started out based on the foundation of Sysdig, including its kernel module. This week, Loris Degioanni from Sysdig has announced that the underpinnings of Falco, including the kernel module and supporting libraries, are now moving to the falcosecurity organization on GitHub and are being contributed to the CNCF. Degioanni hopes the tools will become the basis of other runtime security products. And Falco continues to work towards CNCF graduated status.

ANDREW PHILLIPS: Funding news this week, London's StorageOS announced a US$10 million series B, taking their total funding to 20 million. This round was conducted remotely and includes backers that the company have not met, which CEO Alex Chircop suggests validates their technology and that you may not have to travel to Sand Hill Road to get finance in the future. Also, Platform9 took a series D round, adding $12.5 million to the company for a total of 37.5 million.

CRAIG BOX: Finally, the gas leak year that was 2020 put a stop to the Kubernetes Community Days. But you just can't keep the community down. The events are relaunching online in '21, which allows more accessibility. Two community days have been announced for Africa and for Bengaluru in India. The Africa event will be on April 29 and the Indian event on June 26. CFPs are now open for both.

ANDREW PHILLIPS: And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Kamil Potrec is a senior security engineer at Snyk working on security around Kubernetes and cloud platforms. Welcome to the show, Kamil.

KAMIL POTREC: Hi there, happy to be here.

CRAIG BOX: Your bio says you've spent most of your professional career in offensive security. It also says when you were at university, you were the offensive captain of an American football team. Can I infer anything from that?

KAMIL POTREC: Nothing related to each other. I studied security, but I also kind of played American football at Uni.

CRAIG BOX: Now, that was in the UK. That's not a very common sport, perhaps. How did you get into that?

KAMIL POTREC: When I joined university, I was looking for some cool sport to play. And of course, I came across the rugby. But then next pitch, they were actually playing and having tryouts for American football. And I looked at it, and I was like, hmm, that sounds interesting. I tried out and fell in love with it. So that's how it got there.

CRAIG BOX: The thing I think about American football is that it's not a ball. It's kind of egg-shaped, and you don't kick it with your feet. So the game should really be called "hand-egg".

KAMIL POTREC: That sounds like a good name. [LAUGHS]

CRAIG BOX: Now, one thing that's different about American football to most other football codes is that there are different people who play the offensive and the defensive roles. Do you think about security the same way?

KAMIL POTREC: We clearly have both sides. We have the red teams, blue teams, which kind of play the different teams. Whether they should be this way, we could debate. But I've been on both sides in my career.

CRAIG BOX: Can one person be Super Bowl quality at both?

KAMIL POTREC: It's a tough challenge. If you want to-- security is so broad. There are so many areas you can specialize in. It would probably take you quite a bit of time to get to the level of being Super Bowl level at both.

CRAIG BOX: So we know how you got into American football. How did you get into security?

KAMIL POTREC: I was always into IT. But in college, I had a good professor who was a security engineer. He told us a couple of amazing stories about how he set up multi-layered security and how he defended against hackers. And I think that's where it started. It's like, oh. I said, this is an interesting, challenging thing, and I want to get into that.

CRAIG BOX: And then the choice, obviously, about red team versus blue team, offensive and defensive. What made one more appealing to you than the other?

KAMIL POTREC: So I started, actually, on the blue team in a bank. But after a year and taking part in some CTFs and challenges, I found it super exciting to break things, go into the debugger and see how software can break. And that has kind of swung my career towards breaking technology first.

CRAIG BOX: Now, I played a lot of Unreal Tournament back in the day. So we're going to need to set some things here for the rest of the audience, perhaps.

KAMIL POTREC: [LAUGHS]

CRAIG BOX: Red and blue team are not necessarily the colors of the flags. But they are the attacking and the defending aspect of security respectively. Is that correct?

KAMIL POTREC: That is correct, indeed.

CRAIG BOX: And then so CTF, again, not so much sitting on top of a tower sniping at people while they're trying to retrieve the flag. What is a CTF in the security sense?

KAMIL POTREC: It's a hacker team. So if you're in software development, it's basically a set of challenges where you are presented with a security challenge. It could be a network that you can break into. It could be a VM that you can find vulnerability in and exploit it. Or it could be a piece of binary code which you have to reverse-engineer, and find a flaw, and then exploit it.

CRAIG BOX: And so the people who set these challenges up leave some kind of flag there for you to prove that you've succeeded?

KAMIL POTREC: That's where the flag comes in. You have kind of a known target, a known-format string of a file name that you need to find and retrieve it in order to prove that you've completed the challenge successfully.

CRAIG BOX: Now, thinking about security in the context of Kubernetes and Cloud Native, does Kubernetes make it easier or harder to secure an infrastructure?

KAMIL POTREC: [CHUCKLES] I don't think there is an easy answer to this. I do believe if you use the tool properly, it makes certain aspects of securing the infrastructure easier. But at the same time, it is a complex system, which presents its own challenges. And then it also raises the bar in order for you to secure it properly.

CRAIG BOX: Does consistency matter in the sense that if everyone's running slightly different applications, then there's some obscurity in the system? And we know that's not necessarily a good thing for security. But do you have things that go away when there's a consistent API and everyone knows that they can just attack it using the same tools?

KAMIL POTREC: When you know that every application will retrieve the secrets from a single place, then you can just go after that as your main target and not worry about how that specific application handles secrets, for example. And that's where Kubernetes could come in place.

CRAIG BOX: Now, let's think about secrets in Kubernetes. Because right from the beginning, they weren't actually really that secret.

KAMIL POTREC: No, they were not. I find the name confusing. It should be, probably, some sort of information discovery thing, not the secret-handling solution. But there are challenges around secrets in Kubernetes.

CRAIG BOX: For those who aren't familiar, a secret in Kubernetes is effectively just an object which is encoded in a very trivial way so that it doesn't accidentally get typed out in the log file. So there was never any encryption that was guaranteed. But a lot of vendors have now connected it up to more secure systems. So first of all, do you think that the Kubernetes out-of-the-box system should be more secure than it is?
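
To illustrate the point about the trivial encoding: values in a Secret's data field are base64-encoded, which anyone can reverse without a key. The following is a minimal sketch, assuming a hypothetical Secret named db-credentials with a made-up password value; it does not talk to a real cluster.

```python
import base64

# Hypothetical Secret manifest, shaped like what the API returns.
# The value is only base64-encoded, not encrypted.
secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},  # hypothetical name
    "data": {"password": base64.b64encode(b"hunter2").decode("ascii")},
}


def decode_secret(manifest):
    """Return the plaintext of every value in the Secret's data field."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in manifest["data"].items()
    }


if __name__ == "__main__":
    print(secret_manifest["data"]["password"])  # the encoded form
    print(decode_secret(secret_manifest))       # recoverable without any key
```

The encoding exists so that binary values fit in the manifest, which is why read access to Secret objects should be treated as access to the plaintext.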

KAMIL POTREC: I'm in the middle on this topic. I think I like that it gives the ability to modify it in a way that is secure.

CRAIG BOX: Mm-hmm.

KAMIL POTREC: But I also see it as a quite sophisticated system which you should have a certain amount of engineering skills to use in the first place. Maybe there are much simpler solutions out there for you if you don't want to dive directly into Kubernetes first-- for example, Docker Compose files.

CRAIG BOX: Now, for someone who has a cloud provider with an option to connect to a key management system or something else that improves security, should they enable that without thinking?

KAMIL POTREC: No. Yet again, it's just more complexity in your system. You definitely should use it if it's available to you. But read up on how it should be used securely, where the logs go, whether the settings in that KMS solution do provide you with additional security.

CRAIG BOX: What would you recommend to someone who's rolling their own Kubernetes, aside from, don't do it?

KAMIL POTREC: [LAUGHS] Look into solutions such as HashiCorp Vault, that can provide you an additional layer as a service in your Kubernetes to secure your secrets.

CRAIG BOX: Now, you said before Kubernetes is obviously a very complex system. Is it possible to make such a complex system secure by default? Is that something, again, that should have been built in from the beginning? Or is it something that, in retrofitting and trying to tack that on over time, we're going to have trouble making it secure?

KAMIL POTREC: I think this is where the distributions come into play. I don't think you can build such a complex system suitable for all use cases. But then you can have vendors who take that and modify it to have special configuration sets that bring it to the level needed for a specific use case.

CRAIG BOX: If we think about malware on desktop PCs, back in the day, we'd have a virus scanner, which would check all the files on your disk when you ran it. Maybe we would move to a program which would check when you physically inserted a disk, in case there was a boot sector virus, for example. We'd check only in those cases. And over time, it moved to being able to scan every single access of every file.

This kind of thing had a big impact on performance. And so a lot of people would turn that off. Now, well, the Mac users would say, oh, our platform is safe. We don't have to worry about any of that. You don't need to run antivirus.

KAMIL POTREC: [LAUGHS]

CRAIG BOX: If we bring that to a Kubernetes case, we can apply security in many different parts of the life cycle of code and containers. What are the Kubernetes equivalents to those desktop approaches?

KAMIL POTREC: In Kubernetes, you have a problem of non-persistent storage. So in the first place, when your pods run, they will dispose of whatever files they have every time they spin up again, right? So whereas in the desktop world and in the server ecosystem we had to live with servers that had uptimes of days, if not months, with pods you can kind of get rid of this kind of persistent malware when your pod restarts every minute or so.

Yet there are so many problems with scanning files. I have dealt with antiviruses all my life. And I've seen organizations trying to justify not using them in the first place because the overhead of running them was detrimental to actual benefits of using it in the first place.

CRAIG BOX: How do we make sure that doesn't become true for Kubernetes solutions?

KAMIL POTREC: Tough question. I've yet to see a good antivirus vendor for Kubernetes in the first place. But what we need to think about is how malware works these days. It's no longer something that sits on a disk. We have variants that sit purely in memory, something that desktop users of the time didn't have to deal with that much. There are a lot of things in this space. I don't have a good answer at this time. [CHUCKLES]

CRAIG BOX: Well, let's talk a little bit about what malware does and how it spreads. Because in the distant past where we talk about the desktop, the goal was, perhaps, to spread something around that would infect as many computers as possible. Whereas these days, malware is quite often targeting individual users on the internet. Or it is trying to not necessarily replicate, but it will, for example, try and lock all your files in such a way that you have to give them some money in order to get them unlocked.

When we think about running containers in these kind of workloads, are we dealing with one particular kind of threat? Or do we have to deal with the whole gamut of malware?

KAMIL POTREC: With specific threats. The malware that will target you has to be Linux compatible. It has to run on the specific type of pod that you might be running on, right? So the containers that you're importing can be running different types of distribution. They're not all the same. So there is a layer of complexity, which gives you additional security benefit.

It needs to live in memory for short periods of time. It needs to be aware of quite complicated network instrumentation. So whereas before you had access to the internet, your pods might not actually be connected to the internet all the time. They might be limited by something called network policies in Kubernetes. And the malware needs to perform the task that actually brings a value.
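
As a rough illustration of the network policies mentioned here, the sketch below builds a default-deny egress NetworkPolicy manifest as a Python dictionary and prints it as JSON. The policy name and the "payments" namespace are assumptions made up for the example, and the policy only has an effect on clusters whose network plugin enforces network policies.

```python
import json


def default_deny_egress(namespace):
    """Build a NetworkPolicy manifest that blocks all egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},          # empty selector matches every pod in the namespace
            "policyTypes": ["Egress"],  # no egress rules are listed, so all egress is denied
        },
    }


if __name__ == "__main__":
    # "payments" is a hypothetical namespace name.
    print(json.dumps(default_deny_egress("payments"), indent=2))
```

Malware landing in a pod covered by such a policy cannot simply call out to the internet unless a further policy explicitly allows that traffic.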

So before, where there was a lot of malware that wanted to destroy systems for either profit or competitive advantage, these days, the malware attempts to either mine information or--

CRAIG BOX: To mine bitcoin.

KAMIL POTREC: Exactly.

CRAIG BOX: How much malware that you see in Kubernetes environments today is targeted explicitly knowing that it's a Kubernetes environment versus it's just a general piece of Linux malware?

KAMIL POTREC: Most of the malware you will see on Kubernetes clusters has to be targeted for the fact that clusters themselves are usually not exposed to internet directly. Whereas before, you would expose a specific port on your Linux node, and that could be targeted. Nowadays, it sits behind the ingress between different services. And it needs to get there.

So the malware that I've seen on Kubernetes clusters was highly targeted. We need to think about the threat actor. Who is targeting clusters? And who knows that they want to achieve a certain objective? That's not your typical script kiddies. A script kiddie is a person who found a piece of malware on the internet, put it in a script, and sent it to see what it can do.

Those are usually skilled organizations. We need to call it that because their goal is to make money out of either mining bitcoins on somebody else's processing computers or harvesting information-- basically, for encrypting the information for ransom.

CRAIG BOX: Two threads there that I'd like to pick up on. The first one is the cryptojacking. We're hearing a lot of people talking about malware that's running on Kubernetes environments, which by design are probably distributed systems with potentially far more nodes available to them than the average computer on the internet. People who are running these things are trying to mine cryptocurrency. Is it working? Are they making money?

KAMIL POTREC: If it weren't working, we would not be hearing about it. I think these organizations make quite a good living out of exploiting the systems. If you think about big companies who use these clusters, they have quite sophisticated processing power. And you can mine quite a lot in a short span of time in order to get that.

And again, when you think about how bitcoins are mined, this is already a highly distributed algorithm. It is created to work on graphics cards and big farms which need to distribute this load. And Kubernetes is just an ideal spot to try to reimplement the same algorithm on somebody else's computers.

CRAIG BOX: So are we seeing these attackers change their malware to explicitly look for Kubernetes and run container-based mining workloads?

KAMIL POTREC: Yes. At this stage in Kubernetes, I think high-performance computing is still in its early development. So I think in the future, it's going to become more persistent where Kubernetes will be able to support more generic graphics cards. But at the moment, I think they just want to run as many pods as possible.

CRAIG BOX: The second thing that you mentioned that I wanted to pick up on is the fact that it's not script kiddies anymore. I remember the video of the guy saying he's running Tracer-tee and he's going to hack you, and he's behind nine firewalls. Now, instead, we're dealing with nation states. We're talking about the SolarWinds hack that happened recently that could well have been a country that was behind this. How can we even defend ourselves against such an attacker?

KAMIL POTREC: First of all, you personally don't have to worry about it. The nation states are not targeting you. But if nation states want to attack you, they probably will.

CRAIG BOX: They'll put something in your underpants.

KAMIL POTREC: [LAUGHS] Don't lose sleep trying to protect yourself against such an actor unless you work for government yourself. What they will try to do is to utilize multiple zero-day vulnerabilities that you probably are not yet aware of and use those highly targeted attacks to obtain a specific action.

With SolarWinds, this is a typical attack for supply chain. So whereas before someone would target the hardware, they would probably put some hardware malware on your CPU before it got to your server, nowadays, they look for these big players in our industry to try to hijack those systems, knowing that they will have access to thousands and thousands of different customers to be able to perform whatever actions they want to do.

CRAIG BOX: Now, you're saying I don't have to lose sleep over nation states. You're a security vendor. Do you have to lose sleep?

KAMIL POTREC: I wouldn't say we lose sleep. We do keep it in mind. Most of the customers that we deal with and the threats that we try to prevent may be a little bit below the nation state pay grade at the moment. [CHUCKLES]

CRAIG BOX: We'll see. Things are picking up in the industry.

KAMIL POTREC: Yeah.

CRAIG BOX: Let's then turn and look at the supply chain attacks that you're talking about. Most software these days is built on open source. And most open source is built by strangers on the internet. And there's an awful lot of trust involved here. There is, of course, Ken Thompson's famous paper on trusting trust.

But then there's also the left-pad incident on the NPM service. And then there are tales of people abandoning open-source projects, and them being acquired by other people, and malware being put in those. When do we start thinking about security? Is it for every line of code that we write and every library we import?

KAMIL POTREC: So I think you should be thinking about security as early as possible. But luckily for us, we've been doing this for quite a long time. So there are solutions to help you make sure your open-source libraries are at least somehow secure.

So nothing is perfect. There is no blue pill here. But you can quite easily get your open-source libraries scanned for known vulnerabilities. And you can then take reasonable risks while importing them into your projects.

I would not discourage anybody from using open source. Remember, if something is proprietary, it doesn't mean it wasn't copied from Stack Overflow. [LAUGHS]
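
The scanning for known vulnerabilities described above can be sketched as a lookup of each pinned dependency against an advisory list. This is only an illustration under stated assumptions: the package names, the advisory ID, and the ADVISORIES table are all hypothetical, versions are assumed to be plain dotted integers, and a real scanner would draw on a maintained vulnerability database.

```python
# Hypothetical advisory data; a real tool would query a maintained database.
ADVISORIES = {
    "examplelib": [{"id": "EXAMPLE-0001", "fixed_in": "2.3.1"}],
}


def parse_version(text):
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def scan(dependencies):
    """Return (name, version, advisory id) for each dependency older than its fix."""
    findings = []
    for name, version in dependencies.items():
        for advisory in ADVISORIES.get(name, []):
            if parse_version(version) < parse_version(advisory["fixed_in"]):
                findings.append((name, version, advisory["id"]))
    return findings


if __name__ == "__main__":
    # Hypothetical project dependencies.
    print(scan({"examplelib": "2.3.0", "otherlib": "1.0.0"}))
```

A finding is a prompt to weigh the risk and upgrade, which matches the "reasonable risks" framing in the answer above.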

CRAIG BOX: Now, Snyk have an open-source product. Is that product itself open source?

KAMIL POTREC: Parts of it are. So you can clearly look at, for example, our CLI. So everything that we will run in your CI/CD, on your desktop, you can look at how it does it. The backend of it, it's not open source. It is a proprietary piece of software.

CRAIG BOX: And that also connects to your vulnerability database. What's the benefit to your customers and to the community of maintaining such a database yourself?

KAMIL POTREC: Basically, it takes the effort of a group of people to maintain such a huge database. One individual basically has no ability to monitor such a vast ecosystem of features. So companies like Snyk, the one I work for-- they have teams of people looking after different systems. So they will look into Linux. They will look into Windows. They will look at NPM packages, Golang, C++. If you start counting them out, we probably would run out of time.

CRAIG BOX: Is there a Wikipedia of vulnerability database? Is there a group across the industry that brings all these together? Is that what CVE is, for example?

KAMIL POTREC: Exactly. So basically, a CVE is the identification of a single vulnerability. But there is a library behind it. So there are websites like MITRE and CVE that you can go to and search for vulnerabilities in your packages. They are categorized by the year in which they've been introduced. They can tell you how to fix it, if there is a patch available, and if it's actually exploitable or not.

CRAIG BOX: On Snyk's website, you talk about open-source security, code security, container, and infrastructure-as-code security. We talked a little bit, then, about open source and about the programs that you have that allow you to check your open-source vulnerabilities. When I'm writing code, whether it's going to be open or proprietary, how do I integrate security into my thought process?

KAMIL POTREC: Ideally, you would have the security feedback as early as possible. Where these solutions can nicely integrate is your IDEs. So I personally use Vim, which isn't the most developer-friendly tool. But if you use things like PyCharm and other IDEs like VS Code, you can get actual visual indications of the bad code that you're writing right there while you're typing it, with recommendations of, what is the better function to use, what is the better library to import, et cetera.

CRAIG BOX: So it can identify, don't use memcpy and don't use left-pad.

KAMIL POTREC: Correct, indeed. So it can do that. It can tell you, don't use Requests, or use this specific HTTP library.

CRAIG BOX: Does it get all sassy and say, if you were using Rust, you wouldn't have this problem?

KAMIL POTREC: I wish we could say that. But I think we need to think about some marketing there. [CHUCKLES]

CRAIG BOX: Once you've written code, we're now going to compile it and then probably package it up in the container. Because we are talking about the cloud native space. When I come to deploy these containers, should I be scanning them on every pull in the container, on every deploy? Should I be scanning them continuously?

KAMIL POTREC: You should be scanning them as early as possible. So hopefully, your CI/CD will push your images to some sort of registry. And hopefully, that's a private registry. Ideally, you scan them before they're actually there.

Then you make sure they are secure and, if you haven't found any vulnerabilities, that once an image is in the registry, it's signed by some sort of signing mechanism. And then the images cannot be changed. So they're immutable.

So for example, if someone is there to update them, they cannot replace the same image with the same signature. And then when we get to deploying it, you could scan it again. But at this point, you already are quite happy with the image that you're running. But then you run into a problem of, how often should you update those images? And what if they get outdated in time? So hopefully, you have a solution that scans your registry and makes sure that those images have no known vulnerabilities.
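
The immutability idea can be sketched with content digests: if a deployment pins the SHA-256 digest of the image content, swapped content no longer matches. Note that this is a simpler stand-in for the signing mechanism discussed above, not an implementation of it, and the byte strings below are made-up placeholders for real image content.

```python
import hashlib


def content_digest(image_bytes):
    """Return a sha256 digest string for the given image content."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


def verify_image(image_bytes, pinned_digest):
    """True only if the content still matches the digest recorded at scan time."""
    return content_digest(image_bytes) == pinned_digest


if __name__ == "__main__":
    scanned = b"placeholder image content v1"   # hypothetical content
    pinned = content_digest(scanned)            # recorded after the scan passed
    tampered = b"placeholder image content v2"  # someone replaced the image
    print(verify_image(scanned, pinned))
    print(verify_image(tampered, pinned))
```

A digest only proves the content is unchanged; a signature additionally proves who approved it, which is why the two are usually combined.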

CRAIG BOX: And if I've written something which wasn't detected at the time, is vulnerable for some reason, or some exploit has happened which is novel, and then some known piece of malware starts executing on my cluster, or some sort of pattern that's identifiable, how can I identify that? How can I have something watching my environment and saying, hey, all of a sudden, that's connecting out to something that shouldn't be, or that's started a bitcoin-mining process?

KAMIL POTREC: Sure, so this is where we get into the runtime security aspect. So every check you've done before, it's kind of a static analysis tool which checks on the static file. While something is running, we call it runtime security. You can then use solutions that monitor processes on your clusters. So they look at specific containers. But even in those pods, they look at the processes that are running and what network connections are happening.

And then based on kind of a behavioral statistical analysis, it can say, oh, this is abnormal. This should not be happening. Or, for example, the process that had been executing on your pod actually did not exist in the Dockerfile that you've created. And then you can start stopping those processes while they're executing.
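
One piece of this, flagging a process that the image was never meant to run, can be sketched as a comparison against an allowlist. This is a deliberately simplified illustration: the process names are hypothetical, the lists are passed in by hand, and real runtime security tools observe the system directly and use much richer behavioral signals.

```python
def unexpected_processes(running, allowed):
    """Return the running process names that are not on the image's allowlist."""
    return sorted(set(running) - set(allowed))


if __name__ == "__main__":
    # Hypothetical data: what the image is expected to run vs. what was observed.
    allowed = ["nginx"]
    running = ["nginx", "sh", "miner"]
    for name in unexpected_processes(running, allowed):
        print(f"abnormal process: {name}")
```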

CRAIG BOX: Should I set my cluster up so that it can only run the signed images that are generated in my registry?

KAMIL POTREC: Oh, I really would hope that everybody would follow this practice. It is quite hard, right? You have to think about usability there. But I've seen that implemented well. And it actually solves quite a bit of security threats.

CRAIG BOX: The usability question there is, obviously, in the case that something goes wrong, I want to break glass on my cluster. I want to run some sort of diagnostic tool. How do you balance those two challenges?

KAMIL POTREC: This is, I think, the biggest challenge that faces the concept of immutable infrastructure. And the concept of immutable infrastructure is that once you deploy it, you never perform any manual changes on it.

You need to think about your break-glass scenarios. This is where you kind of want to have two actors. They're confirming that we will change our standard practice and allow us manual access. You can think of access-control solutions that will allow, for example, you and your manager to say, between this time, your user is allowed to, for example, do kubectl edit on the pod to perform some actions-- or, for example, kubectl exec to run some additional debugging processes.

CRAIG BOX: And that, of course, will come down to the auditability of those actions, being able to prove that it was only done during those allowed times.

KAMIL POTREC: This is where you get into the realm of trust but verify. What I mean by this is you allow some actions for specific reasons. And then you go and verify that only those specific actions have happened.
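
The break-glass idea (two actors approve, access is limited to a time window, and the actions are verified afterwards) can be sketched as follows. The `BreakGlassGrant` and `AuditEvent` types are hypothetical; in practice the events would come from the cluster's audit log, and the event list here is assumed to contain only manual-change actions such as exec or edit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BreakGlassGrant:
    """A temporary exception to the no-manual-changes rule (hypothetical type)."""
    user: str
    approvers: frozenset      # who confirmed the exception
    allowed_verbs: frozenset  # e.g. {"exec"} or {"edit"}
    start: datetime
    end: datetime

    def is_valid(self):
        # "Two actors": at least two distinct people signed off.
        return len(self.approvers) >= 2


@dataclass(frozen=True)
class AuditEvent:
    """One manual action recorded against the cluster (hypothetical type)."""
    user: str
    verb: str
    timestamp: datetime


def violations(grant, events):
    """Return the grant holder's events that the grant does not cover.

    Events from other users are ignored; they are outside this grant's scope.
    """
    bad = []
    for event in events:
        if event.user != grant.user:
            continue
        in_window = grant.start <= event.timestamp <= grant.end
        covered = grant.is_valid() and in_window and event.verb in grant.allowed_verbs
        if not covered:
            bad.append(event)
    return bad


if __name__ == "__main__":
    start = datetime(2021, 3, 3, 12, 0, tzinfo=timezone.utc)
    grant = BreakGlassGrant("alice", frozenset({"alice", "bob"}),
                            frozenset({"exec"}), start, start + timedelta(hours=1))
    events = [
        AuditEvent("alice", "exec", start + timedelta(minutes=5)),
        AuditEvent("alice", "exec", start + timedelta(hours=2)),  # after the window
    ]
    for event in violations(grant, events):
        print(f"not covered: {event.user} {event.verb} at {event.timestamp.isoformat()}")
```

The verification step is deliberately separate from the grant: allowing the action and later proving that only the allowed action happened are two different checks.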

CRAIG BOX: You're working primarily on infrastructure-as-code security at the moment. We talk, obviously, about Kubernetes here. But there are Terraform modules. And there are other types of charts that are deployed. What is it that you're working on? And how does it relate back to the things we've talked about so far?

KAMIL POTREC: The work that I do at the moment is to use my experience with breaking into systems, and how the CI/CD pipelines work, and try to implement the best practices in the way you configure your infrastructure as early as possible. So you can do that in Terraform or in declaration file.

And then we look for things such as, have you configured your pod to be running with too many privileges? Or does the IAM policy that you are applying to the cluster or the AWS environment grant you too many permissions? And then we try to implement those checks before you deploy them so that when you run CI/CD, it at least will warn you that the PR that you're trying to submit has compromised your system in one way or the other.
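
The sketch below gives a flavor of what such pre-deployment checks might look like. It is not Snyk's implementation: the function names and finding messages are invented, the pod spec and IAM policy are assumed to be already parsed into dictionaries, and only a handful of well-known risky settings are covered.

```python
def pod_spec_findings(pod_spec):
    """Flag pod-level and container-level settings that grant broad privileges."""
    findings = []
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(f"pod sets {flag}: true")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"container {name} runs privileged")
        # An unset allowPrivilegeEscalation is treated as "allowed" in this sketch.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"container {name} allows privilege escalation")
    return findings


def iam_policy_findings(policy):
    """Flag Allow statements that grant every action on every resource."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions and "*" in resources:
            findings.append(f"statement {index} allows every action on every resource")
    return findings


if __name__ == "__main__":
    spec = {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}
    for finding in pod_spec_findings(spec):
        print("warning:", finding)
```

Run in CI against the files changed in a pull request, a non-empty findings list would produce the warning described above before anything is deployed.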

CRAIG BOX: And do I think about this the same way I think about my code, in that I should be validating changes I'm making? But then I should also have something that's watching the state of my infrastructure and seeing if there are nodes spun up with GPUs that I didn't expect?

KAMIL POTREC: Yeah, so at the moment, what we're focusing on is how we provide you the best advice based on the configuration you have. The challenge with that is we have kind of limited visibility, right? So you might be changing a single pod in the PR that you're making.

But that might be impacting the entire cluster, right? You might be opening too many ports for too many applications. And this is where cool new features are coming up nowadays-- to see, how can we correlate configuration files to a running environment?

CRAIG BOX: That's sort of the mantra behind GitOps, if you will. It's the idea that you can say everything that's a change to the environment goes through a central configuration system. But the reality of a Kubernetes environment is that you will, for example, have an autoscaler. And that won't necessarily commit a change that says, please scale up. It'll just scale up for you. How do you keep those two things in sync with each other?

KAMIL POTREC: The simple answer is you cannot. All we could do is have reasonable assumptions made that, for example, as you're describing, autoscaler will scale up and down. But that doesn't necessarily impact security as such. We're OK with those issues.

But there's additional challenges. What if autoscaler doesn't work properly? And then its abnormal configuration affects your other systems. We're just venturing into that space, and it's quite a new area. So you'll see probably a lot of new startups coming up with runtime security in the cloud configuration space.
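
One way to picture "reasonable assumptions" when correlating configuration with a running environment is a drift check that ignores fields an autoscaler is expected to change. This is a minimal sketch under that assumption; the dotted-path convention and the function names are invented for the example, and lists are compared as whole values.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; non-dict values are leaves."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            items.update(flatten(value, path))
    else:
        items[prefix] = obj
    return items


def drift(declared, live, expected_to_change=frozenset()):
    """Return {path: (declared_value, live_value)} for unexpected differences.

    Paths listed in expected_to_change (for example a replica count managed
    by an autoscaler) are skipped. A path missing on one side shows as None.
    """
    declared_flat, live_flat = flatten(declared), flatten(live)
    differences = {}
    for path in sorted(set(declared_flat) | set(live_flat)):
        if path in expected_to_change:
            continue
        if declared_flat.get(path) != live_flat.get(path):
            differences[path] = (declared_flat.get(path), live_flat.get(path))
    return differences


if __name__ == "__main__":
    declared = {"spec": {"replicas": 2, "hostNetwork": False}}
    live = {"spec": {"replicas": 5, "hostNetwork": True}}
    print(drift(declared, live, expected_to_change={"spec.replicas"}))
```

The scale-up is accepted without comment, while the `hostNetwork` change, which nobody committed, is reported.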

CRAIG BOX: How do I think holistically about all of this? I might have security issues that are in one of my containers that might trigger something that resizes my cluster or does something else. And then it might also be triggered by network traffic. So I presumably want one system that's aware of all of the different places that an actor could be attacking the cluster.

KAMIL POTREC: Ideally, where we want to be is to build this contextual map of your environment that can integrate into different points. So it will have access to what you've configured in code. But it also will have access to the APIs to determine what actions are happening and also to have access to logging and tracing infrastructure to see, what information is the system telling us in logs? Correlate that information together to make kind of a mind map of what is happening at the moment.

And I think when we get to that point, when we can say, we know why this is changing, we can give the developer more confidence to say, OK, there is nothing bad that's happening here. This is just a normal autoscaler that just basically needs to react to higher throughput of our application.

CRAIG BOX: Now, as we said right at the beginning, your background is in offensive security. So when should I be engaging a red team or a penetration test on my environment?

KAMIL POTREC: There is no right answer here. So ideally, you will engage with some sort of security engineer early on to perform threat modeling on your application or your system. And threat modeling is basically a process of going through endpoints on your system and identifying which actors could exploit those specific endpoints, if there is specific entry points to perform some sort of exploit on your system.

When you engage penetration testers, that usually is when you have an MVP, a Minimum Viable Product, ready, to verify that all the security controls that you've put in place, hopefully identified through the threat-modeling process, are working effectively. And then you will want to look for skilled testers who know which gaps to look for and how to actually test the effectiveness of your controls, not just their presence.

CRAIG BOX: Let's have a think about some of the places that an attacker could conceivably attack. We have, first of all, the kernel and the container runtimes themselves. What interesting things would you be looking at there or have you heard about?

KAMIL POTREC: From my past experience and what we keep on reading on the internet, kernel is just this massive entry point to your system. Unfortunately, when you think about pods and containers, they are simply processes to the kernel. They're just exactly the same as if you would run a ps or ls on your system directly.

And they have a vast amount of access to the kernel itself. Kernel manages everything that happens in the system. And once you gain access to that, everything is open to you to perform any actions. You basically cannot control any action that happens in the kernel if it's malicious.

So if the attacker wants to spin up a new process and they already have this arbitrary execution in the kernel, they will be able to do that. Luckily for us, there's been a lot of security engineering done into hardening kernel. And there is ongoing process to make it even harder.

And there are features such as memory segmentation. There are features that attempt to discover that the code that you're committing should not actually be executable. So we're in a much better place than we were 10 years ago, I would say.

CRAIG BOX: What are the settings that I should be enabling in my Kubernetes YAML, my pod specs, to get the best benefit out of those security features built into the kernel?

KAMIL POTREC: What you would want to do in the spec file is to make sure you enable security context features. So you look into what user your pod will be running as. And by default, that is root. Unfortunately, there is no easy answer to say what user it should be running as.

CRAIG BOX: That does sound like a poor default.

KAMIL POTREC: You could say that. But at the same time, how do you know what file system access your pods should have? There is a lot of challenges. I do see the reasons why it was left to the user to decide what ID the process should be running with.

CRAIG BOX: There are a lot of different mitigations that you can use, from things like seccomp and AppArmor all the way through to entire runtime implementations like gVisor. When should you think about using one or other of those security systems?
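
For illustration, a container-level security context that avoids the root default might be built as below. The field names are the Kubernetes `securityContext` fields; the helper name and the UID and GID values are arbitrary assumptions, and the right user for a given image depends on the file system access it needs, which is the difficulty raised above.

```python
import json


def hardened_security_context(uid, gid):
    """Build a container securityContext that does not run as root.

    The UID and GID must match what the image's files and volumes expect;
    this sketch has no way of checking that.
    """
    if uid == 0:
        raise ValueError("refusing to build a context that runs as root (UID 0)")
    return {
        "runAsNonRoot": True,
        "runAsUser": uid,
        "runAsGroup": gid,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    }


if __name__ == "__main__":
    container = {
        "name": "app",                                 # hypothetical container
        "image": "registry.example.com/prod/web:1.0",  # hypothetical image
        "securityContext": hardened_security_context(uid=10001, gid=10001),
    }
    print(json.dumps(container, indent=2))
```

Settings such as `readOnlyRootFilesystem` will break applications that write to their own image file system, so each field is a choice to test rather than a safe universal default.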

KAMIL POTREC: So first of all, some of them are dependent on the distribution of the system you're using. So AppArmor will be available to you if you're running your node cluster on the Ubuntu distributions. Seccomp is a good feature because it's basically built into the Linux kernel.

Unless it's disabled in the kernel-- which, I think all the latest distributions enable it by default-- what seccomp is is this filtering system for entry points to the kernel, so the sys calls. That's how any process communicates with the kernel. And you can basically specify which of those are allowed or disabled-- and there are, if I'm not mistaken, over 200 sys calls in the kernel at the moment.

Luckily for you, by default, the-- I wanted to say Docker, but that's being deprecated in Kubernetes now. It has a limited set of sys calls that are available to you. So there is already some hardening done. It's not perfect. There are still some of those sys calls that are highly privileged. But it's better than nothing.
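
To make the filtering idea concrete, here is a small sketch that generates an allowlist-style seccomp profile in the JSON format used by container runtimes: a default action that rejects every sys call, plus a list of names that are allowed. The helper name is made up, and the sys call list in the example is far too short for a real workload.

```python
import json


def allowlist_profile(allowed_syscalls):
    """Build a seccomp profile that denies everything except the given sys calls."""
    return {
        # Any sys call not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


if __name__ == "__main__":
    # Illustrative only: a real process needs many more sys calls than this.
    profile = allowlist_profile(["read", "write", "exit_group", "futex"])
    print(json.dumps(profile, indent=2))
```

Most teams start from the runtime's default profile and remove sys calls from it, rather than building an allowlist from nothing.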

CRAIG BOX: Kubernetes itself is not inherently built to be multitenant. There are a lot of people who have added that on in some fashion. Is it possible to secure Kubernetes with that in mind? Or do we need to re-architect it to be able to deal with this?

And for example, there was a vulnerability for some definitions of that word recently where it turned out that anyone could create a service with an external IP address and then redirect traffic from, say, google.com to their IP address if they had that permission, which is effectively something that you can't fix without completely redesigning the way that services work in Kubernetes.

KAMIL POTREC: I don't think this problem can be solved directly in the current architecture. I've tackled this problem on many sites. And ultimately, if you want to have isolated workloads, the best bet is to have separate clusters and think about how you can then automatically, and preferably in code, spin up clusters rather than just individual pods.

There is a lot of challenges with that, of course. But unfortunately, I would not recommend venturing into trying to make Kubernetes natively tenant-isolated. There are some built-in features in Kubernetes, such as node affinities, that try to make some workloads independent of each other. But that's just a lot of orchestration that's required to make that correct.
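
Since the issue cannot be fixed in the Service design itself, the usual approach is to restrict who may set external IPs. The sketch below shows the kind of check such a policy might perform on a Service manifest. The function name and the allowed range are assumptions, and a real deployment would enforce this at admission time rather than in a script.

```python
import ipaddress


def disallowed_external_ips(service, allowed_cidrs):
    """Return the Service's externalIPs that fall outside the allowed ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    rejected = []
    for ip in service.get("spec", {}).get("externalIPs") or []:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            rejected.append(ip)
    return rejected


if __name__ == "__main__":
    service = {
        "kind": "Service",
        "metadata": {"name": "suspicious"},  # hypothetical Service
        "spec": {"externalIPs": ["203.0.113.10", "8.8.8.8"]},
    }
    # 203.0.113.0/24 is a documentation range, standing in for addresses you own.
    print(disallowed_external_ips(service, ["203.0.113.0/24"]))
```

An empty allowed list rejects every external IP, which is a reasonable default for clusters that do not use the feature.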

CRAIG BOX: Don't I then lose the ability to bin pack when I have a different cluster for every tenant?

KAMIL POTREC: You do, indeed. But in today's world, when we have a lot of major cloud providers with the managed services, a lot of big organizations are thinking about, how do we not vendor-lock into a single provider?

And then they have the same challenge. It's like, can we easily split up workloads between clusters in different cloud providers, so Kubernetes on AWS or GCP? And then you start coming up with amazing solutions to, how can I distribute workloads between isolated clusters as well?

CRAIG BOX: There have been a lot of security startups acquired in the last year. Snyk are going it alone at this point. How do you differentiate providing a platform versus the sort of just-enough platform that will be provided by these people who now have some security built into their Kubernetes environment?

KAMIL POTREC: What you gain from external vendor is if they specialize in security, they want to deliver security to you rather than just make sure that you buy their specific version of Kubernetes. So I think that's where the best benefit you have, is to get somebody who deals with different customers who have different requirements, and can actually see a vast majority of problems rather than just specific problems related to a specific vendor.

CRAIG BOX: Now, finally, is there any security vulnerability that you've looked at and seen, oh my god, that's just so amazing, that's clean, I really respect how that was done?

KAMIL POTREC: [LAUGHS] I am always envious of any exploit that people come up with because I appreciate how much effort goes into trying to find one. The one that sticks with me is the vulnerability in user namespaces that was found again in 2020. And it's basically a re-implementation of an issue that was found in 2017.

And I think back in 2016, when I looked at it, I just went and said--

CRAIG BOX: I've seen this before somewhere.

KAMIL POTREC: Exactly. And then I thought to myself, why? How are we in the same space again? So I think that's what I would say.

CRAIG BOX: And what is that vulnerability?

KAMIL POTREC: That was an issue in the packet socket of the Linux kernel that is only exploitable if you actually enable something called user namespaces. User namespaces allow unprivileged processes in Linux to execute some privileged commands in a limited context. However, because everything is so intertwined, that allows you to bypass some privilege controls and obtain root on your system.

CRAIG BOX: Why was this not fixed properly the first two times?

KAMIL POTREC: I think in this case, it was the same problem found in a different location. So kernel has been developed for almost 30 years now. The code base is extremely big. You can basically find the same problems in different spaces. And I think the kernel community has realized that. And now they have task forces to basically defeat a group of vulnerabilities rather than focus on a single individual one.

CRAIG BOX: Do you think it's time for them to rewrite the kernel in Rust?

KAMIL POTREC: Oh, I do know that there is an ongoing, amazing project to write kernel in Rust. And I was looking at it for some while. I think the effort to build such a great system is so big. And trying to convince everybody now to switch to the new language is a huge challenge. But I want to be surprised. I want people to be, yeah, let's do this. But I will not hold my breath.

CRAIG BOX: All right. Well, thank you very much for joining us today, Kamil.

KAMIL POTREC: Thank you. It was a pleasure.

CRAIG BOX: You can find Kamil on LinkedIn as Kamil Potrec.

[MUSIC PLAYING]

CRAIG BOX: Thank you very much, Andrew, for helping me out with the show today.

ANDREW PHILLIPS: Thank you, Craig.

CRAIG BOX: If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter, @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

ANDREW PHILLIPS: You can also check out the website at kubernetespodcast.com, where you will find transcripts and show notes as well as links to subscribe.

CRAIG BOX: I'll be back next week. So until then, thanks for listening.

[MUSIC PLAYING]
