#212 November 22, 2023

Confidential Computing, with Fabian Kammel

Hosts: Abdel Sghiouar, Kaslin Fields

Fabian Kammel is a Security Architect at ControlPlane, where he helps to make the (cloud-native) world a safer place. In his career, he continuously worked to bring hardware security and cloud-native security closer together. His past projects include:

  • A cloud-native PKI for on-road vehicle services secured by enterprise HSMs
  • An always-encrypted Kubernetes distribution that harnesses the power of Confidential Computing
  • And more recently securing SPIFFE-based machine identities via hardware attestation.

ABDEL SGHIOUAR: Hi, and welcome to the Kubernetes podcast from Google. I'm your host, Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: In this episode, we chat with Fabian Kammel. Fabian is a security architect at ControlPlane working on confidential computing for the cloud native ecosystem. Stay tuned to listen to our conversation about confidential computing, standards, open projects, and what's going on in this space.

KASLIN FIELDS: But first, let's get to the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Istio has been recognized as the best open source software of 2023 by the InfoWorld Bossie Awards. Congratulations to the Istio team.

KASLIN FIELDS: On November 7, Google Cloud announced the general availability of GKE Enterprise, a premium new edition of Google Kubernetes Engine, which integrates enterprise-grade tools with a unified console experience. Notable features of GKE Enterprise include fleet team management, enabling platform engineers to easily group similar workloads into dedicated clusters, apply custom configurations and policy guardrails per fleet, isolate sensitive workloads, and delegate cluster management to other specific teams. Teams can run and manage their workloads, as well as view logs, resource utilization, error rates, and other metrics, all scoped to their own set of clusters and namespaces.

ABDEL SGHIOUAR: On October 31, the Google Kubernetes Engine team also announced the general availability of the multi-cluster gateway controller for GKE, which natively supports the deployment of a unified application load balancer for a fleet of GKE clusters using the Kubernetes gateway API. With the multi-cluster gateway controller, GKE shops can implement sophisticated patterns such as blue/green deployments or geo-distributed applications, while keeping their most valuable assets protected with advanced security capabilities integrated with Google Cloud's application load balancer.

KASLIN FIELDS: GitHub's major conference, GitHub Universe 2023, took place November 8 and 9 in San Francisco, California. Announcements generally followed the industry trend of AI all the things, with a major focus on GitHub Copilot. GitHub announced general availability of GitHub Copilot Chat and previews of the new GitHub Copilot Enterprise offering, new AI-powered security features, and the GitHub Copilot Partner Program.

ABDEL SGHIOUAR: Microsoft's flagship conference, Microsoft Ignite, took place in Seattle November 14 to 17. Announcements from the Azure Kubernetes Service team included fleet manager general availability, artifact streaming for Linux in preview, app routing general availability, confidential containers in preview, cost analysis add-on, and Azure cost management integration in preview, Microsoft Copilot for Azure, and AI toolchain operator for hosting LLM models on AKS.

KASLIN FIELDS: The CNCF announced the establishment of Cloud Native Community Japan, a Japanese chapter of the CNCF. The new organization will serve as a hub for the Cloud Native Computing Foundation and Japanese community to promote cloud native and open source in the region.

ABDEL SGHIOUAR: Vitess announced the general availability of Vitess 18. The new version brings new enhancements to improve usability, performance, and MySQL compatibility. Vitess is a CNCF graduated project and an open source tool that can be used to run horizontally scalable MySQL and Percona Server databases in cloud native environments.

KASLIN FIELDS: And that's the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Hi, everyone. Today we're talking to Fabian Kammel. Fabian is a security architect at ControlPlane. That's ControlPlane, similar to Kubernetes control plane, but one word with a capital P. Fabian works on securing cloud native architectures, mostly focusing on implementing hardware-based security. Some of the notable projects Fabian has worked on are things like PKI for on-road vehicles-- which I'm very curious to understand what that actually means; always encrypted Kubernetes distributions; and, very recently, working on SPIFFE-based machine identities via hardware attestation.

Welcome to the show, Fabian.

FABIAN KAMMEL: Thanks for having me. I'm super honored to be here.

ABDEL SGHIOUAR: Thank you for accepting my invitation. It was very last minute so thanks for being here. We wanted to chat with you because it was a very interesting article about confidential computing that you co-authored on the Kubernetes documentation. So let's start with a very basic question. What is confidential computing?

FABIAN KAMMEL: Right. Confidential computing, you see a lot more of it, I feel, in recent months. But this is basically a concept or a thing we did forever. So we had specialized hardware to have hardware-protected environments to shield, basically, super-sensitive computations we were doing. So think about your credit cards, or the SIM card in your phone, or maybe even stuff like TPMs or HSMs.

So every time we do a trusted computation in an untrusted environment-- and this is also why we generally refer to them as trusted execution environments. So confidential computing basically takes this concept and enables more use cases. It makes it available in consumer or server hardware so you can run your web server on it in a secure way. You're not going to run your web server on a smart card. Right?

ABDEL SGHIOUAR: Yeah, of course.

FABIAN KAMMEL: It also provides remote attestation. And this is a super crucial feature where you can remotely verify that the environment you're running in is genuine. So most of the time today you don't have physical access to the machine you're running on, not only in the cloud, but also data centers are usually managed by some third-party provider.

ABDEL SGHIOUAR: Yeah. Or maybe you even buy a laptop, and you don't know where it's coming from.

FABIAN KAMMEL: Also, yes.

ABDEL SGHIOUAR: I think the idea of confidential computing is not new because you mentioned TPMs-- and we're going to talk a little bit more about that. But before we go there, the article mentioned something called the Confidential Computing Consortium, which is a sister organization to the Linux Foundation. Right?

FABIAN KAMMEL: That's correct.

ABDEL SGHIOUAR: What is that?

FABIAN KAMMEL: Yeah, it's a group in the Linux Foundation ecosystem, and they basically try to work on this in a vendor-neutral way. First they have a white paper, which is super excellent to read if you're new to the topic. It will introduce you to a lot of the concepts, and the topics, and also how to make use of them, and how to securely run this stuff. But they also help to facilitate and grow open source projects in that space.

So for example, we have the confidential container project. I think we will talk about this a bit later in the show.

ABDEL SGHIOUAR: Just because I like the name, it's COCO.

FABIAN KAMMEL: It's awesome.

[LAUGHTER]

ABDEL SGHIOUAR: All right. So you mentioned quite a lot of things, and we will try to unpack as much of these things as possible. I did skim through the white paper. It does a very good job at explaining some basic concepts. We'll try to explain some of them today, but if we don't get to explore all of them, I really highly recommend people to review the white paper. So let's try with the TEE, trusted execution environment. What is that?

FABIAN KAMMEL: So TEE is a very loosely defined term. It's just any hardware protected environment where any outside observer is not able to see the data you're basically working with or working on. They cannot watch what you're doing, and they cannot manipulate what you're doing. Everyone outside of the TEE doesn't know about the data you have.

So the TEE is very differently defined in the different contexts. So we have, for example, Intel SGX, which is one implementation of a TEE which is process-based. So in the same operating system, you have lots of different processes doing work, and you can request the Intel chip to create a new process that is memory encrypted from the rest of the system.

You have other technology out there which defines the TEE a bit broader, like with AMD SNP, for example, you get a whole confidential virtual machine with a full virtual machine shielded from other virtual machines on the same host CPU.

ABDEL SGHIOUAR: And is this supposed to be transparent to the application? Is the application supposed to not know it's running inside the TEE?

FABIAN KAMMEL: Yes, exactly. Intel SGX is process-based so you don't have access to the operating system, and you usually need to have an SDK and port your application to that SDK to make it work inside Intel SGX. But if you run in a confidential virtual machine context, it's really transparent. You just lift and shift your application. You take your database. You take your web server, whatever, and you run it in this confidential virtual machine.

ABDEL SGHIOUAR: Yeah, and I guess that's essentially what most cloud providers are doing today is providing CVMs, cloud virtual machines.

FABIAN KAMMEL: Exactly.

ABDEL SGHIOUAR: --well, confidential virtual machines-- sorry.

FABIAN KAMMEL: The popularity of the CVMs is a lot bigger because they're easier to use, but they also have different security assumptions. You need to trust all the processes running in the same CVM with you.

ABDEL SGHIOUAR: Yes, they're management software, whatever that management software is, I guess.

FABIAN KAMMEL: Yeah.

ABDEL SGHIOUAR: OK, before we move on, we'll leave some time to dive deep into the CVMs. Also the article mentions enclaves. So what are enclaves, and how different are they from TEEs?

FABIAN KAMMEL: Enclaves are basically just one implementation of the TEE. You've most often heard the term enclave when it comes to Intel SGX. Intel SGX creates this isolated process which is memory encrypted, and this memory encrypted process or environment is called an enclave.

So you can, as a normal process on an operating system, you can reach out to the Intel CPU and say hey, I need one of those enclaves to do sensitive computation. You get this process which is memory encrypted, and you can load in both code and data. You can choose to load it when it's encrypted, and only have it decrypted inside this enclave, and then run your sensitive computations in there.

ABDEL SGHIOUAR: And then commit it back to memory, and then potentially persist it back to disk.

FABIAN KAMMEL: Right. You can encrypt it again before you send it out of the enclave, and then store it securely somewhere.

ABDEL SGHIOUAR: OK. So I'm thinking here with my very, very old Linux hat, sysadmin-- what about swap memory? Because that's technically disk-based. Right?

FABIAN KAMMEL: Yeah.

ABDEL SGHIOUAR: So would that be considered in an enclave or in a TEE memory as well?

FABIAN KAMMEL: I don't know about the specifics. There are certain ways on how to flag certain memory pages of being encrypted and not being encrypted and so on. So you really have to look into the details on which parts of your application are secured.

ABDEL SGHIOUAR: I see. OK. I was just thinking about it because we were doing an episode about Kubernetes 1.28, and Kubernetes notoriously was actually not much of a fan of having swap on nodes. But they just enabled this very recently. So I was very happy when I saw that. I was like, yes, finally, Linux coming back. But yeah, anyway, I think, as you said, it depends on the implementation.

So OK, TEEs are a generic, loosely defined term. Enclaves is one implementation by Intel. But then you also mentioned in the article, but also in the conversation, HSM and TPMs. So HSM stands for hardware security modules, and TPMs are--

FABIAN KAMMEL: Trusted platform modules.

ABDEL SGHIOUAR: Yes. So how different are these new modern patterns from those? Because HSM and TPMs existed for a while, right?

FABIAN KAMMEL: Yeah, for quite a while, actually. TPMs are just commonplace. Every main board that is built into a server, a desktop, a laptop, whatever, it has a TPM. These things are super cheap, like a few cents maybe, but they are also super limited in their capabilities. They only have kilobytes of memory. They can store a few cryptographic keys, which usually are used to give your machine an identity, and then you have a few cryptographic operations they can carry out, maybe do a signature or an encryption operation, stuff like that, or generate a new key.

So again, these are not programmable. You can't run your web server in a TPM. On the other hand, we have HSMs, and these are hardware security modules. These are expensive, like 10K and up. But these come with quite powerful features. And I used these in the past when I did PKIs for automotive, and you store the root of trust for your root certificate, like the very crown jewels of your cryptographic keys in these HSMs.

Because they are really worth it to spend this much money to have them secure. So HSMs have physical tamper protection. If someone tries to break open the server, and steal the disk, or steal the RAM or whatever to read out the keys, the HSM will know, shut down, null all the memory so no one can steal your data.

ABDEL SGHIOUAR: Oh, interesting.

FABIAN KAMMEL: But again, this is way too expensive to run your average workload and make it more secure. So what these new primitives do is basically allow you to run arbitrary code and get good performance at a standard server CPU price.

ABDEL SGHIOUAR: I see. There is a lot to unpack actually about HSMs, which I think that people who are using HSMs in the cloud, they just think about them, OK, well, I store my keys in HSM, and then I use HSM through typically a cloud API to do my encryption, decryption, and signature operations. When you are using a cloud-based HSM or even a physical HSM, I guess that the cryptographic operations happen in the HSM itself, right?

FABIAN KAMMEL: Exactly.

ABDEL SGHIOUAR: So you would send your encrypted data to the HSM. The decryption operation, whatever operation happens inside the HSM itself. So there is a certain level of protection. And then you will receive the unencrypted data.

FABIAN KAMMEL: Yeah, this is one use case. You can think about having a certificate signing request, and you send this to some PKI provider, and they will have like some sort of HSM at the back end, and the HSM will do the cryptographic operation to issue the new certificate for you. Other cloud providers use this, for example, for key derivation. So if you do disk encryption and you want to encrypt all of your disks, you have one master key in the HSMs, and you just derive new keys from that master key to decrypt or encrypt all the disks you want to attach.
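
The key derivation idea mentioned here can be illustrated with a minimal sketch. It assumes HKDF with SHA-256 (RFC 5869) as the derivation function, which is a common choice but not something the conversation specifies. The master key value and the disk labels are hypothetical, and in the scenario described the master key would stay inside the HSM rather than sit in application memory.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract step: an empty salt is replaced by a block of zero bytes, per the RFC.
    prk = hmac.new(salt or bytes(hash_len), master_key, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical master key, for illustration only.
master_key = bytes.fromhex("00" * 32)

# Each disk gets its own key, bound to a (hypothetical) label for that disk.
disk_a_key = hkdf_sha256(master_key, info=b"disk/vm-1234/boot")
disk_b_key = hkdf_sha256(master_key, info=b"disk/vm-1234/data")
```

Because the derivation is deterministic, the per-disk keys never need to be stored: the same master key and label always reproduce the same disk key, while different labels yield unrelated keys.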

ABDEL SGHIOUAR: Yeah. It's called dual envelope encryption or something like that, or envelope encryption. Yeah, that's very common. OK, that's actually quite interesting.

I remember watching the videos of the-- I don't know if you know this, but the root certificates for the top level domains, they have a ceremony to be able to generate them. And they do usually use HSMs. They are just smaller, form factor. They're just like a USB stick or something like that, right?

FABIAN KAMMEL: Yes, exactly. And you can also buy security keys that are like these USB form factor HSMs. So if you want to store your PGP keys or whatever, you can get them from YubiKey or other popular vendors.

ABDEL SGHIOUAR: Yeah. I was about to mention, like YubiKey are technically HSMs, right?

FABIAN KAMMEL: Yeah.

ABDEL SGHIOUAR: Cool. So then moving forward, I guess. Because I think in terms of probably if we want to talk about this from a chronological point of view, HSM, TPMs, TEEs is kind of somehow modern as a concept, and then cloud virtual machines or confidential virtual machines-- it's confusing. I think that CVM should be called CCVM-- cloud confidential virtual machines.

FABIAN KAMMEL: But they are not native to the cloud. So you can just get the AMD or Intel chip or whatever enables it, just run it in your local data center or at home, if you have the funds to do so.

ABDEL SGHIOUAR: Sure. OK, that's probably because I work in cloud. OK.

So confidential virtual machines. What are those? What do we mean when we say a CVM?

FABIAN KAMMEL: Classical virtual machine. If you think of the stack of all the components that have to come together to create your virtual machine. So you have the CPU that runs some sort of firmware which controls the microcode. You have the host operating system that runs the hypervisor. And then you ask the hypervisor to create my virtual machine, which has, again, some sort of firmware, emulated maybe, bootloader, kernel. And there's all this stuff going in. And you need to trust all these different components to not be malicious and interfere, do some malicious stuff with the virtual machine you want to create.

So confidential virtual machines basically allow you to then request the CPU directly to create this memory encrypted virtual machine. And this basically isolates you from the rest of the system. It isolates you even from the hypervisor. The hypervisor is no longer able to look into your virtual machine because it's memory-encrypted, and the hypervisor doesn't have access to either look into the memory pages or get the decryption key.

ABDEL SGHIOUAR: So in a context of a typical hypervisor where the memory that the virtual machine gets is mapped out of the physical memory of the host, and then it's remapped through the hypervisor to look like one big memory page, or multiple memory pages, or whatever. So would the encryption happen at the hypervisor level or is it at the virtual machine-- I don't know if you are guessing what I'm trying to get at with my question.

FABIAN KAMMEL: Yes, definitely. So where does the encryption basically happen? It happens in the CPU. So if you look at the AMD example, so the AMD technology for this is SEV-SNP. SNP stands for secure nested paging so this is actually exactly what you mentioned. So the CPU will memory encrypt the memory pages. So the hypervisor doesn't have access to them, and only the virtual machine will get a clear view of the memory pages.

ABDEL SGHIOUAR: OK. And in this case, where would the encryption keys typically be stored?

FABIAN KAMMEL: So they reside in the CPU in secure registers, and they cannot be retrieved. The CPU just uses them for encryption operations to keep your memory pages safe and shielded from the rest of the system, but they never leave the CPU itself.

ABDEL SGHIOUAR: But then how would the CPU pass those encryption keys to the virtual machine? Because the virtual machine-- or the guest OS will have to decrypt or--

FABIAN KAMMEL: It's aware of the context. It can make these memory pages available in plain view of your virtual machine, but not to the host, or the hypervisor, or other guests running on the same system.

ABDEL SGHIOUAR: I see. So that's a way to keep the memory pages encrypted on the physical host across the guests, essentially.

FABIAN KAMMEL: Yes.

ABDEL SGHIOUAR: Interesting. I think that the nested virtualization makes it a little bit complicated to understand where the encryption happens exactly.

FABIAN KAMMEL: Yeah, it also required a lot of code changes in a lot of places. So the hypervisor, and the client, and everyone needs to be aware of that new paradigm. So it took quite a while to actually get the full support for this new type of VMs. But since this year, you can actually use them in most of the cloud providers.

ABDEL SGHIOUAR: And I assume also most of the common upstream hypervisor technologies, like VMware and whatever.

FABIAN KAMMEL: Yeah.

ABDEL SGHIOUAR: OK, cool. So I think that clarifies quite a bit what a confidential virtual machine is. And I think in the context of a cloud virtual machine, or a cloud confidential virtual machine, that essentially means that, as a customer, I know that my virtual machine which sits typically on a multi-tenant environment is shielded from the other tenants. Right?

FABIAN KAMMEL: Exactly. So it's shielded from a lot of potentially malicious parties. We saw attacks in the past of malicious other virtual machines that try to listen in on what is going on in the CPU, or even attack the hypervisor to get access to other VMs. It's shielded from the hypervisor itself, which would be the cloud provider. It's no longer able to look into your virtual machine-- not sure how often they do it, but they definitely have the capability to do it. And you're also shielded from attackers that are running in the network of the cloud provider, for example.

ABDEL SGHIOUAR: Yeah, I was about to ask that as a follow-up question. So what about the network part? Because eventually, as a virtual machine, you will have to send some packets through the network, which in a cloud environment, is typically a shared environment.

FABIAN KAMMEL: Yes, exactly. And this is why it's super critical to have all the encryption pieces in place. We often talk about this triad of encryption you want to have in your application. So you have the encryption at rest, so you want to have disk encryption if something is written to the disk, and someone steals the disk, they cannot read out the information.

You also want to have the encryption in transit. This is why you want to use TLS or, even better, mutual TLS, if you know both parties of the conversation. But confidential computing now also gives you encryption in use. When you use the data, when you have it in memory, people outside of this TEE, so other VMs or the hypervisor, are not able to read the data you're handling. So you can really put very sensitive data in the cloud and be sure no one can read it.

ABDEL SGHIOUAR: Yeah. So we'll dive a little bit more into the CIA triads. But before that, in the article as I was reading through it, there was this mention of something called TCB, trusted computing base. What is that?

FABIAN KAMMEL: So this is basically a fancy term to mean every component that is critical to the security of your whole system. So we looked at the stack, every software component that needs to come together to produce this virtual machine you want to run. So if we only attack one piece of the puzzle, if we attack the hypervisor, we can probably break the security of your virtual machine. If we attack the firmware of the CPU, probably attack your virtual machine, and so on. It's the bootloader. It's the kernel. It's all the different components that are going on.

And confidential computing is helping us twofold, with the memory encryption, which is an important part to get rid of the hypervisor, but also the parts that are running in the same TEE. So our bootloader, or our operating system, we have this feature called remote attestation. And this is quite cool. This is like measured boot, but on steroids.

So you're measuring the memory pages as your VM boots up, and you can store a hash of all the memory pages involved in the boot process of your virtual machine and store them in special registers. And these registers are not controlled directly. You feed in some data. They get hashed and appended to whatever is there so no one can control the value directly. But you can feed in measurements or information about the state of the system. And you can then request an attestation report, which contains all of these measurements, from the outside of the virtual machine.

So before you put your sensitive data into this VM, or this CVM, you can check, is it actually running the software I'm expecting it to run. And only if it's good, and no one tampered with my bootloader or my operating system, I will explicitly trust that CVM to be good. And then I can authorize it to run with my sensitive data.
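
The register behaviour described above, where new data is hashed and chained onto whatever is already there, can be sketched as follows. This is a simplified illustration in the style of a TPM PCR extend, assuming SHA-256; the class name, component strings, and boot order are hypothetical, and the launch measurement of a real confidential VM is computed by the hardware in a vendor-specific way.

```python
import hashlib
from typing import Iterable


class MeasurementRegister:
    """A register that can only be extended, never set to a chosen value."""

    def __init__(self) -> None:
        self.value = bytes(32)  # starts as all zero bytes

    def extend(self, data: bytes) -> None:
        # new_value = SHA-256(old_value || SHA-256(data))
        measurement = hashlib.sha256(data).digest()
        self.value = hashlib.sha256(self.value + measurement).digest()


def measure_boot(components: Iterable[bytes]) -> bytes:
    """Feed each boot component into a fresh register, in boot order."""
    register = MeasurementRegister()
    for component in components:
        register.extend(component)
    return register.value


# Hypothetical boot chains, represented by placeholder byte strings.
good = measure_boot([b"firmware v1", b"bootloader v1", b"kernel v1"])
tampered = measure_boot([b"firmware v1", b"bootloader v1 + backdoor", b"kernel v1"])
reordered = measure_boot([b"bootloader v1", b"firmware v1", b"kernel v1"])
```

The final value depends on every component and on the order in which they were measured, so a modified bootloader or a reordered boot chain produces a different result than the expected one.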

ABDEL SGHIOUAR: I see. So there is a lot to unpack there. When you were talking about during boot time, hashing the memory, is that a way to ensure that the bootloader has not tampered with the virtual machine in such a way that it makes it read data from a memory page, where an attacker would put some random stuff? Is that the case?

FABIAN KAMMEL: Yes, exactly. So you could imagine, someone attacks your bootloader to send passwords to some remote server or whatever, like put in a backdoor to SSH into your system. But this is a change to the code, to the bootloader itself. And if we measure the bootloader before it's being used in our boot chain, and we make this information available, we can actually compare, is this the bootloader we expected to run? And if it's not, we will just not make our sensitive data available to the CVM.
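
The decision described here, releasing sensitive data only when the reported measurement matches the expected one, might look like the sketch below. The `AttestationReport` type and `release_secret` function are hypothetical names for illustration. A real attestation report carries many more fields and a vendor signature that must be verified first; that verification is deliberately left out here.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical, heavily simplified report: only the bootloader measurement."""
    measurement: bytes


def release_secret(report: AttestationReport, expected: bytes, secret: bytes) -> Optional[bytes]:
    """Return the secret only if the reported measurement matches the expected one.

    Assumption: the report's signature has already been verified elsewhere.
    """
    if hmac.compare_digest(report.measurement, expected):
        return secret
    return None


# The reference value is computed ahead of time from the bootloader we intend to run.
expected_measurement = hashlib.sha256(b"bootloader v1").digest()

genuine = AttestationReport(hashlib.sha256(b"bootloader v1").digest())
backdoored = AttestationReport(hashlib.sha256(b"bootloader v1 + backdoor").digest())

granted = release_secret(genuine, expected_measurement, b"database password")
denied = release_secret(backdoored, expected_measurement, b"database password")
```

In this sketch `granted` holds the secret and `denied` is `None`, since any change to the bootloader changes its hash.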

ABDEL SGHIOUAR: OK. That's actually a very clever way of doing it, like you have a base to know-- I guess that's probably where TCB comes from. You have a base to know what the bootloader looks like at boot, and then when you run it in a non-trusted environment, in a cloud environment, then you have a report of how this actual bootloader is behaving.

FABIAN KAMMEL: Yes.

ABDEL SGHIOUAR: OK, interesting. Because, as we were talking, I was thinking, that's one of the most common attack vectors, which is expand memory, make the bootloader read something from somewhere where you would have a backdoor, or you would have an SSH key, or something like that.

FABIAN KAMMEL: Yes, exactly. And you can use this concept of the remote attestation for a lot of different purposes. So maybe you want to have an audit log of all the systems you fed your software to. You can even store these memory measurements for future audits. Or you can use them for explicitly authorizing or not authorizing a system to access your data.

ABDEL SGHIOUAR: So this is in the context of something like a regulated industry or something, in case you need--

FABIAN KAMMEL: Yes, exactly.

ABDEL SGHIOUAR: --in case you need access to that information.

OK. So that's partially what remote attestation does, if I get it correctly, right?

FABIAN KAMMEL: Mhm.

ABDEL SGHIOUAR: Is there anything else-- I mean, for me, the first time I heard about remote attestation was in the context of signing an image, for example, before it's deployed to a target system-- by image, I mean a container image before it's deployed to a Kubernetes cluster. So you have Kubernetes cluster. The Kubernetes cluster has a bunch of keys, or has basically a remote CA. And then you would sign your images with certificates coming from that CA. You would deploy them to Kubernetes. Kubernetes would reach out to verify, have these images been attested. Which, I guess, is the same concept.

FABIAN KAMMEL: Exactly. It's a similar concept. So also the remote attestation report you get from the confidential VM, this is signed usually with an X.509 certificate chain. And this is the same thing we also do for software binaries for container images. So we have a digest we expect to run, and we have it signed so we can cryptographically verify it and tie it back to some root of trust. For CVMs, this would be AMD, or the chip vendor. For software or container image, it's usually the software engineer or the release engineer who produced the binary.

ABDEL SGHIOUAR: Yeah, so a way to cryptographically verify the origin of a certain piece of artifact, essentially.

FABIAN KAMMEL: Yes.

ABDEL SGHIOUAR: Cool. We talked very briefly about the CIA triad, which I'll briefly mention. CIA triad in security means confidentiality, integrity, availability, which are the three main things that you need to ensure exist to say that a system is secure. I'm dumbing this down quite a lot. So can you give us a little bit more-- can you shed light on CIA and how it plays out in the context of TEEs?

FABIAN KAMMEL: Yeah, definitely. So, as you mentioned, this acronym, or these three pieces is usually a good frame or framework to start to think about the security of your system. You want to keep the information confidential. Again, we talked about this a lot. We have memory encryption. Our sensitive data inside the trusted execution environment is confidential from everyone outside of the trusted execution environment. It's called confidential computing after all.

So integrity protection, we have multiple levels of integrity protections. So we talked about attestation, and we can basically verify the integrity of the system remotely. But we also talked about the secure nested paging feature of the AMD chips. So it has not only memory encrypted pages, but these are also integrity protected. They achieve this by using AES, so the advanced encryption standard, in a particular mode called Galois/counter mode for all the crypto nerds out there, which is achieving both confidentiality and integrity in one go.

So if someone messes with the memory pages, even if they can't read it, they try to flip a bit or something to attack your application, the processor will actually notice that someone tried to manipulate the memory page. So a lot of integrity checks going on.
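
The tamper detection described above can be illustrated with a small sketch. Python's standard library has no AES-GCM, so this uses an HMAC-SHA256 tag as a stand-in: it shows only the integrity half of the idea, not the encryption, and it is not how the hardware implements it. The function names, the key, and the page address are hypothetical.

```python
import hashlib
import hmac
import os


def tag_page(key: bytes, page_address: int, page: bytes) -> bytes:
    """Compute an integrity tag over a memory page and its address."""
    # Including the address means a valid page cannot be silently moved elsewhere.
    return hmac.new(key, page_address.to_bytes(8, "big") + page, hashlib.sha256).digest()


def verify_page(key: bytes, page_address: int, page: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_page(key, page_address, page), tag)


key = os.urandom(32)   # stands in for a key that never leaves the CPU
page = bytes(4096)     # one 4 KiB page of (all zero) example data
tag = tag_page(key, 0x1000, page)

# An attacker who cannot read the page flips a single bit anyway.
flipped = bytearray(page)
flipped[42] ^= 0x01

untouched_ok = verify_page(key, 0x1000, page, tag)
flipped_ok = verify_page(key, 0x1000, bytes(flipped), tag)
moved_ok = verify_page(key, 0x2000, page, tag)
```

Only the untouched page at its original address verifies; the single flipped bit and the relocated page are both rejected.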

ABDEL SGHIOUAR: I see.

FABIAN KAMMEL: Availability is an interesting one. Imagine you rent out a confidential virtual machine from Google Cloud, and let's imagine it has the property of availability. It guarantees availability to the user of the confidential virtual machine. And I'll just take back my credit card, and Google is not able to shut down the CVM. This is not going to fly.

ABDEL SGHIOUAR: Of course.

FABIAN KAMMEL: The hypervisor still has full control over the state of the virtual machine. Is it running? Is it stopped? Is it destroyed? But it cannot look into the contents of the VM.

ABDEL SGHIOUAR: I see. In the article, it explicitly said that availability is pretty much outside of the control of confidential computing because it is something that you typically want to have the cloud provider ensure. The example you mentioned of credit card is probably an extreme example. The one I was thinking about is like, what if the HSM storing the keys that act as the source of truth, or as the root CA, is not available anymore-- available from availability point of view, right? So how do you ensure the CVM will boot and get encrypted and decrypted the way it's supposed to?

FABIAN KAMMEL: Yeah, exactly. So I think the point in the article we tried to make was availability is really a concern of the cloud provider. They want to have good availability and SLAs to get customers. So it's really nothing that confidential computing needs to tackle, like the cloud provider will make sure that stuff is up and running, if they want to keep you as a customer.

ABDEL SGHIOUAR: Yeah, of course. I mean, it's in their interest, right?

So let's jump boats a little bit. Because this is, after all, the Kubernetes Podcast, and I think a lot of people will care about this. So all this stuff you've been discussing, how does it play out in the context of Kubernetes itself?

FABIAN KAMMEL: Right. There are a lot of interesting use cases actually. So this totally depends on the scenario and who do you trust inside your Kubernetes cluster? So we mentioned, you could just use confidential virtual machines. They are available on most public cloud providers today, even GA or like private preview, and you can just spawn your Kubernetes control plane, but also all the worker nodes inside a confidential VM instead of a classical one. And this will give you all the good properties we discussed today.

You have the memory encryption part. You can verify that the correct operating system and so on, was booted. And this will basically wrap your complete cluster and shield it from other folks inside the same cloud provider and inside this multi-tenant environment, but also the cloud provider itself, and attackers on the network. So you can be sure that you're inside your own context.

There might be other scenarios where you either don't trust your administrator, or your administrator being the admin of your Kubernetes cluster. So when you have a sensitive workload you want to deploy inside the Kubernetes cluster, the admin might not have authorization to actually see the data you want to process. So here it gets a little bit more complicated.

Just moving one worker node inside a CVM is a bit problematic because you always have this kubelet running in the CVM, which basically gives the admin full code execution in your confidential virtual machine, which is the compute node. So you have gained basically nothing, in terms of security.

ABDEL SGHIOUAR: Yeah. So I was about to ask, as a follow-up question, because the way you mentioned CVM in Kubernetes context is, I am using Kubernetes clusters on top of a third-party platform. I use a confidential virtual machine to make sure my worker nodes and potentially even my control plane runs inside the trusted environment. And then I can ensure that the cloud provider does not have access to my environment.

However, in a multi-tenant environment where you are running one cluster for multiple tenants, then it becomes challenging with the example you mentioned, which is the kubelet but also, while the virtual machines are still talking to a control plane, which is multi-tenant, across all the cluster.

FABIAN KAMMEL: Yeah. So if you think about how our major cloud providers are basically running Kubernetes as a service for you, usually they put a lot of things together to save on costs, like have a multi-tenant database or control plane. So just wrapping that inside a confidential virtual machine context-- again, this shields you from attackers outside of that context. So that is good. There is a security benefit. But it won't shield you from everyone who is inside the same security context.

So if you move your Kubernetes cluster inside, everyone who's inside the Kubernetes cluster, again, it's just transparent for them. They are just shielded from the outside world, not from other workloads in the same Kubernetes cluster.

ABDEL SGHIOUAR: Exactly. So the example that comes to mind maybe is something like storing secrets in etcd. Because the database would be multi-tenant across all the tenants. So is there anything in CVM that is trying to address this? Or it's more like-- because this sounds like a Kubernetes problem, really, not a virtual machine problem.

FABIAN KAMMEL: Yeah. As you said, it's more a Kubernetes problem. It ties back to the architecture that Kubernetes chose. You could try and bring in enclaves again. As we mentioned at the beginning of the show, these are process level-based isolations. If you are able to just move etcd into its own process and isolate that from the rest of the system, again, you would gain some sort of security guarantees. So everyone who's not running inside the process will not be able to read the memory directly, but has to interface with the etcd API.

But as with security, you have to look at the use case. You have to look at the threat model. And then find a solution that works for you.

ABDEL SGHIOUAR: Yeah, obviously, I was just trying to get your thoughts on that specific topic. It feels like CVMs in the context of Kubernetes will play out very nicely if you are a single tenant on a single cluster, and you've shielded yourself away from the platform in which you are running those virtual machines, but not from same tenants on the same-- well, maybe tenants on the same cluster if you want to guarantee some sort of isolation at the virtual machine level. But still you have a shared control plane, which is-- We'll see, I guess, how this progresses.

FABIAN KAMMEL: Exactly.

ABDEL SGHIOUAR: So then let's talk about this thing that has my favorite name, COCO-- confidential containers. What is this?

FABIAN KAMMEL: Yes, exactly. And they actually try to address some of the concerns we mentioned just earlier. COCO, or confidential containers, is basically an interface that provides Kubernetes with the known pod interface so it can basically schedule a confidential container for you in Kubernetes. But in the background it uses all the cryptographic or the confidential primitives we discussed today.

So COCO can use confidential VM and just deploy your one pod in that confidential VM context, but have it scheduled by Kubernetes as it would have been like a plain old pod.

ABDEL SGHIOUAR: OK. So it's a VM inside the container, essentially.

FABIAN KAMMEL: Yes.

ABDEL SGHIOUAR: OK. That's pretty straightforward to understand. I think COCO is not the only project trying to do this-- by this, I mean VMs inside containers. Just COCO is adding the confidential aspect of it. Because Kata containers or-- what's that AWS project--

FABIAN KAMMEL: Firecracker.

ABDEL SGHIOUAR: Sorry, Fabian-- Firecracker OS, thank you. It's doing the same thing. It's a virtual machine inside the container. COCO is just adding the confidentiality layer to it, essentially.

FABIAN KAMMEL: Yeah, exactly. So they are able to fall back onto this remote attestation report and bring in all the information that is available there, do additional sorts of verification, and give you these additional security guarantees.

ABDEL SGHIOUAR: Yeah. I could see that being a very valid use case for still using somebody else's Kubernetes cluster for jobs, where you have a job that you need to execute. It has a finite amount of time it will be running for. And basically, you're exposing an API server-- I mean, a Kubernetes API server-- you just send your job definition, and then you use COCO to say, run my job inside the confidential computing, which is a virtual machine, and then get back my data, but also the report of how-- where is this thing running.

FABIAN KAMMEL: Yeah, exactly. A lot of the work that COCO folks are doing is basically figuring out which parties do you need to trust. And again, this kubelet comes into play. The Kubernetes control plane basically has full control over all the execution that is carried out in your worker nodes. So they're working very hard to get a good security story for confidential containers in a somewhat untrusted Kubernetes cluster.

ABDEL SGHIOUAR: We'll see. We'll add some links to the projects. I think it will be interesting to keep an eye on to see how that progresses.

OK, we talked quite extensively about confidential virtual machines. I think the first thing that comes to mind, as we are discussing, is the overhead. I mean, this must have an overhead. But instantiating a virtual machine with a full encrypted memory, there is some additional processing that needs to be spared to do this. Is the overhead so significant that it actually makes this difficult to adopt to justify from a cost perspective?

FABIAN KAMMEL: I don't think it makes it difficult to justify. So there's definitely some sort of overhead. We don't have a lot of good, real-world benchmarks to actually put numbers to it. We have some vendor benchmarks which are in the range of like 1% to 5%, and depending on whether you have heavy disk usage or so, it might go up to like 5% or 10%. But if you weigh the cost of 10% additional compute overhead against the security gains, I think it's well worth it for most applications, and especially for applications that handle sensitive data.

So we see a lot of machine learning is getting more and more popular. And you need to have the compute to train your models. And if you want to train on stuff like medical data, or financial data, and all this sensitive information, you really need to have a trusted environment to carry out these computations. And then 10% overhead, I think, is fine, a good price to pay.
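To make the trade-off concrete, here is a minimal back-of-the-envelope sketch of what the quoted overhead range would mean for a compute bill. The baseline figure is hypothetical, and the percentages are only the rough vendor-benchmark ranges mentioned above, not measured results.

```python
def cost_with_overhead(baseline_cost: float, overhead_fraction: float) -> float:
    """Return the cost after adding a fractional compute overhead."""
    return baseline_cost * (1.0 + overhead_fraction)


# Hypothetical monthly compute bill, used purely for illustration.
baseline = 1000.0

# Rough overhead range quoted in the conversation: 1% to 10%.
for overhead in (0.01, 0.05, 0.10):
    total = cost_with_overhead(baseline, overhead)
    print(f"{overhead:.0%} overhead -> {total:.2f}")
```

Even at the top of the quoted range, the extra cost scales linearly with the baseline, which is the comparison being weighed against the security gains.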

ABDEL SGHIOUAR: Yeah, again, as you said, it always depends on the context. Maybe 10% in a very high sensitive environment, it's justifiable because the benefit you get from it outweighs the cost. Cool.

Then, before we close out, what are other use cases that we have for confidential computing? We've been discussing quite a lot, Kubernetes itself, but are there any other use cases?

FABIAN KAMMEL: Yeah, definitely. And I'm super keen on talking to folks on how to get confidential computing in your use case and basically take advantage of it. I previously spoke at SigstoreCon at the end of last year, where I presented an idea on how to use confidential computing to basically have verifiable build runners in the cloud. So when you think of GitHub Actions, you want to know which runner you are using, and does the runner use a good set of compiler toolchain and operating system. Is this all actually a genuine compiler toolchain? Or has someone put a backdoor into my compiler?

So if I use this feature of remote attestation, I can basically store this attestation report with my build, and my final binary, and so on, and check that no one put in a backdoor in the compiler when my program was built.
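The idea of storing an attestation report next to a build and checking it later can be sketched roughly as follows. This is a simplified illustration, not how any real attestation service works: the function names are hypothetical, and an HMAC with a shared key stands in for the hardware-backed asymmetric signature that a real platform would produce.

```python
import hashlib
import hmac

# Assumption: stand-in for a key rooted in hardware. Real attestation
# reports are signed with vendor-certified asymmetric keys instead.
HARDWARE_KEY = b"stand-in-for-hardware-root-of-trust"


def measure(toolchain: bytes) -> str:
    """Hash the toolchain contents to obtain a measurement."""
    return hashlib.sha256(toolchain).hexdigest()


def make_report(toolchain: bytes) -> dict:
    """Produce a signed report over the measured toolchain (runner side)."""
    measurement = measure(toolchain)
    signature = hmac.new(
        HARDWARE_KEY, measurement.encode(), hashlib.sha256
    ).hexdigest()
    return {"measurement": measurement, "signature": signature}


def verify_report(report: dict, expected_measurement: str) -> bool:
    """Check the signature, then compare against a known-good measurement."""
    expected_signature = hmac.new(
        HARDWARE_KEY, report["measurement"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected_signature, report["signature"]):
        return False  # the report itself was tampered with
    return hmac.compare_digest(report["measurement"], expected_measurement)


# Usage: a genuine toolchain passes, a backdoored one does not.
good_toolchain = b"compiler v1"
reference = measure(good_toolchain)
print(verify_report(make_report(good_toolchain), reference))
print(verify_report(make_report(b"compiler v1 + backdoor"), reference))
```

The report would be stored alongside the final binary, so anyone holding the reference measurement could repeat the check later.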

ABDEL SGHIOUAR: Oh, interesting. OK. I think that's a very interesting use case, which is basically tied to what we've been discussing, which is running your code on untrusted, in quotes, environments, and being able to have a way to cryptographically verify what's going on there.

FABIAN KAMMEL: Yes, exactly.

ABDEL SGHIOUAR: Right. Cool.

There is a lot of things that we have been talking about, I guess, that people will have to do a lot of googling to understand the terminologies. Before we close this out, I would like to ask you about PKI on on-road vehicles. What's that?

FABIAN KAMMEL: So PKI is the public key infrastructure.

ABDEL SGHIOUAR: Key infrastructure.

FABIAN KAMMEL: This is basically the public private key pairs you need for the scenarios of certificates, and signing, all the asymmetric cryptography you want to do. So like cars, in the past, were not very smart, but in the future, we could imagine our cars communicating with each other, or communicating with road signs. If there's construction work going on, you want to warn your car, there's a construction heading your way, like a construction site. You want to slow down.

But you also want to make sure that this is secure. Like an attacker could just send a random warning notification and cause total mayhem on the streets.

ABDEL SGHIOUAR: Of course, yeah.

FABIAN KAMMEL: So in order to enable all of these use cases, you need identities. This is basically the foundation for every cryptographic system you want to build. So we built a PKI that gives certificates or identities to each car that is on the road. We actually did some tests on the German Autobahn to have cars communicate with the road signs, construction signs, and so on. Yeah, all of this is in place and getting continuously developed. And hopefully, in the future, we have autonomous cars talking securely with each other, with persons on the streets, with the road signs, everything interconnected. And this is the enabling tech stack for that.
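The role an identity plays in rejecting a fake warning can be illustrated with a deliberately tiny signature sketch. It assumes a textbook RSA key pair built from two small, well-known Mersenne primes, so it is insecure and for illustration only. It also leaves out certificates entirely, and the message contents are invented.

```python
import hashlib

# Toy RSA parameters (assumption: illustration only, far too small to be secure).
P = 2**61 - 1                      # Mersenne prime
Q = 2**31 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign with the private exponent; only the key holder can do this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verify with the public values (N, E); anyone can do this."""
    return pow(signature, E, N) == digest(message)


# Usage: a road sign signs a warning; a car checks it before reacting.
warning = b"construction site ahead, slow down"
signature = sign(warning)
print(verify(warning, signature))
print(verify(b"road is clear, speed up", signature))
```

In a real PKI, the public key would arrive inside a certificate issued by a trusted authority, which is what lets a car trust a sign it has never seen before.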

ABDEL SGHIOUAR: As you were talking, I thought that's a super interesting use case. Because most of the cars, as you said, today are connected. And the question is, even as a car provider, how do you actually trust that the car that is trying to do an OTA, an over-the-air upgrade, is a car that you have issued. Before, you had to take your car to the garage, and then they had to hook a physical machine to it and verify the signature somehow.

FABIAN KAMMEL: Yeah, exactly. Just by providing your car with an identity, the car can reach out to the manufacturer, say, I am this type of car from this manufacturer. Please give me my software update. And you can be sure that this car is actually genuine. It will receive the correct update, and so on.

ABDEL SGHIOUAR: Yeah.

FABIAN KAMMEL: You also have a lot of anonymity and pseudonymity concerns. You don't want to be tracked driving around. So actually, you need a lot of road sign certificates to keep the privacy of the end user and so on. So lots of exciting challenges in that space.

ABDEL SGHIOUAR: Yeah, that's definitely a lot of interesting things to keep an eye on. I'm very curious to see how this plays out. And I'm very curious to see how somebody would show up at DEF CON 10 years from now and show how they managed to defeat all of this. So it'll be cool.

FABIAN KAMMEL: Definitely. Looking forward.

ABDEL SGHIOUAR: All right. Well, thank you very much, Fabian. It was fantastic talking to you.

FABIAN KAMMEL: Yeah. Thanks for having me.

ABDEL SGHIOUAR: I guess when I reached out to you, one of the first things you said, Oh yeah, I'm a long-time listener. And thanks for just accepting to talk to us on such a short notice.

FABIAN KAMMEL: Thank you so much.

ABDEL SGHIOUAR: Thank you very much for joining us for the show. We've been talking with Fabian. You can find him on LinkedIn, Twitter at datosh18 or GitHub at datosh. Thank you very much.

[MUSIC PLAYING]

Welcome back to the show, Kaslin.

KASLIN FIELDS: Thank you. It's good to be back.

ABDEL SGHIOUAR: We missed you. You've been busy.

KASLIN FIELDS: Yeah, I had a great time being in Japan for a lot of that.

ABDEL SGHIOUAR: Awesome.

KASLIN FIELDS: It's been a busy time though.

ABDEL SGHIOUAR: I guess, by the time this episode airs, people probably will have figured out that you actually conducted part of episode 210 in Japanese when you were at WasmCon.

KASLIN FIELDS: Hopefully. I'll probably not keep that secret for very long because I'm extremely excited about that. I hope you all listened to the WasmCon episode because I did an interview in Japanese, and I'm very proud of it. And also, Chloe Condon helped do the dub. Thank you so much to Chloe Condon for doing that.

ABDEL SGHIOUAR: Yeah, it was amazing.

KASLIN FIELDS: And also big thanks to my teammate Brian Dorsey for helping with the translation, like to check that my translation was right. Because I performed the interview in Japanese, and I translated it myself.

ABDEL SGHIOUAR: Yeah, it was awesome. I don't understand Japanese so I was just waiting for the English part.

KASLIN FIELDS: Hopefully you liked it.

ABDEL SGHIOUAR: It was, I mean, being able to conduct an interview in a foreign language is not easy, right? So we'll see. Maybe I'll do some parts in French when we go to KubeCon Europe in Paris.

KASLIN FIELDS: And there's also going to be some more interviews conducted in a foreign language for our KubeCon episode. I suppose it doesn't count as a foreign language if it's sign language.

ABDEL SGHIOUAR: Well, it's a language. It's a way of communicating--

KASLIN FIELDS: --through an interpreter. Very excited about that.

ABDEL SGHIOUAR: Awesome.

KASLIN FIELDS: For all of you who made it to this point in the episode, that's a preview for you.

ABDEL SGHIOUAR: Yes.

KASLIN FIELDS: But let's talk about confidential computing.

ABDEL SGHIOUAR: Sure.

KASLIN FIELDS: So thank you so much, Abdel, for this interview. This is a topic that I keep hearing, of course, because security is always popular, and important, and great to cover. So I hear about confidential computing every now and then. Honestly, it just sounds great as a term.

But I've never really looked into what it means beyond the very surface level of like, you don't let people see into your compute. But definitely got a lot more detail on that in this conversation. There were a lot of concepts that were new to me, like the trusted execution environment, which sounds like it's more of a process level form of the isolation. Whereas what I've heard most about probably is the confidential virtual machines.

And then y'all also talked about a whole bunch of other stuff, like memory-protected environments by Intel. And you also mentioned swap memory. So there were a lot of different technologies that you talked about.

ABDEL SGHIOUAR: Yeah, definitely. It's a very fascinating space due to the fact that it all boils down to a bunch of very basic requirements. That's what we mentioned in the episode, the CIA triad, which is the holy grail of security. But let's start with the TEE, so trusted execution environments. Those things existed for a very long time. So computers, PCs had these things called TPMs, which is the trusted platform modules, which is basically where your cryptographic keys are stored. And that's technically a form of TEE.

So it's not new per se. It's something that existed for a while. I think what's fascinating in this space is trying to adjust those kind of capabilities into the cloud environment.

KASLIN FIELDS: Yeah. And it sounds like there's a variety of solutions for those in the know. But it sounds like the confidential virtual machines are the easiest to use. So pretty much all of the cloud providers have some kind of solution there. And they're easy to use because you get into the VM, and that's the confidential part is whatever is inside of the virtual machine is its own space.

ABDEL SGHIOUAR: Exactly. So it's like, you can think about it as on a high level process isolation, but at the VM level. So even the hypervisor is not aware of what's going on. So it's fully encrypted. And then you have cryptographic way of verifying the integrity of the virtual machine itself. That's the attestation part. So it's a pretty interesting space.

I mean, we work for a cloud provider so for us we don't really think about it that much. But I think if you are a customer, and if you're operating in certain sensitive areas-- I'm thinking like defense, or health care-- basically not trusting your cloud provider is quite natural, I guess, in a way. You're running on somebody else's hardware. So how do you make sure that whoever you are running your stuff on doesn't look into your data.

KASLIN FIELDS: Yeah. Always an issue that folks want to think about.

ABDEL SGHIOUAR: Exactly.

KASLIN FIELDS: And trusted execution environments versus confidential virtual machines, the different ways that they kind of isolate your workloads, it just reminds me so much of containers. Because containers can be anything from just basically a process that's running in Linux with a little bit of stuff around it to kind of isolate it, to something that's basically a virtual machine running within the machine.

So it's kind of the same thing, but with a special focus on security.

ABDEL SGHIOUAR: Yeah, pretty much. That's one way of looking at it as well.

KASLIN FIELDS: Another part that you all talked about that I thought was really interesting is the TPMs versus HSMs. Also, I thought TPM at first was TPU because I've just been having a lot of conversations about TPUs and GPUs and how they affect AI workloads, of course, because we're all talking about that kind of stuff. But no, it's not TPUs. It has nothing to do with AI really.

It's trusted platform modules and hardware security modules. So one thing that I caught there that I think is a nice tidbit is that TPMs, trusted platform modules, are super commonplace, and they're relatively cheap to implement. But there's only so much that they can do. They only have a few kilobytes of memory. They can only hold a few keys. And they can only do a few different types of cryptographic operations.

However, the hardware security module sounds intense.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: They're really expensive. They've got features in place to help prevent even folks from trying to steal the hardware itself. So they sound very secure.

ABDEL SGHIOUAR: Yeah. That's what they call usually anti-tampering measures. You can think about it as like, if you watch any American action movies, anti-tampering is basically having a little bag of dye inside an envelope of money. And if somebody tries to steal the money, it will dye all the notes. It's kind of the same concept.

KASLIN FIELDS: Like on clothes too, those tags that they put on clothes that have the ink in them.

ABDEL SGHIOUAR: Yes, exactly. It's the same concept, except it costs more-- like millions of dollars, I guess.

KASLIN FIELDS: Yeah, and also involves hardware and destroying the stuff on it.

ABDEL SGHIOUAR: Exactly. I just remember while we were talking, when I was working in data centers, we had to install HSMs because-- for reasons. Before I go there, one thing that people, I think, don't think about it too much is that they think about HSMs as just devices that store and supply keys. They actually do more than that. They can execute the cryptographic operation inside the hardware itself.

KASLIN FIELDS: Yeah. You don't mention that in the interview.

ABDEL SGHIOUAR: Because, of course, if you have the keys stored in the HSM, the last thing you want to do is take them out of the HSM so you can execute the operation. You want them to be-- you want to execute the cryptographic operation close to where the key is.

So that's why they are usually beefy machines with a lot of CPU, a lot of memory, et cetera.
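The point that an HSM runs the operation itself, rather than handing out the key, can be sketched as a small interface. This is a conceptual toy under stated assumptions: the class name is hypothetical, an HMAC stands in for whatever operations a real device offers, and Python attribute hiding is not a security boundary the way tamper-resistant hardware is.

```python
import hashlib
import hmac
import secrets


class ToyHsm:
    """Toy model: the key is created inside and no method ever returns it."""

    def __init__(self) -> None:
        # Generated internally; callers never see or supply the key.
        self.__key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        """Run the cryptographic operation next to the key."""
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Check a tag without exposing the key to the caller."""
        return hmac.compare_digest(self.sign(message), tag)


# Usage: callers send data in and get results out, never the key.
hsm = ToyHsm()
tag = hsm.sign(b"payload to protect")
print(hsm.verify(b"payload to protect", tag))
print(hsm.verify(b"a different payload", tag))
```

Because every operation happens inside the device, the amount of compute it needs grows with the workload, which fits the description of HSMs as beefy machines.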

KASLIN FIELDS: Wow. It's like a security box.

ABDEL SGHIOUAR: Exactly. It's a secure box. But what I was saying is that I remember when I was working in data centers, we had to install those things, and then you hold it in your hands and you try to put it in a rack, and they're like, oh, suddenly you're reminded this is like $2 million.

KASLIN FIELDS: Oh!

ABDEL SGHIOUAR: Like one piece of hardware.

KASLIN FIELDS: Wow. I had never really thought about-- the security needs of our industry are so great that someone-- many people had to come together and think about all of these tricky ways that we can ensure that things are what they say they are, and just put all of that into a single box even. So hardware security modules kind of blew my mind here.

ABDEL SGHIOUAR: Yeah, it's a very fascinating space.

KASLIN FIELDS: Yeah. And speaking of making sure that things are what they say they are, another thing that you all talked about quite a bit was the PKIs for autonomous vehicles, which when we started talking about autonomous vehicles several years ago-- we started talking about autonomous vehicles before that, I'm sure. But when it really started to become a conversation in the public sphere, I feel like this was part of the conversation of we need to have vehicles that can communicate with the road, and communicate with other vehicles, which would make the whole problem of self-driving cars, honestly, a lot easier, compared to right now, where they have to just see everything around them and try to figure it out, like we do as humans, which computers have other challenges with.

So that was really interesting to hear about that Fabian had worked on that kind of space.

ABDEL SGHIOUAR: Yeah. So that's something I had never thought about until I chatted with Fabian. Because, for me, it sounded like, oh, you have a car. Your car talks to a remote server somewhere, or talks to another car, or talks to a road sign maybe, stuff like that. I never thought about the authentication and authorization part. I just thought, just an API, right? You have a key. You store the key in the car, and then-- by key, I mean a JWT token or something, not a public key. I was reminded of that when--

And this is probably not related to confidential computing per se, but I do have a friend who's working with a music streaming company-- without mentioning names-- and their clients are not always mobile and web. Sometimes they have cars as clients, like you have the app of that company built into your car-- all the new cars.

KASLIN FIELDS: Sometimes they have cars as clients.

ABDEL SGHIOUAR: Yeah, literally. Because--

KASLIN FIELDS: Great sound bite right there.

ABDEL SGHIOUAR: Because sometimes the Infotainment of the car, the system that the car manufacturer builds, have the client installed on it, and then you can just connect-- log in with your account and then stream.

And so we were chatting about it, and I was asking this person, this friend, so how do you identify who is trying to connect to your backend? And they were like, yeah, PKI-- public private keys. So it's stuff you never think about, but it's like, OK, yeah, sure. That sounds interesting.

KASLIN FIELDS: Yeah, thinking about attacks and security of autonomous vehicles is like one of the scariest sci-fi things of the modern world, in my opinion.

ABDEL SGHIOUAR: Oh, yes. Yes. That sounds like an action movie waiting to happen.

KASLIN FIELDS: So I'm glad folks are working on that.

ABDEL SGHIOUAR: I'm also glad people are thinking like oh, let's just have something more secure than just random tokens and passwords.

KASLIN FIELDS: Yeah, that's kind of important.

ABDEL SGHIOUAR: Exactly.

KASLIN FIELDS: So I definitely learned a lot from this conversation. I'm glad that you all did this because I learned so many new terms that I need to go read up on now. We took a bunch of show notes. So make sure if you also want to read up on these terms, like I do, to go check out the show notes for links to the whitepaper and all sorts of other resources about stuff that Abdel and Fabian talked about.

ABDEL SGHIOUAR: Awesome.

KASLIN FIELDS: Yeah. All right. We'll see you next time.

ABDEL SGHIOUAR: Thank you very much.

KASLIN FIELDS: We'll do the wrap-up now.

ABDEL SGHIOUAR: Sorry.

KASLIN FIELDS: Welcome to us trying to wrap up this episode. We just went off in a random direction, and now we're here.

ABDEL SGHIOUAR: Yes.

KASLIN FIELDS: So thank you again for that interview, Abdel.

ABDEL SGHIOUAR: Thank you.

KASLIN FIELDS: We'll see you all on our next episode.

[MUSIC PLAYING]

That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at Kubernetes Pod or reach us by email at <Kubernetespodcast@google.com>. You can also check out the website at Kubernetespodcast.com where you'll find transcripts, and show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show.

Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]