Kubernetes Podcast from Google: Episode 189 - Ambient Mesh, with Justin Pettit and Ethan Jackson

#189 September 21, 2022

Ambient Mesh, with Justin Pettit and Ethan Jackson

Hosts: Craig Box

When you think of a service mesh, you probably think of “sidecar containers running with each pod”. The Istio team has come up with a new approach, introduced recently as an experimental preview. Google Cloud software engineers Justin Pettit and Ethan Jackson join Craig to explore ambient mesh.

Do you have something cool to share? Some questions? Let us know:

Chatter of the week

News of the week

Links from the interview

Nicira
Open vSwitch
Introducing Ambient Mesh
- Service mesh
First mention of Ambient in 2018
No first class support for sidecars in Kubernetes
Istio working group meeting, August 2021
- Remote proxy proposal
HBONE: HTTP/2-based overlay network environment
mTLS
HTTP Connect
GIF
MASQUE and QUIC
Get started with Ambient Mesh
Ambient Mesh Security Deep Dive
Justin Pettit and Ethan Jackson on Twitter

Transcript


CRAIG BOX: Hi, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Craig Box.

[MUSIC PLAYING]

CRAIG BOX: There are a lot of podcasts to listen to, and the episodes just keep coming. I know it's hard to keep up. I know it's hard to keep up with making them. On one hand, I had a tweet an hour or so after the release of the last episode saying how great an interview it was. Seriously, if you're new to the show, please do go back and listen to Episode 188. It's quite important.

But on the other hand, I had a tweet last week from a listener who said they had finally caught up to being only one year behind. The trouble is, there's just so much news. It's important to be timely, but timeliness is a tough needle to thread. Every now and then, a piece of late-breaking news will come in between recording and release of the episode.

Last week, I sent the show off to the editors, went to bed, and woke up to the news that the Queen had died. Thus kicked off a period of both national and household mourning. I wouldn't say that I'm an ardent royalist, but I appreciate the fact that this one incredibly dedicated person has provided consistency and guidance to the leaders of both of my countries for over 70 years.

As for whether or not that will stay true with the new guy, he has done a pretty good apprenticeship. Wall-to-wall coverage of The Queen moved on to a 24-hour live stream of “The Queue” as 750,000 mourners filed past Her Majesty's coffin, queueing for up to 24 hours for the privilege to do so.

I'm back in New Zealand now, but had I been in London, there's a very real chance I would have done so too. Maybe not so much for now, but so that one day I could look back and say that I had. People say there's never been anything like it. It turns out there has, but it was over 70 years ago, and most of us have forgotten. People were better dressed in the last queue. At least the rain mostly held off. We've had it all down under. If you listen very carefully, you might even hear it now. Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: This week's new CNCF incubating project is Cloud Custodian. Originally built at Capital One, the project is a rules engine for policy definition and enforcement. It supports GCP, AWS, and Azure environments, with support for Kubernetes on the roadmap. Custodian joined the CNCF sandbox in August 2020 and claims 100 million downloads since then.

Google has announced general availability of virtual machine support in Anthos. While GCP customers could technically run VMs in containers in VMs, the target use case is Anthos on bare metal, now known as Google Distributed Cloud Virtual.

The Anthos VM runtime is based on the KubeVirt project, with extensions to integrate with the Google Cloud operations suite, support for multiple network interfaces for VM mobility, and VLAN integration with network policies to provide a VPC-like microsegmentation experience. Google also announced that control plane metrics for GKE are now generally available, enabling you to see API server and scheduler status and errors.

The Kubernetes Security Response Committee has reported two new vulnerabilities. The first allows an aggregated API server to redirect client traffic to any URL, and is rated medium severity. The second relates to Windows workloads running as ContainerAdministrator even when runAsNonRoot is set, and is rated low. If you're using aggregated API servers or Windows containers, patches are available for Kubernetes versions 1.22 through 1.25.

Argo vendor Akuity, whose founder and CTO Jesse Suen we spoke to in Episode 172, has announced Akuity Platform. As you might expect, the platform offers fully managed Argo CD as a hosted service, handling provisioning, scaling, upgrades, and security patching. The service is made possible by a new “data plane/control plane” separation where each cluster runs its own agent and controller. Akuity says this can reduce network traffic by as much as 80%.

In case you prefer Flux, its parent company, Weaveworks, has released a new version of Weave GitOps this week with policy-as-code integrations and team workspaces.

First introduced as a commercial product in February, Coroot has moved some of its key functionality to open source and the Community Edition. Coroot is a monitoring and troubleshooting tool for microservice architectures. It offers eBPF-based service mapping, pattern analysis of logs, and advanced Postgres observability, and is now under the Apache license.

German cybersecurity startup Edgeless Systems has released Constellation, a distribution of Kubernetes that uses Confidential Computing. Constellation allows you to keep a Kubernetes cluster verifiably shielded from its underlying cloud infrastructure. Features include Sigstore-based attestation of Kubernetes nodes and artifacts and automatic config-free encryption of cloud storage and node-to-node networking. However, GKE Confidential Nodes might like to have a word with their claim of being the first Confidential Kubernetes.

We are three weeks from this year's Google Cloud Next, where you can get hands-on with the newest products and technologies and meet with the engineers who build Google Cloud. This year, join us online as the team comes to you live from the Google Cloud campus in California and from offices around the world to connect with people who speak and code in your language. Next starts on October 11 at 9:00 AM PT. Registration is now open, and you can find the link in the show notes.

Finally, Dell's Apex cloud services division has announced an extended partnership with Red Hat. Coming after the vendor spun off VMware last year, the new offerings include Dell-managed on-premises Container-as-a-service solutions on Dell infrastructure powered by Red Hat's OpenShift. Reached for comment, Dell Apex marketing lead Adam Glick said, “I can't believe you put this fake quote in the news just for the people who liked the show better when I was still on it.” And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Justin Pettit and Ethan Jackson are software engineers at Google Cloud and leaders of the project to build Istio's Ambient Mesh. Welcome to the show, Justin.

JUSTIN PETTIT: Thank you. It's great to be here.

CRAIG BOX: And welcome, Ethan.

ETHAN JACKSON: Hey. Thanks for having me.

CRAIG BOX: You two are both reasonably new to Google and the Istio project, but I understand you're not new to each other?

JUSTIN PETTIT: That's right. Ethan and I worked for a startup called Nicira where we were two of the lead developers on a project called Open vSwitch, which was an early open source SDN programmable switch. And so we worked for a number of years on that together, and on a couple of other open source projects around SDN and network virtualization.

CRAIG BOX: Do you have any stories about each other that you want to open up with?

JUSTIN PETTIT: Ethan's quite a bit younger than I am. And I remember, at one point, I was in my office, and Ethan came bursting in and said, you know what? Ethernet used to come in 100-megabit. When I was in college, actually, it was a big deal when it went to 100 megabit. That was, like, the big, new technology. But he didn't even know that existed.

CRAIG BOX: Gosh, Ethan, you must be a bit younger than me, even, if that's your idea of what slow is.

ETHAN JACKSON: I'm younger than I look.

CRAIG BOX: And Ethan, was Justin your boss?

ETHAN JACKSON: I'd say, like — he was my tech lead, maybe. I'm not even sure who my boss was back then.

CRAIG BOX: Was he a good guy to work for?

ETHAN JACKSON: No, he's terrible.

CRAIG BOX: Has he improved over time, aged gracefully?

ETHAN JACKSON: Not really. He's gotten worse.

CRAIG BOX: I understand you're also still mid-PhD. How's that going?

ETHAN JACKSON: It's going. About once a quarter, I meet my advisor and we talk about when I should finish it, and then I get busy at work.

CRAIG BOX: No concrete dates, then?

ETHAN JACKSON: Yeah. Maybe when I'm your and Justin's age, I'll take a sabbatical to finish it.

CRAIG BOX: Well, congratulations to you both on the launch of Ambient Mesh. Sell me on Ambient Mesh in a sentence.

ETHAN JACKSON: Ambient Mesh is a new data plane mode for Istio that's designed to feel ambient. By that, we mean that, from the perspective of an application running in Kubernetes, Istio is just part of the network. The application doesn't have a sidecar, and it doesn't have to modify its configuration to use it. And we think that gives it simpler operations and a bunch of benefits that I'm sure we'll get into in depth today.

CRAIG BOX: One of the key features of Istio when it launched was the idea that each workload in a Kubernetes cluster had a sidecar proxy attached to it. The selling point of Ambient Mesh is that you don't need sidecars anymore. What's up with you new guys coming in and upending the status quo?

ETHAN JACKSON: I don't think it's something that we did unilaterally, obviously. This is something that came together as a consensus, at least among the engineers at Google, and, as we'll talk about later, also a bunch of engineers at Solo. But at Google, we had a lot of experience with customers running Istio and started to notice some significant limitations to the sidecar architecture.

And over time, it became clear that we needed different ways of running Istio to attract different types of users. And that really was the source of what caused us to start working on this project.

JUSTIN PETTIT: Yeah. I think that when sidecars came out, they were a big improvement over what existed before. Usually, to get those features, you needed to modify the binary or insert libraries into the programs, which is pretty invasive. So sidecars were actually an improvement over that. And I think they still make sense in some architectures, but they're still pretty invasive. So we were looking at alternatives that would make it easier for people to deploy and get a lot of the features of Istio.

CRAIG BOX: Let's step back a little bit for people who may not be as familiar with the topic and recap what you get when you install Service Mesh.

ETHAN JACKSON: The way I think of Service Mesh, which may not be the way most people describe it, is as a way to interpose layer 7 (that is, HTTP) processing on your application's network traffic. That gives you a couple of things. One of the most important ones, and the one that brings people to Service Mesh in the first place, is encryption of your traffic.

Layer 7 telemetry: who's making what requests to whom. Layer 7 policy: allow this identity to talk to that identity, but not to that other identity. And then some amount of layer 7 routing: if the request is for this hostname, send it over there. That sort of thing. And I think the core insight of Istio is that doing all of that at layer 7 makes more sense than trying to impose security and telemetry down in the network, where it's harder to do. We pull this policy up into the application layer, where we have more visibility into what's going on and can exert more control.
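To make the "allow this identity to talk to that identity" idea concrete, here is a minimal, default-deny sketch in Python. It assumes workload identities in the SPIFFE format Istio uses (`spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`); the `shop` namespace, the service account names, the `ALLOWED` table and `is_allowed()` are all hypothetical, and this is not Istio's AuthorizationPolicy API or its evaluation logic.

```python
from typing import FrozenSet, Tuple

Identity = str  # e.g. "spiffe://cluster.local/ns/shop/sa/frontend"


def spiffe_id(namespace: str, service_account: str,
              trust_domain: str = "cluster.local") -> Identity:
    """Build an Istio-style SPIFFE identity for a workload."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


FRONTEND = spiffe_id("shop", "frontend")
CHECKOUT = spiffe_id("shop", "checkout")
BILLING = spiffe_id("shop", "billing")

# Hypothetical policy: only these (source, destination) pairs may communicate.
ALLOWED: FrozenSet[Tuple[Identity, Identity]] = frozenset({
    (FRONTEND, CHECKOUT),
    (CHECKOUT, BILLING),
})


def is_allowed(source: Identity, destination: Identity) -> bool:
    """Default deny: permit a request only if the identity pair is listed."""
    return (source, destination) in ALLOWED


print(is_allowed(FRONTEND, CHECKOUT))  # True
print(is_allowed(FRONTEND, BILLING))   # False: frontend may not call billing directly
```

Note that the check keys on who the caller is, not on its IP address, which is why a strong workload identity matters for this kind of policy.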

JUSTIN PETTIT: And that's one of the other issues Ambient actually solves, because we're now pulling some of that policy back down into the network while still keeping the identity close to the workloads. The problem with having that functionality only in the network is that you don't have a strong identity on which to base policies, so regardless, you want the identity close to the workloads. But if you have that identity and can carry it along with the traffic, then where you actually enforce the policy becomes a little less important.

CRAIG BOX: Ambient Mesh as a concept first came up in an article in 2018, so it turns out this isn't actually a new idea. How is it that people at Google came to realize there was a better option?

JUSTIN PETTIT: I would actually say that the architecture that we're using in Ambient looks quite a bit like network virtualization, which is an area that Ethan and I spent quite a bit of time on.

CRAIG BOX: How convenient.

JUSTIN PETTIT: Yeah. The architecture looks a lot like things that we were building when we were at Nicira and at VMware. If you spoke to networking engineers, I think this architecture is probably a little closer to what you would get with that. It's sort of a natural evolution. I think the way that Istio was built and then the way they approached the sidecars, they ended up with one architecture. But sort of stepping back and looking at it, trying to address the issues with sidecars, I think this becomes a natural way to build that.

ETHAN JACKSON: When I joined Google to work on Istio, maybe 18 months ago, I think there was a general sense in the air that sidecars have problems that we'd like to address for some customers. And I think there was a general sense that we wanted to move away from sidecars for many of our users, but not really an understanding of exactly how to do it.

So when I joined, we were actually in a period of a lot of experimentation. You see this polished Ambient Mesh thing. You didn't see all of the failed ideas that others and I had internally that didn't make it. So I think the idea of getting sidecars out of the application pods has been around for a long time. But actually doing the hard work of turning all of the details into a workable architecture only really started maybe 18 to 24 months ago.

CRAIG BOX: You touched a little bit there before — both of you — on the downsides of sidecars, the idea that it's sort of somewhat invasive. Can you talk about the specifics of what problems people have with sidecars as a deployment pattern? Is it just the fact that Kubernetes doesn't support them as a first class object?

JUSTIN PETTIT: We've seen a couple of issues. There was a lot of interest in the ability to enable mTLS, for example, in workloads for GKE. And in order to do that, injecting the sidecars is pretty invasive. The workload has to get restarted. You insert this proxy that is terminating L7. And that L7 proxy can actually be pretty disruptive to some networking stacks. So if the application has a broken HTTP stack, for example, the proxy may actually not be able to process that properly, and it can lead to breakage.

And so what we found was that, once people have deployed sidecars, they work pretty well for those people. But there's a certain amount of work that needs to be done to make sure that it's going to work in that environment, which is pretty different from the ability to just be able to turn them on or turn them off. And so we think that using the ambient approach, getting rid of the sidecars, is going to lower the bar for people to make use of Istio without having to take on all of those architectural changes from sidecars.

ETHAN JACKSON: Just to add another issue: sidecars do break some workloads, but they're also a little bit inefficient from a resource-usage perspective. For a particular pod, you have to allocate the RAM and CPU for that sidecar based on the worst-case usage you expect for that pod.

In practice, what we found with our customers is that most of them just guess. So you end up with massive resource allocations for these underutilized sidecars. They're very inefficient from a RAM/CPU perspective, and that's something people complain about.

CRAIG BOX: But again, that's something that Kubernetes could have fixed. The sidecar exists inside the same pod simply because that's the only option there is. If there'd been some sort of co-scheduling mechanism, for example, that might have been a way to achieve the same end.

ETHAN JACKSON: I mentioned ideas that failed. That was one that we investigated pretty aggressively with the Kubernetes scheduling team at Google. And their perspective was that there was just an architectural decision made in Kubernetes that the pod is the unit of scheduling and you can't have co-scheduling.

You have to make these decisions and simplifications to have a workable piece of software, but it was going to be so hard to move away from that that it just was kind of a nonstarter.

CRAIG BOX: So to summarize the situation now, we have the Service Mesh pattern out in the wild using sidecars, very much built by application and API people. We now have a couple of guys who have come in from the virtual networking world who have come up with a new model as to how we can improve things. You then presented some of these ideas at an Istio working group meeting about a year ago. How did that meeting go?

JUSTIN PETTIT: When we presented it, there actually wasn't much of a reaction. People thought it was interesting. There were a lot of questions like, we've been doing sidecars for so long; what's the latency going to be? There were a lot of architectural questions, but we didn't hear much from that meeting from people who were interested in contributing to it.

ETHAN JACKSON: I would concur. There was interest. And to be fair, on our end, the idea was so early it was kind of vaporware. So there was a lot of, “Oh, that sounds cool. Let us know when there's something behind it.”

CRAIG BOX: That's when the team at Solo decided that they were interested in this and possibly going to follow up on some of those concepts. How did you end up working together?

ETHAN JACKSON: I'll say — and it wasn't immediately after — so we had that working group meeting, and then we went off for six months, at least, on our own working on this. And then Louis Ryan, one of the principal engineers at Google, I don't know how, got wind that Solo was working on the same thing, probably from hearing about it in the working group meeting.

So we reached out to them and said, hey, we're working on the same thing. Maybe we should collaborate and get there faster with one product, rather than building two things that are at odds. We had a couple of tense early meetings where we were getting to know each other and making sure everyone was on the same page and that we were going to be good collaborators.

And we realized that their vision for where Istio should go was remarkably similar to our vision, and we have very complementary skills. They have a much stronger sense of the market than we do, in a lot of ways. So we started just meeting every week and working together on it.

CRAIG BOX: You mentioned before that the idea was very nascent. The collaboration to get to the point where it was ready to present has been private so far. Istio is an open source project. How have you decided at which point to make the work public?

JUSTIN PETTIT: As I mentioned, when we presented it at the networking working group, there was interest, but there were a lot of obvious questions about the architecture. For example, what's the latency going to be if, instead of having the processing happen locally, you push it off to some remote node in the network?

And so we knew that there were a lot of questions about whether this architecture was actually workable. So when we saw Solo was interested and we started developing it, we wanted to show people that it could work and have some numbers to back that up. And so we worked with Solo on what we are now calling an experimental release, which convinced us that this was workable.

We tried to release it as soon as we thought it was viable, so that we answered the questions about whether it was going to work or not. Also, one thing you don't want to do with an open source project is land one massive merge and just tell people to accept it the way it is.

And so the code is quite complete. It's workable, but it still has a ways to go. There are inefficiencies; we know we can get the latency better. But architecturally, we know that it works. So what made us decide to get it out was that we wanted to release it as early as we could, at a point where we could also confidently say that this architecture makes sense and will work.

ETHAN JACKSON: From my perspective, when we started this project, it sounds really obvious now that we've launched it, but it really wasn't clear to me that this was the right thing to do. And as I mentioned, we had numerous — I mean, Justin and I probably wrote 10 design docs that failed — and even once we got to the point of starting to implement it, I really viewed this as more of a research project than an engineering project in that I didn't actually know if we could build it and if it would work and have the properties that we wanted out of the system.

So we felt that it would be better to go to the community with something that we felt was the right path forward, and we needed to have at least an experimental implementation to get there. On the flip side, we didn't want to implement this to the point that it was production-quality and then go to the community saying, merge this. The code that we've announced isn't there, and it probably won't be there for a couple of quarters at least.

So it's a balancing act there. And that's kind of where we ended up. We wanted working, experimental code that you can play around with that isn't kind of finished and still has space for people to kind of comment on and make suggestions.

CRAIG BOX: People who are familiar with Service Mesh will probably think of it in the archetypal fashion of, all of your workloads now have a sidecar deployed, and they mesh together like a net to give you all of that functionality. At 10,000 feet, how is Ambient different to that? How does it work as a model?

ETHAN JACKSON: I think of Ambient Mesh as three core architectural changes in Istio. First, it's sidecar-less. In Istio today, your application sends traffic, iptables captures it and sends it to a sidecar, and the sidecar forwards it. In Ambient Mesh, your application sends traffic, it leaves the pod, and it's captured by a node-local proxy called the ztunnel, which forwards it. So there's no sidecar. It's more of a node proxy.

Second — and this is a little bit more subtle — the architecture is layered. So in Istio, everything Istio does is implemented in the sidecar, from encryption all the way through L7 functionality. In Ambient Mesh, we actually break the architecture into a secure overlay that's just responsible for basic encryption, identity, and routing of traffic. And then on a per-namespace basis, you can deploy what's called a waypoint, which gives you full L7 functionality, so everything that you're used to with Istio today.
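The layering Ethan describes can be modeled as a path-selection rule: every connection crosses the ztunnel secure overlay, and an L7 waypoint is added only when the destination namespace has one. The Python sketch below is a simplified model under that assumption, not Istio's code; the node names, namespaces, waypoint names and the `Mesh` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mesh:
    # Hypothetical mapping of namespace -> waypoint proxy name.
    # A namespace missing from the map gets only the L4 secure overlay.
    waypoints: Dict[str, str] = field(default_factory=dict)

    def path(self, src_node: str, dst_namespace: str, dst_node: str) -> List[str]:
        """Return the hops a connection takes in this simplified model."""
        hops = [f"ztunnel@{src_node}"]  # secure overlay: encryption, identity, routing
        waypoint: Optional[str] = self.waypoints.get(dst_namespace)
        if waypoint is not None:
            hops.append(waypoint)  # full L7 processing, only where deployed
        hops.append(f"ztunnel@{dst_node}")
        return hops


mesh = Mesh(waypoints={"shop": "waypoint-shop"})
print(mesh.path("node-a", "shop", "node-b"))
# ['ztunnel@node-a', 'waypoint-shop', 'ztunnel@node-b']
print(mesh.path("node-a", "batch", "node-b"))
# ['ztunnel@node-a', 'ztunnel@node-b']
```

The second call is the encryption-only case discussed in this interview: traffic is tunneled between ztunnels and never terminated at L7.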

And then a third piece is we interconnect all these pieces using HBONE, which is a new tunneling protocol that — well, it's not that new — but it's a new tunneling protocol that we've developed for Ambient Mesh.

CRAIG BOX: With the benefit of hindsight, what do people actually use Service Mesh for? And how did that factor into how Ambient was designed?

ETHAN JACKSON: It's funny. When you actually look at usage statistics, at least within Google Cloud, a large percentage of our users are just using Service Mesh for encryption. And if you look at why people get started with Service Mesh, I suspect that's almost everyone. They've got unencrypted traffic, and they have some regulatory requirement that says you must encrypt your traffic, and Istio is the easiest way to do that. And then, once they have it installed, they start using more complicated features over time. But basic encryption and identity are kind of the main use case.

JUSTIN PETTIT: As Ethan mentioned, the primary use case is mTLS encryption. The nice thing about the way the Ambient architecture works is that if you only want that encryption, we only need to deploy the ztunnels and send the traffic between them. There's no need for the traffic to go through a waypoint proxy or to do any L7 processing at all, which improves performance, because it's much more efficient to just tunnel the traffic than to terminate it and do full L7 processing.

CRAIG BOX: From the earliest days at Google, there was discussion about whether Istio should be part of Kubernetes, just a feature set, or whether it should be its own thing. And I think the reason that it is its own thing is largely because of the layer 7 functionality. Now that we're talking about splitting those things out, and encryption on its own is something people want, is that something that should be built into Kubernetes?

ETHAN JACKSON: I wouldn't say it should be built into Kubernetes, because Kubernetes has always had this approach of allowing different networks and implementations to exist, and you allow for a multitude of implementations and innovation in that space. I do think it could make sense to take the secure overlay, over time, and split it out into its own CNI, perhaps. There's no specific plans to do that right now, but it does start to look a lot like a network CNI, certainly.

JUSTIN PETTIT: I think there are some interesting use cases for that. The model we've come up with in Ambient Mesh is fairly generalizable: currently, we're introducing these waypoint proxies to do Istio's L7 policies, but you could imagine other sorts of middleboxes being placed in the network path as well, to do things like DPI or advanced firewall functionality. So making this fully generalized could, I think, be an interesting project: a secure overlay with service insertion.

CRAIG BOX: Let's dive, then, into some of the specifics of the implementation. We've talked about the secure overlay agent. You've referred to it as the ztunnel. What does the Z stand for?

ETHAN JACKSON: Zero trust.

CRAIG BOX: Why is it Z and not zed?

ETHAN JACKSON: Because we're Americans.

CRAIG BOX: Did no one suggest that you should call it the Open zSwitch?

ETHAN JACKSON: [LAUGHING] No one did, but I think that's a better name than where we ended up.

CRAIG BOX: Istio has long told people it's not secure to share an Envoy proxy between workloads. So have the Envoy maintainers. Why is it safe to share a ztunnel?

ETHAN JACKSON: We're getting into a little bit of splitting hairs here. Our position is that it's not safe to share layer 7 processing of traffic between workloads in Envoy. But all the ztunnel does is it receives a byte stream, encrypts it, and forwards it. So a lot of the risk associated with the Envoy HTTP stack doesn't really come into play for us. So we do think it's safe.

I'll also say that in the experimental branch, we're using Envoy as the ztunnel implementation, but we almost think of that as more of a reference implementation. And over time, we expect there to be multiple competing implementations, some of which may be Envoy, some of which may not.

CRAIG BOX: Where's the eBPF?

ETHAN JACKSON: So that's another interesting one. Ambient Mesh is agnostic to how the ztunnel captures traffic. That will depend on what CNI you're using in the Kubernetes cluster that you install Ambient Mesh in. In the experimental code we released, it's not based on eBPF. It's based on iptables. I can tell you, at Google, we've got multiple CNIs that someone can use to run Kubernetes, and on those CNIs that use eBPF, we'll use eBPF to capture traffic in Ambient Mesh. So we think eBPF is cool, but for us it's just a tool for getting traffic to the ztunnel.

CRAIG BOX: You say that the networking stack redirects all traffic for participating workloads through the ztunnel. How does it know to do that, and then how does the ztunnel know where to send the traffic?

JUSTIN PETTIT: So the way it knows to do that is, we've set up the CNI to know which pods are running in kind of Ambient mode, and it knows which virtual ethernet devices are associated with pods that are running in Ambient mode. And essentially, the CNI is instructed to take all traffic that egresses that virtual ethernet device and send it to the ztunnel pod. I might be getting a little bit wonky here. But that's how it works.

So the host CNI is responsible for getting traffic to the ztunnel, however it does that. And then we have a controller, basically Istiod, that programs Envoy, which is how the ztunnel is implemented, so that it knows: hey, if I receive traffic with this source IP and this destination IP, encrypt it with this key and forward it to that next hop.
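Justin's description of what the controller programs into the ztunnel (source IP and destination IP in; key and next hop out) can be sketched as a lookup table. This is a hypothetical Python model for illustration only: the experimental ztunnel is configured as Envoy, not with a table like this, and the addresses, `key_id` values and class names are made up.

```python
import ipaddress
from typing import Dict, NamedTuple, Optional, Tuple


class Action(NamedTuple):
    key_id: str    # which workload key/certificate to encrypt with (hypothetical label)
    next_hop: str  # address of the node/ztunnel to forward the tunnel to


FlowKey = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]


class ForwardingTable:
    """Exact-match (source IP, destination IP) -> Action table."""

    def __init__(self) -> None:
        self._rules: Dict[FlowKey, Action] = {}

    @staticmethod
    def _key(src: str, dst: str) -> FlowKey:
        # Parsing normalizes the addresses and rejects malformed input.
        return ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst)

    def program(self, src: str, dst: str, action: Action) -> None:
        """Install a rule; in this model the controller calls this."""
        self._rules[self._key(src, dst)] = action

    def lookup(self, src: str, dst: str) -> Optional[Action]:
        """Return the action for a flow, or None if nothing was programmed."""
        return self._rules.get(self._key(src, dst))


table = ForwardingTable()
table.program("10.0.1.5", "10.0.2.7",
              Action(key_id="spiffe://cluster.local/ns/shop/sa/frontend",
                     next_hop="10.128.0.3"))
print(table.lookup("10.0.1.5", "10.0.2.7"))
print(table.lookup("10.0.1.5", "10.0.9.9"))  # None: no rule programmed
```

The split mirrors the conversation: the CNI decides which traffic reaches the ztunnel, and the controller decides what the ztunnel does with it.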

CRAIG BOX: You've talked about the different CNIs that can be installed. Istio itself has a CNI, as you mentioned just there. The Istio CNI plugin is installed on each node and the ztunnel is installed on each node. Why are these two different things?

ETHAN JACKSON: They seem vaguely related, but they're not really. First of all, the Istio CNI is not a full network CNI. It's not intended to be run on a Kubernetes cluster without another CNI that handles IPAM, basic L3 routing, and all of that sort of stuff. It's got a fairly limited purpose, which is essentially setting up sidecars in pods: configuring the routing of traffic within the pod to work with the sidecar.

So you still need the Istio CNI if you're going to run sidecars in your cluster. And it's the best way to do that. The ztunnel has really nothing to do with sidecars, so it doesn't really depend on the Istio CNI. And there's a little bit of a misnomer. I use the term CNI kind of to mean a full, complete networking CNI that handles everything, kind of like Calico, or Cilium, or there's a number of these floating around. But the Istio CNI is a much more limited thing.

CRAIG BOX: One of the things people want out of a Service Mesh is security, is to know that traffic is encrypted between all of the nodes and the workloads that run on them. A lot of people will think about security in the context of VPNs, which is a thing that they are used to in their head, where everything goes through some kind of tunnel. People talking about mTLS, they understand that the m means that both ends of the connection are verified.

But I was surprised when I first learned that mTLS is actually a feature of HTTP, and what you're doing is you're taking your existing HTTP connection and you're upgrading the security on it. You're not actually tunneling it. That's the way Istio has worked until today. You're now taking a tunneling approach. Talk me through the decision and whether or not it was possible to achieve what you wanted to achieve with the existing mTLS approach.

ETHAN JACKSON: mTLS is pretty narrowly about encrypting byte streams in a specific way. The property of it that's particularly important to us in Istio is that both ends of the connection, down to the pod, have a cryptographic identity, and each end verifies the other's identity. That's what makes it different from regular TLS, where your browser connects to Google. Google doesn't really care who you are; it'll run searches for you.
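The difference between regular TLS and mTLS shows up directly in how a TLS endpoint is configured. Here is a minimal sketch using Python's standard `ssl` module: a server context only becomes "mutual" once it is told to require and verify a client certificate. Loading the actual workload certificates (`load_cert_chain()`) and the mesh CA bundle (`load_verify_locations()`) is left out because the file paths are deployment-specific, and Istio verifies SPIFFE identities in the certificate rather than DNS hostnames, which this sketch does not attempt.

```python
import ssl


def plain_tls_server_context() -> ssl.SSLContext:
    """Regular TLS, like a public website: the server does not ask who you are."""
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # verify_mode defaults to CERT_NONE


def mtls_server_context() -> ssl.SSLContext:
    """mTLS: the server also demands and verifies a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Real use: ctx.load_cert_chain(...) and ctx.load_verify_locations(...)
    return ctx


def mtls_client_context() -> ssl.SSLContext:
    """The client side verifies the server; it would also present its own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED + hostname check by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Real use: ctx.load_cert_chain(...) so the server can verify this client
    return ctx


print(plain_tls_server_context().verify_mode == ssl.CERT_REQUIRED)  # False
print(mtls_server_context().verify_mode == ssl.CERT_REQUIRED)       # True
```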

So that's one problem. And we think that is the best way to do encryption, for a bunch of reasons. There's a separate problem of how you move traffic through the network. Tunneling has a lot of advantages over what Istio was doing before, and they basically come down to this: if you establish a tunnel, you can forward any traffic anywhere without having to care about or know what that traffic is.

So you can forward TCP, or HTTP, or anything. And you don't have to modify it or understand much about it. So that's why we're doing mTLS with HBONE, whereas Istio today does an mTLS upgrade without tunneling, which is less good.

CRAIG BOX: You've mentioned HBONE there. That's something that's introduced alongside Ambient Mesh. What is an HBONE?

JUSTIN PETTIT: HBONE is this new tunneling mechanism that we're using in ambient, although it actually has benefits for sidecars as well, so it's available for them too. HBONE is a new standard that we're defining to allow using HTTP CONNECT to tunnel the traffic. Ethan mentioned the advantages of tunneling, and what we're doing is using an HTTP-based tunnel. In fact, HBONE stands for HTTP/2-Based Overlay Network Environment.

And so we do an mTLS connection between the two entities that are talking, and then we create individual HTTP CONNECT streams within that tunnel for any two entities on those nodes that are talking to each other. And as Ethan mentioned, it allows us to forward that traffic uninterpreted, so that we will not break anything if we have something like a nonconforming HTTP stack. Because we're not terminating that connection, we're just forwarding it through, we're able to send it through an HTTP CONNECT tunnel.
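As a rough illustration of the layering Justin describes (one mTLS connection between two nodes, with a separate HTTP CONNECT stream per conversation inside it), here is a minimal Python model. It is a sketch under assumptions: the class and field names are hypothetical, no TLS or HTTP/2 framing is actually performed, and the only protocol details it borrows are that an HTTP/2 CONNECT request carries just the `:method` and `:authority` pseudo-headers and that client-initiated HTTP/2 streams use odd stream IDs.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectStream:
    """One tunneled TCP conversation inside an HBONE-style tunnel (hypothetical model)."""
    stream_id: int
    headers: list


@dataclass
class HboneTunnel:
    """Models one mTLS connection between two nodes; no TLS is actually performed here."""
    local_node: str
    peer_node: str
    streams: dict = field(default_factory=dict)
    _next_id: int = 1  # client-initiated HTTP/2 streams use odd IDs: 1, 3, 5, ...

    def open_stream(self, dst_ip: str, dst_port: int) -> ConnectStream:
        # An HTTP/2 CONNECT request names its target only via :authority;
        # it carries no :scheme or :path. (IPv4 only; IPv6 would need [brackets].)
        headers = [(":method", "CONNECT"), (":authority", f"{dst_ip}:{dst_port}")]
        stream = ConnectStream(self._next_id, headers)
        self.streams[stream.stream_id] = stream
        self._next_id += 2
        return stream


# One node-to-node tunnel carries streams for several pod-to-pod conversations.
tunnel = HboneTunnel("node-a", "node-b")
first = tunnel.open_stream("10.0.1.5", 8080)
second = tunnel.open_stream("10.0.1.9", 5432)
print(first.stream_id, second.stream_id)
print(first.headers)
```

The point of the model is the shape: the expensive mTLS handshake happens once per node pair, and each new conversation only costs a new stream.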

CRAIG BOX: Is there any particular advantage to the tunnels being based on HTTP at this point, if they're just tunnels?

ETHAN JACKSON: The more important debate is, do you tunnel packets or byte streams? Traditionally, in networking, it's very common to tunnel packets. There's GRE. There's GENEVE. There's VXLAN. There's a bunch of packet tunneling protocols. It is less common to do what we're doing in HBONE, where, when ztunnel receives traffic, it doesn't actually care how it was literally packetized.

It takes it, it terminates TCP, it creates a byte stream, and then it shoves that byte stream into an HTTP tunnel. So if you're going to send an encrypted byte stream on the internet, there's no real alternative to HTTP. We could invent a new protocol, but why? There could be a debate about whether we should use HBONE or one of these encrypted packet tunneling protocols. For various reasons, there isn't one that we thought met our requirements today, so we went this route.
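To make the packets-versus-byte-streams distinction concrete: once TCP is terminated, a relay only ever sees an ordered sequence of bytes, so how the sender happened to packetize the data is invisible to it. The sketch below is a hypothetical, in-memory stand-in for that forwarding step (a real proxy would be reading and writing sockets); it copies bytes from one stream to another in fixed-size chunks without parsing them.

```python
import io


def relay(src, dst, chunk_size: int = 4) -> int:
    """Copy an opaque byte stream from src to dst without interpreting it.

    chunk_size is deliberately unrelated to how the data was originally
    segmented: a byte-stream tunnel preserves order, not packet boundaries.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            return total
        dst.write(chunk)
        total += len(chunk)


# The same payload, segmented two different ways by the sender.
segmentation_a = [b"GET / HT", b"TP/1.1\r\n\r\n"]
segmentation_b = [b"G", b"ET / HTTP/1.1\r", b"\n\r\n"]

outputs = []
for segments in (segmentation_a, segmentation_b):
    # After TCP termination, the segments collapse into one ordered byte stream.
    src = io.BytesIO(b"".join(segments))
    dst = io.BytesIO()
    relay(src, dst)
    outputs.append(dst.getvalue())

print(outputs[0] == outputs[1])  # True
```

A packet tunnel, by contrast, would carry each original segment as its own unit, boundaries included.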

CRAIG BOX: It just strikes me as strange. There's not necessarily going to be any hypertext involved in the tunneling of TCP streams over this HBONE protocol.

ETHAN JACKSON: Ultimately, the applications could be sending HTML cat GIFs, which is all Istio is really for.

CRAIG BOX: I should point out that the official pronunciation on this podcast is "giff."

ETHAN JACKSON: [LAUGHING] Again, we're Americans, so.

CRAIG BOX: HTTP CONNECT is obviously part of the HTTP standard. You talked about standardization of HBONE. Where is that protocol going?

JUSTIN PETTIT: We're hoping to define it as an open standard so that other proxies can implement it, and we have a standardized way to communicate with other load balancers or anything else that needs to proxy traffic. But currently, it's limited to TCP, because we're doing it over a TCP stream, and HTTP CONNECT is limited to TCP.

One of the things that we are keeping a close eye on is an IETF working group called MASQUE. They are looking at standardizing new ways to do CONNECT, so there is a CONNECT-IP and a CONNECT-UDP. This would allow us to tunnel traffic that is not just TCP over HBONE connections, and that would be a new direction for Istio that HBONE will enable.

ETHAN JACKSON: One thing to add to that really quick. The reason MASQUE is able to tunnel raw IP and raw UDP is that it's based on QUIC, the transport underneath HTTP/3. And QUIC sends traffic in UDP packets rather than over TCP streams. So you can much more comfortably send packetized traffic rather than byte streams.
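For comparison with the TCP-only CONNECT that HBONE uses today, here is a sketch of the request a client would send to open a UDP tunnel under MASQUE's CONNECT-UDP. It assumes the default URI template from RFC 9298 (`/.well-known/masque/udp/{target_host}/{target_port}/`) and the extended-CONNECT `:protocol` pseudo-header; the proxy name and targets are made up, and nothing here reflects how Istio itself would adopt it.

```python
from urllib.parse import quote


def connect_udp_headers(proxy_authority: str, target_host: str, target_port: int) -> list:
    """Build the header list for a CONNECT-UDP request (RFC 9298 default template).

    Unlike classic CONNECT, this is an "extended CONNECT": it carries
    :protocol, :scheme and :path, and the UDP target is encoded in the path.
    """
    # Percent-encode the host so that, e.g., IPv6 colons cannot break the path.
    host = quote(target_host, safe="")
    path = f"/.well-known/masque/udp/{host}/{target_port}/"
    return [
        (":method", "CONNECT"),
        (":protocol", "connect-udp"),
        (":scheme", "https"),
        (":authority", proxy_authority),
        (":path", path),
        ("capsule-protocol", "?1"),
    ]


for name, value in connect_udp_headers("proxy.example:443", "192.0.2.6", 443):
    print(f"{name}: {value}")
```

The contrast with the earlier HBONE-style CONNECT is the point: there, the target lives in `:authority` and the tunnel carries a TCP byte stream; here, the target lives in `:path` and the tunnel carries UDP datagrams.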

CRAIG BOX: I should point out that's MASQUE with a Q to give the connection to QUIC. Let's move, now, to the layer 7 proxies. The waypoint proxies are enabled between a client and a service, or a consumer and a producer, if you prefer. They appear to sit in the middle of the network as opposed to the sidecar model, where you had a proxy at both ends. What is a waypoint associated with?

JUSTIN PETTIT: The first thing that happens when you enable Ambient Mesh is that we set up this secure overlay. Then we allow you, on a namespace basis, to define L7 policies. And so typically, if you were to have an authorization policy, an L7 authorization policy, then for that producer side, you would enable a waypoint.

Then all of the clients, instead of connecting directly to the node that is hosting those producers, will have their traffic routed to the waypoint. That waypoint will then forward the traffic to the ultimate destination, and the L7 policies are applied there. So for security, it's pretty clear why you would do that.

One of the less obvious implications of this is that it changes the way load balancing is done. Currently, in Istio, with the sidecar model, each client does the load balancing to decide which server to send the traffic to. In the model that we're using with waypoints, if there's L7 load balancing, traffic from all of the clients is sent to that server's waypoint proxy, and the waypoint proxy then distributes the traffic.

And in a lot of ways, this is more what you would expect a network to do, because that proxy that's sitting in front of the server has a lot more context about what traffic has previously been seen. So the load balancing should be much fairer in this new model with the waypoint proxies.
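Justin's point about fairness can be illustrated with a toy model: clients that each pick a backend on their own have no view of what other clients have already sent, while a single proxy in front of the backends can track in-flight requests and always choose the least-loaded one. The least-outstanding-requests strategy below is a stand-in chosen for the sketch, not necessarily what an Istio waypoint is configured to use, and the class name is hypothetical.

```python
class WaypointBalancer:
    """Hypothetical server-side balancer that sees every request to its backends."""

    def __init__(self, backends):
        # In-flight request count per backend. This is only knowable because
        # all traffic for the destination passes through this one proxy.
        self.in_flight = {b: 0 for b in backends}

    def pick(self) -> str:
        # Least outstanding requests; ties broken by name for determinism.
        backend = min(self.in_flight, key=lambda b: (self.in_flight[b], b))
        self.in_flight[backend] += 1
        return backend

    def done(self, backend: str) -> None:
        # Called when the backend finishes a request.
        self.in_flight[backend] -= 1


waypoint = WaypointBalancer(["pod-a", "pod-b", "pod-c"])
# Six requests arriving from many different clients are spread evenly.
assignments = [waypoint.pick() for _ in range(6)]
print(assignments)  # ['pod-a', 'pod-b', 'pod-c', 'pod-a', 'pod-b', 'pod-c']
```

With client-side balancing, each client would keep its own counters and could not see the load other clients had already placed on a backend.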

CRAIG BOX: It does feel a lot more like how load balancing used to be. But one of the things that was brought up as an advantage of sidecars when they first launched was the idea that this was distributed, and we now didn't have single points of failure with mid-tier load balancers. How do we get around that problem? How do we deal with what happens when our waypoint fails?

JUSTIN PETTIT: Most of the architecture drawings we've been doing have shown a single waypoint proxy, but that's actually not how it will necessarily be implemented. Istio can control how many waypoint proxies there are and load balance among them as well so that we can create some redundancy there to make sure that a waypoint proxy is always up and available.

CRAIG BOX: Can you auto-scale those proxies?

JUSTIN PETTIT: Yes. Since the waypoint proxies are just regular pods that are scheduled in Kubernetes, then we can make use of Kubernetes' ability to do auto-scaling.

ETHAN JACKSON: The way I think of waypoints is that they're the layer 7 proxy for a particular namespace. And the simplest way to think about it is, if you deploy a waypoint on a namespace, then all traffic that enters that namespace will go through the waypoint.

The logical conclusion of that is that it makes sense to implement load balancing and routing policy for a namespace on that namespace's waypoint rather than at the sidecar of everyone who's talking to that namespace.

It's subtle, but what this gives you is actually a much better administrative boundary, where, if a particular namespace wants to implement a particular set of policies, it can deploy a waypoint and all traffic going to it will get those policies, versus in the sidecar model, if I want a particular load balancing policy, everyone who talks to me has to implement it, no matter the cost. And I have to trust them to implement it.

So it's subtly shifting how network policies are implemented to be closer toward the server rather than the client, which is how people tend to think about these things anyway.

CRAIG BOX: One of the great things about Ambient Mesh is that it's not all or nothing. You can enable it on certain workloads, and indeed, you can interoperate between Ambient mode workloads and sidecar mode workloads. How is that possible?

ETHAN JACKSON: This is kind of part of the magic of HBONE. So we're requiring everyone to implement this much more flexible transport protocol. By the way, aside from Ambient, we do expect that all sidecars will migrate to HBONE over the next couple of releases anyway.

But then, once you have that, picture an Istio control plane managing a bunch of waypoints and a bunch of sidecars. If a sidecar sends traffic to a pod that is in an Ambient configuration, the control plane knows to send the traffic to that pod's waypoint, for example, instead of directly to the pod. This is just something the control plane can do because it knows, for each pod on the network, whether it's sidecar, secure overlay, waypoint, or not meshed at all. And it routes accordingly.
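The routing decision Ethan describes can be reduced to a lookup: the control plane knows each pod's mode and tells the sender where to deliver traffic. The sketch below is a hypothetical, heavily simplified model; the mode names, the `POD_MODES` and `WAYPOINTS` maps, and the return values are made up for illustration, and it assumes (as Ethan expects for future releases) that sidecars also speak HBONE.

```python
from enum import Enum


class Mode(Enum):
    SIDECAR = "sidecar"
    SECURE_OVERLAY = "secure-overlay"  # ztunnel only
    WAYPOINT = "waypoint"              # ztunnel plus an L7 waypoint
    NOT_MESHED = "not-meshed"


# Hypothetical view of what the control plane knows about each pod.
POD_MODES = {
    "reviews-1": Mode.SIDECAR,
    "ratings-1": Mode.SECURE_OVERLAY,
    "details-1": Mode.WAYPOINT,
    "legacy-1": Mode.NOT_MESHED,
}
WAYPOINTS = {"details-1": "details-waypoint"}


def next_hop(dst_pod: str) -> tuple:
    """Return (where to send, transport) for traffic from a meshed sender to dst_pod."""
    mode = POD_MODES[dst_pod]
    if mode is Mode.WAYPOINT:
        # Redirect to the destination's waypoint instead of the pod itself.
        return WAYPOINTS[dst_pod], "hbone"
    if mode is Mode.NOT_MESHED:
        # Nothing on the other end terminates the tunnel, so send plaintext.
        return dst_pod, "plaintext"
    # Assumption: sidecars and ztunnels both speak HBONE, so they interoperate.
    return dst_pod, "hbone"


for pod in POD_MODES:
    print(pod, next_hop(pod))
```

Real routing also has to pick among waypoint replicas and resolve the destination node's ztunnel; the sketch keeps only the mode-based branch.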

CRAIG BOX: We've talked a lot throughout about this being applied to namespaces. There is a certain implication that that means that the way you deploy workloads is effectively each set of applications in its own namespace rather than, perhaps, a namespace for a group in a business who have all of their applications running in it. Does Ambient require you to run things in a way that's compatible with it, or can you apply this model to however people choose to align their objects with Kubernetes?

ETHAN JACKSON: The namespace is the configuration boundary. So this isn't exactly true, but in almost all cases, all workloads within a namespace need to have the same ambient configuration. They either need to be all sidecar or all secure overlay or all waypointed.

From a security perspective, waypoints actually aren't deployed on a per-namespace basis. They're deployed on a per-service-account basis, the service account being the Kubernetes concept of a workload's cryptographic identity. The practical result of that is that you don't have waypoints sharing keys. Each waypoint is its own identity. Now, they can horizontally scale. You can have five of them with a particular identity, but you don't have one with two identities.

So from a security perspective, however you organize things into namespaces doesn't really matter. But, as with many other things in Kubernetes, namespaces have an impact on how fine-grained your configuration can be.

CRAIG BOX: What about if I want to control traffic leaving my applications rather than coming into it? Is there an idea of processing egress traffic with waypoints?

ETHAN JACKSON: Yes. So the logic about which waypoints traffic has to go through is quite subtle. The algorithm is something like this. If you're sending traffic and there is a waypoint in front of the server, send it to the server's waypoint. If the client has a waypoint, also send the traffic through the client's waypoint if there's either no server waypoint at all or there's an egress policy.

So if there's some policy that says, add this header to all traffic that leaves this namespace, we have to send it through the client namespace's waypoint. That's a little detailed, but at a high level, I'll just say, Istio intelligently figures out whether it has to send traffic through one or two waypoints depending on the policy you apply. And it heavily prefers to send traffic through exactly one waypoint if it can.
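Ethan's rule for which waypoints a request traverses can be written out as a small function. This is one reading of the rule as stated in the conversation, not Istio's actual code, and the parameter names are hypothetical: use the server's waypoint if one exists, and add the client's waypoint only when there is no server waypoint or when the client namespace has an egress policy that must be applied.

```python
from typing import List, Optional


def waypoint_path(client_waypoint: Optional[str],
                  server_waypoint: Optional[str],
                  client_has_egress_policy: bool) -> List[str]:
    """Return the waypoints a request traverses, in order (client side first)."""
    path = []
    # The client's waypoint is used only if it exists and is actually needed:
    # either no server waypoint will do L7 processing, or the client namespace
    # has an egress policy (e.g. "add this header to all outgoing traffic").
    if client_waypoint and (server_waypoint is None or client_has_egress_policy):
        path.append(client_waypoint)
    if server_waypoint:
        path.append(server_waypoint)
    return path


print(waypoint_path("wp-frontend", "wp-backend", False))  # ['wp-backend']
print(waypoint_path("wp-frontend", "wp-backend", True))   # ['wp-frontend', 'wp-backend']
print(waypoint_path("wp-frontend", None, False))          # ['wp-frontend']
print(waypoint_path(None, None, False))                   # []
```

The first case is the common one Ethan mentions: both sides have waypoints, but with no egress policy the request crosses exactly one.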

CRAIG BOX: It doesn't handle the deployment of the waypoints, though. That's something that users have to specify themselves through gateway resources, as I understand.

ETHAN JACKSON: Yeah. And this is something that will depend a lot on the specific Istio implementation you use. So Anthos Service Mesh will be different from Solo, which will be different from Tetrate, et cetera, I expect. The way you signify the intent that a namespace needs a waypoint is that you deploy a Kubernetes Gateway object.

That gateway object actually doesn't say a lot about the actual implementation of that waypoint in the network, like how many there should be, how big they should be, all of that sort of stuff. We expect that to be configured separately in the long term. Right now, in the experimental code, it's actually not configurable. That's something that we have to add before merging it upstream.

But I would think of the Gateway object as signaling the intent to have a waypoint more than literally describing how it's deployed. It's not, like, a Kubernetes pod spec. It's a different thing.

JUSTIN PETTIT: But it's important to mention that, while we're using the gateway API to instantiate the waypoint, we're not requiring that the gateway API be used to configure it. So we continue to support the existing Istio APIs to configure the waypoint.

ETHAN JACKSON: You can use a VirtualService. You don't have to use an HTTPRoute.

CRAIG BOX: One of the pros of the sidecar model is that all of the traffic to a pod is intercepted, and thus, I know everything going to a workload has gone through a sidecar. With a waypoint proxy, that may not necessarily be true. How do you stop people bypassing the security functions of the mesh and talking directly to your workload?

ETHAN JACKSON: Actually, that statement you said is not true. An application can pretty trivially bypass its sidecar. It just chooses not to send traffic through its sidecar, and that is an issue with sidecar that makes it maybe not the best to use for firewalling.

CRAIG BOX: I'm talking more about traffic going into a pod rather than out of it, though.

ETHAN JACKSON: OK. So on ingress, yeah, you do have that guarantee, not on egress. It's actually fairly simple to enforce this with waypoints. So basically, the ztunnel knows that a particular pod that it's operating for has a waypoint sitting in front of it. And if it receives traffic from any pod other than the waypoint, it will hairpin it through the waypoint. So it will send it to the waypoint for processing and then back. So you still have that guarantee, but you're relying on the control plane to enforce that for you.
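The ingress guarantee Ethan describes amounts to a check in the destination-side ztunnel: if the target pod has a waypoint and the traffic did not come from that waypoint, send it through the waypoint first. The sketch below models only that decision; the names are hypothetical, and it ignores how the ztunnel actually establishes who the sender is.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Inbound:
    src: str      # sender identity, as seen by the receiving ztunnel
    dst_pod: str  # local pod the traffic is addressed to


def ztunnel_deliver(conn: Inbound, waypoint_for: Dict[str, str]) -> str:
    """Decide where a destination-side ztunnel sends inbound traffic (hypothetical model)."""
    waypoint = waypoint_for.get(conn.dst_pod)
    if waypoint is None or conn.src == waypoint:
        # No waypoint configured, or the traffic has already passed through it.
        return f"deliver to {conn.dst_pod}"
    # Anyone else trying to reach the pod directly is hairpinned through the waypoint.
    return f"hairpin via {waypoint}"


waypoints = {"payments-1": "payments-waypoint"}
print(ztunnel_deliver(Inbound("frontend-1", "payments-1"), waypoints))
print(ztunnel_deliver(Inbound("payments-waypoint", "payments-1"), waypoints))
print(ztunnel_deliver(Inbound("frontend-1", "catalog-1"), waypoints))
```

This prints `hairpin via payments-waypoint`, then `deliver to payments-1`, then `deliver to catalog-1`: only the pod with a waypoint gets the detour, and only for traffic that skipped it.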

JUSTIN PETTIT: And another thing, I mean, it's not currently implemented, but if it became an issue, the HBONE standard allows metadata to be inserted. So you could imagine that, if someone were particularly paranoid, we could have the waypoint proxy put something in to indicate that the traffic has gone through it that the ztunnel that is receiving it could then check to make sure.

CRAIG BOX: All right. Let's move on to the rapid-fire round. Is it more secure?

JUSTIN PETTIT: So like with most things in security, there's a trade-off. And so we think that it is no worse than the sidecar model. It does have a couple of benefits. One benefit is that the application and the sidecar are not co-located. So currently, if you have a vulnerability in either the proxy or the application, both can be attacked. So by separating them, we make it so that a vulnerability in one of them does not affect the other.

And then also, the ztunnels are currently implemented with Envoy, but we run them in L4-only mode, so there is no L7 policy processing happening. And so we believe, also, that the vulnerability surface area is going to be much smaller for the ztunnels than for the existing sidecars.

CRAIG BOX: Is it faster?

ETHAN JACKSON: The answer is sort of. The experimental code we released is not at all optimized. In early tests with it, we found that the latency is close to sidecar. It's acceptable. And we have some early promising results that it will lead to a pretty significant reduction in RAM and CPU. That's the sort-of part: the code that's out there is about the same as sidecar.

I expect the architecture, particularly for people who just want encryption, to be quite a bit faster than sidecar in the long term. The reason for this is that we've found that the majority of the cost of Istio is in the L7 processing, the HTTP processing. And in Ambient, you go from two HTTP processing steps per request to either one or zero. Now, you pay an extra network hop to get to that one processing step. But at least on Google's network, which is super fast, that's kind of a rounding error; it doesn't really matter.

So I think the architecture, fundamentally, will be significantly faster than sidecar for the secure overlay and about the same for traffic going through waypoints. And the experimental code, I would say, is acceptable but really not optimized.

JUSTIN PETTIT: And I would say that the resource usage, I think, is going to be a pretty substantial improvement, even with the current version. Currently, every sidecar has to be configured for the worst-case traffic it could receive, and people tend to overprovision because they don't want to underprovision. And so these reservations are quite large relative to the application.

And so with the resource reservation, that actually prevents that node from getting fully utilized. And so with the new Ambient architecture, the ztunnels are much smaller. They require fewer resources. And then the ztunnels will be auto-scaled based on need. And so the overall resource savings should be — and I think will be — substantial compared to the sidecar model.

CRAIG BOX: That also kind of ticks off, "is it leaner". Is it cheaper?

ETHAN JACKSON: The short answer is yes. We expect it to be significantly cheaper for almost all users. If you think about it, in the sidecar model, you've got all of these proxies running around eating up RAM and CPU reservations that you're paying your cloud provider for, and they're mostly not doing anything. In Ambient Mesh, the waypoints are scaled to the actual load that's going through them. And the ztunnels aren't that expensive, because they're just doing L4 processing.

The other thing to consider, and we're not quite sure how this will work out, is that you just have fewer proxies floating around that need updates from xDS and have telemetry captured from them, so the operational weight of the system reduces a little bit for that reason. But we're early with the system, so we're going to learn more about that as we get it closer to production.

CRAIG BOX: Is it easier to upgrade?

ETHAN JACKSON: Yes is the short answer. The upgrade of the control plane will be the same. With Istio today, if you want to upgrade your sidecar, you have to restart the application. That doesn't sound like a big deal, but if you work in a big enterprise environment with a bunch of application teams, some of whom are running stateful workloads that can't be restarted easily, and you have to go to all of them and convince them that they should restart their application, that's a big, painful lift.

In Ambient Mesh, the Envoy proxies can be upgraded without disturbing the application. It's a purely infrastructural change. And the waypoints, in particular, we plan to upgrade by just running the new version and slowly shifting traffic over, so you don't have to disturb any currently running traffic at all. So ultimately, it's going to be a much better operational experience for upgrading, installing a new workload, and that sort of thing.

CRAIG BOX: We've talked a lot about the ways that Ambient is better. There will be a few things that don't work in the current release. Perhaps WebAssembly and extensibility is one of them? Can you talk a little bit about the things that are lost for people who choose to move to Ambient, and why people might still want to stay with sidecars?

ETHAN JACKSON: For WebAssembly specifically, we have plans to implement WebAssembly. That one just got cut because we didn't have time. But that will get there before it's production-ready. That's important. What do you lose with Ambient? The security model is different. Suppose you vetted the sidecar model from a security perspective, and it's really important for you that the key is in the pod of the application.

That might be a reason to stick with sidecar. Also, sidecars fate-share with the application, so you can pay a little bit less care toward kind of maintaining the waypoints and the ztunnels and that sort of thing. You don't have risk of those failing independently of the application. So it's just a slightly different operational model.

JUSTIN PETTIT: And also, the L7 telemetry will be quite different. Currently, in sidecars, since we're doing full L7 processing at both ends, we can look at the traffic that's being sent, and we can also insert information, so we can do things like track how long a request takes. Since we're generally going to be removing one of those hops, we lose the ability to do that sort of tracing.

If you want L7 visibility, you can deploy a waypoint proxy to get that, but we don't really have a good model if you want to have that end-to-end performance, or being able to get the amount of time it takes for a request from end to end. But we can, even with just the waypoint proxy, determine how long it takes the back end to handle a request, for example. We do have some ideas about how we may be able to do L7 telemetry and metrics on both sides that don't require L7 processing in the proxy, but that's still early.

CRAIG BOX: You mentioned it's still early. It's been a couple of weeks now since the Ambient announcement. What has the feedback been so far?

ETHAN JACKSON: I'd say it's been overwhelmingly positive. People are excited about the operational benefits, so our intuition that sidecars are difficult to operate has been borne out. And the resource savings: we've gotten that feedback a lot. There's a lot of interest and curiosity, particularly around the ztunnel being a shared component. That's a new thing, both from a security perspective and from a performance perspective. When we have had questions, they've mostly been around that. Most people that we've talked to think that the trade-off is worth it.

CRAIG BOX: Now that you have this feedback, what is it that needs to happen between now and the proverbial production adoption of Ambient Mesh?

ETHAN JACKSON: There's a couple of categories of work. There's just getting the feature complete and tested, and performance tested, and all that sort of stuff. Getting it kind of through the community, making sure everyone's on board with it, getting it approved, that sort of general area of work, and that's being led by John Howard, who's an Istio TOC member on the Google side.

There's some kind of significant architectural work that we're starting to think about right now, particularly around the ztunnel implementation. The core question there is, how do we evolve Envoy to be an efficient, fast, lightweight, lean ztunnel implementation? Or do we start looking at alternative implementations? We could build something in Rust. We could build something based on BPF. There's a wide open set of possibilities there.

CRAIG BOX: Could you build something in Rust and BPF that would really tick all of the boxes in terms of the hipsters?

ETHAN JACKSON: That would. That would. And use QUIC. [CHUCKLING] So anyway, us being proper hipsters, we're evaluating all of that. Those are the two big things from our perspective.

JUSTIN PETTIT: Well, one, being able to standardize HBONE outside of the Istio use case and support for non-TCP protocols, I think, are going to be two interesting additions.

ETHAN JACKSON: Yeah. QUIC.

CRAIG BOX: All right, guys. Thank you both very much for joining us today.

JUSTIN PETTIT: Yeah, thanks for hosting. It was fun.

ETHAN JACKSON: Thank you.

CRAIG BOX: You can find Justin on Twitter as justin_d_pettit with underscores. You can find Ethan on Twitter as ethanjjackson without underscores. And you can read about Ambient Mesh at istio.io.

ETHAN JACKSON: [LAUGHING] Craig, I love that you know my Twitter handle. I did not know my Twitter handle.

CRAIG BOX: I went and dug it up.

ETHAN JACKSON: I had forgotten. It's very active. [LAUGHING]

[MUSIC PLAYING]

CRAIG BOX: Thank you, as always, for listening. Please follow me on Twitter at craigbox, please follow the show on Twitter at kubernetespod, or please send us an email at kubernetespodcast@google.com. You can find our website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe. Thanks for listening, and we'll see you again soon.

[MUSIC PLAYING]
