Kubernetes Podcast from Google: Episode 206 - LeakSignal with Wesley Hales and Max Bruce

#206 August 21, 2023

LeakSignal with Wesley Hales and Max Bruce

Hosts: Abdel Sghiouar, Kaslin Fields

Our guests, Wesley Hales and Max Bruce, are co-founders of LeakSignal. LeakSignal is an American startup building a set of tools and products to detect and prevent data exfiltration in service meshes and proxies that support Envoy and Proxy-Wasm.

Do you have something cool to share? Some questions? Let us know:

News of the week

Links from the post-interview chat

Envoy Mobile

Transcript


ABDEL SGHIOUAR: Hi, and welcome to the Kubernetes podcast from Google. I'm your host, Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: This week, we chatted with LeakSignal, an American startup which is building a set of tools and products to detect and prevent data exfiltration in service meshes and proxies supporting Envoy and Proxy-Wasm.

KASLIN FIELDS: But first, let's get to the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Dragonfly is a peer-to-peer file distribution system for cloud native architectures and is a CNCF incubating project. The most common use case of Dragonfly is hosting and distributing container images for Kubernetes clusters.

KASLIN FIELDS: Red Hat announced the preview of their Terraform provider for ROSA, or Red Hat OpenShift Service on AWS. The provider lets you use Terraform to provision OpenShift clusters on AWS, in addition to the Red Hat Hybrid Cloud Console or the ROSA CLI. The provider is available directly on the Terraform Registry.

ABDEL SGHIOUAR: Microsoft announced the general availability of its telco offering, called Azure Operator Nexus. The platform was introduced at Mobile World Congress earlier this year. It supports virtualized and containerized network functions, also known as VNFs and CNFs, and integrates compute, network, and storage in a self-service manner via the Azure Portal, the CLI, and the SDK.

KASLIN FIELDS: HashiCorp announced a change to their licenses. The company is adopting BSL, or Business Source License, on all future releases of HashiCorp products. HashiCorp APIs, SDKs, and almost all other libraries will remain MPL 2.0.

ABDEL SGHIOUAR: The Open Source Security Foundation introduced S2C2F, the Secure Supply Chain Consumption Framework. This new framework is a set of requirements and tools for any organization looking to bring software supply chain security into their architecture.

KASLIN FIELDS: The last quarter of 2023 will be busy with Kubernetes and cloud-native events. There's a lot to cover, and we'll leave some more details in the show notes. But here's a rundown of some of the major ones.

ABDEL SGHIOUAR: WasmCon on September 6 and 7 in Bellevue, Washington.

KASLIN FIELDS: The Open Source Summit EU on September 19th through 21st in Bilbao, Spain.

ABDEL SGHIOUAR: gRPC Conf on September 20 in Sunnyvale, California.

KASLIN FIELDS: Virtual IstioCon on September 25th and 26th.

ABDEL SGHIOUAR: KubeCon, CloudNativeCon, and the Open Source Summit China on September 26th through 28th in Shanghai, China.

KASLIN FIELDS: PromCon EU on September 28th and 29th in Berlin, Germany.

ABDEL SGHIOUAR: KubeCon and CloudNativeCon North America on November 6th through 9th in Chicago, Illinois.

KASLIN FIELDS: KubeDay India on December 8th in Bangalore.

ABDEL SGHIOUAR: And finally, KubeDay Singapore on December 12th.

KASLIN FIELDS: Fermyon added SQL database support to their Fermyon Cloud platform. The company announced beta availability of the SQLite database engine. Fermyon supports developing and running serverless apps in the Wasm runtime using the Spin framework. They also added support for custom domains.

ABDEL SGHIOUAR: Exposed Kubernetes clusters are everywhere. A new article from the security company Aqua Security summarized a three-month investigation during which the company's team found around 350 exposed Kubernetes clusters. These are clusters that leave HTTPS ports 443 and 6443, and sometimes even HTTP ports, exposed to the internet, sometimes also leaving some API endpoints open, which could allow hackers to collect important information about the internal functioning of the clusters.

The team also ran some honeypot clusters and collected data about the kinds of attacks they were exposed to, which seem to be mostly crypto miners. Things like this can be avoided by training your teams and making your Kubernetes API endpoints accessible only from a secured perimeter. Using RBAC and admission control policies can also help prevent lateral movement when malicious attackers find their way in.

KASLIN FIELDS: And that's the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Hello, everyone, and welcome to the Kubernetes podcast from Google. I am your host, Abdel. And today, we are talking with Wesley Hales and Max Bruce, the co-founders of LeakSignal. LeakSignal is a cloud native solution for preventing data exfiltration in web apps, workloads, and APIs through microsegmentation. We will dive into the details of what that means in this episode. But before we get started, welcome to the show, Wesley and Max.

WESLEY HALES: Thank you, great to be here, honored to be here, by the way.

ABDEL SGHIOUAR: Awesome, thank you for being here. I had a chat with Wesley a long time ago, when LeakSignal, or whatever LeakSignal is today, was still just an idea bubbling up. And I found it quite interesting. I was pretty intrigued during that chat by the term micro WAFs, because I think that was the first time I had actually heard of it. I guess we'll dive into the details of what micro WAFs means.

But before we get going, I think we need to introduce the guests, as we usually do. So let's get going, starting with you, Wesley.

WESLEY HALES: Great, yeah. So, Wesley Hales, CEO and co-founder in collaboration with Max here, who's on the line. And also, Isaac Roth is one of our co-founders. And he is the creator of OpenShift, so great team here. And I myself have been a long-time hacker, software developer, mainly focused on cybersecurity over the past decade.

During that time, I led engineering in a few different places, including Shape Security, where we kind of defined the bot protection market back in 2015, roughly. And I've been with many other security vendors: CDN, WAF-related, Runtime Application Self-Protection, or RASP, if anyone's familiar with that, and API security. So I've really been taking a lot of my software development history and converting it over to cybersecurity. So, yeah, that's me.

ABDEL SGHIOUAR: Nice to have you. I think it's also worth mentioning that you have quite a long experience, including being at Google for a while. You worked on Apigee, I think when we had just acquired it.

WESLEY HALES: That's right, yeah. So APIs were the hotness. Joining the Apigee folks was really my first Palo Alto startup experience. And, yeah, that was really before API security was a thing. APIs existed a long time before what we understand them as today. But, yeah, that was a very enlightening experience.

ABDEL SGHIOUAR: Yeah. I'm also coming from the old-school XML-RPC API [CHUCKLES] world.

WESLEY HALES: Oh, yeah. Throw some SOAP in there as well.

ABDEL SGHIOUAR: Yeah. And we have Max. Can you introduce yourself, Max?

MAX BRUCE: Yeah. My name is Max Bruce. I'm CTO at LeakSignal. I've been a software engineer focusing on layer 7 cybersecurity, or AppSec, for the last five years or so. The most notable project I've worked on is the Keyhouse key management system at ByteDance, which uses SPIFFE IDs for service authorization. I love engineering.

ABDEL SGHIOUAR: Awesome. Well, great to have you folks. So let's get going: what is LeakSignal? What are you trying to solve?

WESLEY HALES: We started this idea about a year and a half ago. It was really about the future of architecture and infrastructure, and providing cybersecurity in that environment without being a black-boxed kind of vendor that you have to pay millions a year for. So we were looking at it from multiple angles: the open source angle, the cloud native journey, and what that means for platform engineering, if that's what you want to call it, DevOps, et cetera.

But overall, coming from the background that we have, we knew that legacy security solutions were very closed off, not open, and disconnected from microservices and the direction everyone seems to be heading.

For the past 25 years, roughly, cybersecurity solutions have been looking at what's coming in the front door to determine whether this is a good or bad user. Is it a bad IP address? Is it a bad user agent? Did they type the right things on the keyboard? That solution, I guess, has raised the bar for attackers. But attackers are easily bypassing what exists today. So you combine what exists in the new architecture, the mesh, with what we had from a legacy standpoint, and you look at that and say, OK, what can we do better?

And so LeakSignal is really about analyzing layer 4 through 7 traffic in real time, in-line, on both the request and the response. So we're not just looking at request signals, like we've been doing for the past 25 years, but also at the full response body, no matter what protocol it's traveling over, gRPC, WebSockets, HTTP, and bringing that together in WebAssembly. WebAssembly really empowered this, so I'll stop right there. That was the high-level overview.

ABDEL SGHIOUAR: You mentioned quite a lot of things, and I'd like to go into details a little bit. One interesting thing you mentioned is that for the last 25 years, traffic security has mostly focused on what we could call the edge. So let's secure the front door, the API gateway, whatever load balancer traffic comes through from outside.

But there is more and more interest in security around microservices themselves, with things like Zero Trust with mTLS, what you folks are doing, and also things like SPIFFE, which Max mentioned, for authentication between microservices without having to implement code in the app itself. The way LeakSignal implements things, you folks are mostly focusing on Envoy, right?

WESLEY HALES: Well, we've expanded since then. So we support Nginx and Lambda as well.

ABDEL SGHIOUAR: Lambda as in AWS Lambda?

WESLEY HALES: That's right.

MAX BRUCE: Yes.

ABDEL SGHIOUAR: OK. So basically you support whatever supports Proxy-Wasm.

WESLEY HALES: Correct.

MAX BRUCE: And more.

ABDEL SGHIOUAR: And more, OK.

MAX BRUCE: So we support whatever supports Proxy-Wasm, but we also have a native Nginx module. And Lambda doesn't run Proxy-Wasm at all.

WESLEY HALES: So we're shimming until the WebAssembly runtimes are pervasive.

ABDEL SGHIOUAR: Yeah, that's because, as we discussed in a previous episode, a lot of WebAssembly is still undefined and being worked on. So let's go back one second. Can you explain to me what Proxy-Wasm is?

MAX BRUCE: Proxy-Wasm is a recent standard for deploying Wasm modules in reverse proxies, such as Envoy and Nginx, most notably Envoy. It gives you the capabilities of a native plugin or module for a reverse proxy while being generic across different reverse proxies. It also provides a safe sandbox: the module runs in the V8 engine or some other Wasm engine, so a crash in your module won't bring down your entire reverse proxy and, therefore, your entire service.

It's been a bit quiet on the community stage the last few years. It's kind of at a stable state as it is. While it doesn't get a lot of visibility, we do see a lot of real potential for it as kind of like a runtime environment for distributed middleware. Notably, everybody's already running this. With Istio and OpenShift Service Mesh, you've already got Proxy-Wasm modules running and collecting telemetry in your cluster.

ABDEL SGHIOUAR: Yeah. Istio has actually moved to Proxy-Wasm from implementing Wasm directly in Istio itself. But I was looking at the GitHub repository today, and I was wondering: is the Nginx implementation fully defined, or is it still a work in progress?

MAX BRUCE: I might get the name wrong, but I believe APISIX, A-P-I-S-I-X, is working on an Nginx Proxy-Wasm implementation. We considered using it, and we do have some integration with it. However, last I checked, it doesn't support outbound gRPC streams. So we have held off on first-class support for it at the moment. But it does exist.

ABDEL SGHIOUAR: OK. So is Proxy-Wasm something that developers in general will have to worry about in the next couple of years? Or is it more like a framework where you would use a plugin like yours and just run it inside Proxy-Wasm without having to worry about where it runs? How do you see this thing progressing?

MAX BRUCE: I see it a lot as a back end, like a backhaul, for implementing higher-level services, so that your average engineer working in the mesh environment won't need to deal with, think about, or even know about Proxy-Wasm. It's the people implementing authorization plugins, custom telemetry, stuff like LeakSignal, who are going to interface with Proxy-Wasm.

ABDEL SGHIOUAR: Yeah. The reason I asked is that when I looked at the GitHub repository, I saw that there are the specs and the implementations for the different proxies, including Envoy, Istio, and so on. But there are also SDKs for languages like Go or C++. That's why I was wondering whether this is something an average developer would have to use at some point. But I guess it depends. If you're an end user, probably not. But if you are working at a company building plugins, then probably yes.

MAX BRUCE: Yeah.

WESLEY HALES: Yeah, so we actually-- long story there, but at a very high level, Proxy-Wasm gives us a fault-tolerant virtual machine to plug into. As Max was saying, it's kind of like a distributed middleware that sits in front of the services. And it lets you deploy traffic analysis, things like LeakSignal, pretty easily, because you just plug it in, and that's the whole idea, without having to bring anything extra. So it's a very confined computing environment, but at the same time, it provides a lot of power and possibilities as well, yeah.

ABDEL SGHIOUAR: Cool. So there is one thing I'm curious about, given that you're exposed quite a lot to customers using Envoy. A lot of times, when we bring up Envoy, it's usually in the context of Istio, as a sidecar. And even Istio is now sort of dropping the Envoy sidecar in the new ambient mesh architecture, kind of.

I think my question is, do you see Envoy being used as just a standalone proxy outside of Istio? Because that's how it was designed by Lyft in the first place, right?

MAX BRUCE: Yeah. As we've taken this product to market, the startup portion of this, and validated it with design partners, it's been interesting because no two infrastructures are the same. We've talked to some organizations that have 3,000 to 4,000 or even 15,000 microservices managed by OpenShift or Istio. And then you talk to some that say, hey, we've got a microservices environment, but we've also got all these legacy things going on.

So our approach is really: if this is valuable enough, organizations are accepting of putting an Envoy proxy in front of these systems, and they will do that. To get the value out of this type of traffic analysis, you really have to put either Envoy or some other mechanism, some reverse proxy, in front of those existing systems, yeah.

ABDEL SGHIOUAR: Yeah, you need something in line, basically.

MAX BRUCE: Mm-hmm. Yeah.

ABDEL SGHIOUAR: And it seems to me, the way you're describing it, that your Wasm plugin plus your cloud control plane, which we're going to talk about in a bit, would even work with a legacy system. So it doesn't really have to be microservices. It could be a monolith app, and you just stick an Envoy in front of it.

WESLEY HALES: Yeah. And going back to the past 10, 15 years of cybersecurity, I come from a time when we were taking new network appliances and asking customers to put our appliance in line with live traffic. So you take this into the new era, and it's kind of the same concept. You're putting something in front of that system. But here, it's much more elegant, and there's much less friction involved with these newer solutions.

ABDEL SGHIOUAR: Yeah. It's quite interesting to see all the debates and discussions around Envoy. As I said, it's usually in the context of service mesh. But I know at least one company, one very large company, I don't think I can say the name, that actually uses Envoy as just a layer 7 proxy in front of all their north-south and east-west traffic. So it is a very powerful proxy because it's pretty lightweight.

WESLEY HALES: Yeah. We're dealing with one particular financial organization that has a million and a half requests per second on the ingress, scaling to 3,000 to 4,000 Envoy instances there.

ABDEL SGHIOUAR: Wow. As a proxy layer?

WESLEY HALES: As a proxy layer, yeah.

ABDEL SGHIOUAR: Wow, that's quite cool. That's interesting. Yeah, I think that's also because, even at Google, we are actually moving away from-- "moving away" is a strong term-- but we introduced a new set of load balancers based on Envoy. GCLB, which is Google Cloud Load Balancer, was always based on GFE, which is our own proxy, Google Front End. But we introduced a new set of load balancers based on Envoy itself.

WESLEY HALES: Yeah, I also saw that you took Nginx out of the infrastructure layer and replaced it with Envoy for all of the Google services, all the Google Cloud services as well.

ABDEL SGHIOUAR: It's in progress, but, yes, that's the direction they're going in. I think the main attraction there is being able to offer all the capabilities that Envoy offers in a native way, because it has a very extensive API. And, quite frankly, it's quite intimidating if you're not used to configuring Envoy with YAML files. But once you understand how it works, it's really, really capable.

WESLEY HALES: Yeah, totally, totally. And that's our approach too. If it's tied into the DNA of what cloud native is, or what the future of monitoring infrastructure is, then it makes sense.

ABDEL SGHIOUAR: Yeah. Awesome. So let's go back a little bit. Can you explain to me how the LeakSignal system works?

MAX BRUCE: So at the highest level, we've got two main components that are going to get deployed. One is what we call our proxy, and that's what we've been talking about. That's Proxy-Wasm, a direct native Nginx integration, Lambda integration, et cetera.

But then the other main module is what we call Command, LeakSignal Command. It's a service that runs asynchronously to live traffic and is in charge of collecting telemetry, analyzing it, and making higher-level policy decisions. So it's going to do your distributed rate limiting. It's going to do your complex traffic analysis, like looking for traces in the network, and different alerting or blocking based on that.

And then the proxy side is where we do things like microsegmentation, which is, like, hey, does this user have access to this resource? It looks at things on a per-request level and then uploads that telemetry to Command, where it can be aggregated and analyzed.

ABDEL SGHIOUAR: Yep. And Command comes as a cloud-hosted version, right?

MAX BRUCE: Yeah, we have both a cloud-hosted version and an on-prem deployment.

ABDEL SGHIOUAR: OK, so customers could potentially deploy the control plane themselves, if we dare call it the control plane. They can deploy Command on-prem if they don't want to use the cloud version, essentially.

MAX BRUCE: Yes.

ABDEL SGHIOUAR: Cool.

WESLEY HALES: Yeah. And we stripped that down too. I mean, we have this Command environment that manages the policies and does all these great things with the telemetry. But if you kind of rewind a bit, the module itself, the Proxy-Wasm module, or the Nginx module, it does all the processing in-line. So we're not shipping out sensitive data for it to be further analyzed somewhere else. We're classifying.

So really, at the core of what LeakSignal does, we're classifying sensitive data on the request and the response. So if it's a 200-kilobyte response body, we're classifying that in real time and collecting the metrics around it, like, what matches occurred? Were there Social Security numbers here? Was there customer data? And then we take the telemetry based on that and send it to Command, or to Prometheus if you don't want to use Command. So we're truly cloud native. You can go straight from the module to Prometheus, even, if you want to.

ABDEL SGHIOUAR: Yeah, I was about to mention that you also have support for Prometheus and the OpenTelemetry data spec, right? So basically, you don't have to use Command. You can just have a self-hosted Prometheus and Grafana and get all these metrics on-prem.
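To make "classifying sensitive data on the request and the response" concrete, here is a minimal sketch of a set of matching rules run over a body, producing per-rule match counts of the kind that would be exported as telemetry. It is not LeakSignal's matching engine: the rule names, the simplified regular expressions, and the `match_counts` and `luhn_ok` helpers are hypothetical, and the card rule adds a Luhn checksum as an assumed way to discard random digit runs.

```python
import re

# Hypothetical, simplified matching rules; real deployments would tune these
# per compliance regime (PCI, HIPAA, ...).
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to drop digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def match_counts(body: str) -> dict:
    """Count sensitive-data matches per rule in a request or response body."""
    cards = 0
    for m in CARD_CANDIDATE.finditer(body):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            cards += 1
    return {
        "ssn": len(SSN.findall(body)),
        "credit_card": cards,
        "email": len(EMAIL.findall(body)),
    }
```

Only the counts leave this function, which mirrors the point made above that the module classifies in-line and ships metrics rather than the sensitive data itself.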

WESLEY HALES: Yeah. Yeah. It's all available today on GitHub. I mean, it can be used by anyone for free today.

ABDEL SGHIOUAR: Cool. I'd like to stop here and rewind a little bit, because we've covered a lot. Your plugin can do two main things: analysis and policy enforcement, right? Traffic analysis in-line is pretty clear. You can basically inspect requests and responses as they go from one microservice to another and collect telemetry about what these microservices are and what the traffic is.

But the term leak has a very specific connotation, which is, essentially, preventing leaks or preventing data exfiltration. So can we give very specific examples of what you're trying to prevent from leaking with your plugin?

WESLEY HALES: If you start with how these data leaks occur, I mean, if you follow any channel, Twitter, Reddit, whatever, you're going to see data leaks occurring on almost a weekly basis. And it's usually through API exploits, misconfigurations, and those types of things. And it could just be an error in the production logic.

But at the end of all this, the only way to figure out whether you are leaking data is by classifying sensitive data on its way out, either east-west or north-south. So that was really the thing we wanted to master first, and do very performantly. Because with our initial prototypes of scanning full response bodies in real time in Envoy with Proxy-Wasm, there was something like 100 to 200 milliseconds of latency. That was before Max came on board and rewrote it. Now, we are under 1 millisecond of latency for that in-line processing.

So once you understand the outflows of sensitive data from an individual service, even if you have thousands of services, you can understand what normal is. What is an API request? Does it only return a response body with one customer record in it every time? And when it returns one with 50 in it, then you know there's something wrong. And that's really the type of visibility we're trying to bring with that, in terms of leak detection.

ABDEL SGHIOUAR: Yeah. That's actually one of the interesting things about the product. As you describe it, you're looking at the traffic, building a baseline, sort of figuring out what normal looks like. And then when something doesn't look normal, you say, OK, well, this is a problem, right?

WESLEY HALES: That's right.

ABDEL SGHIOUAR: But the cool thing is you're doing it at layer 7, so you're able to look at the actual HTTP requests and extract things like Social Security numbers, credit card numbers, things like that, right?

WESLEY HALES: That's right, yeah. So once you understand that and you understand what normal is, then from a forensics or investigation standpoint, we can audit everything in the response that's going out. So it's full attestation, if you will, from a cybersecurity standpoint: what data was accessed by, for example, an individual auth token. If you take a JWT, for example, and correlate it with the sensitive data in the responses, then you have a token-to-sensitive-data ratio.

And that allows us to understand: did this individual token, which could have been phished along the way, with someone just accessing an API, exploit an API and get more data than it should have? Or did someone just force a token, and it was brokered as the connection entered the infrastructure? Or do we have a bad architecture, where a single token is scoped to access a ton of different customer data?
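A token-to-sensitive-data ratio can be sketched as "average number of sensitive matches returned per request, grouped by token identity." This is a minimal sketch under stated assumptions, not LeakSignal's implementation: it assumes the token arrives as a JWT in an `Authorization: Bearer` header, uses the `sub` claim as the identity, and assumes a per-response match count has already been computed. `jwt_subject` and `TokenDataRatio` are hypothetical names. The payload is decoded without signature verification, which is acceptable only because it is used as a telemetry grouping key, never for authorization.

```python
import base64
import json
from collections import defaultdict


def jwt_subject(authorization_header: str) -> str:
    """Extract the `sub` claim from a Bearer JWT WITHOUT verifying it.

    Only suitable as a grouping key for telemetry, never for access decisions.
    """
    token = authorization_header.removeprefix("Bearer ").strip()
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("sub", "<unknown>")


class TokenDataRatio:
    """Hypothetical tracker of sensitive matches per token identity."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)
        self.matches = defaultdict(int)

    def record(self, authorization_header: str, sensitive_matches: int) -> None:
        sub = jwt_subject(authorization_header)
        self.requests[sub] += 1
        self.matches[sub] += sensitive_matches

    def ratio(self, sub: str) -> float:
        """Average sensitive matches returned per request for this identity."""
        n = self.requests.get(sub, 0)
        return self.matches.get(sub, 0) / n if n else 0.0
```

A token whose ratio jumps well above its own history, or well above peers with the same scope, is the signal described above: a possibly phished token, an exploited API, or an over-scoped token.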

ABDEL SGHIOUAR: Yeah, that's actually quite interesting. I've never thought about this token-to-data ratio. It's an interesting approach to looking at what a normal identity, because that's what a token is, typically does in terms of moving data around. And then you can say, OK, well, this particular token that has been doing the same thing for a while is now extracting more data than normal. That could potentially be a problem, right?

WESLEY HALES: That's right, yeah. So we're really providing a new type of visibility after MFA occurs. After someone logs in and goes through all the checks, all they have is a token; all that request has is a token. And every environment's different. No one really has visibility into that space. So that's really what we're trying to bubble up to our customers and users.

ABDEL SGHIOUAR: Cool. So that's the Proxy-Wasm plugin side of things. So that's the data leak prevention. And then you have the telemetry and visibility part, which is what Command does, or potentially Prometheus, or whatever, right?

MAX BRUCE: Yeah.

ABDEL SGHIOUAR: And so that's where you collect metrics about what's normal and what's not normal.

WESLEY HALES: Yeah. So another big aspect of what we do is microsegmentation. But we're doing it based on sensitive data. Historically, microsegmentation has been layer 3 through 4, certificate management, that type of thing. But when you apply this Wasm module, which you can do with a single line of YAML and kubectl apply, from there, we can automatically microsegment the network based on the sensitive data that the services handle.

So if you have 3,000, 4,000 services and you don't really have visibility into that environment, we kind of do that within minutes. So you deploy the module, we automatically group the services together based on what data they're touching. And then that gives you, like, instant PCI compliance, from a data flow perspective.
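Grouping services "based on what data they're touching" can be sketched as bucketing services by the set of sensitive data types observed in their traffic. This is a minimal sketch, assuming per-service observations (service name mapped to data-type labels such as "credit_card") have already been collected; `segment_by_data` and `pci_scope` are hypothetical helpers, and treating every service that carries card data as PCI-relevant is an illustrative simplification, not LeakSignal's segmentation logic.

```python
from collections import defaultdict


def segment_by_data(observed):
    """Group services whose traffic carries the same set of sensitive data types.

    `observed` maps service name -> iterable of data-type labels seen in its
    requests/responses. Services with an identical label set share a segment.
    """
    segments = defaultdict(list)
    for service, labels in observed.items():
        segments[frozenset(labels)].append(service)
    # Sort members so the result is deterministic.
    return {key: sorted(members) for key, members in segments.items()}


def pci_scope(segments):
    """Services in any segment that handles card data (candidate PCI scope)."""
    return sorted(
        s for key, members in segments.items() if "credit_card" in key for s in members
    )
```

Keying on a `frozenset` makes label order irrelevant, so two services that see the same data types always land in the same segment.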

ABDEL SGHIOUAR: Cool. Yeah, that's also quite interesting. Then the third aspect, I think, is the policy part, right? Can we talk a little bit about that? What does the policy part do?

MAX BRUCE: So our policy has two main rule engines inside of it on the enforcement side. And then we've also got a separate engine, what we call a matching engine, on the detection side. All of this is defined in our policy as a whole.

On the matching engine side, you've got, like, hey, I want to look for Social Security numbers, or credit card numbers, user IDs, emails, whatever. Whatever is sensitive in your environment for the compliance regime you're targeting, that's what your matching rules are there to find.

And then, on the policy enforcement side, we've got two different sides to this. One is what we call SBAC, or Service-Based Access Control. It covers a wide range of different filters and groupings that you can do. And that all gets enacted live, in-line, in the proxy. So it has access to the sensitive data that might not get transmitted up to Command, and that allows you to do in-line blocking, redaction, et cetera.

But then we have what we call our general rule system, which operates in Command. This does things like alerting and blocking on metadata that is aggregated across requests and responses. So you can say, hey, I'm going to look at all the sensitive data access for all given tokens, grouped by token, over the last hour. And then, if I see any tokens accessing more than X amount, alert, or rate limit. Those rules get evaluated continuously, and any rate limits or alerts they produce get propagated out.
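The rule Max describes ("group sensitive-data access by token over the last hour; alert above X") can be sketched as a per-token sliding-window sum. This is a minimal, single-process sketch: `WindowedTokenRule`, `limit`, and `window_s` are hypothetical names, and a real Command deployment would aggregate telemetry uploaded by many proxies rather than keep state in one Python object.

```python
from collections import defaultdict, deque


class WindowedTokenRule:
    """Flag a token when its sensitive-data total over a sliding window exceeds a limit.

    Hypothetical stand-in for a rule like "group by token over the last hour,
    alert above X". Timestamps are in seconds.
    """

    def __init__(self, limit: int, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self.events = defaultdict(deque)  # token -> deque of (ts, count)
        self.totals = defaultdict(int)    # token -> running sum inside the window

    def observe(self, token: str, ts: float, sensitive_count: int) -> bool:
        """Record one response; return True if the token is now over the limit."""
        q = self.events[token]
        q.append((ts, sensitive_count))
        self.totals[token] += sensitive_count
        # Evict events that have fallen out of the window.
        while q and q[0][0] <= ts - self.window_s:
            _, old = q.popleft()
            self.totals[token] -= old
        return self.totals[token] > self.limit
```

Keeping a running total per token makes each observation cost amortized O(1), which matters when rules are evaluated continuously.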

ABDEL SGHIOUAR: Yeah. So you're able to do, basically, on-the-spot controls by configuring a rule that says, if a token is extracting more data than it should, just block it, or rate limit it, or alert somebody, or something like that, right?

MAX BRUCE: Yeah. And part of our capabilities there is what we call Auto Rules. What that system does is look at the historical data over time for some groupings. For example, by default we frequently run it in a mode that looks at sensitive data access by service over time. So we're able to say, like, hey, this service is accessing this endpoint on this other service. And for every single request, it's getting back a particular amount of sensitive information, like one or two items of specific kinds.

And so we can look at that, we can create a statistical model around that, and say, like, hey, if we all of a sudden see, like, 20 pieces of sensitive information, that might be worth telling somebody about or, depending on the configuration, even proactively blocking.
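One way to build the "statistical model" behind such an auto rule is a running mean and standard deviation per (calling service, endpoint), flagging responses far above the learned baseline. This is a minimal sketch of a z-score style threshold using Welford's online algorithm, an assumed technique rather than LeakSignal's actual model; `AutoRule`, `k`, `min_samples`, and `floor` are hypothetical tuning knobs.

```python
import math
from collections import defaultdict


class Baseline:
    """Running mean/variance of sensitive items per response (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class AutoRule:
    """Flag responses far above the learned baseline for a (caller, endpoint) pair."""

    def __init__(self, k: float = 4.0, min_samples: int = 30, floor: float = 1.0) -> None:
        self.k = k
        self.min_samples = min_samples
        self.floor = floor  # avoids a near-zero threshold when traffic never varies
        self.baselines = defaultdict(Baseline)

    def check(self, caller: str, endpoint: str, sensitive_items: int) -> bool:
        b = self.baselines[(caller, endpoint)]
        anomalous = (
            b.n >= self.min_samples
            and sensitive_items > b.mean + self.k * max(b.std(), self.floor)
        )
        if not anomalous:
            b.update(sensitive_items)  # learn only from traffic that looks normal
        return anomalous
```

For a pair that normally returns one or two items, the threshold settles around mean + 4, so a response carrying 20 items is flagged; whether that alerts or blocks would be a configuration choice, as described above.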

ABDEL SGHIOUAR: Yeah, that's actually quite interesting. I think we introduced such a capability in our WAF, Cloud Armor, the Google Cloud WAF attached to Cloud Load Balancer, which has those sorts of auto rules. It can work in recommendation mode, where it just recommends, hey, please apply this rule. Or it can actually apply the rule for you.

There was an article, I think less than a year ago, about one of our customers who was using that and was able to block one of the biggest DDoS attacks. It was an incredible number of requests, like millions of requests. And they were not using it in auto mode. They were using it in recommendation mode. But they got a notification saying, hey, we recommend you apply this.

MAX BRUCE: Yeah, click this rule and solve this problem, exactly.

ABDEL SGHIOUAR: Yeah. So that kind of proactive recommendation is actually quite interesting, right?

WESLEY HALES: Yeah. And it really is. And that's where the industry's headed. If you look at Wiz Security, for example, they come in, scan everything in your environment, and say, hey, this is kind of your queue for what you need to do. We're kind of that same thing, but on the production logic, kind of, platform engineering side. Because everyone's shifted left for the past five years and scanning every possible thing you could-- which is great. It's needed.

But platform engineering really has not seen that level of love, in terms of what does security mean? When code is pushed to production, are we going to let our old Akamai WAF, or whatever WAF that might be, protect all these services? Or are we going to do something else? And so that's really our stance on trying to bring this more to the right side of the equation.

ABDEL SGHIOUAR: Yeah. I do have a really big problem with the term shift left. Because whenever I go around and talk to people about it, the immediate reaction I get is developers thinking, well, suddenly we have to do more than we can, or more than we should. It makes it sound like you're just making developers responsible for everything, including security.

WESLEY HALES: I was a developer for a long time. And developers can be security minded. But bringing in vendor tools and forcing them down a certain workflow and doing certain extra tasks, that has to be hard. I have not been in one of those environments lately. So I would love to hear what that is actually like on the front lines, enforcing it.

ABDEL SGHIOUAR: I think the term is misused, in the sense that what people mean by shift left is making people aware of what they're doing, so bringing that visibility, whatever that visibility means. Say I am a developer using an IDE. In the IDE, I can see real-time scanning of the packages I am using, or stuff like that.

But I think to a lot of people, it just means, now you are a developer and you have to understand how data exfiltration works. And you have to work toward preventing it, instead of just focusing on creating value for the business and having something else do that for you. Right?

WESLEY HALES: Right. That's right, yeah.

ABDEL SGHIOUAR: Cool. So let's move forward, then. We talked a little bit-- so we talked extensively about what LeakSignal does. So you folks support Envoy, NGINX via a native module, and AWS Lambda. And I see that you added eBPF to the doc we shared earlier. Do you support eBPF?

MAX BRUCE: So it's an interesting story about eBPF. We do technically support it, but we don't have first-class support for it. We did a prototype implementation. And what we found is that existing eBPF vendors are doing all of their scanning in-line in the kernel. Which is great for performance, but it limits you, because you're stuck doing non-Turing-complete, guaranteed-termination scanning. You can't run a regex engine in eBPF.

And so something doing much more complex scanning, like ours, even if it's high performance, has to be moved into user space. And in order to support that, you end up having to do a lot of work around handling, like, layer 4 packets. We kind of found that we were trying to fit a square peg into a round hole. This kind of complex layer 7 logic doesn't really belong in the kernel. It really belongs in the layer before you get to the kernel.

Now, if we have a customer that really needs to have that at the eBPF layer, say, for supporting some legacy technology, we do have it.

ABDEL SGHIOUAR: Yeah, eBPF is this thing that was quite-- so every single year we go to KubeCon, there's always one or two technologies that are the thing of that edition of KubeCon. At KubeCon Europe 2022 in Valencia, eBPF was huge. Everybody was talking about it, because everybody was saying this is the solution to sidecars.

And I always had a skeptical view of eBPF, because I sort of understand how it works at a low level. And I'm like, hmm, doing, as you said, kernel-level stuff, low-level, high-performance stuff, that's fine. Doing complex HTTP traffic analysis in eBPF, I really have questions about how that would work.

MAX BRUCE: Yeah.

WESLEY HALES: Yeah, there was a big push for eBPF and the technologies around it. It was pitched for observability, networking, and security. But the security part of this, I've really questioned. It really comes down to blocking, like, IP-level port activity. I mean, it's really good at doing that. But from our standpoint, like Max said, square peg, round hole. It's not built to do this kind of sensitive data classification. That's for sure.

ABDEL SGHIOUAR: Yeah. Yeah. I think only time will tell how this progresses. I find the approach that Istio took with ambient mesh quite interesting, which is essentially: we're not going to do eBPF, right?

WESLEY HALES: Yeah. So the whole sidecar versus waypoint proxy thing in the new Istio architecture, I'm all for it. I'm not saying anything negative about it. But when you look at the enterprise, when you look at Fortune 500 companies and what they have deployed, it is all sidecar architectures. And ambient is new. I understand that. It's going to take some time to adopt.

But with a waypoint proxy, you're still dealing with Envoy layer 7. I mean, you still can do the whole-- what we're doing is deploying a Proxy-Wasm module into that environment. It's just, the proxy's at a different location. It's not on every service.

ABDEL SGHIOUAR: Yes. Yeah. I mean, the whole point of ambient mesh is to reduce the footprint of the sidecars. So instead of having one sidecar per pod, you would have one waypoint proxy which is shared across an entire namespace and is horizontally scalable. So if you have more traffic, you can just add more replicas.

But I think the interesting thing is the implementation Istio took with ztunnel, essentially. They built a new Rust-based tunnel, instead of just doing eBPF like pretty much everybody else is doing, for the mTLS, the low-level stuff, the layer 4 stuff, right?

WESLEY HALES: Right, yeah. So I'll be interested to see how adoption progresses.

ABDEL SGHIOUAR: Yeah. I'm also looking forward to it. I mean, it just became available in 1.18, but it's still experimental. So I've been playing with it. It's pretty cool. It works the same way it worked before.

But I have a question that will probably lead us to the end of this: have you folks been looking at the Gateway API in Kubernetes? Have you looked at that?

MAX BRUCE: Yeah, we've taken a good look at that.

ABDEL SGHIOUAR: So I don't know if you're aware of this, but Istio is eventually going to move away from their own custom-built CRDs toward using the Gateway API for traffic management.

MAX BRUCE: Yep.

ABDEL SGHIOUAR: I don't know if you have any thoughts about that.

MAX BRUCE: I've got the new Kubernetes Gateway APIs deployed in my home cluster. I don't think it has too much direct impact on us, though.

ABDEL SGHIOUAR: OK. So in the future, traffic management in Istio, instead of using DestinationRules, VirtualServices, and all those CRDs, will just use the Gateway API objects, like HTTPRoute, TCPRoute, et cetera. Because they can do the same traffic management. So we'll see how the future goes, right?

MAX BRUCE: Yeah. Part of our approach to mapping out services, figuring out routing and all that, is that we're not looking at the control plane, which I think is why that doesn't impact us.

ABDEL SGHIOUAR: Oh, yeah.

MAX BRUCE: And so what we're doing is we're just looking at the actual communication, seeing where the traffic is actually going, not where it's configured to go.

ABDEL SGHIOUAR: Yeah, that's what I found interesting about LeakSignal, is that you don't actually rely on the Istio control plane, which makes it work across anything that has, pretty much, a proxy, right?

MAX BRUCE: Exactly.

ABDEL SGHIOUAR: So it would also work with legacy systems, which is quite interesting. All right, well, that concludes the questions I had for you folks. I learned quite a lot, got updated about how the project is going.

WESLEY HALES: Yeah. Yeah, that was great. And thanks for having us again. I appreciate it.

ABDEL SGHIOUAR: Thank you very much, folks. We'll have all the links to the conversation in the show notes. And thank you very much.

[MUSIC PLAYING]

KASLIN FIELDS: Thank you so much, Abdel, for that conversation with LeakSignal. I didn't know anything about this company coming into this. [LAUGHS] So-- I definitely learned some things here. The main thing I think I got out of that is Proxy-Wasm, Proxy-Wasm, Proxy-Wasm. [LAUGHS]

ABDEL SGHIOUAR: Yeah, we spoke quite a lot about Proxy-Wasm. So what happened is Wesley reached out to me, I think at the beginning of COVID, because I did a talk with the Istio community. He reached out and said, hey, let's have a chat. So we jumped on a call, we had a chat, and he explained to me what they were trying to build, which is essentially the first idea of LeakSignal: as we discussed in the interview, a plugin plus a cloud dashboard for detecting data that is being siphoned out of your platform. And it works anywhere Envoy runs, or wherever a proxy supports Proxy-Wasm, which is why we spent a lot of time talking about it.

KASLIN FIELDS: I think their name is pretty descriptive.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: They're trying to provide you signals about leaks that are happening. [LAUGHS]

ABDEL SGHIOUAR: Yeah, I find the name kind of interesting. And I find it genius.

KASLIN FIELDS: It's pretty good. [LAUGHS]

ABDEL SGHIOUAR: Yeah. Yeah. And it's actually interesting in the sense that their platform doesn't require you to change anything in your code. You just drop in Envoy, then you add their plugin, which is a Wasm plugin. And that's it, that's all you have to do.

KASLIN FIELDS: That's a common thing we say about service meshes too: you get additional functionality without having to change the code.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: And LeakSignal works with service meshes as well.

ABDEL SGHIOUAR: Yeah. In my opinion, that's the beauty of this whole service mesh and Envoy thing: the inline processing you're able to do on requests as they come in and out of your service. Envoy has supported filters for a very long time, and people have been writing Envoy filters in Lua and in Wasm. But what these guys are doing is giving you those plugins ready-made, so you don't really have to worry too much about it. And Proxy-Wasm is just offering a whole new set of capabilities.

KASLIN FIELDS: And I also found the conversation about eBPF there at the end very interesting.

ABDEL SGHIOUAR: Oh, yeah.

KASLIN FIELDS: It's nice to see a use case where eBPF is not a good fit. [LAUGHS] That's always important for any new technology--

ABDEL SGHIOUAR: Yeah, yeah, yeah.

KASLIN FIELDS: --the places where it's not great. [LAUGHS]

ABDEL SGHIOUAR: Yeah. Yeah, yeah. I think for the kind of processing they do, doing it in eBPF would be almost impossible, right?

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: Which is why I find it very interesting when some companies claim that they're going to solve all the sidecar problems with eBPF. And I'm like, hmm, I wonder, how are you going to do this?

[LAUGHTER]

KASLIN FIELDS: Some things that don't quite fit here. [LAUGHS]

ABDEL SGHIOUAR: Yeah. How are you going to do layer 7 authorization with SPIFFE, in the kernel? Please show me. [CHUCKLES]

KASLIN FIELDS: Yeah. And this conversation also hit on a lot of new and trending technologies, I feel like. A lot of talk about Wasm. We mentioned eBPF. We're talking about all of these service mesh components, like Envoy, proxies, all sorts of things here.

ABDEL SGHIOUAR: Yeah. And, I mean, to be clear, LeakSignal does not require a service mesh. It just works with Envoy. So even if you don't have a mesh, as long as you have Envoy, it works. We talked in the interview about a company whose name I couldn't mention. But there are a lot of companies using Envoy as a layer 7 proxy for their ingress traffic, right? And Envoy, you can run it on virtual machines. You can run it on physical servers. You can even run it on-- I don't know if you know this, but they have Envoy Mobile. You can actually run the proxy on the-- yeah. Yeah, yeah, there are some companies using that.

[LAUGHTER]

KASLIN FIELDS: I made a face.

ABDEL SGHIOUAR: Yeah, yeah, she made a very--

KASLIN FIELDS: I was not expecting that. [LAUGHS]

ABDEL SGHIOUAR: Because actually, there is Envoy Mobile. I think it's called Envoy Edge. I don't remember the name.

KASLIN FIELDS: OK, well, all right, that makes sense.

ABDEL SGHIOUAR: So you can do a lot of interesting things with it. I was looking into what's possible. Imagine you can run experiments on the user's phone. You can send configuration to the Envoy proxy on the user's phone to enable certain features in the app, so you can remotely change the behavior of the app.

KASLIN FIELDS: I guess a network's a network, huh?

[LAUGHTER]

ABDEL SGHIOUAR: Yeah, exactly. Exactly. So I think that a lot of people made the same face you made when Envoy announced Envoy Mobile for the first time. They were like, what? And, yeah, all the time, I talk to people using it in super interesting ways.

KASLIN FIELDS: Yeah, I would love to hear some use cases on that.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: I got to look up some user studies there. [LAUGHS]

ABDEL SGHIOUAR: Yeah, we'll add some links to the show notes. But imagine things like-- imagine being able to actually monitor traffic on a corp device, and you can prevent, basically, attacks and security problems on a person's corporate device, corporate phone.

KASLIN FIELDS: I remember the days when we weren't that worried about viruses and malware on phones because they were too new. [LAUGHS]

ABDEL SGHIOUAR: Yeah. Those days are over, I guess.

KASLIN FIELDS: We're past that point.

ABDEL SGHIOUAR: Exactly, exactly. So, yeah, it was an interesting conversation.

KASLIN FIELDS: Yeah. I definitely learned a lot. And I feel like I learned a lot about a lot of different technologies that I'm trying to learn all at the same time because they're all new and trending.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: A lot of interesting use cases here.

ABDEL SGHIOUAR: And this space is moving so fast, it's actually kind of difficult to keep up with.

KASLIN FIELDS: Really? I feel like Wasm is still so new. A lot of people are very excited about it. But I'm not-- I haven't been hearing a whole lot of real use cases. But this one was one.

ABDEL SGHIOUAR: Yeah.

KASLIN FIELDS: The more use cases I hear, the more I can learn about it.

ABDEL SGHIOUAR: Yeah. I mean, there is a dedicated conference for it now. So--

KASLIN FIELDS: Yeah, that's true.

ABDEL SGHIOUAR: --I guess that's the signal.

KASLIN FIELDS: You should go there.

ABDEL SGHIOUAR: Whenever the CNCF says it's going to do a something-Con about a topic, you know it's big.

KASLIN FIELDS: Yeah. All right, well thank you very much, Abdel. And we'll catch you all next time.

ABDEL SGHIOUAR: Thank you.

[MUSIC PLAYING]

KASLIN FIELDS: That brings us to the end of another episode. If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media, @KubernetesPod, or reach us by email at <kubernetespodcast@google.com>. You can also check out the website at kubernetespodcast.com, where you'll find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]
