Kubernetes Podcast from Google: Episode 186 - Gateway API Beta, with Rob Scott

#186 July 21, 2022

Gateway API Beta, with Rob Scott

Hosts: Craig Box

Three years after they were first proposed, the new Kubernetes Gateway APIs - the evolution of the Ingress API - are in Beta. Rob Scott is a software engineer at Google and a lead on the SIG Network Gateway API project.

Do you have something cool to share? Some questions? Let us know:

Chatter of the week

News of the week

Argo security audit:
Kubernetes Cluster API integrates continuous fuzzing
- The report
- OSS Fuzz
Cilium 1.12
GKE Cluster Autoscaler location policy
The quest for neutrinos
- Ray traced Quake II

Links from the interview

Transcript


CRAIG BOX: Hi, and welcome to the Kubernetes podcast from Google. I'm your host, Craig Box.

[MUSIC PLAYING]

CRAIG BOX: You know how last week I said it was going to be the hottest day ever? I wasn't kidding. As you will probably have heard, Monday was uncomfortable. As a result, I learnt a little bit about the art of temperature measurement.

One might expect when they say Heathrow Airport had a temperature of 40.2 degrees Celsius that it might be something to do with all the planes. In fact, later in the day, there was a reading of 40.3 degrees at Coningsby, a Lincolnshire village with an air force base nearby. Put off flying yet, you planet-warming tourist you? Turns out, it's not that simple.

Near me, as I speak, are the Royal Botanic Gardens at Kew. There's a little box that looks a bit like a beehive, fenced off in the middle of a nice lawn. I've seen it many times, even a couple of weeks ago. I knew it was a piece of scientific equipment, though it doesn't have a nice plaque with its Latin name, like all the trees do.

This week, I learned it's called a Stevenson screen: a box that allows airflow but stops direct sunlight, snow, or airplanes, I guess, from influencing the measurement. Planes do fly overhead, much to the dismay of local residents, but it's otherwise a nice green oasis in the middle of West London. And it's normally very close to the Heathrow reading.

Fun fact: the eponymous Mr. Stevenson had a son, Robert Louis, who went on to write "Treasure Island" and "Jekyll and Hyde." It also turns out that some weather stations don't report in real time, so someone else might have a higher number waiting to be submitted.

I'm not sure how a provisional reading becomes a confirmed reading. They can't send someone out to Coningsby with a thermometer today because they'll get rained on. Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: The Argo project has published the results of a recent security audit performed by Ada Logics. They identified 26 issues: seven in Argo CD, six in Workflows, and 13 in Argo Events. Seven CVEs were issued for Argo CD, and Argo Events picked up some too. Learn more about Argo in episode 172, and stay tuned to future episodes to learn more about security audits.

Ada Logics has also worked on integrating continuous fuzzing into the Kubernetes Cluster API project, enrolling it in Google's OSS-Fuzz system and developing a set of fuzzers to bring code coverage to a mature level.

Version 1.12 of Cilium brings its service mesh features to general availability and adds other new features. You can now use a sidecar-free data path if you're willing to accept the security trade-off of having one shared Envoy instance per node. For those who want full security, you can still use Cilium under Istio, or configure multiple Envoy instances using the new CiliumEnvoyConfig resource. There's also a new Ingress controller option with Envoy embedded in Cilium, and support for egress through set gateway instances.

The GKE Cluster Autoscaler has added a location policy with two options for how nodes should be added when scaling up. The “balanced” mode will try to keep node counts the same in each zone, while the “any” mode will prioritize the use of unused reservations. The latter is useful when working with Spot VMs, or when you have purchased reservations in a particular Compute Engine zone.
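To make the difference between the two policies concrete, here is a toy model in Python. It is not the Cluster Autoscaler's actual algorithm; the zone names, the `pick_zone` helper, and the tie-breaking rules are assumptions made purely for illustration.

```python
# Toy model of the two location policies described above.
# NOT the real GKE autoscaler logic: zones, inputs and tie-breaking are hypothetical.

def pick_zone(policy, node_counts, unused_reservations):
    """Return the zone a scale-up would add a node to under a simplified policy."""
    if policy == "balanced":
        # Keep zones even: grow the zone with the fewest nodes (ties broken by name).
        return min(node_counts, key=lambda zone: (node_counts[zone], zone))
    if policy == "any":
        # Prefer a zone that still has unused reserved capacity.
        reserved = sorted(z for z, free in unused_reservations.items() if free > 0)
        if reserved:
            return reserved[0]
        # No spare reservations: any zone will do (here, the first by name).
        return sorted(node_counts)[0]
    raise ValueError(f"unknown location policy: {policy}")


nodes = {"zone-a": 3, "zone-b": 1, "zone-c": 2}
reservations = {"zone-c": 4}  # four reserved VMs not yet in use
print(pick_zone("balanced", nodes, reservations))  # zone-b
print(pick_zone("any", nodes, reservations))       # zone-c
```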

Finally, the IceCube Neutrino Observatory is trying to detect subatomic particles at the South Pole. Thankfully, it turns out that tracing the path of a neutrino is quite similar to tracing the path of a beam of light in Quake 2, so NVIDIA GPUs are more than up to the task. The IceCube team, backed by the San Diego Supercomputer Center and the University of Wisconsin-Madison, uses GPU sharing on GKE to deal with the fact that the GPUs can process data faster than the CPU can feed it to them. Their write-up on the Google Cloud blog covers the performance results and how to tune sharing for your workload, unless, of course, your workload is Quake 2, in which case you'll still want a GPU all to yourself.

And that's the news.

[MUSIC PLAYING]

CRAIG BOX: Rob Scott is a software engineer at Google working on Kubernetes networking. He is a maintainer of the Gateway API project. Welcome to the show, Rob.

ROB SCOTT: Thanks for having me.

CRAIG BOX: One of the things that I find interesting, looking at the engineering teams at Google, is that there are a lot of what I'd call second-generation Kubernetes people now. There are obviously still a lot of the people who built the product involved. But there are people who came on board with Kubernetes in their blood, perhaps: people who have used it outside and who are now helping to build it at Google. You are one of those people. Can you tell us a little bit about your path to Kubernetes?

ROB SCOTT: Yeah, thanks. My first experience with Kubernetes goes all the way back to a small startup I was working at called Spire. And I got the opportunity to really lead their infrastructure towards Kubernetes and lead that migration. And I was really sold on Kubernetes at that point.

I convinced the company that was the direction we should go. Then, shortly after that migration, AWS had a service disruption, and all these services on its status page had yellow or red statuses, which gave me cause for concern. I'd convinced the company to move to this system on this cloud, and now there was this big outage affecting many, many companies.

And throughout it all, our infrastructure stayed up. And one of the things that just blew my mind is, we lost a Kubernetes node. And in the previous system we had run on, if we lost a VM, it would have been a significant amount of effort for us to replace it. And we likely would have had some downtime in the process if it was a critical one like this.

CRAIG BOX: You would have had to put posters up on all of the lampposts in the neighborhood saying “lost VM.”

ROB SCOTT: Exactly, but in this case, with Kubernetes, we just didn't care. It was amazing to watch. We lost a node, and we could not replace it because of the service disruption, but Kubernetes rescheduled all our workloads onto our existing VMs.

And it just worked. No downtime, everything just continued. And at that point, I was thinking to myself, wow, this Kubernetes thing is something else.

I already liked deploying to it. I already liked the experience. But then seeing it really save us from some downtime made the whole investment pay off. And from that point on, I was like, I need to spend more time on this Kubernetes thing. This is something else.

And I think that was around Kubernetes 1.3, if I remember correctly, and from then on, I just wanted to spend more time working on Kubernetes. That led me to a company that, at the time, was called ReactiveOps. And we rebranded to Fairwinds at some point. But there, I just got to help different customers with Kubernetes.

And I really enjoyed that. We got to build some open-source tooling that made working with Kubernetes a little bit easier. We saw what our customers were struggling with and built RBAC Manager, Polaris, and some other open-source tools.

I really loved that experience. But the more I worked with Kubernetes, the more I wanted to build Kubernetes itself, and Google gave me that opportunity.

CRAIG BOX: You've come to the right place.

ROB SCOTT: Yes, and I've gotten to work on Kubernetes since then. I work with Bowei. I think he's been on here before.

And the first thing that he asked me when I talked with him is what I wanted to fix about Kubernetes networking. And well, I said, I want to make Ingress better. And well, the rest is history. Here we are.

CRAIG BOX: It is a noble cause. We generally try not to cover the same topic twice on the show. But it was back in May 2020 that we talked to Bowei about what was then called the services APIs. Am I breaking my rule by talking to you today?

ROB SCOTT: Well, I've got great news. We've rebranded. So this is Gateway API. It's a whole different thing.

CRAIG BOX: Is it Ingress V3?

ROB SCOTT: Yeah, well, if we ever want to be on the podcast again, we'll be sure to change our name again.

CRAIG BOX: Yeah, I was going to ask, are you finished renaming the project?

ROB SCOTT: In fairness, I really hope so. Just a couple of months ago, I think I finally removed the last few references to the Service APIs name. Renames are tough. So although Gateway API may not be the perfect name, I think it's the right name for us.

CRAIG BOX: You could have just put a placeholder variable in the code everywhere so that if you did rename it again or maybe if you wanted to localize the name, you could get away with that.

ROB SCOTT: We should optimize for renaming. That's a great idea.

CRAIG BOX: On the topic of Ingress, you pointed out to Bowei that it's a thing that needs to be fixed. It's something that I'm familiar with, and something that I presume most Kubernetes users are familiar with. After all, it was in beta for a very, very long time. What is wrong with Ingress? Why is it not enough?

ROB SCOTT: I don't think there's anything particularly wrong with Ingress, in the sense that it solved the very small scope of things that it was trying to do. In some sense, it was wildly successful, given just how widely the API is used and implemented. But as it's been implemented and used more widely, we've seen how much more users want to do with the API.

And so a number of implementations have responded to that by, for instance, creating custom annotations on Ingress that they'll support for all the capabilities that are missing from the API itself. And working with annotations is just a poor experience. And that's not the fault of implementations.

There's no place in the API to configure these things. So if you're using the Ingress API itself and it works for you, that's great. It will continue to exist. But with Gateway API, we were really trying to unlock all of these advanced capabilities that so many users have been wanting and asking for, and provide it in a portable way that really works across lots of different implementations.

CRAIG BOX: There are a few design goals and concepts that you refer to when talking about the Gateway API. Portable is one of them. It's also designed to be expressive, extensible, and role-oriented. I'd like to dig into a couple of those perhaps. What does it mean to be role-oriented and how was Ingress not that?

ROB SCOTT: With Ingress, we had a single resource, which in Kubernetes means that when you're configuring your RBAC roles and role bindings, you can really only say this user can either edit Ingress or they can't. If you wanted to say this user can modify the TLS certificate but not the routing rules, or vice versa, that wasn't really possible.

With Gateway API, we saw this infrastructure pattern where organizations that were using Kubernetes often had different roles in their system. So one common example is that some organizations might have an infrastructure provider that is managing the Gateway classes available in the cluster. So in that case, they would be saying, I want to allow external load balancers in this cluster and internal load balancers. And I want them to be implemented by this specific provider.

And then the next level down, we would have cluster operators who would create gateways, the actual load-balancing infrastructure in that cluster using the Gateway classes. And finally, you'd have your application developers who would actually configure routing logic for their applications. And with Gateway API, we've separated all those roles into different resources so different teams can own different pieces of the stack.
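The RBAC point can be illustrated with a small Python model. Kubernetes RBAC grants verbs on resource kinds, not on individual fields, so whoever can edit an object can edit every field in it. The kind names and top-level spec fields below are real but deliberately incomplete; the persona table and the `editable_fields` helper are illustrative assumptions.

```python
# Kubernetes RBAC grants access per resource kind, not per field.
# Top-level spec fields per kind (a subset, for illustration only).
SPEC_FIELDS = {
    "Ingress": {"ingressClassName", "tls", "rules"},
    "GatewayClass": {"controllerName"},
    "Gateway": {"gatewayClassName", "listeners"},  # TLS settings live on listeners
    "HTTPRoute": {"parentRefs", "hostnames", "rules"},
}


def editable_fields(granted_kinds):
    """Every spec field a user can change, given the kinds RBAC lets them edit."""
    return {f"{kind}.{field}" for kind in granted_kinds for field in SPEC_FIELDS[kind]}


# Ingress: permission to edit routing rules also means permission to edit TLS.
ingress_developer = editable_fields({"Ingress"})

# Gateway API: one resource per persona (persona names are illustrative).
personas = {
    "infrastructure provider": {"GatewayClass"},
    "cluster operator": {"Gateway"},
    "application developer": {"HTTPRoute"},
}
app_developer = editable_fields(personas["application developer"])

print(sorted(ingress_developer))  # includes 'Ingress.tls'
print(sorted(app_developer))      # routing fields only
```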

CRAIG BOX: How are Ingress and Gateway API different when it comes to namespaces?

ROB SCOTT: That's a very good follow-up. It builds on the role-oriented aspect, where we really wanted to segment out access, and one part of that is the idea of connecting across namespaces.

So for example, with the Ingress API, because it was a single resource, everything really had to exist in a single namespace. With Gateway API, we've made it so each piece of the stack can exist in a different namespace. For example, gateways may exist in an infrastructure namespace, and each gateway can describe which namespaces it trusts for which domains.

So it may trust the foo.com domain to the team in the foo namespace, and the team in the foo namespace can configure routing logic. That's been a really key part of the Gateway API design, again building on that role-oriented concept and allowing different teams to have different access levels for their applications.
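A simplified sketch of how a Gateway listener can express that trust. The manifest uses field names from the `gateway.networking.k8s.io/v1beta1` Gateway (`listeners`, `hostname`, `tls.certificateRefs`, and `allowedRoutes.namespaces` with `from: Selector` and a label `selector`); the resource names, labels, and the `listener_accepts` helper are hypothetical. The check deliberately ignores wildcard hostnames, route kinds, `sectionName`, and `matchExpressions`, all of which a real implementation has to handle.

```python
# A Gateway in a shared "infra" namespace, as the dict you would get from its YAML.
# Field names follow gateway.networking.k8s.io/v1beta1; all names and labels are made up.
gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1beta1",
    "kind": "Gateway",
    "metadata": {"name": "shared-gateway", "namespace": "infra"},
    "spec": {
        "gatewayClassName": "external-lb",
        "listeners": [{
            "name": "foo-https",
            "hostname": "foo.com",
            "port": 443,
            "protocol": "HTTPS",
            "tls": {"mode": "Terminate",
                    "certificateRefs": [{"kind": "Secret", "name": "foo-com-tls"}]},
            # Only routes from namespaces labelled team=foo may attach here.
            "allowedRoutes": {"namespaces": {
                "from": "Selector",
                "selector": {"matchLabels": {"team": "foo"}},
            }},
        }],
    },
}


def listener_accepts(listener, gateway_ns, route_ns, route_ns_labels, route_hostnames):
    """Simplified attachment check: namespace filter, then exact hostname match."""
    namespaces = listener.get("allowedRoutes", {}).get("namespaces", {})
    mode = namespaces.get("from", "Same")  # the API's default is "Same"
    if mode == "Same" and route_ns != gateway_ns:
        return False
    if mode == "Selector":
        wanted = namespaces["selector"].get("matchLabels", {})
        if any(route_ns_labels.get(key) != value for key, value in wanted.items()):
            return False
    hostname = listener.get("hostname")
    # A route with no hostnames takes whatever the listener allows; otherwise they must overlap.
    return hostname is None or not route_hostnames or hostname in route_hostnames


listener = gateway["spec"]["listeners"][0]
print(listener_accepts(listener, "infra", "foo", {"team": "foo"}, ["foo.com"]))  # True
print(listener_accepts(listener, "infra", "bar", {"team": "bar"}, ["foo.com"]))  # False
```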

CRAIG BOX: And I think that's somewhat an emergent behavior because I don't think when Kubernetes was initially designed, the idea was that namespaces would be used for tenant separation quite the way that it has turned out that people do use them.

ROB SCOTT: Yeah, you're completely right. I think this is something that has naturally evolved as we've seen how Kubernetes is actually used: namespaces really provide that boundary for multitenant use cases, and Gateway API is really just building on top of that pattern.

CRAIG BOX: So this is going to solve the problem whereby anyone who wants to change a route in an Ingress effectively has to have access to the private key for the entire domain?

ROB SCOTT: [CHUCKLES] Yeah, with Ingress, again, because everything really was in that same namespace, that meant routing configuration, your application pods, and usually even your TLS certificates all lived in the same place. So you can imagine that if, say, your application developer needed to read or write a secret for their application, it was very difficult to differentiate that from read or write access to the TLS cert for their application, which you really don't want to share broadly. With Gateway API, we've built on that and said, hey, your TLS certificates can live in a different namespace than your routing configuration, and your routing configuration can live in a different namespace than your application code. Again, we've really tried to give organizations that want to split these concepts up that capability.
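To see why moving the certificate matters, here is a hypothetical comparison in Python. It assumes a common namespace-scoped RBAC setup where a team that can read Secrets in its own namespace can read all of them. The kinds are real Kubernetes and Gateway API kinds; every namespace, team, and object name is made up.

```python
# Hypothetical layouts as (namespace, kind, name). Kinds are real; names are made up.
OWNERS = {"infra": "platform team", "foo": "foo app team"}

INGRESS_LAYOUT = [
    ("foo", "Ingress", "foo-ingress"),
    ("foo", "Secret", "foo-com-tls"),      # the Ingress's TLS secret lives beside it
    ("foo", "Secret", "foo-db-password"),  # an app secret the team legitimately needs
]

GATEWAY_LAYOUT = [
    ("infra", "Gateway", "shared-gateway"),
    ("infra", "Secret", "foo-com-tls"),    # referenced from the Gateway's listener
    ("foo", "HTTPRoute", "foo-routes"),
    ("foo", "Secret", "foo-db-password"),
]


def readable_secrets(layout, team):
    """Secrets a team can read, assuming it has Secret access in the namespaces it owns."""
    return {name for ns, kind, name in layout if kind == "Secret" and OWNERS[ns] == team}


print(sorted(readable_secrets(INGRESS_LAYOUT, "foo app team")))  # ['foo-com-tls', 'foo-db-password']
print(sorted(readable_secrets(GATEWAY_LAYOUT, "foo app team")))  # ['foo-db-password']
```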

CRAIG BOX: Ingress was one of the first cases where people would look at a cluster and see the difference between a resource and an implementation of a controller that consumes that resource and does something. People would create an Ingress object. But if there was no Ingress controller installed, nothing would happen. On GKE, there's a default Ingress controller installed for the Google load balancer. In some on-premises environments, people might install NGINX, for example.

But that locks you down to one Ingress controller for each cluster, effectively. The Gateway API has the concept of Gateway classes where you can have different classes. Is the idea that you will want to have different classes installed within a single cluster? Or is this just a way of making it easy to move resources between different clusters that have different controllers?

ROB SCOTT: I think it's both. I think there are going to be some use cases where a single Gateway class is all that organizations need, and that's great. Others are going to want multiple Gateway classes. With GKE, I think we're already bundling four Gateway classes with our implementation.

So we have one for an internal load balancer, external load balancer, and then the same again for multi-cluster versions of those. At the same time, you can see other implementations bundling one or more Gateway classes themselves. So this really allows users in a cluster to have access to different types of load balancing, depending on what their application actually needs.

CRAIG BOX: So with that design goal of portability, if I have a Gateway API set of descriptions for how my Ingress should work and the services it connects to and so on, I should effectively be able to swap out the Gateway class or I should be able to take those and put them on a cluster on a different provider that has a Gateway class for its own load balancer and assume everything will continue to work?

ROB SCOTT: Yeah, that's definitely been one of our goals. For example, GKE bundles Gateway classes in our clusters with our implementation. At the same time, we're hoping that users will be able to create a generic class like, say, external LB. The underlying implementation of that Gateway class could be entirely different, depending on what environment or what cluster they're running in.
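A minimal sketch of that portability idea, assuming two clusters whose admins have each defined a GatewayClass called `external-lb` backed by different controllers. The `gatewayClassName` field is real; the cluster names, the `controllerName` values, and the `controller_for` helper are invented for illustration.

```python
# Hypothetical: two clusters on different providers, each with a GatewayClass named
# "external-lb" backed by a different controller. The controllerName values are invented.
GATEWAY_CLASSES = {
    "cluster-on-provider-a": {"external-lb": "example.com/provider-a-lb-controller"},
    "cluster-on-provider-b": {"external-lb": "example.net/other-lb-controller"},
}

# The Gateway spec the team ships, unchanged, to both clusters.
gateway_spec = {
    "gatewayClassName": "external-lb",
    "listeners": [{"name": "http", "port": 80, "protocol": "HTTP"}],
}


def controller_for(cluster, spec):
    """Which controller would reconcile this Gateway in the given cluster."""
    classes = GATEWAY_CLASSES[cluster]
    class_name = spec["gatewayClassName"]
    if class_name not in classes:
        raise LookupError(f"GatewayClass {class_name!r} not found in {cluster}")
    return classes[class_name]


for cluster in GATEWAY_CLASSES:
    print(cluster, "->", controller_for(cluster, gateway_spec))
```

The manifest never names a provider; only the GatewayClass definition in each cluster does, which is what lets the same Gateway move between environments.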

So if you're running across multiple clouds, the application developers or Gateway owners really don't need to care about who is implementing the external LB. Just, this is what it means for an external LB or an internal LB in this environment. We've been working a lot on building out our conformance test suite to ensure that all implementations of the API really get this consistent experience.

We already have, I think, more than 20 test files full of conformance tests, far more thorough than any other API I've worked on. But with that said, there are some extended portions of the API that we recognize that not every implementation is going to be able to support. So for example, one of the things in the API right now is header modification. And header modification is something that a large proportion of our implementations can support. But unfortunately, not all of them can.

So we call that an extended support-level feature. We document it as such. And then each individual implementation is required to document whether or not they support that feature. So if you're using some of these extended features, you do need to double-check before you move from one provider to another that that extended feature is supported across both. But largely, we do want to provide a consistent experience across implementations, across providers.
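The "double-check before you move" step can be sketched as a set difference. The HTTPRoute rule below uses the real v1beta1 filter type `RequestHeaderModifier`; the implementation names and their support tables are invented, and in practice each implementation's own documentation is the source of truth.

```python
# An HTTPRoute rule using the RequestHeaderModifier filter (an extended feature).
ROUTE_RULES = [
    {
        "matches": [{"path": {"type": "PathPrefix", "value": "/api"}}],
        "filters": [{
            "type": "RequestHeaderModifier",
            "requestHeaderModifier": {"add": [{"name": "x-env", "value": "prod"}]},
        }],
        "backendRefs": [{"name": "api", "port": 8080}],
    },
]

# Invented support tables; real ones come from each implementation's documentation.
SUPPORTED_FILTERS = {
    "implementation-a": {"RequestHeaderModifier", "RequestRedirect"},
    "implementation-b": {"RequestRedirect"},
}


def missing_filters(rules, implementation):
    """Filter types the rules use that the target implementation does not list."""
    used = {f["type"] for rule in rules for f in rule.get("filters", [])}
    return used - SUPPORTED_FILTERS[implementation]


print(missing_filters(ROUTE_RULES, "implementation-a"))  # set()
print(missing_filters(ROUTE_RULES, "implementation-b"))  # {'RequestHeaderModifier'}
```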

CRAIG BOX: As you mentioned before, one of the problems with Ingress was the idea of having to use annotations for everything. You also have a goal for extensibility here. You've talked about how some objects have fields that may not necessarily be supported by every Gateway implementation. What if something new comes up? Let's say I've invented a load balancer that supports quantum tunneling. How do I define a thing which wasn't thought of or isn't common between implementations?

ROB SCOTT: Well, I think there are really two ways to approach that. First, we have this policy attachment model that allows extending any existing Gateway API resource. So we have a consistent approach to defining policy that can extend any Gateway API resource at any point with custom logic, custom policy. And maybe quantum tunneling fits into that category of something that can easily be represented with some kind of policy.

CRAIG BOX: What's an example of a policy?

ROB SCOTT: So on GKE, we've defined a policy that allows configuring things like CDN, or other implementation-specific things that may not make sense to other providers but are very useful for Google Cloud users. They expose some of our unique features. And that's really where we see policy being valuable: it allows implementations to expose capabilities that are unique to them.

CRAIG BOX: That might be an interesting choice of word then because when I think of policy, I think of like, who can do what, for example. But it almost feels like this is a place where you can define extra attributes that aren't available everywhere.

ROB SCOTT: As it turns out, policy is a very broad term. And we are using it for a lot of different things. In this case, policy is something that is really being used as a generic extension mechanism.
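As a rough illustration of policy as an extension mechanism, here is an invented implementation-specific policy that points at an HTTPRoute through a `targetRef` (group, kind, name, optional namespace), in the style of the Gateway API policy attachment model. The `ExampleTimeoutPolicy` kind, its `example.com` group, its `timeouts` field, and the `policies_for` lookup are all hypothetical.

```python
# Hypothetical implementation-specific policy, attached via a targetRef in the style
# of the Gateway API policy attachment model. The kind, group and "timeouts" field
# are invented; only the targetRef (group/kind/name/namespace) shape is modeled.
POLICIES = [
    {
        "apiVersion": "example.com/v1alpha1",
        "kind": "ExampleTimeoutPolicy",
        "metadata": {"name": "slow-backend", "namespace": "foo"},
        "spec": {
            "targetRef": {"group": "gateway.networking.k8s.io",
                          "kind": "HTTPRoute", "name": "foo-routes"},
            "timeouts": {"requestSeconds": 30},
        },
    },
]


def policies_for(group, kind, namespace, name):
    """Names of policies whose targetRef points at the given object.
    A targetRef without a namespace refers to the policy's own namespace."""
    found = []
    for policy in POLICIES:
        ref = policy["spec"]["targetRef"]
        ref_ns = ref.get("namespace", policy["metadata"]["namespace"])
        if (ref["group"], ref["kind"], ref_ns, ref["name"]) == (group, kind, namespace, name):
            found.append(policy["metadata"]["name"])
    return found


print(policies_for("gateway.networking.k8s.io", "HTTPRoute", "foo", "foo-routes"))  # ['slow-backend']
```

The route itself stays untouched and portable; the provider-specific setting lives in a separate object that only the implementation that understands it will act on.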

Another example of a policy would be something for configuring custom health checks or custom timeouts that may be unique per implementation. And again, with policy, we're thinking of it as a great way to extend and build on the API without needing to rely on something like annotations. To go back to that quantum tunneling example, I think there's one other approach we could potentially use here. And that's the idea of bringing your own gateway in this case.

So if you're really sold on this quantum tunneling concept and it's something that can't be modeled well with, say, policy attachment, because you really need a different way to represent a gateway, you can bring your own gateway to this API: your own definition of a gateway. You can say, my implementation uses this other kind of resource to represent a gateway. You can still make use of Gateway class and all our advanced routing configuration. You may just have a different way to express gateway configuration, and that's completely fine.
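One way to picture "bring your own gateway" is through a route's `parentRefs`, whose `group` and `kind` default to the standard `gateway.networking.k8s.io` Gateway but can point elsewhere. In this sketch the `QuantumGateway` kind and the `example.com` group are invented, as are the `parent_kinds` and `handled_by` helpers.

```python
# Hypothetical "bring your own gateway": an HTTPRoute whose parentRef points at a
# custom QuantumGateway kind in an invented example.com group.
ROUTE = {
    "apiVersion": "gateway.networking.k8s.io/v1beta1",
    "kind": "HTTPRoute",
    "metadata": {"name": "foo-routes", "namespace": "foo"},
    "spec": {
        "parentRefs": [{"group": "example.com", "kind": "QuantumGateway", "name": "qgw"}],
        "rules": [{"backendRefs": [{"name": "foo-app", "port": 8080}]}],
    },
}


def parent_kinds(route):
    """(group, kind) of each parent; omitted fields default to the standard Gateway."""
    return [(ref.get("group", "gateway.networking.k8s.io"), ref.get("kind", "Gateway"))
            for ref in route["spec"]["parentRefs"]]


def handled_by(route, supported_parents):
    """True if a controller that reconciles these parent kinds should process the route."""
    return any(parent in supported_parents for parent in parent_kinds(route))


quantum_controller = {("example.com", "QuantumGateway")}
print(handled_by(ROUTE, quantum_controller))  # True
```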

CRAIG BOX: It sounds a bit like subclasses.

ROB SCOTT: Yeah, yeah, that's a great parallel.

CRAIG BOX: Let's talk then through the new API objects that someone will have to use. Previously, we had Ingress with its single object. Now we have the Gateway class, which we've talked about, and then you have Gateway resources and routes. Talk me through the different objects that someone has to know about, and perhaps which set of users are going to care the most about which objects.

ROB SCOTT: Right, so I think Gateway classes, which we've already covered, are something that are largely going to be bundled with implementations themselves. So most users, all they'll need to know about a Gateway class is that it exists and you can reference it and use that specific kind of load balancing technology.

CRAIG BOX: So that's a little bit like storage where you might have a thing that says fast disk and that might be backed by SSD. And you might have a thing that says cheap disk. And that might be backed by cloud storage, for example.

ROB SCOTT: Yeah, exactly like that. With storage, there's a very similar concept called storage class, and Gateway class looks very similar to that. Gateway class is used by gateways. Gateways describe an entry point into the system and usually represent some kind of load balancer or proxy.

For example, on GKE, that represents a GCP load balancer. With other implementations, it may represent, say, an Envoy control plane. It is completely dependent on implementation. But it's a way for traffic to get into your system, primarily right now.

There has been a lot of discussion about, well, what if a gateway could represent egress instead of ingress. We don't have any implementations of this right now. But I think there's a lot of promise for a gateway to not just represent ingress traffic but also egress as it's leaving your system.

And then finally, there's the routes side of this. And that's where the vast majority of logic in our system lies. As the name suggests, routes are all about routing configuration. So if you want to configure how traffic should be routed, route resources are for you.

That includes things like path matching, header matching, method matching, but also things that can modify the request, whether that's mirroring, rewriting, header modification, and then finally, where matching requests should be sent. So usually, that's a service or a set of services that traffic should be split between. But routes really include all that important logic in the API.
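The matching side of a route can be sketched in Python. This is a simplified, hypothetical matcher over HTTPRoute-shaped dicts covering `Exact` and `PathPrefix` paths, `method`, and exact `headers`; it leaves out regular expressions, query parameters, filters, and the precedence rules real implementations apply across rules.

```python
def path_matches(path_match, path):
    """Exact or PathPrefix matching; PathPrefix compares whole path elements."""
    match_type = path_match.get("type", "PathPrefix")
    value = path_match.get("value", "/")
    if match_type == "Exact":
        return path == value
    if match_type == "PathPrefix":
        prefix = value.rstrip("/")
        # "/api" matches "/api" and "/api/users" but not "/apiary".
        return prefix == "" or path == prefix or path.startswith(prefix + "/")
    raise NotImplementedError(match_type)  # e.g. RegularExpression, omitted here


def request_matches(matches, method, path, headers):
    """Matches in a list are ORed; the conditions inside one match are ANDed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for match in matches or [{}]:  # no matches at all means "match everything"
        if "path" in match and not path_matches(match["path"], path):
            continue
        if "method" in match and match["method"] != method:
            continue
        if any(lowered.get(h["name"].lower()) != h["value"] for h in match.get("headers", [])):
            continue
        return True
    return False


matches = [
    {"path": {"type": "PathPrefix", "value": "/api"}, "method": "GET"},
    {"headers": [{"name": "X-Debug", "value": "1"}]},
]
print(request_matches(matches, "GET", "/api/users", {}))             # True
print(request_matches(matches, "POST", "/api/users", {}))            # False
print(request_matches(matches, "POST", "/other", {"x-debug": "1"}))  # True
```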

CRAIG BOX: So if I have the canonical example where I want to send 10% of my traffic that comes from a certain user agent to a canary service, I'll be configuring that in a route object.

ROB SCOTT: Exactly, right. So that is entirely possible with the API today. We have header matching. And we have the ability to do traffic splitting where 10% of your traffic can go to one backend. And 90% can go to another backend.
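The canary example Craig describes can be sketched as a header match followed by a weighted choice between `backendRefs`. The rule below is HTTPRoute-shaped (`matches`, `headers`, `backendRefs` with `weight`, which defaults to 1); the service names and the user-agent value are hypothetical, and a real data plane would not use Python's `random` module.

```python
import random

# Hypothetical canary rule in HTTPRoute shape: requests whose User-Agent header is
# "canary-tester" are split 90/10 between two Services (names are illustrative).
rule = {
    "matches": [{"headers": [{"name": "User-Agent", "value": "canary-tester"}]}],
    "backendRefs": [
        {"name": "app-stable", "port": 8080, "weight": 90},
        {"name": "app-canary", "port": 8080, "weight": 10},
    ],
}


def header_match(rule, headers):
    """Exact value match with case-insensitive header names."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return any(
        all(lowered.get(h["name"].lower()) == h["value"] for h in m.get("headers", []))
        for m in rule["matches"]
    )


def pick_backend(rule, rng):
    """Weighted choice: each backend receives weight / sum(weights) of requests."""
    refs = rule["backendRefs"]
    names = [ref["name"] for ref in refs]
    weights = [ref.get("weight", 1) for ref in refs]  # weight defaults to 1
    return rng.choices(names, weights=weights)[0]


rng = random.Random(42)
if header_match(rule, {"user-agent": "canary-tester"}):
    picks = [pick_backend(rule, rng) for _ in range(10_000)]
    print(picks.count("app-canary") / len(picks))  # close to 0.10
```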

CRAIG BOX: One of the ideas of the implementation here was that there would be many different types of routes. When I'm looking at things that are in the beta, it's only the HTTP route object that has been marked as beta. What about things like TCP and UDP?

ROB SCOTT: Yeah, that's a big focus going forward. At this point, HTTP route is far and away our most stable, widely implemented route. It's implemented by, I think, every implementation of the API right now.

And it's widely tested, widely used, and something we're confident in graduating to beta. Unfortunately, we haven't had as much effort yet towards TCP, UDP, and other L4 routes. That's something we're really trying to push forward in our next release because we also want to see them graduate to beta soon.

CRAIG BOX: You're also talking about the GRPC route as a new feature that you're working on. Why is GRPC different from HTTP when it just sits on top of it?

ROB SCOTT: We spent a lot of time discussing that. As we went through the GEP process, and GEP is really similar to a Kubernetes Enhancement Proposal, but it's a Gateway Enhancement Proposal. But as we were discussing this for GRPC route, a lot of the questions were well, could you model this with an HTTP route? And although it is technically possible, we ultimately determined that providing a GRPC route would provide a much better user experience and allow for some GRPC-specific extensions in the future that would be difficult or impossible to include in HTTP route itself. So there's a lot more information about why we made this decision in the GEP itself. So if you are curious about this, I'd recommend looking at our website for the GRPC route GEP specifically, where we really detail why GRPC route makes sense instead of just extending say HTTP route.
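Why it is "technically possible" to model gRPC with an HTTP route: every gRPC call is an HTTP/2 POST to the path `/<fully.qualified.Service>/<Method>`. The hypothetical helper below translates a gRPC service/method match into an equivalent HTTPRoute-style match. It is not part of any API; GRPCRoute exists precisely so that users can match on service and method names directly instead of hand-building paths like these.

```python
def grpc_to_http_match(service, method=None):
    """Hypothetical helper: an HTTPRoute match equivalent to a gRPC service/method match.

    gRPC sends every call as an HTTP/2 POST to /<fully.qualified.Service>/<Method>.
    """
    if method is None:
        # Any method of the service: prefix match on the service path element.
        return {"path": {"type": "PathPrefix", "value": f"/{service}"}, "method": "POST"}
    return {"path": {"type": "Exact", "value": f"/{service}/{method}"}, "method": "POST"}


print(grpc_to_http_match("helloworld.Greeter", "SayHello"))
# {'path': {'type': 'Exact', 'value': '/helloworld.Greeter/SayHello'}, 'method': 'POST'}
```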

CRAIG BOX: You mentioned enhancement proposals there. How do new features get implemented? If someone wants to come along and add something, what is the process of them creating an enhancement proposal and then working on the code?

ROB SCOTT: Right, so if you're familiar with the Kubernetes development process, any major API change starts with a Kubernetes Enhancement Proposal. And following that same lead, we've created Gateway Enhancement Proposals. They seem very similar to Kubernetes Enhancement Proposals. But they're just a little bit simpler and really, the goal of a Gateway Enhancement Proposal or a GEP is just to describe why we want to add this feature, what exactly it will look like, and any alternatives that we've considered along the way.

And this is important because we have a lot of discussions in the API development process, whether that's on GitHub, in our community calls, on Slack, wherever it is. It's very hard to keep track of all the rationale for why a specific decision was made. GEPs allow us to document very clearly why a decision was made and what exactly that decision looked like, for anyone in the future looking back at those decisions and trying to understand how we got to the resources we have.

CRAIG BOX: The Ingress object is shipped with Kubernetes, even though the controller itself may not be. The Gateway APIs are implemented as CRDs that you have to install. Does that imply that one is more official than the other?

ROB SCOTT: Well, they are both official Kubernetes APIs. But the Ingress API is already generally available. It’s something that is considered stable and, from my perspective, is generally frozen as it is. So if you want an API that you know is going to be the same for a long time to come, Ingress is for you.

Gateway API, we've just graduated some of our key resources to beta. That provides a significant amount of stability. It is just as official as the Ingress API. But there is still a little bit more potential for change and certainly additions to the API.

One thing I'd say about Gateway API: I've been involved in the design and development of both Ingress and Gateway API, and with Gateway API, we have an extra layer of review for all our changes. Gateway API is a subproject inside SIG Network. We have a set of maintainers, including myself, that review every single GEP, or enhancement proposal, that comes through, and every API change. And we try to make sense of that.

And then the next layer on top of that is we take these changes and send them off to the SIG Network API reviewers that would review something like Ingress. They also double-check that what we're proposing makes sense. So I think Gateway API is just as official as Ingress, but also likely more reviewed and hopefully slightly better thought-out than the Ingress API that came before it.

CRAIG BOX: At the time that it goes V1 or generally available itself, will it become part of Kubernetes?

ROB SCOTT: I think one of the key things to keep in mind with the trajectory of Kubernetes is that we're trying to push more and more APIs outside of core Kubernetes. Gateway API is, I think, really the first major set of APIs like this that is defined outside of core with CRDs, but I think it's the first of many. The idea is that not every cluster needs routing. Not every Kubernetes instance needs this kind of L7 and L4 routing capability.

Some clusters may be entirely focused on backend instances that don't need this kind of routing capability. And that's fine. We're trying to make sure Kubernetes itself is modular.

So we're not just building the core larger and larger, but instead allowing users to add on components and modules as they see fit. So I don't think there's ever going to be a time where Gateway API is installed by default in every Kubernetes cluster. But I think it will be a very common component or module that is added to many or most Kubernetes clusters in the future.

CRAIG BOX: How then can I write software against this API and trust that it will be installed on every cluster that it needs to be?

ROB SCOTT: I think the beauty of the Kubernetes API system is such that it's very easy to tell what APIs exist on a cluster and to work within that. And I think that anyone who is attempting to use Gateway API software is going to ensure that they have Gateway API in their clusters. I think for most clusters in the future, Gateway API may just be a default that comes bundled with many providers.
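Rob's point about checking which APIs exist can be illustrated with the discovery document the API server serves at `/apis` (an `APIGroupList`, which you can fetch with `kubectl get --raw /apis`). The JSON below is a trimmed, illustrative sample rather than output captured from a real cluster, and `served_versions` is a hypothetical helper; a real client library would do this lookup for you.

```python
import json

# A trimmed, illustrative APIGroupList of the kind GET /apis returns.
DISCOVERY = json.loads("""
{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {"name": "apps",
     "versions": [{"groupVersion": "apps/v1", "version": "v1"}],
     "preferredVersion": {"groupVersion": "apps/v1", "version": "v1"}},
    {"name": "gateway.networking.k8s.io",
     "versions": [
       {"groupVersion": "gateway.networking.k8s.io/v1beta1", "version": "v1beta1"},
       {"groupVersion": "gateway.networking.k8s.io/v1alpha2", "version": "v1alpha2"}],
     "preferredVersion": {"groupVersion": "gateway.networking.k8s.io/v1beta1", "version": "v1beta1"}}
  ]
}
""")


def served_versions(api_group_list, group):
    """Versions of an API group that the cluster's discovery document advertises."""
    for entry in api_group_list.get("groups", []):
        if entry["name"] == group:
            return [v["version"] for v in entry.get("versions", [])]
    return []


print(served_versions(DISCOVERY, "gateway.networking.k8s.io"))  # ['v1beta1', 'v1alpha2']
print("v1beta1" in served_versions(DISCOVERY, "gateway.networking.k8s.io"))  # True
```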

But not everyone will need it. So they may turn it off. They may choose not to have that as part of their system. And I think that's fine.

CRAIG BOX: If you squint at the Gateway APIs, you can see a similarity to the Istio traffic-routing APIs in that they have separate types of objects for gateways versus routes and Istio also handles post routing, for example. Does that mean that the Gateway API can be used for service mesh?

ROB SCOTT: I think there's two parts to that question. And one is yes, we've definitely been inspired by other APIs. And I really want to thank the contributors from Istio and contributors from Contour and other projects that have helped us here.

We've taken what we think are the best pieces of different projects like that. And teams from Istio, from Contour, et cetera, have really helped provide guidance on which parts of our API have worked well and which parts they may regret or may want to change. And their guidance has been really helpful as we build out Gateway API.

So we definitely have similarities from different projects and different APIs like that. But of course, there are some differences and some new concepts that we're building in. As far as if this makes sense for service mesh, I really think so.

I have been excited to see all the traction and all the momentum that we're seeing on the service mesh side for Gateway API. We've seen, I think, just about every major mesh implementation get on board recently and start to explore what Gateway API looks like for mesh. And I really want to thank the Istio team specifically for really starting that discussion.

I think they were one of the first major mesh implementations to really believe in Gateway API for mesh. They have started to show just how that can work. And I'm excited to see what the collaboration of all these other mesh implementations will come up with.

CRAIG BOX: One of the things announced with the beta release of the Gateway APIs was the GAMMA initiative. What can you tell me about that group and what they hope to do with service mesh?

ROB SCOTT: Yeah, I'm really excited for that group. GAMMA is an acronym that stands for Gateway API for Mesh Management and Administration. An easy way to think about it is really just exploring how mesh makes sense with Gateway API, trying to define a set of patterns for any mesh implementation that is going to be using Gateway API.

This is a really challenging project in the sense that it is trying to build a standard set of practices and APIs for mesh. And GAMMA really has a great set of people behind it. We have a real collaboration of people from SIG Network, from Gateway API, from Istio, and from SMI.

This represents a lot of the organizations that were behind SMI coming in to explore how service mesh and Gateway API could make sense together. So getting all these different groups together, working on Gateway API for mesh, is going to be really exciting. I'm excited to have Keith Mattix, who's been a lead for SMI, John Howard, who's been our lead for Istio, and Mike Morris, who's also been involved in SMI and is coming from the HashiCorp side, all involved in this and leading the effort of GAMMA and this Gateway API for mesh exploration.

CRAIG BOX: We've talked about those gateway classes before and how we talk about traffic coming in from outside through some kind of load balancer. Is it as simple as having a gateway class of type sidecar where we talk about traffic coming not from outside the cluster, but from inside it?

ROB SCOTT: There have been a lot of different proposals on how this API will work for mesh. I'd say at a high level here that routing traffic is fundamentally similar, whether it's north-south or east-west. With that said, I don't know exactly how it's going to be modeled as far as the resource above route.

So in some models and some proposals, we've seen that Gateway, the resource, is used for mesh, but just in a different way, and similarly with GatewayClass. In other proposals, we've seen that routes can have a different kind of parent than, say, a Gateway. So some proposals have said, well, maybe we'll create a Mesh resource that lives in parallel to the Gateway resource. I'm really excited to see where this leads. But again, I'll leave it to GAMMA and that initiative to make the appropriate recommendation on what makes the most sense for mesh implementations.

CRAIG BOX: We've talked today about the fact that the APIs are now, quote unquote, "beta." But the project also has a version number, and the current release is 0.5.0. It's not a controller in the Ingress sense. This project is a spec with a set of APIs that other controllers consume and handle. So what is the thing that you are releasing and versioning?

ROB SCOTT: I think the best parallel here is Kubernetes versioning itself. When Kubernetes releases a new minor version, say 1.25, there are a few things that can happen. Some APIs will get new fields added to them, even though their API version, like v1beta1, may not change. Some APIs will graduate from, say, v1beta1 to v1.

And some alpha APIs or fields may be deprecated or removed. And that's really the same thing with Gateway API releases. So any time you see a new Gateway API minor version, the things that can change in Gateway API are largely the same as the things that can change in Kubernetes.

So it may be some new fields being added. It may be some resources graduating to beta or GA. For example, the 0.5.0 release, we had three resources graduate to beta. We also added some experimental new fields. And I think that's similar to what we'll see in future releases of Gateway API.
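The graduation path Rob describes (alpha, then beta, then GA) is encoded in the version names themselves. As a sketch, the function below reproduces the version-priority ordering described in the Kubernetes CustomResourceDefinition versioning documentation, as we understand it: GA before beta before alpha, higher major versions first, then higher minor versions, and any name that does not follow the pattern sorted last, alphabetically. The helper names are hypothetical; this is an illustration, not code taken from Gateway API or Kubernetes.

```python
import re

# Matches Kubernetes-style versions such as "v1", "v1beta1", "v12alpha3".
_VERSION_RE = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")
# Higher number means more stable; None means no alpha/beta suffix (GA).
_STABILITY = {None: 2, "beta": 1, "alpha": 0}


def priority_key(version: str) -> tuple:
    """Sort key that puts the highest-priority version first."""
    m = _VERSION_RE.match(version)
    if not m:
        # Non-conforming names sort after all Kubernetes-style versions, alphabetically.
        return (1, 0, 0, 0, version)
    major, level, minor = m.group(1), m.group(2), m.group(3)
    # Negate so that more stable, then higher major, then higher minor sorts first.
    return (0, -_STABILITY[level], -int(major), -int(minor or 0), "")


def sort_by_priority(versions: list) -> list:
    return sorted(versions, key=priority_key)
```

For example, `sort_by_priority(["v1alpha2", "v1beta1", "v1"])` puts `v1` first, then `v1beta1`, then `v1alpha2`, mirroring the order in which a resource graduates.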

CRAIG BOX: Is there any code? Or is it simply the definition of an API in a specification?

ROB SCOTT: Yeah, there's a little bit of code. Because we're using CRDs, there's only so much that we can build into the CRD validation itself. So we add code for two reasons. One is webhook validation to extend what is not possible in CRD validation today and ensure that all our resources are really valid and consistent.

But then two, the other thing that's included with our releases is a set of conformance tests. So implementations can say, I'm conformant with Gateway API v0.5.0.

CRAIG BOX: Now that you've got the 0.5.0 release out, and the core API objects that people use are beta, what is going to come next?

ROB SCOTT: We've got so much in the pipeline. There are endless opportunities to get involved. To start with, I think GAMMA, that mesh initiative is going to be massive. So if you're interested in mesh, if you're interested in Gateway API, I encourage you to get involved.

At the same time, we'll be continuing to expand and stabilize our core APIs. One of the things I'm most excited about in our next release is route delegation. Right now, you can delegate a hostname or a domain name from one namespace to another. But a very common use case would be, I want to delegate a specific path to a different namespace. So I want to delegate routing for /foo to the foo namespace. That's what we're exploring with route delegation.
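Route delegation was still being explored at the time of recording, so there is no API to show. As a rough illustration of the idea, here is a hypothetical lookup that hands each request path to the namespace owning the longest matching path prefix. The element-by-element prefix rule (`/foo` matches `/foo` and `/foo/bar`, but not `/foobar`) mirrors how Gateway API's `PathPrefix` match is specified; the `delegate` helper, the delegation table, and the namespace names are assumptions for illustration only.

```python
def _elements(path: str) -> list:
    """Split a URL path into its non-empty '/'-separated elements."""
    return [p for p in path.split("/") if p]


def prefix_matches(prefix: str, path: str) -> bool:
    """Element-wise prefix match: '/foo' matches '/foo' and '/foo/bar' but not '/foobar'."""
    prefix_parts = _elements(prefix)
    return _elements(path)[: len(prefix_parts)] == prefix_parts


def delegate(path: str, delegations: dict, default_ns: str) -> str:
    """Return the namespace that owns `path`: the longest matching prefix wins.

    Hypothetical sketch: `delegations` maps a path prefix to the namespace it is
    delegated to; unmatched paths stay with `default_ns` (e.g. the Gateway's namespace).
    """
    best_ns, best_len = default_ns, -1
    for prefix, ns in delegations.items():
        if prefix_matches(prefix, path):
            n = len(_elements(prefix))
            if n > best_len:
                best_ns, best_len = ns, n
    return best_ns


# Hypothetical table: /foo goes to the foo team, /foo/admin is carved out further.
delegations = {"/foo": "foo", "/foo/admin": "admin"}
```

With this table, `/foo/bar` is routed by the `foo` namespace, `/foo/admin/users` by `admin`, and `/foobar` stays with the default owner. Longest-prefix-wins is one plausible conflict rule, not a description of what the project settled on.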

Then, as we've already mentioned, GRPCRoute is coming along. The GEP has been approved. I think it's highly likely it will be included in our next release. But we need to work through a few more little details.

And as part of stabilizing this API, we're really focusing on building out our conformance tests even further. They're already fairly comprehensive. But we want them to cover even more. And maybe most importantly, we want to make sure that the resources that we currently have in alpha can get to beta. So for us that means a lot of work on L4 routing specifically and making sure that our L4 routes are really ready to graduate to beta, making sure they make sense and we're confident with them and they're widely used and implemented.

CRAIG BOX: It's been over two years since we talked to Bowei about what were then called the Service APIs, which were very pre-alpha, if anything, I would have to assume. Is it going to be two more years until GA?

ROB SCOTT: I am awful at predicting timelines. Let me just say that. This is a complex project.

We are charting a new course, in many ways, as the first major Kubernetes API built on CRDs. I would absolutely love to see Gateway go GA next year. I hope that's possible. But I don't want to overpromise to anyone. I do think our next major milestone is to get some of these key APIs across the line into GA.

CRAIG BOX: Can we assume that 0.5 is halfway?

ROB SCOTT: I hope we're a little further than halfway, but at least halfway.

CRAIG BOX: When you're not working hard on Kubernetes APIs, you like to ride your bicycle, I'm reliably informed. And with the pandemic-enforced lack of commuting, what did you do to fill the hole in your life?

ROB SCOTT: I've always been a fan of biking. But it wasn't until the pandemic that I really got into it and started to see where I could go. The pandemic locked everything down. And I was spending all my time working inside. And I was just looking for an excuse to get outside and explore in a safe way.

And that led me to cycling. Last year I had a significant personal accomplishment where I rode over 100,000 feet of elevation gain. And this year, I'm on pace to do the same thing. And so I know there are many cyclists in the Kubernetes community. And that's something that I've just really enjoyed and glad to have the opportunity to do that here in the Bay Area.

CRAIG BOX: It's pretty flat in the Bay Area, isn't it? How many times do you have to climb a hill to get to 100,000 feet?

ROB SCOTT: You're right. I really have to go out of my way to find hills. That is for sure. But if you go around 10 miles away, we have some things we call mountains, the Santa Cruz mountains. And I spend a lot of time going up and down those.

CRAIG BOX: A lot of mountain bikers perhaps think that the cycling up part is only so that they can enjoy the cycling down part. But it sounds like you possibly enjoyed the first part more.

ROB SCOTT: You know, I enjoy the sense of accomplishment. But I don't think anyone really truly loves cycling up. Or at least I don't. But I enjoy saying that I did it and being able to enjoy the ride back down.

CRAIG BOX: All right, well thank you very much for joining us today, Rob.

ROB SCOTT: Thanks so much for having me.

CRAIG BOX: You can find Rob on Twitter @robertjscott. I note that you are robertjscott1 on LinkedIn. It doesn't matter how long you make your name, there's always going to be someone else who's got it first.

ROB SCOTT: Yeah, I was too late to the LinkedIn game. I am rather annoyed by that. There is another Robert J. Scott on LinkedIn. You know, that's what you get for having such a common name.

CRAIG BOX: You never thought about picking up a pseudonym?

ROB SCOTT: You know, I may have to do that. Maybe next time.

CRAIG BOX: All right, and you can find the Gateway API at gateway-api.org.

[MUSIC PLAYING]

CRAIG BOX: Thanks as always for listening. If you've enjoyed the show, please help us spread the word and tell a friend.

If you have any feedback for us, you can find us on Twitter @kubernetespod or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where each week, we post our transcripts and show notes as well as links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show.

I'm off for a little bit, but I'll be back in a couple of weeks. We'll see you then.

[MUSIC PLAYING]
