#58 June 18, 2019

Istio 1.2, with Louis Ryan

Hosts: Craig Box, Adam Glick

Istio 1.2 has been released. Louis Ryan is a core contributor to Istio and a member of its Technical Oversight Committee, in his role as Principal Engineer at Google Cloud. He talks to Craig and Adam about his history with API infrastructure and the service mesh, and the history and future of the Istio project.

CRAIG BOX: Hi and welcome to the Kubernetes Podcast from Google. I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.

CRAIG BOX: It was Father's Day in the US this weekend, and given that you've just become a new father, I say that's a lot of effort to go in order to have a reason to celebrate this particular day.

ADAM GLICK: I personally am enjoying my extra day of celebration for the year, and it feels like a small extra bonus to having a little one. We got to check out the garlic festival out here, where, beyond seeing two gentlemen dressed up as giant garlic cloves walking around, which was just interesting in and of itself, I found an incredible food item that anyone who loves garlic should definitely try. It looks like mayonnaise, but it's actually whipped garlic, which is like a blended garlic mixed with oil, and if you like garlic toast, it turns anything into garlic toast. It's magic.

CRAIG BOX: You find a lot of people adding food to things that maybe doesn't necessarily belong. You can find bacon flavored mayonnaise and bacon flavored maple chocolate. Maybe it's a Canadian thing. Were there any bacon flavored garlic things? I can understand those being two complementary flavors.

ADAM GLICK: As it turns out, there were. At one of the spice vendors, they had a bacon garlic salt that was available there.

CRAIG BOX: For all of your vices at once. Possibly not for the vegans in the audience.

ADAM GLICK: The most interesting thing I stumbled across was probably the pineapple and garlic, which are two things you wouldn't necessarily put together, especially in a jelly, but it actually kind of worked. How was your weekend?

CRAIG BOX: Well Fleetwood Mac rolled into town on Sunday.

ADAM GLICK: Were they driving a Fleetwood?

CRAIG BOX: They were not. They were driving Wembley Stadium. Wembley Stadium had been home to the Spice Girls just two days earlier. 4/5 of the band on both occasions. Unfortunately, Lindsey Buckingham not only is a bit unwell lately, but he had a little bit of a falling out. But of course, the good news is that he was replaced by two fantastic musicians in Mike Campbell of Tom Petty and the Heartbreakers and Neil Finn of Crowded House, who is one of my favorite musicians of all time.

And so it was one of those weird things like, hey, I know that guy. How must it feel for everyone else who doesn't know that guy, hearing those songs sung? And how does it feel for me? I've seen this guy live many, many times, and all of a sudden he's in Fleetwood Mac. But that aside, it was a great show, and I'm really glad to have gone.

ADAM GLICK: Awesome. Let's get to the news.

[MUSIC PLAYING]

ADAM GLICK: Istio 1.2 has been released, bringing a number of stability improvements, performance upgrades and bug fixes. You'll hear more about Istio and the 1.2 features in today's interview.

CRAIG BOX: HAProxy, a popular open source proxy server and load balancer, has released version 2.0. New features include layer 7 retries, Prometheus metrics, traffic shadowing, and gRPC support. HAProxy is sometimes used as an ingress for Kubernetes, and this release adds an officially supported HAProxy ingress controller as well as a new data plane API.

ADAM GLICK: Microsoft announced their new Windows Subsystem for Linux version 2 in May, and said they now support all the Linux system calls required to run Docker. Building on that, Docker has announced they are working on a new version of Docker Desktop [for Windows], leveraging WSL 2, and a public preview will be launched in July.

CRAIG BOX: For the past eight years, Facebook has been running a Borg-like cluster system called Tupperware. Instead of a joke based on Star Trek, they went for a pun based on the popular plastic kitchenware. They've talked about it on and off, but they formally presented it at their recent Systems @Scale conference, and put a page up on the engineering site describing it. Interesting learnings include why their elastic capacity model is based around whole servers rather than container-shaped slices. Videos from last year's conference talk about their move to a single global shared system, and the Tupperware containers from where the system gets its name.

ADAM GLICK: Kubernetes at the edge continues to grow, as this week, Wind River announced the release of their-- let me take a breath-- Yocto Project based, carrier grade Linux 5.0 compliant distribution with Over-C with a capital C technology. That all boils down to another version of Linux built for edge computing that focuses on running containers in Kubernetes. If you're running Wind River Linux on your Edge devices and have been installing Kubernetes yourself, this could be the update you've been looking for.

CRAIG BOX: Banzai Cloud has announced that they have added the Istio service mesh to their Pipeline platform. Banzai built an open source Istio operator, which is the basis of their new commercial product, which they say helps customers who are having trouble installing Istio get up and running with the benefits of a multi-cluster service mesh more quickly. The alpha version contains features for blue-green deployments, tracing, audit trails, setup validation, and mesh diagnostics.

ADAM GLICK: The CNCF has announced that Apple has joined the foundation at the platinum level. Apple is joining as the 88th end user company member of the CNCF. They have contributed to a number of CNCF projects including Kubernetes, gRPC, Prometheus and Envoy, and hosted the FoundationDB Summit at KubeCon last year for their open source database technology.

CRAIG BOX: Google Cloud is running a series of worldwide online events for anyone interested in learning more about enterprise migration to the cloud, including GKE and Anthos. You can gain insights on cloud adoption and implementation, discover strategies and solutions that power successful migrations, join experts in live Q&A sessions, and participate in hands-on training labs. The events are on June 25 or 26 around the world, and you can find the link to subscribe in the show notes.

ADAM GLICK: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Louis Ryan is a core contributor to the Istio project, a member of Istio's Technical Oversight Committee, and a principal engineer at Google Cloud, where he has spent the past 10 years working on Google's internal API infrastructure and gRPC. Welcome to the show, Louis.

LOUIS RYAN: Hi. Thanks for having me on.

CRAIG BOX: You gave a talk at QCon last year, and you summarized your career at Google basically in three systems. With the addition of Istio, it's four systems that in effect have done the same thing. Can you talk us through the changes that you've seen in the time that you've been working on API infrastructure?

LOUIS RYAN: Way back in the day, I was working on the API infrastructure for a social networking product here at Google, which I shall not name. But the way that we built APIs was to go and use a library and have the individual teams that wanted to launch an API kind of build their own sets of infrastructure and launch their APIs independently. And having done that for about six months, that seemed like a really, really inefficient way of doing things.

And so I helped start a team here at Google where we would go and build a common set of infrastructure for people to build APIs, and run that as managed infrastructure. And that infrastructure, we would kind of manage and help other teams onboard onto it. And like most other API systems in the world today, that was built as a kind of big ingress gateway, proxy-style system. And we worked on that for about five or six years, and over that period of time, we saw big shifts in the way that APIs were used in the industry, right.

You saw the launch of smartphones, and smartphones consume an awful lot of APIs. And right at the end of life of that infrastructure, we also saw the kind of launch of cloud and the demands that cloud applications have for API are very different than you would expect from mobile or the web. And so using a big kind of common monolithic API gateway solution doesn't really scale very well for cloud, right, when the client has as much bandwidth to the service that it's trying to talk to as anything else running on the Google production network.

So we had to go and create a distributed solution for that problem where these API gateways were colocated with the services that they were running with, and that evolution effectively ended up with us creating something that looks very much like, I think, what we're going to be talking about in a minute, which is a service mesh where you have these sidecar proxies sitting in front of Google production services and mediating the traffic, providing higher level functions on that traffic, and doing so at very, very low latency, high performance.

ADAM GLICK: Along those lines, a lot of people in the cloud native space have heard about Istio. There's certainly been a lot of talk about it. Not everyone knows exactly what it does or what a service mesh does for people. Can you explain what a service mesh does in general, and what Istio does in specific?

LOUIS RYAN: Yeah. So a service mesh is a networking abstraction. Service meshes are designed to kind of raise these sets of behaviors that can be controlled on the network and allow people to control them at the layers at which applications perform operations, not at which people think about the network, right.

So most of the networking infrastructure or software that people use today is to manage IPs and ports, firewalls and things like that. But when you look at what people are developing as applications, right, they usually operate in terms of services, in terms of API operations, and so the service mesh is designed to present a higher level of abstraction where when you want to communicate, you communicate to a service, not to an IP. That's how you think about this as an application developer.

And you want to make that communication reliable, robust, resilient to failures. And the typical features that you get out of the networking there don't really help you with those problems. So the goal of a service mesh is to present this higher level abstraction so the application developer understands the network the way that they interact with it. And then they can get other things from the network like telemetry, logging, auditing, load balancing, all these other higher level functions that existing network management solutions don't really give you.

ADAM GLICK: How should I think of that in comparison to something like DNS?

LOUIS RYAN: So DNS provides you with a means to discover the IP that a service can be reached at. So DNS is part of this solution space, right. DNS already presents this abstraction of what is a service. What DNS doesn't really do very much about is, how do I talk to it, and how much control do you have over how you talk to it, right.

Most application code today, they'll do a DNS resolution. They'll get back an A record. They'll open a socket to the IP presented by the A record, and that's what they'll talk to. That definition of a service is very static, right, and so if the IP address that that application was running on, maybe instead of one, there were thousands of them, or tens, which is probably more common, or the location of that IP is pertinent to making sure the traffic that you're going to send out goes to the closest endpoint, which is a common problem people face with DNS load balancing.

A service mesh is designed to kind of interpose in that space and allow application developers and operators to provide a more refined control over what set of IPs, what localities, how should load balancing occur, while still allowing the application developer to just think in terms of, hey, I want to talk to this name.
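
As a rough illustration of the contrast Louis draws here, the following is a minimal Python sketch, not how Istio or Envoy actually implement it. The first function shows the static pattern he describes: resolve a name, take the first address, open a socket. The second part sketches what a proxy could do on the application's behalf: prefer endpoints in the caller's own locality and rotate among them. The `Endpoint` type, the `zone` label, the addresses, and the `LocalityAwareBalancer` class are all hypothetical names invented for this example.

```python
import itertools
import socket
from dataclasses import dataclass


def connect_via_dns(host: str, port: int) -> socket.socket:
    """Static pattern: resolve the name and connect to the first address returned."""
    family, socktype, proto, _canonname, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    sock = socket.socket(family, socktype, proto)
    sock.connect(sockaddr)
    return sock


@dataclass(frozen=True)
class Endpoint:
    address: str
    zone: str  # hypothetical locality label, assumed for this sketch


class LocalityAwareBalancer:
    """Round-robins over endpoints in the caller's zone, else over all endpoints."""

    def __init__(self, endpoints, client_zone):
        local = [e for e in endpoints if e.zone == client_zone]
        # Fall back to every endpoint when none share the caller's zone.
        self._cycle = itertools.cycle(local or list(endpoints))

    def pick(self) -> Endpoint:
        return next(self._cycle)


# Made-up endpoints for one service name, spread across two zones.
endpoints = [
    Endpoint("10.0.0.1:8080", "us-east"),
    Endpoint("10.0.1.1:8080", "eu-west"),
    Endpoint("10.0.1.2:8080", "eu-west"),
]
balancer = LocalityAwareBalancer(endpoints, client_zone="eu-west")
picks = [balancer.pick().address for _ in range(3)]
print(picks)
```

In this sketch the application would still only ask for a name; the choice of which address to use, and why, moves out of the application code and into something an operator can control.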

CRAIG BOX: Why is this something that is done as a sidecar rather than something we build into an application?

LOUIS RYAN: So you absolutely can build it into an application. There's nothing wrong with doing that. Maybe if you'll indulge me, I'll go on a little digression here. So I worked on gRPC for several years, which is an application library that people build into their applications to help them with lots of these problems. The primary issue there is adopting libraries or adopting new communication paradigms into application code is very expensive. For people who write software and rewrite software pretty regularly, that's fine, but most software is not rewritten.

If you went and contracted out for a piece of software to be written five or six years ago, the odds of you being able to have it rewritten are pretty small, right. It's a very expensive proposition. So my experience, having worked on gRPC and talking to a lot of companies that were trying to modernize their network infrastructure, was: gRPC is great, we'll adopt it sometime in the next 5 to 15 years, but we need a solution for today. And so the use of a sidecar proxy allows us to provide a bunch of that modernization without requiring the software to be rewritten.

CRAIG BOX: What are the advantages of having a sidecar proxy, which is an instance of the proxy alongside every instance of the application, versus the more traditional mid-tier proxy, or a single proxy running on every instance in your cluster, for example?

LOUIS RYAN: So there's a few different advantages. So there's the kind of just the network scaling issue, right. The sidecars talk to each other in a mesh-like fashion, which means they don't all route in through one centralized place. Any network topology that requires you to route in through one centralized piece of infrastructure is almost by definition less efficient, right, because it's not utilizing the other available bandwidth in the network.

So if you're sending a lot of traffic, then fanning in through one big centralized process is going to be quite expensive. The other reason, and it's a little bit more subtle, is big centralized network proxies are superpowers in your network, right. If you talk to one and it's expected to talk to something else on your behalf, it effectively has all of your authority, and this is an interesting and important security principle. If you do that for everything in your network, then the middle proxy can act like anybody else, and if it was ever compromised, right, it could exploit the entire network.

So we generally, as a best practice, try to avoid creating superpowers where we don't want them. The third one, and it's also the same reason why we went through that transformation and kind of the history of the API infrastructure here at Google, is if it fails, everything fails, which is a really, really bad property to have.

CRAIG BOX: You're one of the few people who can claim to have 10 years experience with what the public will think of as a two year old technology. What exactly is Istio?

LOUIS RYAN: I talked a little bit about the evolution of kind of API management infrastructure at Google. Istio is a packaging up of all of the best practices that we observed trying to create that solution. Istio is also an attempt to build an ecosystem and a community around those best practices, because my experiences internally and how Google went about trying to build this type of solution, we were also seeing other companies doing the same types of things, and we were out there talking to the community, and we were talking to companies building things like gRPC.

And so there was this recognition I think broadly within the industry that there were these best practices people wanted to have in terms of application level networking. And so Istio is an attempt to make that set of capabilities as widely available as possible. We just wanted all these best practices to be as available as they could be, in as many places as they could be, not just for Kubernetes clusters but for VMs, for any kind of application out there, really. And so that's why Istio was created, right. That's what we were trying to achieve, was to solve a kind of ubiquity problem for technologies that were kind of locked into libraries and other solutions.

ADAM GLICK: Previously, you mentioned some of the powers of Istio. And we talked a little bit about DNS, and basically that it can do routing, and it's kind of a more instantaneous version of DNS to keep things up to date and direct traffic where it should go. But you mentioned a number of other features that Istio has. You mentioned some observability, some logging. Could you talk a little bit about those parts of Istio?

LOUIS RYAN: Sure. So as traffic is flowing through the proxy that we use in Istio, we get to see an awful lot of information about what's in the traffic. And we can extract features of that traffic and report it to telemetry systems, et cetera. So most application developers, they care what were the API operations I called on another service, right? And I mean API in the kind of general sense, not in the high-level I'm selling an API sense.

And so we can extract that telemetry for them automatically and feed that information downstream generically to a variety of different telemetry systems, so there's one integration point into N different telemetry solutions or vendors, which again is lowering the cost of bringing a solution into your network. So that's one major feature.
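
The "one integration point into N telemetry solutions" idea can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Istio's actual telemetry API: the `RequestRecord` fields, the `TelemetryFanout` class, the two sinks, and the service names are all hypothetical. The proxy is assumed to have already observed each request and summarized it as a record; every registered backend then receives the same record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RequestRecord:
    """What a proxy might extract from one request (fields assumed for this sketch)."""
    source: str
    destination: str
    operation: str
    status: int
    latency_ms: float


class TelemetryFanout:
    """Single reporting point that forwards each record to every registered sink."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[RequestRecord], None]] = []

    def register(self, sink: Callable[[RequestRecord], None]) -> None:
        self._sinks.append(sink)

    def report(self, record: RequestRecord) -> None:
        for sink in self._sinks:
            sink(record)


# Two stand-in backends: a request counter and an access log.
metrics: Dict[Tuple[str, str, int], int] = {}
log_lines: List[str] = []


def count_requests(record: RequestRecord) -> None:
    key = (record.destination, record.operation, record.status)
    metrics[key] = metrics.get(key, 0) + 1


def access_log(record: RequestRecord) -> None:
    log_lines.append(
        f"{record.source} -> {record.destination} "
        f"{record.operation} {record.status} {record.latency_ms}ms"
    )


fanout = TelemetryFanout()
fanout.register(count_requests)
fanout.register(access_log)
fanout.report(RequestRecord("frontend", "cart", "GET /items", 200, 12.5))
fanout.report(RequestRecord("frontend", "cart", "GET /items", 200, 9.0))
print(metrics)
print(log_lines)
```

Adding another backend in this sketch means registering one more sink; the application code that made the requests is not touched.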

The other thing that we provide as a built-in feature of the system is an out-of-the-box security solution. So if you want to secure the traffic between any two services in your network, Istio can enable that and make sure that as traffic flows between those two systems, we know exactly where the traffic is coming from, or more specifically, really who, and who it's going to.

And then we can layer a security policy over that, so that we can ensure that if there is a service in your network that shouldn't be talked to by someone else or should be talking to someone else, then you can have policy on that traffic and have it enforced reliably and securely all the way down at the finest level of granularity.
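
A policy layered over known identities might look something like the Python sketch below. It is a simplified illustration, not Istio's policy model or configuration format: the `POLICY` table, the service names, and the `is_allowed` helper are hypothetical, and the default-deny behavior is a design choice made for this example. The caller identity is assumed to have been established by the proxy before the check runs.

```python
from typing import Dict, Set

# Hypothetical policy table: destination service -> caller identities allowed to reach it.
POLICY: Dict[str, Set[str]] = {
    "payments": {"checkout"},
    "inventory": {"checkout", "frontend"},
}


def is_allowed(caller: str, destination: str, policy: Dict[str, Set[str]] = POLICY) -> bool:
    """Default-deny check: a call passes only if the caller is listed for the destination."""
    return caller in policy.get(destination, set())


# The proxy in front of each destination would evaluate this on every request.
for caller, destination in [
    ("checkout", "payments"),
    ("frontend", "payments"),
    ("frontend", "inventory"),
]:
    verdict = "allow" if is_allowed(caller, destination) else "deny"
    print(f"{caller} -> {destination}: {verdict}")
```

Because the check keys on who is calling rather than on an IP address, the rule keeps its meaning when workloads move or scale.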

CRAIG BOX: What was the discussion like as to make this an open-source project rather than just something that was offered as a service to Google customers?

LOUIS RYAN: Ubiquity is the goal. You're almost never going to achieve ubiquity if you ship something as a product, particularly if you're going to make people pay for it. In our industry, major adoption occurs with things that have the broadest possible distribution. And so it was always the goal to deliver something into open source.

And so the real question was what was the best way to go about it? At the time, we were looking at creating a project around this type of capability. Kubernetes was already a thing that was being quite successful as an open-source project at Google. And those people were our neighbors, so we had a lot of inspiration. I had been working on gRPC at the time, so I had some experience working in open source.

So the real question was what was the best way to go about doing open source because we were convinced from the very beginning that open source was the right thing to do. And so what we were really looking for as we kind of were playing with ideas and trying to get some consensus about what was the best way to do this, we were also looking for partners to help us build this technology.

Successful open-source projects generally have a strong ecosystem. They have other people, other companies who are willing to commit resources, commit their time to making sure that the technology is successful and represents a diversity of opinions and views. And so I was effectively going to conferences, looking at technologies, talking to people who were working in adjacent spaces, and seeing who was interested in getting something off the ground.

At the time, there was a project called Amalgam8 out of IBM. And I reached out to the project maintainer of that and said, hey, do you want to meet up at a conference? And so we sat down, had lunch, had a couple of beers, and you know, talked about what our objectives were for each of the projects.

And basically by the end of the conversation, they were pretty excited about coming on board. We were pretty excited about partnering with them. And so we kind of started kicking off a more concerted conversation about creating something real.

ADAM GLICK: What was the state of the service mesh ecosystem at that point? Were there other service meshes already available?

CRAIG BOX: Had anyone used that term publicly?

LOUIS RYAN: They had. So the folks over at Linkerd had, I think, used the term "service mesh" a few times by then, so credit where credit is due. And Linkerd was probably the other primary service mesh product in the space. There was also obviously within the enterprise or within the kind of corporate firewall, things that looked a bit service mesh-like. There were certainly companies using nginx or HAProxy to get some of those behaviors.

Some companies were deploying it as a sidecar, but not many. And then there was also, you know, Kubernetes was ramping up and various other actually container orchestration systems which made deploying sidecars somewhat ubiquitously into your deployment or runtime infrastructure, a viable thing to do or something that people were thinking that they could do. Whereas previously, you know, without a uniform orchestration system, that would've been quite difficult.

ADAM GLICK: And what drove the decision to externalize the technology that you had worked on inside of Google and work with IBM and other companies on that, as opposed to contribute to Linkerd or one of these existing open-source service meshes?

LOUIS RYAN: So there were a variety of things. We didn't actually take any of the existing technology that we had internally. We actually built everything from the ground up externally. So what we did, we were looking around to see which proxy solution would we use as part of this product.

And as we were doing that evaluation, the folks over at Lyft and Matt Klein open-sourced Envoy. So we were doing evaluations of different proxy technologies. And we very specifically wanted a proxy that we could configure with an API and that we could reconfigure on the fly because we want that adaptability in the network.

And Envoy fit that bill almost perfectly. From our perspective, it was built on a technology stack that we were comfortable with from a performance, footprint, stability, and also developer community point of view. While C++ may not be everyone's cup of tea, you can hire C++ engineers. C++ engineers are used to working on performant code, so it seemed like the right technology choice.

And so it wasn't could we contribute to something else because we have actually made quite a large number of contributions to Envoy. It was what was the right set of technologies, and then how can we build the other pieces of the stack up around us? We didn't actually take any code that was internal at Google and kind of throw it over the wall into open source. That didn't happen. We just took technologies that made sense, and then we built up control plane infrastructure around it from the ground up in open source.

CRAIG BOX: You mentioned that you did this with a number of other partners. Who are the companies that have been working on Istio to date?

LOUIS RYAN: So we have a pretty diverse community. I mentioned IBM already, so they were a founding partner. We've also done a lot of work with Red Hat. They contribute quite heavily to the project. Cisco has been a regular contributor. And then you've seen companies like VMware, who have launched products based on Istio, so they've obviously been contributing for a while. Cloud Foundry and the folks over at Pivotal who use Istio now in production in a variety of scenarios.

And then a longer tail of contributors depending on use cases, you know, we see contributions from telemetry vendors like Datadog, who have written integrations for their systems into Istio, and LightStep, SolarWinds.

So there have been large companies, and you know, some more focused on making sure the tool integrates into a solution that they provide, and some into integrating Istio into product solutions that they want to provide, and some as large potential deployments of Istio, finding bugs in the system, and making sure that it works with their infrastructure and how they intend to deploy the system.

ADAM GLICK: Istio has a fairly broad community of people, as you mentioned. That must come with some trade-offs as you're building the software, in terms of going and working with the other companies involved in this and the individual contributors versus going and just hiring all the people that are working on the project and keeping it self-contained. What drove the decision to make it more of a broad collaborative approach?

LOUIS RYAN: I firmly believe that open source doesn't work unless you have diversity of opinion. And if Google went and hired everybody who worked on Istio, I think that would have been really, really bad for the community. That's absolutely something we wouldn't have wanted to do.

So you know, obviously, different vendors and different developers have different opinions. In part, they have those opinions because of how they want Istio to work in environments that I don't necessarily get to see. They have customers who I don't get to talk to or their users who I don't get to talk to. And so they bring in that perspective about what more people want than I would ever be able to get exposed to, just by being Google. And so now, that's super important to us.

I talk to the IBM folks pretty regularly. They obviously have a very, very long history in the industry and get feedback from customers about their products that are not Google customers. And so just getting that representation of the diverse community, diverse customer base is really, really important for the project.

CRAIG BOX: Do you think that's something that a single vendor can't do just by going and talking to their customers and their prospects?

LOUIS RYAN: I think it's harder. There are companies out there that specialize in providing solutions to very particular types of companies. And unless that company that wants to be the nexus of everything tries and makes itself a specialist in all industries, I just don't think that's viable. I just don't think the industry works that way.

ADAM GLICK: Istio 1.2 is out this week, so congratulations. What's new in 1.2?

LOUIS RYAN: There is in some sense, not a ton new, which is a good thing. It's a healthy sign of a project. So what there are are an awful lot of stability and bug fixes. And probably the biggest feature of 1.2 is the fact that we shipped it on the timeline that we wanted to ship it on, which is three months.

And when we were doing the planning for the 1.2 release, our goal is to regularly ship software in a predictable, reliable way. We said, we wanted to ship software in a three-month window. The last release 1.1, we targeted a three-month window. It ended up taking well over eight months, and so we weren't so happy about that.

So our biggest new feature in 1.2 is that we have shipped a packaged up bundle of stability, and performance, and other fixes of that type in a timeline that we're happy with.

CRAIG BOX: That sounds like what I'd like to call a Snow Leopard release.

LOUIS RYAN: Yes.

CRAIG BOX: It was a Mac OS X release about 10 years ago where they put a slide up at WWDC. It said, zero new features. And the crowd just went wild with applause.

LOUIS RYAN: Yes. So obviously, there's lots and lots of little features. There are lots of users of Istio out there that these features will make them happier. Most of them are user-facing changes, stability fixes, as I mentioned before, that address specific targeted use cases that people had and helped them with real production problems.

So that's what we were focusing on. There are definitely important improvements in areas of security, and how we do health checking, how we have integrated with different telemetry vendors, fixed some things in how we do locality-based load balancing. Like there are lots and lots of little features, which in totality, amount up to actually quite an important release for us. But there's just not a big, splashy, hey, we have this marquee new feature type of thing. And we're actually quite happy with that result.

CRAIG BOX: Istio came out of the gate fully formed, as it were. So the full feature set that's in Istio was there from day one. It's a very Google approach. A lot of other vendors like to start and say, all right, this is the minimum required functionality, and then bring out things at a time.

Google quite often has an opinion about what something should look like because it has that experience. It brings the whole thing out on day one. Does that leave you room to evolve based on what the community thinks of the product that you've released?

LOUIS RYAN: It's an interesting observation. It does present both a challenge and an opportunity. We knew what types of features we wanted to ship. And so when Istio came out, we put a stake in the ground about how those features should behave. In part, we did that because we wanted Istio to be like an existence proof to the industry. It's like, yes, you can do all these things, and here's how you would do those things.

Now, really, the evolution of Istio is improving the how we do those things. And there's a lot of discussion within the project right now about how to make improvements both in terms of how we integrate with other systems to do some of those things, how we present those things in APIs, how those things should evolve, and how independently can they evolve within the project.

Certainly, Istio came out a little differently than other open-source projects, but in part because service mesh was so new and there was a lot of skepticism about whether service meshes would actually work. Could you actually do or have all those features?

And so having something reasonably formed so you could actually show people, even at the expense of possibly having usability or integration friction or other types of issues because you haven't had that kind of long history of customer feedback about the product, was pretty important. You actually had to show that the thing could work, and so that was what we were trying to go for.

Since Istio came out, obviously we've gone through a lot of discussion with customers about how easy or hard it is to integrate Istio into an environment, whether this API or that API represents the concept well-- all the kind of typical product evolution things. So yes, in some sense, it was quite different and maybe a little overambitious, but it also really did serve a purpose in the project and I think in the industry because you could actually show people that this type of thing was possible.

ADAM GLICK: The Istio control plane is probably the most sophisticated application that anyone runs on Kubernetes. How has that impacted best practices for installation and deployment?

LOUIS RYAN: I'm not sure I would say it's the most sophisticated, but it's--

CRAIG BOX: There are a lot of moving pieces compared to the average software people will run themselves.

LOUIS RYAN: There are quite a decent number of moving pieces, so we certainly had our struggles getting those moving pieces running the way that we want them or helping customers get them running the way that they want them. Some of those struggles that we've had are effectively forcing us to push requirements upstream into Kubernetes, like particularly around how you might manage a sidecar is something that we've struggled a little bit with and I think every project in this space has struggled.

Our goal in Istio is to minimally impact an app, and so we're constantly looking at ways of removing the visibility of Istio from the application space and having it live only in kind of the infrastructure operator space. So there's actually quite a lot of things on the Istio roadmap about getting rid of a bunch of the things that we do and having them only really appear to infrastructure management people.

But that said, we've obviously learned a lot about how people need to set up and run apps-- the whole GitOps revolution. Because we require people to inject sidecars into an application, we have to work with CD flows or we have to provide automation for doing that. We've done a lot of work with operators recently in the project.

We're trying to make sure that at least installing the control plane, that management and update of the control plane is smoother. And that's a trend that you see in the industry in general. You see a lot of apps and infrastructural components now use an operator to manage their application lifecycle because of the complexities and the interdependencies of components that are hard to express pretty statically.

CRAIG BOX: Istio has introduced an operator in 1.2?

LOUIS RYAN: We're not actually shipping it in 1.2 because the operator can actually ship on a different schedule. We'll have an operator launch sometime after the 1.2 release. And we'll ship the operator on an independent schedule. That's a nice feature of operators. They don't actually have to be locked with products.

CRAIG BOX: Istio seems like it's the kind of project that's better understood by the "capital E" Enterprise than the cool cloud-native kids on Twitter. How do you see the different adoption between those two groups?

LOUIS RYAN: Well, certainly, the "capital E" Enterprise folks have that mountain of brownfield lying around. The cloud-native folks generally don't. So the value to them is higher.

If you're an enterprise and you care about security best practices and maybe you've had some less than optimal security experiences recently, you're going to care a lot about the security features. And you're going to care about getting those security properties as deeply into your IT portfolio as humanly possible because if you can only get a security solution into 10% of your workloads or 5% of your network traffic, that's not going to make you very happy.

So that separation-- the gRPC versus service mesh discussion is very relevant here. If you're writing your application from the ground up today, you can do a lot of these things using libraries, using newer technologies, open-source technologies. If you can't do that because you don't have the code lying around, you don't have the engineers who wrote the code, you don't know what the code is doing really, then you don't have an awful lot of choice.

And then you're going to look for an extrinsic security solution, or a monitoring solution, or even a load-balancing solution. And that's really what's driving interest from the enterprise side.

CRAIG BOX: We also see a lot of people who have a Kubernetes environment, and then they want to get observability features on top of that. Istio is commonly cited as a very easy way to get that visibility.

LOUIS RYAN: Right. So we made it reasonably easy to install Istio somewhat ubiquitously into a Kubernetes cluster. By running the sidecar everywhere, we get telemetry about all the traffic into and out of a workload or a service, and then we make sure that that telemetry can be rolled up and categorized in ways that just generally make sense.

And so you can go from 0 to 60 quite quickly in terms of getting telemetry for all the behavioral interactions of services within your cluster without really having to write code. And so that's proven to be pretty valuable to people. Kubernetes itself provides a certain amount of that for the things that it runs.

Out-of-the-box Kubernetes will give you telemetry like how much CPU is my pod using or how much memory is it using. Well, that doesn't really tell you anything about the interactions between services. And so that's the value people are looking to get-- a pretty low-cost 0 to 60 experience, and then they can see an awful lot about what's going on in their network.

And we've had customers who have had production issues, and they're trying to figure out why. There's some interaction between two services, and they don't know what it is, and they're getting stability problems.

And then they install Istio, and all of a sudden, they say, oh, there's a call going between this app and this database. And for some reason, the socket is churning or this HTTP request is taking an awful lot longer than we would expect. And so they can diagnose that class of issues very quickly. And so that's the type of thing that people look to Istio for in that scenario.

CRAIG BOX: Some of the other mesh products add functionality to Kubernetes as if it was just something that the platform should have had to start with. Istio's opinion is more that the network exists outside and you should be able to connect multiple clusters, and VMs, and mainframes, and all sorts of things-- anything you can put Envoy in front of. What do you see as the differences between those two approaches?

LOUIS RYAN: It mostly is the difference between the aspirational and the real world. There's a lot of people adopting Kubernetes very rapidly out there today. And Kubernetes is a very successful project, and rightfully so. But the vast majority of workloads in the industry today are not running on Kubernetes. Kubernetes is absorbing them at an impressive rate, but there's still a solid majority of things that are not running on Kubernetes, and so that has to be dealt with. It cannot just simply be ignored.

And while we make Istio easy to use on Kubernetes, and that was an intentional thing that we did in the project early on-- this kind of Kubernetes-first, but not Kubernetes-only approach-- there's nothing intrinsic in Istio itself that's coupled to Kubernetes. The concepts that we use, the technologies that we use apply equally to VMs and other compute environments.

And so there's just simply no reason to lock the solution into just being for one orchestration technology. And really, it's just do we have enough bandwidth in the Istio community to kind of do the ease of integration into those other environments to make the product work there? But there's nothing that's coupled to Kubernetes in any way really from an infrastructural perspective.

And in part, that's true because we're just trying to be a generalization of the network. And Kubernetes itself, the way it abstracts the network is really to just present another generalized kind of layer 4 IP network to people. And so its opinionation about that doesn't really lock people into something that's weird or esoteric about networking in general. And so that's the same as an overlay network that people might be using with VMs-- they're using VMware or one of their products.

ADAM GLICK: In conversations at conferences and in reading the various tea leaves and blogs on the internet, I've seen that not everyone understands Istio. What's the most common misconception that you hear about the project?

LOUIS RYAN: There's lots of little ones, which is understandable. Istio is a big project. I think there's some misconceptions about what the goals are. There certainly seem to have been some misconceptions about how the Istio community operates and makes decisions.

As I mentioned earlier, we appreciate diversity, and so we have all of our community meetings in the open. There's a calendar in the GitHub site linked to from the Istio site where it has a list of all the meetings that people can dial into and weigh in with their two cents.

And then as to the goals, I talked a little bit about the genesis of the project and why we thought it was important to come out with something that was reasonably fully formed, just to be able to show the industry and potential adopters that you can actually do all the things that we want to be able to do.

I think there's a misconception that that's now locked in stone, but actually, the project is evolving, and we're changing the way that we do these things. And if you show up to a community meeting, you'll get to hear all about it. And there's not really much point getting too deep into the weeds of an engineering roadmap, but I would strongly recommend people who do care about that kind of thing to come along and read the ocean of design docs that we have available to understand where the project is and where it's going.

About the goals, you talked obviously about making sure Istio works well for Kubernetes, and we have a roadmap to make Istio work even better for Kubernetes, but also to make sure it covers other deployment environments very well. And that's going to be very much a focus of the project, I think, over the next year or so.

CRAIG BOX: Is Istio hard?

LOUIS RYAN: It can be. It's certainly harder than we would like it to be. Installing complex software--

CRAIG BOX: --is complex. News at ten!

LOUIS RYAN: --can be hard. Istio does a lot of things, and those things have impacts. And so if you just install Istio into an existing production cluster and roll it out 100% with the click of one button, you're probably not doing it right. Like most things, when you roll them out to production, you should do so incrementally. And we have facilities in Istio to help people do that, but there are certainly things that we should be making easier in the project.

A big focus for us over the coming releases is to reduce the amount of configuration work that people have to do to just deal with some simpler use cases. So there's just reducing the amount of toil that you have to do when you install Istio and set it up, so just removing that configuration burden. We've already done quite a lot in that space, but there's definitely more that we can do and we are quite focused on doing that.

There's also this question about hard in the conceptual sense. Istio gives you facilities like client-side load balancing. If that's not something that you've used before, then you kind of have to mentally onboard what client-side load balancing means to your production system. I certainly had to do that when I joined Google, and it took me a while to wrap my head around what does that mean for production.

So there's definitely also some conceptual learning that a service mesh, whether it's Istio or any other project, is going to present to kind of the operations people and service owners. And so I don't know if I would say that's hard, but it's certainly new. And sometimes, those two things can feel like the same thing.
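
To make the client-side load balancing idea mentioned above concrete, here is a minimal sketch. It is not Istio or Envoy code: the `RoundRobinBalancer` class and the endpoint addresses are hypothetical, and round-robin is just one simple policy chosen for illustration. The point is that the caller, or a sidecar acting on its behalf, holds the list of backends and picks one per request, rather than sending everything through a central load balancer.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hypothetical client-side balancer: the caller chooses the backend itself."""

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        # Cycle through the known endpoints forever, in order.
        self._cycle: Iterator[str] = itertools.cycle(endpoints)

    def pick(self) -> str:
        """Return the endpoint that the next request should be sent to."""
        return next(self._cycle)


# Illustrative addresses only; in a mesh, the control plane would supply these.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [balancer.pick() for _ in range(4)]
```

Each caller spreads its own requests across the backends, which is why the operational picture differs from one where a single proxy sits in front of the service.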

ADAM GLICK: Istio does a lot of things and installs a fair number of things. Should Istio be broken up into a number of separate projects so that each of those could move at their own pace and be installed as people want those particular benefits or does it make sense to keep it together as a whole?

LOUIS RYAN: In the early days of Istio, we had to be quite conservative and kind of lock everything together to make sure that we could show everything working. Istio is already composed of other open-source projects, like we use Envoy. That's a huge part of what we use, and so that part is already broken out.

We're very careful to use APIs and standards where we can in the project. So another good example there, we use SPIFFE as the identity system or the kind of identity standard for how services communicate with each other. And there have been examples of, for instance, using Istio and swapping out our Citadel component, which is a component that has to some extent or other been broken out, and using SPIRE, which is a SPIFFE project, instead of that for kind of certificate management.

So Istio itself is already quite componentized. And you'll probably see an evolution of the project where as the standards and APIs that we depend on become more mature, it'll be easier to swap out pieces of Istio with other things. And that's actually something we're trying to encourage in the project.

So I don't know specifically what that means in terms of the bits of Istio and where all the bits will end up, but certainly we support componentization. And where we can, we try to make sure that other components with equivalent interfaces can be used.

CRAIG BOX: If you want to learn more about SPIRE, and Citadel, and their relationship with Istio, you can listen to episode 45 with Andrew Jessup from Scytale. Along that note, there's a component of Istio called the Mixer. And a lot of people cite that as a single point of failure and something that they would like to see removed from the system. I understand that that is happening.

LOUIS RYAN: Yes. So over the last six to nine months, we've gone through qualifying Istio in production with some pretty large deployments and learned a lot about the scalability and performance pivot points within the system.

And ultimately, we've made the decision that Mixer is probably not going to be a supported component long term, but we're going to try and find a way to provide the kind of same ease of integration because Mixer exists to help people integrate additional capability into the mesh. Its job is to provide an integration point that's easy for people to write to. That presented an interesting set of performance trade-offs.

And so what we're trying to do right now-- and we have some proposals in the community right now-- is to give people that same ease of use-- how do I go and write a custom policy and put it in the network between these two services-- but without any of the performance trade-offs of doing that through a centralized system.

So we're doing a lot of work with the Envoy community right now to make Envoy more programmable and provide a developer experience around injecting behaviors directly into Envoy, and then making the control plane able to distribute those behaviors. And so there's an effort underway to embed WebAssembly as a runtime into Envoy, and then to be able to target that runtime with development experiences that make it easy for people to write those kinds of customizations.

ADAM GLICK: Istio is API driven. What have you learned about APIs to drive a service mesh through this process?

LOUIS RYAN: We've learned quite a few things. The first thing we've learned is that we have a lot of capability in the system and that people want access to all of it. Istio is interesting in that it's not trying to be a generic abstraction over N different pieces of infrastructure.

We chose Envoy, and Envoy has a lot of capability, and so people want to be able to use it. And so we have to make sure that we have an API surface that exposes a decent subset of that capability to users. Otherwise, they end up being frustrated. They're like, I should be able to do this thing. I can go read that Envoy can do this. Why is that not exposed in the Istio APIs?

And that's a general trend that I've seen play out over the years with APIs, where you're providing infrastructure to people, and people have demands of that infrastructure, and those demands just generally grow. And you can try and curtail the API surface that you present as an infrastructure provider, but eventually, you're going to end up providing the majority of the capabilities of the system in the API.

Now, that creates an interesting set of trade-offs, and so then you have to think about how do I do that in a way that doesn't hurt the usability of the API or turn off users who have simpler requirements. So you have to put a lot of time and effort into your API design.

And we've certainly learned a lot about doing that for Istio, ensuring that we make sure that people only have to use the APIs when they need to. The default installation or the default out-of-the-box experience gives people the majority of what they want with very little interaction with the APIs, and then making sure that if they do have to use an API, it's aligned with their kind of role or how they see themselves-- what role they play in an organization.

And then lastly, if you do have a lot of capability, that that is kind of structurally layered within the API, so you don't see all the artifacts of that capability when you're a user who only needs to interact with the more kind of generic or higher-level concepts of the API. So those are all kind of API design best practices.

CRAIG BOX: People talk about Istio installing 54 resource types into Kubernetes. Is that a lot, and why is that a number people care about?

LOUIS RYAN: It's probably a lot. It's probably also not a number people should care about. We had all those resource types-- effectively, we had one resource type per extension vendor. And so if you didn't care about a plug-in from a certain vendor, then you weren't going to ever interact with that resource.

Now to be fair, we probably shouldn't have installed it. So that's a fix that we made in the 1.2 release. So by default, you only get the kind of core API types, and then you can choose the extension types for specific integrations, so that you just pick up those resource types. But for the most part, people shouldn't care.

ADAM GLICK: Is Istio aiming to be the fastest and lightest service mesh out there or what's the core goal?

LOUIS RYAN: There's always an interesting set of trade-offs. We certainly want to be fast, and certainly faster than we are today. We've had a lot of experience here at Google about building a service mesh-like technology, and we have a pretty good idea of what type of performance envelope it can live in.

It's not always the most useful thing. Customers-- their concerns often aren't driven by latency or bandwidth as the first thing. They may care about getting a strong security posture. And so making sure that that works well out of the box is probably more important to a broad class of users than shaving off those last 15 microseconds.

So most of our engagements do tend to be feature driven. That's what people are looking at. They care about getting good telemetry, good security, good policy, good traffic management. And then you will start to see people care about performance. We have mechanisms to help people in very, very performance-sensitive situations carve traffic out of the mesh, so that they can get to line speed.

But certainly over time, we have quite aggressive performance goals for Istio, but I wouldn't say they're the highest priority thing in the product right now. They're just something that we'll keep working and grinding away on organically. And that's usually kind of how performance goes anyway within the project.

CRAIG BOX: Do you see Istio joining an open-source foundation?

LOUIS RYAN: Over time, certainly. Istio is certainly built on community principles. And I think we recognize that being part of a foundation will work in support of those principles. So right now, it's an ongoing discussion in the community, but it's certainly something that we're interested in doing over time.

ADAM GLICK: Speaking of things evolving over time, lots of people would like to get involved in Istio. There's a lot of excitement about it. Platform9 put out a survey that they did at KubeCon and listed it as the project that most people are evaluating and investigating right now. What are the areas that you'd like to see people join the project to help with?

LOUIS RYAN: If I get to choose anything, the thing I think we could use the most help with is just getting Istio running in more environments, helping people with the onboarding process. We have a fair number of people working within the project making sure that Istio works well out of the box on Kubernetes.

We don't have quite as many people working to make sure that Istio works well out of the box on this VM environment, or that bare-metal environment, or Mesos, or Nomad, or some other container orchestration system or workload orchestration system. So help in those areas would be very much appreciated.

But mostly, I just want to be able to grow the community. So if people don't know what they want to work on but would like to show up and come to a community meeting, they'll find a very long list of things where their help would be appreciated. But in a world of ponies and unicorns, certainly more environmental support would be great.

CRAIG BOX: All right, Louis, thank you so much for joining us today.

LOUIS RYAN: Oh, happy to chat, guys. Thanks for having me on.

CRAIG BOX: You can find Louis on Twitter @louiscryan. And you can find the new Istio 1.2 at istio.io.

[MUSIC PLAYING]

CRAIG BOX: Thanks as always for listening. If you've enjoyed the show, please help us spread the word, tell a friend, write a review. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

ADAM GLICK: You can also check out our website at kubernetespodcast.com for show notes and transcripts of each episode. Until next time, take care.

CRAIG BOX: See you next week.

[MUSIC PLAYING]