Kubernetes Podcast from Google: Episode 55

#55 May 28, 2019

Solo.io, with Idit Levine

Hosts: Craig Box, Adam Glick

Solo.io was founded in 2017 by this week’s guest, Idit Levine. She talks to Craig and Adam about API gateways, service meshes, and lots of project names with two O’s in them.


ADAM GLICK: Hi, and welcome to the Kubernetes Podcast from Google. I'm Adam Glick.

CRAIG BOX: And I'm Craig Box.

[MUSIC PLAYING]

ADAM GLICK: So you're back home from KubeCon, Craig?

CRAIG BOX: Yes. We spent another couple of days after the event traipsing around Barcelona and its surrounding areas, mostly in art museums. We went to Gaudi's Casa Batllo, which is a regular house that a madman redesigned and took out all the straight lines and just poured curves into everything he could find. We went to the Picasso Museum where we had a regular artist who just had an epiphany and decided he'd start drawing strange Cubist things, and then up to Figueres where you see the Dali Museum, where-- I'm not sure Dali ever was a regular artist. He definitely had something going on in his head.

ADAM GLICK: Is there such a thing as a regular artist? We have one of those buildings where they talk about having no right angles in Seattle. It used to be called the EMP, and they've updated it since then.

CRAIG BOX: It is the Museum of Popular Culture, MoPOP.

ADAM GLICK: MoPOP, yes.

CRAIG BOX: A lot of exhibits there that actually do trigger the things that I very much love. So one time, there was both David Bowie and the Muppets, and I managed to buy myself a t-shirt wearing both of those themes at the same time.

ADAM GLICK: Gosh, I hope it's Kermit the Frog with the famous lightning bolt across his face.

CRAIG BOX: No, it was Beaker, but you got the theme, basically. But anyway, they just don't seem to make artists like they used to. I was thinking afterwards, Picasso and Dali are household names-- both Spanish, obviously. And then there was Andy Warhol and a bunch of artists which I would say never quite made it to top tier prominence. And aside from Banksy today, I'm trying to think of anyone who will have that level of infamy 100 years from now.

ADAM GLICK: I'm wondering if the Warhol estate might take umbrage with the "not making it to prominent" statement, but perhaps it's just shifted to different types of art. I would think of movies, music. Lots of household names there.

CRAIG BOX: Oh, absolutely. It's just I wasn't sure if it was since the rise of popular music nowadays. It's all Kanye West all the time.

[LAUGHTER]

Thank you so much to everyone who came up and said hello at KubeCon. Thank you especially to those who came to our lounge meet-up and shared a little Canadian liquor with us. There's a fun story there. We have a listener, Francois from Quebec, came up to me after Google Cloud Next a couple of months ago and said, I love the show. Thank you so much. Here's a bottle of this fine Canadian liqueur. It's a thing called Sortilege.

It is Quebecois, as is he, and it's basically half maple syrup, half whiskey-- or, as best as I could tell, alcoholic honey. And I said, thank you. That's a wonderful gift. Thank you so much. And Adam obviously couldn't make it for Glick 2.0 reasons, as we've explained on the show. But next time I see him, I will make sure to drink to the health of his child and also share it with our community. And that is indeed what we did.

ADAM GLICK: Yes, it was lovely. And hopefully, we can link a picture of some of the folks who were there to share with us. Special shout-out to some of our friends who we were able to run into at KubeCon-- Wilbrod from South Africa, Laura from Ireland, Stephane from France, Alex and Lynn from Zurich. And thank you to everyone who grabbed a sticker and who was able to tweet out a picture at us. Sorry for some of you who we weren't able to meet up with there at the show. Hopefully, we'll have a chance to meet up with you at a future show.

CRAIG BOX: Let's get to the news.

[MUSIC PLAYING]

CRAIG BOX: Microsoft, along with mesh vendors Solo.io, Buoyant, and HashiCorp, announced the Service Mesh Interface, a spec which is designed to provide a standard interface for service meshes on Kubernetes.

The spec, released under the Deis Labs brand, works similarly to Ingress in Kubernetes, offering a subset of features specific to the Kubernetes implementation of service mesh, but not implementing them. There are adapters for Istio as well as for Linkerd and Consul. We're glad to see more people interested in the service mesh space, and you'll hear more about SMI in today's interview.

ADAM GLICK: More news from Microsoft at KubeCon last week. The Cloud Native Application Bundle team announced Pivotal had joined forces to work on the spec, and the project is approaching a release candidate. We'll talk to the CNAB team on an upcoming show. Also, the virtual Kubelet project announced version 1.0 with performance improvements to signal its readiness for production workloads.

CRAIG BOX: Banzai Cloud have released an operator for Kafka. The astute listener will know that there are three such operators in the ecosystem already, including one from Kafka sponsor Confluent. All are based on StatefulSets, which means you can only ever remove the last broker that you added when you scale down. Banzai's solution is pod-based-- a KafkaSet, if you will-- where you can remove arbitrary brokers, unlocking more fine-grained control. Along with this, they have released a Go library to help compare if two Kubernetes objects are the same, ignoring fields which are amended by the server after being submitted.
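
[Illustrative sketch, not part of the conversation: one way to compare two Kubernetes objects while ignoring server-amended fields. This is written in Python for brevity and is not the Banzai Cloud Go library, whose approach is not described in the episode. The list of ignored fields and both function names are assumptions made for the example.]

```python
import copy

# Assumption: metadata fields that the API server typically fills in.
# The fields the actual library ignores are not stated in the episode.
SERVER_MANAGED_METADATA = (
    "resourceVersion", "uid", "creationTimestamp",
    "generation", "selfLink", "managedFields",
)


def strip_server_fields(obj: dict) -> dict:
    """Return a copy of a manifest dict without server-populated fields."""
    cleaned = copy.deepcopy(obj)
    cleaned.pop("status", None)  # status is written by controllers, not users
    metadata = cleaned.get("metadata", {})
    for field in SERVER_MANAGED_METADATA:
        metadata.pop(field, None)
    return cleaned


def same_desired_state(submitted: dict, live: dict) -> bool:
    """Compare only the parts of the objects that the user controls."""
    return strip_server_fields(submitted) == strip_server_fields(live)


submitted = {"kind": "ConfigMap", "metadata": {"name": "demo"}, "data": {"k": "v"}}
live = {
    "kind": "ConfigMap",
    "metadata": {"name": "demo", "uid": "abc-123", "resourceVersion": "42"},
    "data": {"k": "v"},
}
print(same_desired_state(submitted, live))
```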

ADAM GLICK: Last week, IBM announced a new open source project called Razee, not to be confused with the similarly named motion picture awards. Razee follows the trend of multi-cluster Continuous Delivery tools for Kubernetes that are becoming quite popular as organizations take on more automation and DevOps operations. Razee focuses on templating deployments across environments and clouds, and helps you gain insight into what applications and versions you are running on them.

CRAIG BOX: Couchbase announced version 1.2 of their Autonomous Kubernetes Operator, adding new features while insisting you keep your eyes on the road. The new release supports automated rolling updates, an admission controller for validating configuration, and production support on the three big clouds.

ADAM GLICK: Rancher have announced Rio, a Kubernetes-based micro-PaaS described as "making containers fun". Rio installs on your Kubernetes cluster and brings with it Istio and Knative, as well as adding features like LetsEncrypt, simplifying the configuration and deployment of container applications. No word yet if Duran Duran are making any contributions to the project.

CRAIG BOX: Want to run the Atlassian software stack on Kubernetes? Praqma, with a Q, are a Scandinavian company specializing in continuous delivery. They have open sourced their Atlassian Software in Kubernetes solution, abbreviated A-S-K, or ASK. ASK is actually a set of scripts to help put Jira, Confluence, and Bitbucket into containers, but ASD wasn't as catchy an acronym.

ADAM GLICK: SAP have announced that their open source application framework called Kyma, spelled K-Y-M-A, has reached 1.0. It contains three main components-- an application connector which allows you to iterate on a component without having to release a full product update-- somewhat like an API gateway in front of it-- a server component that has you write Node functions called lambdas for running functions, and a service management part which provides a UI for service discovery.

CRAIG BOX: Congratulations to Intuit who won the CNCF End User Award at KubeCon last week. The CNCF called out Intuit's involvement in the community as well as its use of many cloud native technologies, including Kubernetes, Istio, Prometheus, Fluentd, Jaeger, and Open Policy Agent, as factors in their winning. Their prize, a free blog post.

ADAM GLICK: Last week, Brian Liles said his old company CapitalOne were all in on Kubernetes and are having lots of success so far. They appear to be doubling down on that by opening up their Kubernetes platform called Critical Stack to external developers and [announcing] its eventual offering as an enterprise package. CapitalOne made the announcement last week at the Collision 2019 Conference, which is as close to the word "crash" as I ever want my bank to get. This announcement follows their acquisition of the startup named Critical Stack in 2016 and its beta release in 2017.

CRAIG BOX: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Idit Levine is the founder and CEO of Solo.io. Previous to Solo.io, she was the CTO of the cloud management division at EMC. Welcome to the show, Idit.

IDIT LEVINE: Hey. Thanks so much for having me.

CRAIG BOX: Your company was founded in 2017, so it wasn't named after the Star Wars movie. What is the story behind the name?

IDIT LEVINE: Yeah, that's actually funny. So I was looking for names, and at the same time I was looking to actually raise money. And what happened is that, when I tried to raise money, specifically because I'm from Boston, I tried to do it on the East Coast.

CRAIG BOX: You can tell you're from Boston by the way you said "Boston."

IDIT LEVINE: Boston.

ADAM GLICK: I was like, that's impressive. You have the Israeli and Boston accent merged right in there.

IDIT LEVINE: Yeah. It's ridiculous, right?

[LAUGHTER]

Yeah, it's bad. But yes, I tried to raise money there. That's almost impossible. First of all, there's not a lot of VC there. And second of all, even if you have them, they really are conservative. They needed partners and so on. So then, I went to the West Coast. I get money with no problem. But to spite those people, I call it Solo, which is amazing. I like it.

CRAIG BOX: So you are the sole founder of the company?

IDIT LEVINE: Yes.

CRAIG BOX: That's the story?

IDIT LEVINE: Yeah. They really wanted me to take a partner, and I said, why? I just need to build a product. It doesn't make any sense at all. That's one thing. And the second reason is that I'm a real geek. So it's Solo.io. And if you're taking the S from it, it looks like 0-1-0-1-0. So I really liked that.

ADAM GLICK: Binary association. I love it.

IDIT LEVINE: Exactly.

ADAM GLICK: What was the first product for Solo.io?

IDIT LEVINE: So when we actually started, the biggest problem that we saw on the market is that-- I come from a lot of startup companies that got acquired. I was in DynamicOps, which got acquired by VMware. We did like vRealize or cloud when cloud wasn't cloud. It was called virtualization back then. And then, I moved to another startup company, got acquired by Verizon. And even when I was in EMC, I was doing advanced development.

So wherever I was, it was very innovative, and I put a lot of open source out there. And what I noticed is that, when I'm putting open source out there, at the end of the day, there's not a lot of people who really understand it, because they can't really adopt it -- because there is a huge gap between where they are actually in their process, like those enterprise, to where actually the people are in the process of innovation.

And basically, in Solo, that was my purpose. My purpose was to close this gap, bring those people with me so we can innovate together. So basically, when we started, we saw like every company is trying to do a digital transformation kind of journey. The first thing that they will try to do is to migrate from monolithic to microservices.

So our first product was exactly this. It was Gloo. And its purpose was-- in my opinion, the best API [gateway] that exists out there natively on Kubernetes as a first-class citizen-- to glue monolithic, microservice, and serverless together natively using CRDs, and very, very innovative. And the idea was to have those companies migrate to microservices.

Now, once they actually migrate to microservices, then they have a whole set of new problems. So how are those microservices going to communicate between each other? How are they going to safely do it? And how do you actually see what's going on? So service mesh was solving this problem-- Istio, Linkerd, and Consul.

But the thing is that I didn't think that it's smart for me to try and actually just work on service mesh. What I did is I tried to understand what will be the problem after it. And what I realized is that the problem that coming with service mesh is that there is so many, and you probably have more than one instance, maybe from the same type. But you will have more than one cluster. Therefore, we will need more than one instance, maybe two Istio. Maybe you will want to use AWS, and then you will need to go to the cloud. So you need somehow to manage this. And this is a very simple orchestration problem.

So basically, what we built, we built the product called SuperGloo. It's basically a service mesh interface for all the configuration between all those meshes. And then it also discovering them, installing them, and also grouping them together into one big mesh. So then, what you can do is basically take App Mesh and take Istio, flatten the networking to them, and basically just have an ability to manage them the same way. So that's the second project called SuperGloo.
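
[Illustrative sketch, not part of the conversation: the idea of one configuration interface in front of several meshes can be pictured as an adapter pattern. Every class, method, and output shape below is hypothetical. None of it is the SuperGloo or SMI API, and the per-mesh output dictionaries are made-up stand-ins for each mesh's native configuration.]

```python
from abc import ABC, abstractmethod


class MeshAdapter(ABC):
    """Hypothetical mesh-neutral interface; not the SuperGloo or SMI API."""

    @abstractmethod
    def traffic_split(self, service: str, weights: dict) -> dict:
        """Translate a mesh-neutral traffic split into mesh-specific config."""


class IstioLikeAdapter(MeshAdapter):
    def traffic_split(self, service, weights):
        # Made-up output shape standing in for one mesh's native config.
        return {
            "mesh": "istio-like",
            "host": service,
            "routes": [{"subset": s, "weight": w} for s, w in weights.items()],
        }


class LinkerdLikeAdapter(MeshAdapter):
    def traffic_split(self, service, weights):
        # A different made-up shape: same intent, different vocabulary.
        return {
            "mesh": "linkerd-like",
            "service": service,
            "backends": [
                {"name": f"{service}-{s}", "weight": w} for s, w in weights.items()
            ],
        }


def apply_everywhere(adapters, service, weights):
    """Send one mesh-neutral request to every registered mesh adapter."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return [adapter.traffic_split(service, weights) for adapter in adapters]


configs = apply_everywhere(
    [IstioLikeAdapter(), LinkerdLikeAdapter()], "reviews", {"v1": 90, "v2": 10}
)
for config in configs:
    print(config)
```

[The caller states the split once. Swapping or adding a mesh means registering another adapter, not rewriting the request.]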

CRAIG BOX: Let's start by digging into Gloo. Gloo is an API gateway based on Envoy.

IDIT LEVINE: Right.

CRAIG BOX: What was the ecosystem like when you were making the choice of which engine to use for Gloo?

IDIT LEVINE: To me it was really, really simple. If there's something I'm pretty good at, it's to understand a trend. By Google actually adopting Envoy, it was clear that this is going to be something big. Lyft, Google, big names. I knew that Google was putting a lot of resources behind that, so I knew that that would be good the fastest. Now, I also liked Envoy for three reasons. First of all, it's really, really extensible, and for us it's critical because we need to differentiate ourselves. So we needed to be able to actually write filters and extend that.

The performance is great. And then, besides that, the ecosystem is amazing. So I knew that it would go very fast. So that was like no-brainer for me. This is what we're building on.

ADAM GLICK: You mentioned before SuperGloo and the next phase of that. How did you decide to build that? And what is in SuperGloo that takes Gloo to the next logical step?

IDIT LEVINE: So as I said before, what we wanted to do is just take service mesh and build on top of it. But we realized that it's not ready. And we also recognized-- it's actually what happened in the KubeCon like a year and a half ago. I recognized that it will be harder for customers to adopt it. It's a really invasive kind of solution. You need a sidecar next to all your microservices. You need to redeploy all of them, and so on.

I felt that it would be much easier for me to go to a customer and said, look, you have this problem to migrate, and here's an API gateway that you're already familiar with. API is a concept that people running forever. And it's pretty simple. So basically, when I went back to the team, I said, how can I build this thing on the API layer and API management, but still going to give them the benefit to actually migrate? And that's exactly what we did. That's what Gloo is.

But once I saw after it that Istio started to mature, there was others, like Consul Connect that we actually helped build-- so the first implementation for Envoy in Consul Connect was actually us, because we knew Envoy very well. So they needed help. We were working with Linkerd. We were working with all of them, and we basically recognized that there's quite a lot of problems still to solve in this ecosystem. And it makes a lot of sense that you might need different solutions.

Like for instance, if you're running on AWS, you should use App Mesh. And the reason is because it's natively working there, it's free, and it's integrated with everything that they have. But it's not open source, so when you're running on-prem, you have to use something else. So this ability to give the customer the ability to choose, I think that's the first thing.

And that's what our customers are excited about right now, because they don't know which to choose and to take, like I said. It doesn't really matter what you choose. If you feel right now better with Linkerd and then you want to swap, just swap it. But we have an API that you don't need to change any kind of configuration on everything you build on top of it.

So that's where we started. But then, we also wanted to solve the problem of how easy is-- and this is what SuperGloo is doing-- how easy it is to use Istio, for instance. It's pretty complicated. You need to create four different objects for each way, for instance. So we make it way more-- if you want to see it's kind of like the OpenShift of Kubernetes. It's just easier for the user.

And then, the last thing, we also recognized that because you will need to use more stuff, we wanted to group them together, the meshes. So for instance, having production, one cluster in there, you want to group them together and basically use the same Prometheus or use the same root certificate. And the last thing that I will say is that the only reason we actually wanted to do that is because we are interested on building on top of the mesh, and for that we needed an API. I didn't want to create four different implementations for every application that I'm creating, and I wanted to have this abstraction layer that I will be able to choose.

CRAIG BOX: Do you think-- regardless of which one it is, do you think service meshes are inevitable?

IDIT LEVINE: Yes, definitely. I think that, as I said, that's the best choice, in my opinion, to solve the problem that the microservices ecosystem is, you know, we needed to deal with when they're moving from microservices distributed application. It's hard. And that's probably, in my opinion-- there is a trade-off of performance, but I think it's so little that it's bringing a lot of benefit.

CRAIG BOX: If I have 400 nodes that I need to run, I have to make decisions about which zones I run them and whether I run multiple clusters and whether I run in multiple regions or on multiple clouds, and so on. You make decisions in large part based on how you want to pack workloads together and redundancy requirements.

IDIT LEVINE: Right.

CRAIG BOX: Is the decision tree the same with service meshes? Should I be saying, I want a single service mesh across all of these environments? Or might you want different meshes in different environments?

IDIT LEVINE: Yeah. So today, everybody in the community agrees that basically, per cluster, you need to have one instance of a mesh. But this is exactly why we think that SuperGloo and the Hub orchestration makes a lot of sense, because you will have more than one for sure, and you will have more than one for a different reason. Maybe those guys have production, this one is staging and it's a developer, and it's looking different. It's a different configuration. They're talking to the different metric system.

So you need the ability to actually group them together. And I feel that that's probably-- multi-cluster is the big problem that everybody talking about it, and we're basically writing exactly on it. So yeah, I believe that that will be the result of the multi-cluster work that is being done.

ADAM GLICK: Is this technology something that you plan to provide as open source into Istio, or is it something that will live separately that people will choose to plug in?

IDIT LEVINE: So in the last KubeCon, we just announced together with Microsoft something called SMI. SMI is basically the API of SuperGloo, only the configuration, only the actual translation. And the reason we actually partnered with Microsoft and HashiCorp and Buoyant on this is because we felt that it shouldn't be ours. It's not ours. It's a community story. First of all, we were working a lot with Service Mesh for the last year and a half, but are we sure that we have the best API for that? No. We should ask customers and users what they want.

So what we thought is that it's bigger than us, and we wanted to give this, at least a spec, to the community and hopefully to the CNCF, and basically let the community decide what this right API should be like. And as I said, SuperGloo is more than this, and it's open source as well. So at the end of the day, it's all open source. Everybody can use it. But it's doing the grouping, it's doing the discovery, it's doing the installation that is not done in the SMI, Service Mesh Interface.

ADAM GLICK: You mentioned the announcement that was made at KubeCon about SMI, which is the Service Mesh Interface.

IDIT LEVINE: Right.

ADAM GLICK: So like a programming interface. It defines how things can talk to each other but doesn't choose which particular mesh would sit underneath it. Which meshes do you support with SMI? What were you thinking about when you built that?

IDIT LEVINE: We are about to support all of them, but there is a difference between the SMI, the Service Mesh Interface, to SuperGloo because we just didn't have enough time to move all the code to SMI. So I see myself-- I think there is an Istio adapter. There is some telemetry that Linkerd did. And there is Consul policy. But we're actually way ahead in SuperGloo because we've worked in it for years. So we're supporting Istio, of course, App Mesh, Consul Connect, Linkerd, and we're supporting all of the options that they have. So we're way more mature. Yeah. But we will move it. It will take time.

CRAIG BOX: Having taken that API and built the Service Mesh Interface, where do you want to see that go?

IDIT LEVINE: As I said, what I believe that it's very interesting-- as I said, there is use case for customer today. They want to try. So this is what we saw from our customer. We came to people in the beginning when service mesh started. We said, guys, tell us, what are you doing with service mesh? And what we discovered is that they're spending a lot of resource to actually investigate. So they're starting with Istio, and maybe it's too complicated, so they're moving to Linkerd. But guess what? It's not supporting Envoy. So is that going to really win? So they're going to Consul, but Consul only supports Layer 4. So it's a problem. And then, App Mesh, but it's not open source.

So the question is, which one do you choose? You learn four different APIs. And is it really necessary? If you think about it, most of the people that will use service mesh, will use a very special use case that are very-- 80% of them probably will use only some-- a few API, and that's it. Do I need 50-- whatever, 60 API that Istio has? Probably not.

CRAIG BOX: Gabe Monroy from Microsoft described it to me as Ingress for Istio. The Ingress object is in Kubernetes. One of the challenges of the Ingress object in Kubernetes is that it has described a minimal set of behaviors, and for every vendor's different implementation, they've had to implement annotations to describe that. How is SMI going to solve that problem?

IDIT LEVINE: So that's a good question. Every time where you're trying to extract something and come with a little community, you will have this problem. So the way we fix it in SuperGloo-- and it's not in SMI yet. The way we're fixing it is that we're basically creating a system that, if you want the basic functionality, that's not a problem. You can use whatever, SMI. But if you want something that is a little bit more specific or more special, you will be able to actually go directly, for instance, to Istio, do it there. And then, we're not going to overwrite it.

We're going to merge that together. So this is special CRDs in Istio that we are not touching. And if you guys want to still put it there, we're going to merge it and serve it to Istio. So in the end of the day, we're letting you the ability to have all the spec if it makes sense. But I would argue that probably 80% to 90% of the time it just would be good enough. And actually, Tim Hockin is a good friend. And I will not-- I don't think that it's a good definition to compare it to the Ingress because I don't think that the-- me and I think Tim and everybody think that the Ingress of Kubernetes wasn't doing a good job. And actually--
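
[Illustrative sketch, not part of the conversation: "merge, don't overwrite" could look like a recursive dictionary merge in which settings the user wrote directly against the mesh win over generated ones. The function name, the field names, and the precedence rule are assumptions for the example. The episode does not describe how SuperGloo actually merges configuration.]

```python
def merge_preserving_native(generated: dict, native: dict) -> dict:
    """Recursively merge two configs; natively set keys are never overwritten."""
    merged = dict(generated)
    for key, value in native.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a nested section: merge it field by field.
            merged[key] = merge_preserving_native(merged[key], value)
        else:
            # The user's mesh-specific setting takes precedence.
            merged[key] = value
    return merged


# Config generated from the generic, mesh-neutral API (hypothetical fields).
generated = {"route": {"weight": 100, "timeout": "5s"}}
# Mesh-specific settings the user applied directly (hypothetical fields).
native = {"route": {"timeout": "2s", "mirror": "shadow"}}

merged = merge_preserving_native(generated, native)
print(merged)
```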

CRAIG BOX: Yeah. We spoke to him on the show about that and--

IDIT LEVINE: I know.

CRAIG BOX: --he said, well, there needs to be a version 2 of that. And so do you think that there are lessons from the SMI that will help influence version 2 of the Kubernetes Ingress?

IDIT LEVINE: We came with something very basic based off-- we were working a lot with a service mesh. I feel that we need input from the community. So it's just the beginning. It's very, very new. It's very, very fresh. But I feel that the community will have a lot of influence over how those APIs will look, and they should.

ADAM GLICK: Can I build a mesh of meshes if what underlies all of those is different? Do we end up with a lowest common denominator there, or is there a way to actually take the benefits and extend that?

IDIT LEVINE: OK. So one of the things that we can do is basically group meshes. In the end of the day, what is common between those meshes is usually Envoy. It's almost always the common-- almost. Let's put Linkerd one second aside. So theoretically, I can take two meshes and just flatten the networking through them and treat them like one big [mesh]. And theoretically, I don't even care which type of mesh is it, if it's Istio or HashiCorp, for instance, because it's using Envoy, so I can do the flat networking.

And then I can use the SMI as the API to kind of translate. So to me, it's one big mesh. It's like the Mesos vision. You have a lot of computers. Let's treat them as one big [computer]. And then, let's put stuff on top of it. And that's exactly what I'm passionate about in the SMI, the stuff that we can build on top of it.

ADAM GLICK: Is there a way to avoid ending up with a lowest common denominator, where each of the meshes has their own particular unique features they provide, but once you bring them together and if you have a standardized interface, is there a way to actually leverage the benefits that you get for the different meshes that you may have connected? Or does it become, you have to extend the SMI, and those pieces need to grow within SMI in order for those features to benefit across all the different meshes?

IDIT LEVINE: OK. So we working a lot with all those meshes, and what we notice is that-- for instance, a retry, it's a good example of a big problem to merge together, because the definition of retry in Istio, it's totally different than the Linkerd one. It's just not the same. So now, if you're coming in, calling it a function, how do you do this? It's not doing the same thing. It's not even the same meaning, but they call it the same name.

So that was challenging. We overcome it, I think. But that's the problem. I think that Istio is way the most mature one, and they're going the fastest, which means that if you're going to go and try to do any function, any command in the SMI, any API, you will discover that most of them will say not implemented because just, really, they're not implemented yet.

So we can't do fault injection for Linkerd or for Consul Connect. And we built a product on top of it called Gloo Shot, which is a Chaos Engineering on the top of the mesh. We can run it only on Istio. But we did reuse the SMI. That way, if they will add this functionality, boom, we get it for free.
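
[Illustrative sketch, not part of the conversation: a common interface over meshes with uneven feature support can report "not implemented" per mesh instead of failing outright. The support table reflects what is said in this episode, as of May 2019. The function, exception, and field names are hypothetical.]

```python
# Fault-injection support as described in the conversation (May 2019).
FAULT_INJECTION_SUPPORT = {
    "istio": True,
    "linkerd": False,
    "consul-connect": False,
}


class NotImplementedByMesh(Exception):
    """Raised when the underlying mesh lacks the requested feature."""


def inject_fault(mesh: str, service: str, delay_ms: int) -> dict:
    """Return a hypothetical fault spec, or raise if the mesh cannot do it."""
    if not FAULT_INJECTION_SUPPORT.get(mesh, False):
        raise NotImplementedByMesh(f"{mesh} does not implement fault injection")
    return {"mesh": mesh, "service": service, "delay_ms": delay_ms}


def run_experiment(meshes, service, delay_ms):
    """Apply the fault where supported and report the meshes that were skipped."""
    applied, skipped = [], []
    for mesh in meshes:
        try:
            applied.append(inject_fault(mesh, service, delay_ms))
        except NotImplementedByMesh:
            skipped.append(mesh)
    return applied, skipped


applied, skipped = run_experiment(["istio", "linkerd", "consul-connect"], "cart", 500)
print(applied, skipped)
```

[If a mesh later gains the feature, only the support table changes. The calling code stays the same, which is the "we get it for free" point.]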

ADAM GLICK: You mentioned something in there called Gloo Shot.

IDIT LEVINE: Yeah.

ADAM GLICK: You want to explain what Gloo Shot is?

IDIT LEVINE: Yes. I will maybe go first to explain that what my company wanted to focus on is basically building stuff on top of-- extends the use case for service mesh. So service mesh is great because what it's doing-- yes, there's three things, which is great. But what I think is most special about the service mesh is the way they fix, they implement that. And the way they're doing it is they'll basically detach their logic business application from actually the operation when they're putting it on the sidecar. By doing this, they abstract their network. And if they abstract the network, there is way more we can do.

So we announced recently something called Service Mesh Hub, and this is something that you can just go there. It's free. It's great. And what you can do is-- it's basically based on SuperGloo and SMI. What it's doing is you can install whatever mesh you want, or either we can discover one that you already have. And then, it's basically just very simple. You can manage that.

But what's special about it is that it's basically bringing the experience of iPhone or Android to this experience of the mesh. We want to make it simple to use. So we should make it very simple to install, but also to extend. So what we created is like an extension store. Right now, it's free so-- "store". But yeah. But the ability to--

ADAM GLICK: It's a catalog.

IDIT LEVINE: Yeah, exactly. That's the word I was looking for. And basically, what you can do is just basically click on this and extend. Now, because it's very knowledgeable about the meshes themself, discovery, all of this, it can easily just be smart about it. For instance, if you want to install something like Flagger, which Flagger is basically a canary deployment by the Weaveworks guys. It needs to know where Prometheus is.

Now, you can go into the YAML and configure it, but if we already discover everything that you have in this mesh, we know which Prometheus you're using. And it's pretty simple for us to just basically pipe it for you.

So it could be as simple as just extending your iPhone or extending your Android by downloading an app from the App Store. So that's the experience that we want to do it so it's free. Just go use it. The idea with this is that we want the community to build on top of it. We want the community to be able to put stuff on it without understanding mesh. So we're creating an SDK that they will be able to leverage the quality of actually abstracting the network without actually needing to understand how the mesh is working. So that's the second thing.

And yeah, Gloo Shot is just a Chaos Engineering. So we have some interesting project. Chaos Engineering on top of mesh, it makes a lot of sense because if you think about it, today Chaos Engineering, you need to import a library. But then, it's specific to the language, and you also need to change your code.

CRAIG BOX: All of the things that a service mesh frees you from.

IDIT LEVINE: Exactly. So what we said, what if we can basically fault inject some latency on that proxy level? Then you don't need to change anything. So we open sourced it. It's a Go project, very simple, and very, very good, but leveraging basically based on the SMI. Right now it only supports Istio because Istio is the only one that supports fault injection.
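
[Illustrative sketch, not part of the conversation: fault injection at the proxy level means the wrapper in front of the service adds delay or errors while the application code stays untouched. This Python wrapper only mimics that idea in-process. It is not Gloo Shot, which is a Go project, and all names and parameters are hypothetical.]

```python
import random
import time


def with_fault_injection(handler, delay_seconds=0.0, abort_percent=0.0,
                         rng=None, sleep=time.sleep):
    """Wrap a request handler the way a sidecar proxy fronts a service."""
    rng = rng or random.Random()

    def proxied(request):
        if delay_seconds > 0:
            sleep(delay_seconds)  # injected latency
        if rng.random() * 100 < abort_percent:
            # Injected failure: the application is never called.
            return {"status": 503, "body": "fault injected"}
        return handler(request)  # application code is unchanged

    return proxied


def app(request):
    """The service itself; it contains no chaos-testing logic."""
    return {"status": 200, "body": f"hello {request['user']}"}


flaky_app = with_fault_injection(app, delay_seconds=0.01, abort_percent=50)
print(flaky_app({"user": "craig"})["status"])
```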

And the coolest one, in my opinion, that we're doing, it's called Loop. And what it's doing is it's solving the problem of recording bugs in production and actually being able to replay that after it outside production with all the data that was with it.

So how does it work? We're actually working on it with the Lyft guys. What we're doing is that every time that we actually, a request coming, we're saving all the data, basically recording all the data, headers and body. And then, in the end of the request, if everything good, we're just tossing it so we don't overload the network.

But if something is wrong anywhere, we're basically saving it. We're calling it a loop. And then you can basically just add loops forever. And then you can do replay. And the replay just basically go to spin it up, inject all the information that's coming from the database to everything that you have in production. And you can attach debugger and just basically replay that.
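
[Illustrative sketch, not part of the conversation: the record-then-discard behaviour described for Loop can be pictured as buffering each request and keeping the record only when the request fails. The class, its methods, and the "status 500 or above counts as failure" rule are assumptions for the example, not Loop's implementation.]

```python
class FailureRecorder:
    """Hypothetical sketch: buffer each request, keep it only if it fails."""

    def __init__(self):
        self.loops = []  # saved failing transactions

    def handle(self, request, handler):
        # Record headers and body up front, before the outcome is known.
        record = {
            "headers": dict(request.get("headers", {})),
            "body": request.get("body"),
        }
        try:
            response = handler(request)
        except Exception as exc:
            record["error"] = repr(exc)
            self.loops.append(record)
            raise
        if response.get("status", 200) >= 500:
            record["response"] = response
            self.loops.append(record)
        # On success the record is simply dropped when this method returns.
        return response

    def replay(self, index, handler):
        """Re-send a saved request, for example to a debug build of the service."""
        saved = self.loops[index]
        return handler({"headers": saved["headers"], "body": saved["body"]})


recorder = FailureRecorder()
recorder.handle({"headers": {"x-user": "1"}, "body": "ok"}, lambda r: {"status": 200})
recorder.handle({"headers": {"x-user": "2"}, "body": "boom"}, lambda r: {"status": 500})
print(len(recorder.loops))
```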

ADAM GLICK: That's really interesting. How far back does it go? So does it just take a look at a single back-and-forth transaction, or will it actually have this sense of, like in databases, where you can track an entire transaction? If you have a microservices architecture, you may be communicating with a dozen different services, with one failure that's actually making it fail. Does it have the ability to think about the entire transaction throughout the system, or just that one connection between server and client?

IDIT LEVINE: No. So we actually want the transaction, because maybe you have a cascading error. So we wanted to get all of this. So basically, it's all the transaction. This is one use case. Another use case-- this is where Lyft wants it. So Lyft actually has their own system internally for support. And if someone is actually going to say, hey, my Lyft is not here, they want to actually be able to click on a button and record everything that is related to this user in the stack.

And then, basically, we can record everything and let the engineers understand what's happening there. So I think it's pretty awesome. And it's solving the problem with OpenTracing, because OpenTracing is basically-- it's the ability to have all the logs, but then it's a lot of data on the network. So what do you do? Use samples. The biggest thing is samples. And you're only taking the header because it's too expensive. But do we really care about the ones that are successful? No. So what if we only go and detect the ones that fail?

And then, we can actually save all of it, give you all the information that you can. And because of the sidecar way with Istio and all those meshes, we can actually record also when it's going to the database, when it's going to a stream, when it's going to everywhere. And then we can get this snapshot of all the things that you had in production where it's happening. So I think it's pretty cool.

CRAIG BOX: Kubernetes has the challenge that it has to describe all the things that are possible in all of the API objects. But then, it also has to be easy to access. And so there's been a lot of projects that exist to try and make it easier to describe all of these YAML files. Istio comes along in a very similar way and says, like, here are all of the things that we know you might need to customize.

On the whole, people only need to do one or two of them, but we need the things to be available. And so then you get platforms that build on top of them to make it easy. So Knative is a great example. You can come along and say, hey, I've just got some code or I've got a function and a container, and I'm going to run that on the platform. You have integrated Gloo into Knative. Can you tell us a little bit about that?

IDIT LEVINE: Yeah. So Knative is a great product, and we're working very, very closely with the Google guys. And we supported from the get-go any function-- because Gloo is all about the migration, so we really care about serverless. What happened is that when Knative actually got out, it was depending only on Istio. So you had to install Istio in order to actually use Knative. And the Google guys got pushback, basically, from the community saying, why? Why do we need all those CRDs when all we're basically using is two CRDs? Can we make it simple?

And there, Google basically needed to evaluate if they can give an alternative. And they explored all those API gateway ingresses, and they chose us, which is really cool for us. And we basically seamlessly integrated that.

So, basically, today the only official support right now from Knative is that you either can go with Istio, which if you're already using it, it makes a lot of sense. But if you only use the Knative functionality and you're not interested in Istio, you should just use Gloo, and it is seriously one command line: glooctl install knative. It installs Knative for you with Gloo already baked inside. It's just working really seamlessly.

CRAIG BOX: Are there cases where a function or a program running inside Knative needs to talk to something else as a back end where you'd need a full mesh between services rather than just the API gateway?

IDIT LEVINE: You don't really, because the whole point with Knative is that it's basically coming from the outside. It's a function.

CRAIG BOX: So if a Function A wants to call a Function B, it goes all the way back out through the gateway and back in again.

IDIT LEVINE: Yeah, and therefore it's not a huge extension. The only thing that you are going to get better if you're using something like Istio is, for instance, the telemetry. But that's something that we're going to fix really soon because-- so Knative has three projects. So the serving is the one that we integrated with, but there is also the eventing.

And the eventing specifically had a very hard dependency on Istio. So basically, that's not something that we did. But now, the Google guys are actually working to abstract it, and we're helping them. And then, basically, we're going to-- and they asked us to basically add Gloo there. So we'll add that, and then, basically, even that's solved. So at the end of the day, it's Envoy.

ADAM GLICK: What's next for Solo.io?

IDIT LEVINE: So we have quite a lot of projects coming up. I will give you an example of something that we want to attack. So what is our job? Our job is to make service mesh easy to use. So let me give you an example of something that is really hard today for users. OK, so you install Istio. That's not that hard. The problem actually starts when you're trying to configure it.

Now, that's something that you have to start from zero today. You're installing Istio. You need to configure from scratch. The question is, do we really have to do that? Think about what's happening with Docker. Today, when you're taking a Docker container, do you start from zero? No. You're doing it from Ubuntu or from a current--

CRAIG BOX: You're copy and pasting from Stack Overflow.

IDIT LEVINE: Exactly. And how do they do it? They do it with layers. So what we did, and it's coming up very soon, is on this Hub that we did, we're going to basically have an ability for sharing configuration of meshes because we want community. We think we can go faster with community and want people to actually contribute that.

And then, basically, what we did, we basically did an internal implementation that basically layers. So we use kustomize plus Helm and basically layer the configuration for service mesh. And then you can say, well, everybody is using mTLS. So why should I start from the beginning? I can just take his implementation, and then tweak it. And then you can actually start from 80% and just tweak it.
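As a rough illustration of the layering idea, the Python sketch below merges a small overlay on top of a shared base configuration, so a user starts from someone else's settings and changes only what differs. It is a generic deep merge, not how kustomize, Helm, or Solo.io's tooling work internally, and the configuration keys shown are hypothetical rather than real Istio fields.

```python
import copy
from typing import Any, Dict


def layer(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config: `base` with `overlay` applied on top, recursively."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = layer(merged[key], value)
        else:
            # Otherwise the overlay value replaces the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical shared configuration published by someone in the community.
shared = {
    "mtls": {"mode": "STRICT"},
    "retries": {"attempts": 3, "per_try_timeout": "2s"},
}

# The only thing this user wants to change.
mine = {"retries": {"attempts": 5}}

if __name__ == "__main__":
    print(layer(shared, mine))
```

Here the overlay is one line long, while the mTLS and timeout settings are inherited unchanged from the shared layer, which is the "start from 80% and just tweak it" experience described above.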

So in our app, you can install mesh, extend it, configure it. Done. Which is pretty cool. So you can go much faster. So I think that's something that we should fix, and we're doing it. So it will be coming very soon. I think that another problem that has to be solved, and actually probably a problem that I heard the most when I was [at KubeCon]-- but actually, we already started working on it, so it's good for us-- is Google built it, all your engineers are amazing.

Those guys know how to write applications and actually understand mesh and configure it. That's not the scenario in a big organization. Usually, they have operations, and they have app engineering. And do we really want the app? Now, before that, it wasn't too hard because whatever the configuration of the retries was, and whatever it depends on, was baked inside the application. But now service mesh solves it. It actually took it out of the application. Now it's going to the sidecar.

So the question is, I, as an application owner, really don't want to understand mesh. It's too complicated. How can I communicate right now with my operations team to tell them what I need to be configured? And this interface is actually not defined yet, and that's what we're working on. And I think that the end result will be plug-ins for CI/CD. That's where we believe that it will end. But that's what we're working on right now, and I think that's what may be useful.

CRAIG BOX: Autotrader have built a delivery platform where their product people or their engineers basically say, here's my container, and here are some lines of configuration. And some of that might be, this is what I want my retry policy to be. Some of them might be Istio configuration. Some of them might be Kubernetes configuration. Do you think that there will be more platforms like that that are describing not just the way something's to be run but the way the network is to be configured alongside it?

IDIT LEVINE: Yeah, definitely. Definitely. I have no doubt that that's what's going to happen. It makes sense. Why should that be treated differently than anything else that you're putting there, compute and storage? So it should be defined together. And as I said, this tool that we built is this kustomize template for this. I think there's something that is simpler to do. So that's the direction we're going with.

CRAIG BOX: What's the story with the logo?

IDIT LEVINE: Actually, it's nothing interesting. I needed a logo. We did just like a 99designs competition. It was horrible. So I basically said, well, I need to go specifically after a designer that I know that I like their logo. So I basically figured out who is the Docker designer because I really like the whale. Reached out to him. He's somewhere in Indonesia. And I basically said to him, I need a logo. He did the owl, and he did also Gloo. Also everything-- all our products, there is like a line of-- all of them look the same with this character, so it's all his doing. And yeah, he's good. He's really good.

CRAIG BOX: All right. Thank you so much for joining us today.

IDIT LEVINE: Thank you so much for having me.

ADAM GLICK: You can find Idit on Twitter @Idit_Levine.

[MUSIC PLAYING]

Thanks for listening. As always, if you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at kubernetespodcast@google.com.

CRAIG BOX: You can also leave your feedback in the form of a review from your favorite podcast client. Then, go check out our website at kubernetespodcast.com, where you will find transcripts and show notes. Until next time, take care.

ADAM GLICK: Catch you next week.

[MUSIC PLAYING]
