#153 July 9, 2021
Debugging Kubernetes often involves correlating what happened just before something went bad. Itiel Shwartz is a co-founder of Komodor, a startup that builds a platform to help with exactly that. We talk Hebrew names, Hungarian dogs and German car crashes.
CRAIG BOX: Hi, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Craig Box.
CRAIG BOX: Since Adam left, we've been running with guest hosts every week, but there's going to be a little bit of a busy schedule over the next few weeks. So I'm delighted to invite our longtime producer, Jimmy Moore, to the show. Welcome, Jimmy.
JIMMY MOORE: Hey, Craig, thanks. Long time listener, first time caller.
CRAIG BOX: Jimmy’s going to help out for a few weeks. And seeing as Conan O'Brien's just gone off the air, I thought that he might make a great Andy Richter, sitting as a sidekick and throwing in the occasional joke, laughing in all the right places.
JIMMY MOORE: [LAUGHS] Absolutely.
CRAIG BOX: I've been enjoying Conan's podcast. It's called "Conan O'Brien Needs a Friend." I love podcasts very similar to this format, where there's just someone that you're interested in and you want to talk to and you just have a chat with them.
JIMMY MOORE: Yeah. Actually, I've been listening to "Revisionist History" by Malcolm Gladwell lately. He peels back the layers and tells you what really happened behind the history stories we heard back in school. It's absolutely riveting, and he has just enough snark to keep me laughing the whole time.
CRAIG BOX: Now, Adam was great at recommendations. He'd almost always have a game or a TV show that he'd looked at over the course of the last week, and he pings me with them every now and then. And on that late-night theme, I remember a story that Johnny Carson (look him up, kids) would send in jokes to David Letterman after he retired. Every now and then, Letterman would read out one of Johnny's jokes, and apparently Johnny was very happy with that.
So I'm going to share Adam's recommendation for the week, which is a TV show called "Mythic Quest." Apparently it's based on RPG gaming, and so it probably hits both of those sweet spots very well for a large number of our listeners.
JIMMY MOORE: Sounds like targeted content.
CRAIG BOX: All right. Well, Jimmy, it's a pleasure to have you here, and we look forward to speaking to you for the next few weeks.
JIMMY MOORE: Until you find a better option, really.
CRAIG BOX: Well, we're going to teach you a lot about Kubernetes.
JIMMY MOORE: Yeah.
CRAIG BOX: So what is your favorite part of the Kubernetes ecosystem?
JIMMY MOORE: Honestly, my favorite part so far is sharding and resharding. I'm not sure what it means, but I'm a big fan.
CRAIG BOX: I think the order is very important. If you reshard before you shard, anything could happen.
JIMMY MOORE: It's a disaster.
CRAIG BOX: It could well be indeed.
JIMMY MOORE: Yeah. I remember that from a few episodes ago. That's right.
CRAIG BOX: All right, well let's get to the news.
CRAIG BOX: We don't often cover geopolitics on this show, but this week brings news that Russian military intelligence is using Kubernetes to run brute force attacks against government and private sector targets. In the joint cybersecurity advisory published by US and UK intelligence agencies, claims are made that the GRU are targeting Microsoft Office 365 and routing traffic through The Onion Router, or Tor, as well as commercial VPN services. Nothing specific to Kubernetes is implied, other than its use as a means to run the distributed attack. But the report very much wants you to know it was being used. And so we shall, proving it to anyone listening from Russia. Listening to this podcast, that is.
JIMMY MOORE: It's possible the team at the GRU are spending more than ever before to run their reported Kubernetes cluster. The CNCF and its sister org, the FinOps Foundation, have published the results of a micro survey, which says that over the past year, 68% of respondents reported that their Kubernetes costs increased. Among those with increased spend, half saw it jump more than 20% during the year. 10% of respondents spend over $1 million a month on Kubernetes. The vast majority don't know what their estate is going to cost to run, with 44% relying only on estimates and 24% not monitoring spending at all.
CRAIG BOX: Some more survey results. Canonical ran a Kubernetes usage survey. And once you get past the what is your favorite animal questions, you will see the most common reason for adopting Kubernetes is improved maintenance, monitoring, and automation. And the biggest blocker for adoption is lack of in house skills or limited person power.
JIMMY MOORE: Only 10% of respondents in that survey were running just one cluster. So it's timely that this month's CNCF End User Technology Radar covers multicluster management. The good news is that almost all the tools looked at were ready for immediate adoption, while the bad news is that you've got a lot of choice as to how you manage your clusters and workloads.
CRAIG BOX: After five years of release candidates, the runC project has released version 1.0. A small number of bug fixes and the promise of no breaking changes cap off a project built by over 400 contributors. runC is the CLI tool spun out of Docker when the Open Container Initiative was formed, and it is used by high-level runtimes like containerd.
JIMMY MOORE: Linkerd creator Buoyant announced the open beta of Buoyant Cloud. This service was first announced as Dive in November 2018, and two and a half years later it has finally reached the public. A free tier is available for all Linkerd users, with paid tiers at $100 or $1,000 per month, or "contact us."
CRAIG BOX: Finally, developer Xabier Larrakoetxea has built an SLO generator called Sloth. Sloth generates SLOs for Prometheus based on a Kubernetes manifest that scales and is easy to understand and maintain. It can run as a CLI application or a Kubernetes operator, and it supports the recently released OpenSLO spec.
JIMMY MOORE: And that's the news.
CRAIG BOX: Itiel Shwartz is the CTO and co-founder of Komodor with a K, a startup building a troubleshooting platform for Kubernetes. Previously, Itiel worked at eBay and Forter, and was the first developer at Rookout. Welcome to the show, Itiel.
ITIEL SHWARTZ: Hey, Craig.
CRAIG BOX: Now, I'm told that Itiel is not a common name, even in Israel. And because it sounds like a term in the industry, I understand that you had a bit of fun going and talking to a company that actually works in the ETL space?
ITIEL SHWARTZ: Yeah, yeah. Itiel is not a common name. For anyone who doesn't know, ETL means extract, transform, and load.
CRAIG BOX: Was that what your mother was thinking about when she named you?
ITIEL SHWARTZ: No, no. Everyone asks me, are both your parents developers or something like that? And sadly, no. Itiel in Hebrew means "God is with me," so that's where it comes from. But as I became a software engineer, I became aware of the fact that there is such a thing as an ETL engineer.
So I went to Alooma. Back then it was an Israeli startup doing ETL as a service. And they were so excited that my name was Itiel that the CTO there offered me a job on the spot. He said, we must have you. You'll be our lead ETL engineer. You are going to be ETL. It's going to be amazing.
I didn't take him up on his offer, and later they were acquired by Google. So who knows where I could have ended up. But yeah, it's also very confusing when you work in a company that has a lot of ETL engineers. I'm always turning around, like, did you call me? No, no, I had a problem with the ETL process or something like that.
CRAIG BOX: Yes, I believe that's called nominative determinism.
ITIEL SHWARTZ: Yeah, yeah, I also think so.
CRAIG BOX: Do you have children?
ITIEL SHWARTZ: I have one daughter. I'm married plus one.
CRAIG BOX: Is her name SQL?
ITIEL SHWARTZ: Her name is Ella, which I think will be better both for English speakers and for Israelis. So no more cloud names for me. It's too confusing.
CRAIG BOX: Before founding Komodor, you were the first developer at Rookout. Rookout is a company in a similar space, at a very broad level. What's different about what they did from what you do now?
ITIEL SHWARTZ: I was the first developer at Rookout. I joined the company basically because I really enjoyed troubleshooting. I'm not sure if it's a fetish or something like that, but I love solving issues and problems. Rookout, in a sentence, allows developers to put non-breaking breakpoints in their code, basically to debug their code in production.
CRAIG BOX: So just points then.
ITIEL SHWARTZ: Yeah. It's like a debugger, but for production. A very neat product. But a lot of the inspiration that I got to found Komodor was from my time at Rookout. Not only was I the first developer, but I also did sales engineering for the first six months of the company. They needed someone technical who could also speak to people and solve their issues, and I'm quite technical, so I was a natural fit.
And I went from one company to the next, dozens, hundreds of different companies. And I saw how hard it is for them to troubleshoot, and I was shocked. The first question that I ask them is: you're troubleshooting, you have an issue, how do you know what code is currently running in production? And I always thought it's the most obvious question. I don't know, you have your code. You moved it to production. Everyone knows what code is currently in production. Very obvious for me.
But it turns out that for a lot of the organizations I speak with, back then and even now, they don't really know what is currently up in production. Usually the answer is, I don't know, maybe I can go check Jenkins for the last pipeline or something like that. Or better yet, I'm going to ask the DevOps team in Slack. This is the best way to get this data. And for me, as someone who loves to troubleshoot, it's the number one question. What code is currently running? When did it change, and who changed it? That's my first question every time I have an issue, and I think that for most people it's quite common.
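The question Itiel keeps coming back to (what code is running, when did it change, and who changed it) amounts to filtering a timeline of change events down to the window just before an incident. The sketch below assumes a hypothetical `ChangeEvent` record and a one-hour default lookback; it is illustrative only and is not how Komodor, Jenkins, or any other tool actually stores this data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    # Hypothetical record of one change; real data would come from CI/CD, Git, etc.
    service: str
    version: str
    author: str
    at: datetime


def changes_before(events, incident_at, lookback=timedelta(hours=1)):
    """Return changes made in [incident_at - lookback, incident_at], newest first."""
    start = incident_at - lookback
    window = [e for e in events if start <= e.at <= incident_at]
    return sorted(window, key=lambda e: e.at, reverse=True)


if __name__ == "__main__":
    t0 = datetime(2021, 7, 9, 12, 0)
    events = [
        ChangeEvent("checkout", "v41", "alice", t0 - timedelta(hours=3)),
        ChangeEvent("checkout", "v42", "bob", t0 - timedelta(minutes=20)),
        ChangeEvent("payments", "v7", "carol", t0 - timedelta(minutes=5)),
    ]
    for e in changes_before(events, incident_at=t0):
        print(f"{e.at:%H:%M} {e.service} -> {e.version} by {e.author}")
```

The hard part in practice is not this filter but getting every source of change (deploys, config, feature flags, infrastructure) into one timeline in the first place.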
And after I saw dozens of different companies, small companies and really big companies, that don't know how to answer this question, I felt like there was an issue here. And I also started thinking about why, as a developer, as an ops person, I didn't really feel the pain. It was strange to me, because I thought to myself, how can it be that I never felt this pain? And the funny answer is that I wrote a lot of automation to make this data very visible.
At eBay, there's a whole ops team, so I didn't really write the ops part of eBay. At Forter, I wrote the part that allows you to understand what changed and when it changed. And at Rookout, too, one of the first things I did was build a very comprehensive CI/CD solution. There it was very obvious what the previous version was, what the new version is, and which Git commits, pull requests, or infra changes happened between those two versions.
And for me it was obvious. It was almost the first thing I did at Rookout as the first developer, so we could move faster. And it's very easy to get started if you don't have a legacy system. So I think I went a little bit off topic here. But this is the answer about Rookout, and a little bit of the origin story of Ben and me founding Komodor.
CRAIG BOX: Yes, your co-founder is Ben Ofiri. How did you meet him?
ITIEL SHWARTZ: Ben and I studied together at university. We were in the same study group. I studied computer science and psychology; Ben studied computer science and economics. And we really wanted to be the best. We hung out in the same study group, and we really tried to be good at computer science. And Tel Aviv University had at the time a very naive startup incubator. I don't know, like a program. To be honest, it was quite bad.
But Ben and I both signed up and we were on the same team. We had, I don't know if it was a terrible idea, but we were super naive at the time. It was e-learning for students. We didn't go public with the startup, but it did show us that we love working together. And it was quite obvious for both Ben and me that we were going to do something together in the future. During university, Ben started working for Google, where he was a software developer and later became a product manager. He did a lot of infrastructure work at Google, so a lot of the stuff came from his experience there.
I started working for eBay as a software engineer, a back-end engineer. Then I joined Forter, which is another Israeli startup. There I was responsible for the main flow. Forter, in a sentence, is an e-commerce fraud detection service. The interesting part about the company, and it's a great company, is that every time we were down, it cost us and our customers a lot of money. That means you really don't want to be down. And if you have downtime or something like that, you do a very thorough post-mortem to understand why it happened and how you can prevent it next time. So I think that's where my love of troubleshooting started.
Forter became quite big, and I joined Rookout as the first developer because I wanted to experience startup life firsthand before starting my own company. It was an amazing experience, building a product from scratch. As I said, I was the solutions engineer for the first six months. So going into companies, seeing them use the product, seeing them get value and debug faster using Rookout was super fun. And I learned a lot during my two years there.
But after two years, Ben and I felt like it was our time to shine and to join forces again. It was quite obvious for us that we were going to do something in the DevOps space, the troubleshooting space. The pain point was obvious to us: more and more companies are moving to Kubernetes. They are taking their monolith and breaking it up. And at first everything looks amazing, but there's always day two, where your app is crashing in production and you have no idea why it crashed or what changed. Did one of your dependencies just change? Maybe a feature flag was just turned on, a database migration, a cloud problem. There are so many different things. A monolith has a lot of downsides, but when you have a problem with a monolithic application, you just look at the monolith. It's very complex, but you are looking at only one specific pipeline, one specific application.
CRAIG BOX: It all used to be so easy.
ITIEL SHWARTZ: With the monoliths at eBay and Rookout, it wasn't easy, but it was a different set of problems.
CRAIG BOX: It's knowable, perhaps. You can see the state. You can keep it in your head.
ITIEL SHWARTZ: Exactly. It's knowable. And also companies have already spent years building monitoring tools for their monolith application. Things were written in blood, sweat, and tears, making sure the application is alive and we know how to track it. And then someone in the team, in the organization, said we should move to Kubernetes and let's take this monolith application and transform it into hundreds of different microservices.
Everything will be faster. We'll be agile. Every team will have full ownership. So things sound really good when you are at the first step of your Kubernetes migration project. This really looks good. Everything that bothered you about your last application is going to be resolved, like autoscaling, auto-remediation, very simple CI/CD, because all you need is a YAML. So everything sounds amazing.
And a lot of your listeners who are already running on top of Kubernetes already know it: you need a level of experience with Kubernetes to make sure everything really works. Not only the hello world of Kubernetes, but making sure a production-grade application is running on Kubernetes. You need to invest time and build experience, basically writing the best practices and understanding what the best tools are.
And I see a lot of companies that reach out to Komodor for help with the migration part, or after the migration, when they're starting to suffer from a slowdown in R&D. Instead of going faster, they're on a slope where everything is harder. I don't know how to troubleshoot. I don't know where to look. There are different teams. One team is breaking the other team, and so on.
CRAIG BOX: So you've got a problem. You've got obviously something that people are suffering with in the real world. You've got a co-founder. You and Ben sound like you've got a little bit of a similar skillset, in that you both did computer science and something else at university. He did a bit of work on product management. You did work in solutions engineering and so on. So in that situation, how do you decide who's going to be CEO and who's going to be CTO? There's a picture of the Komodor team on the web, and he's a foot taller than everybody else. Is that why he's in charge?
ITIEL SHWARTZ: Yeah, Ben looks a lot better than me. To be honest, it's a podcast, so you can't really know. But he's a lot taller than me. I'm not that short, but he's tall.
CRAIG BOX: You're sitting down in this photo, in fairness. So it's hard to make a real comparison.
ITIEL SHWARTZ: No, it's good. It's good that you can't make a real comparison. It's good for me. We didn't really need to decide. I love Kubernetes more than Ben does. Ben was at Google, which is the home base of Kubernetes, but he worked on his own things. And I really like the Kubernetes ecosystem, and I think I love technology more than Ben does. He was a software engineer at Google; it's not like he's not a technical person. But it was quite obvious from day one that I was going to be the CTO and in charge of the technology, and Ben was going to be in charge of bringing in the customers.
A lot of the product is done together. I lead the product, but Ben is with me. The product is where both of us meet on a daily basis. But other than that, it really works well. Even though both of us are software engineers by training and in our past lives, I was a solutions engineer for a while and did a lot of sales and even a lot of marketing, and Ben was a product manager at Google, so we bring to the table a lot of non-technical experience that was super valuable for the first six months of the company.
CRAIG BOX: Have you heard of a guy called Jack Tramiel?
ITIEL SHWARTZ: Not at first. He was the guy who founded Commodore, the computer company. And we didn't name Komodor after the computer. A lot of people ask us, did you really enjoy playing with it? And we're a bit young, so the answer is not really. But we love the Kubernetes ecosystem, and everything is nautical: Kubernetes, Helm, and so on. And a commodore is like an admiral, a naval officer. So we thought it's quite a cool name.
Other than that, Komodor sounds similar to the computer, which is nice because a lot of people have really good associations with it. And there is also the Hungarian dog breed called the Komondor. I'm not sure how to pronounce it. Super cute dog. If you don't know it, look it up. How would you describe him, Craig? Do you know the dog?
CRAIG BOX: Yeah, of course. When one researches the show, you type Komodor into Google image search and all you see is pictures of these lovely dogs. It's like a big, shaggy sheep dog. I know it's not this exact breed, but I went to a fair once here in the UK where they had a contest for the person who looks the most like their dog. And there was a guy with big black dreadlocks and a Puli, which is a Hungarian dog, a little smaller than the Komondor, but the resemblance was astounding. You absolutely have to check out this picture.
ITIEL SHWARTZ: Ah. He's really similar.
CRAIG BOX: Yes. There is the man who looks like the dog. You can find it in the show notes.
ITIEL SHWARTZ: Yeah, yeah, yeah, I like it. Don't miss it. I really like the dog. I wanted him to be the mascot of Komodor. He was our GitHub icon for a while, but he has too much detail to print well, so we switched to a different logo. But I really like the dog. He was also my Slack icon for a while.
CRAIG BOX: Jack Tramiel led a very interesting life, for anyone who's interested in computing history, as I am. My first computer was a Commodore: a VIC-20 that my father bought probably around the same time I was born. Tramiel was in the army, and he started a company, and he wanted to name the company something military-related, something with an implication of seniority. He said that general was taken. There was General Motors and General this and that; there are a lot of generals. And there was already a company, I think, named Admiral.
And so he says that he was in a car in Germany and almost had a car accident. The car had to brake really hard behind the car in front, and the car in front was an Opel Commodore. He saw that and said, right, that's the name for my company. That was back in the '50s, as well. It started as a typewriter repair company, so it was a very long time until we got to the Commodore that you might recognize and that I, for one, have very fond memories of.
ITIEL SHWARTZ: Yeah. I didn't know the story behind the name Commodore. So good to know.
CRAIG BOX: Commodore with a C, of course.
ITIEL SHWARTZ: Yeah. To be honest, we thought about whether to be Komodor with a C or a K. Mainly because of Kubernetes, we went with the K. The correct spelling is with a C.
CRAIG BOX: You mentioned before that you had trouble troubleshooting things yourself, so you built out tools, and then obviously you want to make those available. And you're definitely not the first people we've had on this show who have done that: the team from Pixie Labs and the team from Okteto, for example. It's even true of Docker to some degree. They were building out a platform, and they found the tooling they built was more interesting to themselves, and maybe also to the public, than the platform itself. Do you find that that's true not only for yourselves but for the people you employ?
ITIEL SHWARTZ: Yeah. We employ people who really like Kubernetes, and almost all of them have a lot of troubleshooting experience. So when we are thinking about developing a new feature or a new capability, we just take the team and start asking each one: would you have found that helpful at your previous company? Would it be helpful for you if we do A, B, or C? So having a team that is very passionate about both Kubernetes and troubleshooting has indeed made our life easier, because we have our own beta testers out of the box, even without talking with customers.
CRAIG BOX: Let's dig in, then, to what it takes to troubleshoot Kubernetes. Kubernetes doesn't really have a huge amount of information about the things that are running on it. You can label workloads, and that's really about it. And you can't really infer dependencies; nothing records the dependencies between services. If you want to know what's calling what, you start getting into service mesh and Layer 7, for example. How much is in Kubernetes that you can read from, and how much do you have to infer or ask users to provide themselves?
ITIEL SHWARTZ: That's one we struggled with quite a lot at first: trying to understand how much data we can collect from Kubernetes and how much data we need from our customers. Kubernetes has a lot of metadata on the things that are running: when the last change was, what the health status of each of its pods is. But that's pretty much it. I will say that the interesting part about Kubernetes is that there are a lot of different Kubernetes resources. For example, let's say you have a problem with your application. You're not getting any requests, and you have a simple deployment. What a lot of people do is check if their app is running.
Let's say it is running, there are no errors, and the health checks look great. Then someone who doesn't have a lot of experience, maybe a developer, will just call the DevOps team and tell them, my app is not getting any traffic, please fix the issue. They're not thinking about the four or five resources that Kubernetes uses to route traffic to their pod. You have a Service, you have Endpoints, you have a load balancer, maybe an external load balancer. All of this just to route the traffic from the outside world into your application.
What Komodor does is know all of these things before you tell us. We know that you have a Service. We know which pods are behind its endpoints. We know the load balancer. So we know a lot about the topology of your system without you needing to tell us anything. And a lot of people are quite surprised when they first integrate with Komodor, because they get quite a lot of value out of the box, five minutes after installation, just because we really know the ins and outs of Kubernetes.
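The "my app gets no traffic" walk-through can be sketched as following the chain from a Service to its endpoints to the pods behind them. This is a minimal model, not Komodor's logic or the Kubernetes client API: the `Pod` and `Service` classes are hypothetical stand-ins, and it ignores namespaces, ports, and the load balancer layer. It relies on two real Kubernetes behaviors: a Service selects pods whose labels contain every key/value in its selector, and only Ready pods receive traffic through its endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict
    ready: bool  # stands in for the pod's Ready condition


@dataclass
class Service:
    name: str
    selector: dict = field(default_factory=dict)


def selects(service: Service, pod: Pod) -> bool:
    # A selector matches when every key/value pair appears in the pod's labels.
    return all(pod.labels.get(key) == value for key, value in service.selector.items())


def explain_no_traffic(service: Service, pods: list) -> str:
    """Walk Service -> Endpoints -> pods and say where traffic stops."""
    if not service.selector:
        return f"{service.name}: no selector, so Endpoints are not managed automatically"
    matched = [p for p in pods if selects(service, p)]
    if not matched:
        return f"{service.name}: selector {service.selector} matches no pods"
    ready = [p.name for p in matched if p.ready]
    if not ready:
        return f"{service.name}: {len(matched)} pod(s) match, but none are Ready, so there are no ready endpoints"
    return f"{service.name}: endpoints -> {', '.join(ready)}"


if __name__ == "__main__":
    pods = [Pod("web-1", {"app": "web"}, ready=True), Pod("web-2", {"app": "web"}, ready=False)]
    print(explain_no_traffic(Service("web", {"app": "web-frontend"}), pods))  # label typo
    print(explain_no_traffic(Service("web", {"app": "web"}), pods))
```

A mistyped selector label is a classic case: the pods look healthy and the Service exists, but nothing connects them.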
Other than that, the very cool thing about Kubernetes is that it made other tools more standardized. If you have Datadog or Prometheus monitoring your Kubernetes application, there are standard conventions that almost everyone is using. For example, for Datadog, you have DD_ENV and DD_SERVICE, which represent your application in the APM. Prometheus and Grafana have generic names that are taken from the deployment name and from the service.
So even when we integrate with other tools, we can bring a lot of value out of the box, because we already know your application is running on top of Kubernetes. We know this monitoring system is monitoring Kubernetes, and we can give you, for example, like you said, a dependency map. We take the dependency map from Istio or Datadog or Jaeger or Zipkin; it doesn't really matter to us where the service map lives. We take the data and we basically enrich the Komodor brain with all of these new connections.
So to answer your question, we can bring a lot of value out of the box just by reading the Kubernetes data. But because Kubernetes has made other tools standard, we can also provide out-of-the-box value for other tools without our users needing to configure things. Because users really don't like labeling things and adding a lot of annotations. Sometimes they do, and we use that. But we prefer working hard so our users don't need to.
CRAIG BOX: Does the system have to work first for you to realize what changed to break it? I ask because Komodor sort of presents itself as: something's gone wrong, the pagers have gone off, go to Komodor and see what's changed. That sort of implies that it was working and now it is not. If I'm building something out and I'm not sure if it will work, can I still use Komodor to debug it?
ITIEL SHWARTZ: Yeah. You can use Komodor to debug it. Let's say you are in the middle of your migration project, because everyone is moving to Kubernetes. It's obvious to you that things are not really working at the moment. But for teams like that, what we see a lot of the time is many different people pushing to production and changing the system really rapidly. And then your system is not working, and this is OK. It's not production yet. It's still staging or dev or whatever. But you have no idea how it changed over the last day. And then you need to go and ask, or go to Jenkins and GitHub and do those correlations in your head.
So I will say that we shine when you have a big issue in production, when the system is down and you go to Komodor to find everything that changed and to see how other components changed: not just your service, but its surroundings. This is where we shine. But we also bring a lot of value to companies that are in the middle of the migration to Kubernetes, basically because we give them a one-stop shop for everything related to Kubernetes. They don't need to learn another UI, another UX.
And we are not just a Kubernetes dashboard. We are smarter. We have a lot of logic about what is unhealthy. What is a deployment? When did the deployment start? When did it finish? When did it fail? So even people who are not really Kubernetes experts can go into Komodor and solve issues really fast, without needing the ops team or the Kubernetes expert in the company that everyone keeps hammering with questions. Why is my application not working? Why am I not getting any traffic? Why is my health check failing? And a lot of other questions.
CRAIG BOX: This is a way of looking at the state when there are lots of different people making changes to an environment. Another way of potentially understanding your environment is to only allow one person to make changes to it. And that is, in some regards, the promise of GitOps: you have one source of truth and an agent that just actuates all of those things. I'm not 100% sure that's as easy to buy into as I make it out here. But do I still need something like Komodor if I'm bought into the GitOps idea?
ITIEL SHWARTZ: Yeah, yeah, yeah. So I just wrote a blog post about GitOps. I think it should be published in the next week or so. I think GitOps has a nice promise. I like the promise. I like GitOps and I don't really like deploy scripts, so it sounds like a match made in heaven. But at the end of the day, Git simply doesn't represent what is happening in your production.
Let's say I just merged from staging to production. It's not a binary state, like it's working or not working. We are now in the middle of a transition. Maybe some of the pods are not working and some pods are working. Maybe the issue is not with your service that is managed with GitOps, but with a feature flag you just enabled in LaunchDarkly. And with GitOps, when you have a lot of different repositories, let's say I'm working in a multi-repo environment, when you have an issue, you start opening seven or eight or a dozen different repositories. Then you go to the production branch. Then you try to figure out what changed. And even then, you don't really know if the code was merged and is working successfully, or it was merged and the pod crashed because you got an ImagePullBackOff or something like that.
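The "not a binary state" point can be made concrete by comparing the image that Git says should be running with what each pod is actually doing. This is a minimal sketch over a hypothetical `PodStatus` record; in a real cluster the image and waiting reason would come from each pod's container status, and the desired image from the manifest on the production branch. It assumes single-container pods and ignores rollout strategy details.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodStatus:
    name: str
    image: str                     # image in the pod's container spec
    waiting_reason: Optional[str]  # e.g. "ImagePullBackOff"; None if the container is running


def rollout_report(desired_image: str, pods: list) -> dict:
    """Summarize how far the running pods are from the image declared in Git."""
    on_desired = [p for p in pods if p.image == desired_image]
    healthy = [p for p in on_desired if p.waiting_reason is None]
    stuck = Counter(p.waiting_reason for p in on_desired if p.waiting_reason)
    return {
        "healthy_on_desired": len(healthy),
        "stuck_on_desired": dict(stuck),
        "still_on_old_image": len(pods) - len(on_desired),
        # Converged only when every pod runs the desired image and none is waiting.
        "converged": bool(pods) and len(healthy) == len(pods),
    }


if __name__ == "__main__":
    pods = [
        PodStatus("api-1", "api:v2", None),
        PodStatus("api-2", "api:v2", "ImagePullBackOff"),
        PodStatus("api-3", "api:v1", None),
    ]
    print(rollout_report("api:v2", pods))
```

Git alone would report this service as "on v2"; the report shows one healthy new pod, one stuck pulling its image, and one still on the old version.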
So we do integrate with GitHub, obviously. We also integrate with Argo, which I really like; I think it's one of the most interesting open source projects for Kubernetes. Our customers who have Argo try to do GitOps as much as they can. But go and ask people who are using GitOps: is troubleshooting easier since you adopted GitOps? Do you now know what changed?
And the answer is no, mainly because of third parties, the complexity of the application's current state, and the number of GitOps repositories. Those are the main reasons. But overall, I love that everything is in Git. I just think that in order to be confident in your system, you need something else. And almost all of the big companies we are talking with also have Jenkins or Spinnaker or some other tool for CD. It's not a pure GitOps model.
CRAIG BOX: My favorite bug from the last week was a certificate transparency log that has invalid hashes because one bit got corrupted somewhere along the way. And you can put that down to cosmic rays or solar flares or something. But it goes to show that there are a lot of problems that occur outside a Kubernetes environment. Perhaps it's trying to connect to an internet site to get issued a certificate and that's down. Or even within the Kubernetes environment, perhaps the node has a full disk and so things can't operate. Can I use Komodor to debug those kinds of problems?
ITIEL SHWARTZ: Yeah, yeah. Node issues are like the number one feature request from our users, just adding them to Komodor. You never suspect the nodes until it is the nodes.
CRAIG BOX: You suspect DNS first.
ITIEL SHWARTZ: Yeah. Why should my node be down? And we heard it from a lot of different customers. The main problem is with issues that are outside of your Kubernetes, or even with nodes. Your node just got a new Kubernetes version because it was auto-updated by GKE or EKS. People don't suspect those areas. Also, a lot of developers who know how to do basic Kubernetes troubleshooting don't even do a 'kubectl get nodes'. It's not part of their workflow or their playbook. So we are now adding more and more infra-related capabilities into Komodor. And our goal is to make it very clear where to look once you have an issue.
For example, I'll give you a real scenario from one of our customers. Your app is currently unhealthy. Out of nine pods, three are down. OK, that is weird. But using Komodor, you can see that all three of those pods are running on the same node, and whether that node just changed or just got restarted, or things like that. So we bring our customers more value than just being a very good and sophisticated deploy dashboard. That is important, and there is very big value in that alone.
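The correlation in that scenario (three of nine pods down, all on one node) can be sketched in a few lines of Python. This is not Komodor's implementation; it is a minimal illustration assuming pods arrive as plain dicts mirroring a subset of the Kubernetes Pod object (`spec.nodeName`, `status.phase`, `status.containerStatuses[].ready`). The helper names `is_healthy`, `unhealthy_pods_by_node`, `suspect_nodes` and `make_pod`, and the "Running with every container ready" health rule, are hypothetical simplifications.

```python
from collections import Counter


def is_healthy(pod):
    """Simplified (hypothetical) rule: Running, and every container reports ready."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    containers = status.get("containerStatuses", [])
    return bool(containers) and all(c.get("ready", False) for c in containers)


def unhealthy_pods_by_node(pods):
    """Count unhealthy pods per node, keyed by spec.nodeName."""
    return Counter(
        pod.get("spec", {}).get("nodeName", "<unscheduled>")
        for pod in pods
        if not is_healthy(pod)
    )


def suspect_nodes(pods, threshold=1.0):
    """Return nodes hosting at least `threshold` (a fraction) of all unhealthy pods."""
    counts = unhealthy_pods_by_node(pods)
    total = sum(counts.values())
    if total == 0:
        return []
    return [node for node, n in counts.items() if n / total >= threshold]


def make_pod(node, ready):
    """Build a minimal, hypothetical pod dict; crash-looping pods often stay Running."""
    return {
        "spec": {"nodeName": node},
        "status": {"phase": "Running", "containerStatuses": [{"ready": ready}]},
    }


# Nine pods, three of them down, all three on the same node.
pods = (
    [make_pod("node-a", True) for _ in range(3)]
    + [make_pod("node-c", True) for _ in range(3)]
    + [make_pod("node-b", False) for _ in range(3)]
)

print(unhealthy_pods_by_node(pods))  # Counter({'node-b': 3})
print(suspect_nodes(pods))  # ['node-b']
```

With the default threshold of 1.0, only a node hosting every unhealthy pod is flagged; a lower threshold is a judgment call for noisier clusters.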
But the Komodor layer two is the ability to track things on the infra Kubernetes level, such as nodes, clusters, and versions. And now we are also adding dependencies, such as databases and queues, and trying to tackle those as well. Because like you said, a lot of the issues that affect your Kubernetes cluster originate from something else, such as a DB. SSL certificates, to be honest, are also one of the things that pop up quite a lot for users.
Again, mainly because those are the things that you don't really suspect, and it takes you hours and hours of troubleshooting before you say, maybe we should check the DNS. Maybe let's check the SSL certificate. Maybe someone just changed, I don't know, the control plane in EKS. So we want to be there for you and run all of those tests, all of those checks, for you, so you don't need to do it manually and have all of this experience and expertise. It's a question of time, because it takes a lot of time to troubleshoot, but it is also a question of experience.
For a novice developer, you can't really expect him to know all of these things, because most of the time he's writing code. And people told him that once he's in Kubernetes, he should maybe be responsible for part of the troubleshooting. So he's spending hours on issues that he didn't cause, and he doesn't really know how to troubleshoot them. So we see a lot of companies coming to Komodor not only to help DevOps troubleshoot, even though that is our main use case, but also to empower the developers. Mainly to give them the visibility, the knowledge, in some way the experience, of an expert developer, an expert DevOps.
And the Komodor platform is built to make troubleshooting easier, not only for the expert troubleshooters who know how to do everything, but also for the normal developer who is on call once every two weeks and is afraid of going on call. Or he knows he's going to run into a lot of problems, and the DevOps person is shouting at him because he's always waking them up in the middle of the night.
CRAIG BOX: Funny you should mention nodes there. I was trying to debug a problem in some third-party Kubernetes installation scripts recently, where I had to manually set some labels on nodes in order for the pods to schedule. And then I came back a few days later and nothing was running. It was like, oh, it turns out the auto-updater had run and updated all of my nodes, and of course the manual changes I made in my testing weren't reflected. So that is very much a thing that can happen. Is Komodor something that you install in your own cluster, or is there a SaaS component to it?
ITIEL SHWARTZ: There is a SaaS component. We try not to hammer the Kubernetes cluster. So we have a very slimmed-down agent that basically listens to the kube API and sends Komodor all of the data, just the metadata, no PII or anything like that. And we do all of the heavy lifting in our SaaS product. Basically, we build a model of your cluster. We understand the dependency map for your application between clusters, between clouds. Because we are Kubernetes native, we don't really care where you are running, so it's very easy to install us on hybrid solutions.
CRAIG BOX: Do you solve problems in Komodor using Komodor?
ITIEL SHWARTZ: Yeah, sure. Komodor is the first go-to tool at Komodor. The product is still evolving, so every time we have a post-mortem, we end it with: what can we put inside Komodor to help us prevent this next time, or mainly to solve it faster?
CRAIG BOX: That's a very convenient problem to have. Our software didn't detect this, and so that means we can fix it.
ITIEL SHWARTZ: Yeah, eating your own dog food is very satisfying, solving issues with your own platform. You really know it's working when you do it.
CRAIG BOX: You've just recently taken a $21 million funding round on top of $4 million of seed capital. I assume it's easy to get investment from VCs, or for the purpose of this, let's pretend it is, if you've got a good product. But I'm more interested in the fact that you've got angel investment from CTOs, CEOs, and co-founders of companies like GitHub, Atlassian, Snyk, and Aqua Security. How did those relationships come about?
ITIEL SHWARTZ: I will say most of them are friends of friends who heard about Komodor. When you do something that looks cool and solves a real issue, people start talking about you. So one person brings the next one because they are good friends. Like the Atlassian CTO, the GitHub CTO, and the VP that we got from Facebook: all of them, I think, are good friends. So one of them says, I saw Komodor, you should check them out. They are solving a real problem in a very interesting way.
And we are fortunate to have those people with us. Having them one phone call away has already solved a lot of issues and problems for us, mainly by taking their experience of selling a developer product or DevOps product and learning from both their mistakes and their successes. We found it super valuable. I'm really happy with the angel team that we assembled. I think each one is an amazing person with a ton of relevant experience.
CRAIG BOX: And aside from the capital that they put up, I assume that they've had to solve very similar problems being in that kind of developer tooling space.
ITIEL SHWARTZ: Yeah, yeah. The first question we ask is, how did you solve it internally? I think Atlassian has a very interesting internal system. I think they wrote about it somewhere, about how they troubleshoot issues at Atlassian. Super interesting. I had a whole session seeing their system and understanding the pain points that they try to solve. Very interesting. Facebook also has something very similar. And it makes you feel like you're on the right track when you see companies like Atlassian or Facebook putting dozens of developers on building this kind of platform. And because we are much more generic, we build it once and it runs everywhere, like Java. So it is very satisfying.
CRAIG BOX: And you mentioned before, it's nice when people talk about it. When we announced on this show that you were in open beta, I went back and looked at the notes, and it said that you're sending out swag to people who provide product feedback. What could I be getting?
ITIEL SHWARTZ: To be honest, I stole Ortal, our designer, who did the design for Rookout, my previous company. She creates really incredible swag, so I got her to design the swag for Komodor as well, like the stickers. So you have stickers. You have socks. You have the hat. You have shirts. You have everything. Basically, we really love listening to people using Komodor and getting customer feedback, because all of the company, and me in particular, we live Kubernetes and troubleshooting.
So going on a call with a customer, even a new customer, someone who never used Komodor but is having issues in Kubernetes, is super satisfying. So we wanted to repay them and send them something small from Komodor. I will say that at first we wanted to send everything to our customers' offices. But then they told us no one is currently at the office, so we started mailing across the US. I'm not sure how many states we've sent swag to already, but quite a lot, I think. And it's very nice seeing your own swag. Our stickers are the best. Feel free to PM me, anyone who wants to get Komodor swag. We have a lot.
CRAIG BOX: I am very far behind on sending out stickers. So I apologize to anyone who sent in that they'd like a sticker. A good 12 months behind. It's been a bit of a year for that. Where would you like to see Komodor go next?
ITIEL SHWARTZ: I think my vision, the goal, is to be synonymous with Kubernetes, basically. Very similar maybe to Datadog and Prometheus in the metrics monitoring space. I can't really imagine going to production without having Datadog or Prometheus installed and running. And our goal is to make Komodor the same for anyone who is running on Kubernetes. A lot of companies are using Kubernetes. A lot more are going to migrate to Kubernetes in the upcoming years. And I want all of them to have Komodor installed and helping them solve real issues.
So I think, again, Datadog is a great example, because they have a really good product. They are also good friends and so on, and we do integrate with them. But they solve a real issue, and they became the standard. Let's say with [INAUDIBLE] and Prometheus. We invented our own category, in a sense: troubleshooting Kubernetes. It wasn't a thing two years ago. And I really hope it will be a new category in the upcoming years.
CRAIG BOX: Do you think it won't just become a feature that the APM and the monitoring companies have?
ITIEL SHWARTZ: We are thinking about that. I think that for most monitoring companies, the claim to fame is showing you the data as accurately as possible, without any digestion. And I love Datadog. They show me the CPU. I create monitors. The APM is a really great product. But at the end of the day, what Komodor promises you is to use our knowledge of Kubernetes to help you troubleshoot faster.
And we don't just take the data that you get from Kubernetes and show it, because if we did, you would have millions of different events every day. We clean the data. We know the data, and we present it in a way that is very easy for you to use. So I think it's a very different promise than the monitoring guys. So yeah, I really hope we are not going to be a feature on Datadog and so on.
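Itiel doesn't describe how Komodor cleans the data, so what follows is only a generic illustration of one common first step: collapsing repetitive events into one summary row per object and reason, so a crash loop shows up as a single line with a count rather than hundreds of entries. It assumes events are plain dicts shaped like a subset of the Kubernetes Event object (`involvedObject`, `reason`, `count`, `lastTimestamp`), with timestamps as RFC 3339 UTC strings in a single format so they compare correctly as text. `summarize_events`, `top_issues` and `event` are hypothetical helpers.

```python
def summarize_events(events):
    """Collapse raw events into one entry per (namespace, kind, name, reason)."""
    summary = {}
    for ev in events:
        obj = ev["involvedObject"]
        key = (obj.get("namespace", ""), obj["kind"], obj["name"], ev["reason"])
        ts = ev["lastTimestamp"]
        entry = summary.setdefault(key, {"count": 0, "first": ts, "last": ts})
        # Kubernetes may already have folded repeats into `count`; default to 1.
        entry["count"] += ev.get("count", 1)
        # Same-format RFC 3339 UTC strings sort correctly as plain text.
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return summary


def top_issues(summary, limit=5):
    """Return the noisiest (key, entry) pairs, most frequent first."""
    return sorted(summary.items(), key=lambda kv: kv[1]["count"], reverse=True)[:limit]


def event(name, reason, ts):
    """Build a minimal, hypothetical Event dict for the example."""
    return {
        "involvedObject": {"namespace": "prod", "kind": "Pod", "name": name},
        "reason": reason,
        "lastTimestamp": ts,
    }


# Three back-off events for one pod, plus one unrelated image pull.
raw = [event("web-1", "BackOff", f"2021-07-09T10:0{i}:00Z") for i in range(3)]
raw.append(event("web-2", "Pulled", "2021-07-09T09:59:00Z"))

summary = summarize_events(raw)
print(len(summary))  # 2
print(top_issues(summary, limit=1)[0][0])  # ('prod', 'Pod', 'web-1', 'BackOff')
```

Grouping by object and reason is just one plausible key; a real system would likely also correlate these rows with deploys, config changes and node events.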
CRAIG BOX: Do you see a future where your product becomes more predictive at using machine learning, for example, to tell you what might go wrong rather than presenting the data and having you look for it yourself?
ITIEL SHWARTZ: Yeah. AIOps and ML, they are like a silver bullet, but at the end of the day, they don't really work, from my experience.
CRAIG BOX: Unless you're trying to slay a vampire.
ITIEL SHWARTZ: Yeah, yeah, yeah. So I'd be happy to use MLOps. We do start using it on the edges of the product. But going forward, the very cool thing about Kubernetes, again, is that because it's so standard, different companies have the same issues. So even without very sophisticated ML, you can bring out-of-the-box value. For example, your node issue. Things like that tend to repeat. I hope to be good at AIOps and use ML to increase the offering even further. I think a lot of DevOps people don't really trust AI in terms of monitoring, because so many tools tried and failed using AI. So I don't want to be another company putting everything on AI and then failing.
CRAIG BOX: All right, well wherever things end up, I wish you all the best. And thank you very much for joining us today, Itiel.
ITIEL SHWARTZ: Thank you, Craig. It was a pleasure.
CRAIG BOX: You can find links to Itiel on Twitter and Komodor in the show notes.
CRAIG BOX: Thank you very much for joining us today, Jimmy. It's your first show. What did you think?
JIMMY MOORE: Hey, I had a great time. Thanks so much, and hopefully if Conan's people are in need, give me a call.
CRAIG BOX: If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter, @kubernetespod, or reach us by email at firstname.lastname@example.org.
JIMMY MOORE: You can also check out the website at kubernetespodcast.com, where you'll find transcripts and show notes, as well as links to subscribe.
CRAIG BOX: See you next week.