#194 November 24, 2022

Kubernetes on Vessels, with Louis Bailleul

Hosts: Abdel Sghiouar, Kaslin Fields

Louis Bailleul is a Chief Enterprise Architect at PGS. After years of running highly ranked supercomputers to process PGS's seismic data, Louis's team at PGS has led a transition to Google Cloud. Listen in to learn about HPC in Google Cloud with GKE, and to explore using Kubernetes to do processing on vessels at sea!

Do you have something cool to share? Some questions? Let us know:

Chatter of the week

News of the week

ABDEL SGHIOUAR: Hi, and welcome to the "Kubernetes Podcast from Google." We are your hosts. I'm Abdel Sghiouar.

KASLIN FIELDS: And I'm Kaslin Fields.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Last episode, we spoke to some folks at KubeCon 2022 in Detroit, Michigan.

KASLIN FIELDS: That was our first episode as the new co-hosts of the "Kubernetes Podcast from Google," and we had a lot of fun putting it together. I hope you all enjoyed it as well.

ABDEL SGHIOUAR: After this, we'll do just one more for 2022, and then we'll be back in 2023.

KASLIN FIELDS: This week, Abdel had an interview that sounded really cool when he was telling me about it. Would you give us a little preview of what we can expect today, Abdel?

ABDEL SGHIOUAR: Well, we spoke to Louis from PGS. PGS is a company that is running Kubernetes with over 200,000 cores, which is impressive. They're also thinking about putting Kubernetes on floating boats in the ocean.

KASLIN FIELDS: That sounds really cool to me. I know I've talked to a lot of folks who are interested in running Kubernetes in unusual environments, so I'm personally definitely looking forward to hearing this interview. But first, let's get to the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: Docker has introduced a technical preview of Docker + Wasm. Wasm, or WebAssembly, allows you to write code in many programming languages and run it inside a sandboxed environment. The initial focus was on web browsers, but it has expanded rapidly beyond browsers thanks to the WebAssembly System Interface. With this preview, Docker wants to offer developers a complementary way to run containers depending on their use cases.

KASLIN FIELDS: A security vulnerability, CVE-2022-39278, has been discovered in Istio that allows a malicious attacker to crash the control plane. The Istio control plane, istiod, is vulnerable to a request processing error, allowing a malicious attacker that sends a specially crafted or oversized message to crash the control plane process. This can be exploited when the Kubernetes validating or mutating webhook service is exposed publicly.

ABDEL SGHIOUAR: Google has applied to contribute Kubeflow to the CNCF as an incubating project. The open source tool was first released in May 2018 and has developed into an end-to-end, extensible machine learning platform with a large community of contributors and users.

KASLIN FIELDS: Sigstore announced the general availability of the Rekor transparency log and the Fulcio certificate authority public benefit services. These services allow open source communities to sign and verify artifacts.

ABDEL SGHIOUAR: In exciting news, the CNCF introduced a new free course for Backstage. This open platform, which developers use to build self-service portals, has been part of the CNCF Incubator since March of this year. The course is available from the link in the show notes.

KASLIN FIELDS: I've also heard a lot about Backstage recently, so I'm also excited about that. And the CNCF also announced the availability of a new free intro course to Istio, the popular service mesh tool. The course is hosted on edX, or E-D-X, and is a collaboration with Tetrate and Tetrate Academy. And that's the news.

[MUSIC PLAYING]

ABDEL SGHIOUAR: We are here today with Louis Bailleul, who is a Chief Enterprise Architect at PGS. Louis has spent over eight years working mostly on PGS's high-performance computing (HPC) infrastructure. And more recently, with the company's shift to the cloud since 2019, he's been leading the team of architects focused on the cloud migration. Welcome to the show, Louis.

LOUIS BAILLEUL: Thank you for having me.

ABDEL SGHIOUAR: We had a discussion when we were preparing for this episode, and there were quite a lot of interesting things and some very, very crazy numbers. But let's start with this — talk a little bit about PGS. What is PGS? What do you guys do?

LOUIS BAILLEUL: PGS, we are a seismic acquisition and processing company. So that means we have vessels that acquire data on the subsurface, and we process it to image the subsurface. And that's mostly used for oil and gas exploration, offshore wind farms, and carbon capture nowadays.

ABDEL SGHIOUAR: Cool. Yeah, I've seen those vessels. They look very interesting.

LOUIS BAILLEUL: Yeah, a lot of people say that. We just cut the half, like the bottom half of the ship, and just keep the front, because they are like a triangle shape and very, very large. The biggest ones are like about 70 meters at the back.

ABDEL SGHIOUAR: Yeah, yeah — for people who are listening to us, google them. They look like the first couple of meters of a regular vessel, just a triangular-shaped thingy. So let's talk a little bit about how you got from where you came from to today — running, basically, HPC on Kubernetes, and eventually running Kubernetes on vessels, which is one of the interesting, cool things you guys are thinking about doing.

But let's start with the beginning. So you had your own data centers, and you were running some supercomputers there. You had two of them, one called Abel and one called Galois, which I believe are French names, right?

LOUIS BAILLEUL: Yes, they are the names of famous mathematicians, basically.

ABDEL SGHIOUAR: All right, cool. "Galois" sounded French to me. So I'm a data center nerd. I have worked in data centers before. I spent about four years in Google Data Center Europe West 1. If you have deployed things there, I've spent some time there. But I actually have never seen supercomputers. I've seen them in pictures, but I've never seen them in person. What are they? What is a supercomputer?

LOUIS BAILLEUL: It's a lot of the same and a lot of different things. So it's still CPUs, RAM, network, and all the stuff you find in any data center. It's just highly condensed. So in a supercomputer, the main thing is basically power density — cramming as many CPUs, as much memory, and everything else as you can into the same floor space. And also, it most of the time uses very specialized networking, so either InfiniBand or some proprietary fabric, in order to reduce the node-to-node latency.

In the case of Abel, which was the biggest supercomputer we had, it was 150,000 cores, 50,000 for Galois. It had the Aries network from Cray, so it was an XC40. So it's a pretty big beast for us, yeah.

ABDEL SGHIOUAR: Maybe those references will resonate with some people who are listening to the podcast.

LOUIS BAILLEUL: [LAUGHS] Yeah, it's a pretty select world, and there's not a lot of communication that's done outside of our world about it.

ABDEL SGHIOUAR: Yeah, that's interesting, because whenever you actually google supercomputers, you just end up on some obscure websites like datacenterknowledge.com, or some very dedicated HPC newsletters or publications that actually talk about them. So then, if I understand it correctly, a supercomputer is essentially a bunch of servers crammed into the smallest space possible, but probably sharing some sort of control plane. So the boards are designed like regular motherboards, but they're connected to a control plane somewhere, right?

LOUIS BAILLEUL: Kind of. So in the case of the Cray XC40 that we had, basically, on the same motherboard, you actually have multiple nodes, which are like CPU and memory, and they share an I/O chip, basically, that is specialized for that network. So the Aries network is a Dragonfly topology, and it basically allows you to have node-to-node connectivity with extremely low latency and less than two hops from a node to any other node in the network.

ABDEL SGHIOUAR: I see, which means no switches between them?

LOUIS BAILLEUL: So there are switches, but yeah, they are limited to as few as you can. And it's also source-based routing instead of packet switching.

ABDEL SGHIOUAR: Oh, interesting. OK, yeah, I've heard of that before. So then, the question is, how different or how complicated are they to manage compared to a regular fleet of servers — like if you have, I don't know, 100 or 1,000 servers?

LOUIS BAILLEUL: It's the same, but an order of magnitude higher in the sense of how much you have to care about them. So the main reason why we have so much density and so much requirement for networking latency and interconnection is because the jobs that run on them, basically, are spread across all the nodes and participate in the same computation. So they use a framework like MPI, where they're all sharing actual code between the different machines to work on exactly the same data set. Some even use RDMA, which is remote direct memory access from one node to another, in order to actually compute on the same memory space. What that means is that when one machine goes down, you can impact a job that runs on a thousand machines. And yeah, that's pretty impactful.
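To make the tightly coupled pattern concrete, here is a minimal sketch using mpi4py (the library choice is an assumption; Louis doesn't name a specific MPI binding). Every rank works on its own slice of the data, and a collective allreduce combines the partial results — so if any single rank disappears, the collective never completes and the whole job stalls.

```python
# Minimal illustration of a tightly coupled MPI job: every rank holds a slice
# of the same data set and a collective allreduce combines the partial results.
# If any one of the ranks dies, the collective never completes, which is why
# losing a single machine can take down a job spanning a thousand machines.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank owns an illustrative slice of a larger (hypothetical) seismic volume.
local_slice = np.random.rand(1_000_000)
local_sum = local_slice.sum()

# Blocking collective: every rank must contribute before anyone gets the answer.
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"global sum across {size} ranks: {total:.2f}")
```

Run with something like `mpirun -n 1000 python job.py`; the point is that the allreduce couples every rank's fate together.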

ABDEL SGHIOUAR: That's probably not good.

LOUIS BAILLEUL: Yeah, you do have a lot of redundancies, a lot of things to take care of in order to make these things run. And yeah, so one of the aspects, also, is that these machines, because they are crammed so tight, they are water-cooled. So they are direct water-cooled, and that adds another layer of complexity on the actual infrastructure you run into.

ABDEL SGHIOUAR: Yeah, the physical hardware.

LOUIS BAILLEUL: Yeah.

ABDEL SGHIOUAR: Yeah, last time I checked, water and electricity don't go hand in hand in data centers. All right. And so you also mentioned that one of these two supercomputers you had was on the TOP 500 list? I don't remember what the ranking was?

LOUIS BAILLEUL: Yeah, we were 12th in 2014 when it was introduced.

ABDEL SGHIOUAR: OK, so what is TOP 500? What is that list?

LOUIS BAILLEUL: It's a list that gives you bragging rights.

ABDEL SGHIOUAR: [LAUGHS]

LOUIS BAILLEUL: And that's it. But it's a list that is published twice a year, basically, at ISC in Europe and at the Supercomputing Conference in the US in November, which ranks the biggest supercomputers in the world — so public ones, ones that we are aware of. And usually at the top, you find national labs in the US, or big European or Asian national investment programs in supercomputers. So a ranking in the top 20, 30, even 50 is usually quite an achievement.

ABDEL SGHIOUAR: OK.

LOUIS BAILLEUL: Because the top ones usually have investments that are coming in hundreds of millions of dollars, basically. And they usually have an order or two of magnitude more performance compared to the next top 5, top 10, and so on.

ABDEL SGHIOUAR: Yeah, there is something interesting with being in the tech industry and the ranking things. It's quite an obsession that we have in this industry which is quite fascinating, I guess.

LOUIS BAILLEUL: We like to compare things.

ABDEL SGHIOUAR: Yeah, yeah, we do. All right, and then, around 2018, 2019, you started experimenting with moving parts of your workloads to cloud. I mean, in this case, it was specifically Google Cloud, but it could be any cloud, essentially.

LOUIS BAILLEUL: Yeah, we had a lot of experimentation here in '18 and '19 because our Cray supercomputers were getting old, basically, and we were thinking about the next platform. So the problem with these big on-premise investments is basically that you have a massive Capex investment every five, seven, eight years, depending on how long you make it last. But it's always a fairly hefty one, because you have to go through a whole supercomputer refresh — network, compute, and storage go hand-in-hand, basically — that you have to just ditch and replace. And yeah, when you're looking at 200,000 cores, that's a lot of CPU. So if you do the math, it's a big investment in any shape or form.

So we were kind of interested in lessening that Capex investment. And then, one of the ideas was to target the refresh for the average utilization that we have. So on-premise, we used to be running about 80% full all the time. So we were thinking, for the next platform, let's go for 80%, and for the 20% that remains, in order to process all the stuff we do for the rest of the year, let's burst into the cloud and use the elasticity of the cloud to do that.

So the experimentation with the cloud was basically under that angle. It's just to reduce the next investment cycle a little bit and optimize for the biggest efficiency possible on-premise.

ABDEL SGHIOUAR: Yeah, so essentially, in a nutshell, it was buying, eventually, a smaller supercomputer and using the cloud as a bursting platform.

LOUIS BAILLEUL: A smaller, probably not.

ABDEL SGHIOUAR: Probably not, but buying less than what you would probably buy if you aim for 100%, right?

LOUIS BAILLEUL: Exactly.

ABDEL SGHIOUAR: So buy enough to cover the 80% use cases, which is through the year, and then use the cloud to burst for the extra 20%.

LOUIS BAILLEUL: Yes.
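As a rough, purely illustrative back-of-the-envelope for that sizing idea (the figures below are made up, not PGS's actual numbers):

```python
# Hypothetical sizing math for the "cover the steady 80% on-premise, burst the
# rest to the cloud" strategy. All numbers are illustrative.
peak_cores = 200_000        # compute needed at the busiest time of the year
steady_fraction = 0.80      # the old platform ran ~80% full all the time

on_prem_cores = int(peak_cores * steady_fraction)  # size the next refresh for this
burst_cores = peak_cores - on_prem_cores           # covered by cloud elasticity at peak

print(f"on-prem refresh: {on_prem_cores:,} cores")
print(f"cloud burst at peak: {burst_cores:,} cores")
```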

ABDEL SGHIOUAR: I have seen people talking about bursting to the cloud, which is an interesting concept because, obviously, there is a huge difference between a cloud platform and a supercomputer on-prem. I mean, there is already a huge difference between just regular servers on-prem and the cloud, but in your case, it's two totally different platforms. So the question is, how did that go?

LOUIS BAILLEUL: Interestingly. [CHUCKLES]

ABDEL SGHIOUAR: [LAUGHS]

LOUIS BAILLEUL: It took a bit of time to get our footing. And how do you say? A supercomputer on-premise is a very interesting beast. And there was no direct path. There was no way you can lift and shift your supercomputer into any cloud today. It just doesn't work.

Well, technically you can if you go on Azure, because they can rent you a Cray. But you don't change anything. You're basically just still running on the same platform.

But if you want to go and use the elasticity of the cloud, and basically pay for what you use, there is nothing that approaches it, basically. So we had to think a little bit. And one aspect we were looking at in the early days is that we had the Cray on-premise, but we also had commodity compute that we used for a bunch of stuff, basically. So the vast majority of our compute power was basically coming from the Crays, but we were still running workloads on standard compute nodes.

So we focused on those early on as a way to burst into the cloud, saying that this would actually be easier to lift and shift because there was more commonality between the two platforms. That was OK in the sense that we took what we had on-premise — the scheduler, the different library frameworks, all that stuff, basically how to run that on compute nodes — and basically leveraged the IaaS services over there.

And we even ran part of our filesystem. We used GlusterFS on-premise, which is a pretty big filesystem type for our HPC workload, and we ran that on GCP. And we had pretty much all the auxiliary services — like databases, location services, that kind of stuff, basically, that you need — running. And it was OK in the sense that it was functional. But in terms of cost efficiency, that was not great.

ABDEL SGHIOUAR: Yeah, I mean, it's a very interesting topic in itself. I've done some consulting before — it's actually when I was in consulting that I met you the first time, a couple of years ago, and we worked on some projects together. And at a certain scale — especially at small scale — cloud sometimes doesn't actually make sense, because you're just buying more expensive compute nodes. You are not at a scale that gives you leverage to negotiate, on one hand, so you cannot get discounts. And you're just lifting from a server that you've already bought and paid for, probably, toward a platform where you have to pay monthly. And this is something that, in my consulting time, I came across with customers many times. It's like, oh yeah, it's more expensive. It's more expensive because you're running 10 nodes, right?

LOUIS BAILLEUL: Yeah. I mean, if you don't have an incentive to leave your on-premise, a real one, to go into the cloud for a pure lift-and-shift, it's hard to justify the value. There are always things you hear about — yeah, you get more flexibility, or you have a better path to innovation, and that kind of stuff, basically. But if you do a lift and shift, you actually miss all of that.

So you have to change things in order to benefit from the cloud. And yeah, the more you change, the more you benefit — but also the more engineering investment you have to go through. And it's a decision that you have to make.

ABDEL SGHIOUAR: Yes, that's what we use the very salesy term "modernization" for, right? [LAUGHS] All right, and then, at some point, you said, well, I'm going to go Kubernetes, which is quite a huge shift from on-prem supercomputers, to GCE on GCP, then to Kubernetes. So what was that shift like?

LOUIS BAILLEUL: It was a long path toward Kubernetes, and it definitely wasn't something we had decided from the get-go. We ended up there more than we actually chose to go there. The main thing is basically, in 2020, we were impacted like everybody, and we ended up in a financial situation that meant that renewing our on-premise platform was just out of the question. There was no way we could actually do it with our finances at the time.

So we had to find a solution. And because we had been getting our feet wet in the cloud, and we were starting to see what was working and what was not, we were starting to see that actually, if we went a bit further in, there was probably a way for us to completely flip our idea on its head and, instead of bursting 20% into the cloud, actually go 80% into the cloud and replace the Crays. So we started to explore that idea. And because we didn't really know what we didn't know, and what we actually could do in the cloud, we started with the approach of taking what we knew from on-premise and looking at the different components that have cloud interfaces or cloud components that could be used.

So we effectively did a lift and improve, I would say, in that case. So we took a lot of the ecosystem we already had and started to build cloud interfaces on it.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: And on-premise, we were using Grid Engine from Univa as a scheduler. At the time, Univa had a new product called Navops Launch, which basically allows you to bridge your scheduler on-premise with cloud resources. So it was a very interesting way for us to burst, because it was actually still using the same components we knew. All the interfaces we had for the user workflow were still the same. It just basically bridged how you actually use the cloud resources.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So we experimented with that for a while, and we even had an MVP and production running on this — even today. But we eventually realized that at the scale we are, things that are not cloud-native are extremely difficult to scale to the level we want.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So anything that is actually trying to interface in a hybrid way, or coming from a mindset or a technology framework that is completely different, it just breaks down. We're too big for this kind of thing.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So we eventually realized that we needed a cloud-native scheduler. And we looked at what existed. And we said, hey, there's a little thing called Kubernetes.

ABDEL SGHIOUAR: I heard this can schedule things. [LAUGHS] Yeah, so that's actually quite an interesting shift of going from — and that's basically related to the point we said earlier, which is going from supercomputers and the user workflow, which you, obviously, don't want to disturb, to the cloud, which kind of does things in a different way. And trying to bridge that gap is quite an interesting problem to try to solve. So then you ended up basically having to build your own scheduler?

LOUIS BAILLEUL: Not quite. The main problem that we had was basically that even Kubernetes is too small — or technically GKE is too small — for us. So we ended up in a situation where a single cluster can scale to 15,000 nodes, but we needed 12 times that.

ABDEL SGHIOUAR: Yes.

LOUIS BAILLEUL: So we ended up with 12 clusters. And one of the problems that we had, then, was, OK, how do we actually spread the jobs across all these clusters? So we created what we call the External Job Manager, which is responsible for placing a workload inside the different clusters. And inside the cluster, we have an Internal Job Manager, which itself is responsible for bookkeeping and actually scheduling the Kubernetes Jobs through the batch API. Basically, we use the Kubernetes Job.

ABDEL SGHIOUAR: Yeah, there is actually a very interesting diagram in the talk you gave as part of Next 2022 which shows where this External Job Manager and Internal Job Manager sit. We will link it in the show notes so that people can look at it. But basically, my understanding is that the External Job Manager dispatches jobs across the 12 clusters you are running, and then the Internal Job Manager submits the jobs to the Kubernetes API, which then schedules the actual pods, essentially, right?

LOUIS BAILLEUL: Yeah, so effectively, we get two levels of job object, I would say. One is at the cluster level, which is effectively the actual namespace in which we run the workload, because we create a namespace dynamically per workload. Then you get the ConfigMap that comes from the actual user job specification. And you get the services created in that namespace that are going to be needed in order to run that job. We use Redis to actually store the — to do some accounting. And we have the Internal Job Manager, which is responsible for submitting the actual tasks to Kubernetes.
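As an illustration of what the Internal Job Manager's submission step could look like, here is a sketch using the official Kubernetes Python client — the names, labels, and image are hypothetical, and this is not PGS's actual code:

```python
# Hypothetical sketch of an internal job manager submitting one task, following
# the pattern Louis describes: a namespace per workload, a ConfigMap carrying
# the user's job specification, then a batch/v1 Job for the actual compute.
from kubernetes import client, config

config.load_incluster_config()  # the IJM runs as a service inside the cluster
core = client.CoreV1Api()
batch = client.BatchV1Api()

workload_id = "workload-1234"            # illustrative identifier
namespace = f"hpc-{workload_id}"

# 1. Dedicated namespace created dynamically for this workload.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=namespace)))

# 2. ConfigMap holding the user's job specification (contents are illustrative).
core.create_namespaced_config_map(
    namespace,
    client.V1ConfigMap(metadata=client.V1ObjectMeta(name="job-spec"),
                       data={"spec.yaml": "..."}))

# 3. The compute task itself, submitted through the batch API as a Kubernetes Job.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name=f"{workload_id}-task-0"),
    spec=client.V1JobSpec(
        backoff_limit=10,  # tasks on Spot VMs are expected to be retried
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="task",
                    image="example.com/seismic-kernel:latest")]))))
batch.create_namespaced_job(namespace, job)
```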

ABDEL SGHIOUAR: Yeah, so that's the Internal Job Manager. And then the External Job Manager, my understanding is, basically translates whatever the user submits and pins it to a specific cluster, essentially.

LOUIS BAILLEUL: Yeah. So one of the interesting things is that we didn't want to do a lot of rework on-premise and change how the users are operating and a lot of things like that, and we still had the on-premise to cater for.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So what we actually did is we extended how the user submission workflow works so that, at the end, instead of talking to a Grid Engine queue on-premise — well, it's still talking to a Grid Engine queue, but in it, it actually has a workload that is going to talk to Kubernetes, to the External Job Manager, effectively.

ABDEL SGHIOUAR: Yeah, so you essentially built an abstraction layer on top of multiple Kubernetes clusters.

LOUIS BAILLEUL: Yeah.

ABDEL SGHIOUAR: And that abstraction layer — are those controllers, the External and Internal Job Managers? Are they controllers, or what are they?

LOUIS BAILLEUL: Not at this time. So we experimented a little bit with controllers and the operator pattern and a few things like that. But we are fairly new to Kubernetes, effectively, and our maturity was a little bit too low to support this kind of pattern. So today, the EJM is basically just a Python script that runs outside of Kubernetes, effectively hosted in the Grid Engine managed queue. And the IJM is a simple service.

ABDEL SGHIOUAR: It's just an API?

LOUIS BAILLEUL: Yeah.
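And a minimal sketch of the other half — an external dispatcher spreading workloads across a fixed set of clusters. The cluster names, the bookkeeping, and the "least pending work" policy are all assumptions for illustration, not PGS's actual logic:

```python
# Hypothetical sketch of an "external job manager": a plain Python script that
# spreads incoming workloads across a fixed set of GKE clusters. The real EJM
# lives in a Grid Engine managed queue; this selection policy is made up.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    max_nodes: int          # e.g. the ~15,000-node ceiling per cluster
    pending_tasks: int = 0  # bookkeeping the EJM keeps (e.g. in Redis)

clusters = [Cluster(f"gke-hpc-{i:02d}", 15_000) for i in range(12)]

def pick_cluster() -> Cluster:
    # Naive policy: route the workload to the cluster with the least pending work.
    return min(clusters, key=lambda c: c.pending_tasks)

def dispatch(workload_id: str, num_tasks: int) -> str:
    target = pick_cluster()
    target.pending_tasks += num_tasks
    # In the real system this would hand the workload to that cluster's internal
    # job manager (see the previous sketch) instead of just printing.
    print(f"{workload_id}: {num_tasks} tasks -> {target.name}")
    return target.name

dispatch("workload-1234", 5_000)
```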

ABDEL SGHIOUAR: Yeah, so basically, this goes to say that not everything has to be a controller on GKE or on Kubernetes. The controller and operator patterns are quite heavily used, and I have seen people end up running many hundreds or thousands of controllers that just bombard the API server.

LOUIS BAILLEUL: Yeah, we already have issues with the API server without going into this kind of thing. [LAUGHS]

ABDEL SGHIOUAR: Yeah, because you are actually running on a very large scale. You are running 12 clusters at, sometimes, the biggest number of nodes you can get, which is 15k nodes per cluster, right?

LOUIS BAILLEUL: Yes, yeah. We've been getting clusters to their max, basically.

ABDEL SGHIOUAR: Yeah, and so that gets us to the numbers that you shared, actually, in this presentation from Next. So I'm just reading out of the doc that I have prepared for the notes. In normal operations, you are at 200,000 cores, 21 petaflops, and 75 petabytes of storage, right?

LOUIS BAILLEUL: Yeah, so that's what we spike to, basically.

ABDEL SGHIOUAR: OK, that's what you spike to, OK.

LOUIS BAILLEUL: Yeah, so we had two records, basically — one at the end of summer, late August, and one in late September — where we first reached 800,000 vCPUs concurrently on about 50,000 instances. And then we reached 1.2 million vCPUs, still with 54,000 instances, just a different instance shape, basically, that meant more vCPUs.

ABDEL SGHIOUAR: Yeah, just changing the instance types, interesting. I mean, obviously, those are crazy numbers, right? 1.2 million vCPUs is a crazy number. 53,000 nodes is a big number. But are those normal numbers for HPC workloads?

LOUIS BAILLEUL: Yes. [LAUGHS] So it's at the high end, but it's still what you would expect from an HPC environment. To give it some perspective, this 21 to 30 petaflops puts you in the top 25 of today's TOP 500. So it is not even one of the biggest things ever.

ABDEL SGHIOUAR: Yeah, it's not very uncommon. Which brings me to a very interesting question. I could not find you on the TOP 500 with the Kubernetes HPC, right?

LOUIS BAILLEUL: Yes, because the benchmark that you use to actually submit to the TOP 500 is not possible to run on Kubernetes today.

ABDEL SGHIOUAR: Oh, so they haven't included the use case of cloud and Kubernetes?

LOUIS BAILLEUL: No, this is one of the last things, or one of the big things, that we need to mention — going into GCP, we made the choice to go for the cheapest way to run, I would say, so the most cost-efficient, which means that we're using preemptible VMs. So, Spot, as they've been renamed nowadays.

ABDEL SGHIOUAR: Yeah, they're called Spot VMs now, yeah.

LOUIS BAILLEUL: That means that the workload that you run needs to be able to survive preemption, so using checkpointing and so on. Historically, when I was mentioning the supercomputer and what most supercomputers do is they run tightly coupled workloads using MPI, which means that you create a ring between all the nodes. So that is very, very incompatible with preemption.

So running this kind of workload on GKE is possible — you do actually have support in Kubeflow for it. But yeah, at the scale we have and with the cost efficiency we're looking for, this is not practical.
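To make "survive preemption" concrete, here is a minimal sketch of the checkpointing technique Louis mentions, assuming a hypothetical checkpoint file on shared storage — an illustration of the general pattern, not PGS's code:

```python
# Hypothetical preemption-tolerant worker: it checkpoints progress periodically
# and again on SIGTERM (the notice a Spot VM gets shortly before reclamation),
# then resumes from the last checkpoint when the Job controller retries it.
import json
import os
import signal
import sys

CHECKPOINT = "/mnt/shared/checkpoint.json"   # illustrative path on shared storage
TOTAL_STEPS = 10_000

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    return 0

def save_checkpoint(step: int) -> None:
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CHECKPOINT)               # atomic rename, never a torn file

step = load_checkpoint()

def on_preempt(signum, frame):
    save_checkpoint(step)                     # flush progress before the node goes away
    sys.exit(1)                               # non-zero exit so the Job retries the task

signal.signal(signal.SIGTERM, on_preempt)

while step < TOTAL_STEPS:
    # ... one unit of embarrassingly parallel work goes here ...
    step += 1
    if step % 500 == 0:
        save_checkpoint(step)
```

The key design point is that each task is independent and idempotent, which is exactly what tightly coupled MPI jobs are not — hence Louis's point that MPI and preemption don't mix.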

ABDEL SGHIOUAR: Yeah, interesting. So you chose to run on Spot VMs, and you configured, or made, your workloads tolerate preemption, for the sake of cost efficiency, essentially, right?

LOUIS BAILLEUL: Yes.

ABDEL SGHIOUAR: OK.

LOUIS BAILLEUL: Yeah. Yeah, it's just a trigger for us, basically, on using a different node [INAUDIBLE].

ABDEL SGHIOUAR: Yeah, and actually, the other interesting thing is you have done all of this with a very hard deadline for May 2022 with — what was the size of the team? How big was the team that worked on this?

LOUIS BAILLEUL: It's about 20 people.

ABDEL SGHIOUAR: All right. [LAUGHS] And you had from about 2020 to May this year to get it done?

LOUIS BAILLEUL: Yes, this was a very challenging effort, and the team delivered way above expectations for the complexity of the problem that needed to be tackled and the aggressiveness of the deadline. So yeah, we're pretty proud that it was achieved, but it was a pretty big challenge.

ABDEL SGHIOUAR: Well, kudos to the team, then. This brings me to my other question related to — so we, of course, we discussed these numbers, and they're crazy. But there is this myth that people think that when you go to the cloud, you can just have an infinite number of resources. And they tend to forget that there are quotas that any cloud provider would enforce on you for many reasons. So how do you actually handle capacity management? How do you handle getting capacity when you need it?

LOUIS BAILLEUL: It's a very good question. And this definitely is a myth, that everything is infinite in the cloud. You technically just run on somebody else's computer. That's the bottom line.

ABDEL SGHIOUAR: Yes.

LOUIS BAILLEUL: So you have the same capacity management issues as you would have on any other infrastructure. The only difference is that because of the scale of the cloud, most of the time, you're not exposed to its full extent. But for HPC workloads, yeah, you can get there.

So mostly, we are in contact with the capacity planning team, which helps us. There are definitely some interesting arrangements that can be made in order to accommodate very large workloads. So for example, we run out of us-central1, which is a very big region. But even within it, all clusters are basically tapping into different pools of compute, so they don't actually over-stress the same resource pool.

ABDEL SGHIOUAR: Yep.

LOUIS BAILLEUL: So we've got different things like that. But one of the things that makes our challenge even harder is that, because we're trying to run on Spot, which is effectively extra capacity in the cloud region, it is very unpredictable in terms of how much you can get. But yeah, as we've been saying, you still can get a lot of it.

ABDEL SGHIOUAR: Yeah, you can get 1.2 million vCPUs. So — [LAUGHS] all right, that's actually pretty cool. And then, I'm going to shift gears a little bit and talk about the other cool thing you folks are doing, which is actually what we discussed the first time we met, I think in 2019: Kubernetes on vessels. Those boats we've been talking about, they are effectively floating data centers. They have computers on them, right?

LOUIS BAILLEUL: Yeah, so the data acquisition actually requires quite a bit of processing just to do quality control and to merge a lot of different data streams into something usable. Just for the operation of the vessel itself, we actually have quite a bit of compute. And we do what we call fast track on the vessel, where we do some processing of the data into an intermediary product to get a sneak peek, basically, of what it will look like, directly from the vessel.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: And yeah, we've been thinking about the next generation of our platform on the vessels. And we have a few reasons for doing this. The cloud migration — cloud modernization effort, however you want to call it — that we started in 2019 impacted a lot of different areas. As a general thing, pretty much all of our IT functions, our corporate functions, started to shift and change. Some of them migrated to SaaS services. Some of them have been consolidated. So we see some change needed here as well.

The HPC being completely changed and moving to the cloud is also a good reason for us to look at a different platform. And because it's moving very heavily toward Kubernetes, that points us in a direction. And lastly, we have new use cases now on the vessels. Over the last few years, we started to see a lot more machine learning and some edge computing — proper edge computing, like what is done for IoT.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So these kinds of use cases have emerged from more modern platforms than we traditionally had. So we have a few reasons to investigate.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: It would make sense for us to deploy Kubernetes on the vessel.

ABDEL SGHIOUAR: Yeah, that's actually very — that's the cool part, right? Having every vessel effectively a floating Kubernetes cluster, right?

LOUIS BAILLEUL: Maybe not just one. [LAUGHS]

ABDEL SGHIOUAR: Well, maybe more than one. [LAUGHS] But this means that — one thing we discussed is that with most current on-prem Kubernetes installations, whether they're coming from a cloud provider like Google or whether you even want to do it yourself, you still need to be able to monitor them remotely and push software to them. But those vessels, they are on a satellite connection, right? And they do not have a lot of — what was the bandwidth you told me the other day?

LOUIS BAILLEUL: The biggest we can get to right now is about 10 megabits.

ABDEL SGHIOUAR: 10 megabits per second. That is not a lot.

LOUIS BAILLEUL: No.
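For a sense of scale, a quick back-of-the-envelope calculation (the data sizes and the share of the link available for non-critical traffic are assumptions):

```python
# Rough arithmetic on what a ~10 Mbit/s satellite link actually buys you.
LINK_MBIT_PER_S = 10
USABLE_SHARE = 0.5   # assume half the link is left for non-critical traffic

def hours_to_ship(gigabytes: float) -> float:
    bits = gigabytes * 8 * 1e9
    return bits / (LINK_MBIT_PER_S * 1e6 * USABLE_SHARE) / 3600

for gb in (1, 100, 1_000):
    print(f"{gb:>5} GB -> {hours_to_ship(gb):8.1f} hours")
# About half an hour per gigabyte; a terabyte takes weeks. That is why acquired
# data goes back to shore on physical media when the vessel docks, and why
# telemetry has to be aggressively filtered before it is shipped over the link.
```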

ABDEL SGHIOUAR: So you have to be able to manage that Kubernetes platform on a, potentially, very low bandwidth, right?

LOUIS BAILLEUL: Yeah, so reliability and capacity of the connectivity is a challenge on the vessel.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: And we do have some critical requirements for bandwidth. So there are things that have high priority that actually need to communicate with the shore and things that can wait.

ABDEL SGHIOUAR: That's a lower priority.

LOUIS BAILLEUL: Yeah.

ABDEL SGHIOUAR: Yeah, so that basically creates another set of problems related to monitoring and making sure the environment works. And eventually, it probably has to be self-healing. It has to be able to recover from a crash or something.

LOUIS BAILLEUL: Yeah, so the current platform we have has basically been designed with a lot of resiliency in mind — a lot of cold spares, basically — and easy maintainability, in the sense that we don't have an ops team standing by on the vessel. We usually have people whose day-to-day operational role is not maintaining a compute center.

ABDEL SGHIOUAR: Yes.

LOUIS BAILLEUL: So it needs to be easy enough for them to unplug this, swap that, and then, off we go. We're back in business.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: So yeah, definitely a lot of self-healing and easy maintenance without being connected.

ABDEL SGHIOUAR: Yeah, the good news is Kubernetes on the edge seems to be a thing that people talk about these days. I wasn't at KubeCon North America this year, but in the last episode we put out, there was a lot of talk about Kubernetes on the edge and edge use cases. And we're also seeing multiple cloud providers actually moving toward that space. So there are potentially products that could solve these problems for you, hopefully.

LOUIS BAILLEUL: Yeah, we've been looking. It's an emerging space. So we've been testing, we've been trying to get our feet wet with it, and we see that there is some good promise. But for the criticality of the environment for us, it's probably not ready yet.

ABDEL SGHIOUAR: Yeah.

LOUIS BAILLEUL: We're quite happy to investigate further and follow it, because getting all our different platforms to converge toward Kubernetes is quite attractive as an idea. So if we can deliver on this in a way that is acceptable for the vessel operation, that's definitely good.

ABDEL SGHIOUAR: Nice. So should we expect, a few years from now, to see you at a KubeCon keynote talking about Kubernetes on vessels, hopefully?

LOUIS BAILLEUL: Maybe, maybe.

ABDEL SGHIOUAR: All right, we'll see. Well, thank you very much, Louis. It was great talking to you.

LOUIS BAILLEUL: It was a pleasure, thanks.

[MUSIC PLAYING]

KASLIN FIELDS: That was a really cool conversation and use case that you got to explore there, Abdel.

ABDEL SGHIOUAR: Yeah, it's quite impressive, the numbers that we discussed with Louis, and also the size of the team they had to do the migration. It's pretty interesting.

KASLIN FIELDS: Yeah, there's several components in there that I feel like I hear a lot of customers that I talk to also talking about — things like the challenges of lifting and shifting to the edge, and adopting cloud-native practices to make the best use of the cloud-native environment. I think I said "edge," but I meant "cloud."

But another component is the edge. I hear a lot of users and a lot of customers that I talk to talking about edge use cases and how edge technology has really developed in the last few years. And I hear a lot of folks really interested in that. And like you said, doing this shift on a timeline with a limited number of people, a limited amount of resources, it's a challenge that I hear a lot of customers trying to take on, and it's really exciting to hear this use case where they succeeded at moving to the cloud. And it sounded like he mentioned that their deadline was May of 2022 with only 20 people, but you all didn't mention when they started. It sounded like about 2019?

ABDEL SGHIOUAR: They were experimenting starting in 2018. They were doing PoCs. But the actual start of the migration was 2020. Because in 2020, with COVID, they had a bunch of economic and financial challenges. And reinvesting money into renewing their supercomputers — the Cray supercomputers they were talking about were end-of-life, and they had to buy new ones. So instead of just investing money in buying new ones, which would be big Capex, they decided, we're just going to go to the cloud.

KASLIN FIELDS: I remember the early days of cloud when we were talking about, is the cloud really more economical for customers? And I love how you all addressed that. You talked about how, at scale, yes, it can be. If you're a really small shop that only has 10 compute nodes, it's probably going to be more expensive to move to the cloud. But at the kind of scale that we're talking about here, with using things like Spot and preemptible nodes, making use of the unique characteristics of the cloud, they actually could save on their Capex, which is really interesting.

ABDEL SGHIOUAR: Exactly. I think it's very interesting to keep in mind that there is still a little bit of Capex because, obviously, they have a commit contract in order to get like massive discounts. Most cloud providers do that. So there is still a little bit of Capex. It's hidden, you don't see it, but there is still a certain amount of money you have to spend every year.

But as you said, generally speaking, in my work on migrating customers to the cloud, there is an inflection point where it becomes economical, where it becomes interesting. As you said, if you have 10 nodes, maybe going to a hosting provider would be cheaper than going to the big cloud providers. But 53,000 nodes at peak, 1.2 million vCPUs? That's a lot of nodes.

KASLIN FIELDS: That's a lot of nodes. And I also love how you all dove into capacity management. A lot of the customers that I talked to get bitten by that at first when they're starting to use the cloud. Because like you all said, the cloud being infinite is kind of a myth. [LAUGHS]

ABDEL SGHIOUAR: Yes, exactly.

KASLIN FIELDS: And so really, you have these capacity management issues that you still have to deal with, and you have to make sure that you're working with a partner in the cloud that can actually work with you and make sure that you can manage your capacity effectively. So it was really cool to hear from them that they were able to do that with Google Cloud.

ABDEL SGHIOUAR: Exactly, and especially because their workloads are pretty unpredictable, because they're pretty bursty. So when you are submitting a job, you don't necessarily know upfront how much you're going to need, because they rely quite a lot on autoscaling as well. So once the workloads are deployed into the cluster, the cluster autoscaler will spin up the nodes needed for that workload to run. So there were quite a lot of interesting discussions we had that didn't make it to the podcast. But yeah, it's a pretty interesting problem to solve when you are not in your own data center, right?

KASLIN FIELDS: [LAUGHS] When you don't have full control over the capacity management, where things are placed, and all of that, you offload some of that work for the benefits that you get from being in a cloud environment. I think that makes a lot of sense.

ABDEL SGHIOUAR: Exactly. I mean, and it's not like when you are in your own data center, it's easy to just increase your capacity. You still have to go through a sales cycle. You can't just buy hardware and have it just sitting on a shelf because it would depreciate over time. It's the same set of challenges. It's just like, as you said, you don't have control, so you have to solve it in a different way.

KASLIN FIELDS: Then, at the end there, you were talking about the future and how they're looking into maybe starting to run Kubernetes clusters on the edge on these vessels in the ocean, and with their limited bandwidth capacity. This is something that I'm hearing from a lot of customers and a lot of users, like I said, who are interested in edge computing use cases.

There have been advancements in this area, like Louis said. So a lot of folks are looking into how they can do this today with the newer technologies that are coming out. And I know there are several projects in this area.

ABDEL SGHIOUAR: Yeah, so the interesting thing with PGS is that I actually worked with them back in 2018 when I was in PSO. I actually worked with them on consulting projects. And the project was on cloud, but the discussion about Kubernetes on vessels was happening at the time. There was just no product that could help them solve the problem, because one of the challenges with an edge use case — or their edge use case — is you don't have enough bandwidth. So you have to, of course, keep an eye on the clusters. You have to monitor them. But you don't have a big pipe from the vessel back to the internet.

Then, you have to have enough capacity to store data offline, because you're not going to be able to ship your logs and your data — which they do today. They actually store data on the vessels, and then, when the vessel docks, they can just offload the tapes and read them. But when you think about telemetry data, like metrics and logs, you can't just send this stuff over a satellite connection, which is 10 megabits.

So I think they're getting closer to that. They're looking at it very seriously. And I think the main takeaway, at least for me, was that the motivation for putting Kubernetes on the edge, for them, was unifying the software supply chain — how they ship software to production — using the same tooling everywhere because, basically, you have Kubernetes as the base layer, right?

KASLIN FIELDS: That's a really good point. And I think a lot of the customers that I talk to bring that up, is that you can run Kubernetes in the cloud, and you can run it on-prem, and I have friends that have run it on ridiculous types of hardware. [LAUGHS]

ABDEL SGHIOUAR: Raspberry Pis.

KASLIN FIELDS: So since it's such a flexible technology, you can have this consistency all the way through. So I think it makes a lot of sense for it to be popular there. I know there were some talks at past KubeCons about running Kubernetes at the edge, and on interesting vessels, and things like that. So I'll have to see if I can find a couple of those for our show notes.

ABDEL SGHIOUAR: Yeah, I think I remember from KubeCon Valencia, there was even Canonical, the company behind Ubuntu, they were there with their own custom hardware that runs Kubernetes. And each time I go to KubeCon, there is always like at least one or two companies with some colorful, shiny computers at the booth. And they're like, yeah, this is our Kubernetes.

You know what's interesting? The first time I heard about Kubernetes on the edge was reading a blog post by Chick-fil-A, the food chain in the US, where they wanted to do Kubernetes installs but they couldn't. So they had to come up with their own solutions. So they use these Intel NUCs, the small nodes.

KASLIN FIELDS: Yeah, I have several of those sitting behind me, actually. [LAUGHS]

ABDEL SGHIOUAR: Yeah, I do have some here as well. So they had to automate it in such a way that, if one of the nodes in one of their stores breaks, they can just ship a replacement to the store, have a store employee connect it, and it will just bootstrap and configure itself automatically. So that was the first time I heard about it, and I was like, that's such a cool use case. And now, it's actually a reality. There are products.

KASLIN FIELDS: So awesome. As a contributor to Kubernetes and someone who works in the Kubernetes space, there's so much breadth to the use cases that folks use Kubernetes for. So hearing awesome use cases like this just makes me so excited for where the technology is going.

ABDEL SGHIOUAR: Yeah. I think Urs was not wrong when he said that Kubernetes is — what was it? It's the operating system of your data center, it's the operating system in the cloud.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: It's becoming this thing which is like it's a base layer. It's there.

KASLIN FIELDS: And this is kind of going back to the beginning of the conversation, but [LAUGHS] when Louis was describing what a supercomputer is, he was saying, it's just CPU and memory, and that's all connected, basically, with networking. And that's just the basic components of compute wherever it is.

And as he was talking about that, I was thinking, Kubernetes is like that. It's just about connecting CPU and memory over networking. So it's just a base layer.

ABDEL SGHIOUAR: It's pretty much the same thing. Exactly.

KASLIN FIELDS: Yeah, love that. So I really enjoyed talking with you about your interview, Abdel, and hearing you talk with Louis about this really cool use case.

ABDEL SGHIOUAR: Yeah, that was an awesome episode. I enjoyed talking to Louis. And hopefully, we will get to talk to them again when they actually put Kubernetes on vessels.

KASLIN FIELDS: I hope so.

[MUSIC PLAYING]

ABDEL SGHIOUAR: That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @kubernetespod, or reach us by email at kubernetespodcast@google.com. You can check out the website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]