#235 September 3, 2024

Ray & KubeRay, with Richard Liaw and Kai-Hsun Chen

Hosts: Mofi Rahman, Kaslin Fields

In this episode, guest host and AI correspondent Mofi Rahman interviews Richard Liaw and Kai-Hsun Chen from Anyscale about Ray and KubeRay. Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads, while KubeRay integrates Ray’s capabilities into Kubernetes clusters.

Do you have something cool to share? Some questions? Let us know:

News of the week

Links from the post-interview chat

KASLIN FIELDS: Hello, and welcome to the "Kubernetes Podcast" from Google. I'm your host, Kaslin Fields.

MOFI RAHMAN: And I'm Mofi Rahman.

[MUSIC PLAYING]

KASLIN FIELDS: In this episode, our guest host and AI correspondent Mofi Rahman interviews Richard Liaw and Kai-Hsun Chen from Anyscale about Ray and KubeRay. Ray is an open source, unified compute framework that makes it easy to scale AI and Python workloads, while KubeRay integrates Ray's capabilities into Kubernetes clusters. But first, let's get to the news.

[MUSIC PLAYING]

MOFI RAHMAN: LitmusChaos, featured on the last episode of this podcast, episode 234, has completed a third-party security audit conducted by 7ASecurity. The audit consisted of a white box security review paired with pen testing. The results include 16 findings with a security impact: six vulnerabilities and 10 hardening recommendations. The report also defines eight threats to the project with detailed attack scenarios and fix recommendations, as well as recommendations for future security hardening in LitmusChaos. The audit report emphasizes that LitmusChaos has well-implemented security efforts that reflect well on the function, build, and maintenance of the project.

KASLIN FIELDS: Google Cloud announced that NVIDIA L4 GPUs are now available on Cloud Run in preview. Cloud Run is a serverless execution environment for containerized applications in Google Cloud. The release of GPUs on Cloud Run opens the door to many new use cases for Cloud Run developers, such as performing real-time inference with lightweight open models, serving custom fine-tuned gen AI models, and speeding up existing compute-intensive Cloud Run services.

MOFI RAHMAN: A number of co-located CNCF-led events took place in Hong Kong between 21st and 23rd of August, including KubeCon, CloudNativeCon, Open Source Summit, and Open Source genAI and ML Summit China 2024. That's four events in one. The event featured a number of excellent talks, including a keynote from Linus Torvalds.

KASLIN FIELDS: The CNCF-hosted CoLo schedule for KubeCon CloudNativeCon North America 2024 is live. The co-located events will take place before the main schedule of KubeCon on Monday and Tuesday, November 11 and 12, 2024, in Salt Lake City, Utah.

MOFI RAHMAN: Kubernetes 1.31 became available in the Google Kubernetes Engine Rapid channel on August 20, just one week after its release. Information about the new features and deprecations in 1.31 on GKE can be found in the GKE release notes. And don't forget to check out our interview with the release lead to learn more about 1.31.

KASLIN FIELDS: Red Hat announced the general availability of Red Hat OpenStack Services on OpenShift. OpenShift is a container application platform designed for developers based on Kubernetes. OpenStack is an open source platform, which allows users to build and manage private or public clouds with their own pooled virtual resources, and is designed for system administrators.

OpenShift and OpenStack represent different approaches to the challenge of distributed computing. OpenStack is particularly popular in telecommunications use cases. This new offering provides a native way to run the virtual resource management tool OpenStack on top of the Kubernetes-based platform OpenShift.

MOFI RAHMAN: Broadcom hosted the VMware Explore conference in Las Vegas from August 26 to 29. Among the announcements at the event was the release of Tanzu Platform 10. Tanzu Platform 10 allows you to choose between Cloud Foundry and Kubernetes for your platform runtime, whether in public or private cloud environments. Among other new features, this change enables users to create more streamlined workflows for developers, especially those developing genAI-powered applications.

KASLIN FIELDS: The CNCF shared that since launching the Kubestronauts Program at KubeCon 2024 in Paris, over 500 Kubestronauts have joined the program. Each of these 500-plus Kubestronauts has active certifications in all of the CNCF Kubernetes certifications. That's the KCNA, the KCSA, the CKA, the CKAD, and the CKS.

And I'm not going to spell those out for you right now. Make sure you go look them up. If you pass all five of these certifications and have them all active at once, you, too, can become a Kubestronaut. You can find links to more information about the program in the show notes.

MOFI RAHMAN: The Dapr community will hold a Virtual Dapr Day event on October 16. Dapr stands for distributed application runtime, a free open source runtime system that helps developers build distributed applications. The community will be celebrating five years of Dapr. The schedule for the event is now live.

KASLIN FIELDS: And that's the news.

[MUSIC PLAYING]

MOFI RAHMAN: In this episode, we have Richard Liaw, who is a member of the product team at Anyscale. He was previously working on the Ray open source project, and is one of the authors of the original Ray paper. Richard, welcome to the podcast.

RICHARD LIAW: Thanks. Happy to be here.

MOFI RAHMAN: We also have Kai-Hsun, who is the maintainer of the KubeRay project. He's also a member of the Anyscale team. Welcome to the podcast, Kai-Hsun.

KAI-HSUN CHEN: Yeah. Hi. Hi, everyone.

MOFI RAHMAN: Richard, how did you get involved with the Ray project?

RICHARD LIAW: Back seven, eight years ago, the project was being started at UC Berkeley. And this was a period of time where everyone was very excited about reinforcement learning. And there was no simple system to build reinforcement learning applications at scale.

So the Ray project came out of the work that Robert and Philipp were doing on distributed deep learning and distributed reinforcement learning. And I was an undergrad, sort of a starting graduate student, at the time at the Berkeley RISELab. And I started building reinforcement learning applications and algorithms on top of Ray.

This led to me and other graduate students working on Ray Tune, which was a distributed hyperparameter tuning project, and Ray RLlib, which is a distributed reinforcement learning project. I spent a couple of years doing that. And then when we started Anyscale in 2019, I also joined, and eventually became the engineering manager for all distributed training solutions for Ray. And currently, I've moved on to work on other projects, including Ray Data and inference.

MOFI RAHMAN: The same question to Kai-Hsun. How did you get involved with the Ray project?

KAI-HSUN CHEN: Yeah, I think it is also a long story. At first, in my undergrad, I maintained an open source project, Apache Submarine. It is an ML platform built on the Hadoop ecosystem, on Hadoop YARN.

And then we observed the transition from Hadoop YARN to Kubernetes, and added support for Kubernetes. I also wrote an industry paper, and along the way I found the Ray open source project and cited its paper. And I found it pretty interesting.

So when I came to the US, I joined Anyscale to work on KubeRay. And then I worked on the Ray Core stuff. Yeah, and currently I've put more emphasis on the Ray Core side, on the distributed training. Yeah.

MOFI RAHMAN: So we already mentioned the word "Ray" a few times. But for our listeners who probably are not familiar, if you had to describe Ray in a few sentences, what is Ray?

RICHARD LIAW: I'm happy to take that. So Ray is a compute engine for scaling AI workloads. And it offers a set of libraries, including libraries for training, serving, and data processing, all built using a Python front end and integrating very closely with the rest of the machine learning and AI ecosystem.

MOFI RAHMAN: And in this space, I think Kai-Hsun also mentioned something about Hadoop and Apache projects that exist in this space. And there are a bunch of other things like PyTorch and there are things like Spark and Airflow, all these other things that exist. Around the same time in 2019, there was also a project called Kubeflow that was doing a bunch of things in the Kubernetes space.

So with existing solutions that do similar things, or do things in the same space as AI/ML workloads, why Ray? Why was Ray created, and what problem was it solving at the time that other solutions were not?

KAI-HSUN CHEN: Got it. Got it. I think this is a pretty good question. I think the first point is that, before Ray, ML infrastructure was primarily built in a microservice architecture, after Kubernetes was introduced. For example, in Kubeflow, you need to have some systems for data processing and also some operators, one for TensorFlow, one for PyTorch.

I think those were recently combined into one training operator, and you still need something else to do serving. That makes sense for the control plane logic. But for the computation side, I think people prefer to use a monolithic computation runtime, and Ray is pretty good at that, because Ray is pretty versatile and general purpose.

And ML workloads are very versatile workloads, including data processing, training, tuning, and serving. And at each stage, you require a different kind of workload, a different kind of infrastructure requirement. For example, for serving, you require autoscaling and high availability. And for training, maybe you require scheduling.

You also need to support different kinds of accelerators and different kinds of workloads. Yeah. So the workloads change very fast. And people want to use a single runtime for all the workloads instead of using microservices for the different parts. Because if you use different parts, you need to use some YAML file or something to glue them together.

But with Ray, you can use a single Python file to cover it end to end. Yeah, so I think that's why people like Ray, because you iterate much faster, and it is very flexible and much more future proof. I think this is from my perspective, because I also maintained a project, Apache Submarine, which you can think of as the microservice architecture approach.

MOFI RAHMAN: Richard, would you like to add anything in that one?

RICHARD LIAW: Yeah. I think the developer productivity aspect is pretty big. In particular, a lot of data scientists and machine learning folks will use notebooks. And instead of writing machine learning pipelines in microservices and separate sort of containers, Ray allows you to do everything end to end in one development environment, especially with a notebook. Then you can chain data processing with training, with batch inference, and keep that all within the same developer context. And that's very powerful and makes building and iterating on machine learning pipelines much easier than before.

MOFI RAHMAN: My experience of using Ray has been similar: being able to write code on my local machine, where Ray just sets up a Ray Cluster on my local machine for me, and then taking that code somewhere else, to a Ray Cluster on some sort of distributed system or Kubernetes, seemed very intuitive.

So I just wanted to ask you whether that was a design decision from the get-go: making sure the people in the data science community who are used to notebooks don't have to move too far from their Python environment. Was that a decision made early on, or was it something that just organically happened?

RICHARD LIAW: So I think there are two parts. One is the part about developing locally and being able to scale into large-scale contexts without having a distributed systems PhD; that was one of the key design principles we wanted when building Ray. Now, when it comes to the notebook environment, that was a natural artifact of us wanting a good developer experience and making sure that we worked really well with the rest of the standard, single-process Python ecosystem. And so when we saw people start using Ray in notebooks, and we were actually giving tutorials, it became a very natural evolution of how we wanted the Ray experience to evolve.

Now, one thing to call out is, in Anyscale-- Anyscale is the managed Ray product-- the developer experience that we tend to advocate for our users is focused on VS Code and being able to develop scripts from scratch. And I think that's one of the things that probably marks the difference between a machine learning engineer versus a data scientist. And that's also from an aspect of what we target: people who are working more on the machine learning side and building machine learning pipelines, rather than data scientists doing large-scale numerical computations.
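For listeners who want to see what that local-to-cluster path looks like, here is a minimal sketch assuming a recent Ray release; the environment variable and numbers are just for illustration, and the same script is meant to run on a laptop or against a remote cluster.

```python
import ray

# With no arguments this starts a local Ray runtime on the laptop;
# if RAY_ADDRESS points at a remote cluster (for example one created
# by KubeRay), the very same script runs there instead.
ray.init()

@ray.remote
def square(x: int) -> int:
    # An ordinary Python function; the decorator lets Ray schedule each
    # call as a task on any worker in the cluster.
    return x * x

# Launch 100 tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(100)]
print(sum(ray.get(futures)))
```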

MOFI RAHMAN: Kind of going off from that, you mentioned machine learning engineers versus data scientists. In my work on the Kubernetes side of things, we actually deal with a lot of folks that are platform engineers. So from the Ray project point of view, if you had to define Ray's user on the other side, would you say the user base currently is mostly folks that are in the data science camp, or the machine learning engineer camp, or the platform engineering camp? Or do you see a nice distribution between the three?

RICHARD LIAW: I think there's actually somewhat of this undefined category that allows Ray to touch all of them at the same time. Ray is not only a machine learning and engineering tool. It's not only a platform engineering tool.

With KubeRay, we make it really easy for these platform engineers to deploy Ray. With Ray's actual core APIs, the machine learning engineers and the data scientists will use those. And when I say core APIs, I mean the libraries and the Ray Core primitives.

And we see that with the libraries, especially some of the more higher level libraries like Ray Tune, that's something that the data scientists will naturally have in their arsenal at this point. So it is a tool for a large swath of different people in common machine learning organizations at this point.

MOFI RAHMAN: Richard, you just mentioned KubeRay, and we have one of the maintainers of KubeRay here, Kai-Hsun. So would you like to add a little bit more understanding of what KubeRay is, why it exists, and why people should know more about KubeRay?

KAI-HSUN CHEN: I think, first, Ray is a very powerful tool for the computation. And I think Kubernetes is the most powerful tool for orchestration and deployment.

And KubeRay is a Kubernetes operator to deploy Ray on Kubernetes, so it unlocks the opportunity for users to integrate the Kubernetes ecosystem with Ray. For example, schedulers like Kueue, Volcano, and YuniKorn, observability tools like Prometheus, Grafana, and Fluent Bit, and load balancers like NGINX and Istio.

Yeah. So I think KubeRay is a great operator that unlocks those possibilities for users, because computation alone is not enough. You also need deployment and all of that. And the Kubernetes ecosystem, I think, is a very good ecosystem to enable users to actually productionize Ray.
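For a rough sense of what the operator consumes, here is a sketch that creates a minimal RayCluster custom resource with the Kubernetes Python client. The ray.io/v1 group and version, the image tag, and the field layout are assumptions and would need to match the KubeRay version actually installed in your cluster.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig for your cluster

# A deliberately minimal RayCluster manifest; real clusters usually add
# worker groups, resource requests, and autoscaling options.
ray_cluster = {
    "apiVersion": "ray.io/v1",  # assumption: KubeRay v1 CRDs are installed
    "kind": "RayCluster",
    "metadata": {"name": "demo-cluster"},
    "spec": {
        "headGroupSpec": {
            "rayStartParams": {},
            "template": {
                "spec": {
                    "containers": [{
                        "name": "ray-head",
                        "image": "rayproject/ray:2.9.0",  # placeholder tag
                    }]
                }
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1", namespace="default",
    plural="rayclusters", body=ray_cluster,
)
```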

MOFI RAHMAN: So Ray has a number of these libraries as part of Ray Core, as well as the Ray project itself. Is there any favorite that you want to tell our audience more about?

RICHARD LIAW: Yeah. I think the Ray Data project in particular is a really interesting sort of project that we're starting to focus more of our energy and resources around. It hasn't gotten a lot of public attention, but we've been doing a lot of work internally to make it better. And we've seen a lot of our customers and open source users really get excited by the abilities and capabilities that it brings.

In particular, Ray Data is like a data processing engine in some sense, but it's largely focused towards AI workloads. And what that means is being able to take big data and map it to your GPUs in a very effective, cost-efficient, performant, scalable way. So that means having really good integration of data preprocessing and feature preprocessing, and then being able to ingest that into the GPU very efficiently. That's something that other systems like Dask wouldn't be able to support.

And the other side is batch inference, being able to handle large models and scale out very, very quickly and handle all sorts of different data modalities. Audio, text, images, videos, and so on and so forth. And just feed that into those large neural networks that we have now. That's somewhat of a unique capability.
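To make the batch-inference idea concrete, here is a rough Ray Data sketch; the bucket path, the "text" column, and the stand-in model are made up, and keyword arguments like concurrency vary a bit between Ray versions.

```python
import numpy as np
import ray

ray.init()

# Path is a placeholder; Ray Data streams the files rather than
# loading everything into memory at once.
ds = ray.data.read_parquet("s3://my-bucket/prompts/")

class Predictor:
    def __init__(self):
        # Stand-in for loading a real model once per worker,
        # instead of once per record.
        self.model = lambda texts: np.array([len(t) for t in texts])

    def __call__(self, batch: dict) -> dict:
        batch["prediction"] = self.model(batch["text"])
        return batch

# Each batch flows from preprocessing into the "model"; for a real GPU
# model you would also pass num_gpus=1 so batches land on GPU workers.
predictions = ds.map_batches(Predictor, concurrency=2, batch_size=64)
predictions.show(5)
```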

MOFI RAHMAN: Same question to Kai-Hsun. What is your favorite Ray library that exists?

KAI-HSUN CHEN: I think Ray Serve provides a very flexible way to do multiplexing, because on the production side, we see a lot of users who need to use a lot of different models in a single graph. And I think Ray provides a very easy way to do that.

And I think the second thing is that, as we said, it is a monolithic runtime, and we see some very useful benefits from that. For example, in Ray, a single scheduling unit is a task or an actor, which is a function or a class, instead of a container.

So some of our users, at first, used several microservices to serve the model. Maybe one Golang microservice that sends to a Python microservice, and then sends to another Golang microservice. And then they used Ray Serve to combine all of them together, because they can avoid a lot of time spent on serialization and deserialization.

And they can share the resources between the stages. So I think they reduced their costs by 50%. There is a public blog post about this from Samsara. So I think this is a pretty cool use case for me.
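Here is a rough Ray Serve sketch of the pattern Kai-Hsun describes, with two model stages composed inside one Python application instead of separate microservices; the model logic is a stand-in, and the handle-call style varies slightly across Ray versions.

```python
from ray import serve

@serve.deployment
class Embedder:
    def embed(self, text: str) -> list:
        # Stand-in for a real embedding model.
        return [float(len(text))]

@serve.deployment
class Classifier:
    def __init__(self, embedder):
        # A handle to the other deployment: calls stay inside the Ray
        # cluster, so there is no per-request JSON hop between services.
        self.embedder = embedder

    async def __call__(self, request):
        text = (await request.json())["text"]
        vector = await self.embedder.embed.remote(text)
        return {"label": "long" if vector[0] > 5 else "short"}

# Compose the two stages into a single application and serve it.
app = Classifier.bind(Embedder.bind())
serve.run(app)
```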

MOFI RAHMAN: I think a bit of a personal, not gripe, but challenge for me is that a lot of the names in the Ray Serve and Ray Job space kind of overlap with the Kubernetes space, because Kubernetes has Services and it has Jobs. So there is a definition of RayJob in the KubeRay operator.

There is also a "Ray job," with a space, in Ray's own definitions. When folks go about speaking and learning about these things, what have you seen as the most challenging part of mapping all the Ray concepts into the world of Kubernetes?

KAI-HSUN CHEN: Oh, yeah. I think it is actually not easy to integrate Ray with Kubernetes, because, at first, Kubernetes was primarily designed for the microservice architecture. For example, the best practice is that each container only runs one single process.

But in Ray, we run multiple processes in a single container. Each Ray task and each Ray actor is also a process. So we have a lot of challenges around that. For example, because we run multiple processes, it can be very hard to integrate with some of the logging tools on the Kubernetes side.

In Kubernetes, a lot of tools read standard out and standard error, but Ray has multiple processes, so it needs to write its logs to files instead of standard out and standard error.

And I think on the other side, maybe the autoscaling. Typically, a lot of autoscalers in the Kubernetes world, like the HPA or Karpenter, assume that all the pods in a single Deployment or something like that are stateless. And you can use resource utilization, like CPU or memory utilization, to decide to scale a pod up or down, because it is stateless.

But in Ray, that is different. It splits a single application across multiple nodes. So you don't know which part of the application is running on which node. So it is possible that, if you use physical resource utilization to decide to scale down a pod, it may break the application.

So Ray needs to implement the autoscaler ourselves. So I think one of the challenges is that we need to make Ray more friendly for Kubernetes. I think we've become better and better. Yeah, but we still need some help from the Kubernetes ecosystem and community.
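A small sketch of the difference Kai-Hsun is pointing at: Ray tasks and actors declare logical resource requests, and Ray's own autoscaler sizes the cluster from those requests rather than from CPU or memory utilization. The numbers here are only for illustration.

```python
import ray

ray.init()

@ray.remote(num_cpus=2)
def preprocess(shard):
    # Declares a need for 2 logical CPUs. The Ray autoscaler looks at
    # these pending requests, not at CPU utilization, when it decides
    # to add worker nodes.
    return [x * 2 for x in shard]

@ray.remote(num_gpus=1)
class Trainer:
    # A stateful actor pinned to a GPU for its whole lifetime, which is
    # why scaling down its node based on utilization could break the job.
    def step(self, batch):
        return sum(batch)

# On a laptop without GPUs this actor simply stays pending; on an
# autoscaling cluster it is what triggers a GPU node to be added.
trainer = Trainer.remote()

shards = ray.get([preprocess.remote(list(range(10))) for _ in range(4)])
print(len(shards))
```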

MOFI RAHMAN: And this is a question, I think, for Richard. The Ray project started around 2019 as research out of universities, and over the next few years, as the industry itself evolved and changed, the Ray project also evolved and changed, now about five years into its existence. For the next few years of Ray, what are the big challenges the Ray team and the project itself are thinking about?

RICHARD LIAW: So this is for the next how many years, you said?

MOFI RAHMAN: Let's say in the next five. The first five is-- we're in the first five. And how about the next five?

RICHARD LIAW: Yeah. I think machine learning is moving so quickly that it's just like a constant reinvigoration of the project that needs to continue to happen. One big thing that Kai-Hsun is working on right now is this thing called accelerated DAGs, which allows us to program GPU clusters with any sort of accelerator and be able to write these efficient distributed programs that take advantage of GPU interconnects and open up the space for different applications that leverage GPUs in special ways.

For example, pipeline parallelism is something that the ecosystem and the machine learning community is starting to pick up and use more and more. But the abstractions for pipeline parallelism have been quite limited, both on the training and the serving side.

And so one of the things with Ray-accelerated DAGs is being able to compose and create these efficient, effective, and fault-tolerant pipeline-parallel training and inference pipelines very, very quickly. That's something that we've needed to adapt to, and it wasn't part of the original story around Ray.

I think the other thing that we're also seeing is the rise of multiple accelerators, TPUs being one, AMD being another, and Trainium and Inferentia being another. And for Ray to continue to be relevant as the default substrate for distributed computing, that's obviously something that we have to be aware of and be able to schedule over, be able to do memory management over, and so on and so forth. So that's part of the vision that we have: a substrate for managing resources for workloads.

MOFI RAHMAN: OK. I think-- yeah, I think that's a really great answer. The next question to the both of you is, what is one or two things about Ray that you want people to know, but have been finding that in the community, people have misconceptions about? And this is the place you can just set the record straight and let people know this is one thing they should know about Ray, if they had to.

KAI-HSUN CHEN: You mean something that I think users should know?

MOFI RAHMAN: Yeah. Like something the people you find in the community are either not understanding or have misconceptions about. Anything about Ray that it would be good for people to understand differently. If there is nothing and people are-- everybody is getting Ray exactly as folks intended, that's fantastic.

But oftentimes, people either think it's too difficult, or it is-- you need like a huge GPU cluster to run it, or it's hard to learn. Anything you think in the community, people are finding-- you are hearing over and over again about something, but you think that's a misconception?

KAI-HSUN CHEN: Got it. Got it.

RICHARD LIAW: Yeah. The only thing I would probably say is just I think more people should be aware of Ray Data. I think there's a lot of people who are doing batch inference, or trying to force their image processing pipelines to run with some really weird MPI setup or whatever. And Ray Data will probably make your life 100 times easier. But there hasn't been a lot of public discourse around it and I think that's something that I'd be excited to fix.

MOFI RAHMAN: Yeah. Hopefully this will get the conversation going and people are going to learn more about Ray Data. I have very limited experience with Ray Data, but from my experience, it has actually made a lot of sense. The functionality that is built into Ray Data just made a lot of sense compared to my knowledge of pandas and NumPy. Bringing some of that stuff over to Ray Data has been quite a breeze. So I'm a fan, personally.

RICHARD LIAW: Yeah. Yeah. So you're one of these guys who got to try out Ray Data again.

MOFI RAHMAN: And I think the last question I wanted to ask both of you is this: in your work over the last few years in the community, with people and possibly customers, is there a time where you've seen folks use Ray in a way that was surprising, or sometimes even eye-opening for the Ray team, where people in the community were using the project or the libraries in some way the original thinking didn't anticipate? Something you didn't think people would use it for?

RICHARD LIAW: Yeah. I'm happy to give a response here. And Kai-Hsun, you can add on as well. There was this one guy on Twitter.

I forget what his name was, but he connected a ton of MacBook Pros together, or MacBooks-- laptops together to create a Ray Cluster. And he was-- I forget exactly what he was doing, but I think maybe training a model.

But that was kind of mind blowing. And it blew up on Twitter as well. The odd thing, I mean, obviously Ray's a distributed framework, but it's usually distributed for like cloud data center servers. And this guy was connecting his laptop and whatever. So that was pretty cool.

KAI-HSUN CHEN: Yeah. I think maybe for me, one of the things I've found is some people using container-in-container: people want to launch a container inside the Kubernetes pod. For example, if they serve some model and they don't want to bake all the stuff into a single image, then when they want to launch that kind of model, they launch a container in a container with some additional image.

Yeah, so they don't need to bake everything into a single image. And yeah, I was very confused about that. But to be honest, some people use that. And it seems to work well.

MOFI RAHMAN: Yeah. A few months ago, it was the 10-year anniversary of Kubernetes as a project, and I had a chance to meet with a bunch of the original contributors and maintainers in that space and ask this question to a few people: can you think of a time or an example of people using Kubernetes in a way you initially did not expect?

And the answer I got overwhelmingly is 90% of the things people are doing on Kubernetes today, they did not think was an initial use case for Kubernetes. Kubernetes was initially created for stateless applications like web applications to scale massively on the cloud. But people found all sorts of interesting ways to use Kubernetes in many ways.

And over time, I think the project itself caught up with StatefulSet stuff, and storage, and now accelerators like GPUs and TPUs, and a bunch of other things. At the last KubeCon, there was a talk on the KubeCon schedule about using FPGAs as an operating unit on Kubernetes as well. So people are finding all sorts of cool, interesting ways to use these things.

And it's always exciting for me to hear these stories, like the one Richard mentioned about the MacBook Pro Ray Cluster. Richard, you mentioned accelerated DAG. Can one of you explain a little bit more what it is? It sounds interesting, but I don't think our listeners are going to be completely clear on what that does.

KAI-HSUN CHEN: I think maybe I can tackle that part, because I'm working on this accelerated DAG project. I think, currently, you can say Ray is a resource orchestrator. And accelerated DAG, first, reduces the system overhead that comes from Ray.

And then second, we want to do more accelerator-centric computation, like using protocols like NCCL and maybe RDMA in the future to reduce the system overhead. And we are targeting two kinds of workloads. The first one is large-scale inference and the second one is large-scale training. And I am primarily focusing on the large-scale training.

And I think accelerated DAG provides a very simple API for you to, for example, define your pipeline parallelism. And in my experience, there is a very popular pipeline parallelism strategy, the zero bubble pipeline parallelism.

It's from a research team in Singapore. And you can see their open source repository. They fork Megatron and then add a patch on it to implement the zero bubble pipeline parallelism.

And you can read the patch. Maybe it is a short patch, but it requires a lot of knowledge of Megatron to implement this patch. But with Ray-accelerated DAG, you just need maybe 100 lines of code in a single Python script that imports Ray, and then you can implement it.

You don't need to fork Megatron. Megatron is pretty powerful and does a lot of optimization. But accelerated DAG provides a very flexible way for you to define your distributed strategy.
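A rough sketch of what the accelerated DAG idea can look like in code. The graph-building part uses Ray's DAG API; the experimental_compile() call reflects the experimental accelerated-DAG entry point around the time of this episode, so treat the exact names as assumptions that may change between Ray releases.

```python
import ray
from ray.dag import InputNode

ray.init()

@ray.remote
class Stage:
    def forward(self, x):
        # Stand-in for one pipeline-parallel stage of a model.
        return x + 1

stage1, stage2 = Stage.remote(), Stage.remote()

# Build a static DAG: input -> stage1 -> stage2.
with InputNode() as inp:
    dag = stage2.forward.bind(stage1.forward.bind(inp))

# Classic, dynamically scheduled execution of the graph.
print(ray.get(dag.execute(1)))

# The accelerated / compiled path pre-allocates the communication
# channels between the actors (NCCL for GPU tensors) to cut per-call
# overhead; the method name is the experimental API and may change.
compiled = dag.experimental_compile()
print(ray.get(compiled.execute(1)))
```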

MOFI RAHMAN: OK. So I'm going to actually ping you later for the links that you mentioned. I think they're going to be really good to have in the show notes for people to have access to those. Richard, Kai-Hsun, thank you so much for spending this half an hour with me answering a lot of my questions about Ray. Hopefully people will get some interesting insight into the Ray and KubeRay project.

Before we finish, is there anything you would like to end on, any new things people should be excited about that are coming from either the Ray project or the KubeRay project, so we can finish on a hopeful, forward-looking note?

KAI-HSUN CHEN: Yeah. In the KubeRay community, we're recently focusing on several areas. I think the first one is long-running jobs. We are defining how to handle it if a job fails, and figuring out the best practice for how to do checkpointing. Yeah, and we collaborate very closely with the Google Kubernetes team.

Yeah. And on the other side, we are also focusing on support for heterogeneous computing resources. For example, we also collaborate with the Google Kubernetes team to support the TPU and the multihost TPU. And we support autoscaling for the multihost TPU. And I think this is maybe currently the only solution in the open source world that supports these different kinds of resources and supports multihost autoscaling.

And the third one is that we are also currently building out the ecosystem for usability and security. I think we are currently building something like an authentication solution on the KubeRay side. And we are also building an easy-to-use kubectl plugin for users. And then we are also figuring out upgrade mechanisms for serving recently. Yeah, so there is a lot of stuff still on the roadmap. So I think it's pretty exciting, and the community is becoming more and more popular.

MOFI RAHMAN: On that exciting note, thank you so much, Richard and Kai-Hsun for joining in this episode of "Kubernetes Podcast." And the social accounts for both Kai-Hsun and Richard will be in the show notes. So people can go follow and ask probably more questions about Ray and KubeRay. Thank you so much.

RICHARD LIAW: Thank you. Bye bye.

KASLIN FIELDS: Thank you very much, Mofi, for that interview with Richard and Kai-Hsun from Anyscale about Ray. I've been really looking forward to learning about Ray, and I'm excited about what I learned. But first off, welcome back to the show, Mofi.

MOFI RAHMAN: Hey. I made it back again.

KASLIN FIELDS: Whew! In his day-to-day work, Mofi is the primary person on our team focused on AI stuff. So he has been kind of deeper into Ray and AI use cases on Kubernetes than anyone else on our team.

So I was excited to have him do this interview. Since you have more context in this area, you're able to dive a little bit deeper into it. I know, as I was listening through the episode, I heard you asking some questions about things that you've seen, and so I really appreciated that perspective.

MOFI RAHMAN: Yeah. I think from our point of view, Ray is one of the fundamental building blocks that we think we, along with customers and other people in this space, will be using for building out ML platforms. So in our work with customers, as well as folks from the community, when we're thinking about an ML platform, it is a fairly difficult thing to describe for people, because ML platform means very different things to different people.

But the fundamental things that people need from not just an ML platform, but any developer platform, are things like multitenancy, getting resources when you need them, and some version of self-service. And in many cases, with cloud providers like GCP and others, you have some system where people can do self-service. But for most organizations, that permission system is probably too open.

So they end up building some level of guardrails around it to control or limit the amount of knowledge needed to do the day-to-day tasks for the developers, data scientists, and machine learning engineers. So people build out different levels of abstraction for their folks to have access to resources in some way. And when you're doing all of that, having some sort of way to define your workload that doesn't need to change across all of these boundaries you cross is where Ray actually becomes really useful. You can define your workload in Ray. You could run that workload from your local machine, as well as run it in a distributed manner at massive, massive scale.

I think that's one of the things I also talked to Richard and Kai-Hsun about, is that the model of Ray seems to fit very nicely in the world of I want to do something locally, do some experimentation, but then take it out to a distributed computer and scale it out to, basically, as big as I need to.

KASLIN FIELDS: So I've been trying to learn about Ray for a while. It's been on my list for actual work stuff that I need to do for some time. And I feel like this conversation has finally got me kind of near a baseline at least.

So I like the way that you laid that out. The primary use case that at least we're really focused on, in terms of how Ray relates to Kubernetes, is all about enabling this kind of workflow where engineers and data scientists can create their AI stuff with Ray, and then you can run Ray on Kubernetes and productionize it easily. So you've got this path from your local machine to production. I think that makes a lot of sense.

MOFI RAHMAN: Yeah. And it's also a really interesting conversation with someone like Richard, who has been with the project from a really early stage, when Ray was initially almost like a library for doing some data science work. I think Ray RLlib and Ray Tune were the first two components of Ray.

So one of the things that is interesting about Ray as well is that the end-to-end pieces are all part of Ray. So they have Ray Data, Ray Core, Ray Serve, Ray Tune. Ray Cluster is also a concept for building out the cluster.

We spoke about a couple of other projects in this space. Airflow is one, Kubeflow exists in this space, and Spark also exists in this space.

And one of the key challenges a lot of folks have with these things is that you build and locally test things one way, and then you have to convert them to something else, turn them into jobs or containers, to run them on Kubeflow and other systems. With Ray, you could have this end-to-end flow all under the Ray umbrella.

In terms of making sense of your problem and understanding the problem space, once you have done the initial investment of going all in on Ray, I think you have a really nice way to wrap and map all your problems, or most of your data science problems in the world of Ray. So once you have done the initial investment, it kind of becomes like, OK, it's a gradual progress from that point on.

KASLIN FIELDS: And I didn't know you all-- well, I guess Richard mentioned in his intro that Ray came from UC Berkeley. It was like a university project that kind of developed into a whole product.

MOFI RAHMAN: I think-- yeah. I think Richard at the time was doing research there, like post-grad research, and he was working on distributed systems. And when you're talking about distributed systems, doing it manually or doing it in more complicated ways-- that's where Ray came from. And this is what I love about projects that come from trying to solve a very specific use case, and then they find out it actually maps to a lot of other use cases.

KASLIN FIELDS: Surprise.

MOFI RAHMAN: I mean, one of the biggest examples of that is probably Kubernetes. Kubernetes initially was trying to solve the problem of running stateless distributed applications in the cloud, on commodity hardware. At Google with Borg, the engineers were doing this, and they wanted to figure out, how do I make sure this just works across any kind of hardware?

But now Kubernetes-- kind of an anecdotal story. At the 10-year celebration a few months ago, at the last KubeCon, I ran into Tim, Tim Hockin. And I was asking, of the things people are using Kubernetes for, did any of you initially think people would be using it for any of this?

KASLIN FIELDS: Yeah.

MOFI RAHMAN: And Tim told me 90% of the things people are doing with Kubernetes now were not what it was initially designed for. AI workloads, running databases, stateful workloads. These were not in the initial set of things Kubernetes was designed to do.

But Kubernetes was built with a strong sense of, we're going to solve this problem. And people found out that other problems also map pretty nicely onto a distributed system like this. So Ray has a very different, but also similar, story to that, where they were trying to solve a very specific use case for their distributed workloads, but it turns out a lot of machine learning and AI/ML workloads map to a similar paradigm.

KASLIN FIELDS: Which is awesome. The generalization of the usage of distributed systems-- distributed systems is just such a common problem of I have a lot of hardware and I want to run stuff on it. So it makes a lot of sense that these things get kind of generalized out.

And something with Ray that I'm kind of trying to wrap my head around. So Ray has all of these different components like you were mentioning, Ray Data, Ray Core, Ray accelerated DAG. And so it has all these different components.

I know that there's an API component to it. It's not a language in itself. So is it kind of like an open source API standard with all of these modular components in different areas, or?

MOFI RAHMAN: So most of them are just Python libraries.

KASLIN FIELDS: Python libraries, right.

MOFI RAHMAN: So if you are using Ray Data, you would be closer to something like NumPy or Pandas, which have similar functionality. And this is also anecdotal, from speaking to someone and from the little bit of personal tinkering I've done with Ray Data. They have taken a lot of inspiration and learning from other existing libraries in the space and tried to simplify the things people found challenging in those libraries, like understanding and mapping machine learning and data science concepts into code.

So if you're coming from the world of Pandas, or scikit, or NumPy and trying to map your knowledge into the world of Ray and Ray Data, I think you'd have a pretty nice time. Most of the things work as you'd expect them to work for the most part. So that's another reason I think a lot of folks do like the Ray libraries: their existing knowledge of data science concepts almost always maps fairly easily onto the world of Ray.
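As a small illustration of that mapping, here is a sketch (column names made up) of handing an existing pandas DataFrame to Ray Data and keeping plain pandas code inside each batch.

```python
import pandas as pd
import ray

ray.init()

df = pd.DataFrame({"user": ["a", "b", "c"], "clicks": [3, 7, 1]})

# Wrap the in-memory DataFrame as a distributed Dataset.
ds = ray.data.from_pandas(df)

def add_score(batch: pd.DataFrame) -> pd.DataFrame:
    # batch_format="pandas" hands each batch back as a DataFrame,
    # so ordinary pandas code keeps working unchanged.
    batch["score"] = batch["clicks"] * 2
    return batch

ds.map_batches(add_score, batch_format="pandas").show()
```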

KASLIN FIELDS: So if it's essentially a set of Python libraries, what benefit do you get from running the KubeRay operator on Kubernetes?

MOFI RAHMAN: So initially, when Ray started, it was mostly libraries. But now Ray has evolved to the point where it is the libraries, but also the execution engine underneath. So in terms of the workload that runs within a Ray Serve or Ray Job, which are the constructs they have for running the workload itself, that doesn't necessarily require you to use any of the Ray Core libraries.

You could continue using your NumPy code. You could continue using your scikit-learn and Pandas code. The way the Ray infrastructure layer works is that you add some annotations, and that lets the Ray engine understand that this thing has to be spread remotely across a bunch of computers.

So if you look through the Ray documentation, you're going to see a bunch of these annotations, like Ray tasks, or ray.remote, and things of that nature. So you can define any Python function as a Ray task, or a Python class as a Ray actor. And that task or actor can then be spread across-- you can distribute it across any number of remote Ray runtimes that exist.

So you can actually almost think Ray as broken down into two major parts. One is the library part itself, where all this data science and machine learning code and library exist. The other one, the infrastructure layer.

When Ray first started, it was mostly on the library side. They did not really think about the infrastructure. But as Ray grew, the need for more of an infrastructure solution became more and more apparent.

So the KubeRay part of it is the operator that installs RayJob, RayService, and RayCluster as constructs in Kubernetes. And on top of that, it can run any type of Python code. But Ray also comes with a bunch of these existing libraries, like Ray Data, Ray Tune, and the Ray reinforcement learning library, on top of Ray Core itself.
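A small sketch of the annotations being described here: decorating a function turns each call into a Ray task, and decorating a class turns it into a stateful Ray actor, while the rest of the Python code stays as it was.

```python
import ray

ray.init()

@ray.remote
def tokenize(text: str) -> list:
    # A plain function; with the decorator each call becomes a Ray task
    # that can run on any worker in the cluster.
    return text.split()

@ray.remote
class Counter:
    # A decorated class becomes an actor: a long-lived worker process
    # that holds state between calls.
    def __init__(self):
        self.total = 0

    def add(self, n: int) -> int:
        self.total += n
        return self.total

counter = Counter.remote()
words = ray.get(tokenize.remote("ray tasks and actors"))
print(ray.get(counter.add.remote(len(words))))  # 4
```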

KASLIN FIELDS: And we've talked on the show before about some of the limitations of similar concepts in Kubernetes. You mentioned during the interview that there are a number of words that are kind of shared between Ray and Kubernetes, and that could get a little bit confusing, like the concept of Ray Jobs and the Tasks and things like that.

MOFI RAHMAN: It's not just Ray and Kubernetes. In general, the concept of a job is such a ubiquitous term. It exists in the world of HPC, and it exists everywhere in general. So when you say that something is a job--

KASLIN FIELDS: Yes.

MOFI RAHMAN: --it can mean so many different things. So one funny example-- we actually published a video, we can add it to the show notes-- is that in Kubernetes, there is Job. And when you have a RayJob in Kubernetes, it's "RayJob" without a space that is the Kubernetes resource definition. And then inside Ray, they have a concept of a job, which is spelled "Ray job" with a space, and which means the execution of a bit of Ray code. So when you read it out loud--

RayJob versus Ray job, it just becomes very complicated to think about. But the easiest way to think about it is that the Kubernetes one, RayJob, is the Ray job-- the Ray code-- plus a submitter that submits the code on your behalf.

Together they become RayJob, without the space. As I'm saying it for the third time, I feel like I'm getting confused by what I'm saying. But the one without the space is the Kubernetes resource definition that defines the Ray code plus something that submits the code on your behalf to a Ray Cluster. And a Ray Cluster is a bunch of resources bundled together, which will be used to run your code, if that makes sense.
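To make the "Ray job with a space" half of that concrete: it is just a bit of Ray code submitted to an already-running cluster, which the Ray SDK can do from Python (the address, script name, and dependencies below are placeholders). The RayJob custom resource, without the space, is what creates the cluster and performs this submission step for you on Kubernetes.

```python
from ray.job_submission import JobSubmissionClient

# Point at the Ray dashboard / job server of an existing cluster; on
# KubeRay this is typically the head service on port 8265.
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python train.py",                      # placeholder script
    runtime_env={"working_dir": "./", "pip": ["numpy"]},
)
print(client.get_job_status(job_id))
```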

KASLIN FIELDS: This is a challenge with abstractions and layers of abstraction. You're taking a lot of the same concepts and abstracting them at different levels. And so they kind of have similar names. That makes sense, though, at least that that would happen. Whether or not I can wrap my head around using the right ones at the right times, we will see.

Awesome. So another thing that I really liked about your interview is that especially Kai-Hsun and you both mentioned a lot of different tools in the same spaces, all at the same time. So there was a section, for example, where Kai-Hsun mentioned different schedulers, so Kueue and Volcano, and Apache YuniKorn, and observability tools and load balancers.

And I liked that that gave this view of this isn't something that's going to exist in a vacuum, obviously. It's part of a distributed system and so it's going to require all of these other components. You also mentioned about the benefits of being tied into Kubernetes. One of the benefits is that you get access to the whole Kubernetes ecosystem because there's already so much stuff there that you can tie together with Kubernetes to build your whole platform that you really need.

So Ray is one component of this. It's doing the custom resource definition. And it's also a tool for developers, I guess.

MOFI RAHMAN: Yeah. So if you were to open up the Ray docs and go to the overview, they actually have a pretty nice picture that lays out the relationship between all the parts of Ray. At the very top layer are all the Ray AI libraries, which are Ray Data, Ray Train, Ray Tune, Ray Serve, Ray RLlib; all of them exist as libraries.

Those are interchangeable. You could use the Ray versions of those libraries or you can find any other Python code that would do the work. Underneath is the Ray Core, which is the actors, your remotes, all the ways you can basically define how Ray would distribute the workload for you. All of them are usually defined via some sort of annotations in your Python code.

So with most Ray code, if you just have those annotations, your code would still just work as Python functions. But the moment you start running them in Ray, Ray understands the annotations and runs the code a little bit differently. So that's kind of, I don't know, an elevator pitch of how Ray does things.

And all of the Ray Core stuff then eventually runs on some sort of infrastructure. That could be Anyscale's own service. It could be Kubernetes. It could be running on a VM. It could be in your own data center on bare metal. But Ray Core is the code that runs wherever compute is available.

Now, one of the things missing in this is that Ray itself does not necessarily have a lot of consideration or solutions built in for things like observability, logging, or scheduling out to Kubernetes, which I think is a good thing. Because if Ray were to solve all of these, it would be a very locked-in system. But falling back onto things like Kueue, YuniKorn, and Volcano makes Ray a lot more composable and means people can build things out. It's the composability aspect of Ray, which I really like.

Another project in this space that solves a bunch of these AI and ML-related problems, and that started a while ago too, is Kubeflow. It comes with a lot of the pieces bundled in, with all the glue connecting them to other things. So Kubeflow is more like a framework where all the moving pieces are under Kubeflow.

And on the other hand, Ray's approach is more that Ray is a piece that you plug into your system, and you collect the metrics that you want, collect the logs that you want. And the challenge with that, then, is you have to maintain your own metrics system, your own log system, and also security.

Ray does not have a concept of authentication, OAuth, or SSL, all this other stuff in the Ray Cluster. You have to wrap it yourself using your systems. So it's a bit more work, but makes it more composable.

KASLIN FIELDS: Awesome. So I think I've got a good baseline now, hopefully, for understanding Ray a little bit better. I hope to dive into it more in the next week, because I need to do some work on that. So I hope all of you out there enjoyed learning about Ray and will check it out. Any last words that you want to say about Ray, Mofi? I feel like I'm doing like a second interview here because I wanted to learn from you about this.

MOFI RAHMAN: Yeah, so there are a few big aspects to Ray, and Ray is different depending on who is using it. If you are someone who is very deep into the AI workloads themselves, you are probably very deep into the Ray libraries. Versus me-- personally, I'm probably more interested in Ray Core and the distributed aspect of it.

So even when two people are talking about Ray, oftentimes we're not talking about the exact same thing, because Ray has multiple facets and different solutions. But when you're talking about building an ML platform, which a lot of folks are currently thinking about, I think the Ray Core part of it becomes one of the pieces that you can use to distribute your job.

And on GKE, they recently also announced the Ray add-on that automatically installs KubeRay on your cluster. And so with all of this, it becomes a little bit easier to get from Kubernetes to Ray. The glue point, the connection point, becomes a little bit easier.

KASLIN FIELDS: So you mentioned-- we've talked a lot here about the pathway from local development to Kubernetes clusters, and also mentioned that there's several different personas here. So for those running Kubernetes clusters, if you're serving data scientists who might be using Ray to do data science things, then you might want to consider looking into how to get KubeRay on your clusters and see if your data scientists can benefit from that.

And then from the other side, if you are a data scientist or some developer who is working on AI workloads, you might look into the Ray libraries to see if they're useful. Does that cover, do you think, some of the primary personas?

MOFI RAHMAN: It definitely does. I think, yeah. You have basically about three main ones. You have the Ray Cluster one, which probably we are more closer to as GKE developer advocates. And then you have the machine learning engineers, data scientists that are doing the work.

And in the middle, the glue is that you take your Python functions and convert them into distributed applications, and then they get to run. So you have three main parts to it. And Ray makes it fairly easy with just simple annotations that you put in. Oftentimes, the default decisions that Ray takes are almost always fairly well optimized for you, so you don't have to do too much work figuring out the right tuning of the distribution of your application.

KASLIN FIELDS: Excellent. And very last thing before we stop for this episode. Ray Summit is coming up. So there's going to be an event held in San Francisco focused on Ray.

MOFI RAHMAN: Yeah. I think it's a fair-- a really growing project with a lot of cool things coming up. In our chat, Kai-Hsun-- both Richard and Kai-Hsun mentioned new things that are in the roadmap for Ray, both for KubeRay side of things and the Ray libraries. And I'm excited to see where the project goes. It's about five years-ish old and it's only growing.

KASLIN FIELDS: So if you're in the San Francisco area and want to learn more about Ray, you might consider checking out Ray Summit, which is at the very end of September, beginning of October. Like, September 28 to October 2 or something like that. We'll make sure to have the dates and links in the show notes. Thank you so much for that interview, Mofi. I'm really glad that I now know something about Ray.

MOFI RAHMAN: Thanks for having me.

[MUSIC PLAYING]

That brings us to the end of another episode. If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at @KubernetesPod, or reach us by email at KubernetesPodcast@Google.com.

You can also check out the website at KubernetesPodcast.com, where you will find transcripts and show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening and we'll see you next time.

[MUSIC PLAYING]