#92 February 25, 2020

Accelerators and GPUs at NVIDIA, with Pramod Ramarao

Hosts: Craig Box, Adam Glick

GPUs do more than move shapes on a gamer’s screen - they increasingly move self-driving cars and 5G packets, running on Kubernetes. Pramod Ramarao is a Product Manager at NVIDIA, and joins your hosts to talk about accelerators, containers, drivers, machine learning and more.

Do you have something cool to share? Some questions? Let us know:

Chatter of the week

News of the week

CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm Craig Box.

ADAM GLICK: And I'm Adam Glick.

[MUSIC PLAYING]

ADAM GLICK: It's been an interesting week in technology for me. I've been trying to set up a network printer here at home. And what I realized is that setting up network printing, especially if you're trying to wire it in but don't have a point to plug it into in the room, is unreasonably hard for some reason, while USB is fantastically simple. So my solution, and what I'd recommend to everyone: just plug that USB cable into your computer and forget the rest of it.

[LAUGHS]

CRAIG BOX: What size or flavor of a USB port does your printer use?

ADAM GLICK: It uses the standard-- I think it's called the B. Technically, it's a USB Type-B.

CRAIG BOX: The fact that you have to think about that-- I don't think USB actually is that easy these days. We have DisplayPort over Thunderbolt 3 over whatever, and it's all on the same connector. It used to be true that USB was a fantastic little square thing, and now there are just so many different things that I'd actually say we should all go back to the 25-pin parallel port.

ADAM GLICK: [LAUGHS]

CRAIG BOX: You would never go wrong with one of them. And if you want it on the network, then you get yourself one of those little JetDirect boxes, and plug that parallel port in, and plug an ethernet cable out the other side here.

ADAM GLICK: There you go.

CRAIG BOX: I think we're showing our age here, Adam.

ADAM GLICK: That's hardware hacking, circa 1995.

[LAUGHS]

CRAIG BOX: Aside from technology, I hear you've been doing a little bit of home renovation?

ADAM GLICK: Sometimes, people refer to networking as plumbing, and I was doing a little actual plumbing. We had a leaky faucet. And all I could think while I was going in and fixing the faucet was, we've come up with software-defined networking. If we could come up with software-defined plumbing, I would totally write the YAML to fix that. It would be much easier.

CRAIG BOX: I stayed with a friend last year, and they had something which I believe came shipped from AliExpress in China. But it is a faucet where you turn it on, and it's got a ring of LED lights around it. So it will be blue when it's cold, green when it's at the right temperature, and red when it's hot, to tell you. And I think, A, that's amazing. B, I have no idea where the electricity came from.

ADAM GLICK: There's no battery in it?

CRAIG BOX: Well, I don't know. It could have been battery powered, it could have been mains powered. Maybe it was just powered by the pressure of the water running through it. But no discernible way to tell.

ADAM GLICK: I was about to say, what if they just put a little flow turbine in there, and you could drive a couple of LEDs off of that. That would be amazing. I would love that if that were actually what they did.

CRAIG BOX: And there will have to be some software in that somewhere-- a little microcontroller controlling those LEDs. So I put it to you that it is software defined plumbing.

ADAM GLICK: Fair enough. Shall we get to the news?

CRAIG BOX: Let's get to the news.

[MUSIC PLAYING]

ADAM GLICK: Google Cloud has launched Application Manager for Anthos GKE in beta. Application Manager lets you create a dev to production app delivery flow incorporating Google's best practices for managing release configurations. The new add-on embraces GitOps principles, and uses kustomize, with a "K," to allow you to create declarative configurations that you can audit and review using Git. A demo video and tutorial on using the new application manager are provided with the release.

CRAIG BOX: Two new Anthos GKE features are also generally available this week. Node pool location allows GKE users to specify the location for node pools independently from the location of clusters. Surge upgrades allows users to specify the number of extra, or surge, nodes and the number of unavailable nodes accepted during an upgrade, improving reliability and reducing disruption to customer workloads. Surge upgrades is becoming the default upgrade method for new node pools starting in March.

ADAM GLICK: Google Cloud last week launched its Anthos Ready storage qualification, a new designation for storage technology that is qualified to work with Anthos, Google Cloud's Kubernetes- and Istio-based product running on-premises. Partners include Dell EMC, HPE, NetApp, Portworx, Pure Storage and Robin.io.

CRAIG BOX: Banzai Cloud has released Supertubes, their version of Kafka running on Kubernetes and Istio. Supertubes is a commercial product built on Banzai's open source Kafka and Istio operators. Recent posts on their blog focus on the DI capabilities as well as the Istio-powered benefits, such as securing communications with mTLS using the speedy Envoy instead of slow, old Java. Longtime listeners will remember from episode 59 that the company's projects are all named after surf spots, and Supertubes is in Jeffreys Bay, South Africa. Banzai Cloud has also added Istio support to their Vault secret injection webhook for Kubernetes.

ADAM GLICK: StackRox has released its State of Container and Kubernetes Security report. The new report points out that security is the top concern for those surveyed. Only 6% of the respondents have avoided a security issue in the past year, and misconfiguration is the top reason for those security incidents.

In addition, use of non-Kubernetes orchestrators and self-managed Kubernetes clusters dropped, while almost every other kind of Kubernetes grew. GKE topped the growth charts with a whopping 75% growth in the past six months. Multicloud use also grew while hybrid use actually shrank. The report has a lot of other interesting data points, but this is another clear sign that the Kubernetes world is still evolving very rapidly, and that the only constant is change.

CRAIG BOX: The eBPF-based network plugin Cilium has released version 1.7. Having listened to our show last week and fallen madly in love with eBPF, you'll be interested in features such as a full replacement of kube-proxy, including direct server return, cluster-wide policies, and a new UI for the Hubble observability platform.

ADAM GLICK: Convox, an application management tool for Kubernetes applications, has announced their multicloud offering. They now offer support for Google Cloud, AWS, Azure, and DigitalOcean. Convox points out that their multicloud offering was driven by customers looking for cost optimization, redundancy, and the ability to run workloads in clouds that are best for a specific use case, giving the example of machine learning on Google Cloud.

CRAIG BOX: Damian Peckett, AKA fsyncd, is an overachiever when doing coding challenges. Asked to scale a Kubernetes deployment based on a custom metric, he didn't just reach for the built-in autoscaler, but instead wrote a whole new one in Rust. He has released it as Pangolin, an open source horizontal autoscaler based on Prometheus metrics, with options for a number of configurable control strategies. Pangolin currently implements bang-bang control theory, which is unrelated to the popular Chinese chicken dish.

ADAM GLICK: Dell EMC has announced that they are shipping a new data center in a box called the Modular Data Center Micro 415. The new box is for the streaming data use case, but it's interesting to note that the architecture has Flink running on Kubernetes at its center. It's great to see Kubernetes become a core component embedded in systems, because it's when Kubernetes gets boring that you know it has become core and trusted technology.

CRAIG BOX: Do you think the Hooli Box runs Kubernetes?

ADAM GLICK: [LAUGHS]

CRAIG BOX: Platform9, guests on show 88, have announced that they will now be distributed by Promark. Promark is a reseller, not to be confused with cheap and cheerful UK fashion brand Primark. The deal brings Platform9's managed Kubernetes and OpenStack distributions to market in the US through an additional channel. Though unfortunately, not to the British high street.

ADAM GLICK: Google Cloud has posted guidance on running multi-tenant workloads in Kubernetes. They acknowledge that this can be challenging, and provide both guidance and features to help with this work in Anthos GKE. The blog calls out defense-in-depth strategies, and announces the GA of several features to aid in this work.

The features called out are GKE Sandbox, an implementation of gVisor isolation, which is now GA; Shielded GKE Nodes, which ensure that you are running on an authorized node with rootkit and bootkit protections, also now GA; and finally Workload Identity, a way for your application to authenticate to other Google Cloud services, which is listed as going GA soon.

CRAIG BOX: Andrew Albright has posted a four-part screencast series of his first ever contribution to Minikube. He cites this very Kubernetes podcast as inspiration for publishing these videos and further merged contributions. He can even conspicuously be seen listening to us in the first video. So congratulations, Andrew, you've more than earned your shoutout.

ADAM GLICK: Finally, are you going to KubeCon EU in March? If so, you should consider attending the Kubernetes contributor summit. There are tracks for both beginners looking to learn how to make their first contribution as well as for veteran contributors. Registration is now open, and the schedule has been posted online. If you haven't signed up for KubeCon yet, you can get 15% off with our registration discount code that you can find in the show notes.

CRAIG BOX: And that's the news.

[MUSIC PLAYING]

ADAM GLICK: Pramod Ramarao is a product manager with NVIDIA, focusing on accelerated computing. He leads product management for the CUDA platform and data center software, including container technologies. Welcome to the show, Pramod.

PRAMOD RAMARAO: Thanks for having me, and I'm super excited to be here on the show.

ADAM GLICK: For those that may be unaware, who is NVIDIA?

PRAMOD RAMARAO: NVIDIA is an AI computing company, historically focused on accelerating graphics. But over the last few years, we have focused a lot on accelerating artificial intelligence and scientific workloads as well. So we like to position ourselves at the intersection of graphics and computing.

ADAM GLICK: Interesting. I normally think of NVIDIA as the company that's made the video cards in my home gaming rig for the past decade plus, and more specifically, the GPU chip that sits at the heart of those cards. What is a GPU?

PRAMOD RAMARAO: That's usually a question that we hear from a lot of people. NVIDIA invented the GPU. So the GPU stands for Graphics Processing Unit. So many years ago when NVIDIA was founded, we found a need for accelerating graphics.

So for example, games that you play on your PCs-- all the game physics and the visualizations are offloaded onto the GPU for processing, which makes your games much faster. You get faster frame rates, more effective gameplay essentially. And a few years ago, because the GPU is so good at computing, we essentially realized that it has so many cores on it and it's very good for parallel processing.

So there was a big effort from NVIDIA to see if we could use GPUs instead to solve more scientific computing problems, where there's a very big need for computing power. And these are problems that you find in traditional science, such as molecular dynamics, quantum chemistry, and physics simulations. So we basically repurposed the GPU, if I can say that, to actually solve scientific problems. And we made a lot of good headway into the scientific community, and we helped a lot of the national labs and supercomputing sites around the world to actually build supercomputers using NVIDIA GPUs.

ADAM GLICK: Traditionally, people think of the CPU as kind of the heart of the computer, the thing that does a lot of the computing. You're talking about a GPU doing that. How are those two different, and why would you use one versus the other?

PRAMOD RAMARAO: CPUs have been traditionally very good at serial processing. So these are very good for interactive tasks that you would perform on your system-- for example, if you're running a desktop, or you're writing in a word processing application, or using Microsoft Excel.

A CPU is very good for tasks that require sequential computing, and kind of interactive communication between the user and the system. Whereas the GPU is more of a throughput machine. Because it has so many processors on the GPU, it's very good for throughput where you are actually asking the GPU to perform a heavy set of computing tasks.

Now if you only used your computer for doing serial processing tasks, then a GPU isn't very useful. But if you're using a computer to actually do very heavy computing or mathematical calculations, that's the value proposition of the GPU. So a lot of books that I have read have this very good analogy where they compare the CPU to a car, but the GPU to a tractor trailer which is basically focused on throughput computing.

ADAM GLICK: I'm going to date myself a little bit here, but I remember the days back when you could buy a math coprocessor for your computer, and you used to have your computer with its CPU. And then sometimes it would be built in, sometimes you'd buy it separately-- there'd be a separate math coprocessor. Is this essentially the evolution of that?

PRAMOD RAMARAO: Yeah. So Adam, that's a very good point. I think you can think about it as essentially a coprocessor to the system, where you're kind of offloading all the computation tasks. So this keeps the CPU free to do other activities-- for example, IO processing or network processing.

Whereas you're kind of offloading the heavy computation onto an accelerator such as the GPU. So it's kind of similar, but a very different evolution. Whereas the math coprocessor was very specific to certain tasks, but you can pretty much think of the GPU as doing more general purpose computing.

ADAM GLICK: Kubernetes is typically a server technology-- like, it's meant to run on servers, people run services. Sometimes, people say it's a platform for platforms, which is very different than kind of how I think of the GPUs which, traditionally, have been end-user computing, gaming, beefing up the graphics for things. You're talking a little bit more here about computational workloads and parallel computing. Is that what you're focusing on with the work from NVIDIA with GPUs?

PRAMOD RAMARAO: Basically, we have many different businesses within NVIDIA. I mean, graphics is still a core function of the company. As I mentioned, a few years ago, we kind of repurposed the GPU to do more traditional parallel computing for scientific use cases and other use cases that we have been targeting GPUs for.

So one of our core businesses is also in the data center, where we're actually putting GPUs to accelerate servers. So you have servers that have GPUs in them for parallel processing. And as we see more data centers move towards using servers with GPUs, you can actually think about running workloads using Kubernetes, which is very common in the data center for orchestrating different types of workloads.

ADAM GLICK: You mentioned general purpose, and I've sometimes seen people refer to general purpose computing on GPUs. How do programmers write for those kinds of things?

PRAMOD RAMARAO: So along with repurposing the GPUs for general purpose computing, NVIDIA actually built a platform that we call CUDA, which is our general purpose computing platform. So CUDA is our platform for programming the GPUs for compute. And the way to think about it is that CUDA is based on C++ and has very minimal extensions to C++. So you can think about a programmer just writing code as they would in C++, and then we provide a rich developer environment, with compiler toolchains, debuggers, and profilers, that allows developers to just write in CUDA C++ and have their applications run on the GPUs.

ADAM GLICK: Is that an add-on that people use in their development environment? Is it a different development environment, a different compiler? How do people actually put that together if they're building that application?

PRAMOD RAMARAO: We provide an SDK for CUDA developers. So the way you would do it is you would write your program in CUDA C++, and then you would use our compiler toolchain to actually build your applications. And then once you have your executables, you would run them. We also provide system software along with the developer environment, and you would essentially use that system software to run your applications. And that's kind of transparent, because the system software takes care of moving the data as well as moving the execution onto the GPU itself.
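
To make that workflow concrete, here is a minimal sketch of what such a CUDA C++ program can look like. It is an illustrative example rather than one of NVIDIA's SDK samples: the kernel adds two vectors, the host code uses the CUDA runtime APIs for allocation and data movement, and the file would typically be built with the nvcc compiler from the CUDA toolkit (for example, nvcc vector_add.cu -o vector_add).

```cuda
// vector_add.cu -- illustrative sketch, not an official NVIDIA sample.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU and is launched from the host.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float* ha = (float*)malloc(bytes);
    float* hb = (float*)malloc(bytes);
    float* hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) buffers, plus explicit data movement onto the GPU.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch configuration: enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and check one value (expect 3.0).
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```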

ADAM GLICK: And is that the same way that people would program if they're doing ML workloads, or if they're doing graphical workloads-- doing a lot of polygon rendering basically?

PRAMOD RAMARAO: Yeah. So graphics has been traditionally on OpenGL and Vulkan, for example. It's kind of a similar methodology, where you would write shaders and you would have the NVIDIA driver offload the shaders onto the GPU.

So you could think about CUDA in somewhat of a similar manner, where one would write a CUDA application. And then we provide a lot of APIs for data movement and for launching the workloads onto the GPU. So as I said, we have some minimal extensions to C++.

So as long as you use those extensions, then you can mark the code that you want to be offloaded onto the GPU. So it's pretty close to C++. I would say that the ramp for getting into CUDA has not been very steep. We've tried to make it as close to C++ as possible to reduce the friction of adoption.
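
As a small illustration of the extensions Pramod describes, the hypothetical sketch below (the function names and values are made up for the example) uses the __global__, __device__, and __host__ qualifiers to mark what is offloaded onto the GPU, and cudaMallocManaged, one of the runtime's data-movement options, to let the CUDA runtime migrate memory between CPU and GPU instead of copying it explicitly.

```cuda
// managed_transform.cu -- hypothetical sketch of the CUDA C++ qualifiers.
#include <cstdio>
#include <cuda_runtime.h>

__device__ float scale(float x) { return 2.0f * x; }  // callable only from GPU code
__host__ __device__ float bias() { return 1.0f; }     // callable from CPU or GPU code

// The __global__ entry point is the code that gets offloaded onto the GPU.
__global__ void transform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]) + bias();
}

int main() {
    const int n = 1024;
    float* data = nullptr;

    // Unified memory: the runtime migrates this buffer between CPU and GPU.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = (float)i;

    transform<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();  // wait for the GPU before reading on the CPU

    printf("data[1] = %f\n", data[1]);  // expect 3.0 (2 * 1 + 1)
    cudaFree(data);
    return 0;
}
```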

ADAM GLICK: When I think about the world of graphics cards and drivers, historically, that's been one that has been largely proprietary software with a number of vendors, each doing their own thing and locking those pieces down as much as possible. Obviously, we have a very strong interest and belief in open source, and I was really surprised when I saw NVIDIA getting involved in the Kubernetes project and starting to make some code check-ins starting with, I believe it was 1.16. What made NVIDIA decide to get involved with the Kubernetes project?

PRAMOD RAMARAO: You mentioned proprietary software. I think given our history with computer graphics-- and that's a very proprietary ecosystem, due to competitive reasons-- we've seen GPUs move more into compute workloads, for general purpose computing, scientific computing, and ML, in the last few years. And we also see GPUs being deployed in the data center, where there's a lot of adoption of open source software.

And if you look at all the work that we've been doing, right from accelerating deep learning frameworks-- for example, TensorFlow or PyTorch-- we have a lot of contributions in open source to accelerate these different frameworks. We work with a lot of different HPC, or high performance computing, applications. Many of those applications are open source. I think as of 2020, we have over 600 HPC applications that we've accelerated and contributed back to open source.

And to respond to your question as to why Kubernetes-- as we observed more GPUs popping up in the data centers, and people running deep learning or machine learning workloads, we were seeing that there are a lot of software dependencies. And containers solve some of these software dependency problems, because we have dependencies on different CUDA versions or driver versions. There's TensorFlow, and TensorFlow itself has a lot of interdependencies in terms of which versions of deep learning libraries or MPI it uses.

So containers kind of solve that dependency problem, where you can really package up all of your application in a self-contained container. And then once we built up containers, how do you deploy these containers at scale? And Kubernetes seemed to be the obvious container orchestration platform at that point.

And then this is a great way for us to deploy machine learning workloads using Kubernetes and containers. So we kind of embraced the container ecosystem and said, yes, this is the way to package and deploy NVIDIA software at scale. And therefore, we felt that contributing to Kubernetes is very important for the company itself, as we see more customers embrace GPUs and adopt GPUs in their data centers. So we want to be a core contributor to Kubernetes to make sure that GPUs are first class resources and first class citizens in the Kubernetes ecosystem.

ADAM GLICK: That's great, and we're always happy to see more people joining the open-source community and helping make the community stronger through all that. One of the things that has been proprietary and fairly guarded software through most of that timeframe has been the drivers for the various GPUs. Are the drivers open source as well?

PRAMOD RAMARAO: The driver is not open source yet. So we still have proprietary drivers. So we are looking at ways to simplify the deployment of the drivers themselves, because one of the things that we've seen is, as I mentioned before, there's a lot of confusion about which drivers to use with which GPUs, and so on.

So we have done a number of things to improve the deployment of drivers themselves through better packaging, containerizing the drivers themselves for easier deployments. So installing your driver is as easy as starting a container, for example. So we have done a number of things to ease the deployment of the drivers themselves, but as of today, the drivers still continue to be proprietary.

ADAM GLICK: And with many pieces of software that's out there, people have become familiar with the EULAs-- the End User License Agreements-- the clickthroughs that people have to go through. How does that work with the open-source code that you're creating, especially for server-side pieces?

PRAMOD RAMARAO: We are improving the deployment of the drivers. So if you used packages, for example, then you're not really presented with a EULA. And the other thing that I mentioned was we are also working on containerizing the drivers themselves. So it's as easy as starting up a container. So if you used any of those methodologies, then it's easy to deploy our drivers at scale by using package managers or even orchestrators like Kubernetes, where Kubernetes itself would just start up a container to get the drivers provisioned on the GPU nodes.

ADAM GLICK: How does the use of GPUs with Kubernetes impact the portability of the applications that developers write in Kubernetes?

PRAMOD RAMARAO: We are moving towards containerizing all of our applications that use NVIDIA GPUs. So we publish curated containers every month on the NGC platform. And NGC is our hub for getting accelerated containers, pre-trained models, and even Helm charts for deployment.

So the way to think about it is that we have built containers to deploy deep learning and machine learning workloads, using some of the technology that we have built to make GPUs visible in different container runtimes-- for example, whether it's Docker, or LXC, or CRI-O, or other container runtime technologies. So by using our container runtime stack, which integrates with different container runtime technologies, and the containers themselves that we make available on NGC, you can get a lot of portability. So the portability still exists for GPU applications themselves. And then any customer can just pull the containers from the registry and be able to run them using Kubernetes. So we've tried to make it as simple as running any CPU application, for example.

ADAM GLICK: And is there any kind of software emulator that people can use if they want to be developing for things that will be CUDA-based GPU applications, but don't necessarily have that hardware in their development system?

PRAMOD RAMARAO: I would say that's one of the beauties of the CUDA platform. So CUDA works on pretty much every GPU, whether it's a GeForce, whether it's a Quadro, or whether it's even one of our high-end Tesla GPUs for servers.

So what we've tried to do when we built up the CUDA ecosystem is make sure that every GPU that NVIDIA produces is able to run the same CUDA application. I mean, even if you buy the lowest-end GeForce card for your workstation, for example, and you develop CUDA software, we have made it possible for you to take the same application and run it on a server with a Tesla GPU. And even today, we work with many different cloud service providers to have GPUs available at even less than about $0.50 an hour.

So it's really that we try to make it as easy as possible for developers to get access to CUDA. And the way our development environment is built up, we've made it possible for you to run the same CUDA application on everything from the lowest-end GPU to the most expensive GPU. That's the way to think about it.
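
One way to see that portability in practice is that the same binary can query whatever GPUs it finds at runtime through the CUDA runtime API, whether that's a GeForce in a workstation or a Tesla in a server. A hypothetical sketch:

```cuda
// list_gpus.cu -- hypothetical sketch: enumerate whatever GPUs are present.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA-capable GPU(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // The same CUDA application can run on any of these devices.
        printf("GPU %d: %s, compute capability %d.%d, %zu MiB, %d SMs\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem >> 20, prop.multiProcessorCount);
    }
    return 0;
}
```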

ADAM GLICK: Is there something special that developers need to do to make sure that the system has the right hardware needed for their application when running in Kubernetes?

PRAMOD RAMARAO: We have worked with the Kubernetes community to have this concept of labels, for example. So you can basically label a GPU node with different attributes, saying that maybe it has this amount of memory, or it's this generation of GPU. So if you label the node with standard attributes, you can actually use those attributes in the Pod spec. And Pod spec is what the users would write in order to submit jobs to Kubernetes. So by using the Pod spec and these different labels, you can actually have your application be deployed on the right GPU based on what your application needs.

ADAM GLICK: You've talked a bunch about ML workloads that people are running in Kubernetes, and that they're building using the power of the GPUs in order to do a lot of the parallel processing. What does the software stack for that look like?

PRAMOD RAMARAO: We don't recommend any specific deep learning framework or ML technologies. So our goal is to make sure that GPUs are available in every framework, in every application. So to that end, we invest a lot of engineering resources into making sure that the most popular deep learning frameworks, whether it's TensorFlow, or PyTorch, or MXNet, have full GPU acceleration. We contribute all of those changes back to the community. And we also, as I said, build all of these deep learning frameworks, and we make them available every month on our NGC registry for people to consume at no cost, essentially.

So we see all of our customers use these pre-built containers, or we provide them with the set of dependencies that they need, or even the set of building blocks if they wanted to build their own containers. So we see a lot of customers either using the pre-built containers or using the building blocks that we provide to build their own containers, and deploy them in Kubernetes. So we also work with the higher level workflows-- for example, whether it's Kubeflow or some other workflow within Kubernetes-- and make sure that GPUs are supported in all of those different frameworks as well.

ADAM GLICK: There's a lot of emerging technologies that I hear a lot of organizations talking about these days, including NVIDIA. Notably, I saw some discussion about 5G, about edge, and about Kubernetes. How do you think about these three coming together?

PRAMOD RAMARAO: So edge computing is a very rapidly emerging space for NVIDIA from both technology as well as a business opportunity perspective. So specifically, the 5G-- we've been using GPUs to accelerate both the packet processing as well as the baseband technology.

So we're seeing GPUs emerge for accelerating the 5G processing as well. And there are other use cases in the edge as well, whether it's video inferencing or image recognition. Our GPUs are pretty good at processing all of these different workloads.

So we have, specifically for the edge, been building technology that allows customers to use GPU accelerated servers, as well as put Kubernetes on top of them, and use some of our software stack in terms of containers to be able to accelerate these different workloads, whether it's packet processing, or video inferencing at the edge, or image processing, or image recognition. So we've been trying to put a lot of effort into building those edge software stacks.

ADAM GLICK: You used a term of art from the worlds of machine learning and AI that I'm not sure everyone will be familiar with. What do you mean when you say inferencing?

PRAMOD RAMARAO: Usually, deep learning is kind of divided into two workflows. The first is training, where you take the neural network that you have built for your task, and you train it against some of the data that you want to train your neural network on.

And once it's trained, and you've set up all the weights of your neural network, you can then take the pre-trained model, and usually deploy it in production to actually do inferencing. So for example, if you give an image of a dog to your neural network, is your neural network able to recognize whether that is an image of a dog or an image of something else? The act of the neural network recognizing the image, once it's already been trained on a specific set of images, would then be inferencing.

ADAM GLICK: So if you think of it, there are basically two things that happen. There's training-- and that's building your model, essentially, based on your data-- and then there's inferencing, which is, once you have your model built, going and testing it against things and using it in production.

PRAMOD RAMARAO: That's right.

ADAM GLICK: All in all, what's next for NVIDIA as you work on Kubernetes? What's going to come next in terms of check-ins? What are you looking for next as you invest more in the Kubernetes community?

PRAMOD RAMARAO: NVIDIA's work in Kubernetes kind of started in 2017, actually, where we worked with the community to define something called the device plugin API. So the device plugin API is essentially what allows different devices to be recognized as resource types within Kubernetes, whether they're GPUs or NICs.

So that was the work that we contributed back in about 2017 or so. And then we have an implementation of the device plugin that basically advertises GPUs to the scheduler. So that was one of the first contributions that we did back in 2017.

So in terms of looking forward, we're kind of focused on a few areas. So the first one is deployment. So in order to provision a GPU node, you require CUDA, you require drivers, and you require the device plugin.

So we've been working with Red Hat to build something called the GPU Operator, which uses the operator framework in Kubernetes to simplify the deployment of all of these different components as containers. So it's almost like the operator comes in and starts up a number of containers to provision the GPU node. So that's one of the first things that we've been working on.

The second area of focus for us has been what you mentioned, Adam, a few minutes ago in terms of our contributions in 1.16, which relate to extensions to the topology manager. So we're adding things like GPU affinity into Kubernetes, because if you pick the wrong GPUs in terms of topology when you are scheduling jobs within Kubernetes, you can see an order of magnitude performance variation.

So we're adding extensions to the topology manager to be able to choose the right set of GPUs based on the topology of your node. The other area that we have been focused on is the scheduler itself. So a lot of the jobs that use GPUs are mainly batch jobs, which run for a fairly long period of time, whether you're training a neural network, or even running an HPC application trying to solve a computation problem.

So these are all batch jobs. And Kubernetes has been very focused on microservices, where the services might not run for a long period of time. So we've been looking at ways in order to add more batch type capabilities into the Kubernetes scheduler. So that's also one of our main areas of focus, is how do we make GPU jobs be more efficient when running under Kubernetes?

And there are a number of improvements that we've been looking at for the scheduler, which we also want to contribute back to the community. And of course, there are also going to be many things as we build denser GPU nodes with multiple GPUs packed into a single server. There will be things like how do you use multiple GPUs to solve a computation problem, or even multi-node, where you're using multiple nodes to be able to solve a problem or train a deep neural network.

So these are things like multi-node deep learning training, and how do you achieve that within Kubernetes, and what are all the changes across the software stack that make many of these things possible. So those are a few highlights of the areas that we have been actively working on. And we want to contribute all of these changes back to the community so that people can get access to all of these changes, and be able to run applications at scale with GPUs in the data center.

ADAM GLICK: That's fantastic. It's been great seeing NVIDIA get involved in the community, and it's been wonderful having you on the show, Pramod.

PRAMOD RAMARAO: Yeah, thanks a lot, Adam. I'm really happy to be here.

ADAM GLICK: You can find Pramod on the web at https://devblogs.nvidia.com/author/pramarao/.

[MUSIC PLAYING]

CRAIG BOX: Thanks for listening. As always, if you've enjoyed the show, please help us spread the word and tell a friend or two. If you have any feedback, you can find us on Twitter @kubernetespod, or you can send us an email at kubernetespodcast@google.com.

ADAM GLICK: You can also check out our website at kubernetespodcast.com, where you'll find transcripts and show notes. Until next time, take care.

CRAIG BOX: See you next week.

[MUSIC PLAYING]