While graphics processing units (GPUs) once resided exclusively in the domains of graphics-intensive games and video streaming, GPUs are now equally associated with artificial intelligence (AI) and machine learning (ML). Their ability to perform many computations simultaneously, distributing tasks across thousands of cores and significantly speeding up ML workload processing, makes GPUs ideal for powering AI applications.
The single instruction, multiple data (SIMD) stream architecture of a GPU enables data scientists to break complex tasks down into many small units that execute in parallel. As such, enterprises pursuing AI and ML initiatives are now more likely to choose GPUs over central processing units (CPUs) to rapidly analyze large data sets in algorithmically complex and hardware-intensive machine learning workloads. This is especially true for large language models (LLMs) and the generative AI applications built on them.
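To make this concrete, here is a minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is available (neither is specified in the article). It runs the same matrix multiplication on the CPU and then on the GPU, where the work is spread across thousands of parallel threads.

```python
# Minimal sketch: the same operation on CPU and GPU (assumes PyTorch + a CUDA GPU).
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

cpu_result = a @ b                      # runs on the CPU

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()   # copy the inputs into GPU memory
    gpu_result = a_gpu @ b_gpu          # the multiply is split across many GPU threads
    torch.cuda.synchronize()            # GPU work is asynchronous; wait for it to finish
```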
However, lower-cost CPUs are more than capable of running certain machine learning tasks where parallel processing is unnecessary. These include algorithms that rely on statistical computations, such as natural language processing (NLP), and some deep learning algorithms. Other AI workloads also run well on CPUs, such as telemetry and network routing, object recognition in CCTV cameras, fault detection in manufacturing, and object detection in CT and MRI scans.
Enabling GPU-based app development
While the above CPU use cases continue to deliver benefits to businesses, the big push in generative AI demands more GPUs. This has been a boon to GPU manufacturers across the board, and especially to Nvidia, the undisputed leader in the category. And yet, as demand for GPUs grows around the world, more enterprises are realizing that configuring GPU stacks and developing on GPUs are not easy tasks.
To overcome these challenges, Nvidia and other organizations have introduced different tool sets and frameworks to make it easier for developers to manage ML workloads and write high-performance code. These include GPU-optimized deep learning frameworks such as PyTorch and TensorFlow as well as Nvidia’s CUDA framework. It’s not an overstatement to say that the CUDA framework has been a game-changer in accelerating GPU tasks for researchers and data scientists.
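To illustrate the kind of kernel programming that CUDA enables, here is a hedged sketch using Numba's CUDA bindings from Python. Numba is not mentioned in the article; it is used only to show how a data-parallel operation can be expressed as a GPU kernel without dropping down to C++.

```python
# Illustrative only: a simple element-wise GPU kernel via Numba's CUDA support
# (Numba is an assumption here, not something the article prescribes).
import numpy as np
from numba import cuda

@cuda.jit
def scale_and_add(x, y, out):
    i = cuda.grid(1)               # absolute index of this GPU thread
    if i < out.shape[0]:           # guard threads that fall past the end of the array
        out[i] = 2.0 * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
scale_and_add[blocks, threads_per_block](x, y, out)   # Numba copies the arrays to the GPU
```

Frameworks like PyTorch and TensorFlow hide this level of detail behind tensor operations, which is a large part of why they have accelerated GPU adoption among researchers and data scientists.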
On-premises GPUs vs. cloud GPUs
Given that GPUs are preferable to CPUs for running many machine learning workloads, it’s important to understand what deployment approach—on-premises or cloud-based—is most suitable for the AI and ML initiatives a given enterprise undertakes.
In an on-premises GPU deployment, a business must purchase and configure its own GPUs. This requires a significant capital investment to cover both the cost of the GPUs and the cost of building a dedicated data center, as well as the operational expense of maintaining both. These businesses do enjoy an advantage of ownership: Their developers are free to iterate and experiment endlessly without incurring additional usage costs, which would not be the case with a cloud-based GPU deployment.
Cloud-based GPUs, on the other hand, offer a pay-as-you-go model that enables organizations to scale their GPU use up or down at a moment's notice. Cloud GPU providers offer dedicated support teams to handle all tasks related to GPU cloud infrastructure. In this way, a cloud GPU provider lets users get started quickly with provisioned services, saving time and reducing operational liabilities. It also ensures that developers have access to the latest technology and the right GPUs for their current ML use cases.
Businesses can gain the best of both worlds through a hybrid GPU deployment. In this approach, developers use their on-prem GPUs to test and train models and devote their cloud-based GPUs to scaling services and providing greater resilience. Hybrid deployments allow enterprises to balance their expenditures between CapEx and OpEx while ensuring that GPU resources are available close to the enterprise's data center operations.
Optimizing for machine learning workloads
Working with GPUs is challenging, both from the configuration and app development standpoints. Enterprises that opt for on-prem deployments often experience productivity losses as their developers must perform repetitive procedures to prepare a suitable environment for their operations.
To prepare a GPU for ML tasks, one must complete the following steps:
- Install and configure the CUDA drivers and CUDA toolkit to interact with the GPU and perform any additional GPU operations.
- Install the necessary CUDA libraries to use the GPU's computational resources efficiently.
- Install deep learning frameworks such as TensorFlow and PyTorch to perform machine learning workloads like training, inference, and fine-tuning.
- Install tools like JupyterLab to run and test code, and Docker to run containerized GPU applications.
This lengthy process of preparing GPUs and configuring the desired environments frequently overwhelms developers and may also result in errors due to mismatched or outdated versions of required tools.
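A short sanity check can surface these version mismatches early. The sketch below assumes PyTorch is installed (and, optionally, TensorFlow); it reports whether each framework can see the GPU and which CUDA version PyTorch was built against.

```python
# Quick environment check (assumes PyTorch; TensorFlow is optional).
import torch

print("PyTorch sees a CUDA GPU:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA version PyTorch was built against:", torch.version.cuda)

try:
    import tensorflow as tf
    print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```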
When enterprises provide their developers with turnkey, pre-configured infrastructure and a cloud-based GPU stack, developers can avoid performing burdensome administrative tasks and procedures such as downloading tools. Ultimately, this enables developers to focus on high-value work and maximize their productivity, as they can immediately start building and testing solutions.
A cloud GPU strategy also provides businesses with the flexibility to deploy the right GPU for any use case. This enables them to match GPU utilization to business needs even as those needs change, boosting productivity and efficiency without locking them into a specific GPU purchase.
Moreover, given how rapidly GPUs are evolving, partnering with a cloud GPU provider offers GPU capacity wherever the organization needs it, with the provider maintaining and upgrading its GPUs so that customers always have access to hardware delivering peak performance. A cloud or hybrid deployment paradigm enables data science teams to focus on revenue-generating activities instead of provisioning and maintaining GPUs and related infrastructure, and to avoid investing in hardware that could soon become outdated.
Kevin Cochrane is chief marketing officer at Vultr.
—
Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
Copyright © 2024 IDG Communications, Inc.