Selecting the right service provider and tuning your AI workloads can dramatically reduce associated training and inference costs. The exercise improves time-to-market and competitiveness while lowering expense and the environmental impact of this new technology.
AI is coming for your business. If you’re not already leveraging generative AI to drive operational efficiencies and boost worker productivity, you likely will be soon – according to McKinsey, GenAI is expected to add up to $7.9 trillion annually to the global economy.
Although AI promises to deliver a significant return on investment in the form of increased operational efficiencies, enhanced worker creativity and productivity, and the creation of new value streams, the associated costs to train and implement it are high, both financially and in terms of its environmental impact. To plan and budget for new AI projects, you’ll want to know a few things about where these projects will run.
Not Your Average Cloud Load
The world has a lot of data center capacity, but availability for new projects is declining. In the U.S., for example, the Silicon Valley region has a data center vacancy rate of under 3%.
Traditional public cloud services configured for application hosting can be ill-suited to AI workloads, which demand exceptionally high data throughput and processing power. A standard enterprise data center network is also unlikely to deliver enough performance to keep the high-powered GPUs that run AI workloads operating at capacity. Paying for idle GPUs can be very expensive, not to mention the waste heat they generate.
While traditional hyperscale cloud providers are quickly gearing up to support AI workloads, other options exist. Many data processing businesses were created over the past several years to support similar GPU-intensive loads. Notably, many former crypto mining companies have begun shifting their GPU resources to create cloud services specifically designed to support AI workloads.
These GPU clouds differ from public hyperscaler cloud data centers in a few key ways that make them well-suited for AI projects. Architecturally, they resemble high-performance computing (HPC) clusters, in that a single project often takes over the entire facility while it runs. They use next-generation GPU compute hardware rather than the general-purpose CPUs used for transaction processing. They typically rely on modern, pipeline-oriented storage architectures to meet their performance and scale requirements. And their jobs run “closer to the iron,” without going through virtual machines or containers.
Many GPU farms set up for the cryptocurrency business were also built with financial efficiency in mind, so they frequently use advanced data center cooling technologies, run entirely on renewable energy, or sit close to renewable sources such as hydroelectric generators.
Pay Per GPU, By the Hour
For workloads on GPU cloud installations, you often pay by the number of GPUs and the time you need them, not by actual workload, so it pays to optimize.
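The billing arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names, the $2.00 hourly rate, and the 50% utilization figure are assumptions made for the example, not numbers from any provider.

```python
def gpu_rental_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Bill for renting num_gpus for some hours at a flat per-GPU hourly rate."""
    return num_gpus * hours * rate_per_gpu_hour


def idle_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float,
              utilization: float) -> float:
    """Share of the bill spent on GPUs that were rented but not doing work."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return gpu_rental_cost(num_gpus, hours, rate_per_gpu_hour) * (1.0 - utilization)


# Hypothetical job: 64 GPUs for 100 hours at an assumed $2.00 per GPU-hour,
# with the GPUs busy only half of the time.
total = gpu_rental_cost(64, 100, 2.0)
wasted = idle_cost(64, 100, 2.0, utilization=0.5)
print(f"total bill: ${total:,.2f}, paid for idle time: ${wasted:,.2f}")
# prints: total bill: $12,800.00, paid for idle time: $6,400.00
```

Because the bill depends only on GPU count and hours, anything that shortens the run or keeps the GPUs busier lowers the cost directly.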
In the case of Atomwise, a pharmaceutical research company using AI for drug discovery and one of our customers, an AI experiment could take several months to run on its GPU-based data pipeline, which regularly needed to ingest petabytes of unstructured data and access tens of millions of files. Atomwise also needed to complete multiple I/O steps to train its model: import the data, clean it, generate descriptors, and package the data. The different requirements of each stage created storage silos and a performance bottleneck; one training cycle could take up to four days to run. By adding software optimization that centralized the data for the entire pipeline and managed the transfer of data from storage to the GPU servers, it was possible to reduce model training time by 20x, so projects that used to take up to three months can now be completed in less than a week.
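As a rough check on those figures, the short Python sketch below applies a 20x speedup to a three-month project. It assumes three months is 90 days and that the same GPUs are rented at the same hourly rate before and after, which the account above does not state; the function names are illustrative.

```python
def days_after_speedup(baseline_days: float, speedup: float) -> float:
    """Run time after dividing a baseline duration by a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


def rental_savings_fraction(speedup: float) -> float:
    """Fraction of GPU rental cost saved, assuming the same GPUs at the same
    hourly rate and a bill that scales with run time only."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Assumption: "three months" is treated as 90 days.
print(days_after_speedup(90, 20))             # prints: 4.5
print(f"{rental_savings_fraction(20):.0%}")   # prints: 95%
```

At 4.5 days, the result is consistent with "less than a week." Under the stated assumptions, the rental bill for the same job would fall by roughly the same factor.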
For Atomwise, optimization meant that processing that once took a year could be completed in just 12 days. Not only did this improve the company’s competitive position in advancing research in fields like oncology and rare diseases, but it also saved a substantial amount of money in compute expenses.
Optimizing data architectures to get the most out of GPUs and AI workloads is a developing science. Benchmarks do exist, but different types of AI workloads strain data infrastructure in different ways. For example, research shows that training generative AI language models generates much smaller input/output operations than training models for imaging. However, both modalities put extreme stress on storage and data transfer capabilities.
Although hyperscale cloud vendors have deep resources and can afford to invest in new technologies, some customers report that these vendors can’t always deliver the high-touch attention required to match their services to these new workloads. In addition, hyperscalers are generally staffed for time-sliced, managed processes to begin with, which differ from AI workloads.
On the other hand, service providers with racks of GPUs and HPC gear typically sit on thousands of expensive, sought-after processors that are well-suited for AI workloads and powered by nearby clean energy resources. They may also be newer to the enterprise hosting business and will work harder to provide more custom and dedicated resources.
Staging the Work
Business tech and research teams are still learning how to spec and acquire services for their new AI workloads. However, one thing that is universal for these new jobs is the need to focus on data storage and the movement of data into processing systems. It is typically those two legs (storage and networking) of the data center triad that have an outsize influence on the cost and efficiency of a project. We find the third leg, compute, is generally at the mercy of the other two.
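One simplified way to picture why compute is at the mercy of the other two legs is to treat GPU utilization as capped by the slower of storage and networking. The model below is a deliberately crude sketch: the function name, the per-GPU data demand, and the bandwidth figures are all hypothetical, and real pipelines involve caching, prefetching, and bursty access that it ignores.

```python
def estimated_gpu_utilization(num_gpus: int,
                              per_gpu_demand_gb_per_s: float,
                              storage_gb_per_s: float,
                              network_gb_per_s: float) -> float:
    """Upper bound on GPU utilization when data delivery is the only limit.

    The slower of storage and network sets how much data reaches the GPUs;
    utilization is that delivered rate divided by what the GPUs could consume.
    """
    required = num_gpus * per_gpu_demand_gb_per_s
    if required <= 0:
        raise ValueError("total GPU data demand must be positive")
    delivered = min(storage_gb_per_s, network_gb_per_s)
    return min(1.0, delivered / required)


# Hypothetical cluster: 8 GPUs that could each consume 2 GB/s,
# fed by storage that sustains 10 GB/s over a network that sustains 8 GB/s.
print(estimated_gpu_utilization(8, 2.0, storage_gb_per_s=10.0, network_gb_per_s=8.0))
# prints: 0.5
```

In this example the network is the bottleneck, and upgrading storage alone would not raise utilization at all.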
We are in the early days of learning how to evaluate service providers on their AI workload capabilities. Technology executives must evaluate all options, both among and beyond their traditional data center partners, and discuss their needs with the providers they are considering. Workloads should be trialed on candidate systems, and business teams should work on correlating benchmarks such as MLPerf with their needs, adapting them where applicable.
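When trial results come back, one reasonable way to compare them is by the cost of the completed job rather than by the hourly rate. The Python sketch below assumes per-GPU hourly billing; the provider names, rates, and run times are invented for illustration and do not describe any real service.

```python
from dataclasses import dataclass


@dataclass
class TrialRun:
    """Result of running the same trial workload on one candidate provider."""
    provider: str
    num_gpus: int
    hours_to_complete: float
    rate_per_gpu_hour: float

    @property
    def job_cost(self) -> float:
        """Cost of the finished job under per-GPU hourly billing."""
        return self.num_gpus * self.hours_to_complete * self.rate_per_gpu_hour


# Hypothetical trial results for the same workload on two candidates.
trials = [
    TrialRun("provider_a", num_gpus=8, hours_to_complete=10.0, rate_per_gpu_hour=3.0),
    TrialRun("provider_b", num_gpus=8, hours_to_complete=4.0, rate_per_gpu_hour=5.0),
]

for t in sorted(trials, key=lambda t: t.job_cost):
    print(f"{t.provider}: {t.hours_to_complete} h, ${t.job_cost:,.2f} per job")

cheapest = min(trials, key=lambda t: t.job_cost)
print("lowest cost per job:", cheapest.provider)
# prints: lowest cost per job: provider_b
```

In this made-up comparison, the provider with the higher hourly rate finishes sooner and costs less per job, which is why trial runs can be more informative than a rate card.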