At the VMworld 2019 conference today, NVIDIA and VMware announced a joint effort that will make it possible to deploy VMware virtual machines on graphics processing units (GPUs).
As part of this initiative, NVIDIA announced that its virtual GPU (vGPU) software can now be deployed on virtual machines running on servers, in addition to the client systems it already supports. NVIDIA Virtual Compute Server (vComputeServer) software for GPUs is also being extended to support VMware vSphere platforms. NVIDIA also committed to making NGC, its hub for accessing tools for building artificial intelligence (AI) applications, available on VMware platforms.
Using these NVIDIA technologies, VMware pledged to make available a cloud service consisting of Amazon EC2 bare-metal instances, accelerated by NVIDIA T4 GPUs and running vComputeServer software, on VMware Cloud on AWS.
Collectively, these advances not only have significant implications for improving GPU utilization but also make it possible for data scientists to aggregate multiple workloads on GPUs running on VMware servers residing on-premises or in the cloud, says John Fanelli, vice president of product for NVIDIA GRID.
While interest in employing GPUs to build AI applications has been massive, the cost of building those applications has often been prohibitive. Without a virtual machine layer, each GPU previously had to be dedicated to running one workload at a time. By increasing the utilization rates of GPUs, the cost of building and deploying AI applications will decrease substantially, says Fanelli.
That’s critical because those costs have inhibited organizations from investing more in AI applications that have the potential to transform almost every aspect of human existence.
“AI is the most powerful technology of our time,” says Fanelli.
The performance of those AI workloads running on virtual machines, however, will vary depending on their individual attributes, says Fanelli. Many developers will be able to compensate for any performance issues by taking advantage of the NVIDIA CUDA Toolkit to execute AI workloads in parallel, he notes.
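That parallelism is something developers can exploit directly. vComputeServer shares a physical GPU across virtual machines; inside a single application, the CUDA Toolkit exposes streams for the same broad purpose of keeping a GPU busy with concurrent work. What follows is a minimal sketch, not NVIDIA's vGPU code, using a placeholder kernel and illustrative sizes to issue two independent workloads to one GPU so they can overlap:

```cuda
// Minimal sketch: overlap two independent workloads on one GPU using
// CUDA streams. The kernel and problem sizes are illustrative placeholders,
// not part of the NVIDIA/VMware announcement.
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for one AI workload's compute step.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc((void **)&a, n * sizeof(float));
    cudaMalloc((void **)&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    // Kernels issued to different streams may execute concurrently,
    // so neither workload monopolizes the GPU.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int threads = 256, blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, s1>>>(a, 2.0f, n);  // workload 1
    scale<<<blocks, threads, 0, s2>>>(b, 0.5f, n);  // workload 2

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    printf("both workloads completed\n");
    return 0;
}
```

Kernels placed in separate streams are allowed to run concurrently when resources permit, which is the same utilization argument, applied within one application, that vGPU software makes across virtual machines.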
As it becomes more affordable to efficiently build and deploy AI applications, the number of AI projects launched in the months ahead should increase. To accelerate that process, VMware also said customers will be able to migrate workloads from GPU instances running in local data centers to the cloud using VMware HCX tools, which automate the movement of virtual machines and accelerate the transfer of data between platforms.
GPUs, accessed mainly in the cloud, have become the preferred platform for training AI models because of how efficiently they manage memory and I/O overhead. NVIDIA, however, has been making a case, with mixed success, for also relying on GPUs in place of x86 servers to run the inference engines needed to execute those AI models. By adding support for server virtual machines to its software, it should become much more feasible to run multiple inference engines on the same GPU platform, in much the same way inference engines are deployed on x86 platforms.
Of course, having access to additional infrastructure resources doesn’t necessarily guarantee AI success. Those resources, however, will go a very long way toward reducing the cost of being wrong during the AI development process.