Google Cloud announced a new supercomputer virtual-machine series aimed at rapidly training large AI models.
Unveiled at the Google I/O conference, the new A3 supercomputer VMs are purpose-built to handle the considerable resource demands of a large language model (LLM).
“A3 GPU VMs were purpose-built to deliver the highest-performance training for today’s ML workloads, complete with modern CPU, improved host memory, next-generation Nvidia GPUs and major network upgrades,” the company said in a statement.
The instances are powered by eight Nvidia H100 GPUs, Nvidia’s newest GPU, which began shipping earlier this month, as well as Intel’s 4th Generation Xeon Scalable processors, 2TB of host memory and 3.6 TB/s of bisectional bandwidth between the eight GPUs via Nvidia’s NVSwitch and NVLink 4.0 interconnects.
Altogether, Google is claiming these machines can provide up to 26 exaFlops of performance. That’s the cumulative performance of the entire supercomputer, not each individual instance. Still, it blows away the previous record for the fastest supercomputer, Frontier, which was just a little over one exaFlop.
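To make the difference between cluster-wide and per-instance performance concrete, here is a rough back-of-the-envelope sketch of how many GPUs and eight-GPU A3 VMs a 26-exaFlop aggregate might imply. Only the 26-exaFlop total and the eight GPUs per VM come from the announcement; the per-GPU throughput is a hypothetical placeholder chosen for illustration, not a figure from Google or Nvidia, and the function name is invented for this example.

```python
import math

TOTAL_EXAFLOPS = 26.0         # aggregate figure claimed by Google
GPUS_PER_VM = 8               # each A3 VM carries eight H100 GPUs
ASSUMED_PFLOPS_PER_GPU = 4.0  # ASSUMPTION: illustrative placeholder only


def implied_cluster_size(total_exaflops, pflops_per_gpu, gpus_per_vm):
    """Return (gpus, vms) needed to reach the aggregate throughput."""
    total_pflops = total_exaflops * 1000.0  # 1 exaFlop = 1,000 petaFlops
    gpus = math.ceil(total_pflops / pflops_per_gpu)
    vms = math.ceil(gpus / gpus_per_vm)
    return gpus, vms


gpus, vms = implied_cluster_size(
    TOTAL_EXAFLOPS, ASSUMED_PFLOPS_PER_GPU, GPUS_PER_VM
)
print(f"Roughly {gpus} GPUs across about {vms} VMs under these assumptions")
```

Changing the assumed per-GPU figure changes the result proportionally, which is why the headline number says little about any single instance.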
According to Google, A3 is the first production-level deployment of its GPU-to-GPU data interface, which Google calls the infrastructure processing unit (IPU). It allows for sharing data at 200 Gbps directly between GPUs without having to go through the CPU. The result is a ten-fold increase in available network bandwidth for A3 virtual machines compared to prior-generation A2 VMs.
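As a rough illustration of what those bandwidth figures mean in practice, the sketch below converts 200 Gbps into gigabytes per second and estimates an idealized transfer time. The 100 GB payload and the helper names are hypothetical, and the calculation ignores protocol overhead and latency; the one-tenth comparison simply applies the ten-fold ratio quoted above rather than any published A2 figure.

```python
def gbps_to_gigabytes_per_sec(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def transfer_seconds(payload_gigabytes, link_gbps):
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return payload_gigabytes / gbps_to_gigabytes_per_sec(link_gbps)


A3_LINK_GBPS = 200.0   # GPU-to-GPU rate cited in the article
PAYLOAD_GB = 100.0     # ASSUMPTION: hypothetical payload size

a3_rate = gbps_to_gigabytes_per_sec(A3_LINK_GBPS)
a3_time = transfer_seconds(PAYLOAD_GB, A3_LINK_GBPS)
# A link with one-tenth the bandwidth, per the ten-fold ratio in the article
tenth_time = transfer_seconds(PAYLOAD_GB, A3_LINK_GBPS / 10.0)

print(f"{A3_LINK_GBPS:.0f} Gbps is {a3_rate:.1f} GB/s")
print(f"{PAYLOAD_GB:.0f} GB takes {a3_time:.1f} s, or {tenth_time:.1f} s at one-tenth the bandwidth")
```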
A3 workloads will be run on Google’s specialized Jupiter data center networking fabric, which the company says “scales to tens of thousands of highly interconnected GPUs and allows for full-bandwidth reconfigurable optical links that can adjust the topology on demand.”
Google will be offering the A3 in two ways: customers can run it themselves or use it as a managed service where Google handles most of the work. If you opt to do it yourself, the A3 VMs run on Google Kubernetes Engine (GKE) and Google Compute Engine (GCE). If you go with a managed service, the VMs run on Vertex, the company’s managed machine learning platform.
The A3 virtual machines are available for preview, which requires filling out an application to join the Early Access Program. Google makes no promises you will get a spot in the program.
Copyright © 2023 IDG Communications, Inc.