Setup and configuration of high-performance computing (HPC) systems is usually an appreciable challenge that requires expert IT pros to assemble the software stack and optimize it for maximum performance – it's not like building a PC with parts bought off Newegg.
GigaIO, which specializes in infrastructure for AI and technical computing, is looking to simplify the task. The vendor recently introduced a self-contained, single-node system with 32 configured GPUs in the box to offer simplified deployment of AI and supercomputing resources.
Until now, the only way to harness 32 GPUs would require four servers with eight GPUs apiece. There would also be latency to contend with, since the servers communicate over networking protocols, and all that hardware would eat floor space.
What makes GigaIO's machine – called SuperNODE – notable is that it offers a choice of GPUs: up to 32 AMD Instinct MI210 GPUs or 24 Nvidia A100s, plus up to 1PB of storage, attached to a single off-the-shelf server. The MI210 is a step down in performance from the top-of-the-line MI250 card (at least for now) that is used in the Frontier exaflop supercomputer. It has somewhat fewer cores and less memory but is still based on AMD's Radeon GPU technology.
“AMD collaborates with startup innovators like GigaIO in order to bring unique solutions to the evolving workload demands of AI and HPC,” said Andrew Dieckmann, corporate vice president and general manager of the data center and accelerated processing group at AMD, in a statement. “The SuperNODE system created by GigaIO and powered by AMD Instinct accelerators offers compelling TCO for both traditional HPC and generative AI workloads.”
SuperNODE is built on GigaIO's FabreX custom fabric technology, a memory-centric fabric that cuts the latency of one server's system memory communicating with other servers in the system to just 200ns. This enables the FabreX Gen4 implementation to scale up to 512Gbit/sec of bandwidth.
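A back-of-envelope calculation shows why those two numbers matter. The sketch below uses the article's 200ns latency and 512Gbit/sec bandwidth figures; the ~10-microsecond figure for a conventional network hop is an illustrative assumption, not a GigaIO specification.

```python
# Compare FabreX fabric figures (from the article) against an assumed
# conventional network hop. Transfer time is modeled as one-way latency
# plus serialization time of the payload.

FABREX_LATENCY_S = 200e-9           # 200 ns, per the article
FABREX_BANDWIDTH_BPS = 512e9        # 512 Gbit/sec, per the article
ASSUMED_NETWORK_LATENCY_S = 10e-6   # ~10 us per hop -- assumption, for scale

def transfer_time(payload_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Seconds to move a payload: one-way latency + bits / bandwidth."""
    return latency_s + (payload_bytes * 8) / bandwidth_bps

payload = 1 << 20  # a 1 MiB buffer
t_fabric = transfer_time(payload, FABREX_LATENCY_S, FABREX_BANDWIDTH_BPS)
print(f"1 MiB over the fabric: {t_fabric * 1e6:.2f} us")
print(f"latency advantage vs assumed network hop: "
      f"{ASSUMED_NETWORK_LATENCY_S / FABREX_LATENCY_S:.0f}x")
```

For small messages, where latency dominates, the assumed 10µs network hop is 50x slower than the 200ns fabric; for large transfers, serialization time dominates either way.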
FabreX can connect a variety of resources, including accelerators such as GPUs, DPUs, TPUs, FPGAs, and SoCs; storage devices, such as NVMe and PCIe-native storage; and other I/O resources attached to compute nodes. Basically, anything that uses a PCI Express bus can be connected to FabreX for direct device-to-device communication across the same fabric.
SuperNODE has three modes of operation: beast mode, for applications that make use of many or all GPUs; freestyle mode, where each user gets their own GPU for processing; and swarm mode, where applications run across multiple servers.
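The three modes can be thought of as GPU-assignment policies over a shared pool. The sketch below models them as simple functions over a 32-GPU pool; the function names and return shapes are illustrative assumptions, not GigaIO's actual composition API.

```python
# Toy model of the three SuperNODE operating modes as assignment
# policies over a pool of GPU indices. Purely illustrative.

def beast_mode(total_gpus: int, job: str) -> dict[str, list[int]]:
    """Beast mode: a single job gets the entire GPU pool."""
    return {job: list(range(total_gpus))}

def freestyle_mode(total_gpus: int, users: list[str]) -> dict[str, list[int]]:
    """Freestyle mode: each user gets one dedicated GPU."""
    return {user: [gpu] for gpu, user in enumerate(users[:total_gpus])}

def swarm_mode(total_gpus: int, jobs: list[str]) -> dict[str, list[int]]:
    """Swarm mode: the pool is split evenly across several jobs/servers."""
    share = total_gpus // len(jobs)
    return {job: list(range(i * share, (i + 1) * share))
            for i, job in enumerate(jobs)}

print(len(beast_mode(32, "training")["training"]))   # the whole 32-GPU pool
print(freestyle_mode(32, ["alice", "bob"]))          # one GPU each
print(swarm_mode(32, ["j0", "j1", "j2", "j3"])["j3"])  # last job's 8-GPU slice
```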
SuperNODE can run existing applications written on popular AI frameworks such as PyTorch and TensorFlow without requiring modification. It uses Nvidia's Bright Cluster Manager Data Science software to manage and configure the environment and handle scheduling as well as container management.
SuperNODE is available now from GigaIO.
Copyright © 2023 IDG Communications, Inc.