When Nvidia launched its Ada Lovelace family of graphics processing units earlier this week, it focused primarily on its top-of-the-range AD102 GPU and its flagship GeForce RTX 4090 graphics card. It did not release many details about its AD103 and AD104 graphics chips. Thankfully, Nvidia uploaded its Ada Lovelace whitepaper today, which contains a lot of information about the new GPUs and fills in many gaps. We have updated our RTX 40-series GPUs “everything we know” hub with the new details, but here’s an overview of the new and interesting information.
Large GPUs for Large Gaming
We already know that Nvidia’s range-topping AD102 is a 608 mm^2 GPU containing 76.3 billion transistors, 18,432 CUDA cores, and 96MB of L2 cache. We now also know that AD103 is a 378.6 mm^2 graphics processor featuring 45.9 billion transistors, 10,240 CUDA cores, and 64MB of L2 cache. As for the AD104, it has a die size of 294.5 mm^2, 35.8 billion transistors, 7,680 CUDA cores, and 48MB of L2.
GPU/Graphics Card | Full AD102 | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 3090 Ti |
---|---|---|---|---|---|
Architecture | AD102 | AD102 | AD103 | AD104 | GA102 |
Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | Samsung 8LPP |
Transistors (Billion) | 76.3 | 76.3 | 45.9 | 35.8 | 28.3 |
Die size (mm^2) | 608 | 608 | 378.6 | 294.5 | 628.4 |
Streaming Multiprocessors | 144 | 128 | 76 | 60 | 84 |
GPU Cores (Shaders) | 18432 | 16384 | 9728 | 7680 | 10752 |
Tensor Cores | 576 | 512 | 304 | 240 | 336 |
Ray Tracing Cores | 144 | 128 | 76 | 60 | 84 |
TMUs | 576 | 512 | 304 | 240 | 336 |
ROPs | 192 | 176 | 112 | 80 | 112 |
L2 Cache (MB) | 96 | 72 | 64 | 48 | 6 |
Boost Clock (MHz) | ? | 2520 | 2505 | 2610 | 1860 |
TFLOPS FP32 (Boost) | ? | 82.6 | 48.7 | 40.1 | 40.0 |
TFLOPS FP16 (FP8) | ? | 661 (1321) | 390 (780) | 319 (639) | 320 (N/A) |
TFLOPS Ray Tracing | ? | 191 | 113 | 82 | 78.1 |
Memory Interface (bit) | 384 | 384 | 256 | 192 | 384 |
Memory Speed (GT/s) | ? | 21 | 22.4 | 21 | 21 |
Bandwidth (GBps) | ? | 1008 | 716.8 | 504 | 1008 |
TDP (watts) | ? | 450 | 320 | 285 | 450 |
Launch Date | ? | Oct 12, 2022 | Nov 2022? | Nov 2022? | Mar 2022 |
Launch Price | ? | $1,599 | $1,199 | $899 | $1,999 |
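The bandwidth row can be derived from the two memory rows above it. The snippet below is a minimal sketch of that arithmetic, assuming the usual convention that peak bandwidth equals the bus width in bytes multiplied by the per-pin data rate; the function name is ours and does not come from Nvidia’s whitepaper.

```python
def memory_bandwidth_gbps(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak memory bandwidth in GB/s (hypothetical helper for illustration).

    bus_width_bits: memory interface width in bits
    data_rate_gtps: per-pin data rate in GT/s
    """
    # Convert the bus width from bits to bytes, then multiply by transfers per second.
    return bus_width_bits / 8 * data_rate_gtps


if __name__ == "__main__":
    print(memory_bandwidth_gbps(384, 21))  # 384-bit cards (RTX 4090, RTX 3090 Ti)
    print(memory_bandwidth_gbps(192, 21))  # RTX 4080 12GB
```

With the table’s inputs, this gives 1008 GBps for the 384-bit cards and 504 GBps for the RTX 4080 12GB.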
One of the interesting things that Nvidia reveals in its whitepaper is that Ada Lovelace GPUs use high-speed transistors in critical paths to boost maximum clock speeds. As a result, its fully enabled AD102 GPU with 18,432 CUDA cores is “capable of running at clocks over 2.5 GHz, while maintaining the same 450W TGP.” Keeping this in mind, we’re not surprised that the company is talking about 3.0 GHz clocks for the GeForce RTX 4090 (with 16,384 CUDA cores) reached in its labs. At 3.0 GHz, the GeForce RTX 4090 will absolutely headline our list of the best graphics cards around.
In addition to high clocks, Nvidia’s Ada Lovelace GPUs also boast large L2 caches that improve performance in compute-intensive workloads (e.g., ray tracing, path tracing, simulations, etc.) and reduce memory bandwidth requirements. Essentially, Nvidia’s Ada GPUs take a page from the book of RDNA 2’s Infinity Cache here, though we believe that the general targets for the new architecture were set well before AMD’s Radeon RX 6000-series products debuted in 2020.
Speaking of workloads like simulations, we must note that in the supercomputer world they are performed with numbers in double-precision floating-point format (FP64) to improve the accuracy of the results. FP64 is more costly than FP32 both in terms of performance and in terms of hardware complexity. That is why computer graphics use FP32 formats, and many simulations of non-critical assets are also done with FP32 precision. Meanwhile, the AD102 GPU features just 288 FP64 cores (two per streaming multiprocessor), included to ensure that any programs with FP64 code operate correctly, including FP64 Tensor Core code.
Even so, AD102’s FP64 rate is 1/64th the TFLOP rate of FP32 operations (which is in line with the Ampere architecture). Nvidia does not depict its FP64 cores in diagrams of its streaming multiprocessor (SM) modules and does not disclose the number of such cores in AD103 and AD104 GPUs. The poor FP64 rate of Ada graphics processors emphasizes that these parts are aimed primarily at gaming.
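The 1/64 ratio follows from the per-SM core counts alone. As a rough sketch, assuming each core retires one fused multiply-add (two floating-point operations) per clock and picking an illustrative 2.5 GHz clock (Nvidia has not disclosed a boost clock for a full AD102), the arithmetic looks like this; all names are ours.

```python
FP32_CORES_PER_SM = 128  # 18,432 CUDA cores / 144 SMs on a full AD102
FP64_CORES_PER_SM = 2    # two FP64 cores per SM, as stated in the article
FULL_AD102_SMS = 144


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak theoretical TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_ghz / 1000.0


# Illustrative clock only: the full AD102 boost clock is an assumption here.
ASSUMED_CLOCK_GHZ = 2.5

fp32 = peak_tflops(FULL_AD102_SMS * FP32_CORES_PER_SM, ASSUMED_CLOCK_GHZ)
fp64 = peak_tflops(FULL_AD102_SMS * FP64_CORES_PER_SM, ASSUMED_CLOCK_GHZ)

if __name__ == "__main__":
    print(f"FP32: {fp32:.2f} TFLOPS, FP64: {fp64:.2f} TFLOPS, ratio: {fp32 / fp64:.0f}:1")
```

The clock cancels out of the ratio, so the 64:1 gap holds at any frequency; only the absolute TFLOPS numbers depend on the assumed clock.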
More Transistors = More Performance
The complexity and die sizes of Nvidia’s Ada Lovelace graphics processors compared to the company’s Ampere GPUs shouldn’t come as a surprise. The new Ada GPUs are made using TSMC’s 4N (5nm-class) fabrication technology, while Ampere was fabbed on Samsung Foundry’s 8LPP process (a 10nm-class node with a 10% optical shrink). That added complexity (transistor count) is what enables impressive performance gains in things like ray tracing, as well as quality gains with DLSS 3.0.
GPU/Graphics Card | AD102 | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 3090 Ti |
---|---|---|---|---|---|
GPU | AD102 | AD102 | AD103 | AD104 | GA102 |
TFLOPS FP32 (Boost) | ? | 82.6 | 48.7 | 40.1 | 40.0 |
TFLOPS FP16 (FP8) | ? | 661 (1321) | 390 (780) | 319 (639) | 320 (N/A) |
TFLOPS Ray Tracing | ? | 191 | 113 | 82 | 78.1 |
Another thing to note is that Nvidia’s AD102 GPU has a higher transistor density than its lesser siblings. On the one hand, that roughly 3% to 3.5% added transistor density allows it to pack significantly more execution units into AD102 compared to its smaller brethren. On the other hand, the relaxed transistor density of AD103 and AD104 in many cases enables better yields (assuming that the node’s defect density isn’t high in general) and higher clocks.
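The density comparison can be checked against the transistor counts and die sizes in the spec table. This is a minimal sketch with names of our own choosing; it simply divides transistors by area and compares each chip with AD102.

```python
# (transistors in billions, die size in mm^2), taken from the spec table
CHIPS = {
    "AD102": (76.3, 608.0),
    "AD103": (45.9, 378.6),
    "AD104": (35.8, 294.5),
}


def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / die_mm2


densities = {name: density_mtr_per_mm2(*spec) for name, spec in CHIPS.items()}

if __name__ == "__main__":
    for name, density in densities.items():
        # Relative density advantage of AD102 over this chip (0% for AD102 itself).
        advantage = densities["AD102"] / density - 1.0
        print(f"{name}: {density:.1f} MTr/mm^2 (AD102 advantage: {advantage:.1%})")
```

On these inputs, AD102 lands at roughly 125 million transistors per mm^2, while AD103 and AD104 both sit near 121 to 122, which puts the AD102 lead in the low single-digit percent range.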
It’s hard to make predictions about the frequency potential of AD103 and AD104 without access to actual hardware and/or knowledge of their actual yield rates. Still, if the AD102 can run at 2.5 GHz to 3.0 GHz, then it’s reasonable to expect that AD103 and AD104 have even higher potential. We also know that the RTX 4080 12GB uses a fully enabled AD104 chip running at 2610 MHz, while the RTX 4080 16GB uses 95% of an AD103 chip (76 of 80 SMs) running at 2505 MHz, and the RTX 4090 uses only 89% of AD102 (128 of 144 SMs) running at 2520 MHz, also with 25% of the L2 cache disabled.
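The enabled-SM percentages and the resulting CUDA core counts can be reproduced with a few lines. The sketch below assumes 128 CUDA cores per SM (18,432 cores across 144 SMs on a full AD102); the dictionary layout and function names are ours.

```python
CUDA_CORES_PER_SM = 128  # assumption derived from 18,432 cores / 144 SMs

# card -> (GPU, enabled SMs, SMs on the full die), per the article
CARDS = {
    "RTX 4090": ("AD102", 128, 144),
    "RTX 4080 16GB": ("AD103", 76, 80),
    "RTX 4080 12GB": ("AD104", 60, 60),
}


def enabled_fraction(enabled_sms: int, total_sms: int) -> float:
    """Share of the die's SMs that are active on a given card."""
    return enabled_sms / total_sms


def cuda_cores(enabled_sms: int) -> int:
    """CUDA core count implied by the number of enabled SMs."""
    return enabled_sms * CUDA_CORES_PER_SM


if __name__ == "__main__":
    for card, (gpu, enabled, total) in CARDS.items():
        print(f"{card}: {enabled}/{total} SMs of {gpu} "
              f"({enabled_fraction(enabled, total):.0%}), {cuda_cores(enabled)} CUDA cores")
```

The core counts that fall out (16,384, 9,728, and 7,680) match the shader row of the spec table.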
An extreme number of execution units, enabled by high complexity, coupled with high clocks should deliver remarkable performance gains. Nvidia’s GeForce RTX 4090 has an over two times higher peak theoretical FP32 compute rate (~82.6 TFLOPS) compared to the GeForce RTX 3090 Ti (~40 TFLOPS).
Meanwhile, the current lineup of Nvidia’s Ada GPUs for demanding gamers shows that the company is back on track with its three-chip approach to the high-end gaming market. Usually, Nvidia releases its flagship gaming GPU, follows it up with a chip that has roughly 66% to 75% of the flagship’s resources (e.g., CUDA cores), and then unveils a graphics processor that has about 50% of the flagship’s units. With the Ampere family, that strategy was somewhat adjusted as Nvidia’s GA103 chip was primarily designed with laptops in mind and barely made it to desktops (it was late to the party too), but with the Ada generation Nvidia is back to its usual approach with three chips.
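Peak theoretical FP32 throughput is conventionally computed as shader count times two operations per clock (one fused multiply-add) times the boost clock. The following sketch applies that convention; the helper name is ours, and the boost clocks used (2520, 2505, 2610, and 1860 MHz) are assumptions chosen because they reproduce the TFLOPS figures quoted in the tables.

```python
def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak theoretical FP32 TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_mhz / 1e6


# card -> (CUDA cores, assumed boost clock in MHz)
CARDS = {
    "RTX 4090": (16384, 2520),
    "RTX 4080 16GB": (9728, 2505),
    "RTX 4080 12GB": (7680, 2610),
    "RTX 3090 Ti": (10752, 1860),
}

if __name__ == "__main__":
    for card, (cores, mhz) in CARDS.items():
        print(f"{card}: {fp32_tflops(cores, mhz):.1f} TFLOPS")
    ratio = fp32_tflops(*CARDS["RTX 4090"]) / fp32_tflops(*CARDS["RTX 3090 Ti"])
    print(f"RTX 4090 vs RTX 3090 Ti: {ratio:.2f}x")
```

These are paper numbers only; real-world gains depend on memory bandwidth, cache behavior, and how well a workload keeps the shaders busy.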
More SKUs Incoming
One interesting takeaway is the disparity between the maximum configuration offered by the AD102 GPU and the GeForce RTX 4090 graphics card. AD102 packs 18,432 CUDA cores, while the GeForce RTX 4090 comes with 16,384 CUDA cores enabled. Such an approach gives Nvidia some additional flexibility regarding yields and the introduction of new graphics cards in the future, so there’s plenty of room for an RTX 4090 Ti, RTX 4080 Ti, RTX 5500/5000 Ada Generation for ProViz markets, etc.
Meanwhile, the GeForce RTX 4080 16GB and RTX 4080 12GB use nearly full AD103 and fully fledged AD104 GPUs, respectively. We do not know what the future brings, but we expect we’ll eventually see cut-down versions of AD103 and AD104 GPUs. We can speculate about GeForce RTX 4070 Ti and/or RTX 4070 based on cut-down bins of the AD104 chip, as well as the potential for ultra-high-end graphics solutions for laptops powered by the AD103 graphics processor, but we can only guess about the specifications of these parts.
Some Thoughts
Nvidia’s Ada Lovelace architecture is both a qualitative and quantitative leap over the Ampere architecture. Nvidia not only significantly enhanced the performance of its ray tracing cores, tensor cores, and some other units at the architectural level, but it also increased their number and boosted their clocks. A major enhancement here is the massively increased L2 caches of Ada GPUs compared to Ampere GPUs.
To a large degree, these leaps were enabled by the Nvidia GPU-optimized 4N process technology from TSMC. Additionally, the company used high-speed transistors to increase the frequencies of its new graphics processors, which provided further performance gains.
But a cutting-edge manufacturing node and the large die sizes of Nvidia’s new GPUs also make the parts considerably more expensive to build, which is why GeForce RTX 4080 and 4090 graphics cards carry considerably higher price tags than their direct predecessors.
Nvidia has introduced only five Ada Lovelace-based products so far: GeForce RTX 4080 12GB, RTX 4080 16GB, and RTX 4090 graphics cards for desktops, alongside the RTX 6000 Ada Generation for workstations/datacenters and L40 (Lovelace 40) boards for high-end workstations and virtualized workstation environments.
Considering that the company can offer full-fat AD102 and cut-down versions of AD102, AD103, and AD104 GPUs, we can envision a multitude of new GeForce RTX 40-series cards for client machines and Ada RTX-series solutions for datacenters. Meanwhile, Nvidia is probably prepping some smaller GPUs (AD106, AD107), so it looks like the Ada Lovelace family of products will be at least as broad as the Ampere lineup.