
NVIDIA Blackwell Deployments - GB200 NVL72, DGX/HGX B200, HGX B100

May 29, 2024
8 min read

NVIDIA Blackwell Configurations & Specifications

Available in four models, Blackwell GPUs cater to a range of computing needs:

  • NVIDIA DGX™ GB200 NVL72: A Blackwell platform that connects 36 Grace CPUs and 72 Blackwell GPUs in a liquid-cooled, rack-scale design.
  • NVIDIA DGX™ B200: A Blackwell platform that combines 8x NVIDIA Blackwell GPUs (HGX baseboard) with next-gen data center x86 processors to deliver 72 petaFLOPS of training and 144 petaFLOPS of inference GPU compute. DGX B200 will be available to purchase from Exxact once released.
  • NVIDIA HGX™ B200: A Blackwell platform with the same groundbreaking architecture. It is the eight-Blackwell-GPU baseboard that will be offered in custom configurations by solution partners and by us at Exxact.
  • NVIDIA HGX™ B100: An x86 platform based on a cut-down eight-Blackwell-GPU baseboard, delivering 112 AI petaFLOPS. HGX B100 is a drop-in compatible upgrade for NVIDIA Hopper™ systems in existing data center infrastructure.

Configuration Specifications

                             GB200 NVL72                  HGX B200         HGX B100
Blackwell GPUs               72                           8                8
FP4 Tensor Core              1,440 petaFLOPS              144 petaFLOPS    112 petaFLOPS
FP8/FP6 Tensor Core          720 petaFLOPS                72 petaFLOPS     56 petaFLOPS
INT8 Tensor Core             720 petaOPS                  72 petaOPS       56 petaOPS
FP16/BF16 Tensor Core        360 petaFLOPS                36 petaFLOPS     28 petaFLOPS
TF32 Tensor Core             180 petaFLOPS                18 petaFLOPS     14 petaFLOPS
FP32                         6,480 teraFLOPS              640 teraFLOPS    480 teraFLOPS
FP64                         3,240 teraFLOPS              320 teraFLOPS    240 teraFLOPS
Total GPU Memory             13.5TB                       Up to 1.5TB      Up to 1.5TB
Aggregate Memory Bandwidth   576TB/s                      Up to 64TB/s     Up to 64TB/s
Aggregate NVLink Bandwidth   130TB/s                      14.4TB/s         14.4TB/s
CPU Cores                    2,592 Arm Neoverse V2 cores  --               --
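
The aggregate compute rows reduce to simple per-GPU math. Here is a quick sketch (my own arithmetic, not an official derivation), assuming, consistently with both tables in this post, that the platform figures quote sparse-tensor throughput, i.e. 2x the dense per-GPU rates listed in the GPU table below:

```python
# FP4 platform aggregate = per-GPU dense PFLOPS x 2 (sparsity) x GPU count.
# GB200 per-GPU dense = 10 PFLOPS (20 PFLOPS per two-GPU superchip).
per_gpu_fp4_dense_pflops = {"GB200 NVL72": 10, "HGX B200": 9, "HGX B100": 7}
gpu_count = {"GB200 NVL72": 72, "HGX B200": 8, "HGX B100": 8}

for platform, dense in per_gpu_fp4_dense_pflops.items():
    print(platform, dense * 2 * gpu_count[platform], "petaFLOPS")
# GB200 NVL72 1440, HGX B200 144, HGX B100 112 -- matching the FP4 row above

# Aggregate memory bandwidth works the same way, at 8TB/s per GPU:
print(72 * 8, "TB/s and", 8 * 8, "TB/s")  # 576TB/s (NVL72) and 64TB/s (HGX)
```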

GPU Option Specifications

                        GB200                      B200                   B100
Type                    Grace Blackwell Superchip  GPU Accelerator        GPU Accelerator
Memory Clock            8Gbps HBM3e                8Gbps HBM3e            8Gbps HBM3e
Memory Bandwidth        16TB/sec                   8TB/sec                8TB/sec
VRAM                    384GB (2x2x96GB)           192GB (2x96GB)         192GB (2x96GB)
FP4 Dense Tensor        20 PFLOPS                  9 PFLOPS               7 PFLOPS
INT8/FP8 Dense Tensor   10 POPS/PFLOPS             4.5 POPS/PFLOPS        3.5 POPS/PFLOPS
FP16 Dense Tensor       5 PFLOPS                   2.2 PFLOPS             1.8 PFLOPS
TF32 Dense Tensor       2.5 PFLOPS                 1.1 PFLOPS             0.9 PFLOPS
FP64 Dense Tensor       90 TFLOPS                  40 TFLOPS              30 TFLOPS
NVLink Bandwidth        NVLink 5 (2x 1800GB/sec)   NVLink 5 (1800GB/sec)  NVLink 5 (1800GB/sec)
PCIe 6.0 Bandwidth      2x 256GB/sec               256GB/sec              256GB/sec
GPU                     2x Blackwell GPU           Blackwell GPU          Blackwell GPU
GPU Transistor Count    416B (2x2x104B)            208B (2x104B)          208B (2x104B)
TDP                     2700W                      1000W                  700W
Manufacturing Process   TSMC 4NP                   TSMC 4NP               TSMC 4NP
Interface               Superchip                  SXM-Next?              SXM-Next?
Architecture            Grace + Blackwell          Blackwell              Blackwell
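
The memory bandwidth row follows directly from the memory clock. A rough sketch, with the caveat that the eight 1024-bit HBM3e stacks per B200/B100 package is an assumption on my part (widely reported, but not stated in the table):

```python
# HBM3e bandwidth = pin speed x total bus width (ASSUMED: 8 stacks x 1024 bits).
pin_speed_gbps = 8                 # 8Gbps HBM3e, from the table above
bus_width_bits = 8 * 1024          # assumed stack count and per-stack width
print(pin_speed_gbps * bus_width_bits / 8 / 1000)  # 8.192 -> quoted as 8TB/sec
# The GB200 superchip doubles this to 16TB/sec by carrying two such GPUs.
```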

Blackwell DGX GB200 NVL72

The NVIDIA GB200 NVL72 connects 36 GB200 Superchips (36 Grace CPUs and 72 Blackwell GPUs) in a rack-scale design. It is a liquid-cooled, rack-scale, 72-GPU NVLink-connected powerhouse that can act as one massive GPU with up to 13.5TB of GPU memory and 30TB of total fast memory.

It introduces the performance needed for trillion-parameter AI, complex data analytics, and high-performance computing workloads. An expansive rack-integrated NVLink spine and full-rack liquid cooling deliver fast 1.8TB/s GPU-to-GPU interconnect speeds while addressing the power, cooling, and rack-density challenges posed by a 72-GPU deployment.
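
As a quick sanity check on the rack-level figures (again, my own arithmetic, not official guidance), both the aggregate NVLink bandwidth and the total GPU memory fall straight out of the per-GPU numbers:

```python
# Rack-level figures derived from per-GPU numbers (illustrative arithmetic).
gpus = 72
nvlink_per_gpu_tb_s = 1.8            # GPU-to-GPU NVLink bandwidth from above
print(gpus * nvlink_per_gpu_tb_s)    # 129.6 -> quoted as ~130TB/s aggregate

hbm_per_gpu_gb = 192                 # per Blackwell GPU (2x96GB HBM3e)
print(gpus * hbm_per_gpu_gb / 1024)  # 13.5 -> the 13.5TB total GPU memory
                                     # (NVIDIA appears to round in binary TB)
```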

The NVLink spine used in the DGX GB200 NVL72 turns 72 individual GPUs into a single compute node that scales further through advanced networking options. Paired with NVIDIA Quantum InfiniBand, Spectrum-X800 Ethernet, and BlueField-3 DPUs, huge AI factories spanning thousands of Grace Blackwell compute nodes represent a new standard of performance, efficiency, and density for data centers running the largest AI models.


NVIDIA DGX/HGX B200 - The DGX H100 Successor

Alongside the AI factories built around NVIDIA’s Grace CPU, NVIDIA continues to support x86 deployments with its classic gold-box DGX. The NVIDIA DGX B200 is designed as the true successor to the DGX H100, offering the most powerful GPU compute nodes for AI development and deployment. DGX B200 delivers 3x the AI training performance and 15x the real-time inference throughput on GPT-MoE-1.8T (a 1.8-trillion-parameter mixture-of-experts model).

DGX B200 houses an HGX system board featuring 8 Blackwell GPUs (16 Blackwell dies) connected through NVIDIA 5th-generation NVLink. Multiple NVIDIA DGX B200 systems can be further interconnected via the NVLink Switch System, similar to Hopper-generation deployments.
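
To make the single-node picture concrete, here is a minimal sketch, assuming PyTorch with the NCCL backend (which routes traffic over NVLink), of exercising all eight GPUs in one DGX/HGX B200 node with an all-reduce; the file name and tensor size are illustrative:

```python
# Launch with: torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")           # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    x = torch.ones(1 << 20, device="cuda")    # 1M floats on each GPU
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # each element sums across GPUs
    print(f"rank {dist.get_rank()}: {x[0].item()} (expect {dist.get_world_size()})")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```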

Apart from the DGX, the NVIDIA HGX B200 system board will be the building block for HGX servers from solution partners, each with their own dedicated features. Expect HGX B200-equipped servers from the likes of Supermicro, ASUS, and Exxact.


NVIDIA HGX B100 - Easy Upgrade from Hopper HGX

A slightly cut-down version of the HGX B200, the HGX B100 offers easy deployment for existing NVIDIA Hopper HGX users, including those running DGX H100 and H200 as well as HGX H100 and H200 systems.

The NVIDIA HGX B100 offers the same new features, such as the 2nd-generation Transformer Engine, next-generation Tensor Cores, and next-generation NVLink, but can be easily swapped into an existing system. The system board is a plug-and-play, drop-in replacement: slide out the Hopper HGX board and replace it with a Blackwell HGX board for fast deployment and increased compute.

NVIDIA Blackwell’s Role in the Age of AI

With generative AI models growing ever larger to accomplish tasks at higher fidelity, there needs to be a way to power these trillion-parameter models. These highly complex models are the future of accelerated computing, whether that means finding cures for cancer, predicting weather events, or automating entire robotic fleets.

The journey to large-scale artificial intelligence starts with the computational resources delivered by the best of the best. New, extremely large LLMs and generative AI models require vast amounts of compute not only for training but also for inference.

NVIDIA Blackwell is a generational leap, delivering the power and energy efficiency needed to train and serve these generative AI models, with the goal of democratizing the use and deployment of foundation models. With NVIDIA Blackwell and the GPU architectures that follow, AI models can be expected to become more capable than ever before.
