
NVIDIA Blackwell Deployments - GB200 NVL72, DGX/HGX B200, HGX B100

May 29, 2024
8 min read

NVIDIA Blackwell Configurations & Specifications

Available in four models, Blackwell GPUs cater to a range of computing needs:

  • NVIDIA DGX™ GB200 NVL72: A Blackwell platform that connects 36 Grace CPUs and 72 Blackwell GPUs in a liquid-cooled, rack-scale design.
  • NVIDIA DGX™ B200: A Blackwell platform that combines 8x NVIDIA Blackwell GPUs (HGX baseboard) with next-gen data center x86 processors to deliver 72 petaFLOPS of training and 144 petaFLOPS of inference GPU compute. DGX B200 will be available to purchase from Exxact once released.
  • NVIDIA HGX™ B200: A Blackwell platform with the same groundbreaking architecture. It is the eight-Blackwell-GPU baseboard that will be offered in custom configurations by solution partners and by us at Exxact.
  • NVIDIA HGX™ B100: An x86 platform based on a cut-down eight-Blackwell-GPU baseboard, delivering 112 AI petaFLOPS. HGX B100 is a drop-in compatible upgrade for NVIDIA Hopper™ systems in existing data center infrastructure.

Configuration Specifications

                             GB200 NVL72                  HGX B200         HGX B100
Blackwell GPUs               72                           8                8
FP4 Tensor Core              1,440 petaFLOPS              144 petaFLOPS    112 petaFLOPS
FP8/FP6 Tensor Core          720 petaFLOPS                72 petaFLOPS     56 petaFLOPS
INT8 Tensor Core             720 petaOPS                  72 petaOPS       56 petaOPS
FP16/BF16 Tensor Core        360 petaFLOPS                36 petaFLOPS     28 petaFLOPS
TF32 Tensor Core             180 petaFLOPS                18 petaFLOPS     14 petaFLOPS
FP32                         6,480 teraFLOPS              640 teraFLOPS    480 teraFLOPS
FP64                         3,240 teraFLOPS              320 teraFLOPS    240 teraFLOPS
Total GPU Memory             13.5TB                       Up to 1.5TB      Up to 1.5TB
Aggregate Memory Bandwidth   576TB/s                      Up to 64TB/s     Up to 64TB/s
Aggregate NVLink Bandwidth   130TB/s                      14.4TB/s         14.4TB/s
CPU Cores                    2,592 Arm Neoverse V2 cores  --               --
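
The aggregate compute rows reduce to simple per-GPU math. Here is a quick sketch (my own arithmetic, not an official derivation), assuming, consistently with both tables in this post, that the platform figures quote sparse-tensor throughput, i.e. 2x the dense per-GPU rates listed in the GPU table below:

```python
# FP4 platform aggregate = per-GPU dense PFLOPS x 2 (sparsity) x GPU count.
# GB200 per-GPU dense = 10 PFLOPS (20 PFLOPS per two-GPU superchip).
per_gpu_fp4_dense_pflops = {"GB200 NVL72": 10, "HGX B200": 9, "HGX B100": 7}
gpu_count = {"GB200 NVL72": 72, "HGX B200": 8, "HGX B100": 8}

for platform, dense in per_gpu_fp4_dense_pflops.items():
    print(platform, dense * 2 * gpu_count[platform], "petaFLOPS")
# GB200 NVL72 1440, HGX B200 144, HGX B100 112 -- matching the FP4 row above

# Aggregate memory bandwidth works the same way, at 8TB/s per GPU:
print(72 * 8, "TB/s and", 8 * 8, "TB/s")  # 576TB/s (NVL72) and 64TB/s (HGX)
```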

GPU Option Specifications

                        GB200                      B200                   B100
Type                    Grace Blackwell Superchip  GPU Accelerator        GPU Accelerator
Memory Clock            8Gbps HBM3e                8Gbps HBM3e            8Gbps HBM3e
Memory Bandwidth        16TB/sec                   8TB/sec                8TB/sec
VRAM                    384GB (2x2x96GB)           192GB (2x96GB)         192GB (2x96GB)
FP4 Dense Tensor        20 PFLOPS                  9 PFLOPS               7 PFLOPS
INT8/FP8 Dense Tensor   10 POPS/PFLOPS             4.5 POPS/PFLOPS        3.5 POPS/PFLOPS
FP16 Dense Tensor       5 PFLOPS                   2.2 PFLOPS             1.8 PFLOPS
TF32 Dense Tensor       2.5 PFLOPS                 1.1 PFLOPS             0.9 PFLOPS
FP64 Dense Tensor       90 TFLOPS                  40 TFLOPS              30 TFLOPS
NVLink Bandwidth        NVLink 5 (2x 1800GB/sec)   NVLink 5 (1800GB/sec)  NVLink 5 (1800GB/sec)
PCIe 6.0 Bandwidth      2x 256GB/sec               256GB/sec              256GB/sec
GPU                     2x Blackwell GPU           Blackwell GPU          Blackwell GPU
GPU Transistor Count    416B (2x2x104B)            208B (2x104B)          208B (2x104B)
TDP                     2700W                      1000W                  700W
Manufacturing Process   TSMC 4NP                   TSMC 4NP               TSMC 4NP
Interface               Superchip                  SXM-Next?              SXM-Next?
Architecture            Grace + Blackwell          Blackwell              Blackwell
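
The memory bandwidth row follows directly from the memory clock. A rough sketch, with the caveat that the eight 1024-bit HBM3e stacks per B200/B100 package is an assumption on my part (widely reported, but not stated in the table):

```python
# HBM3e bandwidth = pin speed x total bus width (ASSUMED: 8 stacks x 1024 bits).
pin_speed_gbps = 8                 # 8Gbps HBM3e, from the table above
bus_width_bits = 8 * 1024          # assumed stack count and per-stack width
print(pin_speed_gbps * bus_width_bits / 8 / 1000)  # 8.192 -> quoted as 8TB/sec
# The GB200 superchip doubles this to 16TB/sec by carrying two such GPUs.
```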

Blackwell DGX GB200 NVL72

The NVIDIA GB200 NVL72 connects 36 GB200 Superchips (36 Grace CPUs and 72 Blackwell GPUs) in a rack-scale design. It is a liquid-cooled, rack-scale, 72-GPU NVLink-connected powerhouse that can act as one massive GPU with up to 13.5TB of GPU memory and 30TB of total fast memory.

It introduces the performance needed for trillion-parameter AI, complex data analytics, and high-performance computing workloads. An expansive rack-integrated NVLink spine and full-rack liquid cooling deliver fast 1.8TB/s GPU-to-GPU interconnect speeds while addressing the power, cooling, and rack-density challenges posed by a 72-GPU deployment.
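
As a quick sanity check on the rack-level figures (again, my own arithmetic, not official guidance), both the aggregate NVLink bandwidth and the total GPU memory fall straight out of the per-GPU numbers:

```python
# Rack-level figures derived from per-GPU numbers (illustrative arithmetic).
gpus = 72
nvlink_per_gpu_tb_s = 1.8            # GPU-to-GPU NVLink bandwidth from above
print(gpus * nvlink_per_gpu_tb_s)    # 129.6 -> quoted as ~130TB/s aggregate

hbm_per_gpu_gb = 192                 # per Blackwell GPU (2x96GB HBM3e)
print(gpus * hbm_per_gpu_gb / 1024)  # 13.5 -> the 13.5TB total GPU memory
                                     # (NVIDIA appears to round in binary TB)
```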

The NVLink spine used in the DGX GB200 NVL72 turns 72 individual GPUs into a single compute node that scales further through advanced networking options. Paired with NVIDIA Quantum InfiniBand, Spectrum-X800 Ethernet, and BlueField-3 DPUs, huge AI factories spanning thousands of Grace Blackwell compute nodes represent a new standard of performance, efficiency, and density for data centers running the largest AI models.


NVIDIA DGX/HGX B200 - The DGX H100 Successor

Alongside the AI factories built around NVIDIA’s Grace CPU, NVIDIA continues to support x86 deployments with its classic gold-box DGX. The NVIDIA DGX B200 is designed as the true successor to the DGX H100, offering the most powerful GPU compute nodes for AI development and deployment. DGX B200 delivers 3x the AI training performance and 15x the real-time inference throughput on GPT-MoE-1.8T (a 1.8-trillion-parameter mixture-of-experts model).

DGX B200 houses an HGX system board featuring 8 Blackwell GPUs (16 Blackwell dies) connected through NVIDIA 5th-generation NVLink. Multiple NVIDIA DGX B200 systems can be further interconnected via the NVLink Switch System, similar to Hopper-generation deployments.
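
To make the single-node picture concrete, here is a minimal sketch, assuming PyTorch with the NCCL backend (which routes traffic over NVLink), of exercising all eight GPUs in one DGX/HGX B200 node with an all-reduce; the file name and tensor size are illustrative:

```python
# Launch with: torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")           # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    x = torch.ones(1 << 20, device="cuda")    # 1M floats on each GPU
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # each element sums across GPUs
    print(f"rank {dist.get_rank()}: {x[0].item()} (expect {dist.get_world_size()})")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```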

Apart from the DGX, the NVIDIA HGX B200 system board will be the building block for HGX servers from solution partners, each with their own dedicated features. Expect HGX B200-equipped servers from the likes of Supermicro, ASUS, and Exxact.


NVIDIA HGX B100 - Easy Upgrade from Hopper HGX

A slightly cut-down version of the HGX B200, the HGX B100 offers easy deployment for existing NVIDIA Hopper HGX users, including those running DGX H100 and H200 as well as HGX H100 and H200 systems.

The NVIDIA HGX B100 offers the same new features, such as the 2nd-generation Transformer Engine, next-generation Tensor Cores, and next-generation NVLink, but can be easily swapped into an existing system. The system board is a plug-and-play, drop-in replacement: slide out the Hopper HGX board and replace it with a Blackwell HGX board for fast deployment and increased compute.

NVIDIA Blackwell’s Role in the Age of AI

With generative AI models growing ever larger to accomplish tasks at higher fidelity, there needs to be a way to power these trillion-parameter models. These highly complex models are the future of accelerated computing, whether that means finding cures for cancer, predicting weather events, or automating entire robotic fleets.

The journey to large-scale artificial intelligence starts with the computational resources delivered by the best of the best. New, extremely large LLMs and generative AI models require vast amounts of compute not only for training but also for inference.

NVIDIA Blackwell is a generational leap, delivering the power and energy efficiency needed to train and serve these generative AI models, with the goal of democratizing the use and deployment of foundation models. With NVIDIA Blackwell and the GPU architectures that follow, AI models can be expected to become more capable than ever before.
