NVIDIA A100 Ampere Solutions

NVIDIA Ampere: Unprecedented Acceleration at Every Scale

The NVIDIA A100 Tensor Core GPU is based on the new NVIDIA Ampere GPU architecture and builds upon the capabilities of the prior NVIDIA V100 GPU. It adds many new features and delivers significantly faster performance for HPC, AI, and data analytics workloads. The A100 provides strong scaling for GPU compute and DL applications running in single- and multi-GPU workstations, servers, clusters, cloud data centers, systems at the edge, and supercomputers, enabling elastic, versatile, and high-throughput data centers.

Let's Start Building


As an NVIDIA Elite Partner, Exxact Corporation works closely with the NVIDIA team to ensure seamless factory development and support. We pride ourselves on providing value-added service standards unmatched by our competitors.


Find the Right Fit for Your Needs

The Most Powerful End-to-End AI and HPC Data Center Platforms from Exxact

4x NVIDIA A100 SXM4 Server

TensorEX Server

TS2-171138844-NTS

Let's Start Building
  • 2x AMD EPYC 7002 CPUs
  • 4x NVIDIA Tesla A100 SXM4-40GB + NVLink
  • Up to 4x Mellanox InfiniBand HDR 200Gbps Cards
  • Up to 8TB DDR4 Memory
  • Up to 60TB NVMe Storage

8x NVIDIA A100 SXM4 Server

TensorEX Server

TS4-168747704-NTS

Let's Start Building
  • 2x AMD EPYC 7002 CPUs
  • 8x NVIDIA Tesla A100 SXM4-40GB + NVSwitch
  • Up to 9x Mellanox InfiniBand HDR 200Gbps Cards
  • Up to 4TB DDR4 Memory
  • Up to 96TB NVMe Storage



8x NVIDIA A100 PCIe Server

TensorEX Server

TS4-173535991-NTS

Let's Start Building
  • 2x Intel Xeon CPUs
  • Up to 8x NVIDIA A100 PCIe GPUs
  • Up to 8TB DDR4 Memory
  • Up to 96TB NVMe Storage

Not sure what you need?

Let us know what kind of project you have planned.
We can help you decide.


Tell us what you want to do.

Compare Ampere’s Powerful Features

Fabricated on the TSMC 7nm N7 manufacturing process, the NVIDIA Ampere architecture-based GA100 GPU that powers the A100 includes 54.2 billion transistors with a die size of 826 mm².

Model            | Memory     | Mem. Bandwidth | CUDA Cores | Tensor Cores | FP32 TFLOPS | FP64 TFLOPS | TF32 TFLOPS
Ampere A100 SXM4 | 40GB HBM2e | 1.555 TB/s     | 6912       | 432          | 19.5        | 9.7         | 156/312*
Ampere A100 PCIe | 40GB HBM2e | 1.555 TB/s     | 6912       | 432          | 19.5        | 9.7         | 156/312*

*Effective TFLOPS with structured sparsity.

NVIDIA DGX A100

The universal system for all AI workloads, offering unprecedented compute density, performance and flexibility in the world’s first 5 petaFLOPS AI system. Order yours today.

Learn More

Faster Deep Learning with Sparsity Support

New sparsity support in A100 Tensor Cores can exploit fine-grained structured sparsity in deep learning networks to double the throughput of Tensor Core operations. In this 2:4 structured-sparsity scheme, two out of every four contiguous weights are pruned to zero, and the Sparse Tensor Cores skip the zeroed values at compute time; a sketch of the pruning pattern is shown below.
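As a rough illustration (this is not NVIDIA's pruning tooling, just a minimal PyTorch sketch), the 2:4 pattern can be produced by keeping the two largest-magnitude values in each group of four weights and zeroing the rest:

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude entries in every contiguous group of
    four values along the last dimension (the 2:4 structured-sparsity pattern)."""
    w = weight.reshape(-1, 4)
    keep = w.abs().topk(2, dim=1).indices      # indices of the two largest per group
    mask = torch.zeros_like(w, dtype=torch.bool)
    mask.scatter_(1, keep, True)               # mark the values to keep
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)                         # last dimension must be a multiple of 4
print(prune_2_to_4(w))
```

In practice, the pruned network is typically fine-tuned afterward so accuracy is recovered before the sparse weights are deployed.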

The larger and faster L1 cache and shared memory unit in A100 provides 1.5x the aggregate capacity per SM compared to V100 (192 KB vs. 128 KB per SM) to deliver additional acceleration for many HPC and AI workloads.

Several other new SM features improve efficiency and programmability and reduce software complexity.


High-Performance Computing with NVIDIA A100

To unlock next-generation discoveries, scientists look to simulations to better understand complex molecules for drug discovery, physics for potential new sources of energy, and atmospheric data to better predict and prepare for extreme weather patterns.

A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This enables researchers to reduce a 10-hour, double-precision simulation running on NVIDIA V100 Tensor Core GPUs to just four hours on A100. HPC applications can also leverage TF32 precision in A100’s Tensor Cores to achieve up to 10x higher throughput for single-precision dense matrix multiply operations.
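For instance, assuming a reasonably recent PyTorch build, FP32 matrix multiplies can be routed through the A100's TF32 Tensor Cores with two backend flags; this is a minimal sketch, not a tuning guide, and the matrix sizes are arbitrary:

```python
import torch

# Opt in to TF32 Tensor Core math for FP32 matmuls and cuDNN convolutions.
# (Defaults differ across PyTorch versions, so set them explicitly.)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b   # executed on TF32 Tensor Cores when running on an A100
```

TF32 keeps FP32's dynamic range while reducing mantissa precision, which is why most FP32 workloads can opt in without code changes.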


Geometric mean of application speedups vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Large Fine Tuner], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64 : 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU node with dual-socket CPUs and 4x NVIDIA P100, V100, or A100 GPUs.

NVIDIA A100 Accelerates Deep Learning Training and Inference


BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512; V100: NVIDIA DGX-1™ server with 8x V100 using FP32 precision; A100: DGX A100 server with 8x A100 using TF32 precision.


BERT Large Inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT (TRT) 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 7 MIG instances of 1g.5gb: pre-production TRT, batch size = 94, precision = INT8 with sparsity.

A100 GPU Streaming Multiprocessor

The new streaming multiprocessor (SM) in the NVIDIA Ampere architecture-based A100 Tensor Core GPU significantly increases performance, builds upon features introduced in both the Volta and Turing SM architectures, and adds many new capabilities.

The A100's third-generation Tensor Cores enhance operand sharing, improve efficiency, and add powerful new data types, including the following:

  • TF32 Tensor Core instructions that accelerate processing of FP32 data
  • IEEE-compliant FP64 Tensor Core instructions for HPC
  • BF16 Tensor Core instructions at the same throughput as FP16
Peak FP64¹              | 9.7 TFLOPS
Peak FP64 Tensor Core¹  | 19.5 TFLOPS
Peak FP32¹              | 19.5 TFLOPS
Peak FP16¹              | 78 TFLOPS
Peak BF16¹              | 39 TFLOPS
Peak TF32 Tensor Core¹  | 156 TFLOPS  | 312 TFLOPS²
Peak FP16 Tensor Core¹  | 312 TFLOPS  | 624 TFLOPS²
Peak BF16 Tensor Core¹  | 312 TFLOPS  | 624 TFLOPS²
Peak INT8 Tensor Core¹  | 624 TOPS    | 1,248 TOPS²
Peak INT4 Tensor Core¹  | 1,248 TOPS  | 2,496 TOPS²

Table 1. A100 Tensor Core GPU performance specs.
1) Peak rates are based on the GPU boost clock.
2) Effective TFLOPS / TOPS using the new Sparsity feature.
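As a minimal, hedged illustration of these data types in use, the BF16 (or FP16) Tensor Core paths can be exercised from PyTorch's autocast context; the toy model and sizes below are placeholders, not a recommended configuration:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

x = torch.randn(32, 1024, device="cuda")

# Run the forward pass in BF16 on A100 Tensor Cores; swap in
# dtype=torch.float16 to exercise the FP16 path instead.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)
```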

Exxact Systems Featuring NVIDIA Ampere GPUs Provide State-of-the-Art Performance

3 Year Warranty

Exxact provides a 3-year warranty on all our systems. Have peace of mind and focus on what matters most, knowing you're taken care of.

Planning & Integration

Exxact works closely with customers to build and spec systems that meet their high-performance computing and AI infrastructure needs.

System Testing & Validation

Each NVIDIA A100 GPU system is thoroughly tested and validated to ensure reliability and to verify that performance matches benchmarked expectations.


Our Partners

NVIDIA Elite
PNY Quadro
Intel Platinum
BeeGFS
Bright Premier
Panasas
Exxact TensorEX TS2-171138844-NTS 2U 2x AMD EPYC 7002-Series - 4x NVIDIA® A100 SXM4 GPUs
MPN: TS2-171138844-NTS
Exxact TensorEX TS4-168747704-NTS 4U 2x AMD EPYC 7002-Series processor server - 8x NVIDIA® A100 SXM4 GPUs
MPN: TS4-168747704-NTS
Exxact TensorEX TS4-173535991-NTS 4U 2x Intel Xeon processor server - 8x NVIDIA® A100 PCIe GPUs
MPN: TS4-173535991-NTS