NVIDIA A100 Ampere Solutions

Scalable Server Platforms Featuring the NVIDIA A100 Tensor Core GPU

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges.

Let's Start Building


As an NVIDIA Elite Partner, Exxact Corporation works closely with the NVIDIA team to ensure seamless factory development and support. We pride ourselves on providing value-added service standards unmatched by our competitors.


Exxact Systems Featuring NVIDIA Ampere GPUs Provide State-of-the-Art Performance

Multi-Instance GPU (MIG)

With MIG, each A100 can be partitioned into as many as seven GPU instances, fully isolated and secured at the hardware level with their own high-bandwidth memory, cache, and compute cores.
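The seven-instance limit falls out of MIG's slice arithmetic. Per NVIDIA's MIG documentation, an A100-40GB exposes 7 compute (SM) slices and 8 memory slices of roughly 5 GB each, and each MIG profile (such as 1g.5gb or 2g.10gb) claims a fixed number of both. A minimal sketch, with a hypothetical helper function:

```python
# Slice counts per NVIDIA's MIG documentation for A100-40GB.
COMPUTE_SLICES = 7   # SM slices
MEMORY_SLICES = 8    # ~5 GB memory slices

def max_instances(compute_slices: int, memory_slices: int) -> int:
    """Hypothetical helper: how many instances of a given MIG profile
    fit on one A100, limited by whichever slice type runs out first."""
    return min(COMPUTE_SLICES // compute_slices,
               MEMORY_SLICES // memory_slices)

print(max_instances(1, 1))  # 1g.5gb profile  -> 7
print(max_instances(2, 2))  # 2g.10gb profile -> 3
```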

Third-Generation NVLink

The third generation of NVIDIA® NVLink® in A100 doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10x higher than PCIe Gen4.
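The "almost 10x" figure is easy to sanity-check, assuming roughly 64 GB/s total bidirectional bandwidth for a PCIe Gen4 x16 link:

```python
# NVLink 3.0 total GPU-to-GPU bandwidth vs. a PCIe Gen4 x16 link
# (assumption: ~64 GB/s bidirectional for PCIe Gen4 x16).
NVLINK_GBPS = 600
PCIE_GEN4_X16_GBPS = 64

print(round(NVLINK_GBPS / PCIE_GEN4_X16_GBPS, 1))  # -> 9.4
```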

Smarter and Faster Memory

The A100 has significantly more on-chip memory, including a 40 megabyte (MB) level 2 cache—7x larger than the previous generation—to maximize compute performance.


Find the Right Fit for Your Needs

The Most Powerful End-to-End AI and HPC Data Center Platforms from Exxact


SXM4 Based

4x NVIDIA A100 Server
with AMD EPYC Processors

TensorEX Server

TS2-171138844-DPN

Let's Start Building
  • 2x AMD EPYC 7002 CPUs
  • 4x NVIDIA Tesla A100 SXM4-40GB + NVLink
  • Up to 4x Mellanox InfiniBand HDR 200Gbps Cards
  • Up to 8TB DDR4 Memory
  • Up to 60TB NVMe Storage

8x NVIDIA A100 Server
with AMD EPYC Processors

TensorEX Server

TS4-130921967-DPN

Let's Start Building
  • 2x AMD EPYC 7002 CPUs
  • 8x NVIDIA Tesla A100 SXM4-40GB + NVSwitch
  • Up to 9x Mellanox InfiniBand HDR 200Gbps Cards
  • Up to 4TB DDR4 Memory
  • Up to 96TB NVMe Storage

8x NVIDIA A100 Server
with Intel Xeon Processors

TensorEX Server

TS4-168747704-DPN

Let's Start Building
  • 2x Intel Xeon Scalable Family
  • 8x NVIDIA Tesla A100 SXM4-40GB + NVSwitch
  • Up to 9x Mellanox InfiniBand HDR 200Gbps Cards
  • Up to 4TB DDR4 Memory
  • Up to 96TB NVMe Storage

PCIe Based

4x NVIDIA A100 Server
with AMD EPYC Processor

TensorEX Server

TS2-158632687-DPN

Let's Start Building
  • 1x AMD EPYC 7002 CPU
  • Up to 4x NVIDIA A100 PCIe GPUs
  • Up to 1TB DDR4 Memory
  • Up to 60TB NVMe Storage

8x NVIDIA A100 Server
with AMD EPYC Processors

TensorEX Server

TS4-173535991-DPN

Let's Start Building
  • 2x AMD EPYC 7002 CPUs
  • Up to 8x NVIDIA A100 PCIe GPUs
  • Up to 4TB DDR4 Memory
  • Up to 96TB NVMe Storage

10x NVIDIA A100 Server
with Intel Xeon Processors

TensorEX Server

TS4-133524070-DPN

Let's Start Building
  • 2x Intel Xeon CPUs
  • 8x NVIDIA A100 PCIe GPUs
  • Up to 8TB DDR4 Memory
  • Up to 96TB NVMe Storage

Not sure what you need?

Let us know what kind of project you have planned.
We can help you decide.


Tell us what you want to do.

Compare Ampere’s Powerful Features

| Model            | Memory     | Mem. Bandwidth | CUDA Cores | Tensor Cores | FP32 TFLOPs | FP64 TFLOPs | TF32 TFLOPs | Explore |
|------------------|------------|----------------|------------|--------------|-------------|-------------|-------------|---------|
| Ampere A100 SXM4 | 40GB HBM2e | 1.555 TB/s     | 6912       | 432          | 19.5        | 9.7         | 156/312*    | -       |
| Ampere A100 PCIe | 40GB HBM2e | 1.555 TB/s     | 6912       | 432          | 19.5        | 9.7         | 156/312*    | Specs   |

* Effective TFLOPs using the Sparsity feature.

What Form Factor is Right for Me?

Deep Learning Training
For the absolute fastest model training time

Deep Learning Inference
For batch and real time inference

HPC and AI
For scientific computing centers, higher ed, and research institutions

Enterprise Acceleration
Mixed workloads: graphics, ML, DL, analytics, training, and inference

| Form Factor | Deep Learning Training | Deep Learning Inference | HPC and AI                                 | Enterprise Acceleration                                      |
|-------------|------------------------|-------------------------|--------------------------------------------|--------------------------------------------------------------|
| SXM4        | 8-16 GPUs              | 1 GPU w/ MIG            | 4 GPUs with MIG for supercomputing centers | 1-4 GPUs with MIG for compute-intensive multi-GPU workloads  |
| PCIe        | 4-8 GPUs               | 1 GPU w/ MIG            | 1-4 GPUs with MIG for higher ed and research | 1-4 GPUs with MIG for compute-intensive single-GPU workloads |

NVIDIA DGX™ A100

The universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. Order yours today.

Learn More

Faster Deep Learning with Sparsity Support

New Sparsity support in A100 Tensor Cores can exploit fine-grained structured sparsity in DL networks to double the throughput of Tensor Core operations.
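The pattern behind this feature is 2:4 structured sparsity: in every group of four weights, at most two are nonzero, letting the Tensor Cores skip the zeroed operands. A minimal NumPy sketch of such pruning (a hypothetical helper for illustration, not NVIDIA's API; real workflows use NVIDIA's automatic sparsity tooling):

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude values in every group of four weights."""
    flat = w.reshape(-1, 4)                         # groups of four
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]  # two smallest magnitudes
    pruned = flat.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.array([[0.1, -0.9, 0.3, 0.05],
              [0.7,  0.2, -0.4, 0.6]], dtype=np.float32)
print(prune_2_4(w))  # each group of four keeps only its two largest values
```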

The larger and faster L1 cache and shared memory unit in A100 provides 1.5x the aggregate capacity per streaming multiprocessor (SM) compared to V100 (192 KB vs. 128 KB per SM), delivering additional acceleration for many HPC and AI workloads.

Several other new SM features improve efficiency and programmability and reduce software complexity.


High-Performance Computing with NVIDIA Tesla A100

To unlock next-generation discoveries, scientists look to simulations to better understand complex molecules for drug discovery, physics for potential new sources of energy, and atmospheric data to better predict and prepare for extreme weather patterns.

A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This enables researchers to reduce a 10-hour, double-precision simulation running on NVIDIA V100 Tensor Core GPUs to just four hours on A100. HPC applications can also leverage TF32 precision in A100’s Tensor Cores to achieve up to 10x higher throughput for single-precision dense matrix multiply operations.


Geometric mean of application speedups vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Large Fine Tuner], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64 : 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU node with dual-socket CPUs with 4x NVIDIA P100, V100, or A100 GPUs.

Ampere A100 Accelerates Deep Learning Training and Inference


BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512; V100: NVIDIA DGX-1™ server with 8x V100 using FP32 precision; A100: DGX A100 server with 8x A100 using TF32 precision.


BERT Large Inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT (TRT) 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 7 MIG instances of 1g.5gb: pre-production TRT, batch size = 94, precision = INT8 with sparsity.

A100 GPU Streaming Multiprocessor

The new streaming multiprocessor (SM) in the NVIDIA Ampere architecture-based A100 Tensor Core GPU significantly increases performance, builds upon features introduced in both the Volta and Turing SM architectures, and adds many new capabilities.

The A100 third-generation Tensor Cores enhance operand sharing, improve efficiency, and add powerful new data types including the following:


  • TF32 Tensor Core instructions that accelerate processing of FP32 data
  • IEEE-compliant FP64 Tensor Core instructions for HPC
  • BF16 Tensor Core instructions at the same throughput as FP16
| Metric                  | Peak Rate                 |
|-------------------------|---------------------------|
| Peak FP64¹              | 9.7 TFLOPS                |
| Peak FP64 Tensor Core¹  | 19.5 TFLOPS               |
| Peak FP32¹              | 19.5 TFLOPS               |
| Peak FP16¹              | 78 TFLOPS                 |
| Peak BF16¹              | 39 TFLOPS                 |
| Peak TF32 Tensor Core¹  | 156 TFLOPS / 312 TFLOPS²  |
| Peak FP16 Tensor Core¹  | 312 TFLOPS / 624 TFLOPS²  |
| Peak BF16 Tensor Core¹  | 312 TFLOPS / 624 TFLOPS²  |
| Peak INT8 Tensor Core¹  | 624 TOPS / 1,248 TOPS²    |
| Peak INT4 Tensor Core¹  | 1,248 TOPS / 2,496 TOPS²  |

Table 1. A100 Tensor Core GPU performance specs.
1) Peak rates are based on the GPU boost clock.
2) Effective TFLOPS / TOPS using the new Sparsity feature.
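The headline rates follow directly from the clock and core counts, assuming the ~1410 MHz boost clock, 6912 FP32 CUDA cores, and 432 Tensor Cores each performing 256 FP16 FMAs (512 FLOPs) per clock described in NVIDIA's Ampere architecture material:

```python
# Back-of-envelope derivation of the table's peak rates (assumptions:
# ~1410 MHz boost clock; 6912 FP32 CUDA cores; 432 third-gen Tensor
# Cores, each doing 256 FP16 FMAs = 512 FLOPs per clock).
BOOST_CLOCK_HZ = 1.41e9
FP32_CORES = 6912
TENSOR_CORES = 432

peak_fp32 = FP32_CORES * 2 * BOOST_CLOCK_HZ / 1e12         # FMA = 2 FLOPs
peak_fp16_tensor = TENSOR_CORES * 512 * BOOST_CLOCK_HZ / 1e12

print(round(peak_fp32, 1))      # -> 19.5
print(round(peak_fp16_tensor))  # -> 312
```

Doubling the dense Tensor Core rates via the Sparsity feature yields the second column (312 → 624, and so on).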

The Most Powerful End-to-End AI and Data Center Platforms from Exxact

NVIDIA GPU Solutions

Learn More »

NVIDIA Clusters

Learn More »

Quadro RTX Solutions

Learn More »

NVIDIA Data Science Workstations

Learn More »

Explore Related Resources

Deep Learning Blog

Deep Learning Benchmarks Comparison 2019: RTX 2080 Ti vs. TITAN RTX vs. RTX 6000 vs. RTX 8000 - Selecting the Right GPU for Your Needs

Read the post »

Advantages of On-Premises Deep Learning and the Hidden Costs of Cloud Computing

Read the post »

HGX-2 Benchmarks for Deep Learning in TensorFlow (16x V100)

Read the post »

Hands-on TensorFlow Tutorial: Train ResNet-50 From Scratch Using the ImageNet Dataset

Read the post »

Deep Learning eBook

Get Started with Deep Learning

View the eBook »

Getting Started With AI Software

View the eBook »

Implementing AI Solutions for Every Industry

View the eBook »

The 5 Essential Steps to Get Started in AI

View the eBook »

Case Studies

Lose It! - Nutritional mindfulness promoted by robust data processing and analytics.

Read the post »

Accelerating Epigenetic Research at UPenn’s Perelman School of Medicine with Exxact’s GPU Workstations

Read the post »

Testimonial

The solution from Exxact allowed us to iterate quickly during the development of Snap It to the point where model development became interactive. We were so impressed with both the hardware and responsiveness of the support team that we recently made further investments with Exxact by purchasing a TensorEX 8-GPU turn-key deep learning solution.

Dr. Edward Lowe
Director of Data Science
Lose It!

Exxact Systems Featuring NVIDIA Ampere GPUs Provide State-of-the-Art Performance

3 Year Warranty

Exxact provides a 3-year warranty on all our systems. Have peace of mind and focus on what matters most, knowing you're taken care of.

Planning & Integration

Exxact works closely with customers to build and spec a system that meets your high-performance computing and AI infrastructure needs.

System Testing & Validation

Each NVIDIA A100 GPU system is thoroughly tested and validated to ensure reliability and confirm that performance meets benchmarked expectations.


Our Partners

NVIDIA Elite
PNY Quadro
Intel Platinum
BeeGFS
Bright Premier
Panasas
  • Exxact TensorEX TS2-158632687-DPN (2U, 2x AMD EPYC 7002 Series) - Deep Learning & AI Server | MPN: TS2-158632687-DPN
  • Exxact TensorEX TS2-171138844-DPN (2U, 2x AMD EPYC 7002 Series) - Deep Learning & AI Server | MPN: TS2-171138844-DPN
  • Exxact TensorEX TS4-130921967-DPN (4U, 2x AMD EPYC 7002 Series) - Deep Learning & AI Server | MPN: TS4-130921967-DPN
  • Exxact TensorEX TS4-133524070-DPN (4U, 2x Intel Xeon) - Deep Learning & AI Server | MPN: TS4-133524070-DPN
  • Exxact TensorEX TS4-168747704-DPN (4U, 2x Intel Xeon) - Deep Learning & AI Server | MPN: TS4-168747704-DPN
  • Exxact TensorEX TS4-173535991-DPN (4U, 2x AMD EPYC) - Deep Learning & AI Server | MPN: TS4-173535991-DPN