Benchmarks

NVIDIA A30 Deep Learning Benchmarks for TensorFlow

June 3, 2021
4 min read

NVIDIA A30 Benchmarks

In this blog article, we present deep learning performance benchmarks for TensorFlow on NVIDIA A30 GPUs.

Our deep learning server was fitted with eight A30 GPUs, and we ran the standard "tf_cnn_benchmarks.py" benchmark script found in the official TensorFlow benchmarks repository on GitHub. We tested four networks: ResNet-50, ResNet-152, Inception v3, and GoogLeNet. We ran the same tests on 1-, 2-, 4-, and 8-GPU configurations, with a batch size of 128 for FP32 and 256 for FP16.
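As a rough sketch of the setup, an FP16 ResNet-50 run on 8 GPUs can be launched along these lines (flag names follow the tf_cnn_benchmarks README; the exact values here mirror our test configuration, but treat the invocation as illustrative rather than a verbatim record of our runs):

```shell
# Fetch the TensorFlow benchmarks repository containing tf_cnn_benchmarks.py.
git clone https://github.com/tensorflow/benchmarks.git
cd benchmarks/scripts/tf_cnn_benchmarks

# ResNet-50, 8 GPUs, batch size 256, FP16 (use --use_fp16=False and
# --batch_size=128 to reproduce the FP32 configuration instead).
python tf_cnn_benchmarks.py \
  --model=resnet50 \
  --num_gpus=8 \
  --batch_size=256 \
  --use_fp16=True
```

The script prints a total images/sec figure at the end of each run, which is the throughput number reported in the tables below.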

Key Points and Observations

  • The NVIDIA A30 exhibits near-linear scaling up to 8 GPUs.
  • The NVIDIA A30 is a well-rounded GPU for most deep learning applications.
  • If you don't need the full compute power of the A100, the A30 is worth considering as an option.
  • Because the A30 supports FP64, it may also be well suited to other HPC applications.

Interested in getting faster results?
Learn more about Exxact deep learning workstations starting at $3,700


NVIDIA A30 TensorFlow FP16 Benchmarks

NVIDIA A30 FP16 Benchmarks (throughput in images/sec)

Network         1x GPU   2x GPU   4x GPU   8x GPU
ResNet-50        1,110    2,369    4,646    8,975
ResNet-152         498    1,004    2,015    3,818
Inception v3       725    1,390    2,799    5,455
GoogLeNet        2,010    3,896    7,960   14,545

Batch size: 256 for all FP16 tests.


NVIDIA A30 TensorFlow FP32 Benchmarks

NVIDIA A30 FP32 Benchmarks (throughput in images/sec)

Network         1x GPU   2x GPU   4x GPU   8x GPU
ResNet-50          463      915    1,787    3,458
ResNet-152         206      402      786    1,509
Inception v3       319      636    1,244    2,395
GoogLeNet        1,060    2,100    3,983    7,690

Batch size: 128 for all FP32 tests.
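The "near-linear scaling" claim can be checked directly from the tables with a few lines of Python. The throughput figures below are copied from the ResNet-50 rows above; the helper function itself is a generic sketch:

```python
# Throughput (images/sec) for 1, 2, 4, and 8 GPUs, taken from the tables above.
fp32_resnet50 = [463, 915, 1787, 3458]
fp16_resnet50 = [1110, 2369, 4646, 8975]

def scaling_efficiency(throughputs, gpu_counts=(1, 2, 4, 8)):
    """Efficiency of each configuration relative to perfect linear scaling."""
    base = throughputs[0]
    return [t / (base * n) for t, n in zip(throughputs, gpu_counts)]

# FP32 ResNet-50 retains about 93% efficiency at 8 GPUs: 3458 / (463 * 8).
print(scaling_efficiency(fp32_resnet50))

# FP16 delivers roughly 2.4x the FP32 single-GPU throughput (1110 / 463);
# note the FP16 runs also use a larger batch size (256 vs. 128).
print(fp16_resnet50[0] / fp32_resnet50[0])
```

The same calculation applied to the other networks tells a similar story, which is why we characterize the scaling as near-linear rather than perfectly linear.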

More About NVIDIA A30's Features

  • NVIDIA A30 Tensor Cores with Tensor Float 32 (TF32) provide up to 10X higher performance than the NVIDIA T4 with zero code changes, and an additional 2X boost with automatic mixed precision and FP16, for a combined 20X throughput increase. Combined with NVIDIA® NVLink®, PCIe Gen4, NVIDIA networking, and the NVIDIA Magnum IO™ SDK, it's possible to scale to thousands of GPUs.
  • Tensor Cores and MIG enable A30 to be used for workloads dynamically throughout the day. It can be used for production inference at peak demand, and part of the GPU can be repurposed to rapidly re-train those very same models during off-peak hours.
  • The NVIDIA A30 GPU delivers a versatile platform for mainstream enterprise workloads such as AI inference, AI training, and HPC. With TF32 and FP64 Tensor Core support, along with an end-to-end software and hardware stack, the A30 lets mainstream AI training and HPC applications be addressed rapidly.

Have any questions about NVIDIA GPUs or AI workstations and servers?
Contact Exxact Today

