Benchmarks

NVIDIA A30 Deep Learning Benchmarks for TensorFlow

June 3, 2021
4 min read

NVIDIA A30 Benchmarks

In this blog article, we present deep learning performance benchmarks for TensorFlow on NVIDIA A30 GPUs.

Our deep learning server was fitted with eight A30 GPUs, and we ran the standard "tf_cnn_benchmarks.py" benchmark script found in the official TensorFlow benchmarks repository on GitHub. We tested four networks: ResNet-50, ResNet-152, Inception v3, and GoogLeNet. We ran the same tests on 1-, 2-, 4-, and 8-GPU configurations, with a batch size of 128 for FP32 and 256 for FP16.
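As a rough sketch of the setup, an FP16 ResNet-50 run on 8 GPUs can be launched along these lines (flag names follow the tf_cnn_benchmarks README; the exact values here mirror our test configuration, but treat the invocation as illustrative rather than a verbatim record of our runs):

```shell
# Fetch the TensorFlow benchmarks repository containing tf_cnn_benchmarks.py.
git clone https://github.com/tensorflow/benchmarks.git
cd benchmarks/scripts/tf_cnn_benchmarks

# ResNet-50, 8 GPUs, batch size 256, FP16 (use --use_fp16=False and
# --batch_size=128 to reproduce the FP32 configuration instead).
python tf_cnn_benchmarks.py \
  --model=resnet50 \
  --num_gpus=8 \
  --batch_size=256 \
  --use_fp16=True
```

The script prints a total images/sec figure at the end of each run, which is the throughput number reported in the tables below.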

Key Points and Observations

  • The NVIDIA A30 exhibits near-linear scaling up to 8 GPUs.
  • The NVIDIA A30 is a well-rounded GPU for most deep learning applications.
  • If you don't need the full compute power of the A100, the A30 is worth considering as an option.
  • Because the A30 supports FP64, it may also be well suited to other HPC applications.

Interested in getting faster results?
Learn more about Exxact deep learning workstations starting at $3,700


NVIDIA A30 TensorFlow FP16 Benchmarks

NVIDIA A30 FP16 Benchmarks (throughput in images/sec)

Network         1x GPU   2x GPU   4x GPU   8x GPU
ResNet-50        1,110    2,369    4,646    8,975
ResNet-152         498    1,004    2,015    3,818
Inception v3       725    1,390    2,799    5,455
GoogLeNet        2,010    3,896    7,960   14,545

Batch size: 256 for all FP16 tests.


NVIDIA A30 TensorFlow FP32 Benchmarks

NVIDIA A30 FP32 Benchmarks (throughput in images/sec)

Network         1x GPU   2x GPU   4x GPU   8x GPU
ResNet-50          463      915    1,787    3,458
ResNet-152         206      402      786    1,509
Inception v3       319      636    1,244    2,395
GoogLeNet        1,060    2,100    3,983    7,690

Batch size: 128 for all FP32 tests.
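The "near-linear scaling" claim can be checked directly from the tables with a few lines of Python. The throughput figures below are copied from the ResNet-50 rows above; the helper function itself is a generic sketch:

```python
# Throughput (images/sec) for 1, 2, 4, and 8 GPUs, taken from the tables above.
fp32_resnet50 = [463, 915, 1787, 3458]
fp16_resnet50 = [1110, 2369, 4646, 8975]

def scaling_efficiency(throughputs, gpu_counts=(1, 2, 4, 8)):
    """Efficiency of each configuration relative to perfect linear scaling."""
    base = throughputs[0]
    return [t / (base * n) for t, n in zip(throughputs, gpu_counts)]

# FP32 ResNet-50 retains about 93% efficiency at 8 GPUs: 3458 / (463 * 8).
print(scaling_efficiency(fp32_resnet50))

# FP16 delivers roughly 2.4x the FP32 single-GPU throughput (1110 / 463);
# note the FP16 runs also use a larger batch size (256 vs. 128).
print(fp16_resnet50[0] / fp32_resnet50[0])
```

The same calculation applied to the other networks tells a similar story, which is why we characterize the scaling as near-linear rather than perfectly linear.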

More About NVIDIA A30's Features

  • NVIDIA A30 Tensor Cores with Tensor Float 32 (TF32) provide up to 10X higher performance than the NVIDIA T4 with zero code changes, and an additional 2X boost with automatic mixed precision and FP16, for a combined 20X throughput increase. Combined with NVIDIA® NVLink®, PCIe Gen4, NVIDIA networking, and the NVIDIA Magnum IO™ SDK, it's possible to scale to thousands of GPUs.
  • Tensor Cores and MIG enable A30 to be used for workloads dynamically throughout the day. It can be used for production inference at peak demand, and part of the GPU can be repurposed to rapidly re-train those very same models during off-peak hours.
  • The NVIDIA A30 GPU delivers a versatile platform for mainstream enterprise workloads such as AI inference, AI training, and HPC. With TF32 and FP64 Tensor Core support, along with an end-to-end software and hardware stack, the A30 lets mainstream AI training and HPC applications be addressed rapidly.

Have any questions about NVIDIA GPUs or AI workstations and servers?
Contact Exxact Today

