NVIDIA RTX A4000, A5000 and A6000 Comparison: Deep Learning Benchmarks for TensorFlow

August 12, 2021

Deep Learning Benchmarks for TensorFlow

For this blog article, we conducted deep learning performance benchmarks for TensorFlow comparing the NVIDIA RTX A4000 to NVIDIA RTX A5000 and A6000 GPUs.

Our Deep Learning Server was fitted with four RTX A4000 GPUs, and we ran the standard “tf_cnn_benchmarks.py” benchmark script from the official TensorFlow GitHub. We tested the following networks: ResNet50, ResNet152, Inception v3, and Inception v4. We ran the same tests in 1-, 2-, and 4-GPU configurations, using FP16 and the largest batch size that fit on each GPU.
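
As a rough sketch of how such a sweep could be scripted, the snippet below builds one tf_cnn_benchmarks.py command line per model and GPU count. The flags shown (--model, --num_gpus, --batch_size, --use_fp16) are options of the benchmark script, but the exact invocation behind the numbers in this article is not listed, so the script path, the helper names build_command and build_sweep, and the choice of flags are assumptions for illustration.

```python
import shlex

# Model names as accepted by the --model flag of tf_cnn_benchmarks.py.
MODELS = ["resnet50", "resnet152", "inception3", "inception4"]
GPU_COUNTS = [1, 2, 4]


def build_command(model, num_gpus, batch_size, script="tf_cnn_benchmarks.py"):
    """Return the argv list for one FP16 benchmark run (hypothetical helper)."""
    return [
        "python", script,
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",  # batch size per GPU
        "--use_fp16=True",
    ]


def build_sweep(batch_sizes):
    """Build commands for every model and GPU count.

    batch_sizes maps model name -> batch size and is supplied by the caller.
    """
    return [
        build_command(model, n, batch_sizes[model])
        for model in MODELS
        for n in GPU_COUNTS
    ]


if __name__ == "__main__":
    # Batch sizes taken from the RTX A4000 rows in the tables of this article.
    a4000_batches = {"resnet50": 256, "resnet152": 128,
                     "inception3": 256, "inception4": 128}
    for cmd in build_sweep(a4000_batches):
        print(shlex.join(cmd))
```

The script only prints the commands; running them requires a checkout of the TensorFlow benchmarks repository and a working GPU setup.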

Key Points and Observations

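  • In our benchmarks the NVIDIA RTX A4000 scales near-linearly from 1 to 2 GPUs on ResNet152, Inception v3, and Inception v4 (roughly 1.8x to 1.9x); on ResNet50 the gain was smaller, about 1.2x at 2 GPUs and 2.2x at 4 GPUs.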
  • The RTX A4000's 16 GB of GPU memory limited our batch size to at most 256, compared with up to 1024 on the larger-memory RTX A6000.
  • Powered by the NVIDIA Ampere architecture, the RTX A4000 features 3rd-gen Tensor Cores and 2nd-gen RT Cores, plus 16 GB of GDDR6 memory, a single-slot form factor, and PCI Express Gen 4.
  • The A4000 fully supports NVIDIA's latest technologies.
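
Scaling efficiency can be computed directly from the throughput tables later in this article. The snippet below is a minimal sketch: scaling_efficiency is a hypothetical helper name, and the numbers are the RTX A4000 1x and 2x GPU rows from those tables. A value of 100% would mean perfectly linear scaling.

```python
def scaling_efficiency(single_gpu, multi_gpu, num_gpus):
    """Measured multi-GPU throughput divided by ideal (linear) throughput."""
    return multi_gpu / (single_gpu * num_gpus)


# RTX A4000 throughput from the tables in this article: (1x GPU, 2x GPU).
a4000 = {
    "ResNet50": (684.56, 844.57),
    "ResNet152": (303.87, 551.86),
    "Inception v3": (488.24, 943.46),
    "Inception v4": (245.6, 451.38),
}

for network, (one_gpu, two_gpu) in a4000.items():
    eff = scaling_efficiency(one_gpu, two_gpu, 2)
    print(f"{network}: {eff:.1%} of linear scaling at 2 GPUs")
```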

NVIDIA RTX A4000 Highlights:

Spec                          Value
CUDA Cores                    6144
Tensor Cores                  192
RT Cores                      48
Single Precision Performance  19.2 TFLOPS
RT Core Performance           37.4 TFLOPS
Tensor Performance            153.4 TFLOPS
GPU Memory                    16 GB GDDR6 with ECC
Memory Interface              256-bit
Memory Bandwidth              448 GB/sec
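
Some of these figures are tied together by simple arithmetic. The sketch below assumes that each CUDA core performs one fused multiply-add (2 floating-point operations) per clock, which is the usual basis for peak FP32 numbers, and derives the clock implied by 19.2 TFLOPS as well as the per-pin memory data rate implied by 448 GB/sec over a 256-bit interface. Neither derived value is stated in this article, and both function names are hypothetical.

```python
def implied_clock_ghz(tflops, cuda_cores, flops_per_core_per_clock=2):
    """Clock (GHz) at which the cores would reach the quoted peak TFLOPS."""
    return tflops * 1e12 / (cuda_cores * flops_per_core_per_clock) / 1e9


def implied_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


print(f"Implied clock: {implied_clock_ghz(19.2, 6144):.2f} GHz")
print(f"Implied per-pin rate: {implied_pin_rate_gbps(448, 256):.1f} Gbit/s")
```

With the values from the table, this works out to roughly 1.56 GHz and 14 Gbit/s per pin.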



Exxact RTX A4000 Workstation System Specs:

Spec                 Value
Nodes                1
Processor / Count    2x AMD EPYC 7552
Total Logical Cores  48
Memory               DDR4 512 GB
Storage              NVMe 3.84 TB
OS                   Ubuntu 18.04
CUDA Version         11.2
TensorFlow Version   2.40
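
When reproducing benchmarks like these, it helps to record the software environment next to the results. The following is a minimal standard-library sketch, not the tooling used for this article; describe_environment and package_version are hypothetical helpers, and the lookup assumes TensorFlow was installed under the distribution name "tensorflow" (other package names would report None).

```python
import os
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment():
    """Collect a few facts worth recording next to benchmark results."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "logical_cores": os.cpu_count(),
        "tensorflow": package_version("tensorflow"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```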

TensorFlow FP16 Benchmarks comparison on ResNet50

GPU Type   1x GPU (images/sec)  2x GPU (images/sec)  Batch Size
RTX A4000  684.56               844.57               256
RTX A5000  1096.25              2154.67              512
RTX A6000  1145.6               2404.77              1024

Largest Batch Size for all FP16 tests.

TensorFlow FP16 Benchmarks comparison on ResNet152

GPU Type   1x GPU (images/sec)  2x GPU (images/sec)  Batch Size
RTX A4000  303.87               551.86               128
RTX A5000  450.79               869.87               256
RTX A6000  605.67               1128.83              512

Largest Batch Size for all FP16 tests.
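
To compare the cards directly, each result can be expressed as a multiple of the RTX A4000 figure. The sketch below does this for the single-GPU ResNet152 numbers in the table above; relative_to is a hypothetical helper. Because each card ran at a different batch size, the ratios reflect the tested configurations and are not a like-for-like comparison.

```python
# Single-GPU ResNet152 FP16 throughput from the table above.
resnet152_1x = {"RTX A4000": 303.87, "RTX A5000": 450.79, "RTX A6000": 605.67}


def relative_to(baseline, results):
    """Express every result as a multiple of the baseline entry."""
    base = results[baseline]
    return {gpu: value / base for gpu, value in results.items()}


for gpu, ratio in relative_to("RTX A4000", resnet152_1x).items():
    print(f"{gpu}: {ratio:.2f}x the RTX A4000")
```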

TensorFlow FP16 Benchmarks comparison on Inception v3

GPU Type   1x GPU (images/sec)  2x GPU (images/sec)  Batch Size
RTX A4000  488.24               943.46               256
RTX A5000  724.44               1353.78              256

Largest Batch Size for all FP16 tests.

TensorFlow FP16 Benchmarks comparison on Inception v4

GPU Type   1x GPU (images/sec)  2x GPU (images/sec)  Batch Size
RTX A4000  245.6                451.38               128
RTX A5000  383.13               737.86               256

Largest Batch Size for all FP16 tests.

4x GPU TensorFlow FP16 Benchmarks comparison on ResNet50

GPU Type   1x GPU (images/sec)  2x GPU (images/sec)  4x GPU (images/sec)  Batch Size
RTX A4000  684.56               844.57               1482.31              256
RTX A5000  1096.25              2154.67              3107.32              512

Largest Batch Size for all FP16 tests.
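
Throughput figures are easier to reason about when converted into wall-clock time. The sketch below assumes the numbers above are images per second and estimates how long one pass over a dataset would take at the 4x GPU ResNet50 rates. The dataset size of 1,281,167 images (the ImageNet-1k training set) is an assumption chosen for illustration and is not taken from this article; time_for_images is a hypothetical helper.

```python
from datetime import timedelta


def time_for_images(num_images, images_per_sec):
    """Wall-clock time to process num_images at a steady throughput."""
    return timedelta(seconds=num_images / images_per_sec)


# Assumption for illustration: one pass over 1,281,167 images
# (the size of the ImageNet-1k training set).
IMAGES_PER_EPOCH = 1_281_167

# 4x GPU ResNet50 FP16 throughput from the table above.
for gpu, throughput in {"RTX A4000": 1482.31, "RTX A5000": 3107.32}.items():
    print(f"4x {gpu}: {time_for_images(IMAGES_PER_EPOCH, throughput)} per epoch")
```

Real training runs add data loading, evaluation, and checkpointing overhead, so these estimates are a lower bound at best.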

NVIDIA RTX (Ampere) / Quadro RTX (Turing) comparison

Spec              RTX A4000      RTX A5000       RTX A6000       Quadro RTX 4000    Quadro RTX 5000    Quadro RTX 6000
Architecture      Ampere         Ampere          Ampere          Turing             Turing             Turing
GPU memory        16 GB GDDR6    24 GB GDDR6     48 GB GDDR6     8 GB GDDR6         16 GB GDDR6        24 GB GDDR6
ECC memory        Yes            Yes             Yes             No                 Yes                Yes
CUDA cores        6,144          8,192           10,752          2,304              3,072              4,608
Tensor Cores      192            256             336             288                384                576
RT Cores          48             64              84              36                 48                 72
SP perf           19.2 TFLOPS    27.8 TFLOPS     38.7 TFLOPS     7.1 TFLOPS         11.2 TFLOPS        16.3 TFLOPS
RT Core perf      37.4 TFLOPS    54.2 TFLOPS     75.6 TFLOPS     N/A                N/A                N/A
Tensor perf       153.4 TFLOPS   222.2 TFLOPS    309.7 TFLOPS    57.0 TFLOPS        89.2 TFLOPS        130.5 TFLOPS
Max Power         140W           230W            300W            160W               265W               295W
Graphics bus      PCI-E 4.0 x16  PCI-E 4.0 x16   PCI-E 4.0 x16   PCI-E 3.0 x16      PCI-E 3.0 x16      PCI-E 3.0 x16
Connectors        DP 1.4 (4)     DP 1.4 (4)      DP 1.4 (4)      DP 1.4 (3), USB-C  DP 1.4 (4), USB-C  DP 1.4 (4), USB-C
Form Factor       Single slot    Dual slot       Dual slot       Single slot        Dual slot          Dual slot
vGPU Software     N/A            NVIDIA RTX vWS  NVIDIA RTX vWS  N/A                N/A                NVIDIA RTX vWS
NVLink            N/A            2x RTX A5000    2x RTX A6000    N/A                2x RTX 5000        2x RTX 6000
Power Connector   1x 6-pin PCIe  1x 8-pin PCIe   1x 8-pin CPU    1x 6-pin PCIe      1x 8-pin PCIe      2x 8-pin PCIe
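
One way to read the comparison is to divide peak tensor performance by rated board power. The sketch below does that with the Tensor perf and Max Power rows from the table; tflops_per_watt is a hypothetical helper. These are spec-sheet peaks that may be quoted under different conditions across generations, so the ratio is a rough guide and not a measured efficiency.

```python
# Tensor performance (TFLOPS) and max power (W) from the comparison table.
gpus = {
    "RTX A4000": (153.4, 140),
    "RTX A5000": (222.2, 230),
    "RTX A6000": (309.7, 300),
    "Quadro RTX 4000": (57.0, 160),
    "Quadro RTX 5000": (89.2, 265),
    "Quadro RTX 6000": (130.5, 295),
}


def tflops_per_watt(tflops, watts):
    """Peak tensor TFLOPS per watt of rated board power."""
    return tflops / watts


# Sort GPU names from highest to lowest TFLOPS per watt.
ranked = sorted(gpus, key=lambda name: tflops_per_watt(*gpus[name]), reverse=True)
for name in ranked:
    print(f"{name}: {tflops_per_watt(*gpus[name]):.2f} TFLOPS/W")
```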

Final Thoughts for NVIDIA RTX A4000

The NVIDIA RTX A4000 is the most powerful single-slot GPU for professionals, delivering real-time ray tracing, AI-accelerated compute, and high-performance graphics to your desktop.

Built on the NVIDIA Ampere architecture, the RTX A4000 combines 48 second-generation RT Cores, 192 third-generation Tensor Cores, and 6144 CUDA cores with 16 GB of graphics memory, so you can engineer products and solutions from your desktop workstation.

