NVIDIA A4500 Deep Learning Benchmarks for TensorFlow

February 9, 2022

NVIDIA RTX A4500 Benchmarks

For this blog article, we conducted deep learning performance benchmarks for TensorFlow on NVIDIA RTX A4500 GPUs.

Our deep learning server was fitted with eight RTX A4500 GPUs, and we ran the standard tf_cnn_benchmarks.py benchmark script from the official TensorFlow benchmarks repository on GitHub. We tested the following networks: ResNet50, ResNet152, Inception v3, and GoogLeNet. We ran each test with 2, 4, and 8 GPUs, using a batch size of 64 for FP32 and 128 for FP16.
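The post does not list the exact command lines used. As a hedged sketch, the 24-run sweep described above (four models, three GPU counts, two precisions) could be generated as follows. The flags `--model`, `--num_gpus`, `--batch_size` and `--use_fp16` are tf_cnn_benchmarks flags. The model identifiers, and the assumption that the stated batch sizes are per GPU (which is how the script interprets `--batch_size`), are ours. Any other flags used for the published runs are unknown.

```python
from itertools import product
from typing import List

# Assumed tf_cnn_benchmarks --model identifiers for the four networks tested.
MODELS = ["resnet50", "resnet152", "inception3", "googlenet"]
GPU_COUNTS = [2, 4, 8]
# Batch sizes stated in the post; tf_cnn_benchmarks treats --batch_size as per-GPU.
BATCH_SIZE = {"fp32": 64, "fp16": 128}


def build_commands(script: str = "tf_cnn_benchmarks.py") -> List[List[str]]:
    """Return one argv list per (precision, model, GPU count) combination."""
    commands = []
    for precision, model, num_gpus in product(BATCH_SIZE, MODELS, GPU_COUNTS):
        commands.append([
            "python", script,
            f"--model={model}",
            f"--num_gpus={num_gpus}",
            f"--batch_size={BATCH_SIZE[precision]}",
            f"--use_fp16={precision == 'fp16'}",
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of launching it.
    for argv in build_commands():
        print(" ".join(argv))
```

To launch the runs, you could pass each argv list to `subprocess.run(argv, check=True)` from the directory that contains the script. Keeping the sweep as data makes it easy to rerun a single configuration.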

Key Points and Observations

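  • The NVIDIA RTX A4500 exhibits near-linear scaling up to 8 GPUs. In FP32, moving from 2 to 8 GPUs raises throughput by about 3.2x to 3.3x for every model tested, against an ideal 4x.
  • Two RTX A4500 GPUs can be bridged with NVIDIA NVLink® for up to 40 GB of combined memory and up to 112 gigabytes per second (GB/s) of interconnect bandwidth.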
  • PCIe Gen 4: Doubles the bandwidth of the previous generation and speeds up data transfers for data-intensive tasks such as AI, data science, and creating 3D models.
  • Supports NVIDIA RTX vWS (virtual workstation software) so it can deliver multiple high-performance virtual workstation instances that enable remote users to share resources.
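The PCIe Gen 4 bullet's "doubles the bandwidth" claim comes from the per-lane transfer rate. PCIe 3.0 and 4.0 both use 128b/130b encoding, and 4.0 raises the rate from 8 to 16 GT/s. The sketch below computes raw per-direction bandwidth for an x16 link. It ignores packet and protocol overhead, so real transfer rates will be somewhat lower.

```python
def pcie_bandwidth_gb_per_s(gt_per_s: float, lanes: int,
                            payload_bits: int, total_bits: int) -> float:
    """Raw per-direction PCIe link bandwidth in GB/s (10^9 bytes per second).

    Each lane carries one bit per transfer; the line code
    (payload_bits / total_bits) removes encoding overhead.
    Packet/protocol overhead is ignored.
    """
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / total_bits
    return bits_per_s / 8 / 1e9


gen3_x16 = pcie_bandwidth_gb_per_s(8, 16, 128, 130)   # PCIe 3.0: 8 GT/s, 128b/130b
gen4_x16 = pcie_bandwidth_gb_per_s(16, 16, 128, 130)  # PCIe 4.0: 16 GT/s, 128b/130b

print(f"Gen3 x16: {gen3_x16:.2f} GB/s, Gen4 x16: {gen4_x16:.2f} GB/s, "
      f"ratio {gen4_x16 / gen3_x16:.1f}x")
```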

NVIDIA RTX A4500 Highlights

| Specification | Value |
|---|---|
| CUDA Cores | 7,168 |
| Tensor Cores & Performance | 224 / 189.2 TFLOPS |
| RT Cores & Performance | 56 / 46.2 TFLOPS |
| Single-Precision Performance | 23.7 TFLOPS |
| GPU Memory | 20 GB GDDR6 with ECC |
| Memory Interface & Bandwidth | 320-bit / 640 GB/s |
| System Interface | PCI Express 4.0 x16 |
| Display Connectors | 4x DisplayPort 1.4a |
| Maximum Power Consumption | 200 W |
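Two of the headline numbers above can be cross-checked against each other. The sketch below makes two assumptions. First, each CUDA core retires one FP32 fused multiply-add (2 FLOPs) per clock. Second, memory bandwidth equals bus width in bytes times the per-pin data rate. The results are back-of-envelope values derived from the table, not official clock or memory specifications.

```python
CUDA_CORES = 7168
FP32_TFLOPS = 23.7
BUS_WIDTH_BITS = 320
MEM_BANDWIDTH_GB_S = 640

# Assumption: one FP32 fused multiply-add (2 FLOPs) per CUDA core per clock.
implied_clock_ghz = FP32_TFLOPS * 1e12 / (CUDA_CORES * 2) / 1e9

# Bandwidth = (bus width in bytes) x (per-pin data rate in Gbit/s).
implied_data_rate_gbps = MEM_BANDWIDTH_GB_S / (BUS_WIDTH_BITS / 8)

print(f"Implied boost clock: {implied_clock_ghz:.2f} GHz")
print(f"Implied GDDR6 data rate: {implied_data_rate_gbps:.0f} Gbit/s per pin")
```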



Exxact RTX A4500 Workstation System Specs

| Component | Details |
|---|---|
| Make / Model | AS-4124GS-TN |
| Nodes | 1 |
| Processor / Count | 2x AMD EPYC 7552 |
| Total Logical Cores | 48 |
| Memory | DDR4 512 GB |
| Storage | NVMe 3.84 TB |
| OS | Ubuntu 18.04 |
| CUDA Version | 11.4 |
| TensorFlow Version | 2.4 |


NVIDIA RTX A4500 TensorFlow FP16 Benchmarks


| Model (images/sec) | 2x GPU | 4x GPU | 8x GPU |
|---|---|---|---|
| ResNet 50 | 873.93 | 3343.4 | 5860.13 |
| ResNet 152 | 740.14 | 1390.5 | 2306.82 |
| Inception V3 | 1130.26 | 2206.95 | 3794.27 |
| GoogLeNet | 3345.5 | 6001.79 | 11161.05 |

Batch Size 128 for all FP16 tests.
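Scaling efficiency is one way to quantify the "near-linear scaling" observation. It is the measured speedup over the 2-GPU run divided by the ideal speedup (GPU count / 2). The sketch below applies it to the FP16 table; the dictionary simply re-types the table values. Using 2 GPUs as the baseline is our choice, since no single-GPU run is reported. The ResNet 50 row comes out well above 100%: its 4-GPU result is about 3.8 times its 2-GPU result. That 2-GPU figure may be worth re-checking.

```python
# FP16 throughput (images/sec) from the table above, ordered 2x, 4x, 8x GPUs.
FP16_RESULTS = {
    "ResNet 50": [873.93, 3343.4, 5860.13],
    "ResNet 152": [740.14, 1390.5, 2306.82],
    "Inception V3": [1130.26, 2206.95, 3794.27],
    "GoogLeNet": [3345.5, 6001.79, 11161.05],
}
GPU_COUNTS = [2, 4, 8]


def scaling_efficiency(throughputs, gpu_counts):
    """Speedup over the smallest configuration divided by the ideal linear speedup."""
    base_tput, base_gpus = throughputs[0], gpu_counts[0]
    return [
        (tput / base_tput) / (gpus / base_gpus)
        for tput, gpus in zip(throughputs, gpu_counts)
    ]


for model, tputs in FP16_RESULTS.items():
    effs = scaling_efficiency(tputs, GPU_COUNTS)
    print(model, " ".join(f"{g}x: {e:.0%}" for g, e in zip(GPU_COUNTS, effs)))
```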


NVIDIA RTX A4500 TensorFlow FP32 Benchmarks


| Model (images/sec) | 2x GPU | 4x GPU | 8x GPU |
|---|---|---|---|
| ResNet 50 | 688.59 | 1295.23 | 2252.24 |
| ResNet 152 | 290.41 | 536.55 | 945.84 |
| Inception V3 | 484.38 | 893.15 | 1603.48 |
| GoogLeNet | 1687.88 | 3081.52 | 5475.36 |

Batch Size 64 for all FP32 tests.
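The FP16 and FP32 tables can be compared directly as a throughput ratio per model and GPU count; both dictionaries below re-type the table values. The FP16 runs used batch size 128 and the FP32 runs used 64, so the ratio reflects both the precision change and the batch-size change, not precision alone.

```python
# Throughput (images/sec) from the FP16 and FP32 tables, ordered 2x, 4x, 8x GPUs.
FP16 = {
    "ResNet 50": [873.93, 3343.4, 5860.13],
    "ResNet 152": [740.14, 1390.5, 2306.82],
    "Inception V3": [1130.26, 2206.95, 3794.27],
    "GoogLeNet": [3345.5, 6001.79, 11161.05],
}
FP32 = {
    "ResNet 50": [688.59, 1295.23, 2252.24],
    "ResNet 152": [290.41, 536.55, 945.84],
    "Inception V3": [484.38, 893.15, 1603.48],
    "GoogLeNet": [1687.88, 3081.52, 5475.36],
}
GPU_COUNTS = [2, 4, 8]


def fp16_over_fp32(fp16, fp32):
    """Per-model FP16/FP32 throughput ratio at each GPU count."""
    return {model: [a / b for a, b in zip(fp16[model], fp32[model])] for model in fp32}


for model, ratios in fp16_over_fp32(FP16, FP32).items():
    print(model, " ".join(f"{g}x: {r:.2f}x" for g, r in zip(GPU_COUNTS, ratios)))
```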


More About the NVIDIA RTX A4500's Features

  • NVIDIA Ampere Architecture-Based CUDA Cores: Accelerate graphics workflows with the latest CUDA® cores for up to 2X single-precision floating-point (FP32) performance compared to the previous generation architecture.
  • Second-Generation RT Cores: Produce more visually accurate renders faster with hardware-accelerated ray tracing and motion blur, with up to 2X faster performance than the previous generation architecture.
  • Third-Generation Tensor Cores: Boost AI and data science model training performance over the previous generation with support for TF32 precision and structural sparsity, and bring advanced capabilities such as AI denoising and DLSS to graphics workflows.
  • 20GB of GPU Memory: Tackle memory-intensive workloads, from virtual production to engineering simulation, with 20 GB of GDDR6 memory with ECC.
  • Third-Generation NVIDIA NVLink: Scale memory and performance across multiple GPUs with NVIDIA® NVLink to tackle larger datasets, models, and scenes.
  • PCI Express Gen 4: Improve data-transfer speeds from CPU memory for data-intensive tasks with support for PCI Express Gen 4.
  • Power Efficiency: Leverage a dual-slot, power-efficient design crafted to fit a wide range of workstations.
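TF32, mentioned in the Tensor Core bullet above, keeps FP32's 8-bit exponent but only 10 explicit mantissa bits, the same number as FP16. The sketch below mimics that reduced precision in plain Python by zeroing the 13 low mantissa bits of an FP32 value. It truncates, whereas Tensor Core hardware may round differently. Treat it as an illustration of the precision loss, not a bit-exact model of the GPU.

```python
import struct


def to_tf32(x: float) -> float:
    """Keep the sign, 8 exponent bits and top 10 mantissa bits of an FP32 value.

    Truncates (rounds toward zero); actual hardware rounding may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of FP32's 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_tf32(1.0 + 2**-10))  # 1.0009765625: fits in 10 mantissa bits
print(to_tf32(1.0 + 2**-11))  # 1.0: the 11th mantissa bit is dropped
```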
