RTX A5500 Deep Learning Benchmarks for TensorFlow

June 6, 2022

Deep Learning Benchmarks for TensorFlow

For this blog article, our Deep Learning Server was fitted with eight RTX A5500 GPUs, and we ran the standard “tf_cnn_benchmarks.py” benchmark script from the official TensorFlow GitHub repository. We tested the following networks: ResNet50, ResNet152, Inception v3, and GoogLeNet. We ran the same tests in 2-, 4-, and 8-GPU configurations; the batch size used for each precision is noted under the corresponding results table.
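
For readers who want to set up a similar sweep, here is a minimal Python sketch that only builds the command lines for each model, GPU count, and precision; it does not launch anything. It assumes that “tf_cnn_benchmarks.py” sits in the current directory and that the script's `--model`, `--num_gpus`, `--batch_size`, and `--use_fp16` flags and the model names shown are available in your checkout. The helper names `build_command` and `build_sweep` are hypothetical, and this is not necessarily the exact invocation we used.

```python
from itertools import product

# Model identifiers as tf_cnn_benchmarks names them (assumption: check your checkout).
MODELS = ["resnet50", "resnet152", "inception3", "googlenet"]
GPU_COUNTS = [2, 4, 8]


def build_command(model, num_gpus, batch_size, fp16):
    """Return the argv list for one benchmark run (hypothetical helper)."""
    cmd = [
        "python",
        "tf_cnn_benchmarks.py",
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",  # the script treats this as a per-GPU batch size
    ]
    if fp16:
        cmd.append("--use_fp16=True")
    return cmd


def build_sweep(fp32_batch, fp16_batch):
    """Build one command per (precision, model, GPU count) combination."""
    sweep = []
    for fp16, model, num_gpus in product([False, True], MODELS, GPU_COUNTS):
        batch = fp16_batch if fp16 else fp32_batch
        sweep.append(build_command(model, num_gpus, batch, fp16))
    return sweep


# Batch sizes taken from the notes under the results tables in this article.
sweep = build_sweep(fp32_batch=128, fp16_batch=256)
for cmd in sweep:
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run` if you prefer to drive the sweep from Python.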

Key Points and Observations

  • As expected, the NVIDIA RTX A5500 exhibits near-linear scaling up to 8 GPUs: across the tested networks, throughput rises about 3.5x to 3.7x when going from 2 to 8 GPUs, against an ideal 4x.
  • By comparison, the RTX A5500 performed slightly better than the RTX A5000.
  • The RTX A5500 is expandable up to 48GB of memory by using NVIDIA NVLink® to connect two GPUs, which delivers up to 112 gigabytes per second (GB/s) of bandwidth.
  • PCIe Gen 4: Doubles the bandwidth of the previous generation and speeds up data transfers for data-intensive tasks such as AI, data science, and creating 3D models.
  • Supports NVIDIA RTX vWS (virtual workstation software), so it can deliver multiple high-performance virtual workstation instances that let remote users share resources.

NVIDIA RTX A5500 Specs & Highlights:

| Component | Specs |
| --- | --- |
| CUDA Cores | 10240 |
| Tensor Cores & Performance | 320 / 272.8 TFLOPS |
| RT Cores & Performance | 80 / 66.6 TFLOPS |
| Single Precision Performance | 34.1 TFLOPS |
| GPU Memory | 24 GB GDDR6 with ECC |
| Memory Interface & Bandwidth | 384-bit / 768 GB/sec |
| System Interface | PCI Express 4.0 x16 |
| Display Connectors | 4x DisplayPort 1.4a |
| Maximum Power Consumption | 230 W |
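
As a quick sanity check on the spec sheet, the short Python sketch below back-calculates two figures that the table implies but does not list: the clock speed behind the 34.1 TFLOPS single-precision number and the per-pin memory data rate behind 768 GB/sec. It assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock, which is the usual convention for peak FP32 figures; the function names are hypothetical.

```python
# Figures copied from the spec table above.
CUDA_CORES = 10240
FP32_TFLOPS = 34.1
BUS_WIDTH_BITS = 384
MEMORY_BANDWIDTH_GB_S = 768


def implied_clock_ghz(tflops, cores, flops_per_core_per_clock=2):
    """Clock (GHz) implied by a peak TFLOPS figure.

    Assumes one fused multiply-add (2 FLOPs) per core per clock.
    """
    return tflops * 1e12 / (cores * flops_per_core_per_clock) / 1e9


def implied_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


clock = implied_clock_ghz(FP32_TFLOPS, CUDA_CORES)
pin_rate = implied_pin_rate_gbps(MEMORY_BANDWIDTH_GB_S, BUS_WIDTH_BITS)
print(f"Implied clock: {clock:.3f} GHz")
print(f"Implied memory data rate: {pin_rate:.1f} Gbit/s per pin")
```

These are peak figures; the throughput a training job actually sees depends on the model, the input pipeline, and the precision used.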

Exxact RTX A5500 Workstation System Specs:

| Component | Specs |
| --- | --- |
| Make / Model | Supermicro AS -4124GS-TN |
| Nodes | 1 |
| Processor / Count | 2x AMD EPYC 7552 |
| Total Logical Cores | 48 |
| Memory | DDR4 512 GB |
| Storage | NVMe 3.84 TB |
| OS | Ubuntu 18.04 |
| CUDA Version | 11.4 |
| TensorFlow Version | 2.4 |

NVIDIA RTX A5500 TensorFlow FP16 Benchmarks (images/sec)

| Model Type | 2x GPU | 4x GPU | 8x GPU |
| --- | --- | --- | --- |
| ResNet 50 | 2342.76 | 4452.09 | 8112.51 |
| ResNet 152 | 959.16 | 1763.99 | 3365.04 |
| Inception V3 | 1523.05 | 2997.03 | 5322.61 |
| GoogLeNet | 4476.41 | 8597.12 | 15648.5 |

Batch Size 256 for all FP16 tests.
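
To put a number on how close to linear the FP16 results are, here is a small Python sketch that computes scaling efficiency from the figures above. Because we did not run a single-GPU test, the 2-GPU result is used as the baseline, so an efficiency of 100% at 8 GPUs would mean exactly 4x the 2-GPU throughput. The function name `scaling_efficiency` is hypothetical.

```python
# FP16 throughput from the table above, keyed by GPU count.
FP16_THROUGHPUT = {
    "ResNet 50": {2: 2342.76, 4: 4452.09, 8: 8112.51},
    "ResNet 152": {2: 959.16, 4: 1763.99, 8: 3365.04},
    "Inception V3": {2: 1523.05, 4: 2997.03, 8: 5322.61},
    "GoogLeNet": {2: 4476.41, 4: 8597.12, 8: 15648.5},
}


def scaling_efficiency(throughput_by_gpus, baseline_gpus=2):
    """Measured speedup divided by ideal speedup, relative to the baseline run."""
    base = throughput_by_gpus[baseline_gpus]
    return {
        n: (t / base) / (n / baseline_gpus)
        for n, t in sorted(throughput_by_gpus.items())
    }


for model, results in FP16_THROUGHPUT.items():
    eff = scaling_efficiency(results)
    print(model, {n: f"{e:.1%}" for n, e in eff.items()})
```

The same function works unchanged on the FP32 results if you swap in that table's numbers.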

NVIDIA RTX A5500 TensorFlow FP32 Benchmarks (images/sec)

| Model Type | 2x GPU | 4x GPU | 8x GPU |
| --- | --- | --- | --- |
| ResNet 50 | 875.93 | 1739.51 | 3116.23 |
| ResNet 152 | 361.05 | 721.42 | 1334.13 |
| Inception V3 | 621.71 | 1232.18 | 2234.22 |
| GoogLeNet | 2179.04 | 4222.93 | 7626.51 |

Batch Size 128 for all FP32 tests.
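
The two result sets can also be compared directly. The Python sketch below divides FP16 throughput by FP32 throughput for each model and GPU count, using the numbers reported in this article. Keep in mind that the two precisions were run at different batch sizes, so the ratio reflects the whole configuration rather than precision alone. The function name `fp16_speedup` is hypothetical.

```python
# Throughput from the two results tables in this article, keyed by GPU count.
FP16 = {
    "ResNet 50": {2: 2342.76, 4: 4452.09, 8: 8112.51},
    "ResNet 152": {2: 959.16, 4: 1763.99, 8: 3365.04},
    "Inception V3": {2: 1523.05, 4: 2997.03, 8: 5322.61},
    "GoogLeNet": {2: 4476.41, 4: 8597.12, 8: 15648.5},
}
FP32 = {
    "ResNet 50": {2: 875.93, 4: 1739.51, 8: 3116.23},
    "ResNet 152": {2: 361.05, 4: 721.42, 8: 1334.13},
    "Inception V3": {2: 621.71, 4: 1232.18, 8: 2234.22},
    "GoogLeNet": {2: 2179.04, 4: 4222.93, 8: 7626.51},
}


def fp16_speedup(model, num_gpus):
    """Ratio of FP16 to FP32 throughput for one model and GPU count."""
    return FP16[model][num_gpus] / FP32[model][num_gpus]


for model in FP16:
    ratios = {n: round(fp16_speedup(model, n), 2) for n in (2, 4, 8)}
    print(model, ratios)
```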

More About NVIDIA A5500's Features

  • NVIDIA Ampere Architecture-Based CUDA Cores: Accelerate graphics workflows with the latest CUDA® cores for up to 3X single-precision floating-point (FP32) performance compared to the previous generation.
  • Second-Generation RT Cores: Produce more visually accurate renders faster with hardware-accelerated ray tracing and motion blur, with up to 2X faster performance than the previous generation.
  • Third-Generation Tensor Cores: Boost AI and data science model training with up to 12X faster training performance compared to the previous generation, with hardware support for structural sparsity.
  • 24GB of GPU Memory: Tackle memory-intensive workloads, from virtual production to engineering simulation, with 24GB of GDDR6 memory with ECC.
  • Third-Generation NVIDIA NVLink: Scale memory and performance across multiple GPUs with NVIDIA® NVLink™ to tackle larger datasets, models, and scenes.
  • PCI Express Gen 4: Improve data-transfer speeds from CPU memory for data-intensive tasks with support for PCIe Gen 4.
  • Power Efficiency: Leverage a dual-slot design that’s 3X more power efficient than the previous generation and is crafted to fit a wide range of workstations.

Have any questions about NVIDIA GPUs or AI workstations and servers?
Contact Exxact Today

