TensorFlow Benchmarks for Exxact Server Featuring NVIDIA V100S
July 8, 2020

This post presents deep learning benchmarks for TensorFlow on an Exxact TensorEX server outfitted with 4 NVIDIA V100S GPUs.

We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository on the following networks: ResNet-50, ResNet-152, Inception V3, Inception V4, and GoogLeNet. We compared FP16 and FP32 performance at a batch size of 128, and ran the same tests in 2-GPU and 4-GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.

NVIDIA V100S Deep Learning Benchmark Snapshot

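Running in FP16 gives a substantial boost in the overall images/sec metric: in the tables below, FP16 throughput is roughly 1.5x to 2.7x the FP32 throughput, depending on the network and GPU count. If you are able to train in FP16 rather than FP32, we recommend doing so.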

NVIDIA V100S Deep Learning Benchmarks FP16

Model          2 GPU img/sec   4 GPU img/sec   Batch Size
ResNet50       1735.56         3218            128
ResNet152      760.57          1415.56         128
Inception V3   1134.88         2161.02         128
Inception V4   602.36          1205.97         128
googlenet      2820.47         5265.14         128

NVIDIA V100S Deep Learning Benchmarks FP32

Model          2 GPU img/sec   4 GPU img/sec   Batch Size
ResNet50       762.21          1432.69         128
ResNet152      278.17          577.26          128
Inception V3   495.51          926.93          128
Inception V4   227.05          455.65          128
googlenet      1692.94         3393.91         128
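
To put numbers on the FP16 advantage, the short Python sketch below computes two ratios from the FP16 and FP32 tables above: the FP16-over-FP32 speedup at each GPU count, and the throughput gain from moving from 2 to 4 GPUs. The dictionary and function names are illustrative only and are not part of the benchmark script; only the images/sec figures come from the results.

```python
# Throughput in images/sec, copied from the FP16 and FP32 tables.
# Each value is a tuple: (2-GPU img/sec, 4-GPU img/sec).
FP16 = {
    "ResNet50": (1735.56, 3218.0),
    "ResNet152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "googlenet": (2820.47, 5265.14),
}
FP32 = {
    "ResNet50": (762.21, 1432.69),
    "ResNet152": (278.17, 577.26),
    "Inception V3": (495.51, 926.93),
    "Inception V4": (227.05, 455.65),
    "googlenet": (1692.94, 3393.91),
}


def fp16_speedup(model):
    """Return (2-GPU, 4-GPU) ratios of FP16 to FP32 throughput."""
    return tuple(h / s for h, s in zip(FP16[model], FP32[model]))


def scaling_2_to_4(table, model):
    """Return 4-GPU throughput divided by 2-GPU throughput."""
    two_gpu, four_gpu = table[model]
    return four_gpu / two_gpu


if __name__ == "__main__":
    for model in FP16:
        s2, s4 = fp16_speedup(model)
        print(
            f"{model:13s} FP16/FP32: {s2:.2f}x (2 GPU), {s4:.2f}x (4 GPU); "
            f"2->4 GPU scaling: {scaling_2_to_4(FP16, model):.2f}x (FP16), "
            f"{scaling_2_to_4(FP32, model):.2f}x (FP32)"
        )
```

On these figures, doubling the GPU count from 2 to 4 raises throughput by roughly 1.85x to 2.1x across the networks tested.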

System Specifications:

Model: Exxact TensorEX Deep Learning Server
GPU: NVIDIA Tesla V100S 32 GB PCIe
CPU: Intel Xeon Silver 4116
RAM: 128 GB DDR4
SSD (OS): 120 GB
SSD (Data): 1024.2 GB
OS: CentOS Linux 7
NVIDIA Driver: 440.82
CUDA Version: 10.2
Python: 3.6.9
TensorFlow: 20.02-tf1-py3
Docker Image: nvcr.io/nvidia/tensorflow:20.02-tf1-py3

Training Parameters

Dataset: Imagenet
Mode: training
SingleSess: False
Batch Size: 128
Num Batches: 100
Num Epochs: 0.16
Devices: ['/gpu:0']…(varied)
NUMA bind: False
Data format: NCHW
Optimizer: momentum
Variables: parameter_server
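
The parameters above map naturally onto command-line flags of tf_cnn_benchmarks.py. The Python sketch below builds, without running anything, the 20 command lines implied by the test matrix (5 networks, 2 GPU counts, FP16 and FP32). The flag names and model identifiers are given as we understand them from the tensorflow/benchmarks repository and are not taken from this post, so treat them as assumptions and check them against the script's --help output. The helper names build_command and all_commands are hypothetical.

```python
import itertools

# Assumed identifiers for the --model flag; verify against the script.
MODELS = ["resnet50", "resnet152", "inception3", "inception4", "googlenet"]
GPU_COUNTS = [2, 4]


def build_command(model, num_gpus, use_fp16, script="tf_cnn_benchmarks.py"):
    """Return the argv list for one benchmark run (nothing is executed)."""
    cmd = [
        "python",
        script,
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        "--batch_size=128",                    # Batch Size: 128
        "--num_batches=100",                   # Num Batches: 100
        "--data_name=imagenet",                # Dataset: Imagenet
        "--data_format=NCHW",                  # Data format: NCHW
        "--optimizer=momentum",                # Optimizer: momentum
        "--variable_update=parameter_server",  # Variables: parameter_server
    ]
    if use_fp16:
        # Assumed flag for half precision; FP32 runs simply omit it.
        cmd.append("--use_fp16=True")
    return cmd


def all_commands():
    """Enumerate every (model, GPU count, precision) combination."""
    return [
        build_command(model, num_gpus, use_fp16)
        for model, num_gpus, use_fp16 in itertools.product(
            MODELS, GPU_COUNTS, [True, False]
        )
    ]


if __name__ == "__main__":
    for cmd in all_commands():
        print(" ".join(cmd))
```

Building the commands as lists rather than shell strings keeps each argument separate, so they can be passed directly to subprocess.run inside the container if desired.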
