TensorFlow Benchmarks for Exxact Server Featuring NVIDIA V100S

July 8, 2020

In this post, we present TensorFlow deep learning benchmarks run on an Exxact TensorEX server outfitted with four NVIDIA V100S GPUs.

We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository on the following networks: ResNet-50, ResNet-152, Inception V3, Inception V4, and GoogLeNet. We also compared FP16 with FP32 performance, using a batch size of 128 throughout. The same tests were run in 2-GPU and 4-GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.
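The test matrix described above (five networks, two precisions, two GPU counts) can be sketched as a small command generator. This is a minimal illustration, not the exact invocation we used: the flag names (`--model`, `--num_gpus`, `--batch_size`, `--use_fp16`) and model identifiers are those of tf_cnn_benchmarks.py, while the helper name `build_command` and the assumption that the script sits in the current directory are hypothetical. Note that the script interprets `--batch_size` as a per-GPU value.

```python
from itertools import product

# Model identifiers as accepted by tf_cnn_benchmarks.py's --model flag.
MODELS = ["resnet50", "resnet152", "inception3", "inception4", "googlenet"]
GPU_COUNTS = [2, 4]
USE_FP16 = [False, True]  # False -> FP32 run, True -> FP16 run
BATCH_SIZE = 128          # per-GPU batch size


def build_command(model, num_gpus, use_fp16, batch_size=BATCH_SIZE):
    """Return one benchmark invocation as an argument list (hypothetical helper)."""
    cmd = [
        "python", "tf_cnn_benchmarks.py",  # assumes the script is in the working directory
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",
    ]
    if use_fp16:
        cmd.append("--use_fp16=True")
    return cmd


# One command per (network, GPU count, precision) combination.
commands = [
    build_command(model, gpus, fp16)
    for model, gpus, fp16 in product(MODELS, GPU_COUNTS, USE_FP16)
]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```

The sketch only prints the commands; each line could be passed to `subprocess.run` inside the container to execute the run.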

NVIDIA V100S Deep Learning Benchmark Snapshot

Across the networks tested, FP16 delivered roughly 1.5x to 2.7x the images/sec throughput of FP32. If your model can train in FP16 rather than FP32, we recommend doing so.

NVIDIA V100S Deep Learning Benchmarks FP16

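| Network | 2 GPU (img/sec) | 4 GPU (img/sec) | Batch Size |
| --- | --- | --- | --- |
| ResNet50 | 1735.56 | 3218 | 128 |
| ResNet152 | 760.57 | 1415.56 | 128 |
| Inception V3 | 1134.88 | 2161.02 | 128 |
| Inception V4 | 602.36 | 1205.97 | 128 |
| GoogLeNet | 2820.47 | 5265.14 | 128 |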
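To see how well throughput scales when going from 2 to 4 GPUs, the FP16 figures above can be turned into a speedup and a scaling efficiency. The snippet below is a small sketch; the function and variable names are our own, and "efficiency" is defined here as the measured speedup divided by the ideal 2x.

```python
# FP16 throughput in img/sec, copied from the table above: (2 GPU, 4 GPU).
fp16_throughput = {
    "ResNet50": (1735.56, 3218.0),
    "ResNet152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}


def scaling(two_gpu, four_gpu):
    """Return (speedup, efficiency) for the step from 2 to 4 GPUs."""
    speedup = four_gpu / two_gpu
    efficiency = speedup / 2.0  # doubling the GPU count ideally doubles throughput
    return speedup, efficiency


fp16_scaling = {name: scaling(*vals) for name, vals in fp16_throughput.items()}

if __name__ == "__main__":
    for name, (speedup, efficiency) in fp16_scaling.items():
        print(f"{name:<13} {speedup:.2f}x  ({efficiency:.0%} of linear)")
```

The same function applies unchanged to the FP32 figures.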

NVIDIA V100S Deep Learning Benchmarks FP32

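| Network | 2 GPU (img/sec) | 4 GPU (img/sec) | Batch Size |
| --- | --- | --- | --- |
| ResNet50 | 762.21 | 1432.69 | 128 |
| ResNet152 | 278.17 | 577.26 | 128 |
| Inception V3 | 495.51 | 926.93 | 128 |
| Inception V4 | 227.05 | 455.65 | 128 |
| GoogLeNet | 1692.94 | 3393.91 | 128 |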
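With both tables in hand, the FP16 advantage can be quantified per network by dividing FP16 throughput by FP32 throughput. The following is a minimal sketch using the published figures; the dictionary and function names are illustrative only.

```python
# img/sec from the FP16 and FP32 tables above: {network: (2 GPU, 4 GPU)}.
fp16 = {
    "ResNet50": (1735.56, 3218.0),
    "ResNet152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}
fp32 = {
    "ResNet50": (762.21, 1432.69),
    "ResNet152": (278.17, 577.26),
    "Inception V3": (495.51, 926.93),
    "Inception V4": (227.05, 455.65),
    "GoogLeNet": (1692.94, 3393.91),
}


def fp16_speedup(network):
    """Return the FP16/FP32 throughput ratio for the (2 GPU, 4 GPU) runs."""
    return tuple(half / single for half, single in zip(fp16[network], fp32[network]))


speedups = {name: fp16_speedup(name) for name in fp16}

if __name__ == "__main__":
    for name, (two_gpu, four_gpu) in speedups.items():
        print(f"{name:<13} 2 GPU: {two_gpu:.2f}x   4 GPU: {four_gpu:.2f}x")
```

On these numbers the deeper networks (ResNet152, Inception V4) gain the most from FP16, while GoogLeNet gains the least.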

System Specifications:

| Component | Specification |
| --- | --- |
| Model | Exxact TensorEX Deep Learning Server |
| GPU | NVIDIA Tesla V100S 32 GB PCIe |
| CPU | Intel Xeon Silver 4116 |
| RAM | 128 GB DDR4 |
| SSD (OS) | 120 GB |
| SSD (Data) | 1024.2 GB |
| OS | CentOS Linux 7 |
| NVIDIA Driver | 440.82 |
| CUDA Version | 10.2 |
| Python | 3.6.9 |
| TensorFlow | 20.02-tf1-py3 |
| Docker Image | nvcr.io/nvidia/tensorflow:20.02-tf1-py3 |

Training Parameters

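| Parameter | Value |
| --- | --- |
| Dataset | Imagenet |
| Mode | training |
| SingleSess | False |
| Batch Size | 128 |
| Num Batches | 100 |
| Num Epochs | 0.16 |
| Devices | [‘/gpu:0’]…(varied) |
| NUMA bind | False |
| Data format | NCHW |
| Optimizer | momentum |
| Variables | parameter_server |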
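Several of the parameters listed above correspond directly to tf_cnn_benchmarks.py command-line flags. As a rough sketch, they can be collected in a dictionary and rendered as flags. The flag names are the script's own; the dictionary `training_params` and the helper `to_flags` are hypothetical, and the remaining entries (Dataset, Mode, SingleSess, NUMA bind) are assumed here to be left at the script's defaults.

```python
# Settings from the list above that map onto tf_cnn_benchmarks.py flags.
training_params = {
    "batch_size": 128,                      # Batch Size
    "num_batches": 100,                     # Num Batches
    "data_format": "NCHW",                  # Data format
    "optimizer": "momentum",                # Optimizer
    "variable_update": "parameter_server",  # Variables
}


def to_flags(params):
    """Render a parameter dict as --name=value flags (hypothetical helper)."""
    return [f"--{name}={value}" for name, value in params.items()]


flags = to_flags(training_params)

if __name__ == "__main__":
    print(" ".join(flags))
```

These flags would be appended to each benchmark invocation alongside the per-run settings for network, GPU count, and precision.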
