The new NVIDIA Volta™ GPU architecture is the driving force behind modern artificial intelligence. With over 21 billion transistors, Volta is the most powerful GPU architecture NVIDIA has ever built. It pairs NVIDIA® CUDA® cores and Tensor Cores to deliver the performance of an AI supercomputer in a single GPU. By coupling NVIDIA Tesla Volta GPUs with our expert systems engineering, we have created a wide portfolio of NVIDIA Tesla Volta solutions to help fuel breakthroughs in every industry.
Volta Architecture Featuring NVIDIA Tensor Cores for Deep Learning
Tensor Cores are a key capability enabling the Volta GPU architecture to deliver the performance required to train large neural networks. Each NVIDIA Tesla V100 contains 640 Tensor Cores, which are designed specifically for deep learning and deliver groundbreaking performance—up to 12X higher peak teraflops (TFLOPS) for training and 6X higher peak TFLOPS for inference. This capability enables the Volta architecture to deliver 3X speedups in training and inference over the previous generation.
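The mixed-precision operation at the heart of each Tensor Core is a small matrix multiply-accumulate, D = A × B + C, where A and B are FP16 matrices and the products are accumulated in FP32. A minimal NumPy sketch of that numeric behavior (illustrative only — the function name is hypothetical, and the real hardware executes 4×4 tiles in a single fused operation):

```python
import numpy as np

def tensor_core_mma(a, b, c):
    """Sketch of one Tensor Core-style mixed-precision operation:
    D = A x B + C on 4x4 tiles, with FP16 multiplies and FP32 accumulation.
    Illustrative emulation, not the actual hardware data path."""
    a16 = a.astype(np.float16)
    b16 = b.astype(np.float16)
    # Accumulate FP16 rank-1 products into an FP32 result tile.
    d = np.zeros((4, 4), dtype=np.float32)
    for k in range(4):
        d += np.outer(a16[:, k], b16[k, :]).astype(np.float32)
    return d + c.astype(np.float32)
```

Keeping the accumulator in FP32 is what lets mixed-precision training retain accuracy while the multiplies run at FP16 throughput.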
Mixed-precision matrix-matrix multiplies are over 9X faster on Tesla V100 with CUDA 9 compared to FP32 matrix multiplies on Tesla P100 with CUDA 8.
AI Training and Inferencing with NVIDIA Tesla V100
From recognizing speech to training virtual personal assistants and teaching autonomous cars to drive, data scientists are taking on increasingly complex challenges with AI. Solving these kinds of problems requires training deep learning models that are growing exponentially in complexity, in a practical amount of time. With 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraFLOPS (TFLOPS) barrier of deep learning performance. The next generation of NVIDIA NVLink™ connects multiple V100 GPUs at up to 300 GB/s to create the world’s most powerful computing servers. AI models that would consume weeks of computing resources on previous systems can now be trained in a few days. With this dramatic reduction in training time, a whole new world of problems becomes solvable with AI.
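Much of the speedup from linking multiple GPUs comes from data-parallel training: each GPU computes gradients on its own slice of a batch, and the gradients are averaged across GPUs (an all-reduce over NVLink) before the weights are updated. A minimal NumPy sketch of that idea, with hypothetical names and a simple least-squares model standing in for a deep network:

```python
import numpy as np

def local_gradient(w, x, y):
    """Mean-squared-error gradient on one worker's shard of the batch."""
    return x.T @ (x @ w - y) / len(y)

def data_parallel_step(w, shards, lr=0.1):
    """One data-parallel update: each (x, y) shard plays the role of one GPU.
    Gradients are averaged across workers (the all-reduce) before the update."""
    grads = [local_gradient(w, x, y) for x, y in shards]
    avg_grad = np.mean(grads, axis=0)  # all-reduce, then divide by worker count
    return w - lr * avg_grad
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so adding GPUs divides the per-step compute without changing the mathematics of the update.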
Server Config: Dual Xeon E5-2699 v4, 2.6GHz | 8x Tesla K80, Tesla P100 or Tesla V100 | V100 performance measured on pre-production hardware. | ResNet-50 training on Microsoft Cognitive Toolkit for 90 epochs with the 1.28M-image ImageNet dataset.