HOROVOD


Scaling Deep Learning Across Multiple GPUs with Horovod

Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.

Horovod was created to address the shortcomings of existing methods for multi-GPU training in the TensorFlow library, which entailed non-negligible communication overhead and required users to heavily modify their existing model-building code, leading many researchers to avoid multi-GPU training altogether. The developers alleviate these problems with a library that, in their words, "employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow".
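To make "a few lines of modification" concrete, the snippet below is a minimal sketch, loosely following the patterns shown in the Horovod documentation, of the additions needed to distribute an ordinary Keras training script: initialize Horovod, pin each process to a GPU, wrap the optimizer, and broadcast the initial weights. The model, synthetic data, and hyperparameters are illustrative placeholders, not the setup used in the paper's benchmarks.

    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    # Initialize Horovod; the launcher starts one copy of this script per GPU.
    hvd.init()

    # Pin each process to its own GPU based on its local rank.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # An ordinary single-GPU Keras model -- nothing Horovod-specific here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ])

    # Scale the learning rate by the number of workers, then wrap the
    # optimizer so gradients are averaged across GPUs with ring-allreduce.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(
        optimizer=opt,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    # Broadcast rank 0's initial weights so every worker starts identically.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    # Placeholder synthetic data standing in for a real input pipeline.
    x = np.random.rand(1024, 32).astype('float32')
    y = np.random.randint(0, 10, size=(1024,))

    model.fit(x, y, batch_size=64, epochs=1, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)

Everything else in the script stays exactly as it would for a single GPU, which is the point of the design.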



HOROVOD VS DISTRIBUTED TENSORFLOW

The primary motivation for Horovod is to make it easy to take a single-GPU TensorFlow program and successfully train it on a multi-GPU platform.

The developers of Horovod consider two aspects when comparing approaches:

  • How many modifications does one have to make to a program to run it in distributed mode, and how easy is it to run?
  • How much faster would it run in distributed mode?

The researchers found the MPI model to be much more straightforward, requiring far fewer code changes than Distributed TensorFlow with parameter servers.
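In the MPI model, the same training script (such as the sketch above) is simply launched once per GPU, and the copies coordinate among themselves with no parameter servers to configure. With Horovod this is typically done with its horovodrun launcher (or directly with mpirun); for example, to run on four local GPUs (the script name here is illustrative):

    horovodrun -np 4 -H localhost:4 python train.py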


HOROVOD BENCHMARKS


DISTRIBUTED TENSORFLOW VS HOROVOD

Training with synthetic data on NVIDIA Pascal GPUs.




A comparison of images processed per second with standard distributed TensorFlow and Horovod when running a distributed training job over different numbers of NVIDIA Pascal GPUs for Inception V3 and ResNet-101 TensorFlow models over 25GbE TCP. (Source: Sergeev, A., Del Balso, M. (2018) Horovod: fast and easy distributed deep learning in TensorFlow, Figure 6.)

SCALING FURTHER WITH HOROVOD

Training with synthetic data on NVIDIA Pascal GPUs.




A comparison of images processed per second for Horovod over plain 25GbE TCP and Horovod with 25GbE RDMA-capable networking when running a distributed training job over different numbers of NVIDIA Pascal GPUs for Inception V3, ResNet-101, and VGG-16. (Source: Sergeev, A., Del Balso, M. (2018) Horovod: fast and easy distributed deep learning in TensorFlow, Figure 7.)

KEY RESOURCES FOR HOROVOD

  1. Sergeev, A., Del Balso, M. (2017) Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow.
  2. Sergeev, A. (2017) Horovod - Distributed TensorFlow Made Easy.
  3. Sergeev, A., Del Balso, M. (2018) Horovod: fast and easy distributed deep learning in TensorFlow.

EXXACT DEEP LEARNING GPU SOLUTIONS


Our deep learning GPU solutions are powered by leading hardware, software, and systems engineering. Each system comes with our pre-installed deep learning software stack and is fully turnkey, ready to run right out of the box.



SHOP NOW