Getting started with Horovod
WML CE includes Horovod 0.19. Horovod is a distributed deep learning framework for TensorFlow, Keras, and PyTorch. In WML CE, Horovod uses NCCL with MPI to communicate among nodes. For more information about this package, see Horovod.
Installing Horovod
- Set up the conda channel:
The WML CE packages are distributed as part of the public conda repository. First, update the local conda configuration to point to the public conda channel:
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
- Install the horovod conda package from the WML CE channel by running the following command:
conda install horovod
- Install a deep learning framework package so that you can test Horovod by running one of the following commands:
conda install tensorflow-gpu
Or
conda install pytorch
Or
conda install keras
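After the installation steps above, it can help to confirm which packages are visible in the active conda environment. The following is a minimal sketch, not part of WML CE. It assumes the import names horovod, tensorflow, torch, and keras for the conda packages listed above, and the helper name installed_modules is hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for the conda packages installed above
# (horovod, tensorflow-gpu, pytorch, keras); adjust if yours differ.
MODULES = ("horovod", "tensorflow", "torch", "keras")


def installed_modules(names=MODULES):
    """Return {module name: True/False} without importing the modules."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_modules().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Because find_spec only locates a module and does not import it, the check is quick and does not initialize any GPU libraries. Only horovod and the framework that you chose need to be reported as found.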
Running Horovod-based TensorFlow examples
Follow these steps to run the Horovod-based TensorFlow examples:
- Install the examples that are shipped with the horovod package by running the following command:
horovod-install-samples <user-directory>
- Recommended: Install the DDL conda package. To run the examples, you can use horovodrun as shipped with horovod. However, we recommend using ddlrun because it provides more flexibility. Running with ddlrun requires that you install the DDL conda package by running this command:
conda install ddl
For more information about the DDL conda package, see Getting started with DDL.
- Run the example script by using ddlrun:
ddlrun -H host1,host2 python tensorflow2_mnist.py
For more information about ddlrun, see Using the ddlrun tool.
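If you launch the examples from a script rather than by hand, you can assemble the ddlrun command line from a list of hosts. The following is a minimal sketch under stated assumptions: the helper name build_ddlrun_command is hypothetical, and it uses only the -H option shown in the command above. It requires Python 3.8 or later for shlex.join.

```python
import shlex


def build_ddlrun_command(hosts, script, script_args=()):
    """Build the argument list for: ddlrun -H <hosts> python <script> [args]."""
    if not hosts:
        raise ValueError("at least one host is required")
    # ddlrun takes the hosts as one comma-separated value after -H.
    return ["ddlrun", "-H", ",".join(hosts), "python", script, *script_args]


if __name__ == "__main__":
    cmd = build_ddlrun_command(["host1", "host2"], "tensorflow2_mnist.py")
    # Print the command instead of running it, so the sketch is safe to try
    # on a machine without DDL. To launch, pass cmd to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping the command as an argument list rather than a single string avoids shell quoting problems if a script argument contains spaces.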