Getting started with Horovod
WML CE includes Horovod 0.19. Horovod is a distributed deep learning training framework for TensorFlow, Keras, and PyTorch. In WML CE, Horovod uses NCCL with MPI to communicate among nodes. For more information about this package, see Horovod.
- Set up the conda channel:
The WML CE packages are distributed as part of the public conda repository. First, update the local conda configuration to point to the public conda channel:
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
- Install the horovod conda package from the WML CE channel by running the following command:
conda install horovod
- Install a deep learning framework package so that you can test horovod by running one of the following commands:
conda install tensorflow-gpu
conda install pytorch
conda install keras
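After the installation steps above, a quick check can confirm that the channel is configured and that the horovod module is importable. The following is a minimal sketch, not part of the WML CE documentation. It assumes the target conda environment is active, and the variable name HOROVOD_STATUS is chosen only for illustration.

```shell
# List the configured channels; the WML CE channel URL should appear in the output.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
else
    echo "conda not found on PATH"
fi

# Report whether the horovod Python module can be imported in the active environment.
# HOROVOD_STATUS is an illustrative variable name, not something conda or horovod defines.
if python -c "import horovod" >/dev/null 2>&1; then
    HOROVOD_STATUS="installed"
    python -c "import horovod; print(horovod.__version__)"
else
    HOROVOD_STATUS="missing"
fi
echo "horovod: ${HOROVOD_STATUS}"
```

If the status is reported as missing, check that the environment where you ran conda install horovod is the one that is currently active.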
Running Horovod-based TensorFlow examples
Follow these steps to run the Horovod-based TensorFlow examples:
- Install the examples that are shipped with the horovod package by running the following
- Recommended: Install the DDL conda package. To run the examples, you can use horovodrun as shipped with horovod. However, we recommend using ddlrun because it provides more flexibility. Running with ddlrun requires that you install the DDL conda package by running this command:
conda install ddl
For more information about the DDL conda package, see Getting started with DDL.
- Run the example script by using ddlrun:
ddlrun -H host1,host2 python tensorflow2_mnist.py
For more information about ddlrun, see Using the ddlrun tool.
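The steps above mention horovodrun as an alternative launcher. As a rough sketch, an equivalent launch of the same example might look like the following. The host names are the placeholders used above, and the slot count of one process per host is an assumption: adjust it to the number of GPUs on each node. horovodrun typically also needs passwordless SSH between the hosts.

```shell
# Assumed values: two hosts with one process (one GPU) each.
# -np is the total number of processes; -H lists hosts as host:slots.
HOSTS="host1:1,host2:1"
NUM_PROCS=2
SCRIPT="tensorflow2_mnist.py"

LAUNCH_CMD="horovodrun -np ${NUM_PROCS} -H ${HOSTS} python ${SCRIPT}"
echo "${LAUNCH_CMD}"

# Launch only when horovodrun is on PATH and the example script is in the
# current directory; otherwise the command is just printed for review.
if command -v horovodrun >/dev/null 2>&1 && [ -f "${SCRIPT}" ]; then
    ${LAUNCH_CMD}
fi
```

Printing the command before running it makes it easy to check the host list and process count before starting a multi-node job.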