Integration with deep learning frameworks

IBM® Watson™ Machine Learning Community Edition (WML CE) distributed deep learning (DDL) is integrated with the WML CE IBM Caffe, PyTorch, and TensorFlow packages. Use the ddlrun command to launch training that uses this integration.
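As a rough illustration of how such a launch might be scripted, the following Python sketch assembles a ddlrun command line without running it. The -H host-list option and the layout "ddlrun [options] program [arguments]" are assumptions that this section does not spell out, and the host names and train.py are hypothetical placeholders. Check the ddlrun documentation for your release before relying on them.

```python
import shlex


def build_ddlrun_command(hosts, command):
    """Return an argv list that starts `command` under ddlrun.

    Assumption: ddlrun accepts a comma-separated host list through -H,
    followed by the program to launch and its arguments.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if not command:
        raise ValueError("a program to launch is required")
    return ["ddlrun", "-H", ",".join(hosts)] + list(command)


if __name__ == "__main__":
    # host1, host2, and train.py are hypothetical placeholders.
    argv = build_ddlrun_command(["host1", "host2"], ["python", "train.py"])
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list keeps each argument separate, so the result can be passed to subprocess.run without shell quoting problems.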

Caffe

WML CE distributed deep learning is integrated directly into Caffe. To use it, launch Caffe with the ddlrun command.

PyTorch

WML CE distributed deep learning is integrated directly into PyTorch as the ddl backend of the PyTorch communication package torch.distributed.
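A training script would typically select this backend when it initializes torch.distributed. The sketch below is a minimal illustration, not the WML CE reference setup. torch.distributed.init_process_group is the standard PyTorch call and the backend name "ddl" follows the description above, but init_method="env://" is an assumption, and the helper names are hypothetical. The call is expected to succeed only with the WML CE PyTorch build and when the script is started through ddlrun.

```python
def ddl_init_kwargs(backend="ddl", init_method="env://"):
    """Keyword arguments for torch.distributed.init_process_group.

    Assumption: the DDL backend is selected by the name "ddl" and reads
    its rendezvous settings from the environment ("env://").
    """
    return {"backend": backend, "init_method": init_method}


def init_distributed(**overrides):
    """Initialize torch.distributed and return (rank, world_size)."""
    # Imported lazily so this module can be loaded without PyTorch.
    import torch.distributed as dist

    dist.init_process_group(**ddl_init_kwargs(**overrides))
    return dist.get_rank(), dist.get_world_size()
```

Keeping the arguments in a separate helper makes it easy to switch to a stock backend such as "gloo" when testing the same script outside a DDL environment.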

The WML CE PyTorch package provides example training setups based on the PyTorch training model scripts. You can find them on your system in:

$CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/

More details about WML CE distributed deep learning integration into PyTorch can be found in Tutorial: PyTorch with DDL.

TensorFlow

DDL is indirectly integrated into TensorFlow in the form of a custom operator. The custom operator is provided as a shared library, which the Python training script loads and invokes.
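To make the loading step concrete, the following sketch shows the general TensorFlow mechanism for loading a custom operator from a shared library, tf.load_op_library. It is an illustration under assumptions: the helper name is hypothetical, the library path must be supplied by the caller, and the ddl-tensorflow package may wrap this step in its own Python module instead of exposing it directly.

```python
import os


def load_custom_op_library(library_path):
    """Load a TensorFlow custom-operator shared library.

    Returns the Python module that TensorFlow generates for the
    operators defined in the library.
    """
    # Fail early with a clear message if the shared library is missing.
    if not os.path.isfile(library_path):
        raise FileNotFoundError(
            "custom operator library not found: " + library_path
        )
    # Imported lazily so this module can be loaded without TensorFlow.
    import tensorflow as tf

    return tf.load_op_library(library_path)
```

The operators in the returned module can then be called from the training script like any other TensorFlow operation.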

The WML CE ddl-tensorflow package provides example training setups based on the TensorFlow High Performance models from the TensorFlow benchmarks repository and on the TensorFlow-Slim model library from the TensorFlow models repository. You can find them on your system in:

$CONDA_PREFIX/ddl-tensorflow/examples/

More details about WML CE distributed deep learning integration into TensorFlow can be found in Tutorial: TensorFlow with DDL.