Integration with deep learning frameworks
IBM® Watson™ Machine Learning Community Edition (WML CE) distributed deep learning (DDL) has been integrated with the WML CE IBM Caffe, PyTorch, and TensorFlow packages. Use the ddlrun command to launch training with the WML CE distributed deep learning integration.
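As a rough illustration of how a training script might be wrapped by ddlrun, the following Python sketch assembles a launch command line. It assumes that ddlrun accepts a comma-separated host list through a -H option followed by the command to run; the helper name build_ddlrun_command, the host names, and train.py are hypothetical. The sketch only builds and prints the command and does not execute it.

```python
import shlex


def build_ddlrun_command(hosts, training_command):
    """Return the argv list that launches training_command under ddlrun.

    Assumption: ddlrun takes a comma-separated host list through -H,
    followed by the training command and its arguments.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return ["ddlrun", "-H", ",".join(hosts)] + list(training_command)


# Hypothetical hosts and script name, for illustration only.
argv = build_ddlrun_command(["host1", "host2"], ["python", "train.py"])
print(shlex.join(argv))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run without shell quoting problems, should the launch be scripted.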
Caffe
WML CE distributed deep learning is directly integrated into Caffe, and can be exercised by using the ddlrun command to launch Caffe.
PyTorch
WML CE distributed deep learning is directly integrated into PyTorch in the form of a ddl backend in the PyTorch communication package, torch.distributed.
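A minimal sketch of selecting this backend from a training script is shown below. torch.distributed.init_process_group is the standard PyTorch entry point; passing "ddl" as the backend name requires the WML CE PyTorch build, and the use of init_method="env://" (with rank and world size supplied through the environment by ddlrun) is an assumption here. The helper name init_ddl_backend is hypothetical.

```python
def init_ddl_backend(dist=None):
    """Initialize torch.distributed with the WML CE "ddl" backend.

    dist may be supplied by the caller; by default the real
    torch.distributed module is imported when the function runs.
    """
    if dist is None:
        # Imported lazily so that this module loads without PyTorch installed.
        import torch.distributed as dist
    # Assumption: process rank and world size come from the environment.
    dist.init_process_group(backend="ddl", init_method="env://")
    return dist
```

A script launched through ddlrun would typically call such a helper once, before building the model and data loaders.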
The WML CE PyTorch package provides example training setups based on the PyTorch training model scripts. These can be found on your system in:
$CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/
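To locate this directory from Python, the path can be assembled from the active conda environment, as in the following sketch. It assumes that PY_VER is the major.minor Python version (for example 3.7) and falls back to sys.prefix when CONDA_PREFIX is not set; the helper name ddl_examples_dir is hypothetical.

```python
import os
import sys
from pathlib import Path


def ddl_examples_dir(conda_prefix=None, py_ver=None):
    """Build the path to the PyTorch DDL examples directory.

    Assumption: PY_VER is the major.minor Python version, such as "3.7".
    """
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX", sys.prefix)
    ver = py_ver or f"{sys.version_info.major}.{sys.version_info.minor}"
    return (Path(prefix) / "lib" / f"python{ver}" / "site-packages"
            / "torch" / "examples" / "ddl_examples")


examples = ddl_examples_dir()
# The directory exists only where the WML CE PyTorch package is installed.
print(examples, "(found)" if examples.is_dir() else "(not found)")
```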
More details about WML CE distributed deep learning integration into PyTorch can be found in Tutorial: Pytorch with DDL.
TensorFlow
DDL is indirectly integrated into TensorFlow in the form of a custom operator. The custom operator is provided as a shared library, which is loaded and invoked in the Python training script.
The WML CE ddl-tensorflow package provides example training setups based on the TensorFlow High Performance models from the TensorFlow benchmarks repository and on the TensorFlow-Slim model library from the TensorFlow models repository. These can be found on your system in:
$CONDA_PREFIX/ddl-tensorflow/examples/
More details about WML CE distributed deep learning integration into TensorFlow can be found in Tutorial: TensorFlow with DDL.