IBM DDL library Python API

IBM Watson® Machine Learning Community Edition distributed deep learning (DDL) is an MPI-based communication library that is specifically optimized for deep learning training. DDL provides the ability to perform an allreduce function on GPUs across multiple machines in a cluster. DDL also provides functions for GPU memory management.

A Python wrapper for the C DDL library is provided. You can either use functions from the DDL library to explicitly manage memory on the GPU, or use NumPy arrays and allow the DDL library to manage memory on the GPU. Examples of both uses are provided below. For more details on the functions provided by DDL, see $CONDA_PREFIX/doc/ddl/pyddl.txt.
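As a rough illustration of what a sum allreduce computes, the following sketch simulates it on the CPU with NumPy only. It does not call DDL; the function name simulate_allreduce_sum and the three simulated ranks are hypothetical and exist only for this illustration. Every participant contributes a buffer, and every participant ends up with the element-wise sum of all buffers.

```python
import numpy as np

def simulate_allreduce_sum(buffers):
    """Return what each simulated rank would hold after a sum allreduce.

    Hypothetical helper for illustration only; it is not part of pyddl.
    """
    # Element-wise sum over the buffers contributed by all ranks.
    total = np.sum(np.stack(buffers), axis=0)
    # Every rank receives its own copy of the same reduced array.
    return [total.copy() for _ in buffers]

# Three simulated ranks, each contributing a different buffer.
rank_buffers = [np.full(4, rank + 1, dtype=np.float32) for rank in range(3)]
results = simulate_allreduce_sum(rank_buffers)
print(results[0])  # [6. 6. 6. 6.]
```

In a real run, DDL performs this reduction across processes and GPUs; the sketch only shows the arithmetic that each process should observe afterwards.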

Using DDL with explicit memory management within a Python script

The following example program creates a buffer on each process, performs an allreduce, then prints out the results.

Example program:

import pyddl
import numpy as np

n = 100

# Initialize DDL
pyddl.init()

# Create an array on the CPU
arr = np.arange(n, dtype=np.float32)

# Allocate memory on the GPU (n * 4 bytes holds n float32 values of 4 bytes each)
res, buf = pyddl.malloc(64, n * 4)

# Copy buffer from the CPU to the GPU
pyddl.memcpy_host_to_device(buf, arr)

# Synchronize DDL streams
pyddl.memsync()

# Perform DDL's allreduce function
pyddl.allreduce(buf, n, pyddl.DDL_TYPE_FLOAT, pyddl.DDL_OP_SUM)

# Copy buffer from the GPU to the CPU
pyddl.memcpy_device_to_host(arr, buf)

# Synchronize DDL streams
pyddl.memsync()

# Deallocate memory on the GPU
pyddl.free(buf)

# Print the reduced array on rank 0 so that the result can be checked
if pyddl.rank() == 0:
    print(arr)

pyddl.finalize()

Run the Python script

The ddlrun program can be used to launch the example script. To run the example script on two nodes, named host1 and host2:

ddlrun -H host1,host2 python example.py
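The example program passes n * 4 when it allocates the GPU buffer, which matches the size in bytes of n float32 values. Assuming that argument is indeed a byte count, the following NumPy-only sketch shows how the same number can be derived from the array itself instead of being hard-coded. It does not call DDL, and the variable name size_in_bytes is introduced here for illustration.

```python
import numpy as np

n = 100
arr = np.arange(n, dtype=np.float32)

# itemsize is the size of one element in bytes (4 for float32).
size_in_bytes = n * arr.itemsize

# nbytes is the total size of the array data and gives the same value.
print(size_in_bytes, arr.nbytes)  # 400 400
```

Deriving the size from the array keeps the allocation consistent if the element type is later changed, for example to float64.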

Using DDL without explicit memory management within a Python script

The following example program creates a buffer on each process, performs an allreduce, and then prints out the results.

Example program:

import pyddl
import numpy as np

n = 100

# Initialize DDL
pyddl.init()

# Create an array on the CPU
arr = np.arange(n, dtype=np.float32)

# Perform DDL's allreduce function
pyddl.allreduce(arr, n, pyddl.DDL_TYPE_FLOAT, pyddl.DDL_OP_SUM)

# Print the reduced array on rank 0 so that the result can be checked
if pyddl.rank() == 0:
    print(arr)

pyddl.finalize()

Run the Python script

The ddlrun program can be used to launch the example script. To run the example script on two nodes, named host1 and host2:

ddlrun -H host1,host2 python example.py
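Because every process in the example contributes the same array 0, 1, ..., n-1, a sum allreduce should return that array multiplied by the total number of processes. The following NumPy-only sketch shows one way to check a printed result against that expectation. The helper names expected_sum_allreduce and check_result are hypothetical, and num_procs is an assumption that must be set to the number of processes that ddlrun actually started, which depends on the cluster.

```python
import numpy as np

def expected_sum_allreduce(n, num_procs):
    # Each process contributes 0..n-1, so the sum is that range times num_procs.
    return np.arange(n, dtype=np.float32) * num_procs

def check_result(arr, num_procs):
    # Compare the reduced array with the expected values element by element.
    return np.array_equal(arr, expected_sum_allreduce(len(arr), num_procs))

# Stand-in for the array a rank would hold after an allreduce over 2 processes.
reduced = np.arange(100, dtype=np.float32) * 2
print(check_result(reduced, num_procs=2))  # True
```

A check like this could replace the plain print on rank 0 when the process count is known in advance.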