Using frameworks via command line interface

IBM Spectrum Conductor Deep Learning Impact 1.2 supports using additional frameworks via the command line interface (CLI). Use the dlicmd command to run deep learning jobs with any supported framework on your existing cluster resources.

Users can submit deep learning tasks to a particular deep learning framework, provided that the framework is installed and made available by the cluster administrator. The dlicmd command uses the framework plugins found in the $EGO_TOP/dli/<dli-version>/dlpd/tools/dl_plugins directory. By default, the following framework plugins are provided with IBM Spectrum Conductor Deep Learning Impact 1.2:

  • IBM Caffe
  • PowerAI Caffe (BVLC)
  • Distributed Deep Learning (DDL)
  • Caffe Large Model Support (LMS)
  • TensorFlow
  • Distributed TensorFlow
  • Keras
  • PyTorch

New plugins can be created and added to the $EGO_TOP/dli/<dli-version>/dlpd/tools/dl_plugins directory by a cluster administrator; see Add a framework.
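
For example, to check which plugins are deployed, you can list the plugin directory directly on the master host. This is a minimal sketch; the exact files under dl_plugins depend on your version and on which plugins the administrator has added:

  # List the deployed framework plugins (run on the master host)
  $ ls $EGO_TOP/dli/<dli-version>/dlpd/tools/dl_plugins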

The dlicmd command assumes that models can access data sources from within the IBM Spectrum Conductor Deep Learning Impact cluster. Model data must either be dynamically downloaded, reside in shared directories, or be available through remote data connection services.
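
For example, a task that runs on multiple hosts can only read a training set that every host sees at the same path. A minimal sketch, assuming a shared file system mounted at /dli_shared_fs (the path used in the Caffe example below) and a hypothetical local dataset directory:

  # Stage the dataset on the shared file system so every compute host can read it
  $ cp -r ~/mnist_data /dli_shared_fs/datasets/mnist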

Limitations

Consider the following limitations when using framework plugins:
  • Frameworks configured with a plugin for the dlicmd command cannot be used from within a Jupyter notebook.
  • dlicmd framework plugins cannot be managed from the cluster management console; they must be managed from the command line.

Examples

  •  To use the TensorFlow plugin to run a TensorFlow task with the dlicmd command, do the following:
    1. From any host, log in to IBM Spectrum Conductor Deep Learning Impact.
      $ python dlicmd.py --logon --master-host abc.ibm.com --username Admin --password Admin
    2. List all available frameworks on host abc.ibm.com.
      $ python dlicmd.py --dl-frameworks --master-host abc.ibm.com
    3. Run a TensorFlow task whose main file is mnist.py, using the instance group dliig, on host abc.ibm.com.
      $ python dlicmd.py --exec-start tensorflow --master-host abc.ibm.com --ig dliig --model-main mnist.py
    4. After you submit the task, you can monitor it in either of two ways:
      • Using the CLI:
        • Use the dlicmd command with the execution ID that was returned when you submitted the task to obtain information about it (see the sketch after these examples).
      • Using the cluster management console:
        • Log in to the cluster management console, navigate to the Workload tab, and select Spark > My Applications & Notebooks. The application name and execution ID are listed as a submitted application. Note that no information about the execution is available under Spark > Deep Learning.
  • To use the Caffe plugin to run a Caffe task with the dlicmd command, do the following:
    1. From any host, log in to IBM Spectrum Conductor Deep Learning Impact. This example assumes that the cluster's DLI_DLPD_REST_PORT is set to 9243 (the default) and that the master host is a host where the dlpd service is running.
      $ python dlicmd.py --logon --master-host abc.ibm.com --username Admin --password Admin
    2. List all available frameworks on host abc.ibm.com.
      $ python dlicmd.py --dl-frameworks --master-host abc.ibm.com
    3. Run a Caffe (BVLC) train task using the solver file lenet_solver.prototxt. To run a Caffe job on multiple nodes, make sure that all nodes have access to the files that the task requires, such as models and datasets.
      $ python dlicmd.py --exec-start caffe --master-host abc.ibm.com --ig dliig --gpuPerWorker 1 --model-dir /dli_shared_fs/models/caffe --model-main lenet_solver.prototxt train
    4. After you submit the task, you can monitor it in either of two ways:
      • Using the CLI:
        • Use the dlicmd command with the execution ID that was returned when you submitted the task to obtain information about it (see the sketch after these examples).
      • Using the cluster management console:
        • Log in to the cluster management console, navigate to the Workload tab, and select Spark > My Applications & Notebooks. The application name and execution ID are listed as a submitted application. Note that no information about the execution is available under Spark > Deep Learning.
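
As referenced in step 4 of both examples, you can inspect a submitted task from the CLI once you have its execution ID. The following is a minimal sketch; it assumes an execution ID of 275 (illustrative) and that your dlicmd version provides the --exec-ls, --exec-get, and --exec-outlogs options (run python dlicmd.py --help to confirm which execution options are available in your installation):

  # List your submitted executions and their IDs
  $ python dlicmd.py --exec-ls --master-host abc.ibm.com
  # Show details (state, framework, instance group) for execution 275
  $ python dlicmd.py --exec-get 275 --master-host abc.ibm.com
  # Retrieve the output logs for execution 275
  $ python dlicmd.py --exec-outlogs 275 --master-host abc.ibm.com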