Edit a Caffe training model for distributed training with IBM Fabric

Before uploading a Caffe training model, edit the model to work with the distributed training engine option in IBM Spectrum Conductor Deep Learning Impact. The distributed training engine utilizes Fabric technology.

About this task

Editing the Caffe model ensures that the model works with the distributed training operations in IBM Spectrum Conductor Deep Learning Impact.

Procedure

  1. Make sure that the Caffe files are named as follows:
    • The Caffe solver model definition file must be named solver.prototxt.
    • The Caffe training model definition file must be named train_test.prototxt.
    • The Caffe inference model definition file must be named inference.prototxt.
  2. Edit the solver.prototxt file.
    1. Set test_compute_loss to true.
      test_compute_loss: true
    2. Set snapshot_format to HDF5.
      snapshot_format: HDF5
  3. Edit the train_test.prototxt file so that input_param specifies a shape.
    For example:
    input_param { shape: { dim: 200 dim: 3 dim: 32 dim: 32 } }
  4. Edit the train_test.prototxt file to include an accuracy layer.
    For example:
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip2"
      bottom: "label"
      top: "accuracy"
    }
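
The solver edits in step 2 can also be scripted. The following is a minimal Python sketch, not part of IBM Spectrum Conductor Deep Learning Impact. It assumes that solver.prototxt keeps each setting on its own line. The helper names set_solver_field and apply_solver_edits are hypothetical and chosen only for illustration.

```python
import re


def set_solver_field(text: str, key: str, value: str) -> str:
    """Return solver text with `key: value` set.

    Replaces the first uncommented `key: ...` line if one exists,
    otherwise appends a new line at the end of the text.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    line = f"{key}: {value}"
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _match: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


def apply_solver_edits(text: str) -> str:
    """Apply the two solver settings described in step 2."""
    text = set_solver_field(text, "test_compute_loss", "true")
    return set_solver_field(text, "snapshot_format", "HDF5")
```

To use the sketch, read solver.prototxt into a string, pass it to apply_solver_edits, and write the result back. Running it a second time leaves the text unchanged.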
    

Results

The edited Caffe model is ready for distributed training with IBM Fabric.
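
Before uploading, it can help to confirm that the edits are in place. The following is a rough Python sketch of such a check, not an IBM-provided tool. It uses simple text matching rather than a full prototxt parser, and the function name check_model_dir is hypothetical. It returns a list of problems that is empty when the file names, solver settings, input_param shape, and accuracy layer described in the procedure are all found.

```python
import re
from pathlib import Path

# File names required by step 1 of the procedure.
REQUIRED_FILES = ("solver.prototxt", "train_test.prototxt", "inference.prototxt")


def strip_comments(text):
    """Drop '#' comments (simplification: also cuts a '#' inside quotes)."""
    return re.sub(r"#.*", "", text)


def check_model_dir(model_dir):
    """Return a list of problems found in model_dir; empty if none."""
    root = Path(model_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]

    solver = root / "solver.prototxt"
    if solver.is_file():
        text = strip_comments(solver.read_text())
        if not re.search(r"\btest_compute_loss\s*:\s*true\b", text):
            problems.append("solver.prototxt: test_compute_loss is not true")
        if not re.search(r"\bsnapshot_format\s*:\s*HDF5\b", text):
            problems.append("solver.prototxt: snapshot_format is not HDF5")

    train = root / "train_test.prototxt"
    if train.is_file():
        text = strip_comments(train.read_text())
        if not re.search(r"\binput_param\s*:?\s*\{\s*shape\b", text):
            problems.append("train_test.prototxt: no input_param with shape")
        if not re.search(r'\btype\s*:\s*"Accuracy"', text):
            problems.append("train_test.prototxt: no Accuracy layer")

    return problems
```

For example, calling check_model_dir on the directory that holds the three files and printing each returned entry gives a quick list of anything that still needs editing.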

What to do next

Add the model to IBM Spectrum Conductor Deep Learning Impact. See Create a training model.