Edit a TensorFlow training model for distributed training with IBM Fabric
Before uploading a TensorFlow training model, edit the model to work with the distributed training engine option in IBM Spectrum Conductor Deep Learning Impact. The distributed training engine must use a fabricmodel.py file.
Before you begin
Before editing your TensorFlow training model to work with IBM Spectrum Conductor Deep Learning Impact, consider the following limitations:
- The tf.placeholder() data input schema is not supported. Models must use the TensorFlow multithreaded queue schema as data input for high performance. To learn more about multithreaded queues in TensorFlow, see Threading and Queues.
  Note: Due to this limitation, the distributed training with IBM Fabric option is most suitable for object detection and object classification deep learning.
- Models are set to automatically deploy to single-node, multi-node, and multi-GPU devices. Make sure that you do not define the tf.device() operation in your TensorFlow model.
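The two limitations above can be checked before upload with a small static scan of the model source. The following is a minimal sketch, not part of IBM Spectrum Conductor Deep Learning Impact: the names FORBIDDEN_CALLS and find_forbidden_calls are hypothetical, and the scan assumes that TensorFlow is imported as tf. It does not catch other aliases or spellings such as tf.compat.v1.placeholder.

```python
import ast

# Calls that the limitations above rule out. Assumes `import tensorflow as tf`.
FORBIDDEN_CALLS = {"tf.placeholder", "tf.device"}


def _dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_forbidden_calls(source):
    """Return sorted (line_number, call_name) pairs for forbidden calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in FORBIDDEN_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Calling find_forbidden_calls on the text of a model file returns an empty list when neither call is found. Because the scan is purely syntactic, an empty result does not prove that the model uses queue-based input; it only shows that the two named calls are absent.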
About this task
Edit the TensorFlow model so that it provides the following operations:
- The train accuracy operation
- The train loss operation
- The test accuracy operation
- The test loss operation
- The global step operation
- The gradients and variables (grads_and_vars) operation, which can be obtained by: optimizer.compute_gradients(train_loss)
- The apply gradient operation, which can be obtained by: optimizer.apply_gradients(grads, global_step=global_step)
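The seven operations listed above can be sketched for a toy linear classifier as follows. This is an illustration under stated assumptions, not the fabricmodel.py interface, which is not described here: the function name build_ops, the returned dictionary, and its keys are hypothetical. The sketch uses the TensorFlow 1.x graph API through tf.compat.v1, and the input tensors passed in stand in for tensors that a queue-based input pipeline would produce.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (assumption: tf.compat.v1 is available)
tf1.disable_eager_execution()


def build_ops(train_images, train_labels, test_images, test_labels):
    """Build the seven operations for a toy two-class linear model.

    Hypothetical helper: inputs are float32 [batch, features] tensors and
    int64 [batch] label tensors, for example the output of an input queue.
    No tf.placeholder() or tf.device() is used.
    """
    num_features = int(train_images.shape[1])
    weights = tf1.get_variable("weights", [num_features, 2],
                               initializer=tf1.zeros_initializer())
    bias = tf1.get_variable("bias", [2], initializer=tf1.zeros_initializer())

    def loss_and_accuracy(images, labels):
        # The same variables serve both the train and the test branch.
        logits = tf.matmul(images, weights) + bias
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits))
        correct = tf.equal(tf.argmax(logits, axis=1), labels)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        return loss, accuracy

    train_loss, train_accuracy = loss_and_accuracy(train_images, train_labels)
    test_loss, test_accuracy = loss_and_accuracy(test_images, test_labels)

    global_step = tf1.train.get_or_create_global_step()
    optimizer = tf1.train.GradientDescentOptimizer(learning_rate=0.5)
    # List of (gradient, variable) pairs for the training loss.
    grads_and_vars = optimizer.compute_gradients(train_loss)
    # Applies the gradients and increments global_step by one.
    apply_gradients = optimizer.apply_gradients(
        grads_and_vars, global_step=global_step)

    return {
        "train_accuracy": train_accuracy,
        "train_loss": train_loss,
        "test_accuracy": test_accuracy,
        "test_loss": test_loss,
        "global_step": global_step,
        "grads_and_vars": grads_and_vars,
        "apply_gradients": apply_gradients,
    }
```

Keeping compute_gradients and apply_gradients as separate operations, rather than calling optimizer.minimize(), leaves the gradients visible as their own operation. The sketch passes grads_and_vars straight through to apply_gradients; a distributed engine could instead process the gradients between the two steps.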
Procedure
Results
The edited TensorFlow model is ready for distributed training with IBM Fabric.
What to do next
Add the model to IBM Spectrum Conductor Deep Learning Impact. For details, see Create a training model.