Training neural networks using the deep learning experiment builder (Watson Machine Learning)

This topic describes how to train a neural network using the experiment builder in Watson Studio.

Service To build deep learning experiments, you must have access to Watson Machine Learning Accelerator, which is not installed by default as part of IBM Watson Machine Learning. An administrator must install Watson Machine Learning Accelerator on the IBM Cloud Pak for Data platform and provide you with user access.

As a data scientist, you need to train thousands of models to identify the right combination of data and hyperparameters that optimizes the performance of your neural networks. You want to run more experiments, faster. You want to train deeper neural networks and explore more complicated hyperparameter spaces. IBM Watson Machine Learning accelerates this iterative cycle by simplifying the process of training models in parallel in auto-allocated GPU compute containers.

To try the Deep Learning Experiment Builder for yourself, follow the steps in the Experiment builder tutorial to build and run an experiment that predicts handwritten digits.

Prerequisites for creating deep learning experiments

The following are required for creating experiments:

  • A deployment space where you can evaluate the training assets. If no deployment space is associated with the current project, you are prompted to create one.
  • Access to the storage space that your administrator has established, and the path where you can upload training files.
  • A training definition, which contains model-building code and metadata about how to run the training.
  • User access to Watson Machine Learning Accelerator with the same user ID that you use for Watson Studio. Watson Machine Learning Accelerator is a component required to run experiments.

Create a new deep learning experiment

  1. Open a project.

  2. Click Add to project and choose Deep learning experiment.

Define deep learning experiment details

Set a name for the experiment and an optional description.

Specify the source files folder where you have stored your training data. The path should point to a local repository on Watson Machine Learning Accelerator that your system administrator has set up for your use.

Associate training definitions

You must associate one or more training definitions with this experiment. The training definitions can be a mix of existing ones and ones that you create as part of the process.

  1. Click Add training definition.

  2. Choose whether to create a new training definition or use an existing training definition.

    • To create a new training definition, click the New tab and specify the options for the training definition.
    • To choose an existing training definition, click the Existing tab and choose the training definition file.

Creating a new training definition

  1. Type a unique name and a description.

  2. Choose a .zip file that contains the Python code that you have set up, indicating the metrics to use for training runs. To see a sample of training definition code, see the Experiment builder tutorial. Note that when you run an experiment on Watson Studio Local, the training definition specifies the data directory where your training data is stored. For example:

     import os

     data_dir = os.environ["DATA_DIR"]
     train_images_file = os.path.join(data_dir, train_images_file)
     train_labels_file = os.path.join(data_dir, train_labels_file)
     test_images_file = os.path.join(data_dir, test_images_file)
     test_labels_file = os.path.join(data_dir, test_labels_file)
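The model-building code in this step must be packaged as a .zip file. A minimal packaging sketch in Python (the script name `train.py` and the archive name `training-definition.zip` are illustrative assumptions, not names the product requires):

```python
import pathlib
import zipfile

# Create a placeholder model-building script. In practice this would be the
# Python file that reads DATA_DIR and trains the model.
script = pathlib.Path("train.py")
script.write_text("print('training run placeholder')\n")

# Package the script into the .zip archive that the training definition expects.
with zipfile.ZipFile("training-definition.zip", "w") as archive:
    archive.write(script)

# Verify the archive contents.
print(zipfile.ZipFile("training-definition.zip").namelist())
```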
  3. From the Frameworks box, select the appropriate framework. This must be compatible with the code you use in the Python file.

  4. In the Execution command box, enter the command that runs the Python code. The execution command must reference the Python file and can optionally pass the names of training files, if that is how your experiment is structured.

    For example: --trainImagesFile train-images-idx3-ubyte.gz --trainLabelsFile train-labels-idx1-ubyte.gz --testImagesFile t10k-images-idx3-ubyte.gz --testLabelsFile t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 6000
  5. In the Attributes during this experiment section, select a compute plan, which determines the number and size of GPUs to use for the experiment.

  6. Select the distributed training type and the maximum number of nodes.

  7. Click Create and run to create the training definition and run the experiment.

  8. When a training run completes, save the run as a model, then assign it a name. The model displays on the Assets page for the project. From there, you can deploy the model to a space and pass it sample data to generate predictions.
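The flags shown in the execution command example in step 4 are typically consumed by the training script itself. One way a script might parse them in Python (a sketch, assuming the script uses argparse with these exact flag names; the defaults are illustrative assumptions):

```python
import argparse

# Sketch of a parser matching the flags from the execution-command example.
parser = argparse.ArgumentParser()
parser.add_argument("--trainImagesFile", default="train-images-idx3-ubyte.gz")
parser.add_argument("--trainLabelsFile", default="train-labels-idx1-ubyte.gz")
parser.add_argument("--testImagesFile", default="t10k-images-idx3-ubyte.gz")
parser.add_argument("--testLabelsFile", default="t10k-labels-idx1-ubyte.gz")
parser.add_argument("--learningRate", type=float, default=0.001)
parser.add_argument("--trainingIters", type=int, default=6000)

# In a real training script this would be parser.parse_args(), taking the
# values from the execution command; here we pass them explicitly.
args = parser.parse_args(["--learningRate", "0.001", "--trainingIters", "6000"])
print(args.trainingIters)
```

Because the execution command supplies these values, changing a hyperparameter such as the learning rate requires only editing the command, not the packaged code.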