Training neural networks using the deep learning experiment builder
This topic describes how to train a neural network using the experiment builder in Watson Studio.
Service: To build deep learning experiments, you must have access to Watson Machine Learning Accelerator, which is not installed by default as part of IBM Watson Machine Learning. An administrator must install Watson Machine Learning Accelerator on the IBM Cloud Pak for Data platform and provide you with user access.
As a data scientist, you need to train thousands of models to find the combination of data and hyperparameters that optimizes the performance of your neural networks. You want to run more experiments, faster. You want to train deeper neural networks and explore more complex hyperparameter spaces. IBM Watson Machine Learning accelerates this iterative cycle by simplifying the process of training models in parallel with auto-allocated GPU compute containers.
To try the Deep Learning Experiment Builder for yourself, follow the steps in the Experiment builder tutorial to build and run an experiment that predicts handwritten digits.
Deep Learning experiment overview
Required service: Watson Machine Learning
Data format:
- Textual: CSV files with labeled textual data
- Image: Image files stored in a PKL (pickle) file. For example, a model that tests signatures uses images resized to 32×32 pixels and stored as NumPy arrays in pickled format.
Data size: Any size
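To illustrate the two data formats above, the following sketch writes a small labeled CSV file and a pickled image dataset, then reads both back. The file names (`labels.csv`, `signatures.pkl`), the dictionary layout inside the PKL file (`images` and `labels` keys), the class labels, and the random placeholder pixel values are all assumptions for illustration. The experiment builder does not prescribe a particular layout, so match whatever your model definition code expects.

```python
import csv
import os
import pickle
import tempfile

import numpy as np

# Hypothetical output folder; in practice, use the storage path your administrator set up.
out_dir = tempfile.mkdtemp()

# Textual data: a CSV file with one labeled text sample per row (header row first).
csv_path = os.path.join(out_dir, "labels.csv")
rows = [("text", "label"), ("great product", "positive"), ("arrived broken", "negative")]
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(csv_path, newline="") as f:
    samples = list(csv.DictReader(f))  # one dict per data row

# Image data: 32x32 grayscale images stored as NumPy arrays in a pickle file.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(4, 32, 32), dtype=np.uint8)  # 4 placeholder images
labels = np.array([0, 1, 0, 1])  # hypothetical class labels
pkl_path = os.path.join(out_dir, "signatures.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump({"images": images, "labels": labels}, f)

with open(pkl_path, "rb") as f:
    dataset = pickle.load(f)

print(len(samples), samples[0]["label"])  # 2 positive
print(dataset["images"].shape, dataset["labels"].tolist())  # (4, 32, 32) [0, 1, 0, 1]
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.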
Prerequisites for creating Deep Learning experiments
The following are required for creating experiments:
- Access to the storage space that your administrator has set up, and the path where you can upload training files.
- A model definition that contains the model-building code and metadata about how to run the training.
- User access to Watson Machine Learning Accelerator with the same User ID you use for Watson Studio. Watson Machine Learning Accelerator is a component required to run experiments.
- A deployment space is not required to create a deep learning experiment, but it is required to deploy the model.
Create a new Deep Learning experiment
- Open a project.
- Click New asset and choose Deep learning experiment.
Defining Deep Learning experiment details
- Set the name for the experiment and an optional description.
- Specify the source files folder where you have stored your training data. The path should point to a local repository on Watson Machine Learning Accelerator that your system administrator has set up for your use.
Associating model definitions
You must associate one or more model definitions with this experiment. The associated model definitions can be any mix of existing model definitions and new ones that you create as part of the process.
- Click Add model definition.
- Choose whether to create a new model definition or use an existing model definition.
- To create a new model definition, click the New tab and specify the options for the model definition. For details, refer to Creating a new model definition.
- To choose an existing model definition, click the Existing tab and choose the model definition file.
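A new model definition is uploaded as a .zip file that contains your Python training code. The following sketch uses Python's standard `zipfile` module to package a training script into such an archive. The script name `convolutional_network.py` is taken from the execution command example in this topic. The archive name `model_definition.zip`, the placeholder script content, the helper name `package_model_definition`, and the decision to place scripts at the root of the archive are assumptions. Check the Experiment builder tutorial for the layout your environment expects.

```python
import os
import tempfile
import zipfile


def package_model_definition(source_dir: str, zip_path: str) -> list[str]:
    """Hypothetical helper: zip every .py file under source_dir.

    Archive paths are kept relative to source_dir, so top-level scripts
    end up at the root of the .zip file.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue  # skip data files, notebooks, caches, and so on
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                zf.write(full_path, arcname)
                added.append(arcname)
    return added


# Usage: package a placeholder training script (its content is hypothetical).
src = tempfile.mkdtemp()
with open(os.path.join(src, "convolutional_network.py"), "w") as f:
    f.write("print('training...')\n")

# Write the archive outside src so os.walk does not pick it up.
zip_path = os.path.join(tempfile.mkdtemp(), "model_definition.zip")
print(package_model_definition(src, zip_path))  # ['convolutional_network.py']
```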
Creating a new model definition
- Type a unique name and a description.
- Choose a .zip file that contains the Python code that you set up to indicate the metrics to use for training runs. For a model definition code sample, refer to the Experiment builder tutorial. Note that when you run an experiment on Watson Studio Local, you specify in the model definition the data directory where your training data is stored. For example:

      import os

      # DATA_DIR points to the folder that holds the training data
      data_dir = os.environ["DATA_DIR"]
      # The *_file variables hold file names passed in on the execution command line
      train_images_file = os.path.join(data_dir, train_images_file)
      train_labels_file = os.path.join(data_dir, train_labels_file)
      test_images_file = os.path.join(data_dir, test_images_file)
      test_labels_file = os.path.join(data_dir, test_labels_file)

- From the Frameworks box, select the appropriate framework. It must be compatible with the code in the Python file.
- In the Execution command box, enter the command that runs the Python code. The execution command must reference the Python file and can optionally pass the names of the training files, if that is how your experiment is structured. For example:

      convolutional_network.py --trainImagesFile train-images-idx3-ubyte.gz --trainLabelsFile train-labels-idx1-ubyte.gz --testImagesFile t10k-images-idx3-ubyte.gz --testLabelsFile t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 6000

- In the Attributes during this experiment section, select a compute plan, which determines the number and size of GPUs to use for the experiment.
- Select the distributed training type and the maximum number of nodes.
- Click Create and run to create the model definition and run the experiment.
- For a completed training run, click Save model to save the run as a model, and then assign it a name. The model is displayed on the Assets page for the project. From there, you can deploy the model to a space and pass it sample data to generate predictions.
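The model definition code and the execution command must agree on argument names. The following sketch shows one way a training script such as `convolutional_network.py` could parse the flags from the example execution command and resolve the data file names against the `DATA_DIR` environment variable, as described in Creating a new model definition. Only the flag names and `DATA_DIR` come from this topic. The use of `argparse`, the default values, the fallback to the current directory when `DATA_DIR` is unset, the helper names, and the example path `/mnt/training-data` are assumptions. The model-building code itself is omitted.

```python
import argparse
import os

# Flags from the example execution command that carry data file names.
DATA_FLAGS = ("trainImagesFile", "trainLabelsFile", "testImagesFile", "testLabelsFile")


def parse_args(argv=None):
    """Parse the flags that appear in the example execution command."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    for flag in DATA_FLAGS:
        parser.add_argument(f"--{flag}", required=True)
    parser.add_argument("--learningRate", type=float, default=0.001)
    parser.add_argument("--trainingIters", type=int, default=6000)
    return parser.parse_args(argv)


def resolve_data_paths(args):
    """Prefix each data file name with DATA_DIR (falls back to "." if unset; an assumption)."""
    data_dir = os.environ.get("DATA_DIR", ".")
    return {flag: os.path.join(data_dir, getattr(args, flag)) for flag in DATA_FLAGS}


# Usage: simulate the environment and pass the arguments from the example command.
os.environ["DATA_DIR"] = "/mnt/training-data"  # hypothetical; normally set for you
args = parse_args([
    "--trainImagesFile", "train-images-idx3-ubyte.gz",
    "--trainLabelsFile", "train-labels-idx1-ubyte.gz",
    "--testImagesFile", "t10k-images-idx3-ubyte.gz",
    "--testLabelsFile", "t10k-labels-idx1-ubyte.gz",
    "--learningRate", "0.001",
    "--trainingIters", "6000",
])
paths = resolve_data_paths(args)
print(paths["trainImagesFile"])  # /mnt/training-data/train-images-idx3-ubyte.gz
# Model-building and training code (omitted) would read from these paths.
```

Keeping the flag names in one tuple means the parser and the path resolution cannot drift apart when you add or rename a data file argument.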