Deep learning experiment tutorial: Build a TensorFlow model using the CIFAR data set (Watson Machine Learning)
This tutorial guides you through using the CIFAR computer vision data set to train a deep learning TensorFlow model to recognize objects in images. In this tutorial, you will train, deploy, and test the model with experiment builder.
Prerequisite
The IBM Software Hub scheduling service is a required component for running experiments.
Steps overview
This tutorial presents the basic steps for training a deep learning model with experiment builder in Watson Studio:
- Task 1: Set up data files
- Task 2: Download the sample code
- Task 3: Train the model
- Task 4: Monitor training progress and results
- Task 5: Deploy the trained model
- Task 6: Test the deployed model
This tutorial does not demonstrate distributed deep learning or the Watson Machine Learning hyperparameter optimization feature.
Task 1: Set up data files
Training a deep learning model using Watson Machine Learning relies on accessing training files stored in a local repository that your system administrator sets up for Watson Machine Learning, which is a required plug-in for running experiments. Follow these steps to set up the data files:
- Create a new project or open an existing project.
- Download the CIFAR-10 Python package to your local machine, and add it as a data asset in your project.
- Optional: Create a storage volume and connection for saving the dataset that is downloaded during training.
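Before you add the archive as a data asset, it can help to confirm that the download is intact. The following is a minimal sketch, not part of the sample code, that reads one batch from the archive by using only the Python standard library. The helper name load_batch is hypothetical. The member path assumes the standard layout of the CIFAR-10 Python archive (a cifar-10-batches-py folder that contains data_batch_1 through data_batch_5 and test_batch), and unpickling the real batches also requires NumPy to be installed because the image data is stored as a NumPy array.

```python
import pickle
import tarfile


def load_batch(archive_path, member_name):
    """Return the dictionary stored in one pickled member of the archive.

    Hypothetical helper for illustration. Only unpickle archives that
    you downloaded from a source you trust.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        member = archive.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file")
        # The batches were written with Python 2, so keys decode as bytes.
        return pickle.load(member, encoding="bytes")
```

For example, load_batch("cifar-10-python.tar.gz", "cifar-10-batches-py/data_batch_1") is expected to return a dictionary whose b"data" entry holds 10000 rows of 3072 pixel values and whose b"labels" entry holds the matching class indexes.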
Task 2: Download the sample code
Download the sample TensorFlow model-building Python code: tf_model_cifar10.zip.
Note:
- To deploy the model later, the model training script must save the model in the required directory $RESULT_DIR/model, where RESULT_DIR is an environment variable that is accessible by the model training script.
- To monitor training progress and results, the model training script must log the training metrics. For example, use the record interface from the EMetrics module directly in the model training script:

    from emetrics import EMetrics

    with EMetrics.open() as em:
        for epoch in range(0, total_epochs):
            # perform training for epoch
            # compute training metrics and assign to values train_metric1, train_metric2 etc
            em.record("training", epoch, {'metric1': train_metric1, 'metric2': train_metric2})
            # compute test metrics and assign to values test_metric1, test_metric2 etc
            # NOTE for these metrics use the group EMetrics.TEST_GROUP so that the service
            # will recognize these metrics as computed on test/validation data
            em.record(EMetrics.TEST_GROUP, epoch, {'metric1': test_metric1, 'metric2': test_metric2})
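The required save location from the note above can be resolved inside a training script as in the following minimal sketch. The helper name get_model_dir and the fallback to the current directory when RESULT_DIR is not set are assumptions for local testing, not behavior of the service or of the sample code.

```python
import os
from pathlib import Path


def get_model_dir(default_root="."):
    """Return the $RESULT_DIR/model directory, creating it if needed.

    Hypothetical helper: default_root is used only when RESULT_DIR is
    not set, for example when the script runs on a local machine.
    """
    result_root = os.environ.get("RESULT_DIR", default_root)
    model_dir = Path(result_root) / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir
```

The training script would then pass the returned path to the save call of its framework after training finishes.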
Task 3: Train the model
Follow these steps to create a deep learning experiment in a project to train the model:
- Open the project containing either the uploaded data asset or the HTTP data connection.
- Click New Asset > Build deep learning experiments.
- Specify a name for the experiment and an optional description.
- Optional: Click Select storage volume and select a storage volume that was created for training.
- Click Select data to select the files using one of the following methods:
- From the connection: Select Connection > [your connection] > cifar-10-python.tar.gz.
- From the project: Select Data asset > cifar-10-python.tar.gz.
- Select Pre-download the data asset in the job flow.
- For Pre-download location, select either Training Worker local storage or the optional storage volume.
- Add a model definition:
- Specify a name and a description.
- Click Browse to upload the sample code, tf_model_cifar10.zip.
- Specify the framework that is used in the model-building code: tensorflow_rt24.1-py3.11.
- Specify either CPU or GPU as the resource type.
- Set the hardware specification. For example, 1 GPU, generic, 1 CPU, and 4GB RAM.
- Set the training type to Single node.
- For the execution command, specify this command to run the model-building code:
train_cifar10.py --epochs 10 --lr 0.001 --batch-size 64
- Click Create to save the model definition.
- Click Create and run.
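The execution command passes three options to the training script. The following minimal sketch shows one way a script could read those options with argparse. It is an illustration only and is not taken from tf_model_cifar10.zip: the function name parse_args, the help text, and the default values (which simply mirror the command in this tutorial) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the options used in the tutorial's execution command.

    Hypothetical sketch: the defaults mirror the tutorial command and
    are not necessarily the defaults of the real sample script.
    """
    parser = argparse.ArgumentParser(description="CIFAR-10 training options (sketch)")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of passes over the training data")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="optimizer learning rate")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="number of images per training step")
    return parser.parse_args(argv)


# Example: parse the same options that the execution command supplies.
args = parse_args(["--epochs", "10", "--lr", "0.001", "--batch-size", "64"])
print(args.epochs, args.lr, args.batch_size)
```

Note that argparse exposes the --batch-size option as the attribute batch_size.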
Task 4: Monitor training progress and results
Follow these steps to monitor the progress of a training run in the Training runs tab of experiment builder.
- During training, the Training runs tab shows the current status of the training run: Queued, In progress, or Completed.
- When the training run is complete, click on the training run name to view:
- The Monitor tab to view the progress of the training
- The Overview tab to view the experiment configuration
- The Logs tab to see the details of the training results, including logs and other output.
Task 5: Deploy the trained model
You can use your trained model to classify new images only after the model has been deployed. Follow these steps to promote the model to a deployment space, and then deploy the model:
- In the deep learning experiments list, click on your training run.
- Wait for the training run to complete, and save the model in the Completed section:
- Click the Actions menu at the end of the row for the model, and choose Save model.
- Provide a name for the model, and click Save. This stores the model in the Watson Machine Learning repository and saves it to the project.
- On the Assets tab, you can see the saved model.
- To deploy the model, you must first promote it to a deployment space.
- Click the Overflow menu at the end of the row for the model, and choose Promote to space.
- Select an existing deployment space for the Target space or create a new deployment space.
- Select the option to Go to the model in the space after promoting it.
- Click Promote.
- In the deployment space, click New deployment.
- Choose Online.
- Type a name for the deployment.
- Click Create.
- Monitor the status of the model deployment.
Task 6: Test the deployed model
Follow these steps to test your deployed model from the deployment details page:
- On your local computer, download this sample payload JSON file with input data corresponding to the images "airplane" and "ship": CIFAR-10_sample_payload.json
- Click the deployment name to view the deployment.
- Click the Test tab.
- Click Browse local files, and select the downloaded payload file, CIFAR-10_sample_payload.json.
- Click Predict.
Sample output:
{
"id": "dense_1",
"values": [
[
0.7929239273071289,
0.004965754225850105,
0.0002665588108357042,
0.011512157507240772,
0.0022618358489125967,
0.0002059160906355828,
0.0015659810742363334,
0.000038293484976748005,
0.15528139472007751,
0.03097810409963131
],
[
0.04695689678192139,
0.0007269788766279817,
0.0035548359155654907,
0.017715563997626305,
0.00337904947809875,
0.002080954145640135,
0.022615352645516396,
0.00000814568284113193,
0.8998115062713623,
0.003150772303342819
]
]
}
This output shows that the first input data was correctly classified as belonging to the class "airplane" with label index 0, and the second input data was correctly classified as belonging to the class "ship" with label index 8.
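Each row of values holds one score per class, and the predicted class is the index of the largest score. The following minimal sketch reads the sample output shown above and maps each row to a class name. The helper name top_class is hypothetical, and the class list is the standard CIFAR-10 label order, in which index 0 is "airplane" and index 8 is "ship".

```python
# Standard CIFAR-10 label order (index 0 is airplane, index 8 is ship).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_class(scores):
    """Return (index, class name, score) for the highest score in a row."""
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, CIFAR10_CLASSES[index], scores[index]


# The sample output from this tutorial, copied as a Python dictionary.
response = {
    "id": "dense_1",
    "values": [
        [0.7929239273071289, 0.004965754225850105, 0.0002665588108357042,
         0.011512157507240772, 0.0022618358489125967, 0.0002059160906355828,
         0.0015659810742363334, 0.000038293484976748005, 0.15528139472007751,
         0.03097810409963131],
        [0.04695689678192139, 0.0007269788766279817, 0.0035548359155654907,
         0.017715563997626305, 0.00337904947809875, 0.002080954145640135,
         0.022615352645516396, 0.00000814568284113193, 0.8998115062713623,
         0.003150772303342819],
    ],
}

for row in response["values"]:
    index, name, score = top_class(row)
    print(f"label index {index}: {name} (score {score:.3f})")
```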