Deep learning experiment tutorial: Build a TensorFlow model to recognize handwritten digits using the MNIST data set (Watson Machine Learning)
This tutorial guides you through using the MNIST computer vision data set to train a deep learning TensorFlow model to recognize handwritten digits. In this tutorial, you will train, deploy, and test the model with experiment builder.
Prerequisite
User access to Watson Machine Learning Accelerator with the same User ID you use for Watson Studio. Watson Machine Learning Accelerator is a component required to run experiments.
Steps overview
This tutorial presents the basic steps for training a deep learning model with experiment builder in Watson Studio:
- Set up data files
- Download sample code
- Train the model
- Monitor training progress and results
- Deploy the trained model
- Test the deployed model
This tutorial does not demonstrate distributed deep learning, or using the Watson Machine Learning hyperparameter optimization feature.
Preview the tutorial
Watch this video to see how to build a TensorFlow model to recognize handwritten digits.
This video provides a visual method to learn the concepts and tasks in this documentation.
Step 1: Set up data files
Training a deep learning model with Watson Machine Learning requires access to training files stored in a local repository that your system administrator sets up for Watson Machine Learning Accelerator, which is a required plug-in for running experiments.
- Create a new project or open an existing project.
- Download MNIST sample data files to your local computer from here: MNIST sample files.
  Note: Some browsers automatically uncompress the sample data files, which causes errors later in this tutorial. Follow the instructions on the MNIST download page to verify how your browser handled the files.
- Copy or move the data files to a folder where Watson Machine Learning Accelerator is installed.
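Because a browser that silently uncompresses the downloads causes errors later, you may want to check the files before copying them. The sketch below relies only on the fact that every gzip stream starts with the two bytes 0x1f 0x8b. The file names are the ones used in the execution command in Step 3; the helper names (is_still_gzipped, check_mnist_dir) are hypothetical and are not part of the tutorial's sample code.

```python
import os

# File names as they appear in this tutorial's execution command.
MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream begins with these two bytes


def is_still_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def check_mnist_dir(data_dir):
    """Map each expected file name to 'ok', 'missing', or 'not gzip'."""
    report = {}
    for name in MNIST_FILES:
        path = os.path.join(data_dir, name)
        if not os.path.exists(path):
            report[name] = "missing"
        elif is_still_gzipped(path):
            report[name] = "ok"
        else:
            report[name] = "not gzip"
    return report
```

For example, check_mnist_dir("/path/to/downloads") reports "not gzip" for a file whose contents were uncompressed but that kept its .gz name, and "missing" for a file the browser may have saved without the .gz extension.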
Step 2: Download sample code
Download the sample TensorFlow model-building Python code from here: tf_model_with_metrics_2_1.zip.
tf_model_with_metrics_2_1.zip contains three files:
- convolutional_network.py - Model-building Python code
- input_data.py - A "helper" file for reading the MNIST data files
- emetrics.py - A "helper" file for recording training metrics
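To show what a reader like input_data.py has to deal with, here is a minimal, standard-library-only sketch of parsing a gzip-compressed MNIST file. It assumes the standard IDX layout (two zero bytes, a data-type byte where 0x08 means unsigned byte, a dimension-count byte, then one big-endian 32-bit size per dimension, then the raw data). It is illustrative only and is not the code in the sample's input_data.py; read_idx_gz and get_image are hypothetical names.

```python
import gzip
import struct


def read_idx_gz(path):
    """Read a gzip-compressed IDX file of unsigned bytes.

    Returns (dims, data): dims is a tuple such as (60000, 28, 28) for images
    or (60000,) for labels; data is a flat bytes object of length prod(dims).
    """
    with gzip.open(path, "rb") as f:
        zero1, zero2, dtype, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0:
            raise ValueError(f"{path}: not an IDX file")
        if dtype != 0x08:
            raise ValueError(f"{path}: expected unsigned-byte data (0x08), got {dtype:#04x}")
        # One big-endian uint32 per dimension.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError(f"{path}: expected {expected} data bytes, found {len(data)}")
    return dims, data


def get_image(dims, data, i):
    """Return image i as a list of pixel rows (values 0-255)."""
    if len(dims) != 3:
        raise ValueError("expected a 3-D (count, rows, cols) image file")
    rows, cols = dims[1], dims[2]
    start = i * rows * cols
    return [list(data[start + r * cols:start + (r + 1) * cols]) for r in range(rows)]
```

For the real training images, read_idx_gz("train-images-idx3-ubyte.gz") should return dims of (60000, 28, 28); the label files have a single dimension.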
Note:
- To deploy the model, you must save it in a required directory from the model training script. The directory is $RESULT_DIR/model, where RESULT_DIR is an environment variable accessible by the model training script.
- To monitor training progress and results, you must log the training metrics in the model training script. For example, you can use the record interface from the EMetrics module directly in the model training script:

```python
from emetrics import EMetrics

# total_epochs and the metric values below are placeholders for your own training code
with EMetrics.open() as em:
    for epoch in range(total_epochs):
        # perform training for this epoch
        # compute training metrics and assign them to train_metric1, train_metric2, etc.
        em.record("training", epoch, {'metric1': train_metric1, 'metric2': train_metric2})
        # compute test metrics and assign them to test_metric1, test_metric2, etc.
        # NOTE: record these metrics in the group EMetrics.TEST_GROUP so that the service
        # recognizes them as computed on test/validation data
        em.record(EMetrics.TEST_GROUP, epoch, {'metric1': test_metric1, 'metric2': test_metric2})
```
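The $RESULT_DIR/model requirement can be handled with a small helper in the training script. This is a sketch under the assumption that the script simply reads RESULT_DIR from the environment; model_output_dir is a hypothetical name and is not part of the sample code.

```python
import os


def model_output_dir():
    """Return $RESULT_DIR/model, creating the directory if needed.

    Raises KeyError if RESULT_DIR is not set, for example when the script
    is run outside the training service.
    """
    result_dir = os.environ["RESULT_DIR"]
    model_dir = os.path.join(result_dir, "model")
    os.makedirs(model_dir, exist_ok=True)
    return model_dir
```

What you write into that directory depends on how your training code builds the model. With TensorFlow 2, for instance, one option is tf.saved_model.save(model, model_output_dir()), but the sample code may save its model differently. Failing loudly when RESULT_DIR is missing is a deliberate choice, so that a model is never silently written somewhere the service cannot find it.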
Step 3: Train the model
This tutorial demonstrates training the model using experiment builder in Watson Studio.
- Open a project.
- Click New asset and choose Deep learning.
- Specify a name for the experiment and an optional description.
- Specify the path where the training data is stored, relative to the mount point for Watson Machine Learning Accelerator. For example, if you saved your files to wmla-nfs/wmla-data/mnist-tutorial and Watson Machine Learning Accelerator is installed to wmla-nfs, enter /wmla-data/mnist-tutorial as the path for the data. For details, see: Determine mount point for deep learning experiments.
- Add a model definition.
  - Specify a name and a description.
  - Upload the sample code, tf_model_with_metrics_2_1.zip, where prompted.
  - Set the execution command. Specify this command for running the model-building code:
    convolutional_network.py --trainImagesFile train-images-idx3-ubyte.gz --trainLabelsFile train-labels-idx1-ubyte.gz --testImagesFile t10k-images-idx3-ubyte.gz --testLabelsFile t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 6000
  - Populate the attributes to be used during the experiments:
    - Specify the framework that is used in the model-building code: tensorflow_rt24.1-py3.11
    - Specify GPU as the resource type.
    - Set the hardware specification: 1 GPU, generic, 1 CPU, and 4GB RAM.
    - Set the training type to single node.
  - Click Create.
- Click Create and run.
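The execution command passes the data file names and two hyperparameters to convolutional_network.py as command-line flags. As a hypothetical sketch of how a training script could accept exactly those flags with argparse (the sample's own argument handling may differ), consider:

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in this tutorial's execution command.

    Hypothetical sketch: the flag names come from the command above, but the
    types and the decision to make every flag required are assumptions.
    """
    parser = argparse.ArgumentParser(description="Train a CNN on MNIST")
    parser.add_argument("--trainImagesFile", required=True)
    parser.add_argument("--trainLabelsFile", required=True)
    parser.add_argument("--testImagesFile", required=True)
    parser.add_argument("--testLabelsFile", required=True)
    parser.add_argument("--learningRate", type=float, required=True)
    parser.add_argument("--trainingIters", type=int, required=True)
    return parser.parse_args(argv)
```

Making every flag required means a typo in the execution command fails immediately with an argparse usage error instead of silently falling back to a default value.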
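The data path you enter in experiment builder is the storage location expressed relative to the Watson Machine Learning Accelerator mount point, with a leading slash. The following sketch reproduces the wmla-nfs example from Step 3; experiment_data_path is a hypothetical helper, not a product API.

```python
from pathlib import PurePosixPath


def experiment_data_path(storage_path, mount_point):
    """Return the data path to enter in experiment builder.

    `storage_path` must lie under `mount_point`; otherwise
    PurePosixPath.relative_to raises ValueError.
    """
    rel = PurePosixPath(storage_path).relative_to(PurePosixPath(mount_point))
    # A path equal to the mount point itself maps to "/".
    return "/" if str(rel) == "." else "/" + str(rel)
```

With the tutorial's values, experiment_data_path("wmla-nfs/wmla-data/mnist-tutorial", "wmla-nfs") gives "/wmla-data/mnist-tutorial".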
Step 4: Monitor training progress and results
- Monitor the progress of a training run in the Training Runs tab of experiment builder.
- When the training run is complete, click the training run name to view details of its training results, including logs and other output.
Step 5: Deploy the trained model
You can use your trained model to classify new images only after the model has been deployed.
To deploy the model, you must first promote it to a deployment space.
- In the deep learning experiments list, click on your training run.
- When the training run is complete, the model appears in the Completed table. Select Save model from the action menu, give the model a name, and click Save. This stores the model in the Watson Machine Learning repository and saves it to the project, where it now appears among the project's assets.
- Choose Promote to space from the action menu for the model. Select a Target space or Create a new deployment space, and then click Promote.
- After you promote the model, a prompt offers to take you to your deployment space.
- In the deployment space, click the model name.
- Click New deployment.
- Choose Online and enter a name for the deployment.
- Click Create. Monitor the status of the model deployment.
Step 6: Test the deployed model
You can quickly test your deployed model from the deployment details page.
- On your local computer, download this sample payload JSON file with input data corresponding to the handwritten digits "5" and "4": MNIST_sample_payload_V4.json
- In the Test area of the deployment details page in Watson Studio, paste the value of the payload field from MNIST_sample_payload_V4.json, and then click Predict.
  Sample output:
{ "values": [ 5, 4 ] }
This output shows that the first input was correctly classified as the digit "5" and the second input as the digit "4".
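The values list in the output holds one predicted class per input, in the same order as the inputs in the payload. A minimal sketch for checking a response of that shape against the labels you expect (here 5 and 4, from the sample payload) might look like this; check_predictions is a hypothetical helper, and it assumes the response has exactly the {"values": [...]} shape shown in the sample output.

```python
import json


def check_predictions(response_text, expected_labels):
    """Pair each predicted class with its expected label.

    Assumes a response shaped like the sample output, {"values": [5, 4]}.
    Returns a list of (predicted, expected, correct) tuples.
    """
    predicted = [int(v) for v in json.loads(response_text)["values"]]
    if len(predicted) != len(expected_labels):
        raise ValueError("number of predictions does not match number of inputs")
    return [(p, e, p == e) for p, e in zip(predicted, expected_labels)]
```

For the sample output, check_predictions('{ "values": [ 5, 4 ] }', [5, 4]) returns [(5, 5, True), (4, 4, True)].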