Deep learning experiment tutorial: Build a PyTorch model to recognize handwritten digits using the MNIST data set

This tutorial guides you through using the MNIST computer vision data set to train a deep learning PyTorch model to recognize handwritten digits. In this tutorial, you will train, deploy, and test the model with experiment builder.

Prerequisites

User access to Watson Machine Learning Accelerator with the same User ID you use for Watson Studio. Watson Machine Learning Accelerator is a component required to run experiments.

Overview

This tutorial presents the basic steps for training a deep learning model with experiment builder in Watson Studio. It does not demonstrate distributed deep learning or the Watson Machine Learning hyperparameter optimization feature.

For a video that uses the PyTorch model-building code, see Balance deep learning jobs with Elastic Distributed Training.

Step 1: Set up data files

Training a deep learning model with Watson Machine Learning requires access to training files stored in a local repository that your system administrator sets up for Watson Machine Learning Accelerator. Watson Machine Learning Accelerator is a required plug-in for running experiments.

  1. Create a new project or open an existing project.
  2. Download MNIST sample data files to your local computer from here: MNIST sample files

    Note: Some browsers automatically uncompress the sample data files, which causes errors later in this tutorial. Follow instructions on the MNIST download page for verifying how your browser handled the files.

  3. Copy or move the data files to a folder where Watson Machine Learning Accelerator is installed. See: Import data into Cloud Pak for Data to be used by Watson Machine Learning Accelerator
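Before you copy the files, you can check locally whether your browser left them compressed. The following is a minimal sketch, not part of the tutorial's sample code. It assumes the four file names used in the execution command in Step 3 and a hypothetical download folder (`~/Downloads`). It only checks that each file starts with the gzip signature bytes `0x1f 0x8b`.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file

# File names taken from the execution command in Step 3.
MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def is_gzip(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def check_downloads(folder):
    """Map each expected file name to 'ok', 'missing', or 'not gzip'."""
    report = {}
    for name in MNIST_FILES:
        path = Path(folder) / name
        if not path.exists():
            report[name] = "missing"
        elif is_gzip(path):
            report[name] = "ok"
        else:
            report[name] = "not gzip"
    return report


if __name__ == "__main__":
    # Hypothetical download location; adjust to where your browser saved the files.
    for name, status in check_downloads(Path.home() / "Downloads").items():
        print(f"{name}: {status}")
```

A file reported as "not gzip" was most likely uncompressed by the browser and should be downloaded again as described on the MNIST download page.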

Step 2: Download sample code

Download the sample model-building Python code from here: pytorch-mnist-edt-model.zip.

pytorch-mnist-edt-model.zip contains these files:

  • pytorch_mnist_EDT.py - Model-building Python code for use with the elastic distributed training engine
  • pytorch_mnist_original.py - Model-building Python code
  • edtcallback.py - A "helper" file for reading the MNIST data files
  • emetrics.py - A "helper" file for reading metrics
  • elog.py - A "helper" file for logging
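To illustrate what reading the MNIST data files involves, here is a minimal sketch that uses only the Python standard library. It is not the code from the helper files in the archive. It assumes the publicly documented MNIST IDX layout: a big-endian header (magic number `0x0801` plus an item count for labels; magic number `0x0803` plus count, rows, and columns for images) followed by unsigned bytes. The function names are hypothetical.

```python
import gzip
import struct


def read_idx_labels(path):
    """Read a gzip-compressed IDX1 label file into a list of ints."""
    with gzip.open(path, "rb") as f:
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 0x0801:
            raise ValueError(f"unexpected magic number {magic:#06x} for labels")
        return list(f.read(count))


def read_idx_images(path):
    """Read a gzip-compressed IDX3 image file.

    Returns (images, (rows, cols)), where each image is a flat list
    of rows * cols pixel values in the range 0-255.
    """
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 0x0803:
            raise ValueError(f"unexpected magic number {magic:#06x} for images")
        size = rows * cols
        data = f.read(count * size)
    images = [list(data[i * size:(i + 1) * size]) for i in range(count)]
    return images, (rows, cols)
```

For example, `read_idx_labels("train-labels-idx1-ubyte.gz")` would return one integer label per training image, provided the file is still gzip-compressed.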

Step 3: Train the model

  1. Open a project.
  2. Click Add to project and choose Deep learning experiment.
  3. Specify a name for the experiment and an optional description.
  4. Specify the path where the training data is stored relative to the mount point for Watson Machine Learning Accelerator. For example, if you saved your files to wmla-nfs/wmla-data/mnist-tutorial, and you know that Watson Machine Learning Accelerator is installed to wmla-nfs, enter /wmla-data/mnist-tutorial as the path for the data.
  5. Click Add training definition to create the training definition you will use to run the experiment.
    1. Click the New tab.
    2. Give the training definition a name.
    3. Upload the sample code, pytorch-mnist-edt-model.zip, where prompted.
    4. Specify the framework that is used in the model-building code: pytorch_2.1-py3.7 or higher.
    5. In the Execution command box, specify this command for running the model-building code:
      pytorch_mnist_EDT.py --trainImagesFile train-images-idx3-ubyte.gz --trainLabelsFile train-labels-idx1-ubyte.gz --testImagesFile t10k-images-idx3-ubyte.gz --testLabelsFile t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 3000
  6. Select “1 x NVIDIA Tesla V100(1GPU)” for the compute plan.
  7. Click Create.
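The flags in the execution command are ordinary command-line arguments to the Python script. As a rough illustration of how a script might consume them, here is a minimal sketch that uses `argparse`. It is not the argument handling from pytorch_mnist_EDT.py. The flag names and values come from the command above, while the defaults, the `resolve_paths` helper, and the way the data directory is supplied are assumptions made for this example.

```python
import argparse
import os


def build_parser():
    """Build a parser for the flags shown in the execution command."""
    parser = argparse.ArgumentParser(description="MNIST training arguments (sketch)")
    parser.add_argument("--trainImagesFile", required=True)
    parser.add_argument("--trainLabelsFile", required=True)
    parser.add_argument("--testImagesFile", required=True)
    parser.add_argument("--testLabelsFile", required=True)
    # Defaults mirror the values in the tutorial's command (an assumption).
    parser.add_argument("--learningRate", type=float, default=0.001)
    parser.add_argument("--trainingIters", type=int, default=3000)
    return parser


def resolve_paths(args, data_dir):
    """Join each data file name with the data directory (hypothetical helper)."""
    return {
        "train_images": os.path.join(data_dir, args.trainImagesFile),
        "train_labels": os.path.join(data_dir, args.trainLabelsFile),
        "test_images": os.path.join(data_dir, args.testImagesFile),
        "test_labels": os.path.join(data_dir, args.testLabelsFile),
    }


if __name__ == "__main__":
    args = build_parser().parse_args()
    # "/wmla-data/mnist-tutorial" is the example data path from step 4.
    for key, path in resolve_paths(args, "/wmla-data/mnist-tutorial").items():
        print(f"{key}: {path}")
    print(f"learning rate: {args.learningRate}, iterations: {args.trainingIters}")
```

Parsing the values into typed fields (`float` for the learning rate, `int` for the iteration count) catches a mistyped command before any training starts.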

Step 4: Monitor training progress and results

  • You can monitor the progress of a training run in the Training Runs tab of experiment builder.

  • When the training run is complete, click on the training definition to view details of its training results, including logs and other output.

Step 5: Deploy the trained model

You can use your trained model to classify new images only after the model has been deployed.

  1. In the Training Runs tab of experiment builder, select Save model from the action menu. Give the model a name and click Save. This stores the model in the Watson Machine Learning repository and saves it to the project.
  2. The saved model now appears in the project’s Assets tab. To deploy the model, you must first promote it to a deployment space. Choose Promote from the action menu for the model. If you have not yet associated a deployment space with your project, you are prompted to do so.
  3. After you promote the model, a prompt offers to take you to the deployment space. Alternatively, you can open the deployment space from the Overview tab in the project, under the Associated deployment space heading.
  4. In the deployment space, click the model name.
  5. Click Deploy.
  6. Choose Online and enter a name for the deployment.

Step 6: Test the deployed model

You can quickly test your deployed model from the deployment details page.

  1. On your local computer, download this sample payload JSON file with input data corresponding to the handwritten digits "7" and "4": MNIST_sample_payload_V4.json

  2. In the Test area of the deployment details page in Watson Studio, paste the value of the payload field from MNIST_sample_payload_V4.json. Then click Predict.

    Sample output:
    {
      "values": [
        7,
        4
      ]
    }
    
    This output shows that the first input was correctly classified as belonging to the class "7", and the second input was correctly classified as belonging to the class "4".
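
If you save the prediction result and want to check it programmatically, the comparison can be sketched as follows. This is a minimal example that assumes the response has exactly the shape of the sample output above (a JSON object with a `values` list). A real response may contain additional fields, and the function names are hypothetical.

```python
import json

# The sample output from this tutorial, as a JSON string.
SAMPLE_OUTPUT = '{"values": [7, 4]}'


def predicted_digits(response_text):
    """Extract the predicted classes from a response shaped like the sample output."""
    values = json.loads(response_text)["values"]
    return [int(v) for v in values]


def compare(predictions, expected):
    """Pair each prediction with its expected label and a match flag."""
    return [(p, e, p == e) for p, e in zip(predictions, expected)]


if __name__ == "__main__":
    # Expected labels for the two inputs in the sample payload.
    for predicted, expected, ok in compare(predicted_digits(SAMPLE_OUTPUT), [7, 4]):
        print(f"predicted {predicted}, expected {expected}: {'match' if ok else 'mismatch'}")
```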