Import data into Cloud Pak for Data to be used by Watson Machine Learning Accelerator

Add data files to your WML Accelerator cluster in Cloud Pak for Data.

About this task

To add files, create a temporary pod using the wmla_pod_working.yaml file.

Procedure

  1. Obtain the wmla_pod_working.yaml file from https://github.com/IBM/wmla-assets/blob/master/dli-learning-path/movie-recommendation-use-case/wmla_pod_working.yaml
  2. Switch to the WML Accelerator namespace.
  3. Create a temporary pod using the wmla_pod_working.yaml file. This file creates a pod named wmla-working-pod.
    oc create -f wmla_pod_working.yaml
  4. Verify that the wmla-working-pod pod is in the Running state.
    oc get po | grep wmla-working-pod

  5. Log on to the pod.
    oc exec -it wmla-working-pod -- bash
  6. Activate the conda environment.
    bash-4.2# source /opt/anaconda3/bin/activate
  7. Install wget and unzip.
    1. Install wget:
      conda install wget
    2. Install unzip:
      conda install unzip
  8. Go to the dataset directory and download the dataset.
    (base) bash-4.2# cd /gpfs/mydatafs/
    (base) bash-4.2# wget https://github.com/IBM/wmla-assets/raw/master/dli-learning-path/datasets/pytorch-mnist-dataset.zip
    Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
    ERROR: could not open HSTS store at '/root/.wget-hsts'. HSTS will be disabled.
    --2021-03-30 20:42:25--  https://github.com/IBM/wmla-assets/raw/master/dli-learning-path/datasets/pytorch-mnist-dataset.zip
    Resolving github.com... 140.82.113.4
    Connecting to github.com|140.82.113.4|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://raw.githubusercontent.com/IBM/wmla-assets/master/dli-learning-path/datasets/pytorch-mnist-dataset.zip [following]
    --2021-03-30 20:42:25--  https://raw.githubusercontent.com/IBM/wmla-assets/master/dli-learning-path/datasets/pytorch-mnist-dataset.zip
    Resolving raw.githubusercontent.com... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
    Connecting to raw.githubusercontent.com|185.199.108.133|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 23006288 (22M) [application/zip]
    Saving to: 'pytorch-mnist-dataset.zip'
    
    pytorch-mnist-dataset.zip       100%[=======================================================>]  21.94M  --.-KB/s    in 0.1s
    
    2021-03-30 20:42:25 (217 MB/s) - 'pytorch-mnist-dataset.zip' saved [23006288/23006288]
  9. Unzip the dataset, then list the directory to confirm that the pytorch-mnist directory was created.
    (base) bash-4.2# unzip pytorch-mnist-dataset.zip
    Archive:  pytorch-mnist-dataset.zip
    
    (base) bash-4.2# ls -tlr
    total 22572
    -rw-rw-rw-. 1 root       root       23006288 Mar 30 20:42 pytorch-mnist-dataset.zip
    drwxr-xr-x. 3 1000820000 1000820000     4096 Mar 30 20:43 pytorch-mnist

    When you define your experiment in Experiment Builder, enter "pytorch-mnist" as the data path.
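
The steps above can also be run without an interactive session. The following is a minimal sketch, not an official script: the function names import_dataset and same_size are hypothetical, the namespace is passed in as an argument because this task does not name it, and the expected size (23006288 bytes) is taken from the sample wget output above, so it might not match a newer copy of the archive. The sketch assumes that oc is already logged in to the cluster and that wmla_pod_working.yaml is in the current directory.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive variant of the procedure above.
# Values below come from the steps in this task; function names are hypothetical.

POD="wmla-working-pod"
DATA_DIR="/gpfs/mydatafs"
ZIP_NAME="pytorch-mnist-dataset.zip"
ZIP_URL="https://github.com/IBM/wmla-assets/raw/master/dli-learning-path/datasets/${ZIP_NAME}"
EXPECTED_BYTES=23006288   # size reported in the sample wget output; may change

# Return 0 if the byte count in $1 (whitespace ignored) equals $2.
same_size() {
  local actual
  actual=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -n "$actual" ] || return 1
  [ "$actual" -eq "$2" ]
}

# Usage: import_dataset <wmla-namespace>
import_dataset() {
  local namespace="$1" actual

  # Steps 2 to 4: switch namespace, create the pod, wait until it is ready.
  oc project "$namespace" || return 1
  oc create -f wmla_pod_working.yaml || return 1
  oc wait --for=condition=Ready "pod/${POD}" --timeout=300s || return 1

  # Steps 6 to 9: activate conda, install tools, download and unzip the dataset.
  oc exec "$POD" -- bash -c "
    source /opt/anaconda3/bin/activate &&
    conda install -y wget unzip &&
    cd '${DATA_DIR}' &&
    wget '${ZIP_URL}' &&
    unzip -o '${ZIP_NAME}'
  " || return 1

  # Optional check: compare the downloaded size with the sample output.
  actual=$(oc exec "$POD" -- bash -c "wc -c < '${DATA_DIR}/${ZIP_NAME}'")
  if ! same_size "$actual" "$EXPECTED_BYTES"; then
    echo "Warning: ${ZIP_NAME} is ${actual} bytes, expected ${EXPECTED_BYTES}" >&2
  fi
}
```

To use the sketch, source the file and run import_dataset with the name of your WML Accelerator namespace. A size mismatch produces only a warning, because the archive in the repository might have been updated since the sample output was captured.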