Determine the mount point for deep learning experiments

Find the path to your Deep Learning Experiments data relative to the Watson Machine Learning Accelerator mount point, and obtain your trained models.

Model training data and results are located under /gpfs as follows:
Training data
Training data is located in /gpfs/mydatafs/.
Training results
Trained model data is located in /gpfs/myresultfs/.

If your data is uploaded to NFS, use the following steps to determine the mount point:

  1. Find the PV name:
    oc get pvc wmla-mygpfs -o yaml | grep volumeName

    For example, the output returns the PV name:

    f:volumeName: {}
      volumeName: pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69
  2. Using the PV name, obtain the path on the NFS server:
    oc get pv pv-name -o yaml | grep " path:"
    For example, using PV name pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69:
    oc get pv pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69 -o yaml | grep " path:"
    The path on the NFS server is returned:
    path: /export/share1/wmla-wmla-mygpfs-pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69
  3. Using the path on the NFS server, the training data directory is path-to-NFS-server/mydatafs. On the NFS server, the path is:
    /export/share1/wmla-wmla-mygpfs-pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69/mydatafs
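
The three NFS steps above can be chained in a short script. The following is a minimal sketch, not a product tool: it assumes the PV is NFS-backed, that the YAML contains volumeName: and path: lines shaped like the sample output above, and that oc is already logged in to the correct project. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these function names come from the product.

# Read `oc get pvc <name> -o yaml` on stdin and print the bound PV name.
# Comparing the whole first field skips the "f:volumeName: {}" metadata line.
pv_name_from_yaml() {
  awk '$1 == "volumeName:" { print $2; exit }'
}

# Read `oc get pv <name> -o yaml` on stdin and print the first "path:" value.
nfs_path_from_yaml() {
  awk '$1 == "path:" { print $2; exit }'
}

# Append mydatafs to the NFS export path (a trailing slash is tolerated).
data_dir_on_nfs() {
  printf '%s/mydatafs\n' "${1%/}"
}

# Chain the steps for a given PVC name. Requires a logged-in oc session.
nfs_data_dir() {
  local pvc="$1" pv path
  pv=$(oc get pvc "$pvc" -o yaml | pv_name_from_yaml)
  [ -n "$pv" ] || { echo "no PV bound to $pvc" >&2; return 1; }
  path=$(oc get pv "$pv" -o yaml | nfs_path_from_yaml)
  [ -n "$path" ] || { echo "no path found for $pv" >&2; return 1; }
  data_dir_on_nfs "$path"
}
```

After sourcing the script, running nfs_data_dir wmla-mygpfs should print the same directory that step 3 builds by hand.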
If your data is uploaded to a service pod, use the following steps to determine the mount point:
  1. Get the wmla-dlpd pod:
    zhao@mbp$ oc get po | grep wmla-dlpd
    wmla-dlpd-74d979b5c5-dklgw              2/2     Running            1 (16d ago)     23d
  2. Get the path for the pod:
    zhao@mbp$ oc exec -it wmla-dlpd-74d979b5c5-dklgw -- bash
    bash-4.4$ ls /gpfs/mydatafs/
    
    Here, on the wmla-dlpd service pod, the path is /gpfs/mydatafs/.
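
The service pod steps above can also be run non-interactively. The following is a minimal sketch under stated assumptions: the pod name starts with wmla-dlpd-, the oc get po output has the column layout shown in the sample, and oc is logged in to the correct project. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these names are for illustration only.

# Read `oc get po` output on stdin and print the name of the first
# wmla-dlpd pod whose STATUS column (third field) is "Running".
first_running_dlpd_pod() {
  awk '$1 ~ /^wmla-dlpd-/ && $3 == "Running" { print $1; exit }'
}

# List the training data directory on that pod without opening a shell.
list_pod_data_dir() {
  local pod
  pod=$(oc get po | first_running_dlpd_pod)
  [ -n "$pod" ] || { echo "no running wmla-dlpd pod found" >&2; return 1; }
  oc exec "$pod" -- ls /gpfs/mydatafs/
}
```

Selecting the pod by status avoids running the command against a pod that is still starting or is restarting.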

Obtain trained models from /gpfs/myresultfs

To locate your trained models under the /gpfs/myresultfs directory, run the following command from the root directory (/):
# find gpfs/myresultfs -path '*/model/*'
Sample output:
...
gpfs/myresultfs/cpadmin/batchworkdir/cpd-instance-8/model/trained_model.pt
gpfs/myresultfs/cpadmin/batchworkdir/cpd-instance-8/model/trained_model.onnx
...
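
As a variation on the find command above, the search can be restricted to regular files and wrapped in a function. This is a minimal sketch: the function name list_model_files is hypothetical, and the default root of /gpfs/myresultfs is an assumption based on the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file located under a directory
# named "model" below the given results root, in sorted order.
list_model_files() {
  local root="${1:-/gpfs/myresultfs}"   # assumed default results directory
  find "$root" -type f -path '*/model/*' | sort
}
```

Passing a narrower root, such as the batchworkdir directory of a single user, limits the listing to that user's workloads.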
Note: Use the dlicmd.py tool or the Watson Machine Learning Accelerator REST API to get the training result for a trained batch workload. The result is returned as a zip file, for example: dlpd-model-98543401121871-810703535-cpd-instance-9-result.zip.