Determine the mount point for deep learning experiments

Find the path relative to the Watson Machine Learning Accelerator mounting point for Deep Learning Experiments and obtain your trained models.

Model training data and results are located under /gpfs as follows:

Training data: Training data is located in /gpfs/mydatafs/.
Training results: Trained model data is located in /gpfs/myresultfs/.

If your data is uploaded to NFS, use the following steps to determine the mount point:

Find the PV name:

oc get pvc wmla-mygpfs -o yaml | grep volumeName

For example, the output returns the PV name:

f:volumeName: {}
  volumeName: pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69
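The grep in this example also matches the f:volumeName entry, so the output has two lines. The following is a minimal sketch of extracting only the PV name, for use in scripts. The function name pv_name_from_yaml is hypothetical, and the sketch assumes that the PVC YAML arrives on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the product): read PVC YAML on stdin
# and print only the PV name.
# Lines such as "f:volumeName: {}" are skipped because their first field
# is "f:volumeName:", not "volumeName:".
pv_name_from_yaml() {
  awk '$1 == "volumeName:" { print $2; exit }'
}

# Example use against a cluster (requires a logged-in oc session):
#   pv_name=$(oc get pvc wmla-mygpfs -o yaml | pv_name_from_yaml)
```

As an alternative, oc get accepts -o jsonpath='{.spec.volumeName}', which reads the same field directly without any text filtering.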

Using the PV name, obtain the path on the NFS server:

oc get pv pv-name -o yaml | grep " path:"

For example, using PV name pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69:

oc get pv pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69 -o yaml | grep " path:"

The path on the NFS server is returned:

path: /export/share1/wmla-wmla-mygpfs-pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69

The training data directory is mydatafs under that path, that is, path-to-NFS-server/mydatafs. In this example, the full path on the NFS server is:
```
/export/share1/wmla-wmla-mygpfs-pvc-2007b8f2-cefb-431c-93bc-5b1c69658e69/mydatafs
```
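The path lookup and the final path can also be derived in a script. The following is a sketch under stated assumptions, not a supported procedure: the helper names nfs_path_from_yaml and nfs_data_dir are hypothetical, and the sketch assumes that the PV YAML contains a path: line of the form shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read PV YAML on stdin and print the value of the
# first "path:" key.
nfs_path_from_yaml() {
  awk '$1 == "path:" { print $2; exit }'
}

# Hypothetical helper: append a subdirectory (default: mydatafs) to the
# export path, dropping one trailing slash from the export path first.
nfs_data_dir() {
  local export_path="${1%/}"
  local subdir="${2:-mydatafs}"
  printf '%s/%s\n' "$export_path" "$subdir"
}

# Example use against a cluster (requires a logged-in oc session;
# pv_name holds the PV name found in the previous step):
#   export_path=$(oc get pv "$pv_name" -o yaml | nfs_path_from_yaml)
#   nfs_data_dir "$export_path"
```

Matching on the first field avoids depending on the exact indentation that grep " path:" relies on.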

If your data is uploaded to a service pod, use the following steps to determine the mount point:

Get the wmla-dlpd pod:

zhao@mbp$ oc get po | grep wmla-dlpd
wmla-dlpd-74d979b5c5-dklgw              2/2     Running            1 (16d ago)     23d

Open a shell in the pod and list the data directory:
```
zhao@mbp$ oc exec -it wmla-dlpd-74d979b5c5-dklgw -- bash
bash-4.4$ ls /gpfs/mydatafs/
```
Here, on the wmla-dlpd service pod, the path is /gpfs/mydatafs/.
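Because the pod name suffix changes whenever the pod is re-created, a script can look the name up instead of hard-coding it. The following is a minimal sketch. The function name running_pod is hypothetical, and the sketch assumes the default oc get po column layout, where STATUS is the third column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get po` output on stdin and print the name
# of the first pod whose name starts with the given prefix and whose
# STATUS column (third field) is Running.
running_pod() {
  awk -v prefix="$1" 'index($1, prefix) == 1 && $3 == "Running" { print $1; exit }'
}

# Example use against a cluster (requires a logged-in oc session):
#   pod=$(oc get po | running_pod wmla-dlpd)
#   oc exec "$pod" -- ls /gpfs/mydatafs/
```

Running ls through oc exec, as in the commented example, lists the directory without opening an interactive shell.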

Obtain trained models from /gpfs/myresultfs

To find your trained models under the /gpfs/myresultfs directory, run the following command from the root directory (/):

# find gpfs/myresultfs -path '*/model/*'

Sample output:

...
gpfs/myresultfs/cpadmin/batchworkdir/cpd-instance-8/model/trained_model.pt
gpfs/myresultfs/cpadmin/batchworkdir/cpd-instance-8/model/trained_model.onnx
...
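When many workloads have run, it can be easier to list each model directory once instead of every file. The following is a minimal sketch. The function name model_dirs is hypothetical, and the sketch assumes one file path per input line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read file paths on stdin (one per line) and print
# each distinct parent directory once, sorted. Lines that contain no "/"
# are ignored.
model_dirs() {
  while IFS= read -r path; do
    case "$path" in
      */*) printf '%s\n' "${path%/*}" ;;
    esac
  done | sort -u
}

# Example use, run from the root directory on the service pod:
#   find gpfs/myresultfs -path '*/model/*' -type f | model_dirs
```

For the two files in the sample output, this prints the single directory gpfs/myresultfs/cpadmin/batchworkdir/cpd-instance-8/model.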

Note: Use the dlicmd.py tool or the Watson Machine Learning Accelerator REST API to get the training results for a trained batch workload. The results are returned as a zip file, for example: dlpd-model-98543401121871-810703535-cpd-instance-9-result.zip.