IBM Support

Make NFS shares available on Cloud Pak for Data Runtime Environments

How To


Summary

Cloud Pak for Data provides a number of ways to import or access data assets in a data science project for analytics activities. Frequently used options include CSV file uploads and database connections, but often the requirement is to bring in files that are very large, or that are continuously updated or generated by another service such as IoT sensors. In such situations, uploading the files into a project is simply not practical given the refresh rates or the volumes involved.

This document describes a solution: if such folders are available as NFS shares, they can be mounted on the runtime environment for direct access, without the need for expensive imports.

Please note that this document only applies to Cloud Pak for Data 3.0.1 and earlier. From Cloud Pak for Data 3.5 onwards, the platform provides a native volume management feature in the product user interface itself.

Objective

Access an NFS share from a Jupyter notebook. In this example,
  • The server is ibmnfs.networklayer.com and the share is /IBM/data01/testju/
  • The files to be accessed within the notebook environment are kept in /IBM/data01/testju/myFiles
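Optionally, before creating any Kubernetes objects, you can confirm that the export is reachable from a host with the NFS client utilities installed. A quick sketch, using the example server above:
  # List the exports offered by the NFS server; the share should appear in the output
  showmount -e ibmnfs.networklayer.com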

Steps

Step 1 
Create a persistent volume (PV) for the existing NFS share. In this example, /IBM/data01/testju is the existing NFS folder to be made available in Jupyter notebooks. The storage size does not matter for an NFS-backed PV, but a value must be specified.
  cat << EOF > myPV.yaml
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    finalizers:
    - kubernetes.io/pv-protection
    name: testju-pv
  spec:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 1Gi
    nfs:
      path: /IBM/data01/testju
      server: ibmnfs.networklayer.com
    persistentVolumeReclaimPolicy: Retain
  EOF
 
Run the following to create the persistent volume:
  oc create -f myPV.yaml
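To confirm the volume was created, check its status; it should show as Available until the claim from Step 2 binds it:
  oc get pv testju-pv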
Step 2
Now create a persistent volume claim (PVC) for the volume, as below. Let's name it testju-pvc; replace <NAMESPACE> with the Cloud Pak for Data namespace. As before, the storage value does not matter, but it is mandatory to specify one.
  cat << EOF > myPVC.yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    finalizers:
    - kubernetes.io/pvc-protection
    name: testju-pvc
    namespace: <NAMESPACE>
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 1Gi
    storageClassName: ""
    volumeName: testju-pv
  EOF
Run the following to create the persistent volume claim:
  oc create -f myPVC.yaml
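To confirm the claim bound to the volume, check its status; it should show as Bound, with testju-pv in the VOLUME column:
  oc get pvc testju-pvc -n <NAMESPACE>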
Step 3
Update the runtime definition for the desired environment.
Exec into an nginx pod to access the configuration file:
  oc rsh $(oc get pods | grep ibm-nginx | head -n 1 | awk '{print $1}')
Edit /user-home/_global_/config/.runtime-definitions/ibm/jupyter-py36-server.json (for example, with vi) and add the code block below under the volumes section.
 
   {        "volume": "myNFS",        "mountPath": "/myNFSShare",        "claimName": "testju-pvc"        "subPath": "myFiles"   },    
 
  • mountPath is the path where the NFS share is mounted in the running Jupyter pod. You will access your files from this folder path.
  • claimName is the PVC that you created in Step 2.
  • subPath is a folder within the PV (the NFS folder). If this path does not exist, Cloud Pak for Data creates a new folder in the PV.
Save the edited file.
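For orientation, after the edit the volumes section of jupyter-py36-server.json should contain an entry like the following; this sketch assumes the section key is "volumes" and that the file already defines other volumes, which are elided here:
  "volumes": [
      ... existing volume entries ...,
      {
          "volume": "myNFS",
          "mountPath": "/myNFSShare",
          "claimName": "testju-pvc",
          "subPath": "myFiles"
      }
  ]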
Step 4
To make sure that the changes are not lost, the configuration map should also be edited so that the change persists:
oc edit cm runtime-addon-py36
Edit the text under data.jupyter-py36-server.json to add the same volume block, including the new claim name.
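To verify that the change persisted, you can search the ConfigMap for the claim name from Step 2:
  oc get cm runtime-addon-py36 -o yaml | grep testju-pvc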
Step 5
Restart the spawner pods
oc delete pod  -l component=spawner-api
Check the logs of the new spawner pods to ensure that there are no errors.
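For example, assuming your oc client supports label selectors for logs (otherwise pass a pod name), the following confirms the new pods are running and shows their recent output:
  oc get pods -l component=spawner-api
  oc logs -l component=spawner-api --tail=50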
Step 6
Existing Jupyter environments either need to be cleaned up (by deleting their deployments or stopping the running environments from the user interface), or their deployments need to be modified to add the volume manually.
Any new Jupyter notebook environment will automatically mount the new volume. Once it has started, access the files from the location /myNFSShare.
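As a quick check from a new notebook, a minimal Python sketch like the following lists the mount and reads a file in place (the file name mydata.csv is hypothetical; use one of your own files):
  import os
  import pandas as pd

  # List the files exposed through the mount configured in Step 3
  print(os.listdir("/myNFSShare"))

  # Read one of the files directly; no import into the project is needed
  df = pd.read_csv("/myNFSShare/mydata.csv")  # hypothetical file name
  print(df.head())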

Additional Information

(1) The mounted files will not be visible on the asset page, since the asset page is driven by another mounted folder. If the requirement is to see those files as well, a soft link can be created in the pod startup script; however, if new files are created later, the pod will not show them unless it is restarted.
(2) This mount is accessible to all users across projects, which in most cases is desirable. If project-specific mounts are needed instead, a mix of pod startup scripts and "subPath": ".../$project_id/..." could be employed; see the sketch after this list.
(3) This document only applies to 3.0.1. For 3.5, please refer to the documentation:
      https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/cpd/admin/manage_storage_volumes.html
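As a sketch of the per-project approach mentioned in (2), the volume entry could scope the mount with a subPath that embeds the project ID. The projects/ prefix and myFiles suffix here are hypothetical and depend on how the NFS folder is laid out:
  {
      "volume": "myNFS",
      "mountPath": "/myNFSShare",
      "claimName": "testju-pvc",
      "subPath": "projects/$project_id/myFiles"
  },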

Document Location

Worldwide

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSHGYS","label":"IBM Cloud Pak for Data"},"ARM Category":[{"code":"a8m50000000ClWMAA0","label":"Analyze->Environments"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Version(s)","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
29 January 2021

UID

ibm16214475