Deploy DSX Local into IBM Cloud Private
You can deploy IBM Data Science Experience Local into IBM Cloud Private, IBM's new Kubernetes-based private cloud, by using the IBM Cloud Private CLI.
Steps to complete:
- Verify software requirements
- Install IBM Cloud Private
- Install DSX Local on the ICP catalog
- Set up storage
- Configure DSX Local on ICP
- Deploy DSX Local on ICP
You can also uninstall DSX Local from ICP.
Verify software requirements
DSX Local is delivered as an integrated set of pods and Kubernetes services. DSX pods use
kube-dns to discover each other by fixed names, so each independent copy of DSX Local
must be deployed in a separate Kubernetes namespace.
Software requirements:
- IBM Cloud Private Version 2.1 or later (GlusterFS requires Version 2.1.0.2 or later)
- IBM Cloud Private CLI
- Kubectl command line
- Shared storage (GlusterFS or NFS)
- Three worker nodes (minimum 8 cores with 32 GB memory)
Install IBM Cloud Private
To install IBM Cloud Private and IBM Cloud Private CLI, see Installing IBM Cloud Private and Installing the IBM Cloud Private CLI.
Install DSX Local on the ICP catalog
To install DSX Local on the ICP catalog, complete the following steps:
- Enter the following commands to ensure that `bx pr` and `docker` are authenticated:

  ```shell
  bx pr login -a https://<cluster_ip>:8443 --skip-ssl-validation
  docker login <cluster_name>:8500
  ```

- In the IBM Cloud Private CLI tool, enter the following command:

  ```shell
  bx pr load-ppa-archive --archive dsx-icp.tar.gz
  ```

  where `dsx-icp.tar.gz` represents the DSX Local installation TAR file.
- Go to and click Sync Repositories.
- Go to and verify that the `ibm-dsx-prod` chart now displays.
- In the IBM Cloud Private App Center, select the user and click Configure Client to configure `kubectl`. Copy the commands and paste them into your terminal. The user must have administrator privileges for the following actions.
- Set the DSX Local images scope to `global` by entering the following command with `kubectl` authenticated:

  ```shell
  for image in $(kubectl get images | tail -n +2 | awk '{ print $1; }'); do
    kubectl get image $image -o yaml | sed 's/scope: namespace/scope: global/' | kubectl apply -f -
  done
  ```

  Ignore the following warning if you see it: `Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply.`
Set up storage
Use only one of the following storage options.
- Dynamic provisioning
- If you are using dynamic provisioning with GlusterFS, ensure that the appropriate storage class exists. You can check with the following command:

  ```shell
  kubectl get storageclasses | grep glusterfs
  ```

  If nothing displays, consult your cluster administrator about the availability of GlusterFS.
- NFS storage
- Alternatively, if you are using NFS as the storage type, you must set up the Persistent Volumes (PVs) with the following information:
  - NFS server IP address
  - NFS mount path
NFS also requires you to create five directories in the NFS mount path:
- cloudant
- redis
- spark-metrics
- user-home
- influxdb
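For example, the five directories could be created on the NFS server in one step. This is a sketch: `NFS_MOUNT_PATH` is a placeholder, so set it to your actual export path before running it.

```shell
# Run on the NFS server. NFS_MOUNT_PATH is a placeholder: set it to your export path.
NFS_MOUNT_PATH=${NFS_MOUNT_PATH:-/export/dsx}
mkdir -p "$NFS_MOUNT_PATH/cloudant" \
         "$NFS_MOUNT_PATH/redis" \
         "$NFS_MOUNT_PATH/spark-metrics" \
         "$NFS_MOUNT_PATH/user-home" \
         "$NFS_MOUNT_PATH/influxdb"
```

Ensure that the directories are exported with permissions that allow the DSX pods to read and write to them.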
In the IBM Cloud Private App Center, go to and create the following PVs with this information:
Cloudant:
General Tab:
| Name | Capacity | Access Mode | Storage Type |
|---|---|---|---|
| cloudant-repo-pv | 10Gi | Read write many | NFS |
Labels:
| Name | Value |
|---|---|
| assign-to | <namespace>-cloudant |
Parameters:
| Key | Value |
|---|---|
| server | NFS_SERVER_IP |
| path | NFS_MOUNT_PATH/cloudant |
Redis:
General:
| Name | Capacity | Access Mode | Storage Type |
|---|---|---|---|
| redis-repo-pv | 10Gi | Read write many | NFS |
Labels:
| Name | Value |
|---|---|
| assign-to | <namespace>-redis |
Parameters:
| Key | Value |
|---|---|
| server | NFS_SERVER_IP |
| path | NFS_MOUNT_PATH/redis |
Spark Metrics:
General:
| Name | Capacity | Access Mode | Storage Type |
|---|---|---|---|
| spark-metrics-pv | 50Gi | Read write many | NFS |
Labels:
| Name | Value |
|---|---|
| assign-to | <namespace>-spark-metrics |
Parameters:
| Key | Value |
|---|---|
| server | NFS_SERVER_IP |
| path | NFS_MOUNT_PATH/spark-metrics |
User Home:
The size of this PV should be adapted to your needs: 100Gi is the minimum size that you should have, and 1TB is the recommended size.
General:
| Name | Capacity | Access Mode | Storage Type |
|---|---|---|---|
| user-home-pv | 100Gi | Read write many | NFS |
Labels:
| Name | Value |
|---|---|
| assign-to | <namespace>-user-home |
Parameters:
| Key | Value |
|---|---|
| server | NFS_SERVER_IP |
| path | NFS_MOUNT_PATH/user-home |
Influxdb:
General:
| Name | Capacity | Access Mode | Storage Type |
|---|---|---|---|
| influxdb-pv | 10Gi | Read write many | NFS |
Labels:
| Name | Value |
|---|---|
| assign-to | <namespace>-influxdb |
Parameters:
| Key | Value |
|---|---|
| server | NFS_SERVER_IP |
| path | NFS_MOUNT_PATH/influxdb |
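As an alternative to filling in the App Center forms, each PV can be declared in a manifest and applied with `kubectl`. The following is a sketch for the Cloudant PV only; the namespace `dsx`, server `192.0.2.10`, and path `/export/dsx` are placeholder values to replace with your own.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cloudant-repo-pv
  labels:
    assign-to: dsx-cloudant       # <namespace>-cloudant
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.0.2.10            # NFS_SERVER_IP
    path: /export/dsx/cloudant    # NFS_MOUNT_PATH/cloudant
```

Apply it with `kubectl apply -f cloudant-repo-pv.yaml`, and repeat with the values from the remaining tables for the other four PVs.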
Configure DSX Local on ICP
To configure DSX Local on ICP, complete the following steps:
- In the IBM Cloud Private App Center, go to and click Create Namespace to create a namespace for DSX Local. Ensure the namespace does not exceed 12 characters.
- Go to , select `ibm-dsx-prod`, and click Configure.
- Name the release in Release Name. For the Namespace to deploy to, select the namespace that you created.
- Recommendation: Set the correct number of worker nodes with the `runtimes.workerNodes` parameter, and ensure that `runtimes.preloadRuntimes` is set to true or checked. This speeds up the launch of notebooks after deployment by preloading notebook images on each worker node.
- Changing any of the following values is entirely optional. If you are using dynamic provisioning for storage, check the persistence parameters.
Common parameters
| Parameter | Description | Default Value |
|---|---|---|
| image.pullPolicy | Image Pull Policy | IfNotPresent |
| persistence.useDynamicProvisioning | Use Dynamic PV Provisioning | false |
| persistence.storageClassName | StorageClass to use for the PVs | (None) |
| dsxservice.externalPort | Port where DSX Local is exposed | 31843 |
| sparkContainer.workerReplicas | Count of spark worker replicas | 3 |
| runtimes.workerNodes | Number of worker nodes | 3 |
| runtimes.preloadRuntimes | Should runtime images be preloaded | true |
Persistence Parameters
If persistence.useDynamicProvisioning is set to true, persistence.storageClassName must be
set to the appropriate StorageClass, unless the default StorageClass already provides
dynamic provisioning.
If you are using NFS without dynamic provisioning, the persistence.size of each PVC should
match what was created in the previous step.
| Prefix/Suffix | name | persistence.existingClaimName | persistence.size |
|---|---|---|---|
| userHomePVC | user-home-pvc | (None) | 100Gi |
| cloudantSrvPVC | cloudant-srv-mount | (None) | 10Gi |
| redisPVC | redis-mount | (None) | 10Gi |
| sparkMetricsPVC | spark-metrics-pvc | (None) | 50Gi |
Description:
- `*.name`: The name of the PVC.
- `*.persistence.existingClaimName`: Use an already existing PVC.
- `*.persistence.size`: The minimum size of the persistent volume to request.
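When installing from the CLI rather than the App Center, these parameters could be supplied in a Helm values override file. The following is a sketch, assuming the chart accepts the keys shown in the tables above and a hypothetical `glusterfs` StorageClass; adjust the names to your cluster.

```yaml
# values-override.yaml (illustrative values)
persistence:
  useDynamicProvisioning: true
  storageClassName: glusterfs
userHomePVC:
  name: user-home-pvc
  persistence:
    size: 1Ti          # 1TB is the recommended size for user-home
sparkMetricsPVC:
  persistence:
    size: 50Gi
```

Pass the file with the `-f values-override.yaml` option when installing the release.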
Containers Parameters
Image Parameters
Default parameter values for the image and tag to use in each container, in the format
`<prefix>.<suffix>`. Do not modify these values unless there is a
specific reason to do so.
| Prefix/Suffix | image.repository | image.tag |
|---|---|---|
| cloudantRepo | privatecloud-cloudant-repo | v3.13.428 |
| dsxConnectionBack | dsx-connection-back | 1.0.4 |
| dsxCore | dsx-core | v3.13.10 |
| dsxScriptedML | privatecloud-dsx-scripted-ml | v0.01.2 |
| filemgmt | filemgmt | 1.0.2 |
| hdpzeppelinDsxD8a2ls2x | hdpzeppelin-dsx-d8a2ls2x | v1.0.10 |
| jupyterDsxD8a2ls2x | jupyter-dsx-d8a2ls2x | v1.0.11 |
| jupyterDsxD8a3ls2x | jupyter-dsx-d8a3ls2x | v1.0.7 |
| jupyterGpuPy35 | jupyter-gpu-py35 | v1.0.9 |
| mlOnlineScoring | privatecloud-ml-online-scoring | v3.13.6 |
| mlPipelinesApi | privatecloud-ml-pipelines-api | v3.13.4 |
| mllib | ml-libs | v3.13.30 |
| nginxRepo | privatecloud-nginx-repo | v3.13.6 |
| pipeline | privatecloud-pipeline | v3.13.3 |
| portalMachineLearning | privatecloud-portal-machine-learning | v3.13.20 |
| portalMlaas | privatecloud-portal-mlaas | v3.13.17 |
| redisRepo | privatecloud-redis-repo | v3.13.431 |
| repository | privatecloud-repository | v3.13.2 |
| rstudio | privatecloud-rstudio | v3.13.8 |
| spark | spark | 1.5.1 |
| sparkClient | spark-client | v1.0.2 |
| sparkaasApi | sparkaas-api | v1.3.14 |
| spawnerApiK8s | privatecloud-spawner-api-k8s | v3.13.5 |
| usermgmt | privatecloud-usermgmt | v3.13.5 |
| utilsApi | privatecloud-utils-api | v3.13.5 |
| wmlBatchScoring | wml-batch-scoring | v3.13.2 |
| wmlIngestion | privatecloud-wml-ingestion | v3.13.2 |
Resources Parameters
Default parameter values for the CPU and memory to use in each container, in the format
`<prefix>.<suffix>`:
| Prefix/Suffix | resources.requests.cpu | resources.limits.cpu | resources.requests.memory | resources.limits.memory |
|---|---|---|---|---|
| cloudantRepo | 500m | 1000m | 1024Mi | 2048Mi |
| dsxConnectionBack | 500m | 1000m | 128Mi | 256Mi |
| dsxCore | 1000m | 2000m | 512Mi | 1024Mi |
| filemgmt | 500m | 1000m | 256Mi | 512Mi |
| mlOnlineScoring | 200m | 500m | 1024Mi | 2048Mi |
| mlPipelinesApi | 200m | 500m | 128Mi | 256Mi |
| nginxRepo | 500m | 1000m | 256Mi | 512Mi |
| pipeline | 200m | 500m | 1024Mi | 2048Mi |
| portalMachineLearning | 100m | 300m | 128Mi | 512Mi |
| portalMlaas | 100m | 300m | 256Mi | 512Mi |
| redisRepo | 500m | 1000m | 256Mi | 512Mi |
| repository | 100m | 300m | 512Mi | 1024Mi |
| spark | 500m | 1000m | 2048Mi | 4096Mi |
| spawnerApiK8s | 200m | 500m | 128Mi | 256Mi |
| usermgmt | 500m | 1000m | 256Mi | 512Mi |
| wmlBatchScoring | 100m | 300m | 512Mi | 1024Mi |
| wmlIngestion | 100m | 300m | 512Mi | 1024Mi |
Replicas Parameters
In addition to the number of Spark workers, some services offer the option to run several instances of the same service for High Availability (HA). This can be adjusted depending on your workload and available resources:
| Prefix/Suffix | replicas | Description |
|---|---|---|
| dsxConnectionBack | 3 | Connection to additional services |
| dsxCore | 3 | Main Webapp portal |
| mlPipelinesApi | 2 | Machine Learning Pipeline |
| nginxRepo | 3 | Main Proxy that gets exposed |
| portalMachineLearning | 2 | Projects Machine Learning portal |
| portalMlaas | 2 | Published Machine Learning portal |
| usermgmt | 2 | User management services |
Deploy DSX Local on ICP
Complete the deployment. When the cloudant, redis,
usermgmt, dsx-core, and ibm-nginx pods are up and
ready, you can access the DSX Local client UI at
`https://MASTER_NODE_IP:<dsxservice.externalPort>/`.
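The readiness check can be scripted instead of watched by hand. The `wait_ready` helper below and the namespace `dsx` are illustrative, not part of DSX Local; the commented example assumes `kubectl` is authenticated against your cluster.

```shell
# wait_ready: poll a command every 5 seconds until it succeeds or the timeout elapses.
wait_ready() {
  timeout=$1; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
}

# Example (hypothetical namespace "dsx"): wait up to 10 minutes until no pod
# in the namespace is in a non-Running state, then open the UI:
# wait_ready 600 sh -c '! kubectl -n dsx get pods --no-headers | grep -v Running'
```

The helper is generic: pass it a timeout in seconds followed by any command whose success signals readiness.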
After the release is deployed to the namespace, you can check the progress in the IBM Cloud Private App Center at .
See IBM Data Science Experience Local for more documentation on how to use DSX Local.
Uninstall DSX Local from ICP
To uninstall DSX Local from ICP, complete the following steps:
- In the IBM Cloud Private App Center, go to and delete the DSX Local Helm release. If you get an error, ignore it and refresh the page. Wait a few minutes for the components of DSX Local to be removed.
- Because runtimes and jobs created after DSX Local was deployed are not deleted with the release, you must remove them manually. From the Workloads page, delete the remaining deployments, jobs, and cronjobs belonging to the namespace in which DSX Local was installed.
- If using NFS as the storage, go to your NFS server and delete the content of the directories that were used for the PVs.
- In some cases, some of the PVs and PVCs might remain behind. Go to and delete any remaining PVs and PVCs that were associated with DSX.