Deploy DSX Local into IBM Cloud Private

You can deploy IBM Data Science Experience (DSX) Local into IBM Cloud Private (ICP), IBM's Kubernetes-based private cloud platform, by using the IBM Cloud Private CLI.

Steps to complete:

  1. Verify software requirements
  2. Install IBM Cloud Private
  3. Install DSX Local on the ICP catalog
  4. Set up storage
  5. Configure DSX Local on ICP
  6. Deploy DSX Local on ICP

You can also uninstall DSX Local from ICP.

Verify software requirements

DSX Local is delivered as an integrated set of pods and Kubernetes services. DSX pods use kube-dns to discover each other by fixed names, so each independent copy of DSX must be deployed in a separate Kubernetes namespace.

Software requirements:

  • IBM Cloud Private Version 2.1 or later (GlusterFS requires Version 2.1.0.2 or later)
  • IBM Cloud Private CLI
  • Kubectl command line
  • Shared storage (GlusterFS or NFS)
  • Three worker nodes (minimum 8 cores with 32 GB memory)

Install IBM Cloud Private

To install IBM Cloud Private and IBM Cloud Private CLI, see Installing IBM Cloud Private and Installing the IBM Cloud Private CLI.

Install DSX Local on the ICP catalog

To install DSX Local on the ICP catalog, complete the following steps:

  1. Enter the following commands to ensure that bx pr and docker are authenticated:
    bx pr login -a https://<cluster_ip>:8443 --skip-ssl-validation
    docker login <cluster_name>:8500
  2. In the IBM Cloud Private CLI tool, enter the following command:
    bx pr load-ppa-archive --archive dsx-icp.tar.gz

    where dsx-icp.tar.gz represents the DSX Local installation TAR file.

  3. Go to Manage > Helm Repositories and click Sync Repositories.
  4. Go to Catalog > Helm Charts and verify that the ibm-dsx-prod chart now displays.
  5. In the IBM Cloud Private App Center, select the user and click Configure Client to configure kubectl. Copy the displayed commands and paste them into your terminal. The user must have administrator privileges for the following actions.
  6. Set the DSX Local images scope to global by entering the following command with kubectl authenticated:
    for image in $(kubectl get images | tail -n +2 | awk '{ print $1 }'); do
      kubectl get image "$image" -o yaml |
        sed 's/scope: namespace/scope: global/' |
        kubectl apply -f -
    done

    Ignore the following warning if you see it: Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply.

Set up storage

Use only one of the following storage options.

Dynamic provisioning

If you are using dynamic provisioning with GlusterFS, ensure that the appropriate storage class exists. You can check with kubectl get storageclasses | grep glusterfs. If no storage class is listed, consult your cluster administrator about the availability of GlusterFS.
NFS storage

Alternatively, if you are using NFS as the storage type, you must set up the Persistent Volumes (PVs) by using the following information:

  • NFS server IP address
  • NFS mount path

NFS also requires you to create five directories in the NFS mount path:

  • cloudant
  • redis
  • spark-metrics
  • user-home
  • influxdb
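The five directories can be created in one pass on the NFS server. The following is a minimal sketch; the default value of NFS_MOUNT_PATH below is a placeholder, so point it at your actual NFS export:

```shell
# Create the five directories that DSX Local expects in the NFS export.
# NFS_MOUNT_PATH is a placeholder -- set it to your real NFS mount path.
NFS_MOUNT_PATH="${NFS_MOUNT_PATH:-/tmp/dsx-nfs-export}"

for dir in cloudant redis spark-metrics user-home influxdb; do
    mkdir -p "${NFS_MOUNT_PATH}/${dir}"
done

# List the created directories to confirm
ls "${NFS_MOUNT_PATH}"
```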

In the IBM Cloud Private App Center, go to Platform > Storage and create the following PVs with this information:

Cloudant:

General:

  • Name: cloudant-repo-pv
  • Capacity: 10Gi
  • Access Mode: Read write many
  • Storage Type: NFS

Labels:

  • assign-to: <namespace>-cloudant

Parameters:

  • server: NFS_SERVER_IP
  • path: NFS_MOUNT_PATH/cloudant

Redis:

General:

  • Name: redis-repo-pv
  • Capacity: 10Gi
  • Access Mode: Read write many
  • Storage Type: NFS

Labels:

  • assign-to: <namespace>-redis

Parameters:

  • server: NFS_SERVER_IP
  • path: NFS_MOUNT_PATH/redis

Spark Metrics:

General:

  • Name: spark-metrics-pv
  • Capacity: 50Gi
  • Access Mode: Read write many
  • Storage Type: NFS

Labels:

  • assign-to: <namespace>-spark-metrics

Parameters:

  • server: NFS_SERVER_IP
  • path: NFS_MOUNT_PATH/spark-metrics

User Home:

The size of this PV should adapt to your needs: 100Gi is the minimum, and 1 TB is recommended.

General:

  • Name: user-home-pv
  • Capacity: 100Gi
  • Access Mode: Read write many
  • Storage Type: NFS

Labels:

  • assign-to: <namespace>-user-home

Parameters:

  • server: NFS_SERVER_IP
  • path: NFS_MOUNT_PATH/user-home

Influxdb:

General:

  • Name: influxdb-pv
  • Capacity: 10Gi
  • Access Mode: Read write many
  • Storage Type: NFS

Labels:

  • assign-to: <namespace>-influxdb

Parameters:

  • server: NFS_SERVER_IP
  • path: NFS_MOUNT_PATH/influxdb
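As an alternative to creating the PVs one at a time in the console, the same five definitions can be generated as a single Kubernetes manifest and applied with kubectl. This is a sketch: the server IP, mount path, and namespace below are placeholders, and the apply step is left commented out until kubectl is authenticated:

```shell
# Generate the five NFS PersistentVolume definitions described above.
# NFS_SERVER_IP, NFS_MOUNT_PATH, and DSX_NAMESPACE are placeholders.
NFS_SERVER_IP="${NFS_SERVER_IP:-192.0.2.10}"
NFS_MOUNT_PATH="${NFS_MOUNT_PATH:-/export/dsx}"
DSX_NAMESPACE="${DSX_NAMESPACE:-dsx}"
OUT="dsx-pvs.yaml"
: > "$OUT"

# name:capacity:directory -- the directory also forms the assign-to label
for spec in cloudant-repo-pv:10Gi:cloudant \
            redis-repo-pv:10Gi:redis \
            spark-metrics-pv:50Gi:spark-metrics \
            user-home-pv:100Gi:user-home \
            influxdb-pv:10Gi:influxdb; do
    name=${spec%%:*}
    rest=${spec#*:}
    capacity=${rest%%:*}
    dir=${rest#*:}
    cat >> "$OUT" <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
  labels:
    assign-to: ${DSX_NAMESPACE}-${dir}
spec:
  capacity:
    storage: ${capacity}
  accessModes:
    - ReadWriteMany
  nfs:
    server: ${NFS_SERVER_IP}
    path: ${NFS_MOUNT_PATH}/${dir}
---
EOF
done

# kubectl apply -f "$OUT"    # apply once kubectl is authenticated
```

ReadWriteMany in the manifest corresponds to the Read write many access mode in the console.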

Configure DSX Local on ICP

To configure DSX Local on ICP, complete the following steps:

  1. In the IBM Cloud Private App Center, go to Manage > Namespaces and click Create Namespace to create a namespace for DSX Local. Ensure that the namespace name does not exceed 12 characters.
  2. Go to Catalog > Helm Charts, select ibm-dsx-prod, and click Configure.
  3. Enter a name in the Release Name field. For Namespace to deploy to, select the namespace that you created.
  4. Recommendation: Set the number of worker nodes with the runtimes.workerNodes parameter, and ensure that runtimes.preloadRuntimes is set to true (checked). Preloading the notebook images on each worker node speeds up launching notebooks after deployment.
  5. Changing any of the following values is optional. If you are using dynamic provisioning for storage, review the persistence parameters.

Common parameters

Parameter Description Default Value
image.pullPolicy Image Pull Policy IfNotPresent
persistence.useDynamicProvisioning Use Dynamic PV Provisioning false
persistence.storageClassName StorageClass to use for the PVs (None)
dsxservice.externalPort Port where DSX Local is exposed 31843
sparkContainer.workerReplicas Count of spark worker replicas 3
runtimes.workerNodes Number of worker nodes 3
runtimes.preloadRuntimes Should runtime images be preloaded true
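For reference, the same parameters can be passed from the command line when installing the chart with the Helm CLI instead of the catalog UI. This is a sketch using Helm v2 syntax (as bundled with ICP 2.1); the repository alias, release name, and namespace are illustrative, and your cluster may additionally require the --tls flag:

```shell
# Illustrative CLI install of the ibm-dsx-prod chart (Helm v2 syntax).
# CHART, RELEASE_NAME, and TARGET_NAMESPACE are examples only.
CHART="ibm-charts/ibm-dsx-prod"
RELEASE_NAME="dsx"
TARGET_NAMESPACE="dsx"

if command -v helm >/dev/null 2>&1 && helm version >/dev/null 2>&1; then
    helm install "$CHART" \
        --name "$RELEASE_NAME" \
        --namespace "$TARGET_NAMESPACE" \
        --set runtimes.workerNodes=3 \
        --set runtimes.preloadRuntimes=true \
        --set persistence.useDynamicProvisioning=false
else
    echo "helm is not configured; run this from the ICP CLI client"
fi
```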

Persistence Parameters

If persistence.useDynamicProvisioning is set to true, persistence.storageClassName must be set to an appropriate StorageClass, unless the default StorageClass already provides dynamic provisioning.

If you are using NFS instead of dynamic provisioning, the persistence.size of each claim should match the PVs that you created in the previous step.

Prefix/Suffix name persistence.existingClaimName persistence.size
userHomePVC user-home-pvc (None) 100Gi
cloudantSrvPVC cloudant-srv-mount (None) 10Gi
redisPVC redis-mount (None) 10Gi
sparkMetricsPVC spark-metrics-pvc (None) 50Gi

Description:

  • *.name: The name of the PVC.
  • *.persistence.existingClaimName: Use an existing PVC instead of creating a new one.
  • *.persistence.size: The minimum size of the persistent volume to request.

Containers Parameters

Image Parameters

Default parameter values for the image and tag to use for each container, in the format <prefix>.<suffix>. Do not modify these values unless you have a specific reason to do so.

Prefix/Suffix image.repository image.tag
cloudantRepo privatecloud-cloudant-repo v3.13.428
dsxConnectionBack dsx-connection-back 1.0.4
dsxCore dsx-core v3.13.10
dsxScriptedML privatecloud-dsx-scripted-ml v0.01.2
filemgmt filemgmt 1.0.2
hdpzeppelinDsxD8a2ls2x hdpzeppelin-dsx-d8a2ls2x v1.0.10
jupyterDsxD8a2ls2x jupyter-dsx-d8a2ls2x v1.0.11
jupyterDsxD8a3ls2x jupyter-dsx-d8a3ls2x v1.0.7
jupyterGpuPy35 jupyter-gpu-py35 v1.0.9
mlOnlineScoring privatecloud-ml-online-scoring v3.13.6
mlPipelinesApi privatecloud-ml-pipelines-api v3.13.4
mllib ml-libs v3.13.30
nginxRepo privatecloud-nginx-repo v3.13.6
pipeline privatecloud-pipeline v3.13.3
portalMachineLearning privatecloud-portal-machine-learning v3.13.20
portalMlaas privatecloud-portal-mlaas v3.13.17
redisRepo privatecloud-redis-repo v3.13.431
repository privatecloud-repository v3.13.2
rstudio privatecloud-rstudio v3.13.8
spark spark 1.5.1
sparkClient spark-client v1.0.2
sparkaasApi sparkaas-api v1.3.14
spawnerApiK8s privatecloud-spawner-api-k8s v3.13.5
usermgmt privatecloud-usermgmt v3.13.5
utilsApi privatecloud-utils-api v3.13.5
wmlBatchScoring wml-batch-scoring v3.13.2
wmlIngestion privatecloud-wml-ingestion v3.13.2

Resources Parameters

Default parameter values for the CPU and memory to use for each container, in the format <prefix>.<suffix>:

Prefix/Suffix resources.requests.cpu resources.limits.cpu resources.requests.memory resources.limits.memory
cloudantRepo 500m 1000m 1024Mi 2048Mi
dsxConnectionBack 500m 1000m 128Mi 256Mi
dsxCore 1000m 2000m 512Mi 1024Mi
filemgmt 500m 1000m 256Mi 512Mi
mlOnlineScoring 200m 500m 1024Mi 2048Mi
mlPipelinesApi 200m 500m 128Mi 256Mi
nginxRepo 500m 1000m 256Mi 512Mi
pipeline 200m 500m 1024Mi 2048Mi
portalMachineLearning 100m 300m 128Mi 512Mi
portalMlaas 100m 300m 256Mi 512Mi
redisRepo 500m 1000m 256Mi 512Mi
repository 100m 300m 512Mi 1024Mi
spark 500m 1000m 2048Mi 4096Mi
spawnerApiK8s 200m 500m 128Mi 256Mi
usermgmt 500m 1000m 256Mi 512Mi
wmlBatchScoring 100m 300m 512Mi 1024Mi
wmlIngestion 100m 300m 512Mi 1024Mi

Replicas Parameters

In addition to the number of Spark workers, some services offer the option to run several instances of the same service for high availability (HA). Adjust these values depending on your workload and the resources available:

Prefix/Suffix replicas Description
dsxConnectionBack 3 Connection to additional services
dsxCore 3 Main Webapp portal
mlPipelinesApi 2 Machine Learning Pipeline
nginxRepo 3 Main Proxy that gets exposed
portalMachineLearning 2 Projects Machine Learning portal
portalMlaas 2 Published Machine Learning portal
usermgmt 2 User management services

Deploy DSX Local on ICP

Complete the deployment. When the cloudant, redis, usermgmt, dsx-core, and ibm-nginx services are up and ready, you can access the DSX Local client UI at https://MASTER_NODE_IP:dsxservice.externalPort/.
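You can watch for those services to become ready by polling their rollouts from a configured kubectl client. A sketch; the namespace is illustrative, the deployment names are taken from the list above (confirm them with kubectl get deployments), and the script falls through with a note when no cluster is reachable:

```shell
# Poll the core DSX Local deployments until they are ready.
# NS is illustrative; confirm the deployment names in your cluster
# with: kubectl -n "$NS" get deployments
NS="${NS:-dsx}"
CORE_DEPLOYMENTS="cloudant redis usermgmt dsx-core ibm-nginx"

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    for d in $CORE_DEPLOYMENTS; do
        # Blocks until the deployment's rollout completes
        kubectl -n "$NS" rollout status deployment "$d"
    done
else
    echo "no cluster reachable; run this from a configured kubectl client"
fi
```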

Recommendation: Close the dialog and do not click See releases, because doing so resets the configuration to the default values when you exit that screen.

After the release is deployed, you can check its progress in the IBM Cloud Private App Center at Workloads > Deployments.

See IBM Data Science Experience Local for more documentation on how to use DSX Local.

Uninstall DSX Local from ICP

To uninstall DSX Local from ICP, complete the following steps:

  1. In the IBM Cloud Private App Center, go to Workloads > Helm Releases and delete the DSX Local Helm release. If you get an error, ignore it and refresh the page. Wait a few minutes for the components of DSX Local to be removed.
  2. Runtimes and jobs created after DSX was deployed are not deleted with the release, so remove them manually. From the Workloads page, delete the remaining deployments, jobs, and cron jobs in the namespace where DSX Local was installed.
  3. If you are using NFS storage, go to your NFS server and delete the contents of the directories that were used for the PVs.
  4. In some cases, PVs and PVCs might remain. Go to Platform > Storage and delete any remaining PVs and PVCs that were associated with DSX.
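The manual cleanup in steps 2 through 4 can also be scripted from a configured kubectl client. A sketch, assuming the namespace and the PV names from the storage setup section; the script falls through with a note when no cluster is reachable:

```shell
# Remove DSX Local leftovers from the CLI. NS is illustrative; the PV
# names match the ones created in the storage setup section.
NS="${NS:-dsx}"
PVS="cloudant-repo-pv redis-repo-pv spark-metrics-pv user-home-pv influxdb-pv"

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    # Deployments, jobs, and cron jobs left behind by runtimes (step 2)
    kubectl -n "$NS" delete deployments,jobs,cronjobs --all
    # Remaining claims and volumes (step 4)
    kubectl -n "$NS" delete pvc --all
    kubectl delete pv $PVS --ignore-not-found
else
    echo "no cluster reachable; run this from a configured kubectl client"
fi
```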