IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
Post-installation tasks for the Watson Studio service
To finish setting up the Watson Studio service after installation, complete the appropriate tasks.
Optional tasks
You can perform the following optional tasks to enhance Watson Studio. You must have the appropriate permissions on the OpenShift cluster.
| Task | User role |
|---|---|
| Set the scaling for the service | Project administrator |
| Set a new limit for the number of projects each user can create | Project administrator |
| Set the time zone for the master node | System administrator |
| Enable Visual Studio Code support | Project and System administrator |
| Install pre-trained NLP models for Python-based notebook runtimes | Project administrator |
| Use Livy to connect to a Spark cluster | System administrator |
To set a new limit for the number of projects each user can create
By default, each user can create up to 200 projects. You can increase or decrease this limit by specifying a new limit value, as a number or a string, in the Common Core Services (ccs) custom resource. Replace <project_limit> with the new limit value, and run:
oc patch ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS} --type merge --patch '{"spec":{"projects_created_per_user_limit": <project_limit>}}'
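For example, to raise the limit to 500 projects per user (an illustrative value), you might run:
oc patch ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS} --type merge --patch '{"spec":{"projects_created_per_user_limit": 500}}'
You can then read the value back to confirm that the patch was applied:
oc get ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.projects_created_per_user_limit}'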
To set the time zone after installing your service
If the service is installed from a remote machine that runs in a different time zone than the master node, the time zone of the master node is overwritten by the time zone of the installer node. This discrepancy causes scheduled jobs to run at the wrong time.
- Edit the timezone configmap and change the time zone string to the cluster time zone.
- Modify `data.masterTimezone` in the configmap: run the command `oc edit configmap timezone` and set the value to the tz database code that corresponds to the master node time zone. Note: If you are using Red Hat OpenShift Container Platform 4.x, use the Coordinated Universal Time (UTC) time zone as the value of `data.masterTimezone` in the configmap.
- If there is a pre-existing schedule, go to the Job details page in the UI and edit the schedule once to pick up the updated time zone.
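If you prefer a non-interactive change, a patch along the following lines can set the value. This is a sketch that assumes the configmap is named timezone and is located in your instance project, and it uses UTC as recommended in the note above for OpenShift Container Platform 4.x:
oc patch configmap timezone -n ${PROJECT_CPD_INST_OPERANDS} --type merge --patch '{"data":{"masterTimezone":"UTC"}}'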
To enable Visual Studio Code support
VS Code support is not enabled by default because it has additional storage requirements. You can enable VS Code support by patching the ws custom resource:
oc patch -n ${PROJECT_CPD_INST_OPERANDS} ws/ws-cr --type=merge --patch '{"spec":{"tools":{"enable_vscode":true,"storage_size":"5Gi"}}}'
To check if patching is finished, run this command:
cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
--components=ws
It takes approximately 5 to 10 minutes for all required changes to be applied.
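As an additional check, you can read the tools settings back from the ws custom resource. The following example assumes the fields that are shown in the patch above:
oc get ws ws-cr -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.tools}'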
Updating existing custom runtime definitions (earlier than 4.7)
If you are using custom images or a custom runtime definition for JupyterLab, update all manually created runtime definitions by adding the following entries to the `volumes` field:
{
  "volume": "tools-data",
  "mountPath": "/tools-data",
  "claimName": "tools-data-pvc",
  "subPath": "users/$user_id",
  "optional": true
},
{
  "volume": "axshelld",
  "type": "secret",
  "mountPath": "/etc/cp4d/keys",
  "secret": {
    "defaultMode": 420,
    "secretName": "ax-shelld-secret"
  },
  "optional": true
}
Update the runtime definitions using the Cloud Pak for Data API:
- To generate the required platform access token, see Generating an API authorization token.
- List the IDs and names of all available runtime definitions on the cluster:
curl -k -X GET -H "Authorization: ZenApiKey ${TOKEN}" "$cpd_url/v2/runtime_definitions" | jq -r '.resources[] | .metadata.guid + " " + .entity.name'
- Find the ID and name of each runtime definition that needs to be changed, then patch the content of the runtime definition and store it as a JSON file:
myRuntimeDefinition=runtime-22.1-py3.9
myRuntimeDefinitionID=0d76d114-c66d-5216-8442-2461b84af0da
curl -k -X GET -H "Authorization: ZenApiKey ${TOKEN}" "$cpd_url/v2/runtime_definitions/$myRuntimeDefinitionID?include=launch_configuration" | jq '.entity.launch_configuration.volumes |= . + [{"volume": "tools-data","mountPath": "/tools-data","claimName": "tools-data-pvc","subPath": "users/$user_id","optional": true}, {"volume": "axshelld","type": "secret","mountPath": "/etc/cp4d/keys","secret": {"defaultMode": 420,"secretName": "ax-shelld-secret"},"optional": true}]' > $myRuntimeDefinition.json
- Update the runtime definition by using the Cloud Pak for Data API:
curl -k -X PUT -H "Authorization: ZenApiKey ${TOKEN}" -H "Content-Type:application/json" "$cpd_url/v2/runtime_definitions/$myRuntimeDefinitionID" -d @./$myRuntimeDefinition.json
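To confirm that the update was applied, you can retrieve the runtime definition again and inspect its volumes, for example:
curl -k -X GET -H "Authorization: ZenApiKey ${TOKEN}" "$cpd_url/v2/runtime_definitions/$myRuntimeDefinitionID?include=launch_configuration" | jq '.entity.launch_configuration.volumes'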
Installing pre-trained NLP models for Python-based notebook runtimes
Runtime 23.1 on Python 3.10 is installed by default with Watson Studio on Cloud Pak for Data 4.8.
You can optionally install the pre-trained NLP models for the Watson Natural Language Processing library by running the following command:
oc patch -n ${PROJECT_CPD_INST_OPERANDS} NotebookRuntime ibm-cpd-ws-runtime-231-py --type=merge --patch '{"spec":{"install_nlp_models":true}}'
Pre-trained NLP models are also supported with these optional runtimes:
- ibm-cpd-ws-runtime-231-pygpu
- ibm-cpd-ws-runtime-222-py
- ibm-cpd-ws-runtime-222-pygpu
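For example, to install the models for one of these optional runtimes, you can apply the same patch to its NotebookRuntime resource (shown here for the Runtime 22.2 on Python runtime):
oc patch -n ${PROJECT_CPD_INST_OPERANDS} NotebookRuntime ibm-cpd-ws-runtime-222-py --type=merge --patch '{"spec":{"install_nlp_models":true}}'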
The pre-trained NLP models are large and take time to install. Use the following command to check the status of the notebook runtimes:
oc get -n ${PROJECT_CPD_INST_OPERANDS} NotebookRuntime
The pre-trained NLP models are available only when the status column for the notebook runtimes changes to Completed.
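Because the installation can take a while, you can also watch the runtimes until their status changes, for example:
oc get -n ${PROJECT_CPD_INST_OPERANDS} NotebookRuntime -w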
Using Livy to connect to a Spark cluster
If you need to use Livy to connect to a Spark cluster that is FIPS-enabled, you must load the digest package before you load the sparklyr package. To load the packages, run the following commands:
library(digest, lib.loc='/opt/not-FIPS-compliant/R/library')
library(sparklyr)
Parent topic: IBM Watson Studio