Upgrading Watson OpenScale
A project administrator can upgrade the Watson OpenScale service on IBM® Cloud Pak for Data.
Before you begin
Required role: To complete this task, you must be an administrator of the project (namespace) where Watson OpenScale is installed.
Before you upgrade Watson OpenScale, ensure that:
- The Cloud Pak for Data control plane is already upgraded on your Red Hat® OpenShift® cluster. For details, see Upgrading IBM Cloud Pak for Data.
- A cluster administrator has completed the steps in Preparing to upgrade Watson OpenScale.
- Watson OpenScale is backed up. For details, see Backing up and restoring Watson OpenScale.
- The cluster meets the minimum requirements for Watson OpenScale. For details, see System requirements for services.
- You completed the steps in Preparing to install and upgrade services.
If you are upgrading multiple services on your cluster, you must run the upgrades one at a time and wait until the upgrade completes before upgrading another service. You cannot run the upgrades in parallel.
Tip: For a complete list of the options that you can specify for the upgrade command, run:
  ./cpd-cli upgrade --help
Procedure
- Complete the appropriate steps to upgrade Watson OpenScale on your environment:
  - Upgrading on clusters connected to the internet
  - Upgrading on air-gapped clusters
- Verifying that the upgrade completed successfully
- Checking for available patches
- Upgrading existing service instances
- Complete the tasks listed in What to do next
Upgrading on clusters connected to the internet
From your installation node:
- Change to the directory where you placed the Cloud Pak for Data command-line interface and the repo.yaml file.
- Log in to your Red Hat OpenShift cluster as a project administrator:
  oc login OpenShift_URL:port
- Run the following command to see a preview of what will change when you upgrade the service.
  Important: If you are using the internal Red Hat OpenShift registry and you are using the default self-signed certificate, specify the --insecure-skip-tls-verify flag to prevent x509 errors.
  ./cpd-cli upgrade \
    --repo ./repo.yaml \
    --assembly aiopenscale \
    --arch Cluster_architecture \
    --namespace Project \
    --storageclass Storage_class_name \
    --transfer-image-to Registry_location \
    --cluster-pull-prefix Registry_from_cluster \
    --ask-pull-registry-credentials \
    --ask-push-registry-credentials \
    --latest-dependency \
    --dry-run
  Important: By default, this command gets the latest assembly. If you want to upgrade to a specific version of Watson OpenScale, add the following line to your command after the --assembly flag:
    --version Assembly_version \
  The --latest-dependency flag gets the latest version of the dependent assemblies. If you remove the --latest-dependency flag, the installer either leaves the dependent assemblies at the current version or gets the minimum version of the dependent assemblies.
  Ensure that you use the same flags that your cluster administrator used when they completed Preparing to upgrade Watson OpenScale. If your cluster administrator used the --version flag, ensure that you specify the same version of the assembly.
  Replace the following values:
  - Assembly_version: The version of Watson OpenScale that you want to install. The assembly versions are listed in System requirements for services.
  - Cluster_architecture: The architecture of your cluster hardware. For x86-64 hardware, remove this flag or specify x86_64. For POWER hardware, specify ppc64le.
  - Project: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  - Storage_class_name: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services. Refresh 2 or later: If you are using the 3.5.2 version of the cpd-cli, remove the --storageclass flag from your command. The cpd-cli upgrade command uses the storage class that was specified during installation.
  - Registry_location: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  - Registry_from_cluster: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
- Rerun the previous command without the --dry-run flag to upgrade the service.
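As an illustration, here is the dry-run command assembled with hypothetical sample values (the namespace, storage class, and registry addresses below are made up; substitute the values that your cluster administrator provided). The sketch echoes the command instead of executing it, so it can be reviewed without touching the cluster:

```shell
# Hypothetical sample values -- substitute the ones your cluster administrator provided.
NAMESPACE=zen
STORAGE_CLASS=nfs-client
REGISTRY_LOCATION=registry.example.com:5000/cpd
REGISTRY_FROM_CLUSTER=registry.example.com:5000/cpd

# Echo the assembled command for review (the --arch flag is omitted, as for x86-64 hardware).
echo "./cpd-cli upgrade \
  --repo ./repo.yaml \
  --assembly aiopenscale \
  --namespace $NAMESPACE \
  --storageclass $STORAGE_CLASS \
  --transfer-image-to $REGISTRY_LOCATION \
  --cluster-pull-prefix $REGISTRY_FROM_CLUSTER \
  --ask-pull-registry-credentials \
  --ask-push-registry-credentials \
  --latest-dependency \
  --dry-run"
```

Once the echoed command looks right for your environment, run it without the echo.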
Upgrading on air-gapped clusters
From your installation node:
- Change to the directory where you placed the Cloud Pak for Data command-line interface.
- Log in to your Red Hat OpenShift cluster as a project administrator:
  oc login OpenShift_URL:port
- Run the following command to see a preview of what will change when you upgrade the service.
  Important: If you are using the internal Red Hat OpenShift registry:
  - Do not specify the --ask-pull-registry-credentials parameter.
  - If you are using the default self-signed certificate, specify the --insecure-skip-tls-verify flag to prevent x509 errors.
  ./cpd-cli upgrade \
    --assembly aiopenscale \
    --arch Cluster_architecture \
    --namespace Project \
    --storageclass Storage_class_name \
    --cluster-pull-prefix Registry_from_cluster \
    --ask-pull-registry-credentials \
    --load-from Image_directory_location \
    --latest-dependency \
    --dry-run
  Note: If the assembly was downloaded using the delta-images command, remove the --latest-dependency flag from the command. If you don't remove the --latest-dependency flag, you will get an error indicating that the flag cannot be used.
  Ask your cluster administrator whether they specified the --latest-dependency flag when they completed Preparing to upgrade Watson OpenScale. If they ran the adm command with the --latest-dependency flag, you must also run the install command with the flag.
  Replace the following values:
  - Cluster_architecture: The architecture of your cluster hardware. For x86-64 hardware, remove this flag or specify x86_64. For POWER hardware, specify ppc64le.
  - Project: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  - Storage_class_name: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services. Refresh 2 or later: If you are using the 3.5.2 version of the cpd-cli, remove the --storageclass flag from your command. The cpd-cli upgrade command uses the storage class that was specified during installation.
  - Registry_from_cluster: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  - Image_directory_location: Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
- Rerun the previous command without the --dry-run flag to upgrade the service.
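For comparison with the connected case, here is the air-gapped dry-run command assembled with hypothetical sample values (all names below are made up; substitute the values that your cluster administrator provided). The sketch echoes the command instead of executing it:

```shell
# Hypothetical sample values -- substitute the ones your cluster administrator provided.
NAMESPACE=zen
STORAGE_CLASS=nfs-client
REGISTRY_FROM_CLUSTER=registry.example.com:5000/cpd
IMAGE_DIR=/ibm/aiopenscale-images   # hypothetical directory holding the downloaded images

# Echo the assembled command for review (the --arch flag is omitted, as for x86-64 hardware).
echo "./cpd-cli upgrade \
  --assembly aiopenscale \
  --namespace $NAMESPACE \
  --storageclass $STORAGE_CLASS \
  --cluster-pull-prefix $REGISTRY_FROM_CLUSTER \
  --ask-pull-registry-credentials \
  --load-from $IMAGE_DIR \
  --latest-dependency \
  --dry-run"
```

Remember to drop --ask-pull-registry-credentials if you use the internal Red Hat OpenShift registry, and --latest-dependency if the assembly was downloaded with the delta-images command.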
Verifying that the upgrade completed successfully
From your installation node:
- Run the following command:
  ./cpd-cli status \
    --assembly aiopenscale \
    --namespace Project
Replace Project with the value you used in the preceding commands.
- If the upgrade completed successfully, the status of the assembly and the modules in the assembly is Ready.
- If the upgrade failed, contact IBM Support for assistance.
Checking for available patches
Determine whether there are any patches available for the version of Watson OpenScale that you installed:
- Clusters connected to the internet
- Run the following command to check for patches:
  ./cpd-cli status \
    --repo ./repo.yaml \
    --namespace Project \
    --assembly aiopenscale \
    --patches \
    --available-updates
- Air-gapped clusters
- See the list of Available patches for Watson OpenScale.
If you need to apply patches to the service, follow the guidance in Applying patches.
Upgrading existing service instances
After you upgrade Watson OpenScale, the service instances that are associated with the installation must also be upgraded. This task must be completed by a Cloud Pak for Data administrator or a service instance administrator.
To upgrade service instances, you must have a cpd-cli profile on your local machine. Ensure that your profile points to the instance of Cloud Pak for Data where the service instances exist. Your profile enables Cloud Pak for Data to ensure that you have the appropriate permissions to upgrade the service instances. For details, see Creating a cpd-cli profile.
Tip: For a complete list of the options that you can specify for the service-instance upgrade command, run:
  ./cpd-cli service-instance upgrade --help
From your installation node:
- Run the following command to see the list of service instances:
  ./cpd-cli service-instance list \
    --profile Profile_name
Replace Profile_name with the name of your local profile.
The command returns a list of all of the service instances that you have access to.
- Run the following command to upgrade all of the service instances:
  ./cpd-cli service-instance upgrade \
    --service-type service-type \
    --profile Profile_name \
    --watch \
    --all
Replace Profile_name with the name of your local profile.
- Verify that the service instances were updated and are ready to use:
  - Run the following command to see a list of the service instances:
    ./cpd-cli service-instance list \
      --service-type service-type \
      --profile Profile_name
Replace Profile_name with the name of your local profile.
The command returns a list of all of the service-type service instances that you have access to.
  - Run the following command for each service instance to verify that the instance is ready to use:
    ./cpd-cli service-instance status Instance_name \
      --profile Profile_name
Replace the following values:
    - Instance_name: The name of the service instance for which you want to see the status.
    - Profile_name: The name of your local profile.
    Confirm that the service status is Running.
Verifying migration and mitigating exceptions
- Retrieve migration logs.
  - Use the ssh root@<SERVER> command to log in to the cluster infrastructure node. For example, enter the following command:
    ssh root@islapr36-inf.fyre.ibm.com
  - To log in to the namespace as the Kubernetes admin, run the following command:
    oc login https://<SERVER>:6443 -u kubeadmin -p <PASSWORD> --insecure-skip-tls-verify=true -n namespace1
  - Find the migration job pod:
    POD=`oc get pod -n namespace1 | grep aiopenscale-ibm-aios-post-upgrade-migration-job | awk '{print $1}'`
  - Get the pod log into the migration_logs.log file by running the following command:
oc logs -n namespace1 $POD > migration_logs.log
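The lookup and log-capture steps above can be combined with a guard for the case where the migration pod is not found. A stub oc function stands in for the real client here so the sketch runs without a cluster; the pod suffix and log line it prints are invented. Delete the stub to run the same pipeline against your namespace:

```shell
# Stub oc client so the sketch is runnable without a cluster -- remove to use the real `oc`.
oc() {
  case "$1" in
    get)  echo "aiopenscale-ibm-aios-post-upgrade-migration-job-x7k2p   0/1   Completed   0   5m" ;;
    logs) echo "Migration completed" ;;   # invented stub output
  esac
}

# Find the migration job pod, then capture its log, failing loudly if the pod is absent.
POD=$(oc get pod -n namespace1 | grep aiopenscale-ibm-aios-post-upgrade-migration-job | awk '{print $1}')
if [ -z "$POD" ]; then
  echo "migration job pod not found in namespace1" >&2
  exit 1
else
  oc logs -n namespace1 "$POD" > migration_logs.log
fi
```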
- Check migration logs for failures.
  - Open the migration_logs.log file and make sure that there are no exceptions or exception stack traces.
  - Look for a stack trace or message that indicates that migration failed for one or more subscriptions.
    For example, the following stack trace records an error:
    Traceback (most recent call last):
      File "/opt/ibm/migration/scripts/sql/versions/20191119_001_fairness_v2.py", line 614, in __get_fairness_metrics_for_dates
        metrics = __get_response(token, metrics_url)
      File "/opt/ibm/migration/scripts/sql/versions/20191119_001_fairness_v2.py", line 644, in __get_response
        response = json.loads(response.text)
  - Look for any message that starts with Failed or Exception.
  - Look for text that reads Rolling back for.
    Note: You can ignore any log message that starts with Failed to enable payload logging.
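The checks above can be scripted with grep. The sample log below is invented purely to show which lines the pipeline would flag; point the same pipeline at your real migration_logs.log file:

```shell
# Invented sample log lines, modeled on the messages described above.
cat > migration_logs.log <<'EOF'
Starting migration for subscription sub-1
Failed to enable payload logging for subscription sub-1
Exception while migrating metrics for subscription sub-2
Rolling back for subscription sub-3
Migration finished
EOF

# Flag failure indicators, ignoring the benign payload-logging message.
grep -E 'Failed|Exception|Rolling back for' migration_logs.log \
  | grep -v 'Failed to enable payload logging'
```

If this pipeline prints anything for your real log, the migration job had failures and you must rerun it as described in the next step.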
. - If there were intermittent failures, you must re-run the migration job.
  - Use the ssh root@<SERVER> command to log in to the cluster infrastructure node. For example, enter the following command:
    ssh root@islapr36-inf.fyre.ibm.com
  - To log in to the namespace as the Kubernetes admin, run the following command:
    oc login https://<SERVER>:6443 -u kubeadmin -p <PASSWORD> --insecure-skip-tls-verify=true -n namespace1
  - Find the aiopenscale-ibm-aios-post-upgrade-migration-job migration job and save it to a file by running the following command:
    kubectl get job aiopenscale-ibm-aios-post-upgrade-migration-job -o yaml > aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
  - Edit the aiopenscale-ibm-aios-post-upgrade-migration-job.yaml file to remove the following lines:
    - metadata.uid
    - metadata.resourceVersion
    - spec.selector
    - controller-uid
    - the lines that follow the status: line, including the status: line itself
  - Edit the aiopenscale-ibm-aios-post-upgrade-migration-job.yaml file to insert a hashtag (#) character at the beginning of the following lines in the job.yaml file. When you finish, the lines should look like the following examples:
    #kubectl -n namespace1 patch statefulsets aiopenscale-ibm-aios-kafka...
    #kubectl -n namespace1 patch statefulsets aiopenscale-ibm-aios-zookeeper...
- Delete the existing migration job by running the following command:
kubectl delete -f aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
  - Run the migration job again by running the following command:
    kubectl create -f aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
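As an alternative to editing the YAML by hand, the server-generated fields listed in the earlier step can be stripped programmatically. This sketch assumes the job was saved as JSON instead of YAML (kubectl ... -o json) and filters it with a short stdlib-Python script; the stand-in manifest written below is invented so the sketch runs without a cluster:

```shell
# Stand-in for: kubectl get job aiopenscale-ibm-aios-post-upgrade-migration-job -o json > job.json
cat > job.json <<'EOF'
{"apiVersion": "batch/v1", "kind": "Job",
 "metadata": {"name": "aiopenscale-ibm-aios-post-upgrade-migration-job",
              "uid": "1234-abcd", "resourceVersion": "99"},
 "spec": {"selector": {"matchLabels": {"controller-uid": "1234-abcd"}},
          "template": {"metadata": {"labels": {"controller-uid": "1234-abcd"}},
                       "spec": {"containers": [], "restartPolicy": "Never"}}},
 "status": {"succeeded": 1}}
EOF

# Drop the server-generated fields so the cleaned manifest can be re-created.
python3 - <<'EOF'
import json

with open("job.json") as f:
    job = json.load(f)

# The same fields the manual edit removes: status, metadata.uid,
# metadata.resourceVersion, spec.selector, and the controller-uid label.
job.pop("status", None)
job["metadata"].pop("uid", None)
job["metadata"].pop("resourceVersion", None)
job["spec"].pop("selector", None)
job["spec"]["template"]["metadata"]["labels"].pop("controller-uid", None)

with open("job-clean.json", "w") as f:
    json.dump(job, f, indent=2)
EOF
```

After producing job-clean.json from your real manifest, delete the old job and run kubectl create -f job-clean.json, as in the steps above.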
What to expect after upgrade
Because the upgrade from version 3.0.1 to 3.5.0 of Watson OpenScale represents a major overhaul of the service, keep in mind the following known issues and behaviors:
- Any model deployments from Watson™ Machine Learning are not migrated and are reported in a failed state. You must redeploy these models and add them to Watson OpenScale.
- When fairness metrics are migrated from Watson OpenScale version 3.0.1 to 3.5.0, you cannot view fairness metrics details for certain deployments. You can see the time series graph, but the drill-down metrics for a point in time might not show up when the feature column is of a numeric type and has fewer than 50 distinct values. For explainability, the service generates explanations for old and new scoring IDs for all non-Watson Machine Learning subscriptions, such as AWS, Azure Service, Azure Studio, and Custom machine learning providers.
- For SPSS® Modeler and Watson Machine Learning subscriptions, no explanations are generated for old scoring IDs because the ProbabilityVector field is missing a value. To address this issue, you must send a scoring request again.
- When the Watson Machine Learning service is upgraded from version 3.0.1 to 3.5, certain deployments might not be migrated. When you attempt to get explanations for failed deployments, you might see the following error: Error: Score request failed for deployment id =2b3fbea0-f1a8-4856-aac0-eed6fd05243c, server engine : watson_machine_learning with status =500, reason =The service is experiencing some downstream errors. You must re-create the subscription in the new service environment.
- When you edit the monitor configuration of a migrated pre-production deployment and attempt to run an evaluation on the deployment, you see the following error during fairness evaluation: An unexpected bias error occurred. Error in fetching manual_labeling records for data set *** of service instance 00000000-0000-0000-0000-000000000000. To fix this issue, you must re-create the subscription of this deployment.
- After you upgrade Watson OpenScale to a later version, the dashboard displays the original instance version. This is a known issue; the displayed number does not accurately reflect the installed version. Newly installed instances show the correct version number.
What to do next
- The service is ready to use. For details, see Validating and monitoring AI models with Watson OpenScale.