Upgrading Watson OpenScale

A project administrator can upgrade the Watson OpenScale service on IBM® Cloud Pak for Data.

Before you begin

Required role: To complete this task, you must be an administrator of the project (namespace) where Watson OpenScale is installed.

Before you upgrade Watson OpenScale, ensure that:

  • Common core services: Watson OpenScale requires the Cloud Pak for Data common core services, which are installed once in a given Red Hat OpenShift project. If the common core services are not installed in the project where you plan to install Watson OpenScale, or if they are not at the correct version, they are automatically installed or upgraded when you upgrade Watson OpenScale. If the common core services need to be installed or upgraded, the Watson OpenScale upgrade might take longer. For more information, see the documentation on the common core services.
  • If you are upgrading multiple services on your cluster, you must run the upgrades one at a time and wait until each upgrade completes before upgrading another service. You cannot run the upgrades in parallel.

Tip: For a list of all available options, enter the following command:
./cpd-cli upgrade --help

Procedure

  1. Complete the appropriate steps to upgrade Watson OpenScale on your environment:
  2. Verifying that the upgrade completed successfully
  3. Checking for available patches
  4. Upgrading existing service instances
  5. Complete the tasks listed in What to do next

Upgrading on clusters connected to the internet

From your installation node:

  1. Change to the directory where you placed the Cloud Pak for Data command-line interface and the repo.yaml file.
  2. Log in to your Red Hat OpenShift cluster as a project administrator:
    oc login OpenShift_URL:port
  3. Run the following command to see a preview of what will change when you upgrade the service.
    Important: If you are using the internal Red Hat OpenShift registry and you are using the default self-signed certificate, specify the --insecure-skip-tls-verify flag to prevent x509 errors.
    ./cpd-cli upgrade \
    --repo ./repo.yaml \
    --assembly aiopenscale \
    --arch Cluster_architecture \  
    --namespace Project \
    --storageclass Storage_class_name \
    --transfer-image-to Registry_location \
    --cluster-pull-prefix Registry_from_cluster \
    --ask-pull-registry-credentials \
    --ask-push-registry-credentials \
    --latest-dependency \
    --dry-run
    Important: By default, this command gets the latest assembly. If you want to upgrade to a specific version of Watson OpenScale, add the following line to your command after the --assembly flag:
    --version Assembly_version \

    The --latest-dependency flag gets the latest version of the dependent assemblies. If you remove the --latest-dependency flag, the installer will either leave the dependent assemblies at the current version or get the minimum version of the dependent assemblies.

    Ensure that you use the same flags that your cluster administrator used when they completed Preparing to upgrade Watson OpenScale. If your cluster administrator used the --version flag, ensure that you specify the same version of the assembly.

    Replace the following values:

    Assembly_version
      The version of Watson OpenScale that you want to upgrade to. The assembly versions are listed in System requirements for services.
    Cluster_architecture
      The architecture of your cluster hardware:
      • For x86-64 hardware, remove this flag or specify x86_64.
      • For POWER hardware, specify ppc64le.
    Project
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
    Storage_class_name
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
      Refresh 2 or later: If you are using the 3.5.2 version of the cpd-cli, remove the --storageclass flag from your command. The cpd-cli upgrade command uses the storage class that was specified during installation.
    Registry_location
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
    Registry_from_cluster
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  4. Rerun the previous command without the --dry-run flag to upgrade the service.
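For illustration only, this is what the final command (with the --dry-run flag removed) might look like on an x86-64 cluster that uses the internal OpenShift registry with its default self-signed certificate, so the --arch flag is omitted and --insecure-skip-tls-verify is added. The project name, storage class, and registry values shown here are hypothetical; use the values provided by your cluster administrator:

```shell
./cpd-cli upgrade \
--repo ./repo.yaml \
--assembly aiopenscale \
--namespace zen \
--storageclass managed-nfs-storage \
--transfer-image-to image-registry.openshift-image-registry.svc:5000/zen \
--cluster-pull-prefix image-registry.openshift-image-registry.svc:5000/zen \
--ask-pull-registry-credentials \
--ask-push-registry-credentials \
--latest-dependency \
--insecure-skip-tls-verify
```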

Upgrading on air-gapped clusters

From your installation node:

  1. Change to the directory where you placed the Cloud Pak for Data command-line interface.
  2. Log in to your Red Hat OpenShift cluster as a project administrator:
    oc login OpenShift_URL:port
  3. Run the following command to see a preview of what will change when you upgrade the service.
    Important: If you are using the internal Red Hat OpenShift registry:
    • Do not specify the --ask-pull-registry-credentials parameter.
    • If you are using the default self-signed certificate, specify the --insecure-skip-tls-verify flag to prevent x509 errors.
    ./cpd-cli upgrade \
    --assembly aiopenscale \
    --arch Cluster_architecture \
    --namespace Project \
    --storageclass Storage_class_name \
    --cluster-pull-prefix Registry_from_cluster \
    --ask-pull-registry-credentials \
    --load-from Image_directory_location \
    --latest-dependency \
    --dry-run
    Note: If the assembly was downloaded using the delta-images command, remove the --latest-dependency flag from the command. If you don't remove the --latest-dependency flag, you get an error that indicates the flag cannot be used.

    Ask your cluster administrator whether they specified the --latest-dependency flag when they completed Preparing to upgrade Watson OpenScale. If they ran the adm command with the --latest-dependency flag, you must also run the install command with the flag.

    Replace the following values:

    Cluster_architecture
      The architecture of your cluster hardware:
      • For x86-64 hardware, remove this flag or specify x86_64.
      • For POWER hardware, specify ppc64le.
    Project
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
    Storage_class_name
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
      Refresh 2 or later: If you are using the 3.5.2 version of the cpd-cli, remove the --storageclass flag from your command. The cpd-cli upgrade command uses the storage class that was specified during installation.
    Registry_from_cluster
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
    Image_directory_location
      Use the value provided by your cluster administrator. You should have obtained this information when you completed Preparing to install and upgrade services.
  4. Rerun the previous command without the --dry-run flag to upgrade the service.

Verifying that the upgrade completed successfully

From your installation node:

  1. Run the following command:
    ./cpd-cli status \
    --assembly aiopenscale \
    --namespace Project

    Replace Project with the value you used in the preceding commands.

    • If the upgrade completed successfully, the status of the assembly and the modules in the assembly is Ready.
    • If the upgrade failed, contact IBM Support for assistance.
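As a sanity check, the status output can be scanned for any component that is not Ready. The following sketch uses an invented sample of the output (the real cpd-cli status format may differ), so treat it as a pattern rather than an exact parser:

```shell
# Hypothetical sample of `cpd-cli status` output; the real format may differ.
status_output='Assembly: aiopenscale    Status: Ready
Module: aios-base        Status: Ready
Module: aios-dashboard   Status: Ready'

# Count status lines that do not read "Ready"; zero means the upgrade succeeded.
not_ready=$(printf '%s\n' "$status_output" \
  | grep 'Status:' | grep -vc 'Status: Ready' || true)

if [ "$not_ready" -eq 0 ]; then
  echo "upgrade OK"
else
  echo "upgrade FAILED: $not_ready component(s) not Ready"
fi
```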

Checking for available patches

Determine whether there are any patches available for the version of Watson OpenScale that you installed:

Clusters connected to the internet
Run the following command to check for patches:
./cpd-cli status \
--repo ./repo.yaml \
--namespace Project \ 
--assembly aiopenscale \
--patches \
--available-updates 
Air-gapped clusters
See the list of Available patches for Watson OpenScale.

If you need to apply patches to the service, follow the guidance in Applying patches.

Upgrading existing service instances

After you upgrade Watson OpenScale, the service instances that are associated with the installation must also be upgraded. This task must be completed by a Cloud Pak for Data administrator or a service instance administrator.

To upgrade service instances, you must have a cpd-cli profile on your local machine. Ensure that your profile points to the instance of Cloud Pak for Data where the service instances exist. Your profile enables Cloud Pak for Data to ensure that you have the appropriate permissions to upgrade the service instances. For details, see Creating a cpd-cli profile.

Tip: For a list of all available options, enter the following command:
./cpd-cli service-instance upgrade --help

From your installation node:

  1. Run the following command to see the list of service instances:
    ./cpd-cli service-instance list \
    --profile Profile_name

    Replace Profile_name with the name of your local profile.

    The command returns a list of all of the service instances that you have access to.

  2. Run the following command to upgrade all of the service instances:
    ./cpd-cli service-instance upgrade \
    --service-type service-type \
    --profile Profile_name \
    --watch \
    --all

    Replace Profile_name with the name of your local profile.

  3. Verify that the service instances were updated and are ready to use:
    1. Run the following command to see a list of the service instances:
      ./cpd-cli service-instance list \
      --service-type service-type \
      --profile Profile_name

      Replace Profile_name with the name of your local profile.

      The command returns a list of all of the service-type service instances that you have access to.

    2. Run the following command for each service instance to verify that the instance is ready to use:
      ./cpd-cli service-instance status Instance_name \
      --profile Profile_name

      Replace the following values:

      Variable Replace with
      Instance_name The name of the service instance for which you want to see the status.
      Profile_name The name of your local profile.

      Confirm that the service status is Running.
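The per-instance status check in step 3 can be looped over every instance. In this sketch, mock_instance_status and the instance names are hypothetical stand-ins for ./cpd-cli service-instance status so the logic is self-contained; substitute the real command and your own instance names in practice:

```shell
# Hypothetical stand-in for `./cpd-cli service-instance status <name> --profile <profile>`.
mock_instance_status() {
  echo "Status: Running"
}

all_running=true
for inst in openscale-instance-1 openscale-instance-2; do  # invented instance names
  status=$(mock_instance_status "$inst")
  case "$status" in
    *Running*) echo "$inst: ready" ;;
    *)         echo "$inst: NOT ready ($status)"; all_running=false ;;
  esac
done
echo "all running: $all_running"
```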

Verifying migration and mitigating exceptions

After you upgrade the service, you must review the migration job log to verify that all subscriptions are migrated. If intermittent failures prevented some subscriptions from migrating, you must re-run the migration job to complete the migration of those subscriptions.
Note: Historical debias metrics are not migrated as part of an upgrade. If you need debias metrics to be migrated, you must contact IBM support for help.
  1. Retrieve migration logs.
    1. Use the ssh root@<SERVER> command to log in to the cluster infrastructure node. For example, enter the following command: ssh root@islapr36-inf.fyre.ibm.com
    2. To log in to the namespace as the Kubernetes admin, run the following command:
      oc login https://<SERVER>:6443 -u kubeadmin -p <PASSWORD> --insecure-skip-tls-verify=true -n namespace1
    3. Find the migration job pod:
      POD=$(oc get pod -n namespace1 | grep aiopenscale-ibm-aios-post-upgrade-migration-job | awk '{print $1}')
    4. Save the pod log to the migration_logs.log file by running the following command:
      oc logs -n namespace1 $POD > migration_logs.log
  2. Check migration logs for failures.
    1. Open the migration_logs.log file and make sure that there are no exceptions or exception stack traces.
    2. Look for a stack trace or message that indicates migration failed for one or more subscriptions. For example, the following stack trace records an error:
      Traceback (most recent call last):
        File "/opt/ibm/migration/scripts/sql/versions/20191119_001_fairness_v2.py", line 614, in __get_fairness_metrics_for_dates
          metrics = __get_response(token, metrics_url)
        File "/opt/ibm/migration/scripts/sql/versions/20191119_001_fairness_v2.py", line 644, in __get_response
          response = json.loads(response.text)
    3. Look for any message that starts with Failed or Exception.
    4. Look for text that reads Rolling back for.
    Note: You can ignore any log message that starts with Failed to enable payload logging.
  3. If there were intermittent failures, you must re-run the migration job.
    1. Use the ssh root@<SERVER> command to log in to the cluster infrastructure node. For example, enter the following command: ssh root@islapr36-inf.fyre.ibm.com
    2. To log in to the namespace as the Kubernetes admin, run the following command:
      oc login https://<SERVER>:6443 -u kubeadmin -p <PASSWORD> --insecure-skip-tls-verify=true -n namespace1
    3. Save the definition of the migration job aiopenscale-ibm-aios-post-upgrade-migration-job to a YAML file by running the following command:
      kubectl get job aiopenscale-ibm-aios-post-upgrade-migration-job -o yaml > aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
    4. Edit the aiopenscale-ibm-aios-post-upgrade-migration-job.yaml file to remove the following lines:
      • metadata.uid
      • metadata.resourceVersion
      • spec.selector
      • controller-uid
      • the status: line and all lines that follow it
    5. Edit the aiopenscale-ibm-aios-post-upgrade-migration-job.yaml file to prepend a hash character (#) to the following lines, which are in the spec > template > spec > command section of the file. When you finish, the lines should look like the following examples:
      • #kubectl -n namespace1 patch statefulsets aiopenscale-ibm-aios-kafka...
      • #kubectl -n namespace1 patch statefulsets aiopenscale-ibm-aios-zookeeper...
    6. Delete the existing migration job by running the following command:
      kubectl delete -f aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
    7. Run the migration job again by running the following command:
      kubectl create -f aiopenscale-ibm-aios-post-upgrade-migration-job.yaml
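The log checks in step 2 can be scripted. This is a minimal sketch; the sample log content below is invented for illustration, and the patterns mirror the guidance above (flag lines that start with Failed or Exception and lines that contain Rolling back for, while ignoring the benign payload-logging message):

```shell
# Invented sample log content, written to a file purely to demonstrate the checks.
cat > migration_logs.log <<'EOF'
Migrating subscription sub-001 ... done
Failed to enable payload logging for subscription sub-002
Exception while migrating subscription sub-003
Rolling back for subscription sub-003
EOF

# Flag failures, ignoring the benign payload-logging message noted above.
problems=$(grep -E '^(Failed|Exception)|Rolling back for' migration_logs.log \
  | grep -v '^Failed to enable payload logging' || true)

if [ -n "$problems" ]; then
  echo "Possible migration failures found; consider re-running the migration job:"
  printf '%s\n' "$problems"
else
  echo "No migration failures detected."
fi
```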

What to expect after upgrade

Because the upgrade from version 3.0.1 to 3.5.0 of Watson OpenScale represents a major overhaul of the service, keep in mind the following known issues and behaviors:

  • Any model deployments from Watson™ Machine Learning are not migrated and are reported in a failed state. You must redeploy these models and add them to Watson OpenScale.
  • When fairness metrics are migrated from Watson OpenScale version 3.0.1 to 3.5.0, you cannot view fairness metrics details for certain deployments. You can see the time series graph, but the drill-down metrics for a point in time might not appear in cases where the feature column is numeric and has fewer than 50 distinct values. For explainability, explanations are generated for old and new scoring IDs for all non-Watson Machine Learning subscriptions, such as AWS, Azure Service, Azure Studio, and custom machine learning providers.
  • For SPSS® Modeler and Watson Machine Learning subscriptions, no explanations are generated for old scoring IDs because the ProbabilityVector field value is missing. To address this issue, you must send a scoring request again.
  • When the Watson Machine Learning service is upgraded from version 3.0.1 to 3.5, certain deployments might not be migrated. When you attempt to get explanations for failed deployments, you might see the following error: Error: Score request failed for deployment id =2b3fbea0-f1a8-4856-aac0-eed6fd05243c, server engine : watson_machine_learning with status =500, reason =The service is experiencing some downstream errors. You must re-create the subscription in the new service environment.
  • When you edit the monitor configuration of a migrated pre-production deployment and attempt to run evaluation on the deployment, you see the following error during fairness evaluation: An unexpected bias error occurred. Error in fetching manual_labeling records for data set *** of service instance 00000000-0000-0000-0000-000000000000. To fix this issue, you must re-create the subscription of this deployment.
  • After you upgrade Watson OpenScale to a later version, the dashboard displays the original instance version. This is a known issue; the displayed number does not accurately reflect the installed version. Newly installed instances reflect the correct version number.

What to do next