Online upgrade of IBM Cloud Pak for Watson AIOps (CLI method)

Use these instructions to upgrade IBM Cloud Pak® for Watson AIOps 3.7.0 or later to 4.1.2.

This procedure can be used on an online deployment of IBM Cloud Pak for Watson AIOps 3.7.0 or later, and can still be used if the deployment has had hotfixes applied. If you have an offline deployment, follow the instructions in Upgrading IBM Cloud Pak for Watson AIOps (offline).

If you are upgrading Event Manager, then you must use the topic Upgrading and rolling back in the appropriate version of the IBM® Netcool® Operations Insight® documentation.

Before you begin

Warnings:

  • Custom patches, labels, and manual adjustments to IBM Cloud Pak for Watson AIOps resources are lost when IBM Cloud Pak for Watson AIOps is upgraded, and must be manually reapplied after upgrade. For more information, see Manual adjustments are not persisted.
  • If you previously increased the size of the Kafka PVC directly, then you must follow the correct procedure that is supplied in Increasing the Kafka PVC to ensure that the size is updated by the operator. Failure to do so before upgrading IBM Cloud Pak for Watson AIOps causes the operator to attempt to restore a lower default value for the Kafka PVC, and causes an error in your IBM Cloud Pak for Watson AIOps deployment.

Restrictions:

  • You cannot use these instructions to upgrade deployments of IBM Cloud Pak for Watson AIOps 3.6.2 or earlier. For more information, see Upgrade paths.
  • The upgrade cannot be removed or rolled back.
  • If you are planning to upgrade to Red Hat OpenShift Container Platform 4.12 as part of an upgrade to IBM Cloud Pak for Watson AIOps 4.1.2, you must complete the IBM Cloud Pak for Watson AIOps upgrade before you upgrade to Red Hat OpenShift Container Platform 4.12.

Upgrade procedure

Follow these steps to upgrade your online IBM Cloud Pak for Watson AIOps deployment.

  1. Ensure cluster readiness
  2. Configure automatic catalog polling
  3. Update foundational services
  4. Update the operator subscription
  5. Verify the deployment
  6. Post upgrade actions

1. Ensure cluster readiness

Recommended: Take a backup before upgrading. For more information, see Backup and restore.

  1. Ensure that your cluster still meets all of the prerequisites for deployment. For more information, see Planning.

    If you are upgrading from IBM Cloud Pak for Watson AIOps 4.1.0 or higher, you can skip this step because you already completed it. From IBM Cloud Pak for Watson AIOps 4.1.0, the storage requirements for Kafka increased to 300 GB (three persistent volumes (PVs) of 100 GB each) for production deployments, and to 60 GB for starter deployments. Your PVs are already configured with volume expansion enabled, as stated in the Storage class requirements, but you must ensure that there is adequate space for the Kafka PVs to expand before you start the upgrade.

    Note: IBM Cloud Pak for Watson AIOps requires Red Hat OpenShift Container Platform version 4.10.46 or higher.

  2. Run the IBM Cloud Pak for Watson AIOps prerequisite checker script.

    The prerequisite checker script ensures that your Red Hat OpenShift Container Platform cluster is correctly set up for an IBM Cloud Pak for Watson AIOps upgrade. When you run the prerequisite checker script, you must run the script in the same project (namespace) that IBM Cloud Pak for Watson AIOps is installed in.

    For more information about the script, including how to download and run it, see github.com/IBM.

    Important: The prerequisite checker script might show inadequate resources in the Resource Summary because the script does not account for resources already being in use by the upgrading deployment. This can be ignored, as can the following message: [ FAIL ] Small or Large Profile Install Resources.

    Example output:

    # ./prereq.sh 
    Using project "cp4waiops" on server "https://myserver.mycluster.mydomain:6443".
    
    Starting IBM Cloud Pak for Watson AIOps AI Manager prerequisite checker v4.1...
    
    [INFO] =================================Openshift Container Platform Version Check=================================
    [INFO] Checking OCP Version. Compatible Versions of OCP are v4.10.46+ and v4.12.x
    [INFO] OCP version 4.12.18 is compaitble
    [INFO] =================================Openshift Container Platform Version Check=================================
    
    [INFO] =================================Entitlement Pull Secret=================================
    [INFO] Checking whether the Entitlement secret or Global pull secret is configured correctly.
    [INFO] Checking if the job 'cp4waiops-entitlement-key-test-job' already exists.
    [INFO] The job with name 'cp4waiops-entitlement-key-test-job' was not found, so moving ahead and creating it.
    [INFO] Creating the job 'cp4waiops-entitlement-key-test-job'
    job.batch/cp4waiops-entitlement-key-test-job created
    [INFO] Verifying if the job 'cp4waiops-entitlement-key-test-job' completed successfully..
    [INFO] SUCCESS! Entitlement secret is configured correctly.
    job.batch "cp4waiops-entitlement-key-test-job" deleted
    [INFO] =================================Entitlement Pull Secret=================================
    
    [INFO] =================================Storage Provider=================================
    [INFO] Checking storage providers
    [INFO] No IBM Storage Fusion Found... Skipping configuration check.
    
    [INFO] No Portworx StorageClusters found with "Running" or "Online" status. Skipping configuration check for Portworx.
    [INFO] Openshift Data Foundation found.
    [INFO] No IBM Cloud Storage found... Skipping configuration check for IBM Cloud Storage Check.
    
    Checking Openshift Data Foundation Configuration...
    Verifying if Red Hat Openshift Data Foundation pods are in "Running" or "Completed" status
    [INFO] Pods in openshift-storage project are "Running" or "Completed"
    [INFO] ocs-storagecluster-ceph-rbd exists.
    [INFO] ocs-storagecluster-cephfs exists.
    [INFO] No warnings or failures found when checking for Storage Providers.
    [INFO] =================================Storage Provider=================================
    
    [INFO] =================================Small or Large Profile Install Resources=================================
    [INFO] Checking for cluster resources
    
    [INFO] ==================================Resource Summary=====================================================
    [INFO]                                       Nodes   |      vCPU      |     Memory(GB)
    [INFO] Small profile(available/required)  [  14 / 3 ]   [  255 / 62 ]       [  479 / 140 ]
    [INFO] Large profile(available/required)  [  14 / 10 ]   [  255 / 162 ]       [  479 / 360 ]
    [INFO] ==================================Resource Summary=====================================================
    [INFO] Cluster currently has resources available to create a large profile of Cloud Pak for Watson AIOps AI Manager
    [INFO] =================================Small or Large Profile Install Resources=================================
    
    
    [INFO] =================================Prerequisite Checker Tool Summary=================================
          [  PASS  ] Openshift Container Platform Version Check
          [  PASS  ] Entitlement Pull Secret
          [  PASS  ] Storage Provider
          [  PASS  ] Small or Large Profile Install Resources
    [INFO] =================================Prerequisite Checker Tool Summary=================================
    

  3. Delete any evicted connector-orchestrator pods.

    1. Run the following command to check if there are any evicted connector-orchestrator pods.

      oc get pods -n <namespace> | grep connector-orchestrator
      

      Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.

    2. Clean up any evicted connector-orchestrator pods.

      If the previous command returned any pods with a STATUS of Evicted, then run the following command to delete each of them.

      oc delete pod -n <namespace> <connector_orchestrator>
      

      Where

      • <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.
      • <connector_orchestrator> is a pod returned in the previous step.
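
The check and delete steps for evicted connector-orchestrator pods can be combined in a small script. The following is a minimal sketch rather than a supplied tool: the function name evicted_connector_pods, the NAMESPACE variable, and the sample pod names are illustrative assumptions, and the filter assumes the default `oc get pods` column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Read `oc get pods` table output on stdin and print the names of
# connector-orchestrator pods whose STATUS column is "Evicted".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
evicted_connector_pods() {
  awk '$1 ~ /connector-orchestrator/ && $3 == "Evicted" { print $1 }'
}

# Example with captured sample output instead of a live cluster.
printf '%s\n' \
  'NAME                              READY  STATUS   RESTARTS  AGE' \
  'connector-orchestrator-7d9f-abc   0/1    Evicted  0         2d' \
  'connector-orchestrator-7d9f-def   1/1    Running  0         2d' |
  evicted_connector_pods
# prints: connector-orchestrator-7d9f-abc

# On a live cluster (NAMESPACE is an assumed variable), the same filter
# could feed the delete command:
#   oc get pods -n "$NAMESPACE" | evicted_connector_pods |
#     while read -r pod; do oc delete pod -n "$NAMESPACE" "$pod"; done
```

Filtering on the STATUS column instead of a bare grep for the word Evicted avoids matching a pod that merely has that string in its name.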

2. Configure automatic catalog polling

Ensure that your catalog is set to automatically poll for the latest images.

Your ibm-operator-catalog CatalogSource object can be configured to automatically poll for the latest catalog version, and to retrieve it if one is available. Polling for updates is enabled by configuring the polling attribute, spec.updateStrategy.registryPoll.

You might have already elected to automatically accept updates by adding the polling attribute to your ibm-operator-catalog YAML when you installed IBM Cloud Pak for Watson AIOps, installed an IBM Cloud Pak for Watson AIOps hotfix from IBM support, or when you installed another IBM Cloud Pak®.

Use the following steps to check whether you already have a polling attribute set, and to configure it if you do not.

Note: ibm-operator-catalog also contains the catalogs for other IBM Cloud Paks. If you have multiple IBM Cloud Paks installed on your cluster and you enable the polling attribute, then automatic update is configured for all of them.

  1. Run the following command to view and edit your ibm-operator-catalog CatalogSource instance.

    oc edit catalogsource ibm-operator-catalog -n openshift-marketplace
    
  2. If there is not a spec.updateStrategy section, or spec.image is not set to icr.io/cpopen/ibm-operator-catalog:latest, then update the YAML to have the following contents, and save it.

    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: ibm-operator-catalog
      namespace: openshift-marketplace
    spec:
      displayName: ibm-operator-catalog
      publisher: IBM Content
      sourceType: grpc
      image: icr.io/cpopen/ibm-operator-catalog:latest
      updateStrategy:
        registryPoll:
          interval: 45m
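
The decision in step 2 can also be scripted. The following is a minimal sketch under stated assumptions: the function name catalog_needs_update is hypothetical, and an empty registryPoll interval is used as an approximation of "there is not a spec.updateStrategy section".

```shell
#!/bin/sh
EXPECTED_IMAGE='icr.io/cpopen/ibm-operator-catalog:latest'

# Succeeds (exit 0) when the CatalogSource needs editing: the image is not
# the expected one, or no registryPoll interval is set.
# $1: value of spec.image
# $2: value of spec.updateStrategy.registryPoll.interval (may be empty)
catalog_needs_update() {
  [ "$1" != "$EXPECTED_IMAGE" ] || [ -z "$2" ]
}

# On a live cluster the two values could be read with jsonpath, for example:
#   image=$(oc get catalogsource ibm-operator-catalog \
#     -n openshift-marketplace -o jsonpath='{.spec.image}')
#   interval=$(oc get catalogsource ibm-operator-catalog \
#     -n openshift-marketplace \
#     -o jsonpath='{.spec.updateStrategy.registryPoll.interval}')

# Example with sample values: correct image, but polling is not configured.
if catalog_needs_update "$EXPECTED_IMAGE" ""; then
  echo "polling not configured: edit the CatalogSource"
fi
```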
    

3. Update foundational services

IBM Cloud Pak® foundational services, which is part of your IBM Cloud Pak for Watson AIOps deployment, must be at version 3.23 or higher before you upgrade IBM Cloud Pak for Watson AIOps.

Use the following steps to verify that your ibm-common-service-operator subscription is set to version 3.23 or higher, and to set it to a qualifying version if it is not.

  1. Run the following command to find out what version of foundational services you have installed.

    oc get csv -A | grep ibm-common-service-operator
    

    If the version returned is v3.23 or higher, then you do not need to update foundational services. Skip the rest of this section and proceed to section 4, Update the operator subscription.

  2. Download the Common Services upgrade script, upgrade_common_services.sh, from github.com/IBM.

  3. Run the following command from the directory that you downloaded the Common Services upgrade script to. This script must be run by a user with cluster-admin privilege.

    ./cp4waiops-samples/upgrade/upgrade_common_services.sh -a -c v3.23
    

    Important: You must only run this script if your version of foundational services is less than v3.23.

  4. When upgrade_common_services.sh completes, run the following commands to verify that the ibm-common-service-operator channel is set to version 3.23 or higher in the subscription and in the ClusterServiceVersion (CSV) before you continue.

    1. Check the subscription.

      oc get subscription ibm-common-service-operator -n ibm-common-services -o jsonpath='{.spec.channel}'
      

      Example output:

      oc get subscription ibm-common-service-operator -n ibm-common-services -o jsonpath='{.spec.channel}'
      'v3.23'
      

    2. Check the CSV.

      oc get csv -A | grep ibm-common-service-operator
      

      Example output:

      oc get csv -A | grep ibm-common-service-operator
      ibm-common-services   ibm-common-service-operator.v3.23.7   IBM Cloud Pak foundational services   3.23.7   Succeeded
      cp4waiops             ibm-common-service-operator.v3.23.7   IBM Cloud Pak foundational services   3.23.7   Succeeded
      

  5. The foundational services upgrade commences, and takes approximately 30-60 minutes.

    You can run the following command to check the status of ZenService. When the foundational services upgrade is complete, the output shows a STATUS of Completed. Do not proceed until the upgrade has completed.

    oc get zenservice -A -o custom-columns='KIND:.kind,NAME:.metadata.name,NAMESPACE:.metadata.namespace,VERSION:.status.currentVersion,STATUS:.status.zenStatus,PROGRESS:.status.Progress,MESSAGE:.status.ProgressMessage'
    

    Example output from a successful foundational services upgrade:

    KIND         NAME                 NAMESPACE   VERSION   STATUS      PROGRESS   MESSAGE
    ZenService   iaf-zen-cpdservice   cp4waiops   4.8.0     Completed   100%       The Current Operation Is Completed
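
The "3.23 or higher" comparison in this section is a version comparison, not a string comparison (for example, 3.9 is lower than 3.23). The following is a minimal sketch of that check, assuming a `sort` that supports the -V version-sort option, as GNU coreutils does. The function names and the sample 3.19.9 line are illustrative assumptions.

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `oc get csv -A` output on stdin and print each
# ibm-common-service-operator version found, one per line.
common_services_versions() {
  grep -o 'ibm-common-service-operator\.v[0-9][0-9.]*' | sed 's/.*\.v//'
}

# Succeeds only if at least one version is found and all are >= 3.23.
foundational_services_ok() {
  found=1
  while read -r v; do
    found=0
    version_ge "$v" "3.23" || return 1
  done
  return "$found"
}

# Example with a sample line; on a live cluster, pipe in `oc get csv -A`.
printf '%s\n' \
  'cp4waiops   ibm-common-service-operator.v3.19.9   IBM Cloud Pak foundational services   3.19.9   Succeeded' |
  common_services_versions | foundational_services_ok ||
  echo "foundational services is below v3.23: run upgrade_common_services.sh"
```

Requiring every listed CSV to qualify reflects that the operator can appear in more than one namespace, as in the example output of the CSV check.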
    

4. Update the operator subscription

If you are upgrading from IBM Cloud Pak for Watson AIOps 4.1.0 or higher, skip this section as the operator subscription is already correctly set. Proceed to section 5, Verify the deployment.

Update the spec.channel value of the IBM Cloud Pak for Watson AIOps subscription to the release that you want to upgrade to, v4.1.

oc patch subscription.operators.coreos.com ibm-aiops-orchestrator -n <namespace> --type=json -p='[{"op": "replace", "path": "/spec/channel", "value": "v4.1"}]'

Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps subscription is deployed in. This is your IBM Cloud Pak for Watson AIOps project if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster wide scope.
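
If you script this step, it can help to build the patch document separately and to patch only when the channel actually differs. The following is a minimal sketch: the function names, the NAMESPACE variable, and the sample current channel value are assumptions made for illustration.

```shell
#!/bin/sh
# Print a JSON Patch document that replaces /spec/channel with channel $1.
channel_patch() {
  printf '[{"op": "replace", "path": "/spec/channel", "value": "%s"}]\n' "$1"
}

# Succeeds when the current channel ($1) differs from the target channel ($2).
channel_needs_update() {
  [ "$1" != "$2" ]
}

# Sample value; on a live cluster the current channel could be read with:
#   current=$(oc get subscription.operators.coreos.com ibm-aiops-orchestrator \
#     -n "$NAMESPACE" -o jsonpath='{.spec.channel}')
current='v3.7'

if channel_needs_update "$current" 'v4.1'; then
  # Show the patch that would be applied.
  channel_patch 'v4.1'
  # To apply it (NAMESPACE is an assumed variable):
  #   oc patch subscription.operators.coreos.com ibm-aiops-orchestrator \
  #     -n "$NAMESPACE" --type=json -p "$(channel_patch v4.1)"
fi
```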

5. Verify the deployment

5.1 Check the version

Verify that your IBM Cloud Pak for Watson AIOps deployment is successfully upgraded. Run the following command and check that the VERSION that is returned is 4.1.2.

oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.<namespace> -n <namespace>

Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster wide scope.

Example output:

oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.cp4waiops -n cp4waiops

NAME                           DISPLAY                                    VERSION  REPLACES                       PHASE
ibm-aiops-orchestrator.v4.1.2  IBM Cloud Pak for Watson AIOps AI Manager  4.1.2    ibm-aiops-orchestrator.v4.1.1  Succeeded
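
The version check can be automated by parsing the table output. The following is a minimal sketch, with a hypothetical function name, that takes the version from the CSV name and the phase from the last column, because the DISPLAY column contains spaces and makes positional parsing of the middle columns unreliable.

```shell
#!/bin/sh
# Read `oc get csv` table output on stdin and print "<version> <phase>" for
# the ibm-aiops-orchestrator CSV. The version comes from the CSV name and the
# phase from the last column.
orchestrator_csv_status() {
  awk '$1 ~ /^ibm-aiops-orchestrator\.v/ {
         sub(/^ibm-aiops-orchestrator\.v/, "", $1)
         print $1, $NF
       }'
}

# Example using the output shown above instead of a live cluster.
status=$(printf '%s\n' \
  'NAME                           DISPLAY                                    VERSION  REPLACES                       PHASE' \
  'ibm-aiops-orchestrator.v4.1.2  IBM Cloud Pak for Watson AIOps AI Manager  4.1.2    ibm-aiops-orchestrator.v4.1.1  Succeeded' |
  orchestrator_csv_status)

if [ "$status" = '4.1.2 Succeeded' ]; then
  echo "operator upgraded"
else
  echo "not yet upgraded: $status"
fi
```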

5.2 Check the deployment

Run the following command to check that the PHASE of your deployment is Updating.

oc get installations.orchestrator.aiops.ibm.com -n <namespace>

Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.

Example output:

NAME                  PHASE     LICENSE    STORAGECLASS   STORAGECLASSLARGEBLOCK   AGE
ibm-cp-watson-aiops   Updating  Accepted   rook-cephfs    rook-ceph-block          3m

It takes around 60-90 minutes for the upgrade to complete (subject to the speed with which images can be pulled). When installation is complete and successful, the PHASE of your installation changes to Running. If your installation phase does not change to Running, then use the following command to find out which components are not ready:

oc get installation.orchestrator.aiops.ibm.com -o yaml | grep 'Not Ready'

Example output:

lifecycleservice: Not Ready
zenservice: Not Ready

To see details about why a component is Not Ready run the following command, where <component> is the component that is not ready, for example zenservice.

oc get <component> -o yaml

(Optional) You can also download and run a status checker script to see information about the status of your deployment. For more information about how to download and run the script, see github.com/IBM.

If the installation fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.
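
The two troubleshooting commands in this section can be chained so that each Not Ready component is inspected in turn. The following is a minimal sketch: the function name is hypothetical, the line with examplecomponent is made-up sample input, and the text match assumes the `<component>: Not Ready` layout shown in the example output.

```shell
#!/bin/sh
# Read installation YAML on stdin and print the name of each component whose
# value is "Not Ready" (the key before the colon).
not_ready_components() {
  grep 'Not Ready' | sed 's/^[[:space:]]*//; s/:.*$//'
}

# Example with sample lines instead of a live cluster.
printf '%s\n' \
  '    lifecycleservice: Not Ready' \
  '    examplecomponent: Ready' \
  '    zenservice: Not Ready' |
  not_ready_components
# prints:
# lifecycleservice
# zenservice

# On a live cluster, each component could then be inspected:
#   oc get installation.orchestrator.aiops.ibm.com -o yaml |
#     not_ready_components |
#     while read -r c; do oc get "$c" -o yaml; done
```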

5.3 Verify that the LifecycleTrigger job is complete

Run the following command, and verify that the output contains reason: Ready and status: "True" before continuing to the next step.

oc get LifecycleTrigger aiops -o yaml

Example output:

status:
  conditions:
  - lastTransitionTime: "2024-03-21T03:04:53Z"
    message: Job '84086bada206a5b4b31cc297e3206835' in state 'RUNNING'
    observedGeneration: 2
    reason: Ready
    status: "True"
    type: LifecycleTriggerReady
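
This check can be turned into a scripted gate before the post upgrade actions. The following is a minimal sketch with a hypothetical function name. It is a plain text match that mirrors the manual check, not a YAML parse, so it assumes that the output has the shape shown in the example.

```shell
#!/bin/sh
# Succeeds when the YAML on stdin contains both `reason: Ready` and
# `status: "True"`. This is a plain text match, not a YAML parse.
lifecycle_trigger_ready() {
  yaml=$(cat)
  printf '%s\n' "$yaml" | grep -q 'reason: Ready' &&
    printf '%s\n' "$yaml" | grep -q 'status: "True"'
}

# Sample input trimmed from the example output above.
sample='status:
  conditions:
  - reason: Ready
    status: "True"
    type: LifecycleTriggerReady'

if printf '%s\n' "$sample" | lifecycle_trigger_ready; then
  echo "LifecycleTrigger is ready"
fi

# On a live cluster:
#   oc get LifecycleTrigger aiops -o yaml | lifecycle_trigger_ready
```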

6. Post upgrade actions

  1. If you previously set up backup or restore on your deployment, then you must follow the instructions in Upgrading IBM Cloud Pak for Watson AIOps backup and restore artifacts.

  2. If the EXPIRY_SECONDS environment variable was set for configuring log anomaly alerts, the environment variable is not retained during the upgrade. After the upgrade is completed, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.

  3. If the Access Control page displays custom roles with deprecated permissions after upgrade, see Custom roles with deprecated permissions after upgrade.

  4. If you have Netcool or Metrics connections configured, then you must follow the instructions in After upgrade, Netcool and Metrics connections have a red status in the Connections UI.

  5. (Optional) A new field is available in IBM Cloud Pak for Watson AIOps 4.1.0 or higher that you can use to specify the terminology for collections of topology resources as application or service. The default is application. If you want to use service as the terminology for your topology resource collections, then run the following command to patch your custom resource.

    oc patch installations.orchestrator.aiops.ibm.com/<namespace> --type merge -p '{"spec":{"topologyModel":"service"}}'
    

    Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.

  6. (Optional) Delete the persistent volume claim (PVC) for training job state data that is no longer required. For more information, see Deleting a persistent volume claim.

  7. In a proactive ChatOps channel, if you click the Change request ticket URL, but details don't display in the ServiceNow instance, edit the information for the x_ibm_waiops.admin user. For more information, see IBM Change Risk Assessment tab in ServiceNow not displaying change risk assessment details.