Online upgrade of IBM Cloud Pak for AIOps (CLI method)

Use these instructions to upgrade an online deployment of IBM Cloud Pak® for AIOps 4.8.0 or later to 4.9.0.

Overview

This procedure is for deployments of IBM Cloud Pak for AIOps on Red Hat® OpenShift® Container Platform. This procedure can be used on an online deployment of IBM Cloud Pak for AIOps 4.8.0 or later, and can still be used if the deployment has had hotfixes applied.

You cannot use these instructions to upgrade deployments of IBM Cloud Pak for AIOps 4.7.1 or earlier. For more information, see Upgrade paths.

If you have an offline deployment of IBM Cloud Pak for AIOps on Red Hat OpenShift, follow the instructions in Upgrading IBM Cloud Pak for AIOps (offline).

If you have a deployment of IBM Cloud Pak for AIOps on Linux, follow the instructions in Online upgrade of IBM Cloud Pak for AIOps on Linux.

Important: If IBM® Netcool® Operations Insight® is deployed on the same cluster as IBM Cloud Pak for AIOps, ensure that Netcool Operations Insight is at version 1.6.13 or later before you upgrade IBM Cloud Pak for AIOps. Failure to do so may result in a broken Netcool Operations Insight deployment, as the IBM Cloud Pak for AIOps upgrade process updates a shared component that is used by both applications.

Before you begin

Notes:

  • Ensure that you are on a version of Red Hat OpenShift that your current and target versions of IBM Cloud Pak for AIOps both support. If you already have a qualifying version of Red Hat OpenShift but you want to upgrade it, then complete the IBM Cloud Pak for AIOps upgrade first. For more information, see Guidance for upgrades that require a Red Hat OpenShift upgrade.
  • Ensure that you are logged in to your Red Hat OpenShift cluster with oc login for any steps that use the Red Hat OpenShift command-line interface (CLI).
  • Red Hat OpenShift requires a user with cluster-admin privileges for the operations in this procedure.

Warnings:

  • Custom patches, labels, and manual adjustments to IBM Cloud Pak for AIOps resources are lost when IBM Cloud Pak for AIOps is upgraded, and must be manually reapplied after upgrade. For more information, see Manual adjustments are not persisted.
  • The upgrade cannot be removed or rolled back.
  • If you previously increased the size of a PVC directly, then you must follow the correct procedure that is supplied in Scaling up storage to ensure that the size is updated by the operator. Failure to do so before upgrading IBM Cloud Pak for AIOps causes the operator to attempt to restore a lower default value for the PVC, and causes an error in your IBM Cloud Pak for AIOps deployment.

Upgrade procedure

Follow these steps to upgrade your online IBM Cloud Pak for AIOps deployment.

  1. Ensure cluster readiness
  2. Create the catalog
  3. Update the operator subscription
  4. Verify the deployment
  5. Post upgrade actions

1. Ensure cluster readiness

Recommended: Take a backup before upgrading. For more information, see Backup and restore.

  1. Ensure that your cluster still meets all of the prerequisites for deployment. For more information, see Planning.

  2. If you still have waiops_var.sh from when you installed IBM Cloud Pak for AIOps, then run the following command from the directory that the script is in. This sets the environment variables that are used later, and renames the environment file so that it has the project in its name, which is required for all deployments from v4.9.0 onwards.

    . ./waiops_var.sh
    mv ./waiops_var.sh ./waiops_var_${PROJECT_CP4AIOPS}.sh
    

    If you do not have waiops_var.sh, then run the following commands to set the environment variables that you need for upgrade.

    export PROJECT_CP4AIOPS=<project>
    export INSTALL_MODE_NAMESPACE=<install_namespace>
    

    Where

    • <project> is the namespace (project) that your IBM Cloud Pak for AIOps subscription is deployed in.
    • <install_namespace> is ${PROJECT_CP4AIOPS} if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.
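Before you continue, you can confirm that both variables are set. The following is a minimal sketch (pure shell, no cluster access needed); `check_upgrade_vars` is a hypothetical helper, not part of waiops_var.sh.

```shell
# Hypothetical helper: fail fast if either upgrade variable is unset or empty.
check_upgrade_vars() {
  for v in PROJECT_CP4AIOPS INSTALL_MODE_NAMESPACE; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "ERROR: $v is not set" >&2
      return 1
    fi
  done
  echo "Upgrade environment variables look good."
}
```

Run `check_upgrade_vars` after sourcing or exporting the variables; a non-zero return means a variable is missing.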
  3. Run a prerequisite checker script to ensure that your Red Hat OpenShift Container Platform cluster is correctly set up for an IBM Cloud Pak for AIOps upgrade.

    If you have a Starter deployment, then skip the rest of this step and run the Verify cluster readiness step in the starter installation topic Online starter installation of IBM Cloud Pak for AIOps (CLI). If your precheck passes, then proceed to step 4, Delete any evicted connector-orchestrator pods.

    Download the prerequisite checker script from github.com/IBM, and run it with the following command:

    ./prereq.sh -n ${PROJECT_CP4AIOPS} --ignore-allocated
    

    Important: If your deployment is on a multi-zone cluster, then also specify the -m flag to assess whether there are sufficient resources to withstand a zone outage.

    Example output:

    ./prereq.sh -n cp4aiops --ignore-allocated
    
    [INFO] Starting IBM Cloud Pak for AIOps prerequisite checker v4.9...
    
    CLI: oc
    The file '/var/folders/rk/fhqh_50x6vn77pg72xn18pm80000gn/T/tmp.jl8yNKnB9h' has been created.
    
    [INFO] =================================Platform Version Check=================================
    [INFO] Checking Platform Type....
    [INFO] You are using Openshift Container Platform
    [INFO] OCP version 4.17.9 is compatible but only nodes with x86_64 (amd64) architectures are supported at this time.
    [INFO] =================================Platform Version Check=================================
    
    [INFO] =================================Storage Provider=================================
    [INFO] Checking storage providers
    [INFO] No IBM Storage Fusion Found... Skipping configuration check.
    [INFO] No IBM Storage Fusion HCI System... Skipping configuration check.
    
    [INFO] No Portworx StorageClusters found with "Running" or "Online" status. Skipping configuration check for Portworx.
    [INFO] Openshift Data Foundation found.
    [INFO] No IBM Cloud Storage found... Skipping configuration check for IBM Cloud Storage Check.
    
    Checking Openshift Data Foundation Configuration...
    Verifying if Red Hat Openshift Data Foundation pods are in "Running" or "Completed" status
    [INFO] Pods in openshift-storage project are "Running" or "Completed"
    [INFO] ocs-storagecluster-ceph-rbd exists.
    [INFO] ocs-storagecluster-cephfs exists.
    [INFO] No warnings or failures found when checking for Storage Providers.
    [INFO] =================================Storage Provider=================================
    
    [INFO] =================================Cert Manager Check=================================
    [INFO] Checking for Cert Manager operator
    
    [INFO] Successfully functioning cert-manager found.
    
    CLUSTERSERVICEVERSION              NAMESPACE
    ibm-cert-manager-operator.v4.2.11  ibm-cert-manager
    
    [INFO] =================================Cert Manager Check=================================
    
    [INFO] =================================Licensing Service Operator Check=================================
    [INFO] Checking for Licensing Service operator
    
    [INFO] Successfully functioning licensing service operator found.
    
    CLUSTERSERVICEVERSION           NAMESPACE
    ibm-licensing-operator.v4.2.11  ibm-licensing
    
    [INFO] =================================Licensing Service Operator Check=================================
    
    [INFO] =================================Production Install Resources=================================
    [INFO] Checking for cluster resources
    
    [INFO] ==================================Resource Summary=====================================================
    [INFO]                                                                  vCPU               |          Memory(GB)      
    [INFO] Production (HA) Base (available/required)                      [  288 / 136 ]              [  578 / 310 ]
    [INFO]     (+ Log Anomaly Detection & Ticket Analysis)                [  288 / 162 ]              [  578 / 368 ]
    [INFO] ==================================Resource Summary=====================================================
    [INFO] Maximum instances of Base Production(HA): 1
    [INFO] Maximum instances of Extended Production(HA): 1
    [INFO] =================================Production Install Resources=================================
    
    [INFO] =================================ResourceQuota Check=================================
    [INFO] No CPU or Memory limits found in any ResourceQuota. Assuming no restrictions.
    [INFO] =================================ResourceQuota Check=================================
    
    
    [INFO] =================================Prerequisite Checker Tool Summary=================================
          [  PASS  ] Platform Version Check 
          [  PASS  ] Storage Provider
          [  PASS  ] Production (HA) Base Install Resources
          [  PASS  ] ResourceQuota Check
          [  PASS  ] Cert Manager Operator Installed
          [  PASS  ] Licensing Service Operator Installed
    [INFO] =================================Prerequisite Checker Tool Summary=================================
    
    Storing results of prereq in configmap/aiops-prereq
    configmap/aiops-prereq configured
    

    Note: The prerequisite checker script might show inadequate resources in the Resource Summary because the script does not account for resources already being in use by the upgrading deployment. This can be ignored, as can the following message: [ FAIL ] Small or Large Profile Install Resources.
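As a sketch of how to read the Resource Summary: each bracketed pair is "available / required", and a check passes when available is at least required. `capacity_ok` is a hypothetical helper for illustration only; the checker script itself already performs this comparison.

```shell
# Hypothetical helper: compare an "available / required" pair from the summary.
capacity_ok() {  # args: available required
  [ "$1" -ge "$2" ]
}

# Against the example lines "[ 288 / 136 ]" (vCPU) and "[ 578 / 310 ]" (memory):
capacity_ok 288 136 && echo "vCPU: sufficient"
capacity_ok 578 310 && echo "Memory(GB): sufficient"
```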

  4. Delete any evicted connector-orchestrator pods.

    1. Run the following command to check if there are any evicted connector-orchestrator pods.

      oc get pods -n ${PROJECT_CP4AIOPS} | grep connector-orchestrator
      
    2. Clean up any evicted connector-orchestrator pods.

      If the previous command returned any pods with a STATUS of Evicted, then run the following command to delete each of them.

      oc delete pod -n ${PROJECT_CP4AIOPS} <connector_orchestrator>
      

      Where <connector_orchestrator> is a pod returned in the previous step.
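The check-and-delete steps above can also be combined. The following sketch defines a hypothetical `filter_evicted` helper that reads `oc get pods` output on stdin and prints the names of evicted connector-orchestrator pods (STATUS is the third column); the oc usage in the comment assumes that you are logged in.

```shell
# Hypothetical helper: select evicted connector-orchestrator pods by name.
filter_evicted() {
  awk '$1 ~ /connector-orchestrator/ && $3 == "Evicted" { print $1 }'
}

# Usage (assumes oc login):
# oc get pods -n "${PROJECT_CP4AIOPS}" | filter_evicted \
#   | xargs -r oc delete pod -n "${PROJECT_CP4AIOPS}"
```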

2. Create the catalog

Create a new ibm-aiops-catalog that uses the new v4.9.0 digest and that includes the other changes that are required for IBM Cloud Pak for AIOps 4.9.0.

  1. Run the following command:

    cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: ibm-aiops-catalog
      namespace: ${INSTALL_MODE_NAMESPACE}
    spec:
      displayName: ibm-aiops-catalog
      publisher: IBM Content
      sourceType: grpc
      image: icr.io/cpopen/ibm-aiops-catalog@sha256:1652f745216637eed1f25eddab55181d8b55538c56f5c427a7122d61d467b23e
      grpcPodConfig:
        securityContextConfig: restricted
    EOF
    
  2. Verify that the ibm-aiops-catalog, ibm-cert-manager-catalog, and ibm-licensing-catalog CatalogSource objects are in the output that is returned by the following commands:

    oc get CatalogSources -n openshift-marketplace
    oc get CatalogSource -n ${INSTALL_MODE_NAMESPACE}
    

    Example output:

    oc get CatalogSources -n openshift-marketplace
    NAME                     DISPLAY                      TYPE   PUBLISHER   AGE
    ibm-cert-manager-catalog ibm-cert-manager             grpc   IBM         2m 
    ibm-licensing-catalog    IBM License Service Catalog  grpc   IBM         2m
    
    oc get CatalogSource -n cp4aiops
    NAME                     DISPLAY                      TYPE   PUBLISHER   AGE
    ibm-aiops-catalog        ibm-aiops-catalog            grpc   IBM         2m 
    

Note: During the upgrade of IBM Cloud Pak for AIOps, Kubernetes jobs might fail and rerun. If a job succeeds on the second or third attempt, there can be one or two pods in the Error state and one pod in the Completed state. If a job fails repeatedly, the attempt is abandoned; use the logs from the failed pods to determine the cause of the failure. When you have determined the cause, you can delete the job, and the operator recreates it to reattempt the operation.
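Optionally, you can poll the new catalog until it is ready before updating the subscription. OLM reports a CatalogSource's gRPC state in status.connectionState.lastObservedState; `catalog_ready` below is a hypothetical helper, and the commented loop assumes that you are logged in with oc login.

```shell
# Hypothetical helper: succeed only when the observed catalog state is READY.
catalog_ready() { [ "$1" = "READY" ]; }

# until catalog_ready "$(oc get catalogsource ibm-aiops-catalog \
#     -n "${INSTALL_MODE_NAMESPACE}" \
#     -o jsonpath='{.status.connectionState.lastObservedState}')"; do
#   echo "Waiting for ibm-aiops-catalog to become READY..."
#   sleep 10
# done
```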

3. Update the operator subscription

  1. Update the spec.sourceNamespace value of the IBM Cloud Pak for AIOps subscription to use the new CatalogSource that you created in the previous step.

    oc patch subscription.operators.coreos.com ibm-aiops-orchestrator -n ${INSTALL_MODE_NAMESPACE} --type=json -p='[{"op": "replace", "path": "/spec/sourceNamespace", "value": "'"${INSTALL_MODE_NAMESPACE}"'"}]'
    
  2. Update the spec.channel value of the IBM Cloud Pak for AIOps subscription to the release that you want to upgrade to, v4.9.

    oc patch subscription.operators.coreos.com ibm-aiops-orchestrator -n ${INSTALL_MODE_NAMESPACE} --type=json -p='[{"op": "replace", "path": "/spec/channel", "value": "v4.9"}]'
    
  3. If your deployment uses the AllNamespaces install mode, run the following command to refresh the connectors' secret:

    oc delete secret cp4waiops-connectors-deploy-cert-secret -n "${PROJECT_CP4AIOPS}"
    

    For more information about installation modes, see Operator installation mode.
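After patching, you can optionally confirm that the subscription carries the new channel and sourceNamespace. This is a sketch: `verify_sub_fields` is a hypothetical helper, and the oc commands in the usage comment assume that you are logged in.

```shell
# Hypothetical helper: check the patched subscription fields.
verify_sub_fields() {  # args: actual_channel actual_source_ns expected_ns
  if [ "$1" = "v4.9" ] && [ "$2" = "$3" ]; then
    echo "subscription OK"
  else
    echo "subscription mismatch: channel=$1 sourceNamespace=$2" >&2
    return 1
  fi
}

# Usage (assumes oc login):
# ch=$(oc get subscription.operators.coreos.com ibm-aiops-orchestrator \
#   -n "${INSTALL_MODE_NAMESPACE}" -o jsonpath='{.spec.channel}')
# ns=$(oc get subscription.operators.coreos.com ibm-aiops-orchestrator \
#   -n "${INSTALL_MODE_NAMESPACE}" -o jsonpath='{.spec.sourceNamespace}')
# verify_sub_fields "$ch" "$ns" "${INSTALL_MODE_NAMESPACE}"
```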

4. Verify the deployment

4.1 Check the deployment

Run the following command to check that the PHASE of your deployment is Updating.

oc get installations.orchestrator.aiops.ibm.com -n ${PROJECT_CP4AIOPS}

Example output:

NAME           PHASE     LICENSE    STORAGECLASS   STORAGECLASSLARGEBLOCK   AGE
ibm-cp-aiops   Updating  Accepted   rook-cephfs    rook-ceph-block          3m

It takes around 60-90 minutes for the upgrade to complete (subject to the speed with which images can be pulled). When the upgrade is complete and successful, the PHASE of your installation changes to Running. If the phase does not change to Running, then use the following command to find out which components are not ready:

oc get installation.orchestrator.aiops.ibm.com -o yaml -n ${PROJECT_CP4AIOPS} | grep 'Not Ready'

Example output:

lifecycleservice: Not Ready
zenservice: Not Ready

To see details about why a component is Not Ready, run the following command, where <component> is the component that is not ready, for example zenservice.

oc get <component> -o yaml -n ${PROJECT_CP4AIOPS}

(Optional) You can also download and run a status checker script to see information about the status of your deployment. For more information about how to download and run the script, see github.com/IBM.

If the upgrade fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.

Important: Wait for the deployment to enter a Running phase before continuing to the next step.
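If you prefer to script the wait, the following sketch polls for the Running phase. `phase_running` is a hypothetical helper; the commented loop assumes that you are logged in and gives up after roughly two hours (120 checks at 60-second intervals).

```shell
# Hypothetical helper: succeed only when the reported phase is Running.
phase_running() { [ "$1" = "Running" ]; }

# for attempt in $(seq 1 120); do
#   phase=$(oc get installations.orchestrator.aiops.ibm.com \
#     -n "${PROJECT_CP4AIOPS}" -o jsonpath='{.items[0].status.phase}')
#   phase_running "$phase" && echo "Deployment is Running" && break
#   echo "Attempt ${attempt}: phase is '${phase}'; waiting 60s..."
#   sleep 60
# done
```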

4.2 Check the version

Run the following command and check that the VERSION that is returned is 4.9.0.

oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.${INSTALL_MODE_NAMESPACE} -n ${INSTALL_MODE_NAMESPACE}

Example output:

oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.cp4aiops -n cp4aiops

NAME                           DISPLAY                  VERSION  REPLACES                       PHASE
ibm-aiops-orchestrator.v4.9.0  IBM Cloud Pak for AIOps  4.9.0    ibm-aiops-orchestrator.v4.8.1  Succeeded

5. Post upgrade actions

  1. Run the following command to remove the obsolete catalog.

    oc delete catalogsources.operators.coreos.com ibm-aiops-catalog -n openshift-marketplace
    
  2. If you previously set up backup or restore on your deployment, then you must follow the instructions in Upgrading IBM Cloud Pak for AIOps backup and restore artifacts.

  3. If the EXPIRY_SECONDS environment variable was set to configure log anomaly alerts, it is not retained during the upgrade. After the upgrade is completed, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.

  4. If you have an IBM Db2 integration configured to enable Visualizing dashboards and reports with IBM Cognos® Analytics, then you must use the following steps to update your existing IBM® Db2® schema.

    Note: It is recommended that you back up your IBM Db2 database before upgrading. For more information, see Backing up and restoring Db2 in the IBM Db2 documentation.

    1. Connect to your IBM Db2 database with the credentials that you configured in Creating an IBM Db2 integration.

      db2 CONNECT TO <dbname> USER <user> USING <password>
      

      Where

      • <dbname> is the name of your IBM Cognos Analytics IBM Db2 database
      • <user> and <password> are the database username and password for your IBM Cognos Analytics IBM Db2 database
    2. Download the IBM Db2 schema upgrade script upgrade.sql from github.com/IBM to a directory of your choice, and run it with the following command:

      db2 -t -vf <path>/upgrade.sql
      

      Where <path> is the directory that you downloaded the upgrade script to.

    3. Verify that the database tables are updated so that UUID is not an auto-generated field in the database table definition.

      db2 describe table ALERTS_REPORTER_STATUS;
      db2 describe table INCIDENTS_REPORTER_STATUS;
      

      Expected output for the UUID column:

      Column name Data type schema  Data type name Column Length Scale Nulls
      ----------- ----------------  -------------- ------------- ----- ------
      UUID        SYSIBM            VARCHAR        255           0     No
      

      If the database tables are not updated, then try the following steps:

      1. Rerun the IBM Db2 schema upgrade script upgrade.sql:

        db2 -t -vf <path>/upgrade.sql
        

        Where <path> is the directory that you downloaded the upgrade script to.

      2. Save the IBM Db2 integration again.

        Follow the instructions in Editing an IBM Db2 integration, and update the description with text of your choice.
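The UUID verification above can also be scripted. The following sketch defines a hypothetical `check_uuid_column` helper that scans `db2 describe table` output for the UUID row and confirms the expected VARCHAR, length 255, Nulls No definition.

```shell
# Hypothetical helper: verify the UUID row from 'db2 describe table' output.
check_uuid_column() {
  awk '$1 == "UUID" {
    if ($3 == "VARCHAR" && $4 == "255" && $NF == "No") print "OK"
    else print "UNEXPECTED: " $0
  }'
}

# Usage:
# db2 describe table ALERTS_REPORTER_STATUS | check_uuid_column
# db2 describe table INCIDENTS_REPORTER_STATUS | check_uuid_column
```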

    4. Edit the dashboard policy to add a new entry into the mapping, and then save your changes.

      1. Click the navigation icon in the upper-left of the screen to go to the main navigation menu.

      2. In the main navigation menu, click Operate > Automations.

      3. Click the hamburger icon next to the dashboard policy, and select to edit it.

      4. Select Database connection, and then follow steps 10 and 11 in Populate an external database to add one of the following mappings, and then save your changes.

        Add the first line for an alert trigger entity, or the second line for an incident trigger entity:

        "UUID": $join(["cfd95b7e-3bc7-4006-a4a8-a73a79c71255", alert.id], "_")
        "UUID": $join(["cfd95b7e-3bc7-4006-a4a8-a73a79c71255", incident.id], "_")
        

        Important: You must re-save the policy to persist your changes. To enable the policy's Save button, update the description with text of your choice.
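For reference, JSONata's $join concatenates the array elements with the given separator, so the mapping produces the fixed prefix joined to the trigger entity's id with an underscore. The following sketch mimics that result in shell; `make_uuid` and the example id "abc123" are hypothetical.

```shell
# Hypothetical helper: reproduce the $join(["<prefix>", id], "_") result.
make_uuid() {  # arg: trigger entity id (alert.id or incident.id)
  printf '%s_%s\n' "cfd95b7e-3bc7-4006-a4a8-a73a79c71255" "$1"
}

make_uuid abc123   # prints cfd95b7e-3bc7-4006-a4a8-a73a79c71255_abc123
```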