Online upgrade of IBM Cloud Pak for AIOps (console method)

Use these instructions to upgrade an online deployment of IBM Cloud Pak® for AIOps 4.7.0 or later to 4.8.0.

Overview

This procedure is for deployments of IBM Cloud Pak for AIOps on Red Hat® OpenShift® Container Platform. This procedure can be used on an online deployment of IBM Cloud Pak for AIOps 4.7.0 or later, and can still be used if the deployment has had hotfixes applied.

You cannot use these instructions to upgrade deployments of IBM Cloud Pak for AIOps 4.6.1 or earlier. For more information, see Upgrade paths.

If you have an offline deployment of IBM Cloud Pak for AIOps on Red Hat OpenShift, follow the instructions in Upgrading IBM Cloud Pak for AIOps (offline).

If you have a deployment of IBM Cloud Pak for AIOps on Linux, follow the instructions in Online upgrade of IBM Cloud Pak for AIOps on Linux.

Important: If IBM® Netcool® Operations Insight® is deployed on the same cluster as IBM Cloud Pak for AIOps, ensure that Netcool Operations Insight is at version 1.6.13 or later before you upgrade IBM Cloud Pak for AIOps. Failure to do so may result in a broken Netcool Operations Insight deployment, as the IBM Cloud Pak for AIOps upgrade process updates a shared component that is used by both applications.

Before you begin

Notes:

  • Ensure that you are on a version of Red Hat OpenShift that your current and target versions of IBM Cloud Pak for AIOps both support. If you already have a qualifying version of Red Hat OpenShift but you want to upgrade it, then complete the IBM Cloud Pak for AIOps upgrade first. For more information, see Guidance for upgrades that require a Red Hat OpenShift upgrade.
  • Some steps must still be performed with the Red Hat® OpenShift® Container Platform command-line interface (CLI). Before you run any of these steps, ensure that you are logged in to your Red Hat OpenShift Container Platform cluster with oc login.
  • Red Hat OpenShift Container Platform requires a user with cluster-admin privileges for the following operations:

Warnings:

  • Custom patches, labels, and manual adjustments to IBM Cloud Pak for AIOps resources are lost when IBM Cloud Pak for AIOps is upgraded, and must be manually reapplied after upgrade. For more information, see Manual adjustments are not persisted.
  • The upgrade cannot be removed or rolled back.
  • If you previously increased the size of a PVC directly, then you must follow the correct procedure that is supplied in Resizing storage to ensure that the size is updated by the operator. Failure to do so before upgrading IBM Cloud Pak for AIOps causes the operator to attempt to restore a lower default value for the PVC, and causes an error in your IBM Cloud Pak for AIOps deployment.

Database migration during upgrade

The upgrade from IBM Cloud Pak for AIOps 4.7.x to 4.8.0 migrates the Elasticsearch database to an OpenSearch database.

Extra resources are needed during the upgrade, and are shown in the following table. These resources can be released after the upgrade is complete.

Table 1. Extra resources required during upgrade

Size        vCPU  Memory  Disk
Starter     2     8Gi     100Gi
Production  6     24Gi    300Gi

If an IBM Sales representative or Business Partner supplied you with a custom profile, then you must calculate the additional resource requirements as follows:

  1. Run the following commands to view your custom profile ConfigMap.

    export PROJECT_CP4AIOPS=<project>
    oc get configmap -n "${PROJECT_CP4AIOPS}" $(oc get installation.orchestrator.aiops.ibm.com -n "${PROJECT_CP4AIOPS}" -o jsonpath='{.items[0].status.customProfileConfigmap}') -o yaml
    

    Where <project> is the namespace (project) that IBM Cloud Pak for AIOps is deployed in.

  2. Locate the section for elasticsearch in the output, and calculate the additional resources that are needed for the duration of upgrade.

    The additional vCPU needed is replicas multiplied by requests.cpu, and the additional memory needed is replicas multiplied by requests.memory.

    The following example output shows an elasticsearch section for which the upgrade needs 15 additional vCPU (5 x 3) and 40960Mi of additional memory (5 x 8192Mi).

    elasticsearch:
         replicas: 5
         resources:
           requests:
             cpu: 3000m
             memory: 8192Mi
           limits:
             cpu: 6000m
             memory: 8192Mi
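The calculation in step 2 can be sketched as a small shell helper. This is only an illustration: the function names extra_vcpu and extra_memory_mi are hypothetical, and the sketch assumes that requests.cpu is a whole number of cores or a millicore value (such as 3000m) and that requests.memory is given in Mi or Gi.

```shell
#!/usr/bin/env bash
# Sketch: extra resources needed during upgrade = replicas x per-replica requests.
# The inputs are the values read from the elasticsearch section of the custom profile.

# extra_vcpu REPLICAS CPU_REQUEST   (CPU_REQUEST such as "3000m" or "3"; fractions like "0.5" are not handled)
extra_vcpu() {
  local replicas=$1 cpu=$2 millicores
  case "$cpu" in
    *m) millicores=${cpu%m} ;;
    *)  millicores=$(( cpu * 1000 )) ;;
  esac
  # Print whole vCPU, rounded up so that the estimate is never too low.
  echo $(( (replicas * millicores + 999) / 1000 ))
}

# extra_memory_mi REPLICAS MEMORY_REQUEST   (MEMORY_REQUEST such as "8192Mi" or "8Gi")
extra_memory_mi() {
  local replicas=$1 mem=$2 mi
  case "$mem" in
    *Gi) mi=$(( ${mem%Gi} * 1024 )) ;;
    *Mi) mi=${mem%Mi} ;;
    *)   echo "unsupported memory unit: $mem" >&2; return 1 ;;
  esac
  echo $(( replicas * mi ))
}

# Values from the example elasticsearch section above.
echo "Additional vCPU:   $(extra_vcpu 5 3000m)"
echo "Additional memory: $(extra_memory_mi 5 8192Mi)Mi"
```

With the example values, the helper reports 15 vCPU and 40960Mi, which matches the figures in the example.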
    

The migration causes approximately 10 minutes of downtime for the following IBM Cloud Pak for AIOps components: log integrations, ticket integrations, log anomaly model training, and the Insights Dashboard. If alert archiving to Elasticsearch is enabled on your deployment, then the downtime for the migration might be extended depending on the quantity of alerts stored.

During upgrade, you can run the following command to monitor the status of the migration:

oc describe installation -n <project>

Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.

Example output for a migration in progress:

Conditions:
    Last Transition Time:            2024-10-29T16:58:55Z
    Message:                         Foreground Elasticsearch migration in progress
    Observed Generation:             2
    Reason:                          ForegroundMigrationInProgress
    Status:                          True
    Type:                            MigratingElastic

When migration is complete, Message has a value of Elasticsearch migration completed.
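If you prefer not to rerun oc describe by hand, the check can be scripted. The following is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes that the MigratingElastic condition is exposed under status.conditions of the first Installation resource in the project and that a reason of MigrationComplete marks the end of the migration. It loops until that reason appears, so interrupt it if the migration stalls.

```shell
#!/usr/bin/env bash
# Sketch: poll the MigratingElastic condition until the migration completes.

# Pure helper: succeeds when the reported reason means that the migration is done.
migration_complete() {
  [ "$1" = "MigrationComplete" ]
}

# Reads the reason of the MigratingElastic condition from the first
# Installation resource in the project (needs an active oc login).
migration_reason() {
  oc get installations.orchestrator.aiops.ibm.com -n "$1" \
    -o jsonpath='{.items[0].status.conditions[?(@.type=="MigratingElastic")].reason}'
}

# wait_for_migration PROJECT [INTERVAL_SECONDS]
wait_for_migration() {
  local project=$1 interval=${2:-60} reason
  while true; do
    reason=$(migration_reason "$project")
    echo "$(date -u +%FT%TZ) MigratingElastic reason: ${reason:-<none>}"
    migration_complete "$reason" && return 0
    sleep "$interval"
  done
}

# Example (not run automatically): wait_for_migration <project> 120
```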

Note: Some integrations might be marked as Not Running while the migration is in progress. This is expected behavior, and the integrations must not be altered until the migration is complete.

After all services resume function, historic log data is migrated in the background. Historic log data is not accessible for training new models while this migration occurs. The duration of this background migration depends on the quantity of historic log data stored.

Upgrade procedure

Follow these steps to upgrade your online IBM Cloud Pak for AIOps deployment.

  1. Ensure cluster readiness
  2. Update the catalog
  3. Update the operator subscription
  4. Verify the deployment
  5. Post upgrade actions

1. Ensure cluster readiness

Recommended: Take a backup before upgrading. For more information, see Backup and restore.

  1. Ensure that your cluster still meets all of the prerequisites for deployment. For more information, see Planning.

    Ensure that your cluster has the additional resources that are required for upgrade, as outlined in Database migration during upgrade.

  2. Run the IBM Cloud Pak for AIOps prerequisite checker script.

    Run the prerequisite checker script to ensure that your Red Hat OpenShift Container Platform cluster is correctly set up for an IBM Cloud Pak for AIOps upgrade.

    Download the prerequisite checker script from github.com/IBM, and run it with the following command:

    ./prereq.sh -n <project> --ignore-allocated
    

    Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.

    Important: If your deployment is on a multi-zone cluster, then also specify the -m flag to assess whether there are sufficient resources to withstand a zone outage.

    Note: The prerequisite checker script might show inadequate resources in the Resource Summary because the script does not account for resources already being in use by the upgrading deployment. This can be ignored, as can the following message: [ FAIL ] Small or Large Profile Install Resources.

    Example output:

    # ./prereq.sh -n cp4aiops --ignore-allocated
    [INFO] Starting IBM Cloud Pak for AIOps prerequisite checker v4.8...
    
    CLI: oc
    
    [INFO] =================================Platform Version Check=================================
    [INFO] Checking Platform Type....
    [INFO] You are using Openshift Container Platform
    [INFO] OCP version 4.16.7 is compatible but only nodes with AMD64 architectures are supported at this time. 
    [INFO] =================================Platform Version Check=================================
    
    [INFO] =================================Storage Provider=================================
    [INFO] Checking storage providers
    [INFO] No IBM Storage Fusion Found... Skipping configuration check.
    
    [INFO] No Portworx StorageClusters found with "Running" or "Online" status. Skipping configuration check for Portworx.
    [INFO] Openshift Data Foundation found.
    [INFO] No IBM Cloud Storage found... Skipping configuration check for IBM Cloud Storage Check.
    
    Checking Openshift Data Foundation Configuration...
    Verifying if Red Hat Openshift Data Foundation pods are in "Running" or "Completed" status
    [INFO] Pods in openshift-storage project are "Running" or "Completed"
    [INFO] ocs-storagecluster-ceph-rbd exists.
    [INFO] ocs-storagecluster-cephfs exists.
    [INFO] No warnings or failures found when checking for Storage Providers.
    [INFO] =================================Storage Provider=================================
    
    [INFO] =================================Cert Manager Check=================================
    [INFO] Checking for Cert Manager operator
    
    [INFO] Successfully functioning cert-manager found.
    
    CLUSTERSERVICEVERSION             NAMESPACE
    ibm-cert-manager-operator.v4.2.8  ibm-cert-manager
    
    [INFO] =================================Cert Manager Check=================================
    
    [INFO] =================================Licensing Service Operator Check=================================
    [INFO] Checking for Licensing Service operator
    
    [INFO] Successfully functioning licensing service operator found.
    
    CLUSTERSERVICEVERSION          NAMESPACE
    ibm-licensing-operator.v4.2.8  ibm-licensing
    
    [INFO] =================================Licensing Service Operator Check=================================
    
    [INFO] =================================Starter or Production Install Resources=================================
    [INFO] Checking for cluster resources
    
    [INFO] ==================================Resource Summary=====================================================
    [INFO]                                                     Nodes     |     vCPU       |  Memory(GB)
    [INFO] Starter (Non-HA) Base (available/required)       [  9 / 3 ]   [  144 / 47 ]    [  289 / 123 ]
    [INFO]     (+ Log Anomaly Detection & Ticket Analysis)  [  9 / 3 ]   [  144 / 55 ]    [  289 / 136 ]
    
    [INFO] Production (HA) Base (available/required)        [  9 / 6 ]   [  144 / 136 ]   [  289 / 310 ]
    [INFO]     (+ Log Anomaly Detection & Ticket Analysis)  [  9 / 6 ]   [  144 / 162 ]   [  289 / 368 ]
    [INFO] ==================================Resource Summary=====================================================
    [INFO] Cluster currently has resources available to create a Starter (Non-HA) install of Cloud Pak for AIOps
    
    [INFO] =================================Prerequisite Checker Tool Summary=================================
       [  PASS  ] Platform Version Check 
       [  PASS  ] Storage Provider
       [  PASS  ] Starter (Non-HA) Base Install Resources
       [  FAIL  ] Production (HA) Base Install Resources
       [  PASS  ] Cert Manager Operator Installed
       [  PASS  ] Licensing Service Operator Installed
    [INFO] =================================Prerequisite Checker Tool Summary=================================
    

    Note: If you are not using IBM Cloud Pak® foundational services Cert Manager, then ignore any errors that are returned by the Cert Manager check.

  3. Delete any evicted connector-orchestrator pods.

    1. From the Project list, select the project (namespace) that IBM Cloud Pak for AIOps is deployed in if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.

    2. From the left menu, select Workloads > Pods, and search for connector-orchestrator.

    3. If there are any connector-orchestrator pods with a STATUS of Evicted, then select Delete Pod from the three dots menu at the end of the row.

  4. Run the following command from the command line to ensure the safety of the Postgres database during upgrade.

    Important: Failure to run this step might lead to a complete loss of data.

    oc label clusters.postgresql.k8s.enterprisedb.io -n <project> $(oc get installations.orchestrator.aiops.ibm.com -n <project> -o jsonpath='{.items[].metadata.name}')-edb-postgres operator.ibm.com/opreq-do-not-uninstall='true'
    

    Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.
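Step 3 above uses the console, but the same cleanup can be approximated from the command line. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it relies on the default column layout of oc get pods --no-headers (NAME, READY, STATUS, and so on). Pass the namespace that applies to your deployment scope, as described in step 3.

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalent of deleting evicted connector-orchestrator pods.

# Pure helper: reads `oc get pods --no-headers` output on stdin and prints the
# names of connector-orchestrator pods whose STATUS column is Evicted.
evicted_connector_pods() {
  awk '$1 ~ /connector-orchestrator/ && $3 == "Evicted" { print $1 }'
}

# delete_evicted_connector_pods NAMESPACE   (needs an active oc login)
delete_evicted_connector_pods() {
  local ns=$1 pod
  oc get pods -n "$ns" --no-headers | evicted_connector_pods |
    while read -r pod; do
      oc delete pod "$pod" -n "$ns"
    done
}

# Example (not run automatically): delete_evicted_connector_pods <project>
```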

2. Update the catalog

Update the ibm-aiops-catalog to use the v4.8.0 digest.

  1. Go to Administration > Cluster Settings > Configuration > OperatorHub > Sources.

  2. Select ibm-aiops-catalog, and then click the YAML tab.

  3. Update the value of image to be icr.io/cpopen/ibm-aiops-catalog@sha256:58c0082ad5e9de6bc2869b528e847702f6c183a06ac49a82d64ad8efd339f753

  4. Go to Administration > Cluster Settings > Configuration > OperatorHub > Sources.

  5. Verify that the ibm-aiops-catalog, ibm-cert-manager-catalog and ibm-licensing-catalog CatalogSource objects are present, and that the image for ibm-aiops-catalog is icr.io/cpopen/ibm-aiops-catalog@sha256:58c0082ad5e9de6bc2869b528e847702f6c183a06ac49a82d64ad8efd339f753.
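The image check in step 5 can also be made from the command line. This is a hedged sketch: the function names are hypothetical, and the default namespace of openshift-marketplace is an assumption about where the CatalogSource objects are located, so pass a different namespace if yours differs.

```shell
#!/usr/bin/env bash
# Sketch: confirm that ibm-aiops-catalog points at the v4.8.0 digest.

EXPECTED_IMAGE='icr.io/cpopen/ibm-aiops-catalog@sha256:58c0082ad5e9de6bc2869b528e847702f6c183a06ac49a82d64ad8efd339f753'

# Pure helper: compares an image reference with the expected v4.8.0 digest.
catalog_image_ok() {
  [ "$1" = "$EXPECTED_IMAGE" ]
}

# Reads spec.image of the ibm-aiops-catalog CatalogSource (needs an active oc login).
# Assumption: the CatalogSource is in openshift-marketplace unless a namespace is passed.
catalog_image() {
  oc get catalogsource ibm-aiops-catalog -n "${1:-openshift-marketplace}" \
    -o jsonpath='{.spec.image}'
}

# Example (not run automatically):
#   catalog_image_ok "$(catalog_image)" && echo "catalog updated" || echo "catalog NOT updated"
```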

Note: During the upgrade of IBM Cloud Pak for AIOps, Kubernetes jobs might fail and re-run. If a job succeeds on the second or third attempt, there can be one or two pods in Error state and one pod in the Completed state. If the job fails repeatedly, the attempt is abandoned, and you can use the logs from the failed pods to determine the cause of the failure. After you determine the cause of the failure, you can delete the job, and the operator recreates it to reattempt the operations.

3. Update the operator subscription

Use the following steps to update the spec.channel value of the IBM Cloud Pak for AIOps subscription to v4.8.

  1. Log in to your OpenShift cluster's console.

  2. Select Home > Search.

  3. From the Project list, select the project (namespace) that your IBM Cloud Pak for AIOps subscription is deployed in. This is your IBM Cloud Pak for AIOps project if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.

  4. In the Resources list, select SUB Subscription. A list of subscriptions is displayed.

  5. Click the subscription that has a Name of ibm-aiops-orchestrator. A new window with the subscription details for ibm-aiops-orchestrator is displayed.

  6. Click the value in the Update channel box. A new window called Change Subscription update channel is displayed.

  7. Change the channel to v4.8 and click Save.

  8. If your deployment has a cluster-wide scope (AllNamespaces mode), use the following steps to refresh the connectors' secret:

    For more information about installation modes, see Operator installation mode.

    1. Select Home > Search.

    2. In the Resources list, select Secret. A list of secrets is displayed.

    3. Click the three dots at the end of the row that has a name of cp4waiops-connectors-deploy-cert-secret, and click Delete Secret.
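For readers who prefer the CLI, the channel change in steps 1 to 7 can be approximated with oc patch. Treat this as a sketch under stated assumptions rather than the documented method: the function names are hypothetical, and the use of a JSON merge patch on spec.channel is an assumption about an equivalent way to make the same change that the console makes.

```shell
#!/usr/bin/env bash
# Sketch: CLI alternative to changing the update channel in the console.

# Pure helper: builds a JSON merge patch that sets spec.channel.
channel_patch() {
  printf '{"spec":{"channel":"%s"}}' "$1"
}

# set_subscription_channel NAMESPACE [CHANNEL]   (needs an active oc login)
set_subscription_channel() {
  local ns=$1 channel=${2:-v4.8}
  oc patch subscriptions.operators.coreos.com ibm-aiops-orchestrator -n "$ns" \
    --type merge -p "$(channel_patch "$channel")"
}

# subscription_channel NAMESPACE: prints the channel that is currently set.
subscription_channel() {
  oc get subscriptions.operators.coreos.com ibm-aiops-orchestrator -n "$1" \
    -o jsonpath='{.spec.channel}'
}

# Example (not run automatically): subscription_channel <namespace>
```

Use the namespace that holds the subscription: your IBM Cloud Pak for AIOps project for a namespace-scoped deployment, or openshift-operators for a cluster-wide deployment.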

4. Verify the deployment

Use the following procedure to check the status of your upgraded IBM Cloud Pak for AIOps deployment. The upgrade takes approximately 60 to 90 minutes to complete, depending on how quickly images can be pulled.

  1. Log in to your OpenShift cluster's console.

  2. Click Operators > Installed Operators.

  3. From the Project list, select the project (namespace) that IBM Cloud Pak for AIOps is deployed in if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.

  4. Locate IBM Cloud Pak for AIOps in the list, and verify that the annotation underneath it shows 4.8.0.

  5. Select IBM Cloud Pak for AIOps and then click the IBM Cloud Pak for AIOps tab.

  6. Under Installations, look for the entry with the name that you specified for your IBM Cloud Pak for AIOps instance, and verify that it has a Status of Phase: Running, which means that your deployment is complete and successful.

    (Optional): If you want to see more detail about the status of your deployment's components, select the entry with the name that you specified for your IBM Cloud Pak for AIOps instance, and then switch to the YAML view. Scroll down to the Status section near the end of the YAML. A component's installation is complete and successful when the component has a value of Ready.

    Example YAML:

    status:
      size: small
      customProfileConfigmap: aiops-custom-size-profile
      customProfileValidationStatus: >-
        Custom profile configmap not found, continue installation process without
        customization
      storageclasslargeblock: rook-ceph-rbd
      componentstatus:
        issueresolutioncore: Ready
        kafka: Ready
        aiopsanalyticsorchestrator: Ready
        aiopsedge: Ready
        tunnel: Ready
        lifecycleservice: Ready
        zenservice: Ready
        flinkcluster: Ready
        cluster: Ready
        elasticsearchcluster: Ready
        aiopsui: Ready
        redissentinel: Ready
        <...>
    

    (Optional) You can also download and run a status checker script to see information about the status of your deployment. For more information about how to download and run the script, see github.com/IBM.

    If the upgrade fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.

    Important: Wait for the deployment to enter a Running phase before continuing to the next step.

  7. Verify that the database migration was successful.

    1. Switch to the YAML view. Scroll down to the Status section near the end of the YAML. Review the conditions section and verify that the MigratingElastic condition has a status of False and a reason of MigrationComplete.

      Example YAML for a successful migration:

      conditions:
      - lastTransitionTime: '2024-11-28T06:51:58Z'
        message: Elasticsearch migration completed
        observedGeneration: 3
        reason: MigrationComplete
        status: 'False'
        type: MigratingElastic
      

    2. Verify that all data is available in IBM Cloud Pak for AIOps as expected.

    Important: If database migration is not complete, review the troubleshooting entry Database migration is stuck. If this does not resolve the problem, then do not continue to the next step and contact IBM Support.
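The optional component check in step 6 can be scripted instead of being read from the YAML view. The following is a minimal sketch: the function name not_ready_components is hypothetical, and it assumes that the YAML output contains a componentstatus block laid out like the example YAML above, with one indented "name: state" entry per component.

```shell
#!/usr/bin/env bash
# Sketch: list the components that are not yet Ready.

# Pure helper: reads Installation YAML on stdin and prints every entry of the
# componentstatus block whose state is not Ready.
not_ready_components() {
  awk '
    /^ *componentstatus: *$/ {
      in_block = 1
      match($0, /^ */); indent = RLENGTH   # remember the indent of the block header
      next
    }
    in_block {
      match($0, /^ */)
      if (RLENGTH <= indent) { in_block = 0; next }   # a shallower or equal indent ends the block
      if ($2 != "Ready") print $1, $2
    }'
}

# Example (not run automatically, needs an active oc login):
#   oc get installations.orchestrator.aiops.ibm.com -n <project> -o yaml | not_ready_components
```

No output means that every listed component reports Ready.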

5. Post upgrade actions

  1. Delete the persistent volumes that were used for storing data in Elasticsearch, the database that was migrated from.

    Run the following command:

    oc delete pvc -l app.kubernetes.io/managed-by=ibm-elastic-operator -n <project>
    

    Where <project> is the project that your IBM Cloud Pak for AIOps installation is deployed in.

  2. If you previously set up backup or restore on your deployment, then you must follow the instructions in Upgrading IBM Cloud Pak for AIOps backup and restore artifacts.

  3. If the EXPIRY_SECONDS environment variable was set to configure log anomaly alerts, then the variable is not retained by the upgrade. After the upgrade is complete, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.

  4. If you have a metric integration configured that stops working after upgrade, then you must follow the instructions in After upgrade, a metric integration goes into a failed state.
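Because the PVC deletion in step 1 cannot be undone, it can be worth listing the matching PVCs before removing them. The sketch below is one possible guard, not part of the documented procedure: the function names and the confirmation prompt are hypothetical additions wrapped around the label selector from step 1.

```shell
#!/usr/bin/env bash
# Sketch: review the Elasticsearch PVCs before deleting them.

ELASTIC_PVC_SELECTOR='app.kubernetes.io/managed-by=ibm-elastic-operator'

# list_elastic_pvcs PROJECT: prints the PVCs that the delete command would remove.
list_elastic_pvcs() {
  oc get pvc -l "$ELASTIC_PVC_SELECTOR" -n "$1"
}

# delete_elastic_pvcs PROJECT: shows the PVCs, then deletes them only after
# the user answers "y" (needs an active oc login).
delete_elastic_pvcs() {
  local project=$1 answer
  list_elastic_pvcs "$project" || return 1
  printf 'Delete these PVCs in %s? [y/N] ' "$project" >&2
  read -r answer
  [ "$answer" = "y" ] || { echo "Nothing deleted."; return 1; }
  oc delete pvc -l "$ELASTIC_PVC_SELECTOR" -n "$project"
}

# Example (not run automatically): delete_elastic_pvcs <project>
```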