Online upgrade of IBM Cloud Pak for AIOps on Linux

Use these instructions to upgrade an online deployment of IBM Cloud Pak for AIOps 4.12.0 to 4.13.0.

Overview

This procedure is for deployments of IBM Cloud Pak for AIOps on Linux. You can use it to upgrade an online deployment of IBM Cloud Pak for AIOps 4.12.0, including deployments that have hotfixes applied.

You cannot use these instructions to upgrade deployments of IBM Cloud Pak for AIOps 4.11.1 or earlier. For more information, see Upgrade paths.

If you have an offline deployment of IBM Cloud Pak for AIOps on Linux, follow the instructions in Offline upgrade of IBM Cloud Pak for AIOps on Linux.

If you have a deployment of IBM Cloud Pak for AIOps on Red Hat OpenShift Container Platform, follow the instructions in Upgrading IBM Cloud Pak for AIOps on OpenShift.

Upgrade paths

IBM Cloud Pak for AIOps uses a versioning system of X.Y.Z, where X=version, Y=release, and Z=patch. For example, IBM Cloud Pak for AIOps 4.11.1 is version 4, release 11, patch 1. IBM Cloud Pak for AIOps upgrades can be release upgrades or patch upgrades.

You can upgrade by a maximum of one release at a time. If the release that you want to upgrade from is more than one release behind the release that you are upgrading to, you must upgrade sequentially. For example, you cannot upgrade directly from IBM Cloud Pak for AIOps 4.8.0 to v4.13.0. Instead, you must use the following sequence:

  1. Use the v4.9.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.8.0 to v4.9.1.
  2. Use the v4.10.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.9.1 to v4.10.1.
  3. Use the v4.11.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.10.1 to v4.11.1.
  4. Use the v4.12.0 documentation to upgrade from IBM Cloud Pak for AIOps 4.11.1 to v4.12.0.
  5. Use this documentation, the v4.13.0 documentation, to upgrade from IBM Cloud Pak for AIOps 4.12.0 to v4.13.0.
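The following Bash helper expresses the sequencing rule above. It is a minimal sketch that knows only the releases named in this section (4.8.0 through 4.13.0). The `upgrade_path` function and the `RELEASES` array are hypothetical and are not part of aiopsctl. For other releases or patch levels, see Upgrade paths in the documentation for the matching release.

```shell
#!/usr/bin/env bash
# Releases named in the upgrade sequence in this section, oldest first.
RELEASES=(4.8.0 4.9.1 4.10.1 4.11.1 4.12.0 4.13.0)

# upgrade_path FROM [TO]: prints one "FROM -> TO" hop per line.
# Returns 1 if either version is not in RELEASES or FROM is not older than TO.
upgrade_path() {
  local from=$1 to=${2:-4.13.0} i start=-1 end=-1
  for i in "${!RELEASES[@]}"; do
    [[ ${RELEASES[i]} == "$from" ]] && start=$i
    [[ ${RELEASES[i]} == "$to" ]] && end=$i
  done
  if (( start < 0 || end < 0 || start >= end )); then
    echo "no supported path from ${from} to ${to}" >&2
    return 1
  fi
  # Each hop is one release upgrade, done with that release's documentation.
  for (( i = start; i < end; i++ )); do
    echo "${RELEASES[i]} -> ${RELEASES[i + 1]}"
  done
}
```

For example, `upgrade_path 4.10.1` prints three hops: 4.10.1 -> 4.11.1, 4.11.1 -> 4.12.0, and 4.12.0 -> 4.13.0.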

Before you begin

Ensure that you meet the following prerequisites:
  • The worker nodes and the client machine that you are running the upgrade from have network connectivity to the control plane nodes.
  • You have the credentials for the root user. You must use the root user to upgrade IBM Cloud Pak for AIOps.

The aiopsctl tool is supported only on x86_64 (amd64) architecture.
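You can spot-check the prerequisites above from the client machine with a sketch like the following. It assumes key-based SSH access, so that `ssh -o BatchMode=yes` fails instead of prompting for a password. It also assumes that `TARGET_USER` is the user that you run the upgrade as. `check_node` is a hypothetical helper, not an aiopsctl command.

```shell
#!/usr/bin/env bash
# check_node USER@HOST: verifies that HOST is reachable over SSH, that it reports an
# x86_64 (amd64) architecture, and that the SSH user is root (UID 0).
check_node() {
  local target=$1 out arch uid
  # BatchMode makes ssh fail instead of prompting for a password.
  if ! out=$(ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" 'uname -m; id -u'); then
    echo "${target}: not reachable over SSH" >&2
    return 1
  fi
  # The first output line is the architecture and the second is the numeric UID.
  { read -r arch; read -r uid; } <<< "$out"
  case "$arch" in
    x86_64|amd64) ;;
    *) echo "${target}: unsupported architecture ${arch}" >&2; return 1 ;;
  esac
  if [ "$uid" != 0 ]; then
    echo "${target}: connected as UID ${uid}, not root" >&2
    return 1
  fi
}
```

For example, after you source aiops_var.sh in step 1, run `check_node "${TARGET_USER}@${CONTROL_PLANE_NODE}"`. Worker nodes might not be directly reachable from the client machine. The upgrade steps reach them through the control plane node.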

Warning:
  • Custom patches, labels, and manual adjustments to IBM Cloud Pak for AIOps resources (such as increased CPU and memory values) are lost when IBM Cloud Pak for AIOps is upgraded, and you must manually reapply them after the upgrade. The upgrade triggers a reconciliation, which reverts manual adjustments to their original default values.
  • The upgrade cannot be removed or rolled back.
  • If you use Instana to monitor IBM Cloud Pak for AIOps, take the following actions to avoid operational issues and installation and upgrade failures:

Upgrade considerations

The following table gives an indication of the duration of the v4.13.0 upgrade relative to previous upgrades of IBM Cloud Pak for AIOps, and whether more resources are needed.
Upgrade path        Upgrade duration and downtime    Resources required for upgrade
4.12.0 -> 4.13.0    Average                          More
Important: For more information about the resource requirements for IBM Cloud Pak for AIOps 4.13.0, see Overall cluster requirements.

1. Ensure cluster readiness

Recommended: Take a backup of IBM Cloud Pak for AIOps before you upgrade to v4.13.0. For more information, see Back up and restore (IBM Cloud Pak for AIOps on Linux).
  1. Ensure that your cluster still meets all the prerequisites for deployment. For more information, see Planning an installation of IBM Cloud Pak for AIOps on Linux.

  2. Set environment variables.

    Run the following command from the directory that contains the aiops_var.sh script. This command sets the environment variables that later steps use.
    . ./aiops_var.sh
    Note: If you no longer have the aiops_var.sh script from when you installed IBM Cloud Pak for AIOps, follow the instructions in Create environment variables.
  3. Delete any evicted connector-orchestrator pods.

    1. Run the following command to check whether there are any evicted connector-orchestrator pods.
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc get pods -n aiops | grep connector-orchestrator
    2. Clean up any evicted connector-orchestrator pods.

      If the previous command returned any pods with a STATUS of Evicted, then run the following command to delete each of them.
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc delete pod -n aiops <connector_orchestrator>

      Where <connector_orchestrator> is the name of an evicted pod that the previous command returned.
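You can combine the two substeps of step 3 into a single pass. The following sketch assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) and the TARGET_USER and CONTROL_PLANE_NODE variables from aiops_var.sh. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# evicted_orchestrator_pods: reads `oc get pods --no-headers` output on stdin and
# prints the NAME of each connector-orchestrator pod whose STATUS (column 3) is Evicted.
evicted_orchestrator_pods() {
  awk '$1 ~ /connector-orchestrator/ && $3 == "Evicted" { print $1 }'
}

# delete_evicted_orchestrator_pods: lists the pods in the aiops namespace on the
# control plane node and deletes each evicted connector-orchestrator pod.
delete_evicted_orchestrator_pods() {
  local target="${TARGET_USER}@${CONTROL_PLANE_NODE}" pod
  ssh "$target" oc get pods -n aiops --no-headers \
    | evicted_orchestrator_pods \
    | while read -r pod; do
        echo "Deleting evicted pod ${pod}"
        # -n stops ssh from reading the loop's stdin, which holds the remaining pod names.
        ssh -n "$target" oc delete pod -n aiops "$pod"
      done
}
```

The `-n` flag on the inner `ssh` matters. Without it, the first delete command consumes the rest of the pod list, and only one pod is deleted.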

2. Update the aiopsctl tool on the cluster nodes

Update the aiopsctl tool on all the cluster nodes.

Note: When you run the command to update the first control plane node, you are prompted to accept the upgrade. The commands for the subsequent nodes use the -y flag to bypass this prompt.
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl update --version v4.13.0

echo "Upgrading main control plane node ${CONTROL_PLANE_NODE}"
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=control-plane

echo "Upgrading additional control plane nodes"
for CP_NODE in "${ADDITIONAL_CONTROL_PLANE_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl update -y --version v4.13.0
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl cluster node up -y --accept-license=${ACCEPT_LICENSE} --role=control-plane
done

echo "Upgrading worker nodes"
for WORKER_NODE in "${WORKER_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl update -y --version v4.13.0
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl cluster node up -y --accept-license=${ACCEPT_LICENSE} --role=worker
done
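The loops above continue to the next node even if a command fails on the current node. If you want the upgrade to stop at the first failure, you can use a sketch like the following, which wraps the same commands. It makes these assumptions:
  • The variables from aiops_var.sh are set.
  • The main control plane node was already upgraded interactively with the first two commands above.

The names `require_vars`, `upgrade_node`, and `upgrade_remaining_nodes` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# require_vars NAME...: fails if any named variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "${name} is not set; source aiops_var.sh first" >&2
      missing=1
    fi
  done
  return "$missing"
}

# upgrade_node NODE ROLE: runs the same two aiopsctl commands as the loops above on
# NODE, through the main control plane node, and stops at the first failure.
upgrade_node() {
  local node=$1 role=$2
  ssh "${TARGET_USER}@${CONTROL_PLANE_NODE}" ssh "${TARGET_USER}@${node}" \
    aiopsctl update -y --version v4.13.0 || return
  ssh "${TARGET_USER}@${CONTROL_PLANE_NODE}" ssh "${TARGET_USER}@${node}" \
    aiopsctl cluster node up -y --accept-license="${ACCEPT_LICENSE}" --role="${role}"
}

# upgrade_remaining_nodes: upgrades the additional control plane nodes, then the
# worker nodes, and returns nonzero as soon as one node fails.
upgrade_remaining_nodes() {
  local node
  require_vars TARGET_USER CONTROL_PLANE_NODE ACCEPT_LICENSE || return
  for node in "${ADDITIONAL_CONTROL_PLANE_NODES[@]}"; do
    upgrade_node "$node" control-plane || { echo "Upgrade failed on ${node}" >&2; return 1; }
  done
  for node in "${WORKER_NODES[@]}"; do
    upgrade_node "$node" worker || { echo "Upgrade failed on ${node}" >&2; return 1; }
  done
}
```

The additional control plane nodes are upgraded before the worker nodes, which matches the order of the loops above.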

3. Upgrade IBM Cloud Pak for AIOps

Run the following command to upgrade IBM Cloud Pak for AIOps to v4.13.0.
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}"
Note: You are prompted to accept the upgrade.

The aiopsctl server up command checks for certificates that are expired or about to expire. You must renew any certificates that it identifies as expired or expiring soon. You can use the cert-manager command-line tool (cmctl) to renew them.
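To see which certificates are close to expiry before you run the upgrade, you can list cert-manager Certificate resources together with their `status.notAfter` timestamps. The following sketch is based on several assumptions:
  • The certificates are in the `aiops` namespace.
  • GNU `date` is available on the client machine.
  • `cmctl` is installed on the control plane node.
  • A 30-day threshold is a reasonable definition of "expiring soon". The threshold that aiopsctl uses is not stated here.

The function names are hypothetical.

```shell
#!/usr/bin/env bash
# expiring_certs NOW_EPOCH DAYS: reads "NAME NOT_AFTER" lines on stdin and prints
# each NAME whose NOT_AFTER timestamp (RFC 3339) falls within DAYS days of NOW_EPOCH.
# Already-expired certificates are included. Lines that cannot be parsed, such as
# "<none>" for a Certificate that has not been issued yet, are skipped.
expiring_certs() {
  local now=$1 days=$2 name not_after expiry
  local limit=$(( now + days * 86400 ))
  while read -r name not_after; do
    [ -n "$not_after" ] || continue
    expiry=$(date -d "$not_after" +%s 2>/dev/null) || continue  # GNU date assumed
    if [ "$expiry" -le "$limit" ]; then
      echo "$name"
    fi
  done
}

# renew_expiring_certs [DAYS]: renews aiops certificates that expire within DAYS
# days (default 30) by running cmctl on the control plane node.
renew_expiring_certs() {
  local days=${1:-30} target="${TARGET_USER}@${CONTROL_PLANE_NODE}" name
  ssh "$target" oc get certificates.cert-manager.io -n aiops --no-headers \
      -o custom-columns=NAME:.metadata.name,NOT_AFTER:.status.notAfter \
    | expiring_certs "$(date +%s)" "$days" \
    | while read -r name; do
        echo "Renewing certificate ${name}"
        # -n keeps ssh from consuming the remaining certificate names on stdin.
        ssh -n "$target" cmctl renew -n aiops "$name"
      done
}
```

The sketch separates parsing from renewal so that you can first review the output of `expiring_certs` on its own, and then run `renew_expiring_certs`.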

4. Verify your deployment

The upgrade takes around an hour to complete. If the upgrade is unsuccessful, an error message is displayed and a nonzero exit code is returned.
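A failed upgrade returns a nonzero exit code, and `ssh` passes the exit code of the remote command back to the client, so you can check the result in a script. The following minimal sketch wraps any command, reports the elapsed time and exit code, and preserves that exit code. `timed_run` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# timed_run CMD [ARGS...]: runs CMD, reports the elapsed time and exit code on
# stderr, and returns CMD's exit code unchanged.
timed_run() {
  local start=$SECONDS rc=0
  "$@" || rc=$?
  echo "Finished after $(( (SECONDS - start) / 60 )) min with exit code ${rc}" >&2
  return "$rc"
}
```

For example: `timed_run ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}"`. The wrapper does not redirect stdin, so the upgrade prompt still works. Note that `ssh` returns 255 for its own connection errors.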

  1. Check the status of your deployment.

    Run the following command to check the status of the components of your IBM Cloud Pak for AIOps installation:
    aiopsctl status
    Example output for a healthy deployment:
    $ aiopsctl status
    o- [12 Aug 24 08:40 PDT] Getting cluster status
    Control Plane Node(s):
        test-server-1.acme.com Ready
        test-server-2.acme.com Ready
        test-server-3.acme.com Ready
    
    Worker Node(s):
        test-agent-1.acme.com Ready
        test-agent-2.acme.com Ready
        test-agent-3.acme.com Ready
        test-agent-4.acme.com Ready
        test-agent-5.acme.com Ready
        test-agent-6.acme.com Ready
        test-agent-7.acme.com Ready
    
    o- [13 Mar 26 08:40 PDT] Checking AIOps installation status
    
      17 Ready Components
        cassandra
        commonservice
        aimanager
        cluster.aiops-orchestrator-postgres
        aiopsui
        zenservice
        cluster.opensearch
        aiopsedge
        baseui
        lifecycletrigger
        lifecycleservice
        rediscp
        aiopsanalyticsorchestrator
        kafka
        issueresolutioncore
        zookeeper
        asm
    
      AIOps installation healthy
    
  2. Run the following command and check that the VERSION that is returned is 4.13.0.
    aiopsctl server version
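You can script both checks. The following sketch makes these assumptions:
  • `aiopsctl status` prints node lines as `<hostname> <state>` and the summary line `AIOps installation healthy`, as in the example output above.
  • The output of `aiopsctl server version` contains the version string.

The function names are hypothetical.

```shell
#!/usr/bin/env bash
# status_is_healthy: reads `aiopsctl status` output on stdin. Succeeds only if the
# output contains "AIOps installation healthy" and every node line under a
# "Node(s):" heading reports Ready.
status_is_healthy() {
  awk '
    /^[ \t]*o- /                 { in_nodes = 0 }         # a new "o- [...]" section ends the node list
    /Node\(s\):/                 { in_nodes = 1; next }
    in_nodes && NF >= 2 && $2 != "Ready" { bad++ }
    /AIOps installation healthy/ { healthy = 1 }
    END { exit !(healthy && !bad) }
  '
}

# version_matches EXPECTED: reads `aiopsctl server version` output on stdin and
# succeeds if it contains the EXPECTED version string (a plain substring match).
version_matches() {
  grep -qF -- "$1"
}
```

For example, on the control plane node: `aiopsctl status | status_is_healthy && aiopsctl server version | version_matches 4.13.0 && echo "Upgrade to 4.13.0 verified"`.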

If the upgrade fails, or is incomplete and not progressing, see Troubleshooting installation and upgrade and Known Issues to help you identify the problem.

5. Post upgrade actions

  1. Trigger a metric anomaly detection training run.

    Metric anomaly detection might fail silently until a metric anomaly detection training run is done.

    If you have metric anomaly detection configured, then you must manually trigger a metric anomaly detection training run. For more information about triggering a training run, see Setting up training for metric anomaly detection.

    If you do not have metric anomaly detection configured, then proceed to the next step.

  2. If you previously took a backup of your deployment, take a new backup. For more information, see Back up and restore (IBM Cloud Pak for AIOps on Linux).

  3. If you set the EXPIRY_SECONDS environment variable to configure log anomaly alerts, the variable is not retained during the upgrade. After the upgrade is complete, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.