Online upgrade of IBM Cloud Pak for AIOps on Linux
Use these instructions to upgrade an online deployment of IBM Cloud Pak for AIOps 4.12.0 to 4.13.0.
Overview
This procedure is for deployments of IBM Cloud Pak for AIOps on Linux. It can be used on an online deployment of IBM Cloud Pak for AIOps 4.12.0, including deployments that have hotfixes applied.
You cannot use these instructions to upgrade deployments of IBM Cloud Pak for AIOps 4.11.1 or earlier. For more information, see Upgrade paths.
If you have an offline deployment of IBM Cloud Pak for AIOps on Linux, follow the instructions in Offline upgrade of IBM Cloud Pak for AIOps on Linux.
If you have a deployment of IBM Cloud Pak for AIOps on Red Hat OpenShift Container Platform, follow the instructions in Upgrading IBM Cloud Pak for AIOps on OpenShift.
Upgrade paths
IBM Cloud Pak for AIOps uses a versioning system of X.Y.Z, where X=version, Y=release, and Z=patch. For example, IBM Cloud Pak for AIOps 4.11.1 is version 4, release 11, patch 1. IBM Cloud Pak for AIOps upgrades can be release upgrades or patch upgrades.
You can upgrade by a maximum of one release at a time. If the release that you want to upgrade from is more than one release behind the release that you are upgrading to, then you must upgrade sequentially. For example, you cannot upgrade directly from IBM Cloud Pak for AIOps 4.8.0 to v4.13.0. Instead, you must use the following sequence:
- Use the v4.9.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.8.0 to v4.9.1.
- Use the v4.10.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.9.1 to v4.10.1.
- Use the v4.11.1 documentation to upgrade from IBM Cloud Pak for AIOps 4.10.1 to v4.11.1.
- Use the v4.12.0 documentation to upgrade from IBM Cloud Pak for AIOps 4.11.1 to v4.12.0.
- Use this documentation, the v4.13.0 documentation, to upgrade from IBM Cloud Pak for AIOps 4.12.0 to v4.13.0.
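The one-release-at-a-time rule can be sketched as a quick pre-flight check in shell. This is illustrative only; the `check_upgrade_path` helper is not part of aiopsctl:

```shell
#!/bin/sh
# Illustrative helper (not part of aiopsctl): check that an upgrade moves
# forward by at most one release (the Y in X.Y.Z), within the same version (X).
check_upgrade_path() {
  cur_ver=$(echo "$1" | cut -d. -f1); cur_rel=$(echo "$1" | cut -d. -f2)
  tgt_ver=$(echo "$2" | cut -d. -f1); tgt_rel=$(echo "$2" | cut -d. -f2)
  step=$((tgt_rel - cur_rel))
  [ "$cur_ver" = "$tgt_ver" ] && [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

check_upgrade_path 4.12.0 4.13.0 && echo "4.12.0 -> 4.13.0: supported"
check_upgrade_path 4.8.0 4.13.0 || echo "4.8.0 -> 4.13.0: upgrade sequentially"
```

A patch upgrade within the same release (for example, 4.12.0 to 4.12.1) also passes the check, because the release number does not change.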
Before you begin
- The worker nodes and the client machine from which you are running the upgrade have network connectivity to the control plane nodes.
- You have the credentials for the root user. The root user must be used to upgrade IBM Cloud Pak for AIOps.
- The aiopsctl tool is supported only on x86_64 (amd64) architecture.
- Custom patches, labels, and manual adjustments to IBM Cloud Pak for AIOps resources (such as increased CPU and memory values) are lost when IBM Cloud Pak for AIOps is upgraded, and must be manually reapplied after upgrade. Upgrade triggers a reconciliation which causes manually implemented adjustments to be reverted to their original default values.
- The upgrade cannot be removed or rolled back.
- If you use Instana to monitor IBM Cloud Pak for AIOps, take the following actions to avoid operational issues and installation and upgrade failures:
- Ensure that Instana AutoTrace is disabled for the IBM Cloud Pak for AIOps namespace. For more information, see Instana AutoTrace causes pod eviction and prevents install and upgrade.
- Increase the ephemeral storage for the Flink pods. For more information, see Monitoring with Instana can cause pod eviction if there is insufficient ephemeral storage.
Upgrade considerations
| Upgrade path | Upgrade duration and downtime | Resources required for upgrade |
|---|---|---|
| 4.12.0 -> 4.13.0 | Average | More |
Upgrade procedure
1. Ensure cluster readiness
- Ensure that your cluster still meets all the prerequisites for deployment. For more information, see Planning an installation of IBM Cloud Pak for AIOps on Linux.
- Set environment variables.

  Run the following command from the directory that the script is in, to set the environment variables that are used later.

  ```shell
  . ./aiops_var.sh
  ```

  Note: If you do not still have aiops_var.sh from when you installed IBM Cloud Pak for AIOps, then follow the instructions in Create environment variables.
- Delete any evicted connector-orchestrator pods.
  - Run the following command to check whether there are any evicted connector-orchestrator pods.

    ```shell
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc get pods -n aiops | grep connector-orchestrator
    ```
  - Clean up any evicted connector-orchestrator pods.

    If the previous command returned any pods with a STATUS of Evicted, then run the following command to delete each of them.

    ```shell
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc delete pod -n aiops <connector_orchestrator>
    ```

    Where <connector_orchestrator> is a pod returned by the previous step.
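The check-and-delete steps for evicted pods can also be combined into a single pass. The following sketch is not from the product documentation: the `evicted_connector_pods` helper name is hypothetical, and the commented lines show how it could be wired to the same cluster commands used above.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` output on stdin and print the
# names of connector-orchestrator pods whose STATUS column is Evicted.
# Default `oc get pods` columns are: NAME READY STATUS RESTARTS AGE.
evicted_connector_pods() {
  awk '$1 ~ /^connector-orchestrator/ && $3 == "Evicted" { print $1 }'
}

# Against a live cluster (uses the variables set by aiops_var.sh):
# ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc get pods -n aiops \
#   | evicted_connector_pods \
#   | xargs -r -n1 ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc delete pod -n aiops
```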
2. Update the aiopsctl tool on the cluster nodes
Update the aiopsctl tool on all the cluster nodes. The first aiopsctl update command asks for confirmation before it updates the node; the remaining commands use the -y flag to bypass this prompt.

```shell
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl update --version v4.13.0

echo "Upgrading main control plane node ${CONTROL_PLANE_NODE}"
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=control-plane

echo "Upgrading additional control plane nodes"
for CP_NODE in "${ADDITIONAL_CONTROL_PLANE_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl update -y --version v4.13.0
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl cluster node up -y --accept-license=${ACCEPT_LICENSE} --role=control-plane
done

echo "Upgrading worker nodes"
for WORKER_NODE in "${WORKER_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl update -y --version v4.13.0
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl cluster node up -y --accept-license=${ACCEPT_LICENSE} --role=worker
done
```
3. Upgrade IBM Cloud Pak for AIOps
Run the following command to upgrade IBM Cloud Pak for AIOps.

```shell
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}"
```

The aiopsctl server up command checks for expired and soon-to-expire certificates. If any certificates are identified as expired or expiring soon, they must be renewed. You can use the cert-manager command line tool to renew the necessary certificates.
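As a hedged sketch of that renewal step, assuming the cert-manager CLI (cmctl) is installed and the affected certificates live in the aiops namespace (adjust the namespace and certificate names for your deployment):

```shell
# List certificates and their readiness (assumed namespace: aiops):
#   oc get certificates -n aiops
# Trigger renewal of one certificate, or of all certificates in the namespace:
#   cmctl renew <certificate-name> -n aiops
#   cmctl renew --all -n aiops
```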
4. Verify your deployment
The upgrade takes around an hour to complete. If the upgrade is unsuccessful, an error message is displayed and a nonzero exit code is returned.
- Check the status of your deployment.

  Run the following command to check the status of the components of your IBM Cloud Pak for AIOps installation:

  ```shell
  aiopsctl status
  ```

  Example output for a healthy deployment:

  ```
  $ aiopsctl status
  o- [12 Aug 24 08:40 PDT] Getting cluster status
  Control Plane Node(s):
    test-server-1.acme.com  Ready
    test-server-2.acme.com  Ready
    test-server-3.acme.com  Ready
  Worker Node(s):
    test-agent-1.acme.com   Ready
    test-agent-2.acme.com   Ready
    test-agent-3.acme.com   Ready
    test-agent-4.acme.com   Ready
    test-agent-5.acme.com   Ready
    test-agent-6.acme.com   Ready
    test-agent-7.acme.com   Ready
  o- [13 Mar 26 08:40 PDT] Checking AIOps installation status
  17 Ready Components
    cassandra
    commonservice
    aimanager
    cluster.aiops-orchestrator-postgres
    aiopsui
    zenservice
    cluster.opensearch
    aiopsedge
    baseui
    lifecycletrigger
    lifecycleservice
    rediscp
    aiopsanalyticsorchestrator
    kafka
    issueresolutioncore
    zookeeper
    asm
  AIOps installation healthy
  ```
- Run the following command and check that the VERSION that is returned is 4.13.0.

  ```shell
  aiopsctl server version
  ```
If the upgrade fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.
5. Post upgrade actions
- Trigger a metric anomaly detection training run.

  The detection of metric anomalies might fail silently, without warning, until a metric anomaly detection training run is done.

  If you have metric anomaly detection configured, then you must manually trigger a metric anomaly detection training run. For more information about triggering a training run, see Setting up training for metric anomaly detection.

  If you do not have metric anomaly detection configured, then proceed to the next step.
- If you previously took a backup of your deployment, it is recommended that you take a new backup. For more information, see Back up and restore (IBM Cloud Pak for AIOps on Linux).
- If the EXPIRY_SECONDS environment variable was set to configure log anomaly alerts, it was not retained during the upgrade. After the upgrade is complete, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.
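For example, an expiry of six hours can be expressed in seconds before it is reapplied. The commented command is a sketch only: the deployment name is a placeholder, and the exact resource to update is described in Configuring expiry time for log anomaly alerts.

```shell
#!/bin/sh
# Compute the expiry in seconds (example: 6 hours).
EXPIRY_SECONDS=$((6 * 60 * 60))
echo "EXPIRY_SECONDS=${EXPIRY_SECONDS}"   # EXPIRY_SECONDS=21600

# Reapply it to the resource that consumes it (placeholder deployment name):
# ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc set env deployment/<deployment-name> \
#   -n aiops EXPIRY_SECONDS=${EXPIRY_SECONDS}
```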