Offline upgrade of IBM Cloud Pak for AIOps on Linux
Use these instructions to upgrade an offline deployment of IBM Cloud Pak® for AIOps 4.7.0 or later to 4.8.0.
Overview
This procedure is for deployments of IBM Cloud Pak for AIOps on Linux®. This procedure can be used on an offline deployment of IBM Cloud Pak for AIOps 4.7.0 or later, and can still be used if the deployment has hotfixes applied.
You cannot use these instructions to upgrade deployments of IBM Cloud Pak for AIOps 4.6.1 or earlier; upgrade of deployments older than 4.7.0 is not supported. If you have a deployment of IBM Cloud Pak for AIOps older than 4.7.0, you must uninstall it and then install 4.8.0.
If you have an online deployment of IBM Cloud Pak for AIOps on Linux, follow the instructions in Online upgrade of IBM Cloud Pak for AIOps on Linux.
If you have a deployment of IBM Cloud Pak for AIOps on Red Hat® OpenShift® Container Platform, follow the instructions in Upgrading IBM Cloud Pak for AIOps on OpenShift.
Before you begin
Ensure that you meet the following prerequisites:
- The worker nodes and the client machine that you are running the upgrade from have network connectivity to the control plane nodes.
- You have the credentials for the root user. The root user must be used to upgrade IBM Cloud Pak for AIOps.
Warnings:
- Custom patches, labels, and manual adjustments to IBM Cloud Pak for AIOps resources are lost when IBM Cloud Pak for AIOps is upgraded, and must be manually reapplied after upgrade. For more information, see Manual adjustments are not persisted.
- The upgrade cannot be removed or rolled back.
Upgrade procedure
Follow these steps to upgrade your offline IBM Cloud Pak for AIOps deployment.
1. Ensure cluster readiness
Recommended: Take a backup of IBM Cloud Pak for AIOps before you upgrade to v4.8.0. For more information, see Back up and restore (IBM Cloud Pak for AIOps on Linux).
- Ensure that your cluster still meets all of the prerequisites for an air-gapped deployment. For more information, see Planning an installation of IBM Cloud Pak for AIOps on Linux and Set up the mirroring environment.
- Set environment variables.
  - Set an environment variable for your deployment mode.
    From IBM Cloud Pak for AIOps 4.8.0, deployment mode is configured with a command line argument instead of with a configuration file. Edit the `aiops_var.sh` file that you created when you installed IBM Cloud Pak for AIOps, and update the `DEPLOY_TYPE` environment variable to the required value. Set it to `extended` if you have an extended deployment with log anomaly detection and ticket analysis capabilities, or set it to `base` if you have a base deployment without log anomaly detection and ticket analysis capabilities.
  - Run the following command from the directory that the script is in, to set the environment variables that are used later.
    ```
    . ./aiops_var.sh
    ```
    Note: If you do not still have `aiops_var.sh` from when you installed IBM Cloud Pak for AIOps, then follow the instructions in Create environment variables.
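For reference, a minimal `aiops_var.sh` has the shape sketched below. This is an illustrative sketch only: the variable names are the ones used by the commands in this procedure, but every value is a placeholder that you must replace with the details of your own environment.

```shell
# Sketch of aiops_var.sh (sourced with `. ./aiops_var.sh`).
# All values are placeholders; replace them with your own.
ACCEPT_LICENSE=true                       # set to true only if you accept the license terms
DEPLOY_TYPE=base                          # "base" or "extended"
TARGET_USER=root                          # root must be used for the upgrade
CONTROL_PLANE_NODE=cp1.example.com        # main control plane node
ADDITIONAL_CONTROL_PLANE_NODES=(cp2.example.com cp3.example.com)
WORKER_NODES=(worker1.example.com worker2.example.com)
LOAD_BALANCER_HOST=lb.example.com
TARGET_REGISTRY=registry.example.com:5000 # target registry in the air-gapped environment
TARGET_REGISTRY_USER=registryuser
TARGET_REGISTRY_PASSWORD=registrypassword
SKIP_TLS=false                            # true to skip TLS verification for the target registry
IBM_ENTITLEMENT_KEY=myentitlementkey      # your IBM entitlement key
```

The file is sourced rather than executed, so plain assignments are enough for the variables to be visible in the calling shell.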
- Delete any evicted `connector-orchestrator` pods.
  - Run the following command to check whether there are any evicted `connector-orchestrator` pods.
    ```
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc get pods -n aiops | grep connector-orchestrator
    ```
  - Clean up any evicted `connector-orchestrator` pods.
    If the previous command returned any pods with a STATUS of `Evicted`, then run the following command to delete each of them.
    ```
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc delete pod -n aiops <connector_orchestrator>
    ```
    Where `<connector_orchestrator>` is a pod returned by the previous step.
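If several pods are evicted, you can delete them in one pass. The helper below is a sketch, not part of the product tooling: it assumes the default `oc get pods` column layout, in which the pod name is the first column and STATUS is the third.

```shell
# Hypothetical helper: print the names of evicted connector-orchestrator pods
# from `oc get pods` output (NAME READY STATUS RESTARTS AGE columns assumed).
evicted_pods() {
  awk '$1 ~ /^connector-orchestrator/ && $3 == "Evicted" {print $1}'
}

# Usage sketch, run from the client machine:
#   ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc get pods -n aiops \
#     | evicted_pods \
#     | xargs -r -n1 ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc delete pod -n aiops
```

The `xargs -r` flag avoids running `oc delete` at all when no pods are evicted.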
2. Mirror the images
- Connect your bastion host to the internet and disconnect it from the air-gapped environment.
- Run the following commands to download the new version of the `aiopsctl` tool and mirror the new images.
  ```
  AIOPSCTL_TAR="aiopsctl-linux_amd64.tar.gz"
  AIOPSCTL_INSTALL_URL="https://github.com/IBM/aiopsctl/releases/download/v4.8.0/${AIOPSCTL_TAR}"
  curl -LO "${AIOPSCTL_INSTALL_URL}"
  tar xf "${AIOPSCTL_TAR}"
  mv aiopsctl /usr/local/bin/aiopsctl
  aiopsctl bastion login cp.icr.io -u cp -p ${IBM_ENTITLEMENT_KEY}
  aiopsctl bastion login ${TARGET_REGISTRY} -u ${TARGET_REGISTRY_USER} -p ${TARGET_REGISTRY_PASSWORD} --insecure-skip-tls-verify="${SKIP_TLS}"
  unset IBM_ENTITLEMENT_KEY
  aiopsctl bastion mirror-images --registry ${TARGET_REGISTRY}
  ```
  You can use the following command to monitor the mirroring process:
  ```
  tail -f /root/.aiopsctl/logs/aiopsctl.log
  ```
- Disconnect the bastion host from the internet, and connect it to the air-gapped environment.
3. Update the aiopsctl tool on the cluster nodes
Run the following commands to update the `aiopsctl` tool on all the cluster nodes.

Note: When the command to update the first control plane node is run, you are prompted whether you want to accept the upgrade. Subsequent nodes use the `-y` flag to bypass this prompt.

```
scp /usr/local/bin/aiopsctl ${TARGET_USER}@${CONTROL_PLANE_NODE}:/usr/local/bin/aiopsctl
echo "Upgrading main control plane node ${CONTROL_PLANE_NODE}"
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=control-plane --offline

echo "Upgrading additional control plane nodes"
for CP_NODE in "${ADDITIONAL_CONTROL_PLANE_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} scp /usr/local/bin/aiopsctl ${TARGET_USER}@${CP_NODE}:/usr/local/bin/aiopsctl
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} -y --role=control-plane --offline
done

echo "Upgrading worker nodes"
for WORKER_NODE in "${WORKER_NODES[@]}"; do
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} scp /usr/local/bin/aiopsctl ${TARGET_USER}@${WORKER_NODE}:/usr/local/bin/aiopsctl
  ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} -y --role=worker --offline
done
```
4. Upgrade IBM Cloud Pak for AIOps
Run the following command to upgrade IBM Cloud Pak for AIOps to v4.8.0.
```
ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}"
```
Note: You are prompted whether you want to accept the upgrade.
5. Verify your deployment
The upgrade takes around an hour to complete. If the upgrade is unsuccessful, an error message is displayed and a nonzero exit code is returned.
- Check the status of your deployment.
  Run the following command to check the status of the components of your IBM Cloud Pak for AIOps installation:
  ```
  aiopsctl status
  ```
  Example output for a healthy deployment:
  ```
  $ aiopsctl status
  o- [12 Aug 24 08:40 PDT] Getting cluster status
  Control Plane Node(s):
    test-server-1.acme.com   Ready
    test-server-2.acme.com   Ready
    test-server-3.acme.com   Ready
  Worker Node(s):
    test-agent-1.acme.com    Ready
    test-agent-2.acme.com    Ready
    test-agent-3.acme.com    Ready
    test-agent-4.acme.com    Ready
    test-agent-5.acme.com    Ready
    test-agent-6.acme.com    Ready
    test-agent-7.acme.com    Ready
  o- [12 Aug 24 08:40 PDT] Checking AIOps installation status
  15 Ready Components
    cluster  aimanager  lifecycletrigger  aiopsanalyticsorchestrator  baseui
    elasticsearchcluster  aiopsedge  rediscp  asm  zenservice
    aiopsui  commonservice  kafka  issueresolutioncore  lifecycleservice
  AIOps installation healthy
  ```
- Run the following command and check that the VERSION that is returned is 4.8.0.
  ```
  aiopsctl server version
  ```
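If you want to script this verification, a small helper like the one below can gate the rest of an automation pipeline. It is a hypothetical sketch, not part of `aiopsctl`, and assumes only that the version string appears verbatim in the command output.

```shell
# Hypothetical helper: succeed only if the expected version string
# appears in the output piped to it.
expect_version() {
  grep -q -F "$1"
}

# Usage sketch:
#   ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server version \
#     | expect_version 4.8.0 || echo "upgrade did not complete"
```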
If the upgrade fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.
6. Post upgrade actions
- If you previously took a backup of your deployment, it is recommended that you take a new backup after the upgrade. For more information, see Back up and restore (IBM Cloud Pak for AIOps on Linux).
- If the `EXPIRY_SECONDS` environment variable was set to configure log anomaly alerts, the variable was not retained in the upgrade. After the upgrade is completed, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.
- If a metric integration that you have configured stops working after the upgrade, follow the instructions in After upgrade, a metric integration goes into a failed state.
- (Optional) You can use the following steps to remove unnecessary data from your Cloud Pak for AIOps environment.
  Note: Use the following steps only if high availability (HA) is enabled for your Cloud Pak for AIOps deployment.
  Run the following commands from the control plane node.
  - Switch to the project (namespace) where Cloud Pak for AIOps is deployed.
    ```
    oc project aiops
    ```
  - Verify the health of your Cloud Pak for AIOps deployment:
    ```
    oc get installation -o go-template='{{$i := index .items 0}}{{range $c, $s := $i.status.componentstatus}}{{$c}}: {{$s}}{{"\n"}}{{end}}'
    ```
    All the components need to be in `Ready` status.
  - Delete the ZooKeeper data by running the following four commands:
    ```
    oc exec iaf-system-zookeeper-0 -- /opt/kafka/bin/zookeeper-shell.sh 127.0.0.1:12181 deleteall /flink/aiops/ir-lifecycle
    oc exec iaf-system-zookeeper-0 -- /opt/kafka/bin/zookeeper-shell.sh 127.0.0.1:12181 deleteall /flink/aiops/ir-lifecycle2
    oc exec iaf-system-zookeeper-0 -- /opt/kafka/bin/zookeeper-shell.sh 127.0.0.1:12181 deleteall /flink/aiops/ir-lifecycle3
    oc exec iaf-system-zookeeper-0 -- /opt/kafka/bin/zookeeper-shell.sh 127.0.0.1:12181 deleteall /flink/aiops/cp4waiops-eventprocessor
    ```
  - Delete the Issue Resolution (IR) lifecycle metadata by running the following commands:
    ```
    img=$(oc get csv -o jsonpath='{.items[?(@.spec.displayName=="IBM AIOps AI Manager")].spec.install.spec.deployments[?(@.name=="aimanager-operator-controller-manager")].spec.template.metadata.annotations.olm\.relatedImage\.opencontent-minio-client}')
    minio=$(oc get flinkdeployment aiops-ir-lifecycle-flink -o jsonpath='{.spec.flinkConfiguration.s3\.endpoint}')
    oc delete job --ignore-not-found aiops-clean-s3
    cat <<EOF | oc apply --validate -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: aiops-clean-s3
    spec:
      backoffLimit: 6
      parallelism: 1
      template:
        metadata:
          labels:
            component: aiops-clean-s3
          name: clean-s3
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                    - amd64
          containers:
          - command:
            - /bin/bash
            - -c
            - |-
              echo "Connecting to Minio server: $minio"
              try=0
              while true; do
                mc alias set aiopss3 $minio \$(cat /config/accesskey) \$(cat /config/secretkey)
                if [ \$? -eq 0 ]; then break; fi
                try=\$(expr \$try + 1)
                if [ \$try -ge 30 ]; then exit 1; fi
                sleep 2
              done
              /workdir/bin/mc rm -r --force aiopss3/aiops-ir-lifecycle/high-availability/ir-lifecycle
              x=\$?
              /workdir/bin/mc ls aiopss3/aiops-ir-lifecycle/high-availability
              exit \$x
            image: $img
            imagePullPolicy: IfNotPresent
            name: clean-s3
            resources:
              limits:
                cpu: 500m
                memory: 512Mi
              requests:
                cpu: 200m
                memory: 256Mi
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              privileged: false
              readOnlyRootFilesystem: false
              runAsNonRoot: true
            volumeMounts:
            - name: s3-credentials
              mountPath: /config
            - name: s3-ca
              mountPath: /workdir/home/.mc/certs/CAs
          volumes:
          - name: s3-credentials
            secret:
              secretName: aimanager-ibm-minio-access-secret
          - name: s3-ca
            secret:
              items:
              - key: ca.crt
                path: ca.crt
              secretName: aimanager-certificate-secret
          restartPolicy: Never
          serviceAccount: aimanager-workload-admin
          serviceAccountName: aimanager-workload-admin
    EOF
    ```
  - Check the status of the job:
    ```
    oc get po -l component=aiops-clean-s3
    ```
    Verify that the status shows as `Completed`.