Offline upgrade of IBM Cloud Pak for Watson AIOps
Use these instructions to upgrade IBM Cloud Pak® for Watson AIOps 3.7.0 or later to 4.1.0.
This procedure can be used on an offline deployment of IBM Cloud Pak for Watson AIOps 3.7.0 or later, and can still be used if the deployment has had hotfixes applied. If you have an online deployment, follow the instructions in Upgrading IBM Cloud Pak for Watson AIOps (online).
Before you begin
- Ensure that you are logged in to your Red Hat® OpenShift® Container Platform cluster with oc login for any steps that use the OpenShift command-line interface (CLI).
- Red Hat OpenShift Container Platform requires a user with cluster-admin privileges for the following operations:
Warnings:
- Custom patches, labels, and manual adjustments to IBM Cloud Pak for Watson AIOps resources are lost when IBM Cloud Pak for Watson AIOps is upgraded, and must be manually reapplied after upgrade. For more information, see Manual adjustments are not persisted.
- If you previously increased the size of the Kafka PVC directly, then you must follow the correct procedure that is supplied in Increasing the Kafka PVC to ensure that the size is updated by the operator. Failure to do so before upgrading IBM Cloud Pak for Watson AIOps causes the operator to attempt to restore a lower default value for the Kafka PVC, and causes an error in your IBM Cloud Pak for Watson AIOps deployment.
Restrictions:
- You cannot use these instructions to upgrade deployments of IBM Cloud Pak for Watson AIOps 3.6.2 or earlier. For example, you cannot upgrade from IBM Cloud Pak for Watson AIOps 3.6.0 or 3.6.2 to 4.1.0.
- The upgrade cannot be removed or rolled back.
- If you are planning to upgrade to Red Hat OpenShift Container Platform 4.12 as part of an upgrade to IBM Cloud Pak for Watson AIOps 4.1.0, you must complete the IBM Cloud Pak for Watson AIOps upgrade before you upgrade to Red Hat OpenShift Container Platform 4.12.
Upgrade procedure
Follow these steps to upgrade your offline IBM Cloud Pak for Watson AIOps deployment.
- Ensure cluster readiness
- Download CASE files
- Mirror images
- Update the catalog
- Update foundational services
- Scale down AIOpsEdge and AIOps orchestrator
- Create a network policy for log anomaly detection
- Update the operator subscription
- Verify the deployment
- Increase Kafka storage size
- Post upgrade actions
1. Ensure cluster readiness
Recommended: Take a backup before upgrading. For more information, see Backup and restore.
-
Ensure that your cluster still meets all of the prerequisites for an air-gapped deployment.
In IBM Cloud Pak for Watson AIOps 4.1.0, the storage requirements for Kafka have increased to 300 GB (3 persistent volumes (PVs) of 100 GB each) for production deployments, and to 60 GB for starter deployments. Your PVs are already configured with volume expansion enabled, as stated in the Storage class requirements, but you must ensure that there is adequate space for the Kafka PVs to expand before you commence upgrade. You will run the command to increase the storage allocation for Kafka in step 10, Increase Kafka storage size.
Review the steps for your installation approach:
- Bastion host: Prerequisites
- Portable device: Prerequisites
Note: IBM Cloud Pak for Watson AIOps requires Red Hat OpenShift Container Platform version 4.10.46 or higher.
-
Install the IBM Catalog Management Plug-in for IBM Cloud Pak®, if you have not already installed it.
-
Bastion host: Install the IBM Catalog Management Plug-in for IBM Cloud Pak®.
-
Portable device: Install the IBM Catalog Management Plug-in for IBM Cloud Pak®.
-
Download scripts.
-
Download the prerequisite checker script and copy it to your air-gapped environment.
For more information about the script, including how to download and run it, see github.com/IBM.
-
Download the Common Services upgrade script, upgrade_common_services_airgap.sh, and copy it to your air-gapped environment.
For more information about the script, including how to download and run it, see github.com/IBM.
-
Download the IBM Cloud Pak for Watson AIOps uninstall and cleanup script, and copy it to your air-gapped environment.
For more information about the script, including how to download and run it, see github.com/IBM.
-
(Optional) Download the status checker script, and copy it to your air-gapped environment.
For more information about the script, including how to download and run it, see github.com/IBM. The status checker script can be used in step 9, Verify the deployment, to give information about the status of your deployment. The use of this script is optional, as status can be found directly from the ibm-aiops-orchestrator custom resource.
-
Run the IBM Cloud Pak for Watson AIOps prerequisite checker script.
Run the prerequisite checker script to ensure that your Red Hat OpenShift Container Platform cluster is correctly set up for an IBM Cloud Pak for Watson AIOps upgrade. When you run the prerequisite checker script, you must run the script in the same project (namespace) that IBM Cloud Pak for Watson AIOps is installed in.
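As a quick manual check of the minimum Red Hat OpenShift Container Platform version noted earlier in this step, the following sketch compares two dotted version strings. It is not part of the prerequisite checker script. The function name version_ge is hypothetical, and the sketch assumes a sort command that supports the -V (version sort) option, such as GNU sort.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage on a cluster (the clusterversion query is an assumption about
# where your cluster reports its version; adjust as needed):
#   current="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
#   version_ge "$current" "4.10.46" || echo "OpenShift $current is below 4.10.46"
```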
2. Download CASE files
On your OpenShift cluster, rerun step 2 of the air-gap installation procedure Set environment variables and download CASE files to download the latest CASE files.
Follow the steps for your installation approach:
-
Bastion host: Download the CASE
-
Portable device: Download the CASE
3. Mirror images
Rerun step 3 of the air-gap installation procedure to mirror the updated images to the offline registry.
Follow the steps for your installation approach:
-
Bastion host: Mirror images
-
Portable device: Mirror images
4. Update the catalog
Rerun step 5.1 of the air-gap installation procedure Create the catalog source to update your catalog source.
Follow the steps for your installation approach:
-
Bastion host: Create the catalog source
-
Portable device: Create the catalog source
5. Update foundational services
IBM Cloud Pak® foundational services, which is part of your IBM Cloud Pak for Watson AIOps deployment, must be at version 3.23 or higher before you upgrade IBM Cloud Pak for Watson AIOps.
Use the following steps to verify that your ibm-common-service-operator subscription is set to version 3.23 or higher, and to set it to a qualifying version if it is not.
Note: This section uses the scripts that you downloaded earlier in section 1, Ensure cluster readiness.
-
Run the following command to find out what version of foundational services you have installed.
oc get csv -A | grep ibm-common-service-operator
If the version returned is v3.23 or higher, you do not need to update foundational services. Skip the rest of this section and proceed to step 6, Scale down AIOpsEdge and AIOps orchestrator.
-
Run the following command from the directory that you downloaded the Common Services upgrade script to. This script must be run by a user with cluster-admin privileges.
./cp4waiops-samples/upgrade/upgrade_common_services_airgap.sh -a -c v3.23
Important: You must only run this script if your version of foundational services is less than v3.23.
-
When upgrade_common_services_airgap.sh completes, verify that the ibm-common-service-operator channel is set to version 3.23 or higher in the subscription and in the ClusterServiceVersion (csv).
oc get subscription ibm-common-service-operator -n ibm-common-services -o jsonpath='{.spec.channel}{"\n"}'
oc get csv -A | grep ibm-common-service-operator
The foundational services upgrade starts, and takes approximately 30 to 60 minutes.
You can run the following command to check the status of ZenService. When the foundational services upgrade is complete, this command returns a STATUS of Completed. Do not proceed until the upgrade has completed.
oc get zenservice -A -o custom-columns='KIND:.kind,NAME:.metadata.name,NAMESPACE:.metadata.namespace,VERSION:status.currentVersion,STATUS:.status.zenStatus,PROGRESS:.status.Progress,MESSAGE:.status.ProgressMessage'
Example output from a successful foundational services upgrade:
KIND         NAME                 NAMESPACE   VERSION   STATUS      PROGRESS   MESSAGE
ZenService   iaf-zen-cpdservice   cp4waiops   4.8.0     Completed   100%       The Current Operation Is Completed
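Because the foundational services upgrade can take up to an hour, you might prefer to poll the ZenService status rather than rerun the command by hand. The following is a minimal sketch, not an IBM-supplied script. The function name wait_for_zen_completed and its timeout and interval defaults are hypothetical, and the sketch assumes that the cluster has a single ZenService whose status is exposed at .status.zenStatus, as in the custom-columns command above.

```shell
# Hypothetical helper: poll until the ZenService reports Completed.
# $1 = timeout in seconds (default 5400), $2 = polling interval in seconds (default 60).
# Assumes a single ZenService in the cluster.
wait_for_zen_completed() {
  local timeout="${1:-5400}" interval="${2:-60}" elapsed=0 status
  while [ "$elapsed" -lt "$timeout" ]; do
    status="$(oc get zenservice -A -o jsonpath='{.items[*].status.zenStatus}' 2>/dev/null)"
    if [ "$status" = "Completed" ]; then
      echo "ZenService status: Completed"
      return 0
    fi
    echo "ZenService status: ${status:-unknown}; checking again in ${interval}s"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out after ${timeout}s waiting for ZenService" >&2
  return 1
}

# Possible usage:
#   wait_for_zen_completed 5400 60
```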
6. Scale down AIOpsEdge and AIOps orchestrator
-
Scale AIOpsEdge and AIOps orchestrator down to zero replicas.
Run the following commands:
oc scale deployment aiopsedge-operator-controller-manager -n <namespace> --replicas=0
oc scale deployment ibm-aiops-orchestrator-controller-manager -n <namespace> --replicas=0
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.
-
Wait for the AIOpsEdge operator and AIOps orchestrator pods to terminate.
Do not proceed until the following command returns no pods.
oc get pod -n <namespace> | grep -E "aiopsedge-operator-controller-manager|ibm-aiops-orchestrator-controller-manager"
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.
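The scale-down and the wait for pod termination can be combined into one helper. This is a sketch under stated assumptions rather than a supported script: the function name scale_down_and_wait and its timeout and interval defaults are hypothetical, while the deployment names are the ones used in the commands in this step.

```shell
# Hypothetical helper: scale both operators to zero replicas, then wait until
# no matching pods remain.
# $1 = namespace, $2 = timeout in seconds (default 600), $3 = interval in seconds (default 10).
scale_down_and_wait() {
  local ns="$1" timeout="${2:-600}" interval="${3:-10}" elapsed=0 d
  local pattern='aiopsedge-operator-controller-manager|ibm-aiops-orchestrator-controller-manager'
  for d in aiopsedge-operator-controller-manager ibm-aiops-orchestrator-controller-manager; do
    oc scale deployment "$d" -n "$ns" --replicas=0 || return 1
  done
  # Loop while any pod name still matches either operator.
  while oc get pod -n "$ns" --no-headers 2>/dev/null | grep -Eq "$pattern"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out waiting for operator pods to terminate" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Operator pods terminated"
}

# Possible usage:
#   scale_down_and_wait cp4waiops
```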
7. Create a network policy for log anomaly detection
If you plan to use log anomaly detection for new or existing log connections, run the following commands. Replace the AIOPS_NAMESPACE value with the name of the project in which Cloud Pak for Watson AIOps is installed.
AIOPS_NAMESPACE="cp4waiops"
cat << EOF | oc apply -n $AIOPS_NAMESPACE -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app: flink
    cluster: cp4waiops-eventprocessor-eve-29ee-ep
    component: taskmanager
  name: cp4waiops-eventprocessor-eve-29ee-ep-tm-patch
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: flink
          cluster: cp4waiops-eventprocessor-eve-29ee-ep
          component: taskmanager
    - podSelector:
        matchLabels:
          app: flink
          cluster: cp4waiops-eventprocessor-eve-29ee-ep
          component: jobmanager
  - ports:
    - port: 9248
      protocol: TCP
    - port: 6122
      protocol: TCP
    - port: 6121
      protocol: TCP
  podSelector:
    matchLabels:
      app: flink
      cluster: cp4waiops-eventprocessor-eve-29ee-ep
      component: taskmanager
  policyTypes:
  - Ingress
  - Egress
EOF
cat << EOF | oc apply -n $AIOPS_NAMESPACE -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app: flink
    cluster: cp4waiops-eventprocessor-eve-29ee-ep
    component: jobmanager
  name: cp4waiops-eventprocessor-eve-29ee-ep-jm-patch
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: flink
          cluster: cp4waiops-eventprocessor-eve-29ee-ep
          component: taskmanager
    - podSelector:
        matchLabels:
          app: flink
          cluster: cp4waiops-eventprocessor-eve-29ee-ep
          component: jobmanager
  - ports:
    - port: 8081
      protocol: TCP
    - port: 6123
      protocol: TCP
    - port: 6125
      protocol: TCP
    - port: 8080
      protocol: TCP
    - port: 6124
      protocol: TCP
    - port: 9249
      protocol: TCP
  podSelector:
    matchLabels:
      app: flink
      cluster: cp4waiops-eventprocessor-eve-29ee-ep
      component: jobmanager
  policyTypes:
  - Ingress
  - Egress
EOF
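After applying the two manifests, you may want to confirm that both network policies exist before you continue. The following sketch is one way to do that. The function name verify_flink_policies is hypothetical, and the policy names are taken from the manifests in this step.

```shell
# Hypothetical helper: report whether both patch network policies exist in a namespace.
# $1 = namespace. Returns non-zero if either policy is missing.
verify_flink_policies() {
  local ns="$1" missing=0 p
  for p in cp4waiops-eventprocessor-eve-29ee-ep-tm-patch \
           cp4waiops-eventprocessor-eve-29ee-ep-jm-patch; do
    if oc get networkpolicy "$p" -n "$ns" >/dev/null 2>&1; then
      echo "found: $p"
    else
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage:
#   verify_flink_policies "$AIOPS_NAMESPACE"
```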
8. Update the operator subscription
Run the following commands to update the IBM Cloud Pak for Watson AIOps subscription and the operandregistry resources to use the new channel, v4.1.
oc project <namespace>
oc patch operandregistry aiopsedge-base --type=json -p='[{"op": "replace", "path": "/spec/operators/1/channel", "value": "v4.1"}]'
oc patch operandregistry ibm-aiops --type=json -p='[
{"op": "replace", "path": "/spec/operators/0/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/1/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/2/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/3/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/4/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/5/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/7/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/8/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/9/channel", "value": "v4.1"},
{"op": "replace", "path": "/spec/operators/10/channel", "value": "v4.1"}]'
oc patch subscription.operators.coreos.com ibm-aiops-orchestrator -n <namespace> --type=json -p='[{"op": "replace", "path": "/spec/channel", "value": "v4.1"}]'
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps subscription is deployed in. This is your IBM Cloud Pak for Watson AIOps project if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.
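The operandregistry patches in this step are JSON Patch documents: an array of replace operations, one per entry in spec.operators, addressed by array index. To make that structure explicit, the following sketch builds such a patch from a channel name and a list of indices. The function name build_channel_patch is hypothetical. The sketch takes the indices as arguments because the list in this step skips index 6, and it makes no assumption about why.

```shell
# Hypothetical helper: print a JSON Patch array that replaces the channel of
# the given spec.operators indices.
# $1 = channel, remaining arguments = operator indices.
build_channel_patch() {
  local channel="$1" patch="" idx entry
  shift
  for idx in "$@"; do
    entry="$(printf '{"op":"replace","path":"/spec/operators/%s/channel","value":"%s"}' "$idx" "$channel")"
    # Join entries with commas, with no leading comma on the first entry.
    patch="${patch:+$patch,}$entry"
  done
  printf '[%s]\n' "$patch"
}

# Possible usage, equivalent to the ibm-aiops patch in this step:
#   oc patch operandregistry ibm-aiops --type=json \
#     -p="$(build_channel_patch v4.1 0 1 2 3 4 5 7 8 9 10)"
```

Because the patch addresses operators by position, it can be worth reviewing the current spec.operators list of each operandregistry before you apply the patch, for example with oc get operandregistry ibm-aiops -o yaml.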
9. Verify the deployment
9.1 Check the version
Verify that your IBM Cloud Pak for Watson AIOps deployment is successfully upgraded. Run the following command and check that the VERSION that is returned is 4.1.0.
oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.<namespace> -n <namespace>
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in if your deployment is namespace scoped, or openshift-operators if your deployment has a cluster-wide scope.
Example output:
oc get csv -l operators.coreos.com/ibm-aiops-orchestrator.cp4waiops -n cp4waiops
NAME DISPLAY VERSION REPLACES PHASE
ibm-aiops-orchestrator.v4.1.0 IBM Cloud Pak for Watson AIOps AI Manager 4.1.0 ibm-aiops-orchestrator.v3.7.2 Succeeded
9.2 Check the deployment
Run the following command to check that the PHASE of your deployment is Updating.
oc get installations.orchestrator.aiops.ibm.com -n <namespace>
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.
Example output:
NAME PHASE LICENSE STORAGECLASS STORAGECLASSLARGEBLOCK AGE
ibm-cp-watson-aiops Updating Accepted rook-cephfs rook-ceph-block 3m
The upgrade takes around 60 to 90 minutes to complete, depending on how quickly images can be pulled. When the upgrade is complete and successful, the PHASE of your installation changes to Running.
If your installation phase does not change to Running, then use the following command to find out which components are not ready:
oc get installation.orchestrator.aiops.ibm.com -o yaml | grep 'Not Ready'
Example output:
lifecycleservice: Not Ready
zenservice: Not Ready
To see details about why a component is Not Ready, run the following command, where <component> is the component that is not ready, for example zenservice.
oc get <component> -o yaml
(Optional) If you downloaded the status checker script earlier in step 1.3 Ensure cluster readiness, then you can also run this script to see information about the status of your deployment.
If the installation fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.
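If several components are Not Ready, extracting just their names makes it easier to inspect each one in turn. The following sketch filters the YAML output for that purpose. The function name not_ready_components is hypothetical, and the sketch assumes that each status line has the form shown in the example output (a component name, a colon, and Not Ready).

```shell
# Hypothetical helper: read installation YAML on stdin and print the name of
# each component whose status line contains "Not Ready".
not_ready_components() {
  grep 'Not Ready' | sed -E 's/^[[:space:]]*([^:]+):.*/\1/'
}

# Possible usage:
#   oc get installation.orchestrator.aiops.ibm.com -o yaml | not_ready_components |
#     while read -r component; do
#       oc get "$component" -o yaml
#     done
```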
10. Increase Kafka storage size
Run one of the following commands to increase the storage allocation for Kafka.
For production deployments:
oc patch kafka/iaf-system --type merge -p '{"spec":{"kafka":{"storage":{"size":"100Gi"}}}}'
For starter deployments:
oc patch kafka/iaf-system --type merge -p '{"spec":{"kafka":{"storage":{"size":"60Gi"}}}}'
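To reduce the chance of applying the wrong size, the choice between the two commands can be wrapped in a small helper that maps the deployment type to the merge patch. This is a sketch only. The function name kafka_storage_patch and its production and starter arguments are hypothetical labels, while the sizes are the ones given in this step.

```shell
# Hypothetical helper: print the Kafka storage merge patch for a deployment type.
# $1 = production | starter
kafka_storage_patch() {
  local size
  case "$1" in
    production) size="100Gi" ;;
    starter)    size="60Gi" ;;
    *)
      echo "usage: kafka_storage_patch production|starter" >&2
      return 1 ;;
  esac
  printf '{"spec":{"kafka":{"storage":{"size":"%s"}}}}\n' "$size"
}

# Possible usage:
#   oc patch kafka/iaf-system --type merge -p "$(kafka_storage_patch production)"
```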
11. Post upgrade actions
-
If you previously took a backup or scheduled a backup, then use the 4.1.0 backup scripts to take a new backup or schedule a new backup job. For more information, see Backup and restore.
-
If the EXPIRY_SECONDS environment variable was set for configuring log anomaly alerts, the environment variable was not retained in the upgrade. After the upgrade is completed, set the environment variable again. For more information about setting the variable, see Configuring expiry time for log anomaly alerts.
-
If the Access Control page displays custom roles with deprecated permissions after upgrade, see Custom roles with deprecated permissions after upgrade.
-
(Optional) A new field is available in IBM Cloud Pak for Watson AIOps 4.1.0 that you can use to specify the terminology for collections of topology resources as application or service. The default is application. If you want to use service as the terminology for your topology resource collections, then run the following command to patch your custom resource.
oc patch installations.orchestrator.aiops.ibm.com/<namespace> --type merge -p '{"spec":{"topologyModel":"service"}}'
Where <namespace> is the namespace (project) that your IBM Cloud Pak for Watson AIOps installation is deployed in.
-
(Optional) Delete the persistent volume claim (PVC) for training job state data that is no longer required. For more information, see Deleting a persistent volume claim.