Backing up IBM Cloud Pak for AIOps data
Learn how to back up data for IBM Cloud Pak for AIOps components.
The backup process uses Red Hat OpenShift APIs for Data Protection (OADP) for cluster object backups and volume backups. The backup script must be the same version as your deployment.
Note: If you also need to back up data for Infrastructure Automation components, the process and tools are the same. The only change is made when you package the Helm chart: configure the enabledComponents parameter to include '"IA"' as well as '"AIOPS"'. All other steps remain the same.
Prerequisites
-
Ensure that you are logged in to your Red Hat OpenShift cluster with oc login for any steps that use the Red Hat OpenShift command-line interface (CLI).
-
If they are not already installed, install the backup and restore tools. You need to install the Red Hat® OpenShift® Container Platform APIs for Data Protection (OADP) in the Red Hat OpenShift Container Platform cluster. For more information, see Installing the backup and restore tools.
-
Provision S3-compliant object storage for your backups. You must have sufficient storage to back up the data that is contained in the datastores and persistent volume claims (PVCs) of the components that are backed up, as detailed in Backup process. The storage must also be accessible by any environment that a backup might need to be restored to. The backup process moves backup files to an S3 bucket, which must be in an S3-compatible location. For instance, you can use the following public or private cloud options for provisioning storage for your backups:
- MinIO
- Red Hat OpenShift Data Foundation object store
- IBM Cloud Object Storage
- AWS Cloud
Important: Ensure that OADP is configured to point to the same object storage (S3 bucket) that includes the backup that you plan to use.
Note: As your data grows, the capacity of your backup storage might need to grow, and you must ensure that sufficient storage for backups is provisioned to accommodate any growth in backup data.
Backup procedure
Follow these steps to back up IBM Cloud Pak for AIOps.
1. Prepare to back up IBM Cloud Pak for AIOps
-
Clone the IBM Cloud Pak for AIOps samples GitHub repository to retrieve the latest IBM Cloud Pak for AIOps backup and restore scripts by running the following command:
git clone https://github.com/IBM/cp4waiops-samples.git
If you have an offline (air-gapped) deployment of IBM Cloud Pak for AIOps, then copy the cloned scripts to your air-gapped environment.
-
Export the environment variables that you will need for the backup procedure.
If you are backing up an online deployment, set the following:
export OADP_NAMESPACE=<oadpNamespace>
export PATH=<path>
export AIOPS_NAMESPACE=<aiops_namespace>
If you are backing up an offline deployment, set the following:
export TARGET_REGISTRY_HOST=<target_registry_host>
export TARGET_REGISTRY_PORT=<port>
export TARGET_REGISTRY=$TARGET_REGISTRY_HOST:$TARGET_REGISTRY_PORT
export TARGET_REGISTRY_USER=<username>
export TARGET_REGISTRY_PASSWORD=<password>
export EMAIL=<email>
export PATH=<path>
export OADP_NAMESPACE=<oadpNamespace>
export AIOPS_NAMESPACE=<aiops_namespace>
Where:
- <target_registry_host> is the IP address or FQDN of the target registry that holds the backup and restore images, from Offline deployments only: Mirror the backup and restore images.
- <port> is the port number of the target registry.
- <username> is the username for the target registry.
- <password> is the password for the target registry.
- <email> is the email for the target registry.
- <path> is the path to where you downloaded and extracted the IBM Cloud Pak for AIOps backup and restore files.
- <oadpNamespace> is the OADP namespace.
- <aiops_namespace> is the namespace where IBM Cloud Pak for AIOps is installed.
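Because later steps expand these variables inside commands, an unset variable can produce a confusing failure. The following is a minimal sketch of an early check; require_vars is a hypothetical helper name and is not part of the IBM backup scripts.

```shell
# Sketch: report every variable in the argument list that is unset or empty.
# require_vars is a hypothetical helper, not part of the IBM scripts.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage for an online deployment:
#   require_vars OADP_NAMESPACE PATH AIOPS_NAMESPACE || exit 1
```

For an offline deployment, the same call can list the TARGET_REGISTRY variables and EMAIL as well.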
-
Package the Helm Chart.
-
Change to the directory where you have the backup and recovery files downloaded.
cd ${PATH}/bcdr/4.8.0/backup
-
Set environment variables
-
Run the following command to find the name of your IBM Cloud Pak for AIOps installation.
oc get installation -n ${AIOPS_NAMESPACE}
Example output, where ibm-cp-aiops is the name of the IBM Cloud Pak for AIOps installation:
NAME           PHASE     LICENSE    STORAGECLASS   STORAGECLASSLARGEBLOCK   AGE
ibm-cp-aiops   Running   Accepted   rook-cephfs    rook-ceph-rbd            34h
-
Run the following commands to find the name of your Elasticsearch backup PVC:
elasticsearch_cluster=$(oc get elasticsearchclusters.elasticsearch.opencontent.ibm.com -n ${AIOPS_NAMESPACE} -o jsonpath='{.items[0].metadata.name}')
esbackupPVC="$elasticsearch_cluster-ibm-elasticsearch-es-server-snap"
-
-
Update the following parameters in the values.yaml file. The file is located in the ./helm directory.
- repository: Do not edit this parameter.
- pullPolicy: The policy for determining when to pull the image from the image registry server. For example, to force pull the image, use the Always policy.
- schedule: (Optional when on-demand backups are activated) The Cron expression for the automated backup. For example, to take a backup once a day, use the 0 0 * * * Cron expression.
- backupStorageLocation: The storage location where backed up data is stored. For example, bcdr-s3-location. To get the backupStorageLocation on the OpenShift cluster, run the following command:
  oc get backupstoragelocation -n ${OADP_NAMESPACE}
- backupNameSuffix: The prefix for the backup name when the backup is created by using the job. Generally, it can be the name of the source cluster itself. For example, aiops-cluster-backup-106.
- aiopsNamespace: The namespace where IBM Cloud Pak for AIOps is installed.
- csNamespace: The namespace where IBM Cloud Pak foundational services is installed. From IBM Cloud Pak for AIOps v4.4.0 this is the same as the namespace where IBM Cloud Pak for AIOps is installed.
- oadpNamespace: The namespace where OADP is installed.
- redisBackupPod: Set this to backup-data-<installation_name>-redis-server, where <installation_name> is the name that you found in the previous step.
- redisPVC: Set this to data-<installation_name>-redis-server- where <installation_name> is the name that you found in the previous step.
- redisSecret: Set this to <installation_name>-redis-secret, where <installation_name> is the name that you found in the previous step.
- esBackupPVC: Set this to the value of $esbackupPVC that you found in the previous step.
- ttl: The time-to-live setting for the backup. The backed up data is retained until the ttl value is reached (expires). For example, 720h0m0s.
- enabledNamespaces: The namespaces of the components that need to be backed up. You can delete any unused namespaces from the list to reduce the time that is required for the backup process. If you have installed only IBM Cloud Pak for AIOps, then you require the namespaces that IBM Cloud Pak for AIOps and IBM Cloud Pak foundational services are installed in. From IBM Cloud Pak for AIOps v4.4.0, the IBM Cloud Pak foundational services namespace is the same as the IBM Cloud Pak for AIOps namespace, but this namespace must be listed twice, as in the following example:
  enabledNamespaces:
    - '"cp4aiops"'
    - '"cp4aiops"'
- enabledComponents: Required. The list of components to back up. The backup and restore processes support backing up and restoring both the IBM Cloud Pak for AIOps and Infrastructure Automation components. These components can be installed together or independently. You can specify to back up both, or only one of these components. Any other value is ignored and a corresponding error message is generated.
  To specify Infrastructure Automation, set the value '"IA"'.
  To specify IBM Cloud Pak for AIOps, set the value '"AIOPS"'.
  The following configuration shows both components listed:
  enabledComponents:
    - '"IA"'
    - '"AIOPS"'
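Several of the values.yaml settings are built from the installation name and the Elasticsearch cluster name. The following sketch prints those name-derived values so that they can be copied into the file. The helper name derive_backup_values and the sample cluster name my-es are assumptions made for illustration; the naming patterns are the ones given in the redisBackupPod, redisSecret, and esBackupPVC descriptions.

```shell
# Sketch: print the values.yaml settings that are derived from resource names.
# derive_backup_values is a hypothetical helper, not part of the IBM scripts.
derive_backup_values() {
  installation_name=$1      # for example, ibm-cp-aiops
  elasticsearch_cluster=$2  # name of the Elasticsearch cluster resource
  echo "redisBackupPod: backup-data-${installation_name}-redis-server"
  echo "redisSecret: ${installation_name}-redis-secret"
  echo "esBackupPVC: ${elasticsearch_cluster}-ibm-elasticsearch-es-server-snap"
}

# Usage (my-es is a made-up cluster name):
#   derive_backup_values ibm-cp-aiops my-es
```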
-
-
Package the Helm Chart.
helm package ./helm
-
-
Create a network policy.
For more information about creating a NetworkPolicy, see Creating a network policy.
-
Create a network policy file called policy-bcdr.yaml with the following contents:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bcdr-np
  namespace: <aiopsNamespace>
spec:
  podSelector:
    matchLabels:
      ibm-es-server: aiops-ibm-elasticsearch-es-server
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: <oadpNamespace>
  policyTypes:
    - Ingress
Where:
- <aiopsNamespace> is the value of $AIOPS_NAMESPACE
- <oadpNamespace> is the value of $OADP_NAMESPACE
-
Apply the network policy to your cluster to allow the required ingress traffic.
oc apply -f policy-bcdr.yaml -n ${AIOPS_NAMESPACE}
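Instead of editing the two placeholders by hand, the policy file can be generated from the exported variables. The following sketch assumes the policy contents shown in this step; render_network_policy is a hypothetical helper name.

```shell
# Sketch: print the bcdr-np NetworkPolicy with both namespaces filled in.
# render_network_policy is a hypothetical helper, not part of the IBM scripts.
render_network_policy() {
  aiops_ns=$1  # namespace where IBM Cloud Pak for AIOps is installed
  oadp_ns=$2   # namespace where OADP is installed
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bcdr-np
  namespace: ${aiops_ns}
spec:
  podSelector:
    matchLabels:
      ibm-es-server: aiops-ibm-elasticsearch-es-server
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${oadp_ns}
  policyTypes:
    - Ingress
EOF
}

# Usage:
#   render_network_policy "$AIOPS_NAMESPACE" "$OADP_NAMESPACE" > policy-bcdr.yaml
```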
-
2. Set up an automated backup of IBM Cloud Pak for AIOps
Run the following steps to trigger an automated backup.
-
Change to the backup directory where the script is located:
cd ${PATH}/bcdr/4.8.0/backup
-
Deploy the backup job by running the following command:
helm install backup-job clusterbackup-0.1.0.tgz
-
Optional: Update the schedule parameter in the values.yaml file (located in the ./helm directory). This parameter uses a Cron expression to schedule the automated backup. For example, to take a backup once a day, use the 0 0 * * * Cron expression.
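A mistyped schedule is easy to miss until a backup fails to run. The following sketch checks only that an expression has five whitespace-separated fields, as in 0 0 * * *; it does not validate the ranges of the individual fields. The helper name has_five_cron_fields is an assumption made for illustration.

```shell
# Sketch: succeed only when the expression has exactly five fields.
# has_five_cron_fields is a hypothetical helper; field values are not checked.
has_five_cron_fields() {
  fields=$(printf '%s\n' "$1" | awk '{ print NF }')
  [ "$fields" = "5" ]
}

# Usage:
#   has_five_cron_fields "0 0 * * *" && echo "schedule has five fields"
```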
3. Set up on-demand backups of IBM Cloud Pak for AIOps
This step is optional. Use it only when you do not want to wait until the next scheduled backup job.
Prerequisite: The deployment of an automated backup job is a prerequisite for the on-demand job. Only after you initiate an automated backup job can you then trigger an on-demand backup.
-
Deploy the on-demand backup job by running the following command:
oc create job --from=cronjob/backup-job on-demand-backup-job -n ${OADP_NAMESPACE}
-
Check the on-demand backup pods status by running the following command:
oc get pods -n ${OADP_NAMESPACE}
-
Check the on-demand backup job logs by running the following command:
oc logs -f <on-demand-backup-job-***> -n ${OADP_NAMESPACE}
-
Find and export the name of your backup.
You can see the backup name after the on-demand backup job is complete. For example, you might see the backup name aiops-cluster-backup-106-1622193915 in the on-demand backup job log as follows:
Waiting for backup aiops-cluster-backup-106-1622193915 to complete
Export the name of your backup by running the following command:
export BACKUP_NAME=<backup_name>
Where <backup_name> is the name of the backup.
-
Check the backup status by running the following command:
velero get backup ${BACKUP_NAME} -n ${OADP_NAMESPACE}
When the backup is finished, the status is Completed, such as in the following sample output:
aiops-backup-1654171690-testfest34 Completed 2022-06-02 05:08:11 -0700 PDT 29d bcdr-data-protection-app-1 cp4aiops.ibm.com/backup=t
To see more details, you can also:
-
Check the backup pods' status by running the following command:
oc get pods -n ${OADP_NAMESPACE}
-
Check the backup job logs by running the following command:
oc logs -f <backup-job-***>
-
-
Review the backup result summary that is included within the logs of the backup job.
Note: A backup-result-config ConfigMap is created to include this result. To view this ConfigMap, run the following command:
oc get cm backup-result-config -n ${AIOPS_NAMESPACE} -o yaml
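The backup name and status in the steps above can also be read with small text filters rather than by eye. The following sketch assumes the log line format and the column order of the sample output shown in this section (name first, status second); both helper names are hypothetical.

```shell
# Sketch: extract the backup name from a "Waiting for backup ... to complete" log line.
# Reads log text on standard input and prints the first match.
backup_name_from_log() {
  sed -n 's/^.*Waiting for backup \([^ ]*\) to complete.*$/\1/p' | head -n 1
}

# Sketch: print the status column for the named backup.
# Reads backup status output on standard input; assumes the name is in
# column 1 and the status is in column 2, as in the sample output.
backup_status_for() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage:
#   export BACKUP_NAME=$(oc logs <on-demand-backup-job-***> -n "$OADP_NAMESPACE" | backup_name_from_log)
#   velero get backup "$BACKUP_NAME" -n "$OADP_NAMESPACE" | backup_status_for "$BACKUP_NAME"
```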
Troubleshooting
- Backup process is stuck in an In progress state
- Classifier and layout pods remain not ready after running a backup
- Helm install backup job command failed
Backup process is stuck in an In progress state
If your backup process remains stuck in an In progress state for an unexpected duration, complete the following steps. This procedure stops the backup process so that you can try the backup again.
-
Delete the Velero pod by running the following command:
oc delete pod <pod> -n ${OADP_NAMESPACE}
-
Delete the backup that is stuck in the In progress state:
velero delete backup <backup> -n ${OADP_NAMESPACE}
Where <backup> is the backup that you want to delete. The backup process should begin again.
-
Wait for the process to complete and verify that the backup is created.
Classifier and layout pods remain not ready after running a backup
After you complete a backup, you might notice that the classifier (aiops-ir-analytics-classifier) and layout (aiops-topology-layout) pods do not change to a ready state, which can cause AIOpsAnalyticsOrchestrator to be stuck in an updating phase. This issue can occur after most backups. To resolve this issue, you need to either restart the pods or increase the resource limits for the pods.
- For the classifier (aiops-ir-analytics-classifier) pod, restart the pod.
- For the layout (aiops-topology-layout) pod, restart the pod. If the restart does not result in the pod becoming ready, increase the resource limits for the pod. Doubling the cpu and memory limits and request allocations for the pod can result in the pod starting and entering a ready state.
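As a worked example of the doubling, a cpu value of 500m becomes 1000m and a memory value of 2Gi becomes 4Gi. The following sketch does this arithmetic for whole-number quantities only; the helper name double_quantity and the sample values are assumptions, not values taken from the product.

```shell
# Sketch: double a whole-number resource quantity such as 500m or 2Gi.
# double_quantity is a hypothetical helper; decimal quantities are rejected.
double_quantity() {
  number=${1%%[!0-9]*}  # leading digits, for example 500 in 500m
  unit=${1#"$number"}   # remaining suffix, for example m or Gi
  [ -n "$number" ] || return 1
  case $unit in
    *[!A-Za-z]*) return 1 ;;  # anything other than letters, such as .5Gi
  esac
  echo "$((number * 2))${unit}"
}

# Usage:
#   double_quantity 500m
#   double_quantity 2Gi
```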
Helm install backup job command failed
When you run the helm install backup-job clusterbackup-0.1.0.tgz command, the command might fail with an error that is similar to the following error:
Error: admission webhook "trust.hooks.securityenforcement.admission.cloud.ibm.com" denied the request:
Deny "icr.io/cpopen/cp4waiops/cp4aiops-bcdr@sha256:294a42a851a2717ebbc68528ab3c6bcb1ba48114ff058f1c1b537dc6aa167355", no matching repositories in ClusterImagePolicy and no ImagePolicies in the "velero" namespace
If you encounter this error, complete the following steps to resolve the issue:
-
Uninstall the backup-job job by running the following command:
helm uninstall backup-job -n ${OADP_NAMESPACE}
-
Export an environment variable for the image.
For an online deployment:
export REGISTRY=icr.io/cpopen/cp4waiops/cp4aiops-bcdr
export BCDR_IMAGE=${REGISTRY}/<bcdr_image>
For an offline deployment:
export REGISTRY=$TARGET_REGISTRY
export BCDR_IMAGE=${REGISTRY}/<bcdr_image>
Where <bcdr_image> is the name of the backup and restore image, as given in the backup helm chart values.yaml file, in the form cp4waiops-bcdr@{digest}. An example value for BCDR_IMAGE is icr.io/cpopen/cp4waiops/cp4waiops-bcdr@sha256:294a42a851a2717ebbc68528ab3c6bcb1ba48114ff058f1c1b537dc6aa167355.
-
Create a backup-image-policy.yaml file and add the following content within the file:
apiVersion: securityenforcement.admission.cloud.ibm.com/v1beta1
kind: ClusterImagePolicy
metadata:
  name: backup-image-policy
spec:
  repositories:
    - name: ${BCDR_IMAGE}
      policy:
-
Apply the policy by running the following command:
oc apply -f backup-image-policy.yaml
-
Deploy the backup job by running the following command:
helm install backup-job clusterbackup-0.1.0.tgz
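The oc apply -f command reads the policy file as it is written, so a literal ${BCDR_IMAGE} in backup-image-policy.yaml is not expanded by the shell. One way to write the file with the image name already substituted is a here-document, sketched below. The helper name render_image_policy is an assumption made for illustration; the policy contents are the ones shown in the steps above.

```shell
# Sketch: print the ClusterImagePolicy with the image name substituted.
# render_image_policy is a hypothetical helper, not part of the IBM scripts.
render_image_policy() {
  bcdr_image=$1  # full image reference, for example the value of $BCDR_IMAGE
  cat <<EOF
apiVersion: securityenforcement.admission.cloud.ibm.com/v1beta1
kind: ClusterImagePolicy
metadata:
  name: backup-image-policy
spec:
  repositories:
    - name: ${bcdr_image}
      policy:
EOF
}

# Usage:
#   render_image_policy "$BCDR_IMAGE" > backup-image-policy.yaml
```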