Backup and restore for cloud native analytics

The backup and restore function supports backing up cloud native analytics policies to an external location by using Secure Copy Protocol (SCP). You can then use those backups to restore the policies on another deployment, which might be on another cluster or even on the same cluster.

The backup and restore function is disabled by default. If you did not enable it during your deployment, then you must first enable it in the NOI custom resource (CR) YAML file under the spec section, as in the following example:
spec:
  backupRestore:
    enableAnalyticsBackups: true
Note: You can edit the NOI CR YAML file in one of the following two ways.
  • Edit the file from the command line with the oc edit noi command for a full cloud deployment, or the oc edit noihybrid command for a hybrid deployment (see the example after this list).
  • Edit the deployment from the Red Hat® OpenShift® Container Platform Operator Lifecycle Manager (OLM) console: Operators > Installed Operators > IBM Cloud Pak for AIOps Event Manager. Click the NOI or NOIHybrid tab and select your deployment. Then, click the YAML tab to edit and save the YAML file. Your changes are auto-deployed.
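For example, to open the CR in an editor from the command line. The instance name and namespace shown here are placeholders:
oc edit noi <noi_instance_name> -n <namespace>               # full cloud deployment
oc edit noihybrid <noihybrid_instance_name> -n <namespace>   # hybrid deployment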
If you need to send backups to an external system by using SCP, complete the following steps:
  • Create a public/private key pair on the primary cluster by using ssh-keygen. For more information, see How to Use ssh-keygen to Generate a New SSH Key in the SSH documentation. When you generate the keys, two files are created, as in the following examples.
    $HOME/.ssh/id_rsa
    $HOME/.ssh/id_rsa.pub
    The public file is the file that you share with the backup cluster.
  • Add the public key to the $HOME/.ssh/authorized_keys file on the cluster that you plan to send the backups to. Copy the contents of the file that contains the public key on the primary cluster (for example, $HOME/.ssh/id_rsa.pub) and add it to the $HOME/.ssh/authorized_keys file on the target (backup) cluster. (For a condensed sequence of these commands, see the sketch after this list.) To test whether the update is successful, use ssh from the source system to log in to the backup cluster by using the private key. For example, from the primary cluster, run the following command.
    ssh -i $HOME/.ssh/id_rsa root@api.bkupcluster.xyz.com
    If the transfer of the key is successful, you are not prompted for a password.
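The following sketch condenses the key setup into the commands that you run from the primary cluster. The key path and backup host are the example values that are used above, and an empty passphrase is assumed so that the cron job can use the key without prompting:
ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""                        # generate the key pair
ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@api.bkupcluster.xyz.com   # append the public key to authorized_keys on the backup cluster
ssh -i $HOME/.ssh/id_rsa root@api.bkupcluster.xyz.com               # verify that no password prompt appears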

The private key is placed in a Kubernetes secret, which the cron job uses to connect to the target system.

Backup

The following table describes the configuration parameters that are used for backups in the IBM® Netcool® Operations Insight® custom resource definition.
Section name Property name Description Default value
backupRestore enableAnalyticsBackups If set to true, the cron job that takes the backups is activated. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.backupDestination.hostname (Optional) The destination hostname of the machine that the backups are copied to. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.backupDestination.username (Optional) The username on the destination hostname that does the SCP copy. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.backupDestination.directory (Optional) The directory on the destination hostname that receives the backups. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.backupDestination.secretName (Optional) The name of the Kubernetes secret that contains the private SSH key that is used for the SCP copy. The secret key privatekey must be used to store the SSH private key. The secret must be set up before the installation of Netcool Operations Insight if you want to use SCP. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.schedule The cron schedule format that determines how often the backups are taken. For more information about this scheduling format, see the cron documentation. Every 3 minutes
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.claimName (Optional) The PVC claim name that is used to store the backups. An empty value means that Kubernetes persistent storage is not used. Note: Valid for the primary deployment only. This property must be specified before the NOI deployment if Kubernetes persistent storage is needed. false
helmValuesNOI ibm-noi-bkuprestore.noibackuprestore.maxbackups (Optional) The maximum number of historic policy backups to keep on the persistent volume to preserve storage space. 10
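For example, a minimal sketch of how the analytics backup properties might look in the NOI CR when a persistent volume claim is also used to keep backups locally. The PVC name is a placeholder, the other values are illustrative, and the later SCP example in this topic shows the backupDestination properties:
spec:
  backupRestore:
    enableAnalyticsBackups: true
  helmValuesNOI:
    ibm-noi-bkuprestore.noibackuprestore.claimName: <backup_pvc_name>
    ibm-noi-bkuprestore.noibackuprestore.maxbackups: 10
    ibm-noi-bkuprestore.noibackuprestore.schedule: '*/3 * * * *'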
The following table describes the configuration parameters that are used for backups in the postgresql section of the Netcool Operations Insight custom resource definition.
Section name Property name Description Default value
postgresql.bootstrap enabled Use this property to determine whether to bootstrap a new cluster from a preexisting backup. false
postgresql.bootstrap clusterName This property is the name of the cluster to bootstrap from. The cluster must include an existing backup that is located in your destinationPath property. For example, if you had a previously running cluster that successfully took backups, the value to provide for this property is what you originally provided in spec.postgresql.backups.serverName. evtmanager-noi-postgres-cluster
postgresql.bootstrap destinationPath This property follows the "s3://${BUCKET_NAME}" format. s3://ceph-bkt-18d99a17-38ee-4798-accb-a39077bd1abd
postgresql.bootstrap endpointURL This example value uses a node port. http://worker0.detrayer.cp.xyz.com:32252
postgresql.bootstrap # endpointURL If you use a local S3 bucket, the example value is http://$AWS_HOST:$AWS_PORT. http://rook-ceph-rgw-my-store.rook-ceph.svc:8080
postgresql.bootstrap.s3Credentials secretName This property is the name of the secret that contains the relevant S3 credentials. ceph-bucket
postgresql.bootstrap.s3Credentials keyNameAccessKeyID This property is the name of the key in the secret with a value that matches the access key ID. AWS_ACCESS_KEY_ID
postgresql.bootstrap.s3Credentials keyNameAccessSecretKey This property is the name of the key in the secret with a value that matches the access secret key. AWS_SECRET_ACCESS_KEY
postgresql.bootstrap.s3Credentials keyNameAccessSessionToken This optional property is the name of the key in the secret with a value that matches the access session token. No default value.
postgresql.bootstrap.wal walMaxParallel This property indicates the number of jobs to use when bootstrapping the cluster. This property has bandwidth implications. 1
postgresql.bootstrap.wal encryption Use the bucket default encryption. Options are default, AES256, or aws:kms. default
postgresql.bootstrap.wal compression Options are none, gzip, bzip2, or snappy. Each option has implications for speed and size. none
postgresql.backups enabled Enable or disable backups. false
postgresql.backups data The following settings refer to the actual data in the database and not the Write-Ahead Logging (WAL) files:
  • Use default to use the bucket default encryption. Options are default, AES256, or aws:kms. The bucket must support the encryption mode. If you are unsure, use default.
  • The compression options are none, gzip, bzip2, or snappy. Each option has different implications for speed and size.
  • The jobs property indicates the number of jobs to use when backing up the Postgres data. This property has bandwidth implications.
  • Use the encryption property to use encryption.
The default value is default.
postgresql.backups destinationPath This property is the S3 bucket name. If you use the Backing up and restoring for EDB Postgres guide, it is the value of the BUCKET_NAME variable. "s3://ceph-bkt-18d99a17-38ee-4798-accb-a39077bd1abd"
postgresql.backups endpointURL This property is the endpoint URL. If you use the Backing up and restoring for EDB Postgres guide, it is the URL of one of the workers of the cluster that hosts the S3 bucket, followed by the external port that is defined in the node port service. "http://worker0.destrayer.cp.xyz.com:32252"
postgresql.backups # endpointURL This example uses an S3 bucket that is co-located on the same cluster as the Netcool Operations Insight installation. "http://rook-ceph-rgw-my-store.rook-ceph.svc:8080" # "http://$AWS_HOST:$AWS_PORT"
postgresql.backups retentionPolicy This property indicates how long to store backups. 12m
postgresql.backups serverName This property is the folder name where the backups from the cluster go. If you are bootstrapping from an existing backup, the value that is provided must be distinct from the value that is provided for the spec.postgresql.bootstrap.clusterName property. A common convention is to use restoredCluster if you are bootstrapping a new cluster from a backup. The new backups from that bootstrapped cluster go into the restoredCluster directory in your S3 bucket. If you are not bootstrapping a cluster from an existing backup, that is, if you are creating a new Netcool Operations Insight installation or upgrading to version 1.6.11 for the first time, use the evtmanager-noi-postgres-cluster value. restoredCluster
postgresql.backups.s3credentials secretName This property is the name of the secret that contains the relevant S3 credentials key. ceph-bucket
postgresql.backups.s3credentials keyNameAccessKeyID This property is the name of the key in the secret with a value that matches the access key ID. AWS_ACCESS_KEY_ID
postgresql.backups.s3credentials keyNameAccessSecretKey This property is the name of the key in the secret with a value that matches the access secret key. AWS_SECRET_ACCESS_KEY
postgresql.backups.s3credentials keyNameAccessSessionToken This optional property is the name of the key in the secret with a value that matches the access session token. No default value.
postgresql.backups.wal encryption Use the bucket default encryption with this property. Options are default, AES256, or aws:kms. The bucket must support the encryption mode. If you are unsure, use default. default
postgresql.backups.wal compression Options are none, gzip, bzip2, or snappy. Each option has implications for speed and size. none
postgresql.backups.wal walMaxParallel Set this property to the number of jobs to use when backing up the WAL. This property has bandwidth implications. 1
postgresql.backups.endpointCA enabled Set this property to true to use a custom certificate authority (CA) certificate. false
postgresql.backups.endpointCA name This property is the name of the custom CA certificate secret. The secret must include a key that is named cacert and that has a value of the Base64-encoded CA certificate. No default value.
postgresql.backups.onetimeBackup enabled Enable the taking of a one-time backup. You need to manually clean up these properties when you uninstall. true
postgresql.backups.scheduledBackup enabled Set this property to determine whether to enable taking scheduled backups. false
postgresql.backups.scheduledBackup immediate Set this property to determine whether to start taking backups immediately. true
postgresql.backups.scheduledBackup schedule This property indicates the schedule for backups. It uses the same syntax as Kubernetes job schedules. "0 0 0 * * *"
postgresql.backups.scheduledBackup suspend Set this property to determine whether to suspend the taking of scheduled backups. Set this property to true if you want to pause the taking of backups. false
postgresql.backups.scheduledBackup backupOwnerReference Use this property for the OwnerReference value for the derivative backup custom resources (CRs) that are created from this ScheduledBackup CR:
  • If the property is set to none, no OwnerReference is used, and you need to manually delete the derivative Backup CRs when you uninstall Netcool Operations Insight.
  • If the property is set to self, the OwnerReference for the Backup CRs is the ScheduledBackup CR. When the ScheduledBackup CR is deleted, all derivative Backup CRs are automatically deleted.
  • If the property is set to cluster, the OwnerReference for the Backup CRs is the Cluster CR. If the Cluster CR is deleted, such as when the Netcool Operations Insight CR is deleted, the Backup CRs are automatically deleted.
The default value is none.
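For example, a sketch of how the postgresql backup properties from this table might be combined in the NOI CR spec. The values shown are the example values from the table; the nesting follows the section names listed above and might need to be adjusted for your deployment:
spec:
  postgresql:
    backups:
      enabled: true
      destinationPath: "s3://ceph-bkt-18d99a17-38ee-4798-accb-a39077bd1abd"
      endpointURL: "http://worker0.destrayer.cp.xyz.com:32252"
      serverName: evtmanager-noi-postgres-cluster
      retentionPolicy: 12m
      s3credentials:
        secretName: ceph-bucket
        keyNameAccessKeyID: AWS_ACCESS_KEY_ID
        keyNameAccessSecretKey: AWS_SECRET_ACCESS_KEY
      scheduledBackup:
        enabled: true
        schedule: "0 0 0 * * *"
        backupOwnerReference: none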
Create and send a backup by using SCP
To create the secret that is used for the backup, use the ssh private key that was created earlier. Run the following command on the primary cluster to create the secret.
oc create secret generic <secret_key_name> --from-file=privatekey=<home_directory of your_user_id>/.ssh/<generated_private_key> --namespace <namespace>
Where:
  • <secret_key_name> is a name of your choice, for example: evtmanager-backuprestore-secret
  • <home_directory of your_user_id> is the home directory of your user, for example: /root
  • <generated_private_key> is the private key that you generated, for example: id_rsa
  • <namespace> is the namespace where IBM Netcool Operations Insight on Red Hat OpenShift is deployed.
Example:
oc create secret generic ocd318-backup-key
        --from-file=privatekey=/root/.ssh/id_rsa --namespace noicase318
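You can optionally confirm that the private key was stored under the privatekey key of the secret. The secret name and namespace here are the ones from the preceding example:
oc get secret ocd318-backup-key --namespace noicase318 -o jsonpath='{.data.privatekey}' | base64 --decode | head -n 1
The decoded output starts with the private key header, for example -----BEGIN OPENSSH PRIVATE KEY----- or -----BEGIN RSA PRIVATE KEY-----, depending on the key format.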
The secret is then used in the deployment YAML on the primary cluster.
The following example shows a basic backup configuration on the primary cluster:
helmValuesNOI:
    ibm-noi-bkuprestore.noibackuprestore.backupDestination.directory: <home_directory of your_user_id>/tmp/backups
    ibm-noi-bkuprestore.noibackuprestore.backupDestination.hostname: <hostname>.xyz.com
    ibm-noi-bkuprestore.noibackuprestore.backupDestination.secretName: <secret_key_name>
    ibm-noi-bkuprestore.noibackuprestore.backupDestination.username: <your_user_id>
    ibm-noi-bkuprestore.noibackuprestore.schedule: '*/3 * * * *'
With this configuration, the backup cron job copies the backups to the target cluster (for example, hadr-inf.xyz.com) every three minutes and places the backup files in the <home_directory of your_user_id>/tmp/backups directory. The directory must exist on the target cluster. The indentation of the helmValuesNOI fields is important. If your NOI deployment already has a helmValuesNOI section, add the new fields to it. The following example shows the backup values added:
  spec:
    helmValuesNOI:
      ibm-noi-bkuprestore.noibackuprestore.backupDestination.directory: /tmp/backups
      ibm-noi-bkuprestore.noibackuprestore.backupDestination.hostname: api.bkupcluster.xyz.com
      ibm-noi-bkuprestore.noibackuprestore.backupDestination.secretName: ocd318-backup-key
      ibm-noi-bkuprestore.noibackuprestore.backupDestination.username: root
      ibm-noi-bkuprestore.noibackuprestore.schedule: '*/3 * * * *'
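After you save the updated CR, you can check that the backup cron job exists on the primary cluster and, after a few minutes, that backup files arrive on the target host. The host name and directory in this sketch are taken from the preceding example:
oc get cronjobs | grep bkuprestore                  # run on the primary cluster
ssh root@api.bkupcluster.xyz.com 'ls -l /tmp/backups'   # list the backup files on the target host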

Restore

  1. Install podman on your system (yum install podman), then log in to the IBM entitled registry cp.icr.io with your entitlement registry key by running the following command:
    podman login -u cp -p <entitlement_key> cp.icr.io
  2. Restore the policies on the target deployment by using the noi-backuprestore-service container image. To find the exact image to use for your deployment, locate the backup and restore cron job. Run the oc get cronjobs | grep bkuprestore command from the primary cluster, where you are triggering the backups.
    Example:
    [root@api.primcluster.cp.fyre.ibm.com ~]# oc get cronjobs | grep bkuprestore
    evtmgr0-ibm-noi-bkuprestore-noibckup           */1 * * * *     False     0        25s             29h
    The backup and restore image is listed in the cron job YAML:
    oc get cronjob <release name>-ibm-noi-bkuprestore-noibckup -o yaml | grep icr
    Example:
    [root@api.primcluster.cp.fyre.ibm.com ~]# oc get cronjob evtmgr0-ibm-noi-bkuprestore-noibckup -o yaml | grep icr
                image: cp.icr.io/cp/noi/noi-backuprestore-service@sha256:a28fb6c0cbdadda6f378a756ab3e8d8a629a3fd749c6d9343066545c1e374881
    Pull this image on the backup cluster:
    # podman pull cp.icr.io/cp/noi/noi-backuprestore-service@sha256:a28fb6c0cbdadda6f378a756ab3e8d8a629a3fd749c6d9343066545c1e374881
    Trying to pull cp.icr.io/cp/noi/noi-backuprestore-service@sha256:a28fb6c0cbdadda6f378a756ab3e8d8a629a3fd749c6d9343066545c1e374881...
    Getting image source signatures
    Checking if image destination supports signatures
    Copying blob 8a81fb36b007 skipped: already exists  
    Copying blob 9d59a1377311 skipped: already exists  
    Copying blob d4cecfd56161 skipped: already exists  
    Copying blob 7562fe716fa4 skipped: already exists  
    Copying blob 0c10cd59e10e skipped: already exists  
    Copying blob 8a0eb7365b1a skipped: already exists  
    Copying blob 256777fbf05c skipped: already exists  
    Copying blob c8158b20c85a skipped: already exists  
    Copying blob 1b658ca76caf skipped: already exists  
    Copying blob 1e48335a1994 skipped: already exists  
    Copying blob a2f93eeba1ac skipped: already exists  
    Copying blob 47e0bdc406b5 skipped: already exists  
    Copying blob 66cf77cb242d skipped: already exists  
    Copying blob 4f4fb700ef54 skipped: already exists  
    Copying config d2436a7745 done  
    Writing manifest to image destination
    Storing signatures
    d2436a77456b2a230f3f603c9c42fa712c64408ae97a065b184b7d78ca866e89
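    To confirm that the image is now available locally on the backup cluster, you can list it with podman. The grep pattern matches the image name that was pulled in this step:
    podman images | grep noi-backuprestore-service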
    
  3. Create a directory to contain the configuration and policies that you want to upload to the target NOI deployment (for example /root/tmp/restore). Create a file called target.env in the restore directory. This file contains the credentials of the system that you want to restore to.
    Note: The target.env file must be located in the restore directory. The restore directory must be the parent directory of the policies subdirectory.
    Example target.env file:
    export username=system 
    export password=<NOI deployment system auth password> 
    export tenantid=cfd95b7e-3bc7-4006-a4a8-a73a79c71255 
    export policysvcurl=https://<openshift_noi_deployment_route_endpoint>
    export inputdir=/input/policies 
    The username is a fixed value. The password value for <NOI deployment system auth password> can be obtained by using the following command.
    oc get secret <name>-systemauth-secret -o jsonpath --template '{.data.password}' | base64 --decode; echo
    The tenantid is a fixed value. The inputdir value is fixed and must be /input/policies. The policysvcurl value is https:// followed by the fully qualified hostname of the OpenShift route endpoint, which can be obtained by running the following command:
    oc get routes -o=jsonpath='{range .items[*]}{.spec.host}{"\n"}{end}' | sort -u | grep netcool
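    The following sketch combines the preceding commands to build the target.env file. Run the oc commands while you are logged in to the cluster that hosts the target NOI deployment. The systemauth secret name (evtmgr0-systemauth-secret) and restore directory are assumptions based on the examples in this topic; substitute the values for your own deployment.
    RESTOREDIR=/root/tmp/restore
    PASSWORD=$(oc get secret evtmgr0-systemauth-secret -o jsonpath --template '{.data.password}' | base64 --decode)
    ROUTE=$(oc get routes -o=jsonpath='{range .items[*]}{.spec.host}{"\n"}{end}' | sort -u | grep netcool)
    {
      echo "export username=system"
      echo "export password=$PASSWORD"
      echo "export tenantid=cfd95b7e-3bc7-4006-a4a8-a73a79c71255"
      echo "export policysvcurl=https://$ROUTE"
      echo "export inputdir=/input/policies"
    } > $RESTOREDIR/target.env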
  4. Copy the policies backup file into the <restoredir>. The file that is generated by the backup has the format cneapolicies-yyyy-MM-dd-mm:ss:SS:Z.tar.gz.
  5. Create a directory called policies in the <restoredir> directory. The <restoredir> is the directory where the target.env file resides. Extract the policy backup file in the <restoredir> by running the following command:
    tar xvf <'your backup tar gzip'> --force-local --directory policies 
    Note: The policy file name might need to be enclosed in single quotation marks (').
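    For example, assuming the restore directory from step 3 and a backup file name that follows the format from step 4 (the file name shown here is hypothetical):
    cd /root/tmp/restore
    mkdir -p policies
    tar xvf 'cneapolicies-2024-01-01-00:00:00:UTC.tar.gz' --force-local --directory policies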
  6. Restore by running the following podman command:
    podman run -t -i --env LICENSE=accept --network host --user <your_user_id> --privileged -v <restoredir>:/input:ro <backuprestoreimage> /app/scripts/run.sh
    Where:
    • <your_user_id> is the user owning the <restoredir> and the backup images.
    • <backuprestoreimage> is the image from the bkuprestore cronjob.
    Note: Before you run the command, the user must be logged in with podman to the registry that hosts the backup and restore image.
    When you run the command, a stream of policy data is output on the terminal, showing that the individual policies are being written to the target system. On successful completion, the following message is displayed for each analytics policy type.

    Successfully activated policy batch in destination PRS: groupid = <analytics_type>

    Example:
    # podman run -t -i --env LICENSE=accept --network host --user root --privileged -v `pwd`:/input:ro cp.icr.io/cp/noi/noi-backuprestore-service@sha256:a28fb6c0cbdadda6f378a756ab3e8d8a629a3fd749c6d9343066545c1e374881 /app/scripts/run.sh
    Running extraction script
    Restoring policies
    Note: Using npm emulator
    {
      tenantid: 'cfd95b7e-3bc7-4006-a4a8-a73a79c71255',
      policysvcurl: 'https://netcool-evtmgr0.apps.kcaiops42.xyz.com'
    } Running in policy load
    <lines deleted>
    Successfully activated policy batch in destination PRS:  groupid = topological-correlation
    Successfully activated policy batch in destination PRS:  groupid = scope
    Successfully activated policy batch in destination PRS:  groupid = topological-enrichment
    Successfully activated policy batch in destination PRS:  groupid = self_monitoring
  7. Confirm in the manage policies UI that the policies were added.