Offline backup and restore to a different cluster with the IBM Software Hub OADP utility

A Red Hat® OpenShift® Container Platform cluster administrator can create an offline backup and restore it to a different cluster with the IBM Software Hub OADP utility.

Before you begin

Do the following tasks before you back up and restore an IBM Software Hub deployment.

  1. Check whether the services that you are using support platform backup and restore by reviewing Services that support backup and restore. You can also run the following command:
    cpd-cli oadp service-registry check \
    --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
    --verbose \
    --log-level debug

    If a service is not supported, check whether an alternative backup and restore method is available for that service.

  2. Install the software that is needed to back up and restore IBM Software Hub with the OADP utility.

    For more information, see Installing backup and restore software.

  3. Check that your IBM Software Hub deployment meets the following requirements:
    • The minimum deployment profile of IBM Cloud Pak foundational services is Small.

      For more information about sizing IBM Cloud Pak foundational services, see Hardware requirements and recommendations for foundational services.

    • All services are installed at the same IBM Software Hub release.

      You cannot back up and restore a deployment that is running service versions from different IBM Software Hub releases.

    • The control plane is installed in a single project (namespace).
    • The IBM Software Hub instance is installed in zero or more tethered projects.
    • IBM Software Hub operators and the IBM Software Hub instance are in a good state.

Overview

You can create Restic backups on an S3-compatible object store. Restic is a file system copy technique that is used by OpenShift APIs for Data Protection (OADP) and is based on the Restic open source project. Under OADP, Restic backups can be stored only on S3-compatible object stores.

Backing up an IBM Software Hub deployment and restoring it to a different cluster involves the following high-level steps:

  1. Preparing to back up IBM Software Hub
  2. Creating an offline backup
  3. Preparing to restore IBM Software Hub
  4. Restoring IBM Software Hub to a different cluster
  5. Completing post-restore tasks

1. Preparing to back up IBM Software Hub

Complete the following prerequisite tasks before you create an offline backup. Some tasks are service-specific, and need to be done only when those services are installed.

1.1 Creating environment variables

Create the following environment variables so that you can copy commands from the documentation and run them without making any changes.

Environment variable Description
OC_LOGIN Shortcut for the oc login command.
CPDM_OC_LOGIN Shortcut for the cpd-cli manage login-to-ocp command.
PROJECT_CPD_INST_OPERATORS The project where the IBM Software Hub instance operators are installed.
PROJECT_CPD_INST_OPERANDS The project where IBM Software Hub control plane and services are installed.
PROJECT_SCHEDULING_SERVICE The project where the scheduling service is installed.

This environment variable is needed only when the scheduling service is installed.

PROJECT_CPD_INSTANCE_TETHERED_LIST The list of tethered projects.

This environment variable is needed only when some services are installed in tethered projects.

PROJECT_CPD_INSTANCE_TETHERED The tethered project where a service is installed.

This environment variable is needed only when a service is installed in a tethered project.

OADP_PROJECT The project (namespace) where OADP is installed.
TENANT_OFFLINE_BACKUP_NAME The name that you want to use for the offline backup.
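
For example, the following sketch shows one way to define these variables. All of the values are hypothetical placeholders; replace them with the details of your own deployment:
export OC_LOGIN="oc login --token=<token> --server=<cluster-api-url>"
export CPDM_OC_LOGIN="cpd-cli manage login-to-ocp --token=<token> --server=<cluster-api-url>"
export PROJECT_CPD_INST_OPERATORS=cpd-operators
export PROJECT_CPD_INST_OPERANDS=cpd-instance
export PROJECT_SCHEDULING_SERVICE=cpd-scheduler
export PROJECT_CPD_INSTANCE_TETHERED_LIST=cpd-tether-1,cpd-tether-2
export PROJECT_CPD_INSTANCE_TETHERED=cpd-tether-1
export OADP_PROJECT=oadp-operator
export TENANT_OFFLINE_BACKUP_NAME=cpd-tenant-offline-backup-1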

1.2 Checking the version of OADP utility components

Check that you installed the correct version of OADP components.
  1. Check that the OADP operator version is 1.4.x:
    oc get csv -A | grep "OADP Operator"
  2. Check that the cpd-cli oadp version is 5.1.0:
    cpd-cli oadp version

1.3 Optional: Estimating how much storage to allocate for backups

You can estimate the amount of storage that you need to allocate for backups.

Note: Do not use this feature in production environments.

To use this feature, you must install the cpdbr-agent in the Red Hat OpenShift cluster. The cpdbr-agent deploys the node agents to the cluster. The node agents must be run in privileged mode.

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Install the cpdbr-agent by running the following command:
    cpd-cli oadp install --component=cpdbr-agent --namespace=${OADP_PROJECT} --cpd-namespace=${PROJECT_CPD_INST_OPERANDS}
  3. Export the following environment variable:
    export CPDBR_ENABLE_FEATURES=volume-util
  4. Estimate how much storage you need to allocate to a backup by running the following command:
    cpd-cli oadp du-pv

1.4 Removing MongoDB-related ConfigMaps

If you upgraded from IBM Cloud Pak® for Data version 4.8.4 or older, some backup and restore ConfigMaps related to MongoDB might remain in the IBM Software Hub operand project (namespace), and must be removed. Ensure that these ConfigMaps do not exist in the operand project by running the following commands:
oc delete cm zen-cs-aux-br-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs-aux-qu-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs2-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}

1.5 Checking that the primary instance of every PostgreSQL cluster is in sync with its replicas

The replicas for Cloud Native PostgreSQL and EDB Postgres clusters occasionally get out of sync with the primary node. To check whether this problem exists and to fix the problem, see the troubleshooting topic PostgreSQL cluster replicas get out of sync.
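
Before you consult that topic, you can get a quick view of cluster health by listing the PostgreSQL cluster resources and checking that the number of ready instances matches the number of requested instances. This is only a sketch; it assumes that the clusters are managed through the EDB operator's clusters.postgresql.k8s.enterprisedb.io resource:
oc get clusters.postgresql.k8s.enterprisedb.io -n ${PROJECT_CPD_INST_OPERANDS}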

1.6 Excluding external volumes from IBM Software Hub offline backups

You can exclude external Persistent Volume Claims (PVCs) in the IBM Software Hub instance project (namespace) from offline backups.

You might want to exclude PVCs that were manually created in the IBM Software Hub project (namespace) but are not needed by IBM Software Hub services. These volumes might be too large for a backup, or they might already be backed up by other means.

Optionally, you can choose to include PVC YAML definitions in the offline backup, and exclude only the contents of the volumes.

Note: During restore, you might need to manually create excluded PVCs if pods fail to start because of an excluded PVC.
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. For backups that are created by using Container Storage Interface (CSI) snapshots, do one of the following:
    • To exclude a PVC YAML definition and the contents of the volume in a backup, label the PVC to exclude with the Velero exclude label:
      oc label pvc <pvc-name> velero.io/exclude-from-backup=true
    • To include a PVC YAML definition in a backup but exclude the contents of the volume, apply the following label to the PVC:
      oc label pvc <pvc-name> icpdsupport/empty-on-backup=true
  3. To exclude both the PVC YAML definition and the contents of the volume in backups that are created by using Restic, do the following steps.
    1. Label the PVC to exclude with the Velero exclude label:
      oc label pvc <pvc-name> velero.io/exclude-from-backup=true
    2. Label any pods that mount the PVC with the exclude label.

      In the PVC describe output, look for pods in Mounted By. For each pod, add the label:

      oc describe pvc <pvc-name>
      oc label po <pod-name> velero.io/exclude-from-backup=true
  4. To include the PVC YAML definition and exclude the contents of the volume in backups that are created by using Restic, apply the following label to the PVC:
    oc label pvc <pvc-name> icpdsupport/empty-on-backup=true
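After you apply the labels, you can confirm which PVCs are currently excluded or marked as empty-on-backup. A sketch, using the same labels as the previous steps:
# PVCs whose definition and contents are excluded from the backup
oc get pvc -n ${PROJECT_CPD_INST_OPERANDS} -l velero.io/exclude-from-backup=true
# PVCs whose definition is backed up but whose contents are excluded
oc get pvc -n ${PROJECT_CPD_INST_OPERANDS} -l icpdsupport/empty-on-backup=true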

1.7 Updating the Common core services ConfigMap

5.1.0 You might need to update the cpd-ccs-maint-br-cm ConfigMap before you create a backup. Do the following steps:

  1. Check if any common core services download images pod is in a Running state:
    oc get po -l icpdsupport/addOnId=ccs,icpdsupport/module=ccs-common,app=download-images -n ${PROJECT_CPD_INST_OPERANDS}
  2. If the output of the command shows one or more pods in a Running state, edit the managed-resources section in the cpd-ccs-maint-br-cm ConfigMap to ignore the pod:
      aux-meta:
        managed-resources:
          - resource-kind: pod
            labels: icpdsupport/addOnId=ccs,icpdsupport/module=ccs-common,app=download-images
Note: The common core services ConfigMap is regenerated every time the common core services custom resource reconciles. Consequently, you need to do this check each time you create a backup.
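
One way to make this edit is to open the ConfigMap in an editor and add the pod entry to the managed-resources list under aux-meta, as shown in the preceding snippet. This is a sketch; the exact position of the aux-meta section depends on the current contents of the ConfigMap:
oc edit cm cpd-ccs-maint-br-cm -n ${PROJECT_CPD_INST_OPERANDS}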

1.8 Deleting Analytics Engine powered by Apache Spark runtime deployments

5.1.0 Spark master/worker runtime deployment pods are transient pods that are automatically deleted when the Spark job completes. You can wait for the job to complete and the pods to be cleaned up, or you can run the following command to delete the runtime deployments:
oc get deploy -n ${PROJECT_CPD_INST_OPERANDS} | grep 'spark-master\|spark-worker' | awk '{print $1}' | xargs oc delete deploy -n ${PROJECT_CPD_INST_OPERANDS}

1.9 Stopping Data Refinery runtimes and jobs

5.1.0 To avoid any unnecessary data loss, it is recommended that you stop all Data Refinery runtimes and jobs. Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. To stop all active Data Refinery runtimes and jobs, run the following commands:
    oc delete $(oc get deployment -l type=shaper -o name)
    oc delete $(oc get svc -l type=shaper -o name)
    oc delete $(oc get job -l type=shaper -o name)
    oc delete $(oc get secrets -l type=shaper -o name)
    oc delete $(oc get cronjobs -l type=shaper -o name)
    oc scale --replicas=0 deploy wdp-shaper wdp-dataprep
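Before you run the scale command in step 2, you can record the current replica counts of wdp-shaper and wdp-dataprep; this is a useful cross-check when you restart Data Refinery after the backup (see 2.4 Doing post-backup tasks). A sketch:
oc get deploy wdp-shaper wdp-dataprep -n ${PROJECT_CPD_INST_OPERANDS}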

1.10 Preparing Db2

Add a label to the Db2U cluster and stop Q Replication so that backups can successfully complete. Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Retrieve the names of the IBM Software Hub deployment's Db2U clusters:
    oc get db2ucluster -A -ojsonpath='{.items[?(@.spec.environment.dbType=="db2oltp")].metadata.name}'
  3. For each Db2U cluster, do the following substeps:
    1. Export the Db2U cluster name:
      export DB2UCLUSTER=<db2ucluster_name>
    2. Label the cluster:
      oc label db2ucluster ${DB2UCLUSTER} db2u/cpdbr=db2u --overwrite
    3. Verify that the Db2U cluster now contains the new label:
      oc get db2ucluster ${DB2UCLUSTER} --show-labels
  4. For each Db2U cluster, if Q Replication is enabled, stop Q Replication by doing the following steps.
    1. Get the Q Replication pod name:
      oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep ${DB2UCLUSTER} | grep qrep
    2. Exec into the Q Replication pod:
      oc exec -it <qrep-pod-name> -n ${PROJECT_CPD_INST_OPERANDS} -- bash
    3. Log in as the dsadm user:
      su - dsadm
    4. 5.1.0-5.1.1 Stop the Q Replication monitoring process:
      nohup $BLUDR_HOME/scripts/bin/bludr-monitor-qrep-components-wrapper-utils.sh stop > /dev/null &
    5. Stop Q Replication:
      $BLUDR_HOME/scripts/bin/bludr-stop.sh
      When the script has finished running, the following messages appear:
      Stopping bludr replication instance ...
      Stopping replication ...
      REPLICATION ENDED SAFELY
      Stopping BLUDR WLP server...
      Stopping replication REST server instance ...
      SERVER STATUS: INACTIVE
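As an alternative to labeling each Db2U cluster individually in step 3, you can loop over all of the Db2 OLTP clusters. The following sketch assumes that all of the clusters are in the ${PROJECT_CPD_INST_OPERANDS} project; adjust the project if your clusters are installed elsewhere:
# Label every Db2 OLTP cluster and confirm that the label was applied
for DB2UCLUSTER in $(oc get db2ucluster -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.items[?(@.spec.environment.dbType=="db2oltp")].metadata.name}'); do
  oc label db2ucluster ${DB2UCLUSTER} db2u/cpdbr=db2u -n ${PROJECT_CPD_INST_OPERANDS} --overwrite
  oc get db2ucluster ${DB2UCLUSTER} -n ${PROJECT_CPD_INST_OPERANDS} --show-labels
done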

1.11 Preparing Db2 Warehouse

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Retrieve the names of the IBM Software Hub deployment's Db2U clusters:
    oc get db2ucluster -A -ojsonpath='{.items[?(@.spec.environment.dbType=="db2wh")].metadata.name}'
  3. For each Db2U cluster, do the following substeps:
    1. Export the Db2U cluster name:
      export DB2UCLUSTER=<db2ucluster_name>
    2. Label the cluster:
      oc label db2ucluster ${DB2UCLUSTER} db2u/cpdbr=db2u --overwrite
    3. Verify that the Db2U cluster now contains the new label:
      oc get db2ucluster ${DB2UCLUSTER} --show-labels
  4. For each Db2U cluster, if Q Replication is enabled, stop Q Replication by doing the following steps.
    1. Get the Q Replication pod name:
      oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep ${DB2UCLUSTER} | grep qrep
    2. Exec into the Q Replication pod:
      oc exec -it <qrep-pod-name> -n ${PROJECT_CPD_INST_OPERANDS} -- bash
    3. Log in as the dsadm user:
      su - dsadm
    4. 5.1.0-5.1.1 Stop the Q Replication monitoring process:
      nohup $BLUDR_HOME/scripts/bin/bludr-monitor-qrep-components-wrapper-utils.sh stop > /dev/null &
    5. Stop Q Replication:
      $BLUDR_HOME/scripts/bin/bludr-stop.sh
      When the script has finished running, the following messages appear:
      Stopping bludr replication instance ...
      Stopping replication ...
      REPLICATION ENDED SAFELY
      Stopping BLUDR WLP server...
      Stopping replication REST server instance ...
      SERVER STATUS: INACTIVE

1.12 Labeling the IBM Match 360 ConfigMap

5.1.1 Update the IBM Match 360 ConfigMap to add the mdm label. Do the following steps:
  1. Get the ID of the IBM Match 360 instance:
    1. From the IBM Software Hub home page, go to Services > Instances.
    2. Click the link for the IBM Match 360 instance.
    3. Copy the value after mdm- in the URL.

      For example, if the end of the URL is mdm-1234567891123456, the instance ID is 1234567891123456.

  2. Create the following environment variable:
    export INSTANCE_ID=<instance-id>
  3. Add the mdm label by running the following command:
    oc label cm mdm-operator-${INSTANCE_ID} icpdsupport/addOnId=mdm -n ${PROJECT_CPD_INST_OPERANDS}
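If you prefer the command line, you can also find the instance ID from the name of the ConfigMap itself, because the ConfigMap name ends with the instance ID. A sketch:
oc get cm -n ${PROJECT_CPD_INST_OPERANDS} -o name | grep mdm-operator-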

1.13 Updating the RStudio Server Runtimes backup and restore ConfigMap

5.1.2 and later Update the RStudio® Server Runtimes backup and restore ConfigMap by doing the following steps:

  1. Create the rstudio-br-patch.sh file.
    Note: Use only spaces (and not tabs) in the file.
    vi rstudio-br-patch.sh
    oc -n ${PROJECT_CPD_INST_OPERANDS} get cm cpd-rstudio-maint-aux-br-cm -o jsonpath='{.data.plan-meta}' > plan-meta.yaml
    sed -i '44d;48,50d' plan-meta.yaml
    sed -i '44i\
        sequence:
    ' plan-meta.yaml
    sed -i '45i\
          - group: rstudio-clusterroles
    ' plan-meta.yaml
    sed -i '46i\
          - group: rstudio-crs
    ' plan-meta.yaml
    echo "    sequence: []" >> plan-meta.yaml
    echo "data:" > plan-meta-patch.yaml
    echo "  plan-meta: |" >> plan-meta-patch.yaml
    sed 's/^/    /' plan-meta.yaml >>  plan-meta-patch.yaml
    oc patch -n ${PROJECT_CPD_INST_OPERANDS} cm cpd-rstudio-maint-aux-br-cm --type=merge --patch-file  plan-meta-patch.yaml
  2. Put the RStudio Server Runtimes service in maintenance mode and wait until the RStudio Server Runtimes custom resources are in the InMaintenance state:
    oc patch -n ${PROJECT_CPD_INST_OPERANDS} rstudioaddon rstudio-cr --type=merge -p '{"spec": {"ignoreForMaintenance":true}}'
    oc -n ${PROJECT_CPD_INST_OPERANDS} get rstudio -w
  3. Run the rstudio-br-patch.sh file:
    bash rstudio-br-patch.sh
    When the script has finished running, the ConfigMap is updated, and you see the following message:
    configmap/cpd-rstudio-maint-aux-br-cm patched
  4. Remove the RStudio Server Runtimes service from maintenance mode:
    oc patch -n ${PROJECT_CPD_INST_OPERANDS} rstudioaddon rstudio-cr --type=merge -p '{"spec": {"ignoreForMaintenance":false}}'
    oc -n ${PROJECT_CPD_INST_OPERANDS} get rstudio -w

1.14 Stopping SPSS Modeler runtimes and jobs

Before you back up the SPSS Modeler service, stop all active runtimes and jobs. Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. To stop all active SPSS Modeler runtimes and jobs, run the following command:
    oc delete rta -l type=service,job -l component=spss-modeler
  3. To check whether any SPSS Modeler runtime sessions are still running, run the following command:
    oc get pod -l type=spss-modeler

    When no pods are running, no output is produced for this command.

1.15 Backing up Watson Discovery data separately

Before you back up a cluster where the Watson Discovery service is installed, back up the Watson Discovery data separately by running the Watson Discovery backup script. For more information, see Backing up and restoring data.

1.16 Scaling down watsonx.ai deployments

5.1.0 If watsonx.ai™ is installed, manually scale down the following deployment.

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Run the following command:
    oc scale deploy caikit-runtime-stack-operator -n ${PROJECT_CPD_INST_OPERATORS} --replicas=0

1.17 Preparing watsonx Code Assistant for Z

5.1.2 and later If watsonx Code Assistant™ for Z is installed, do the following steps:

  1. If watsonx Code Assistant for Z includes a GPU node, taint the worker node.
    1. Find the GPU node:
      oc get node -L nvidia.com/gpu.replicas | grep -oP '.*[\d]$'  | cut -f1 -d' '
    2. For each GPU node, apply a PreferNoSchedule taint so that the scheduler avoids placing non-GPU workloads on it when possible:
      oc adm taint nodes workerX special=true:PreferNoSchedule
  2. Because the IBM large language model (LLM) is more than 75GB, expand the minio-storage-pvc PVC size in the Velero project to 100GB.
    oc patch pvc minio-storage-pvc -n velero --type='merge' -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
  3. Improve the startup performance of the catalog-api-jobs job by increasing the startup probe initial delay to 300s.
    oc patch deployment catalog-api-jobs -n ${PROJECT_CPD_INST_OPERANDS} --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/startupProbe/initialDelaySeconds",  "value": 300}]'

1.18 Checking the status of installed services

Ensure that the status of all installed services is Completed. Do the following steps:
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Run the following command to get the status of all services.
    cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}

2. Creating an offline backup

Create an offline backup of an IBM Software Hub deployment by doing the following tasks.

2.1 Setting the mode in which to create backups

You can run the IBM Software Hub OADP backup and restore utility in Kubernetes mode or in REST mode.

By default, the IBM Software Hub OADP backup and restore utility runs in Kubernetes mode. In this mode, you must log in to your Red Hat OpenShift cluster and you must have Kubernetes cluster administrator privileges to use the utility.

If you installed the IBM Software Hub OADP backup REST service, you can run the utility in REST mode to create backups. In REST mode, the utility runs as a REST client that communicates with a REST server. The REST service is configured to work with a specific IBM Software Hub instance. You do not have to log in to the cluster, and IBM Software Hub users with the Administrator role can run backup and checkpoint commands on their own IBM Software Hub instances, based on the specified control plane and any tethered projects.

Important: Restore operations must always be run in Kubernetes mode by a cluster administrator.

Running the utility in REST mode is useful when you are generally creating backups only, or when backups take a long time to complete. For backups that take a long time to complete, running the utility in REST mode avoids the problem of the Red Hat OpenShift user session token expiring before the backup process completes. If the session token expires, you must log back in to the cluster and reset the utility.

Tip: The output format of CLI commands that are run in REST mode is different from the output format of CLI commands that are run in Kubernetes mode.
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. To create backups in REST mode, run the following command:
    cpd-cli oadp client config set runtime-mode=rest-client
  3. To change the IBM Software Hub OADP backup and restore utility back to the Kubernetes mode, run the following command:
    cpd-cli oadp client config set runtime-mode=

Related topic: Unable to run an online backup or restore operation

2.2 Backing up the scheduling service

If the IBM Software Hub scheduling service is installed, create a backup of the service.

Backups that are created in IBM Cloud Pak for Data 5.0 cannot be restored in IBM Software Hub 5.1.0. You must take new backups in 5.1.0.

Restriction: For s390x clusters (IBM Z and LinuxONE), you must run the backup and restore commands from an x86_64 workstation.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you create a backup.

  1. If you are running the backup and restore utility in Kubernetes mode, log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Configure the OADP client to set the IBM Software Hub project to the scheduling service project:
    cpd-cli oadp client config set cpd-namespace=${PROJECT_SCHEDULING_SERVICE}
  3. Configure the OADP client to set the OADP project to the project where the OADP operator is installed:
    cpd-cli oadp client config set namespace=${OADP_PROJECT}
  4. Run service backup prechecks:
    IBM Software Hub 5.1.0
    cpd-cli oadp backup precheck \
    --include-namespaces=${PROJECT_SCHEDULING_SERVICE} \
    --log-level=debug \
    --verbose \
    --hook-kind=br
    IBM Software Hub 5.1.1 and later
    cpd-cli oadp backup precheck \
    --backup-type singleton \
    --include-namespaces=${PROJECT_SCHEDULING_SERVICE} \
    --log-level=debug \
    --verbose \
    --hook-kind=br
  5. Back up the IBM Software Hub scheduling service:
    The cluster pulls images from the IBM Entitled Registry
    IBM Software Hub 5.1.0
    cpd-cli oadp backup create ${PROJECT_SCHEDULING_SERVICE}-offline \
    --include-namespaces ${PROJECT_SCHEDULING_SERVICE} \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --prehooks=true \
    --posthooks=true \
    --log-level=debug \
    --verbose \
    --hook-kind=br \
    --selector 'velero.io/exclude-from-backup notin (true)' \
    --image-prefix=registry.redhat.io/ubi9
    IBM Software Hub 5.1.1 and later
    cpd-cli oadp backup create ${PROJECT_SCHEDULING_SERVICE}-offline \
    --backup-type singleton \
    --include-namespaces ${PROJECT_SCHEDULING_SERVICE} \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --prehooks=true \
    --posthooks=true \
    --log-level=debug \
    --verbose \
    --hook-kind=br \
    --selector 'velero.io/exclude-from-backup notin (true)' \
    --image-prefix=registry.redhat.io/ubi9
    The cluster pulls images from a private container registry
    IBM Software Hub 5.1.0
    cpd-cli oadp backup create ${PROJECT_SCHEDULING_SERVICE}-offline \
    --include-namespaces ${PROJECT_SCHEDULING_SERVICE} \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --prehooks=true \
    --posthooks=true \
    --log-level=debug \
    --verbose \
    --hook-kind=br \
    --selector 'velero.io/exclude-from-backup notin (true)' \
    --image-prefix=PRIVATE_REGISTRY_LOCATION/ubi9
    IBM Software Hub 5.1.1 and later
    cpd-cli oadp backup create ${PROJECT_SCHEDULING_SERVICE}-offline \
    --backup-type singleton \
    --include-namespaces ${PROJECT_SCHEDULING_SERVICE} \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --prehooks=true \
    --posthooks=true \
    --log-level=debug \
    --verbose \
    --hook-kind=br \
    --selector 'velero.io/exclude-from-backup notin (true)' \
    --image-prefix=PRIVATE_REGISTRY_LOCATION/ubi9
  6. Validate the backup:
    IBM Software Hub 5.1.0
    cpd-cli oadp backup validate \
    --include-namespaces=${PROJECT_SCHEDULING_SERVICE} \
    --backup-names ${PROJECT_SCHEDULING_SERVICE}-offline \
    --log-level trace \
    --verbose \
    --hook-kind=br
    IBM Software Hub 5.1.1 and later
    cpd-cli oadp backup validate \
    --backup-type singleton \
    --include-namespaces=${PROJECT_SCHEDULING_SERVICE} \
    --backup-names ${PROJECT_SCHEDULING_SERVICE}-offline \
    --log-level trace \
    --verbose \
    --hook-kind=br

2.3 Backing up an IBM Software Hub instance

Create an offline backup of each IBM Software Hub instance, or tenant, in your environment by doing the following steps.

Notes:
  • To create Restic backups when IBM Software Hub is installed on NFS, the NFS storage must be configured with no_root_squash.

  • When backup commands are run, some pods remain in a Running state. These running pods do not affect the backup process, and you do not need to manually shut them down.

  • The storage provider that you use to store backups might limit the number of snapshots that you can take per volume. For more information, consult your storage provider documentation.
  • For s390x clusters (IBM Z and LinuxONE), you must run the backup and restore commands from an x86_64 workstation.
  • This section shows you how to create a backup by using the IBM Software Hub 5.1.0 command. You can still create a backup by using the IBM Cloud Pak for Data 5.0 backup commands instead. For details, see Creating an offline backup of IBM Cloud Pak for Data with the OADP utility in the IBM Cloud Pak for Data 5.0 documentation.
Important: If you upgraded from IBM Software Hub 5.1.0 or 5.1.1 to 5.1.2, you must create a new backup.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you create a backup.

  1. If you are running the backup and restore utility in Kubernetes mode, log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. 5.1.0 Ensure that the expected EDB Postgres replica PVCs are included in the backup:
    oc label pvc,pods -l k8s.enterprisedb.io/cluster,velero.io/exclude-from-backup=true velero.io/exclude-from-backup- -n ${PROJECT_CPD_INST_OPERANDS}
  3. Create a backup by running one of the following commands.
    The cluster pulls images from the IBM Entitled Registry
    cpd-cli oadp tenant-backup create ${TENANT_OFFLINE_BACKUP_NAME} \
    --namespace ${OADP_PROJECT} \
    --vol-mnt-pod-mem-request=1Gi \
    --vol-mnt-pod-mem-limit=4Gi \
    --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
    --mode offline \
    --image-prefix=registry.redhat.io/ubi9 \
    --log-level=debug \
    --verbose &> ${TENANT_OFFLINE_BACKUP_NAME}.log&
    The cluster pulls images from a private container registry
    cpd-cli oadp tenant-backup create ${TENANT_OFFLINE_BACKUP_NAME} \
    --namespace ${OADP_PROJECT} \
    --vol-mnt-pod-mem-request=1Gi \
    --vol-mnt-pod-mem-limit=4Gi \
    --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
    --mode offline \
    --image-prefix=PRIVATE_REGISTRY_LOCATION/ubi9 \
    --log-level=debug \
    --verbose &> ${TENANT_OFFLINE_BACKUP_NAME}.log&
    Note: If the backup fails during the volume backup stage, try increasing the --vol-mnt-pod-mem-limit option. You might need to increase this option when you have terabytes of data.
  4. Confirm that the tenant backup was created and has a Completed status:
    cpd-cli oadp tenant-backup list
  5. To view the detailed status of the backup, run the following command:
    cpd-cli oadp tenant-backup status ${TENANT_OFFLINE_BACKUP_NAME} \
    --details
    The command shows the following sub-backups:
    Backup Description
    cpd-tenant-xxx Backup that contains Kubernetes resources.
    cpd-tenant-vol-yyy Backup that contains volume data.
    Tip: The status details also list sub-backups (of type group). You can view more information about a sub-backup by running the following command:
    cpd-cli oadp backup status <SUB_BACKUP_NAME> \
    --details
  6. To view logs of the tenant backup and all sub-backups, run the following command:
    cpd-cli oadp tenant-backup log ${TENANT_OFFLINE_BACKUP_NAME}
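Tip: Because the backup command in step 3 runs in the background and redirects its output to a local log file, you can also follow that file while the backup runs. For example:
tail -f ${TENANT_OFFLINE_BACKUP_NAME}.log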
Best practice: If you have services that connect to an external database, such as for business intelligence (BI) reporting, it is recommended that you also back up the database. Backing up the external database ensures data consistency if the IBM Software Hub backup is later restored. For example, you might need to restore an older IBM Software Hub backup instead of the most recent backup. Because the external database is synchronized with the most recent IBM Software Hub backup, it contains data that is not in the backup that you want to restore. To maintain data consistency, restore the external database backup that was taken at the same time as the IBM Software Hub backup.

2.4 Doing post-backup tasks

For some services, you must do additional tasks after you create an offline backup.

  1. 5.1.2 and later If RStudio Server Runtimes is installed, remove the RStudio Server Runtimes service from maintenance mode:
    oc patch -n ${PROJECT_CPD_INST_OPERANDS} rstudioaddon rstudio-cr --type=merge -p '{"spec": {"ignoreForMaintenance":false}}'
    oc -n ${PROJECT_CPD_INST_OPERANDS} get rstudio -w
  2. 5.1.0 If Data Refinery is installed, restart the service:
    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Run the following command.

      The value of <number_of_replicas> depends on the scaleConfig setting when Data Refinery was installed (1 for small, 3 for medium, and 4 for large).

      oc scale --replicas=<number_of_replicas> deploy wdp-shaper wdp-dataprep
  3. 5.1.0 If watsonx.ai is installed, manually scale the watsonx.ai deployment back up.
    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Wait for watsonxai-cr to reach the Completed state:
      oc get watsonxai -n ${PROJECT_CPD_INST_OPERANDS}

      Check that the command returns output such as in the following example:

      NAME           VERSION   RECONCILED   STATUS      AGE
      watsonxai-cr   9.1.0     9.1.0        Completed 4d5h
    3. Scale up the following deployment:
      oc scale deploy caikit-runtime-stack-operator -n ${PROJECT_CPD_INST_OPERATORS} --replicas=1

3. Preparing to restore IBM Software Hub to a different cluster

Complete the following prerequisite tasks before you restore an offline backup. Some tasks are service-specific, and need to be done only when those services are installed.

3.1 Preparing the target cluster

Prepare the target cluster that you want to use to restore IBM Software Hub.

  1. Make sure that the target cluster meets the following requirements:
    • The target cluster has the same storage classes as the source cluster.
    • For environments that use a private container registry, such as air-gapped environments, the target cluster has the same image content source policy as the source cluster. For details on configuring the image content source policy, see Configuring an image content source policy for IBM Software Hub software images.
    • The target cluster must be able to pull software images. For details, see Updating the global image pull secret for IBM Software Hub.
    • The deployment environment of the target cluster is the same as the source cluster.
      • The target cluster uses the same hardware architecture as the source cluster. For example, x86-64.
      • The target cluster is on the same OpenShift version as the source cluster.
      • The target cluster allows for the same node configuration as the source cluster. For example, if the source cluster uses a custom KubeletConfig, the target cluster must allow the same custom KubeletConfig.
      • Moving between IBM Cloud and non-IBM Cloud deployment environments is not supported.
  2. If you are using node labels as the method for identifying nodes in the cluster, re-create the labels on the target cluster.
    Best practice: Use node labels instead of node lists when you are restoring an IBM Software Hub deployment to a different cluster, especially if you plan to enforce node pinning. Node labels enable node pinning with minimal disruption. To learn more, see Passing node information to IBM Software Hub.
  3. If your cluster pulls images from a private container registry or if your cluster is in a restricted network, push images that the OADP backup and restore utility needs to the private container registry so that users can run the restore commands against the cluster.

    For details, see 2. Moving images for backup and restore to a private container registry.

  4. Install the components that the OADP backup and restore utility uses.
    Tip: You need to use the same configuration information that you specified in the source cluster. For example, when you install OADP, use the same credentials and DataProtectionApplication configuration that was specified on the source cluster.
    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Create the environment variables that the utility needs so that you can copy commands from the documentation and run them without making any changes.
    3. Create the ${OADP_PROJECT} project where you want to install the OADP operator.
    4. Annotate the ${OADP_PROJECT} project so that Restic pods can be scheduled on all nodes.
      oc annotate namespace ${OADP_PROJECT} openshift.io/node-selector=""
    5. Install the cpdbr-tenant service role-based access controls (RBACs).
      Note: Run the cpdbr installation command in the IBM Software Hub operators project even though the project does not yet exist in the target cluster. Do not manually create the project on the target cluster. The project is created during the IBM Software Hub restore process.
      The cluster pulls images from the IBM Entitled Registry
      cpd-cli oadp install \
      --component=cpdbr-tenant \
      --namespace ${OADP_PROJECT} \
      --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
      --rbac-only \
      --log-level=debug \
      --verbose
      The cluster pulls images from a private container registry
      cpd-cli oadp install \
      --component=cpdbr-tenant \
      --namespace ${OADP_PROJECT} \
      --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
      --cpdbr-hooks-image-prefix=${PRIVATE_REGISTRY}/cpdbr-oadp:${VERSION} \
      --cpfs-image-prefix=${PRIVATE_REGISTRY} \
      --rbac-only \
      --log-level=debug \
      --verbose
    6. Install the Red Hat OADP operator.
    7. Create a secret in the ${OADP_PROJECT} project with the credentials of the S3-compatible object store that you are using to store the backups.

      Credentials must use alphanumeric characters and cannot contain special characters like the number sign (#).

      1. Create a file named credentials-velero that contains the credentials for the object store:
        cat << EOF > credentials-velero
        [default]
        aws_access_key_id=${ACCESS_KEY_ID}
        aws_secret_access_key=${SECRET_ACCESS_KEY}
        EOF
      2. Create the secret.

        The name of the secret must be cloud-credentials.

        oc create secret generic cloud-credentials \
        --namespace ${OADP_PROJECT} \
        --from-file cloud=./credentials-velero
    8. Create the DataProtectionApplication (DPA) custom resource, and specify a name for the instance.
      Tip: You can create the DPA custom resource manually or by using the cpd-cli oadp dpa create command. However, if you use this command, you might need to edit the custom resource afterward to add options that are not available with the command. This step shows you how to manually create the custom resource.
      You might need to change some values.
      • spec.configuration.restic.memory specifies the Restic memory limit. You might need to increase the Restic memory limit if Restic volume backups fail or hang on a large volume, indicated by Restic pod containers restarting due to an OOMKilled Kubernetes error.
      • If the object store is Amazon S3, you can omit s3ForcePathStyle.
      • For object stores with a self-signed certificate, add backupLocations.velero.objectStorage.caCert and specify the base64 encoded certificate string as the value. For more information, see Use Self-Signed Certificate.
      Important:
      • spec.configuration.nodeAgent.timeout specifies the Restic timeout. The default is 1 hour. You might need to increase the Restic timeout if Restic backup or restore fails, indicated by pod volume timeout errors in the Velero log.
      • If only Restic backups are needed, under spec.configuration.velero.defaultPlugins, remove csi.
      • The object storage information (backupLocations.velero.objectStorage) in the source and target cluster DPA configurations must be identical.
      Recommended DPA configuration
      The following example shows the recommended DPA configuration.
      cat << EOF | oc apply -f -
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: dpa-sample
      spec:
        configuration:
          velero:
            customPlugins:
            - image: ${CPDBR_VELERO_PLUGIN_IMAGE_LOCATION}
              name: cpdbr-velero-plugin
            defaultPlugins:
            - aws
            - openshift
            - csi
            podConfig:
              resourceAllocations:
                limits:
                  cpu: "${VELERO_POD_CPU_LIMIT}"
                  memory: 4Gi
                requests:
                  cpu: 500m
                  memory: 256Mi
            resourceTimeout: 60m
          nodeAgent:
            enable: true
            uploaderType: restic
            timeout: 72h
            podConfig:
              resourceAllocations:
                limits:
                  cpu: "${NODE_AGENT_POD_CPU_LIMIT}"
                  memory: 32Gi
                requests:
                  cpu: 500m
                  memory: 256Mi
              tolerations:
              - key: icp4data
                operator: Exists
                effect: NoSchedule
        backupImages: false
        backupLocations:
          - velero:
              provider: aws
              default: true
              objectStorage:
                bucket: ${BUCKET_NAME}
                prefix: ${BUCKET_PREFIX}
              config:
                region: ${REGION}
                s3ForcePathStyle: "true"
                s3Url: ${S3_URL}
              credential:
                name: cloud-credentials
                key: cloud
      EOF
      5.1.0-5.1.2 DPA configuration if watsonx™ Orchestrate is installed
      If your IBM Software Hub deployment includes watsonx Orchestrate, add the appcon-plugin to the DPA configuration. To obtain the link to the appcon-plugin image, see Backing up and restoring your IBM App Connect resources and persistent volumes on Red Hat OpenShift.
      cat << EOF | oc apply -f -
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: dpa-sample
      spec:
        configuration:
          velero:
            customPlugins:
            - image: ${CPDBR_VELERO_PLUGIN_IMAGE_LOCATION}
              name: cpdbr-velero-plugin
            - image: '<appcon-plugin-image-link>'
              name: appcon-plugin
            defaultPlugins:
            - aws
            - openshift
            - csi
            podConfig:
              resourceAllocations:
                limits:
                  cpu: "${VELERO_POD_CPU_LIMIT}"
                  memory: 4Gi
                requests:
                  cpu: 500m
                  memory: 256Mi
            resourceTimeout: 60m
          nodeAgent:
            enable: true
            uploaderType: restic
            timeout: 72h
            podConfig:
              resourceAllocations:
                limits:
                  cpu: "${NODE_AGENT_POD_CPU_LIMIT}"
                  memory: 32Gi
                requests:
                  cpu: 500m
                  memory: 256Mi
              tolerations:
              - key: icp4data
                operator: Exists
                effect: NoSchedule
        backupImages: false
        backupLocations:
          - velero:
              provider: aws
              default: true
              objectStorage:
                bucket: ${BUCKET_NAME}
                prefix: ${BUCKET_PREFIX}
              config:
                region: ${REGION}
                s3ForcePathStyle: "true"
                s3Url: ${S3_URL}
              credential:
                name: cloud-credentials
                key: cloud
      EOF
    9. After you create the DPA, do the following checks.
      1. Check that the velero pods are running in the ${OADP_PROJECT} project.
        oc get po -n ${OADP_PROJECT}
        The node-agent daemonset creates one node-agent pod for each worker node. For example:
        NAME                                                    READY   STATUS    RESTARTS   AGE
        openshift-adp-controller-manager-678f6998bf-fnv8p       2/2     Running   0          55m
        node-agent-455wd                                        1/1     Running   0          49m
        node-agent-5g4n8                                        1/1     Running   0          49m
        node-agent-6z9v2                                        1/1     Running   0          49m
        node-agent-722x8                                        1/1     Running   0          49m
        node-agent-c8qh4                                        1/1     Running   0          49m
        node-agent-lcqqg                                        1/1     Running   0          49m
        node-agent-v6gbj                                        1/1     Running   0          49m
        node-agent-xb9j8                                        1/1     Running   0          49m
        node-agent-zjngp                                        1/1     Running   0          49m
        velero-7d847d5bb7-zm6vd                                 1/1     Running   0          49m
      2. Verify that the backup storage location PHASE is Available.
        cpd-cli oadp backup-location list

        Example output:

        NAME           PROVIDER    BUCKET             PREFIX              PHASE        LAST VALIDATED      ACCESS MODE
        dpa-sample-1   aws         ${BUCKET_NAME}     ${BUCKET_PREFIX}    Available    <timestamp>
  5. Install the jq JSON command-line utility.
  6. Configure the IBM Software Hub OADP backup and restore utility.
  7. Install Certificate manager and the IBM License Service.

    For details, see Installing shared cluster components for IBM Software Hub.

    Note: You must install the same version of Certificate manager and the IBM License Service that is installed on the source cluster.
  8. If IBM Knowledge Catalog Premium or IBM Knowledge Catalog Standard is installed, install Red Hat OpenShift AI.

3.2 Cleaning up the target cluster after a previous restore

If you previously restored an IBM Software Hub backup or a previous restore attempt was unsuccessful, delete the IBM Software Hub instance projects (namespaces) in the target cluster before you try another restore.

Resources in the IBM Software Hub instance are watched and managed by operators and controllers that run in other projects. To prevent corrupted or out-of-sync operators and resources when you delete an IBM Software Hub instance, you must locate the Kubernetes resources that have finalizers specified in their metadata and delete those finalizers before you delete the IBM Software Hub instance.

  1. Log in to Red Hat OpenShift Container Platform as an instance administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Download the cpd-pre-restore-cleanup.sh script from https://github.com/IBM/cpd-cli/tree/master/cpdops/5.1.3.
  3. If the tenant operator project exists and has the common-service NamespaceScope custom resource that identifies all the tenant projects, run the following command:
    ./cpd-pre-restore-cleanup.sh --tenant-operator-namespace="${PROJECT_CPD_INST_OPERATORS}"
  4. If the tenant operator project does not exist or specific IBM Software Hub projects need to be deleted, run the following command.

    If the common-service NamespaceScope custom resource is not available and additional projects, such as tethered projects, need to be deleted, modify the list of comma-separated projects in the --additional-namespaces option as necessary.

    ./cpd-pre-restore-cleanup.sh --additional-namespaces="${PROJECT_CPD_INST_OPERATORS},${PROJECT_CPD_INST_OPERANDS}"
  5. If the IBM Software Hub scheduling service was installed, uninstall it.

    For details, see Uninstalling the scheduling service.

4. Restoring IBM Software Hub to a different cluster

Restore an offline backup of an IBM Software Hub deployment to a different cluster by doing the following tasks.

Restriction: For s390x clusters (IBM Z and LinuxONE), you must run the backup and restore commands from an x86_64 workstation.

4.1 Restoring the scheduling service

If the IBM Software Hub scheduling service is installed on the source cluster, restore the service on the target cluster by doing the following steps.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you restore a backup.

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Restore an offline backup by running one of the following commands.
    The cluster pulls images from the IBM Entitled Registry
    cpd-cli oadp restore create ${PROJECT_SCHEDULING_SERVICE}-restore \
    --from-backup=${PROJECT_SCHEDULING_SERVICE}-offline \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --include-cluster-resources=true \
    --skip-hooks \
    --log-level=debug \
    --verbose \
    --image-prefix=registry.redhat.io/ubi9
    The cluster pulls images from a private container registry
    cpd-cli oadp restore create ${PROJECT_SCHEDULING_SERVICE}-restore \
    --from-backup=${PROJECT_SCHEDULING_SERVICE}-offline \
    --include-resources='operatorgroups,configmaps,catalogsources.operators.coreos.com,subscriptions.operators.coreos.com,customresourcedefinitions.apiextensions.k8s.io,scheduling.scheduler.spectrumcomputing.ibm.com' \
    --include-cluster-resources=true \
    --skip-hooks \
    --log-level=debug \
    --verbose \
    --image-prefix=${PRIVATE_REGISTRY_LOCATION}

4.2 Restoring an IBM Software Hub instance

Restore an IBM Software Hub instance to a different cluster by doing the following steps.

Notes:
  • You cannot restore a backup to a different project (namespace) than the one that the IBM Software Hub instance was installed in on the source cluster.

  • If service-related custom resources were manually placed into maintenance mode before the backup was created, those custom resources remain in the same state when the backup is restored. You must manually take these services out of maintenance mode after the restore.

  • For s390x clusters (IBM Z and LinuxONE), you must run the backup and restore commands from an x86_64 workstation.
  • If running a restore command produces a Failed or PartiallyFailed error, you must clean up the IBM Software Hub instance and restart the restore process.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you restore a backup.

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Restore IBM Software Hub by running one of the following commands.
    The cluster pulls images from the IBM Entitled Registry
    cpd-cli oadp tenant-restore create ${TENANT_OFFLINE_BACKUP_NAME}-restore \
    --from-tenant-backup ${TENANT_OFFLINE_BACKUP_NAME} \
    --image-prefix=registry.redhat.io/ubi9 \
    --verbose \
    --log-level=debug &> ${TENANT_OFFLINE_BACKUP_NAME}-restore.log&
    The cluster pulls images from a private container registry
    cpd-cli oadp tenant-restore create ${TENANT_OFFLINE_BACKUP_NAME}-restore \
    --from-tenant-backup ${TENANT_OFFLINE_BACKUP_NAME} \
    --image-prefix=${PRIVATE_REGISTRY_LOCATION}/ubi9 \
    --verbose \
    --log-level=debug &> ${TENANT_OFFLINE_BACKUP_NAME}-restore.log&
  3. Get the status of the installed components:
    1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
      ${CPDM_OC_LOGIN}
      Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
    2. Run the appropriate command for your environment:
      Installations with tethered projects
      cpd-cli manage get-cr-status \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
      --tethered_instance_ns=${PROJECT_CPD_INSTANCE_TETHERED_LIST}
      Installations without tethered projects
      cpd-cli manage get-cr-status \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
    3. Ensure that the status of all of the services is Completed or Succeeded.
  4. To view a list of restores, run the following command:
    cpd-cli oadp tenant-restore list
  5. To view the detailed status of the restore, run the following command:
    cpd-cli oadp tenant-restore status ${TENANT_OFFLINE_BACKUP_NAME}-restore \
    --details
    The command shows a varying number of sub-restores in the following form:
    cpd-tenant-r-xxx
    Tip: If you need more information, listed in the status details are sub-restores (of type group). You can view more information about these sub-restores by running the following command:
    cpd-cli oadp restore status <SUB_RESTORE_NAME> \
    --details
  6. To view logs of the tenant restore, run the following command:
    cpd-cli oadp tenant-restore log ${TENANT_OFFLINE_BACKUP_NAME}-restore
Best practice: If your IBM Software Hub deployment has services that connect to an external database, and you followed the recommendation to back up the database at the same time that you back up IBM Software Hub, restore the database backup that was taken at the same time as the IBM Software Hub backup.

5. Completing post-restore tasks

Complete additional tasks for the control plane and some services after you restore an IBM Software Hub deployment from an offline backup.

5.1 Applying cluster HTTP proxy settings or other RSI patches to the control plane

If you applied cluster HTTP proxy settings or other RSI patches to an IBM Software Hub instance in the source cluster, the evictor cronjob runs every 30 minutes to patch pods that did not get patched. Optionally, you can apply the patches immediately by running the following command:
cpd-cli manage apply-rsi-patches --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} -vvv

5.2 Patching Cognos Analytics instances

If a Db2 OLTP database within the cluster is used for a Cognos Analytics content store or audit database, the Cognos Analytics service instance must be patched. Because the Db2 database host and port might be different in the target cluster, update these values in the Cognos Analytics service instance so that the instance starts successfully. Do the following steps:
  1. Patch the content store and audit database ports in the Cognos Analytics service instance by running the following script:
    #!/usr/bin/env bash
    #-----------------------------------------------------------------------------
    #Licensed Materials - Property of IBM
    #IBM Cognos Products: ca
    #(C) Copyright IBM Corp. 2024
    #US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
    #-----------------------------------------------------------------------------
    set -e
    #set -x
    
    function usage {
        echo "$0: usage: $0 [-h] -t tethered_namespace -a audit_db_port_number -c cs_db_port_number [-v]"
    }
    
    function help {
        usage
        echo "-h prints help to the console"
        echo "-t tethered namespace (required)"
        echo "-a Audit DB port number"
        echo "-c CS DB port number"
        echo "-v turn on verbose mode"
        echo ""
        exit 0
    }
    
    while getopts ":ht:a:c:v" opt; do
        case ${opt} in
            h)
                help
                ;;
            t)
                tethered_namespace=$OPTARG
                ;;
            a)
                audit_db_port_number=$OPTARG
                ;;
            c)
                cs_db_port_number=$OPTARG
                ;;
            v)
                verbose_flag="true"
                ;;
            ?)
                usage
                exit 0
                ;;
        esac
    done
    
    if [[ -z ${tethered_namespace} ]]; then
        echo "A tethered namespace must be provided"
        help
    fi
    
    echo "Get CAServiceInstance Name"
    cr_name=$(oc -n ${tethered_namespace} get caserviceinstance --no-headers -o custom-columns=NAME:.metadata.name)
    if [[ -z ${cr_name} ]]; then
        echo "Unable to find CAServiceInstance CR for namespace: ${tethered_namespace}"
        help
    fi
    
    if [[ ! -z ${cs_db_port_number} ]]; then
        echo "Updating CS Database Port Number in the Custom Resource ${cr_name}..."
        oc patch caserviceinstance ${cr_name} --type merge -p "{\"spec\":{\"cs\":{\"database_port\":\"${cs_db_port_number}\"}}}" -n ${tethered_namespace}
    fi
    
    if [[ ! -z ${audit_db_port_number} ]]; then
        echo "Updating Audit Database Port Number in the Custom Resource ${cr_name}..."
        oc patch caserviceinstance ${cr_name} --type merge -p "{\"spec\":{\"audit\":{\"database_port\":\"${audit_db_port_number}\" }}}" -n ${tethered_namespace}
    fi
    
    sleep 20
    check_status="Completed"
  2. Check the status of the Cognos Analytics reconcile action:
    for i in {1..240};do
    caStatus=$(oc get caserviceinstance ${cr_name} -o jsonpath="{.status.caStatus}" -n ${tethered_namespace})
    
    if [[ ${caStatus} == ${check_status} ]];then
        echo "ca ${check_status} Successfully"
        break
    elif [[ ${caStatus} == "Failed" ]];then
        echo "ca ${caStatus}!"
        exit 1
    fi
    echo "ca Status: ${caStatus}"
    sleep 30
    
    done
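For example, if you save steps 1 and 2 together in a single file, you can run the script against a tethered project. The file name patch-ca-ports.sh and the port values in this sketch are hypothetical; use the Db2 port numbers from your target cluster:
bash patch-ca-ports.sh -t ${PROJECT_CPD_INSTANCE_TETHERED} -c 50000 -a 50001 -v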

5.3 Restarting Data Replication replications

After IBM Software Hub is restored, do the following steps:
  1. Connect to the restored IBM Software Hub instance.
  2. Go to the restored replications and stop them.
  3. Restart the replications.

5.4 Restoring Db2

If Q Replication is enabled on the source cluster, the service must be re-enabled after the restore. Follow the instructions in the following topics:

5.5 Restoring Db2 Warehouse

If Q Replication is enabled on the source cluster, the service must be re-enabled after the restore. Follow the instructions in the following topics:

5.6 Restarting IBM Knowledge Catalog lineage pods

After a restore, restart the following lineage pods so that you can access lineage data from the knowledge graph:
  • wkc-data-lineage-service-xxx
  • wdp-kg-ingestion-service-xxx
Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Restart the wkc-data-lineage-service-xxx pod:
    oc delete -n ${PROJECT_CPD_INST_OPERANDS} "$(oc get pods -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-data-lineage-service)"
  3. Restart the wdp-kg-ingestion-service-xxx pod:
    oc delete -n ${PROJECT_CPD_INST_OPERANDS} "$(oc get pods -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep wdp-kg-ingestion-service)"

5.7 Verifying the Watson Machine Learning restore operation

After restoring from a backup, users might be unable to deploy new models or score existing models. To resolve this issue, wait until operator reconciliation completes after the restore operation.

  1. Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Check the status of the operator with the following commands:
    export PROJECT_WML=<wml-namespace>
    kubectl describe WmlBase wml-cr -n ${PROJECT_WML} | grep "Wml Status" | awk '{print $3}'
    
  3. After the backup and restore operations, before you use Watson Machine Learning, make sure that wml-cr is in the Completed state and that all Watson Machine Learning pods are in the Running state. Check the pods by running the following command:
    oc get pods -n <wml-namespace> -l release=wml

5.8 Retraining existing watsonx Assistant skills

After you restore the watsonx Assistant backup, you must retrain the existing skills. To trigger training, modify a skill. Training a skill typically takes less than 10 minutes. For more information, see the Retraining your backend model section in the IBM Cloud documentation.

5.9 Starting the ibm-granite-20b-code-cobol-v1-predictor pod in the watsonx Code Assistant for Z service

5.1.2 and later If the ibm-granite-20b-code-cobol-v1-predictor pod is not running, start it.

  1. Check whether the ibm-granite-20b-code-cobol-v1-predictor pod is in a Running state by running the following command:
    oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep ibm-granite-20b-code-cobol-v1-predictor
  2. If the ibm-granite-20b-code-cobol-v1-predictor pod is not in a Running state, edit the deployment:
    oc edit deploy -n ${PROJECT_CPD_INST_OPERANDS} ibm-granite-20b-code-cobol-v1-predictor
  3. Under startupProbe, check if initialDelaySeconds: 200 is missing. If it is missing, add it.
    ...
    startupProbe:
      failureThreshold: 200
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      initialDelaySeconds: 200
      periodSeconds: 10
    ...
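Instead of editing the deployment interactively, you can apply the same change with a patch. The following sketch assumes that the startupProbe is defined on the first container in the deployment:
oc patch deployment ibm-granite-20b-code-cobol-v1-predictor -n ${PROJECT_CPD_INST_OPERANDS} --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/startupProbe/initialDelaySeconds", "value": 200}]'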

5.10 Restoring services that do not support offline backup and restore

The following list shows the services that do not support offline backup and restore. If any of these services are installed in your IBM Software Hub deployment, do the appropriate steps to make them functional after a restore.

Data Gate
Data Gate synchronizes Db2 for z/OS data in real time. After IBM Software Hub is restored, data might be out of sync with Db2 for z/OS. It is recommended that you re-add tables after IBM Software Hub foundational services are restored.
MongoDB
The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.
Watson Discovery

The service must be uninstalled and reinstalled, and then the data must be restored.

Watson Speech services
The service is functional and you can re-import data. For more information, see Importing and exporting data.