Restoring a backup of Guardium Insights

This topic describes the procedure for restoring a backup of Guardium Insights.

Before you begin

Before you start the restore process:
  • Ensure that the target system is running normally. The restore process cannot recover a broken cluster.
  • Ensure that you have permission to access the backup files before attempting to restore. For example, issue the chmod -R 777 <backup_directory> command.
Note: You cannot restore a backup that was taken on version 3.2.0 directly to a version 3.2.x system. Instead, restore the backup to version 3.2.0, and then patch version 3.2.0 to version 3.2.x.
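
The permission pre-check above can be rehearsed locally before you touch the real backup. A minimal sketch, where a temporary directory stands in for your actual <backup_directory>:

```shell
# Stand-in for <backup_directory>: a temporary directory with one dummy file.
backup_dir=$(mktemp -d)
echo "dummy backup data" > "$backup_dir/keys.tar"

# Open up permissions recursively so the restore job can read everything.
chmod -R 777 "$backup_dir"

# Count files that are still unreadable; 0 means the tree is fully readable.
find "$backup_dir" -type f ! -readable | wc -l
```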

Procedure

  1. Ensure that you are logged in to the IBM Cloud® Private command line interface. This also ensures that you are authenticated to the OpenShift® command line interface. Use this command to log in:
    cloudctl login -a <ICP_hostname> -u <openshift_username> -p <openshift_password> --skip-ssl-validation -n staging

    Where

    • <ICP_hostname> is your Cloud Private server, for example https://cp-console.apps.myserver.com
    • <openshift_username> is your OpenShift username.
    • <openshift_password> is your OpenShift password.
  2. Prepare a custom resource file named gi-restore.yaml by following the examples in Guardium Insights restore custom resource file options.
  3. Create the restore resource:
    oc apply -f gi-restore.yaml

    The expected results should be similar to:

    restore.gi.ds.isc.ibm.com/insights created
  4. Ensure that the custom resource has been created:
    oc get restore

    The expected results should be similar to:

    NAME       AGE
    insights   10s
  5. Confirm that the job has been created.
    1. Issue this command:
      oc get job|grep restore

      The expected results should be similar to:

      insights-restore                                             0/1           35s        35s
    2. If the job is not created after one minute, the most likely cause is that the operator cache is still holding the historical backup resource. To correct this, restart the operator:
      oc delete pod $(oc get pod |awk '/guardiuminsights-controller-manager/{print $1;}')

      The expected results should be similar to:

      pod "guardiuminsights-controller-manager-756b55dff9-zgz5g" deleted
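
The command substitution above selects the operator pod name from the `oc get pod` listing. The awk step can be checked in isolation against sample output (the listing and pod suffix below are illustrative):

```shell
# Sample `oc get pod` output; pod names and ages are placeholders.
sample='NAME                                                   READY   STATUS    RESTARTS   AGE
guardiuminsights-controller-manager-756b55dff9-zgz5g   1/1     Running   0          2d
insights-restore-n7rgm                                 0/1     Pending   0          60s'

# awk prints column 1 of any line matching the operator pod prefix;
# that name is what `oc delete pod $(...)` receives.
pod=$(echo "$sample" | awk '/guardiuminsights-controller-manager/{print $1;}')
echo "$pod"   # guardiuminsights-controller-manager-756b55dff9-zgz5g
```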
    3. If the job is still not created, delete the restore resource:
      oc delete restore insights

      The expected results should be similar to:

      restore.gi.ds.isc.ibm.com "insights" deleted
    4. Recreate the restore resource:
      oc apply -f gi-restore.yaml

      The expected results should be similar to:

      restore.gi.ds.isc.ibm.com/insights created
    5. Now, restart the operator:
      oc delete pod $(oc get pod |awk '/guardiuminsights-controller-manager/{print $1;}')

      The expected results should be similar to:

      pod "guardiuminsights-controller-manager-756b55dff9-zgz5g" deleted
    6. Check again to see if the job exists. If it does not, repeat the above steps.
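
Sub-steps 1 through 6 amount to a check-and-retry loop. The job check can be factored into a small helper so the logic is testable without a cluster; a sketch (the helper name job_present is hypothetical):

```shell
# job_present: succeeds when the supplied `oc get job` listing contains
# the restore job. If it fails, recycle the restore resource and the
# operator pod as in sub-steps 2 through 5, then check again.
job_present() {
  echo "$1" | grep -q 'insights-restore'
}

# On a live cluster the check would be: job_present "$(oc get job)"
if job_present 'insights-restore   0/1   35s   35s'; then
  echo "restore job found"
fi
```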
  6. Wait for the restore pod to show up (the job and its pod should start in a few seconds).
    oc get pod |grep restore

    The expected results should be similar to:

    insights-restore-n7rgm            0/1       Pending     0          60s

  7. Confirm that the status of the pod is Running.
    1. Issue this command:
      oc get pod |grep restore
    2. If the status shows Pending, similar to this:
      insights-restore-n7rgm            0/1       Pending     0          60s

      This means that the backup PV is still attached to its previous PVC.

    3. To determine the status of the PV, issue this command:
      oc get pv|grep backup

      If the PV is attached to the PVC, the expected results will be similar to:

      pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8   50Gi       RWO            Retain           Released   staging/backup            rook-ceph-block             2d21h
    4. To manually release the PV, get its name from the above results (in this example, it is pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8), and then issue this command:
      oc patch pv pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8 -p '{"spec":{"claimRef": null}}'

      The expected results should be similar to:

      persistentvolume/pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8 patched
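
The manual PV release in this sub-step can also be scripted: extract the name of the Released backup volume from the `oc get pv` listing, then patch it. A sketch against sample output (the PV name is the placeholder from the example above):

```shell
# Sample `oc get pv | grep backup` output; values are placeholders.
sample='pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8   50Gi       RWO            Retain           Released   staging/backup            rook-ceph-block             2d21h'

# The Released volume's name is in column 1.
pv=$(echo "$sample" | awk '/Released/{print $1}')
echo "$pv"   # pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8

# On a live cluster, clear the stale claim reference:
# oc patch pv "$pv" -p '{"spec":{"claimRef": null}}'
```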
    5. Now when you check the status of the pod, it should be Running. Issue this command:
      oc get pod |grep restore

      The expected results should show the Running status:

      insights-restore-n7rgm         1/1       Running     0          6m29s
  8. Watch the pod logs:
    oc logs --follow insights-restore-n7rgm
    . . . . .
  9. Confirm all services are accessible and that data is available.
  10. If your NFS server is configured to work with the backup pod and backup PV, complete the following steps to manually restore the datamarts. If you did not allocate storage for backup during installation, you can copy the datamart data directly to /mnt/blumeta0/scratch/insights-datamart/ in the Db2 pod.
    1. Log in to the remote NFS server. Copy the datamart backup files from the target backup directory to the shared PVC directory at the base directory of the NFS share.
      In the following example, the backup-pvc-support-pvc directory and the datamart_temp folder are created to hold the datamart data.
      cp -r /data/insights/v3.2.0_backups/<Backup-Dir>/meta/datamart-backup   /data/insights/backup-pvc-support-pvc/datamart_temp
    2. Log in to the restored cluster and enter the Db2 pod:
      NAMESPACE=$(oc get guardiuminsights | cut -d' ' -f1 | tail -n 1)
      oc exec -it -n $NAMESPACE c-$NAMESPACE-db2-db2u-0 -- bash
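
The NAMESPACE lookup above parses the custom resource listing: take column 1 of the last line. It can be exercised against sample output (the CR name staging is a placeholder; the pipeline assumes a single Guardium Insights CR):

```shell
# Sample `oc get guardiuminsights` output; columns are illustrative.
sample='NAME      STATUS
staging   Ready'

# cut keeps the first space-delimited field of each line (header included);
# tail keeps the last line, which is the CR name, used here as the namespace.
NAMESPACE=$(echo "$sample" | cut -d' ' -f1 | tail -n 1)
echo "$NAMESPACE"   # staging
```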
    3. Copy the datamart backup from the mounted remote NFS to the DB2 pod and change the file permissions of the datamarts.
      sudo chown -R db2inst1 /mnt/backup/datamart_temp
      su - db2inst1
      cp -r /mnt/backup/datamart_temp/* /mnt/blumeta0/scratch/insights-datamart/
    4. Restart the ssh-service pod to re-process the newly copied datamart data.
      oc get pods | grep ssh-service
      oc delete pod <ssh-service pod>
      
    5. Ensure that the guardium-connector service is up. If it is not, bring it back up by running the following command:
      oc scale deployment.apps/<NAMESPACE>-guardium-connector --replicas=3

What to do next

Use one of these methods to check the log files:

  • To check one pod, issue this command: oc logs --follow <pod>
  • See <gi-backup-xxxx>/backup-<timestamp>.log and <gi-backup-xxxx>/restore-<timestamp>.log. These logs are in the PV under each directory for full backups.