Restoring a backup of Guardium Insights
This topic describes the procedure for restoring a backup of Guardium Insights.
Before you begin
Before you start the restore process, ensure that the target system is in a normal running state. The restore process cannot recover a broken cluster.
Procedure
- Ensure that you are logged in to the IBM Cloud®
Private command line interface. This also ensures that you are
authenticated to the OpenShift® command line interface.
Use this command to log in:
cloudctl login -a <ICP_hostname> -u <openshift_username> -p <openshift_password> --skip-ssl-validation
Where
<ICP_hostname>
is your Cloud Private server, for example https://cp-console.apps.myserver.com
<openshift_username>
is your OpenShift username.
<openshift_password>
is your OpenShift password.
- Prepare a custom resource file named gi-restore.yaml by following the examples in Guardium Insights restore custom resource file options.
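The exact spec fields for gi-restore.yaml are described in the linked options topic; the outline below is only a sketch. The kind and group follow from the `restore.gi.ds.isc.ibm.com/insights` output shown in this procedure, but the apiVersion suffix and spec contents are assumptions to verify against your release:

```yaml
# gi-restore.yaml -- skeleton only; confirm the apiVersion and spec fields
# against the "Guardium Insights restore custom resource file options" topic.
apiVersion: gi.ds.isc.ibm.com/v1  # group taken from the CR output; version assumed
kind: Restore
metadata:
  name: insights
spec: {}  # fill in per the restore custom resource file options
```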
- Create the restore resource:
oc apply -f gi-restore.yaml
The expected results should be similar to:
restore.gi.ds.isc.ibm.com/insights created
- Ensure that the custom resource has been created:
oc get restore
The expected results should be similar to:
NAME       AGE
insights   10s
- Confirm that the job has been created.
- Issue this command:
oc get job|grep restore
The expected results should be similar to:
insights-restore 0/1 35s 35s
- If the job is not created after one minute, the most likely cause is a bug in the
operator cache: the operator is still holding the historical backup resource. To correct this, restart
the operator:
oc delete pod $(oc get pod |awk '/guardiuminsights-controller-manager/{print $1;}')
The expected results should be similar to:
pod "guardiuminsights-controller-manager-756b55dff9-zgz5g" deleted
- If needed, delete the restore resource:
oc delete restore insights
The expected results should be similar to:
restore.gi.ds.isc.ibm.com "insights" deleted
- Then re-create the restore resource:
oc apply -f gi-restore.yaml
The expected results should be similar to:
restore.gi.ds.isc.ibm.com/insights created
- Now, restart the operator:
oc delete pod $(oc get pod |awk '/guardiuminsights-controller-manager/{print $1;}')
The expected results should be similar to:
pod "guardiuminsights-controller-manager-756b55dff9-zgz5g" deleted
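If you find yourself repeating this delete, re-create, and restart cycle, the job-existence check can be scripted. This is a minimal sketch: `restore_job_exists` is a hypothetical helper that parses captured `oc get job` output, and the live-cluster commands are shown only as comments.

```shell
# Hypothetical helper: decide whether the insights-restore job exists in
# `oc get job` output (names match the examples in this procedure).
restore_job_exists() {
  # $1: output of `oc get job`
  printf '%s\n' "$1" | grep -q '^insights-restore'
}

# Against a live cluster (not run here):
#   if ! restore_job_exists "$(oc get job)"; then
#     # the operator cache may still hold the old resource; restart the operator
#     oc delete pod $(oc get pod | awk '/guardiuminsights-controller-manager/{print $1;}')
#   fi
```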
- Check again to see whether the job exists. If it does not, repeat the above steps.
- Wait for the restore pod to appear (the job and its pod should start within a few
seconds). Issue this command:
oc get pod |grep restore
The expected results should be similar to:
insights-restore-n7rgm 0/1 Pending 0 60s
- Confirm that the status of the pod is Running.
- Issue this command:
oc get pod |grep restore
- If the status shows Pending, similar to this:
insights-restore-n7rgm 0/1 Pending 0 60s
This means that the PV is still attached to the PVC.
- To determine the status of the PV, issue this command:
oc get pv|grep backup
If the PV is attached to the PVC, the expected results will be similar to:
pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8 50Gi RWO Retain Released staging/backup rook-ceph-block 2d21h
- To manually release the PV, get its name from the above results (in this example, it
is pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8), and then issue this command:
oc patch pv pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8 -p '{"spec":{"claimRef": null}}'
The expected results should be similar to:
persistentvolume/pvc-7f8c3bb4-5a2c-4408-ad25-fe4f20b604f8 patched
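The PV lookup and release above can be sketched as a small script. `released_backup_pvs` is a hypothetical helper that picks Released backup PVs out of captured `oc get pv` output (column layout as in the example); the live-cluster commands stay in comments.

```shell
# Hypothetical helper: list the names of backup PVs in Released status
# from `oc get pv` output, so each can have its claimRef cleared.
released_backup_pvs() {
  # $1: output of `oc get pv`
  printf '%s\n' "$1" | awk '/backup/ && /Released/ {print $1}'
}

# Against a live cluster (not run here), each name would then be patched:
#   for pv in $(released_backup_pvs "$(oc get pv)"); do
#     oc patch pv "$pv" -p '{"spec":{"claimRef": null}}'
#   done
```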
- Now when you check the status of the pod, it should be Running. Issue this command:
oc get pod |grep restore
The expected results should show the Running status:
insights-restore-n7rgm 1/1 Running 0 6m29s
- Watch the pod logs:
oc logs --follow insights-restore-n7rgm
. . . . . . . . . . . . . . .
- Confirm all services are accessible and that data is available.
- If your NFS server is configured to work with the backup pod and backup PV, complete the
following steps to manually restore the datamarts. If you did not allocate storage for backup during
installation, the datamart data can be copied directly to the Db2 pod at
/mnt/blumeta0/scratch/insights-datamart/.
- Log in to the remote NFS server. Copy the datamart backup files from the target backup
directory to the shared PVC directory at the base directory of the NFS. In the following example, the backup-pvc-support-pvc directory and the datamart_temp folder are created to hold the datamart data.
cp -r /data/insights/v3.2.0_backups/<Backup-Dir>/meta/datamart-backup /data/insights/backup-pvc-support-pvc/datamart_temp
- Log in to the restored cluster and enter the Db2 pod:
NAMESPACE=$(oc get guardiuminsights | cut -d' ' -f1 | tail -n 1)
oc exec -it -n $NAMESPACE c-$NAMESPACE-db2-db2u-0 -- bash
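The namespace lookup above can be wrapped in a hypothetical helper so the parsing can be checked offline; it mirrors the document's cut/tail pipeline, and the `oc exec` call stays a comment.

```shell
# Hypothetical helper: derive the Guardium Insights namespace from captured
# `oc get guardiuminsights` output, using the same cut/tail pipeline as above.
gi_namespace() {
  # $1: output of `oc get guardiuminsights`
  printf '%s\n' "$1" | cut -d' ' -f1 | tail -n 1
}

# Against a live cluster (not run here):
#   NAMESPACE=$(gi_namespace "$(oc get guardiuminsights)")
#   oc exec -it -n "$NAMESPACE" "c-$NAMESPACE-db2-db2u-0" -- bash
```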
- Copy the datamart backup from the mounted remote NFS to the Db2 pod and change the
file permissions of the datamarts.
sudo chown -R db2inst1 /mnt/backup/datamart_temp
su - db2inst1
cp -r /mnt/backup/datamart_temp/* /mnt/blumeta0/scratch/insights-datamart/
- Restart the ssh-service pod to re-process the newly copied datamart data.
oc get pods | grep ssh-service
oc delete pod <ssh-service pod>
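The two commands above can be combined with a hypothetical helper that finds the ssh-service pod name in captured `oc get pods` output; the delete itself is left as a comment since the pod name varies per cluster.

```shell
# Hypothetical helper: find the first ssh-service pod name in
# `oc get pods` output, so it can be deleted and re-created by its controller.
ssh_service_pod() {
  # $1: output of `oc get pods`
  printf '%s\n' "$1" | awk '/ssh-service/ {print $1; exit}'
}

# Against a live cluster (not run here):
#   oc delete pod "$(ssh_service_pod "$(oc get pods)")"
```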
- Ensure that the guardium-connector service is up. If it is not, bring it back up by
running the following command:
oc scale deployment.apps/<NAMESPACE>-guardium-connector --replicas=3
What to do next
Use one of these methods to check the log files:
- To check one pod, issue this command:
oc logs --follow <pod>
- See <gi-backup-xxxx>/backup-<timestamp>.log and <gi-backup-xxxx>/restore-<timestamp>.log. These logs are stored in the PV, under the directory for each full backup.