Roll back or sync Common Service DB and Zen data to a specified backup
Before you begin
- There must be an existing Velero backup to roll back to. For information about the backup and restore process, see IBM Cloud Pak foundational services backup and restore.
- Download the files that are needed to restore the different resources:
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/restore/restore-cs-db.yaml
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/schedule/common-service-db/cs-db-br-scripts-cm.yaml
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/schedule/common-service-db/cs-db-sa.yaml
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/schedule/common-service-db/cs-db-role.yaml
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/schedule/common-service-db/cs-db-rolebinding.yaml
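If you prefer to script the downloads, the following is a minimal sketch that fetches the same five files and stops at the first failed or empty download. The function name `download_cs_db_files` is hypothetical and not part of the product; the URLs are the ones in the list above, split into a shared prefix and a relative path.

```shell
# Hypothetical helper (not part of the product): fetch the files listed above,
# stopping at the first failed or empty download. URLs are copied from the list.
download_cs_db_files() {
  local base rel file
  base="https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero"
  for rel in \
    restore/restore-cs-db.yaml \
    schedule/common-service-db/cs-db-br-scripts-cm.yaml \
    schedule/common-service-db/cs-db-sa.yaml \
    schedule/common-service-db/cs-db-role.yaml \
    schedule/common-service-db/cs-db-rolebinding.yaml
  do
    file=$(basename "$rel")
    # -O names the local file explicitly; wget exits non-zero on HTTP errors.
    if ! wget -q -O "$file" "$base/$rel"; then
      echo "download failed: $base/$rel" >&2
      return 1
    fi
    # An empty file means nothing useful was fetched.
    if [ ! -s "$file" ]; then
      echo "downloaded file is empty: $file" >&2
      return 1
    fi
  done
}
```

Run it from the directory where you keep the restore files, for example `download_cs_db_files && ls *.yaml`.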
Note: Completing the rollback procedure removes any data not present in the backup used.
- The following instructions support two processes, depending on the relative date of the backup that you use. If the backup that you select is older than the data currently on the target cluster, the data is rolled back to that earlier state. If the backup that you select is newer than the data currently on the target cluster, the data is synced to match the newer backup. The process is the same in both cases; only the backup that you use differs.
Set the Common Service DB data to the desired state
- Determine the Velero backup to roll back to:
  velero backup get
- Verify that the backup was successful, and check the details to confirm that all resources were saved:
  velero backup describe <__BACKUP_NAME__> --details
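To check the backup state without reading the full `describe` output, you can read the phase of the Velero `Backup` custom resource. This is a minimal sketch; the function names are hypothetical, and it assumes Velero runs in the `velero` namespace (pass another namespace as the second argument, for example `openshift-adp` when Velero is installed through OADP).

```shell
# Hypothetical helpers (not part of the product).
# Assumption: Velero's namespace is "velero" unless a second argument is given.
backup_phase() {
  oc get backups.velero.io "$1" -n "${2:-velero}" -o jsonpath='{.status.phase}'
}

# Succeeds only when the backup phase is exactly "Completed".
require_completed_backup() {
  local phase
  phase=$(backup_phase "$1" "${2:-velero}") || return 1
  if [ "$phase" != "Completed" ]; then
    echo "backup $1 is in phase '$phase', not Completed" >&2
    return 1
  fi
  echo "backup $1 is Completed"
}
```

A phase such as `PartiallyFailed` makes the check fail on purpose; use `velero backup describe <__BACKUP_NAME__> --details` to see which resources were not saved.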
- In the restore-cs-db.yaml file, replace __BACKUP_NAME__ with the name of the backup that you identified in the previous steps:
  vi restore-cs-db.yaml
  Note: If you do not want to roll back every instance of Common Services and Common Service DB in the cluster, specify the namespaces to roll back by replacing '*' under includedNamespaces with the target namespaces. Put each namespace on its own line, as in the following example:
  includedNamespaces:
  - '<cs namespace1>'
  - '<cs namespace2>'
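As a non-interactive alternative to editing the file in vi, the following minimal sketch substitutes the placeholder with `sed`. The function name `set_backup_name` is hypothetical. It assumes that `__BACKUP_NAME__` appears literally in the file and that the backup name is a Kubernetes object name (lowercase letters, digits, `-`, `.`), so it is safe inside a `sed` replacement.

```shell
# Hypothetical helper (not part of the product): replace __BACKUP_NAME__ in a restore file.
# Assumption: the name contains only lowercase letters, digits, '-' and '.'.
set_backup_name() {
  local file="$1" name="$2"
  case $name in
    ''|*[!a-z0-9.-]*) echo "unexpected backup name: '$name'" >&2; return 1 ;;
  esac
  # -i.bak works with both GNU and BSD sed; the .bak copy is removed afterwards.
  sed -i.bak "s/__BACKUP_NAME__/$name/g" "$file" && rm -f "$file.bak"
  # Fail if the placeholder is still present (for example, if sed failed).
  if grep -q '__BACKUP_NAME__' "$file"; then
    echo "placeholder still present in $file" >&2
    return 1
  fi
}
```

For example, `set_backup_name restore-cs-db.yaml <backup name>`. Editing `includedNamespaces` is still done by hand.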
- Clean up existing Common Service DB restore resources:
  - Remove the cs-db-backup deployment, if present:
    oc delete deploy cs-db-backup -n <target namespace>
  - Remove the cs-db-backup-pvc PVC, if present:
    oc delete pvc cs-db-backup-pvc -n <target namespace>
  - Remove the Velero restore object, if present:
    velero restore delete restore-cs-db
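The cleanup step can be made safe to re-run. In this minimal sketch (the function name is hypothetical), `--ignore-not-found` lets `oc delete` succeed when an object is already gone, which matches "if present", and `--confirm` skips Velero's interactive prompt. Deleting the deployment before the PVC means the PVC is no longer in use when it is deleted. The assumption is that a failed `velero restore delete` only means that no such restore exists, so that failure is reported and ignored.

```shell
# Hypothetical helper (not part of the product): idempotent cleanup for one namespace.
cleanup_cs_db_restore() {
  local ns="$1"
  # Delete the deployment first so that its pod releases the PVC.
  oc delete deploy cs-db-backup -n "$ns" --ignore-not-found || return 1
  oc delete pvc cs-db-backup-pvc -n "$ns" --ignore-not-found || return 1
  # Assumption: a failure here means the restore object does not exist.
  velero restore delete restore-cs-db --confirm || echo "no restore-cs-db to delete (ignored)" >&2
}
```

Call it once for each target namespace, for example `cleanup_cs_db_restore <target namespace>`.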
- Restore the Common Service DB data:
  oc apply -f restore-cs-db.yaml
- Check the progress and details of the restore by using the following commands. Proceed to the next step only after the status shows as Completed.
  velero restore get
  velero restore describe <__RESTORE_NAME__> --details
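Instead of re-running `velero restore get` by hand, you can poll the phase of the Velero `Restore` object. This is a minimal sketch with a hypothetical function name. It assumes Velero runs in the `velero` namespace and that 60 polls at 10-second intervals (about 10 minutes) is enough; adjust these values for your cluster.

```shell
# Hypothetical helper (not part of the product): wait for a Velero restore to finish.
# Assumptions: namespace "velero" by default; 60 polls, 10 seconds apart.
wait_for_restore() {
  local name="$1" ns="${2:-velero}" tries=60 phase=""
  while [ "$tries" -gt 0 ]; do
    phase=$(oc get restores.velero.io "$name" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    case $phase in
      Completed) echo "restore $name Completed"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "restore $name ended in phase $phase; run: velero restore describe $name --details" >&2
        return 1 ;;
    esac
    # Any other phase (New, InProgress, ...) means keep waiting.
    tries=$((tries - 1))
    sleep 10
  done
  echo "timed out waiting for restore $name (last phase: '${phase:-unknown}')" >&2
  return 1
}
```

For example, `wait_for_restore restore-cs-db && echo "proceed to the next step"`.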
- Verify that the restore completed successfully. Check the logs of the Velero restore to confirm that the restore ran:
  velero restore logs restore-cs-db
  Search the logs for the following message:
  "memcache.go:238] couldn't get current server API group list"
  If this message is present, complete the following steps:
  - Get the cs-db-restore-job.yaml file:
    wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/restore/common-service-db/cs-db-restore-job.yaml
  - Delete the existing cs-db-backup deployment:
    oc delete deploy cs-db-backup -n <cs namespace>
  - Apply the cs-db-restore-job.yaml file:
    oc apply -f cs-db-restore-job.yaml -n <cs namespace>
    Troubleshooting: If the cs-db-restore-job pod is stuck in ContainerCreating, complete the following steps:
    - Delete the cs-db-backup deployment.
    - Make sure that the cs-db-backup pod is fully deleted (not Terminating).
    - Delete the cs-db-restore-job job and its pod, and make sure that the pod is fully deleted (not Terminating).
    - Apply the cs-db-restore-job.yaml file again.
  Note: Run the secondary steps listed here only if the restore logs indicate that the restore did not run. Log messages such as "duplicate key error collection" are expected and do not indicate a need to run the secondary steps.
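The log check and the secondary steps, including the ContainerCreating recovery, can be combined into one re-runnable sequence for a namespace. This is a minimal sketch, and all function names are hypothetical. It assumes that the restore object is named `restore-cs-db`, that `cs-db-restore-job.yaml` is in the current directory, and that the pods of the `cs-db-backup` deployment and the `cs-db-restore-job` job have names that start with `cs-db-backup-` and `cs-db-restore-job-`.

```shell
# Hypothetical helpers (not part of the product).
# True when the restore logs contain the message that calls for the secondary steps.
restore_needs_secondary_steps() {
  velero restore logs restore-cs-db | grep -qF "couldn't get current server API group list"
}

# Wait until no pod whose name starts with the given prefix exists (not even Terminating).
wait_for_pods_gone() {
  local ns="$1" prefix="$2" tries=60
  while oc get pods -n "$ns" -o name | grep -q "^pod/$prefix"; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || { echo "pods $prefix* still present in $ns" >&2; return 1; }
    sleep 5
  done
}

run_cs_db_restore_job() {
  local ns="$1"
  oc delete deploy cs-db-backup -n "$ns" --ignore-not-found || return 1
  wait_for_pods_gone "$ns" cs-db-backup- || return 1
  # Remove an earlier, possibly stuck, job; deleting a Job also deletes its pods.
  oc delete job cs-db-restore-job -n "$ns" --ignore-not-found || return 1
  wait_for_pods_gone "$ns" cs-db-restore-job- || return 1
  oc apply -f cs-db-restore-job.yaml -n "$ns"
}
```

For example, `restore_needs_secondary_steps && run_cs_db_restore_job <cs namespace>`.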
Set the Zen 5 data to the desired state
- Determine the Velero backup to roll back to:
  velero backup get
- Verify that the backup was successful, and check the details to confirm that all resources were saved:
  velero backup describe <__BACKUP_NAME__> --details
- In the restore-zen5-data.yaml file, replace __BACKUP_NAME__ with the name of the backup that you want to roll back to:
  vi restore-zen5-data.yaml
  Note: If you do not want to roll back every instance in the cluster, specify the namespaces to roll back by replacing '*' under includedNamespaces with the target namespaces. Put each namespace on its own line, as in the following example:
  includedNamespaces:
  - '<cs namespace1>'
  - '<cs namespace2>'
- Give the Zen 5 backup the necessary permissions:
  - For each namespace with a zenservice to back up, create a service account. Replace the <zenservice namespace> value before you apply the file.
    oc apply -f zen5-sa.yaml
  - Apply the Role for the Zen backup once in each zenservice namespace. Replace the <zenservice namespace> value before you apply the file.
    oc apply -f zen5-role.yaml
  - Create the RoleBinding that connects the ServiceAccount to the Role:
    - Edit the zen5-rolebinding.yaml file to add the ServiceAccount that you created earlier, and replace the <zenservice namespace> value:
      vi zen5-rolebinding.yaml
    - Apply the zen5-rolebinding.yaml file:
      oc apply -f zen5-rolebinding.yaml
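When several namespaces have a zenservice, the three permission files can be applied per namespace without editing them repeatedly. This is a minimal sketch with a hypothetical function name. It assumes that each file contains the literal placeholder `<zenservice namespace>` and that any other edits, such as the ServiceAccount reference in `zen5-rolebinding.yaml`, have already been made. Each file is rendered to standard output and piped to `oc apply -f -`, so the files on disk keep their placeholders for the next namespace.

```shell
# Hypothetical helper (not part of the product): apply SA, Role and RoleBinding per namespace.
# Assumption: each file uses the literal placeholder <zenservice namespace>.
apply_zen5_permissions() {
  local ns file
  for ns in "$@"; do
    for file in zen5-sa.yaml zen5-role.yaml zen5-rolebinding.yaml; do
      # Substitute on the fly; the original file is left unchanged.
      sed "s/<zenservice namespace>/$ns/g" "$file" | oc apply -f - || return 1
    done
  done
}
```

For example, `apply_zen5_permissions <zenservice namespace 1> <zenservice namespace 2>`.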
- Clean up existing Zen 5 restore resources:
  - Remove the zen5-backup deployment, if present:
    oc delete deploy zen5-backup -n <target namespace>
  - Remove the zen5-backup-pvc PVC, if present:
    oc delete pvc zen5-backup-pvc -n <target namespace>
  - Remove the Velero restore object, if present:
    velero restore delete restore-zen5-data
- Restore the Zen data:
  oc apply -f restore-zen5-data.yaml
- Check the progress and details of the restore by using the following commands. Proceed to the next step only after the status shows as Completed.
  velero restore get
  velero restore describe <__RESTORE_NAME__> --details
- Check the logs of the Velero restore to verify that the data was restored:
  velero restore logs restore-zen5-data
- Search the logs for restore_zen5 to find the relevant entries. If restore_zen5 does not appear, the restore did not run. If the logs or the data indicate that the restore was not successful, use the following steps as a workaround:
  - Get the zen5-restore-job.yaml file:
    wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/restore/zen/zen5-restore-job.yaml
  - Delete the existing zen5-backup deployment:
    oc delete deploy zen5-backup -n <target namespace>
  - Wait for the zen5-backup pods to be fully deleted (gone, not Terminating).
  - Edit the zen5-restore-job.yaml file and update every field in <>: the target restore namespace (<zenservice namespace>) and the target zenservice (<zenservice name>).
  - Apply the zen5-restore-job.yaml file:
    oc apply -f zen5-restore-job.yaml
  - Wait for the job to complete, and then check the logs of the zen5-restore-job pod to verify that the restore completed.
  - Repeat these steps for each namespace that has a zenservice instance installed.
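The workaround above can be run for one namespace at a time with a sketch like the following. The function name is hypothetical. It assumes that `zen5-restore-job.yaml` is in the current directory and uses the literal placeholders `<zenservice namespace>` and `<zenservice name>`, that pods of the `zen5-backup` deployment have names that start with `zen5-backup-`, and that the job finishes within 30 minutes. `oc wait --for=condition=complete` blocks until the job reports completion, and `oc logs job/...` prints the logs of one of its pods.

```shell
# Hypothetical helper (not part of the product): run the Zen 5 restore job in one namespace.
# Assumptions: literal <zenservice namespace>/<zenservice name> placeholders; 30 minute timeout.
zen5_restore_namespace() {
  local ns="$1" zen="$2" tries=60
  oc delete deploy zen5-backup -n "$ns" --ignore-not-found || return 1
  # Wait until the zen5-backup pods are fully gone, not just Terminating.
  while oc get pods -n "$ns" -o name | grep -q '^pod/zen5-backup-'; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || { echo "zen5-backup pods still present in $ns" >&2; return 1; }
    sleep 5
  done
  # Fill in the placeholders and apply the rendered job without changing the file.
  sed -e "s/<zenservice namespace>/$ns/g" -e "s/<zenservice name>/$zen/g" \
    zen5-restore-job.yaml | oc apply -f - || return 1
  oc wait --for=condition=complete job/zen5-restore-job -n "$ns" --timeout=30m || return 1
  # Print the job logs so that you can look for restore_zen5 entries.
  oc logs job/zen5-restore-job -n "$ns"
}
```

For example, `zen5_restore_namespace <zenservice namespace> <zenservice name> | grep restore_zen5`, repeated for each namespace.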
- Wait for the zenservice instances to become ready. An instance is ready when its Progress field reaches 100%. The following command continuously outputs the progress of every zenservice on the cluster:
  oc get zenservice -A -w -o yaml | grep Progress:
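The watch command above runs until you stop it. If you need a command that exits when every instance is ready, the following is a minimal sketch with a hypothetical function name. It assumes that the value shown by `grep Progress:` is stored at `.status.Progress` and is formatted like `100%`. It also assumes that polling every 30 seconds, up to 120 times (about one hour), is enough.

```shell
# Hypothetical helper (not part of the product): wait until every zenservice reports 100%.
# Assumption: progress is stored at .status.Progress as a string such as "100%".
wait_for_zenservices() {
  local tries=120 out pending
  while :; do
    out=$(oc get zenservice -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.Progress}{"\n"}{end}') || {
      echo "could not list zenservices" >&2; return 1; }
    # Keep non-empty lines whose second field is not exactly 100%.
    pending=$(printf '%s\n' "$out" | awk 'NF && $2 != "100%"')
    if [ -z "$pending" ]; then
      echo "all zenservices at 100%"
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      printf 'still below 100%%:\n%s\n' "$pending" >&2
      return 1
    fi
    sleep 30
  done
}
```

An instance that has no Progress value yet is treated as not ready.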
Troubleshooting:
- Make sure that there is only one zen5-backup pod or one zen5-restore-job pod in a namespace at any given time, because they compete for the same PVC.
- If the zen5-restore-job pod is stuck in ContainerCreating:
  - Delete the zen5-backup deployment.
  - Make sure that the zen5-backup pod is fully deleted (not Terminating).
  - Delete the zen5-restore-job job and its pod, and make sure that the pod is fully deleted (not Terminating).
  - Make sure that the configmap zen5-br-configmap, PVC zen5-backup-pvc, role zen5-backup-role, rolebinding zen5-backup-rolebinding, and service account zen5-backup-sa are present in the namespace.
  - Apply the zen5-restore-job.yaml file again.
- If the configmap zen5-br-configmap is not present, download it:
  wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/schedule/zen5-br-scripts-cm.yaml
  Edit the namespace field, and then apply the file with the following command:
  oc apply -f zen5-br-scripts-cm.yaml
- Velero restore is less predictable than backup when restoring databases. You can safely delete a Velero restore object (that is, restore-cs-db or restore-zen5-data) and its accompanying deployment and PVC, wait for these items to be fully deleted, and then re-create the Velero restore object to try again. If this still does not work, use the secondary instructions that rely on the Common Service DB and Zen 5 restore jobs, one namespace at a time. Running the restore in a namespace that is already restored does no harm.
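Before you re-apply a restore job, you can run the troubleshooting checks for a namespace with a sketch like the following. The function name is hypothetical. It reports each missing Zen 5 backup and restore resource by the names listed above, and it fails when more than one `zen5-backup` or `zen5-restore-job` pod exists. The assumption is that those pods have names that start with `zen5-backup-` or `zen5-restore-job-`.

```shell
# Hypothetical helper (not part of the product): pre-flight checks for one namespace.
# Assumption: pod names start with "zen5-backup-" or "zen5-restore-job-".
check_zen5_restore_prereqs() {
  local ns="$1" rc=0 res count
  for res in configmap/zen5-br-configmap pvc/zen5-backup-pvc role/zen5-backup-role \
             rolebinding/zen5-backup-rolebinding serviceaccount/zen5-backup-sa; do
    if ! oc get "$res" -n "$ns" >/dev/null 2>&1; then
      echo "missing: $res in $ns" >&2
      rc=1
    fi
  done
  # zen5-backup and zen5-restore-job pods compete for zen5-backup-pvc.
  count=$(oc get pods -n "$ns" -o name | grep -cE '^pod/(zen5-backup|zen5-restore-job)-')
  if [ "$count" -gt 1 ]; then
    echo "found $count zen5-backup/zen5-restore-job pods in $ns; expected at most 1" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `check_zen5_restore_prereqs <zenservice namespace> && oc apply -f zen5-restore-job.yaml`.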