Roll back or sync Common Service DB and Zen data to a specified backup

Before you begin

Set the Common Service DB data to the desired state

  1. Determine which Velero backup to roll back to.

     velero backup get
    
  2. Verify that the backup completed successfully, and check its details to confirm that all resources were saved.

     velero backup describe <__BACKUP_NAME__> --details
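If the backup list is long, you can narrow it to backups that finished successfully before you describe one. The following is a minimal sketch that assumes the default velero backup get table layout (one header row, with NAME and STATUS as the first two columns). The function name completed_backups is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of backups whose STATUS is Completed.
# Assumes the default `velero backup get` table: one header row, with NAME in
# column 1 and STATUS in column 2.
completed_backups() {
  awk 'NR > 1 && $2 == "Completed" { print $1 }'
}

# Example:
#   velero backup get | completed_backups
```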
    
  3. In the restore-cs-db.yaml file, replace __BACKUP_NAME__ with the name of the backup that you identified in the previous steps.

     vi restore-cs-db.yaml
    

    Note: If the cluster has multiple instances of Common Services and Common Service DB and you do not want to roll back all of them, specify the namespaces to roll back by replacing the '*' under includedNamespaces with the target namespaces. Put each namespace on its own line, as in the following example:

     includedNamespaces:
     - '<cs namespace1>'
     - '<cs namespace2>'
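As an alternative to editing the file in vi, the substitution can be scripted with sed. This is a sketch that assumes the manifest contains the literal string __BACKUP_NAME__ wherever the backup name belongs; the function name set_backup_name and the BACKUP variable are hypothetical. Kubernetes object names do not contain the / character that sed uses as its delimiter, so a backup name can be substituted as is.

```shell
#!/bin/sh
# Hypothetical helper: replace every __BACKUP_NAME__ placeholder in a restore
# manifest with the chosen backup name (an alternative to editing in vi).
set_backup_name() {
  file=$1
  backup=$2
  # Write to a temporary file first so a failed sed leaves the original intact.
  sed "s/__BACKUP_NAME__/$backup/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example (BACKUP is a placeholder holding the name chosen in step 1):
#   set_backup_name restore-cs-db.yaml "$BACKUP"
```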
    
  4. Clean up existing Common Service DB restore resources.

    • Remove the cs-db-backup deployment if present:
        oc delete deploy cs-db-backup -n <target namespace>

    • Remove the cs-db-backup-pvc PVC if present:
        oc delete pvc cs-db-backup-pvc -n <target namespace>
      
    • Remove the velero restore object if present:
        velero restore delete restore-cs-db
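The three cleanup commands can be wrapped in one function that also tolerates resources that are already gone. This is a sketch rather than part of the documented procedure: it relies on the oc delete --ignore-not-found flag and on the velero restore delete --confirm flag, which skips the interactive confirmation prompt. The function name cleanup_restore and the OC, VELERO, and TARGET_NS variables are hypothetical. Because the Zen 5 resources follow the same naming pattern (zen5-backup, zen5-backup-pvc), the same function also fits the Zen 5 cleanup.

```shell
#!/bin/sh
# Hypothetical helper wrapping the cleanup commands above.
# Set OC=echo and VELERO=echo to print the commands instead of running them.
OC=${OC:-oc}
VELERO=${VELERO:-velero}

cleanup_restore() {
  prefix=$1     # resource name prefix: cs-db (or zen5 for the Zen 5 cleanup)
  namespace=$2  # target namespace
  restore=$3    # Velero restore name, for example restore-cs-db
  # --ignore-not-found turns the delete into a no-op if the resource is absent.
  $OC delete deploy "$prefix-backup" -n "$namespace" --ignore-not-found
  $OC delete pvc "$prefix-backup-pvc" -n "$namespace" --ignore-not-found
  # --confirm skips the interactive Y/N prompt of velero restore delete.
  $VELERO restore delete "$restore" --confirm
}

# Example (TARGET_NS is a placeholder for the target namespace):
#   cleanup_restore cs-db "$TARGET_NS" restore-cs-db
```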
      
  5. Restore the Common Service DB data.

     oc apply -f restore-cs-db.yaml
    
  6. Check the progress and the details of the restore by using the following commands. Proceed with the next step after the status shows as Completed.

     velero restore get
    
     velero restore describe <__RESTORE_NAME__> --details
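Instead of rerunning velero restore get by hand, you can poll the phase of the Restore resource until it reaches Completed. The sketch below is generic: it takes the name of a command that prints the current phase and stops on Completed or on one of the Velero failure phases. The example command in the comments reads .status.phase from the restores.velero.io resource and assumes that Velero runs in the velero namespace; adjust it if Velero is installed elsewhere (for example, through OADP). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll a Velero restore until its phase is terminal.
#   $1  name of a command or shell function that prints the current phase
#   $2  maximum number of polls (default 60)
#   $3  seconds between polls (default 10)
# Returns 0 on Completed, 1 on a failure phase or when the polls run out.
wait_for_restore() {
  get_phase=$1
  attempts=${2:-60}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase=$($get_phase)
    echo "restore phase: ${phase:-unknown}" >&2
    case $phase in
      Completed) return 0 ;;
      Failed|PartiallyFailed|FailedValidation) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example (assumes Velero is installed in the velero namespace):
#   cs_db_restore_phase() {
#     oc get restores.velero.io restore-cs-db -n velero -o jsonpath='{.status.phase}'
#   }
#   wait_for_restore cs_db_restore_phase
```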
    
  7. Verify that the restore completed successfully.

    Check the Velero restore logs to confirm that the restore ran:

      velero restore logs restore-cs-db

    Search the logs for the following message: "memcache.go:238] couldn't get current server API group list"

    If this message is present, follow these instructions:

    1. Get the cs-db-restore-job.yaml file.

      wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/restore/common-service-db/cs-db-restore-job.yaml
      
    2. Delete the existing cs-db-backup deployment:

      oc delete deploy cs-db-backup -n <cs namespace>
      
    3. Apply the cs-db-restore-job.yaml file.

      oc apply -f cs-db-restore-job.yaml -n <cs namespace>
      

      Troubleshooting: If the cs-db-restore-job pod is stuck in ContainerCreating, complete the following steps:

      1. Delete the cs-db-backup deployment.
      2. Make sure that the cs-db-backup pod is fully deleted (gone, not Terminating).
      3. Delete the cs-db-restore-job job and its pod, and wait until the pod is fully deleted (not Terminating).
      4. Apply the cs-db-restore-job.yaml file again.
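The "fully deleted, not Terminating" checks in the troubleshooting steps can be scripted by polling until a pod listing comes back empty. A minimal sketch follows. The function names and the CS_NS variable are hypothetical, and the example assumes that the pods of the cs-db-backup deployment have names that start with cs-db-backup, which is the usual naming pattern for deployment pods but should be confirmed on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: poll until a listing command prints nothing, meaning
# the pods it selects are fully deleted rather than Terminating.
#   $1  name of a command or shell function that lists the pods
#   $2  maximum number of polls (default 60)
#   $3  seconds between polls (default 5)
wait_until_gone() {
  list_cmd=$1
  attempts=${2:-60}
  interval=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ -z "$($list_cmd)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example (CS_NS is a placeholder; assumes pod names start with cs-db-backup):
#   cs_db_backup_pods() {
#     oc get pods -n "$CS_NS" --no-headers 2>/dev/null | grep '^cs-db-backup'
#   }
#   wait_until_gone cs_db_backup_pods
```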

Note: Run the secondary steps in step 7 only if the restore logs indicate that the restore did not run. Log messages such as "duplicate key error collection" are expected and do not mean that you need to run the secondary steps.
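The log check in step 7 can be scripted so that it flags only the message that calls for the secondary steps. This sketch reads the restore logs on standard input and succeeds if the API group list message is present. It matches only the text after the memcache.go:238] prefix, on the assumption that the source line number in that prefix can differ between versions. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read Velero restore logs on stdin and succeed only if
# the API group list message that calls for the secondary steps is present.
# Messages such as "duplicate key error collection" are ignored on purpose.
needs_cs_db_secondary_steps() {
  grep -q "couldn't get current server API group list"
}

# Example:
#   if velero restore logs restore-cs-db | needs_cs_db_secondary_steps; then
#     echo "Run the secondary steps in step 7"
#   fi
```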

Set the Zen 5 data to the desired state

  1. Determine which Velero backup to roll back to.

     velero backup get
    
  2. Verify that the backup completed successfully, and check its details to confirm that all resources were saved.

     velero backup describe <__BACKUP_NAME__> --details
    
  3. In the restore-zen5-data.yaml file, replace __BACKUP_NAME__ with the name of the backup that you want to roll back to.

     vi restore-zen5-data.yaml
    

    Note: If the cluster has multiple instances of Common Services and Common Service DB and you do not want to roll back all of them, specify the namespaces to roll back by replacing the '*' under includedNamespaces with the target namespaces. Put each namespace on its own line, as in the following example:

     includedNamespaces:
     - '<cs namespace1>'
     - '<cs namespace2>'
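If you restore only selected namespaces, the includedNamespaces entries can be generated rather than typed. This sketch prints the list in the same format as the example above. The function name and the example namespace names are hypothetical, and you still need to paste the output into restore-zen5-data.yaml at the indentation that the file uses.

```shell
#!/bin/sh
# Hypothetical helper: print an includedNamespaces list, one quoted namespace
# per line, in the format shown in the note above.
included_namespaces() {
  echo "includedNamespaces:"
  for ns in "$@"; do
    printf '%s\n' "- '$ns'"
  done
}

# Example (hypothetical namespace names):
#   included_namespaces cs-ns-1 cs-ns-2
```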
    
  4. Grant the Zen 5 backup the necessary permissions.

    • For each namespace that contains a zenservice to back up, create a service account. Replace the <zenservice namespace> value in zen5-sa.yaml before you apply the file.

        oc apply -f zen5-sa.yaml
      
    • In each zenservice namespace, apply the Role for the Zen backup once. Replace the <zenservice namespace> value in zen5-role.yaml before you apply the file.

        oc apply -f zen5-role.yaml
      
    • Create the RoleBinding to connect the ServiceAccount to the Role.

      1. Edit the zen5-rolebinding.yaml file to reference the ServiceAccount that you created earlier, and replace the <zenservice namespace> value.

         vi zen5-rolebinding.yaml
        
      2. Apply the zen5-rolebinding.yaml file.

         oc apply -f zen5-rolebinding.yaml
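Because the ServiceAccount, Role, and RoleBinding manifests differ per namespace only in the <zenservice namespace> value, the substitution and apply can be looped over every namespace that has a zenservice. This is a sketch under two assumptions: the files contain the literal text <zenservice namespace>, and the ServiceAccount name in zen5-rolebinding.yaml has already been set by hand as described above. The function name render_for_namespace is hypothetical; the loop in the comments pipes each rendered manifest to oc apply -f -.

```shell
#!/bin/sh
# Hypothetical helper: print a Zen 5 RBAC manifest with the literal
# <zenservice namespace> placeholder replaced by a real namespace.
render_for_namespace() {
  file=$1
  ns=$2
  sed "s/<zenservice namespace>/$ns/g" "$file"
}

# Example: apply the manifests in every namespace that has a zenservice.
# zen5-rolebinding.yaml must already contain the ServiceAccount name.
#   for ns in $(oc get zenservice -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
#     for f in zen5-sa.yaml zen5-role.yaml zen5-rolebinding.yaml; do
#       render_for_namespace "$f" "$ns" | oc apply -f -
#     done
#   done
```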
        
  5. Clean up existing Zen 5 restore resources.

    • Remove the zen5-backup deployment if present:
        oc delete deploy zen5-backup -n <target namespace>

    • Remove the zen5-backup-pvc PVC if present:
        oc delete pvc zen5-backup-pvc -n <target namespace>
      
    • Remove the velero restore object if present:
        velero restore delete restore-zen5-data
      
  6. Restore the Zen data.

     oc apply -f restore-zen5-data.yaml
    
  7. Check the progress and the details of the restore by using the following commands. Proceed with the next step after the status shows as Completed.

     velero restore get
    
     velero restore describe <__RESTORE_NAME__> --details
    
  8. Check the Velero restore logs to verify that the data was restored.

     velero restore logs restore-zen5-data
    
    • Search the logs for restore_zen5. If that string is not present, the restore did not run. If the logs or the data indicate that the restore was not successful, complete the following steps as a workaround:

      1. Get the zen5-restore-job.yaml file:

         wget https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/velero/restore/zen/zen5-restore-job.yaml

      2. Delete the existing zen5-backup deployment:

         oc delete deploy zen5-backup -n <target namespace>

      3. Wait until the zen5-backup pods are fully deleted (gone, not Terminating).

      4. Edit the zen5-restore-job.yaml file and update every field enclosed in <>: the target restore namespace (<zenservice namespace>) and the target zenservice (<zenservice name>).

      5. Apply the zen5-restore-job.yaml file.

         oc apply -f zen5-restore-job.yaml
        
      6. Wait for the job to complete, then check the logs of the zen5-restore-job pod to verify that the restore is completed.

      7. Repeat as needed for each namespace with a zenservice instance installed.
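The edit in step 4 of the workaround can also be scripted so that each namespace gets its own rendered copy of the job file. This sketch assumes that the downloaded zen5-restore-job.yaml contains the placeholders literally as <zenservice namespace> and <zenservice name>, so check the file before you rely on it. The function name, the output file names, and the ZEN_NS and ZEN_NAME variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fill both placeholders in zen5-restore-job.yaml and
# write the result to a separate file, leaving the downloaded copy untouched.
# Assumes the placeholders appear literally as <zenservice namespace> and
# <zenservice name>.
render_zen5_restore_job() {
  src=$1
  ns=$2
  name=$3
  dest=$4
  sed -e "s/<zenservice namespace>/$ns/g" \
      -e "s/<zenservice name>/$name/g" "$src" > "$dest"
}

# Example, once per zenservice (ZEN_NS and ZEN_NAME are placeholders):
#   render_zen5_restore_job zen5-restore-job.yaml "$ZEN_NS" "$ZEN_NAME" "zen5-restore-job-$ZEN_NS.yaml"
#   oc apply -f "zen5-restore-job-$ZEN_NS.yaml"
```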

  9. Wait for the zenservice instances to become ready. An instance is ready when its Progress field shows 100%. The following command watches all zenservices on the cluster and continuously prints their Progress values.

     oc get zenservice -A -w -o yaml | grep Progress:
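Because the watch command keeps running, it does not tell a script when to continue. The sketch below reads zenservice YAML (or the grep output above) on standard input and succeeds only when at least one Progress: line is present and every value is 100%. It assumes that the value appears as the second whitespace-separated field, as in Progress: 100%. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read zenservice YAML (or "Progress:" lines) on stdin.
# Succeeds only if at least one Progress: field is present and every value
# is 100%. Surrounding double quotes on the value are ignored.
all_zenservices_ready() {
  awk '
    $1 == "Progress:" {
      seen = 1
      value = $2
      gsub(/"/, "", value)
      if (value != "100%") notready = 1
    }
    END { if (seen && !notready) exit 0; exit 1 }
  '
}

# Example: poll every 30 seconds (without -w) until all instances are ready.
#   until oc get zenservice -A -o yaml | all_zenservices_ready; do sleep 30; done
```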
    

Troubleshooting: