Troubleshooting backup and restore

Troubleshooting steps for problems with backup and restore of the analytics database on Kubernetes and OpenShift.

Check the CR status

List the analytics backup jobs:
kubectl -n <namespace> get analyticsbackup
NAME            STATUS   ID        CR TYPE   AGE     COMMENT
<backup name>   Ready              create    <age>   <comment>
Check the status of the backup that you are interested in:
kubectl -n <namespace> get analyticsbackup <backup name> -o yaml
List the analytics restore jobs:
kubectl get analyticsrestore -n <namespace>
NAME             STATUS   ID                                      AGE
<restore name>   Ready    analytics-all-2022-07-01t04:49:15utc    1h
Check the status of the restore job that you are interested in:
kubectl -n <namespace> get analyticsrestore <restore name> -o yaml
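If you need only the status details rather than the full CR, a jsonpath query can shorten the check. This is a sketch; the layout of the fields under .status varies by version, so fall back to the full -o yaml command above if needed:

```shell
# Print just the status block of a backup CR instead of the whole YAML.
kubectl -n <namespace> get analyticsbackup <backup name> -o jsonpath='{.status}'

# The same pattern works for restore jobs:
kubectl -n <namespace> get analyticsrestore <restore name> -o jsonpath='{.status}'
```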

Check the logs

Check for backup or restore related errors in the logs of the API Connect operator pod and analytics warehouse:

  1. Identify the API Connect operator pod ibm-apiconnect by running the following command:
    kubectl -n <namespace> get pods
    Note: <namespace> might not be the same namespace as your analytics subsystem. On a default OpenShift deployment, the API Connect operator runs in a global namespace called openshift-operators.
  2. View the API Connect operator pod logs with:
    kubectl -n <namespace> logs <ibm-apiconnect pod name>

    Check for messages with the error label that contain the words analytics, backup, or restore.

  3. Identify the name of your warehouse pod:
    kubectl -n <namespace> get pods | grep warehouse
  4. View warehouse pods logs with:
    kubectl -n <namespace> logs <warehouse pod name>
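For example, the following pipeline keeps only error-level messages that mention analytics, backup, or restore. The JSON log lines here are illustrative samples, not real operator output; in practice, pipe the output of kubectl logs through the same grep filters and adjust the first pattern to match your operator's actual log format.

```shell
# Illustrative sample of operator log lines (real lines come from
# `kubectl -n <namespace> logs <ibm-apiconnect pod name>`).
cat <<'EOF' > /tmp/operator-sample.log
{"level":"info","msg":"reconciling analytics subsystem"}
{"level":"error","msg":"analytics backup failed: connection refused"}
{"level":"error","msg":"gateway pod restarted"}
EOF

# Keep error-level lines that mention analytics, backup, or restore.
grep '"level":"error"' /tmp/operator-sample.log \
  | grep -Ei 'analytics|backup|restore'
# → {"level":"error","msg":"analytics backup failed: connection refused"}
```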

Common problems with backup and restore

These are some frequent issues that can cause backup or restore to fail:
  • Invalid login credentials for your remote backup server. Check that the username and password that you configured in your subsystem database backup settings are correct.
  • The user does not have write permissions on the backup server. Check that the username you configured in your subsystem database backup settings has write permission to the specified backup path.
  • Remote backup server storage full. Check your remote backup server; if the storage is full, either extend the storage space or delete older backups.
  • No network access to the remote backup server. Check that you can communicate with your remote backup server from your API Connect environment.
  • TLS handshaking failure with the remote backup server. If your remote backup server has a self-signed CA certificate, check that this certificate is trusted by your API Connect deployment. See the database backup configuration steps for your subsystem for more information.
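As a sketch of how to test the last two items, the commands below probe basic network reachability and inspect the TLS certificate that the backup server presents. The host, port, and pod name are placeholders, and the /dev/tcp probe assumes the pod image provides bash:

```shell
# Hypothetical endpoint values -- substitute your backup server's details.
BACKUP_HOST=backup.example.com
BACKUP_PORT=22   # 22 for SFTP; typically 443 for an S3 object store

# TCP reachability from inside your API Connect environment
# (assumes bash is available in the pod image).
kubectl -n <namespace> exec <warehouse pod name> -- \
  bash -c "timeout 5 bash -c '</dev/tcp/${BACKUP_HOST}/${BACKUP_PORT}' && echo reachable || echo unreachable"

# For a TLS endpoint, show the issuer and subject of the server certificate
# so you can confirm which CA your deployment needs to trust.
openssl s_client -connect "${BACKUP_HOST}:443" -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject
```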

Increasing storage space for your local backup PVC

Local database backups are stored on the warehouse PVC. The warehouse PVC has either data-a7s-warehouse or data-analytics-warehouse in its name. By default the warehouse PVC has 150 Gi of storage space.

If you find you do not have enough space in your warehouse PVC, then follow these steps:

  1. Disable analytics database backups.
    Edit your analytics CR:
    kubectl edit a7s
    Set spec.databaseBackup.enabled = false.
  2. Delete the warehouse PVC:
    1. Identify the name of your warehouse PVC:
      kubectl get pvc | grep warehouse
    2. Delete the identified PVC:
      kubectl delete pvc <warehouse PVC name>
  3. Edit your analytics CR and increase the size of your warehouse PVC.

    Set the new size in spec.storage.backup.volumeClaimTemplate.volumeSize.

  4. Re-enable analytics database backups.

    Edit your analytics CR and set spec.databaseBackup.enabled = true.
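The four steps above can also be sketched as a single sequence of kubectl patch commands instead of interactive edits. The CR name, namespace, and the 300Gi target size are placeholder values; the field paths are the ones named in the steps:

```shell
# 1. Disable database backups.
kubectl -n <namespace> patch a7s <analytics CR name> --type merge \
  -p '{"spec":{"databaseBackup":{"enabled":false}}}'

# 2. Identify and delete the warehouse PVC.
kubectl -n <namespace> get pvc | grep warehouse
kubectl -n <namespace> delete pvc <warehouse PVC name>

# 3. Increase the PVC size (300Gi here is an example value).
kubectl -n <namespace> patch a7s <analytics CR name> --type merge \
  -p '{"spec":{"storage":{"backup":{"volumeClaimTemplate":{"volumeSize":"300Gi"}}}}}'

# 4. Re-enable database backups; the PVC is re-created at the new size.
kubectl -n <namespace> patch a7s <analytics CR name> --type merge \
  -p '{"spec":{"databaseBackup":{"enabled":true}}}'
```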

Backup failure due to warehouse node failure

The analytics warehouse service stages and exports analytics database backups to your remote SFTP or S3 object-store. The warehouse service runs as a single Kubernetes pod on all deployment profiles. In a three replica deployment, if the worker node where the warehouse pod is running fails, complete the manual steps in this section to recover the service.

When the warehouse pod is not running, the analytics CR reports a status of Pending, and all backup-related features are unavailable.

To recover the warehouse service on a different worker node, complete the following steps:

  1. Disable analytics database backups.
    Edit your analytics CR:
    kubectl edit a7s

    Set spec.databaseBackup.enabled = false.

  2. Identify the name of the warehouse PVC:
    kubectl get pvc | grep warehouse
    Look for the warehouse PVC in the output:
    
    NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    warehouse-data-analytics-warehouse-0   Bound    pvc-4cbbb8a0-10ad-45de-b7da-ae6c26bde1de   150Gi      RWO            standard       24h
  3. Delete the warehouse PVC:
    kubectl delete pvc <name of warehouse pvc>
  4. Re-enable analytics database backups.

    Edit your analytics CR and set spec.databaseBackup.enabled = true.

    The PVC is then re-created on one of the two available nodes, and the warehouse pod starts on this same node.
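To confirm that the recovery succeeded, check that the PVC is bound again and see which node the warehouse pod was scheduled on:

```shell
# The re-created PVC should report a STATUS of Bound.
kubectl -n <namespace> get pvc | grep warehouse

# -o wide includes the node that the warehouse pod is running on.
kubectl -n <namespace> get pods -o wide | grep warehouse
```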