Troubleshooting backup and restore
Troubleshooting steps for problems with backup and restore of the analytics database on Kubernetes and OpenShift.
Check the CR status
List your analytics backup CRs:
kubectl -n <namespace> get analyticsbackup
NAME            STATUS   ID   CR TYPE   AGE     COMMENT
<backup name>   Ready         create    <age>   <comment>
Check the status of the backup that you are interested in:
kubectl -n <namespace> get analyticsbackup <backup name> -o yaml
List your analytics restore CRs:
kubectl get analyticsrestore -n <namespace>
NAME             STATUS   ID                                      AGE
<restore name>   Ready    analytics-all-2022-07-01t04:49:15utc    1h
Check the status of the restore job that you are interested in:
kubectl -n <namespace> get analyticsrestore <restore name> -o yaml
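If you only want the status stanza rather than the full CR, you can use a jsonpath query. This is a minimal sketch, assuming these CRs follow the standard Kubernetes status convention:
# Print only the status section of a backup or restore CR
kubectl -n <namespace> get analyticsbackup <backup name> -o jsonpath='{.status}'
kubectl -n <namespace> get analyticsrestore <restore name> -o jsonpath='{.status}'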
Check the logs
Check for backup-related or restore-related errors in the logs of the API Connect operator pod and the analytics warehouse pod:
- Identify the API Connect operator pod ibm-apiconnect by running the following command:
  kubectl -n <namespace> get pods
  Note: <namespace> might not be the same namespace as your analytics subsystem. On a default OpenShift deployment, the API Connect operator runs in a global namespace called openshift-operators.
- View the API Connect operator pod logs with:
  kubectl -n <namespace> logs <ibm-apiconnect pod name>
  Check for messages with the error label that contain the words analytics, backup, or restore (see the filtering example after this list).
- Identify the name of your warehouse pod:
  kubectl get pods | grep warehouse
- View the warehouse pod logs with:
  kubectl logs <warehouse pod name>
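To filter the operator logs for just these messages, you can combine the logs command with grep; a minimal sketch using standard shell tools:
# Show only error messages that mention analytics, backup, or restore
kubectl -n <namespace> logs <ibm-apiconnect pod name> | grep -i error | grep -iE 'analytics|backup|restore'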
Common problems with backup and restore
- Invalid login credentials for your remote backup server. Check that the username and password that you configured in your subsystem database backup settings are correct.
- The user does not have write permissions on the backup server. Check that the username you configured in your subsystem database backup settings has write permission to the specified backup path.
- Remote backup server storage full. Check your remote backup server, and if the storage is full, either extend the storage space or delete older backups.
- No network access to the remote backup server. Check that you can communicate with your remote backup server from your API Connect environment.
- TLS handshaking failure with the remote backup server. If your remote backup server has a self-signed CA certificate, check that this certificate is trusted by your API Connect deployment; see the database backup configuration steps for your subsystem for more information. The sketch after this list shows one way to test connectivity and the TLS handshake.
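To narrow down network or TLS problems, you can test the connection to the remote backup server from a machine in your API Connect environment. A minimal sketch, assuming an SFTP server on port 22 and an HTTPS S3 endpoint on port 443; substitute your own host, port, and CA certificate file:
# Check basic network reachability to an SFTP backup server (port 22 assumed)
nc -vz <backup server host> 22
# Check the TLS handshake against an S3 endpoint, trusting a self-signed CA
openssl s_client -connect <backup server host>:443 -CAfile <ca certificate file> </dev/null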
Increasing storage space for your local backup PVC
Local database backups are stored on the warehouse PVC. The warehouse PVC has either data-a7s-warehouse or data-analytics-warehouse in its name. By default, the warehouse PVC has 150 Gi of storage space.
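To check the current size of the warehouse PVC before deciding whether to expand it, you can query its reported capacity. A minimal sketch; substitute the PVC name found with kubectl get pvc | grep warehouse:
# Show the capacity that the warehouse PVC currently reports
kubectl -n <namespace> get pvc <warehouse PVC name> -o jsonpath='{.status.capacity.storage}'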
If you find you do not have enough space in your warehouse PVC, then follow these steps:
- Disable analytics database backups. Edit your analytics CR:
  kubectl edit a7s
  Set spec.databaseBackup.enabled = false.
- Delete the warehouse PVC:
  - Identify the name of your warehouse PVC:
    kubectl get pvc | grep warehouse
  - Delete the identified PVC:
    kubectl delete pvc <warehouse PVC name>
- Edit your analytics CR and increase the size of your warehouse PVC. Set the new size in spec.storage.backup.volumeClaimTemplate.volumeSize.
- Re-enable analytics database backups. Edit your analytics CR and set spec.databaseBackup.enabled = true (see the example CR excerpt after these steps).
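For reference, the relevant fields of the analytics CR after these edits might look like the following excerpt. This is a sketch only: it shows just the two settings that the steps above mention, the 300Gi value is an example size, and all other fields in your CR stay as they are:
spec:
  databaseBackup:
    enabled: true
  storage:
    backup:
      volumeClaimTemplate:
        volumeSize: 300Gi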
Backup failure due to warehouse node failure
The analytics warehouse service is responsible for staging and exporting analytics database backups to your remote SFTP or S3 object-store. The warehouse service runs as a single Kubernetes pod on all deployment profiles. In a three-replica deployment, if the worker node where the warehouse pod is running fails, then take the manual steps that are documented here to recover the service.
When the warehouse pod is not running, the analytics CR reports a status of Pending, and all backup-related features are unavailable.
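You can confirm this state by checking the CR directly. A quick check, assuming the a7s short name that is used elsewhere in this topic:
# While the warehouse pod is not running, the analytics CR reports Pending
kubectl -n <namespace> get a7s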
To recover the warehouse service on a different worker node, complete the following steps:
- Disable analytics database backups. Edit your analytics CR:
  kubectl edit a7s
  Set spec.databaseBackup.enabled = false.
- Identify the name of the warehouse PVC:
  kubectl get pvc | grep warehouse
  Look for the warehouse PVC in the output:
  NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  warehouse-data-analytics-warehouse-0    Bound    pvc-4cbbb8a0-10ad-45de-b7da-ae6c26bde1de   150Gi      RWO            standard       24h
- Delete the warehouse PVC:
  kubectl delete pvc <name of warehouse pvc>
- Re-enable analytics database backups. Edit your analytics CR and set spec.databaseBackup.enabled = true.
  The PVC is then re-created on one of the two available nodes, and the warehouse pod starts on this same node. You can verify this as shown in the example after these steps.
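To verify the recovery, check that the new PVC is bound and that the warehouse pod is running again, for example:
# The re-created warehouse PVC should show a STATUS of Bound
kubectl get pvc | grep warehouse
# The warehouse pod should be Running; -o wide also shows the node that it was scheduled on
kubectl get pods -o wide | grep warehouse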