Troubleshooting management database backup and restore
Common problems and how to diagnose issues with backup and restore.
Common problems with backup and restore
These are some frequent issues that can cause backup or restore to fail:
- Invalid login credentials for your remote backup server. Check that the username and password that you configured in your subsystem database backup settings are correct.
- The user does not have write permissions on the backup server. Check that the username you configured in your subsystem database backup settings has write permission to the specified backup path.
- Remote backup server storage full. Check your remote backup server. If the storage is full, either extend the storage space or delete older backups.
- No network access to the remote backup server. Check that you can communicate with your remote backup server from your API Connect environment.
- TLS handshake failure with the remote backup server. If your remote backup server has a self-signed CA certificate, check that this certificate is trusted by your API Connect deployment. See the database backup configuration steps for your subsystem for more information.
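The network and TLS checks in the list above can be approximated with a short script. The following is a minimal sketch, not an API Connect tool: the function names are hypothetical, and the host, port, and CA file that you pass in are your own values. It tests reachability only from the machine where it runs, so run it from a host that shares the network path of your API Connect environment. The TLS check applies to HTTPS object-store endpoints; SFTP uses SSH rather than TLS.

```python
import socket
import ssl


def check_tcp(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tls(host, port, ca_file=None, timeout=5.0):
    """Attempt a TLS handshake and return "ok" or a failure description.

    When ca_file is given, only the CA certificates in that file are
    trusted; otherwise the system default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except OSError as exc:
        # ssl.SSLError (including certificate errors) is a subclass of OSError.
        return "failed: {}".format(exc)
```

For example, `check_tls("objectstore.example.com", 443, ca_file="backup-ca.pem")` (hypothetical endpoint and file) reports whether a server that uses a self-signed CA completes the handshake when that CA is trusted.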
Backup or restore CR stuck in a running or failed state
Describe the backup or restore CR for more information on the problem:
kubectl -n <namespace> describe backup backup-1700143802
Name:         management-1700143800
Namespace:    apic
Labels:       k8s.enterprisedb.io/cluster=management-dc1-db
              k8s.enterprisedb.io/immediateBackup=false
              k8s.enterprisedb.io/scheduled-backup=management
Annotations:  <none>
API Version:  postgresql.k8s.enterprisedb.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2023-11-16T14:10:00Z
  Generation:          1
  Resource Version:    2316802
  UID:                 c7a72883-1f6f-44a7-b360-c18f8c3ba24d
Spec:
  Cluster:
    Name:  management-dc1-db
Status:
  Backup Id:         20231116T141005
  Backup Name:       backup-1700143802
  Begin LSN:         0/D000028
  Begin Wal:         00000001000000000000000D
  Destination Path:  s3://ent-edb-bnr/2dcdr-mgmt-active-1538
  End LSN:           0/D01AB58
  End Wal:           00000001000000000000000D
  Instance ID:
    Container ID:  cri-o://13f5f44d0b84cb18b8a88b0f747f4c95f24a5e2c1423f5ec0a8615463874ffa2
    Pod Name:      management-dc1-db-1
  Phase:           completed
  s3Credentials:
    Access Key Id:
      Key:   key
      Name:  mgmt-backup-secret
    Region:
      Key:   region
      Name:  mgmt-backup-secret
    Secret Access Key:
      Key:   keysecret
      Name:  mgmt-backup-secret
  Server Name:  management-dc1-db-2023-11-16T10:56:39Z
  Started At:   2023-11-16T14:10:05Z
  Stopped At:   2023-11-16T14:10:29Z
Events:
  Type    Reason     Age  From                            Message
  ----    ------     ---- ----                            -------
  Normal  Starting   14m  cloud-native-postgresql-backup  Starting backup for cluster management-dc1-db
  Normal  Starting   14m  instance-manager                Backup started
  Normal  Completed  13m  instance-manager                Backup completed
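When there are many backup CRs, it can help to list only the ones that have not completed before you describe them individually. The following sketch assumes that the JSON form of the CR exposes the phase as status.phase, which corresponds to the Phase field in the describe output above. The function names are hypothetical.

```python
import json
import subprocess


def unfinished_backups(backup_list):
    """Return (name, phase) pairs for backups whose phase is not 'completed'.

    backup_list is the parsed output of `kubectl get backup -o json`.
    """
    result = []
    for item in backup_list.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        phase = item.get("status", {}).get("phase", "<no phase>")
        if phase != "completed":
            result.append((name, phase))
    return result


def fetch_backups(namespace):
    """Run kubectl and return the parsed list of backup CRs in a namespace."""
    completed = subprocess.run(
        ["kubectl", "-n", namespace, "get", "backup", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Calling `unfinished_backups(fetch_backups("apic"))` returns the CRs to inspect with kubectl describe. A CR with no phase yet is reported as `<no phase>`.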
Unable to find database backup CRs
If kubectl get backup does not return the backup that you want to restore, then search for the backup on your SFTP server or object store.
The path to API Connect management database backups on the remote SFTP server or object-store has the following format:
<backup path>/<mgmt db cluster name>-<time when db was created>/base/<backup ID>
- <backup path> is the path that is defined in the backup settings of your management CR. The format is bucket_name/folder. If your API Connect deployment was upgraded from v10.0.6.0 or earlier, then /edb is appended to the path.
- <mgmt db cluster name> is the name of your management database cluster. Identify this name with:
  kubectl -n <management namespace> get cluster -n active
- <time when db was created> is a timestamp of when the management subsystem database was created. The format is: 2023-12-25T00:00:00Z. Whenever you do a management database restore, a new management database is created, and so a new directory is created where subsequent database backups are stored.
- <backup ID> is the ID of a management database backup. The format is YYYYMMDDTHHMMSS. This is a directory that contains all the files that comprise the database backup.
For example:
s3bucket1/folder2/mgmt-db-2023-12-25T00:00:00Z/base/20231225T094400
With this information, you can restore the backup by following the steps in Restoring the management database with a backup ID.
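The path format described above can be built and taken apart programmatically, which is convenient when you search an object store listing for a backup ID. This is a sketch with hypothetical helper names. It assumes that the timestamp and backup ID follow exactly the formats given above, and that any /edb suffix is already part of the backup path that you pass in.

```python
import re
from datetime import datetime

DB_CREATED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # for example 2023-12-25T00:00:00Z
BACKUP_ID_FORMAT = "%Y%m%dT%H%M%S"        # for example 20231225T094400

_PATH_PATTERN = re.compile(
    r"^(?P<backup_path>.+)/(?P<cluster_name>[^/]+)-"
    r"(?P<db_created>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
    r"/base/(?P<backup_id>\d{8}T\d{6})/?$"
)


def build_backup_directory(backup_path, cluster_name, db_created, backup_id):
    """Compose <backup path>/<cluster>-<created>/base/<backup ID>."""
    # strptime raises ValueError if a value does not match its format.
    datetime.strptime(db_created, DB_CREATED_FORMAT)
    datetime.strptime(backup_id, BACKUP_ID_FORMAT)
    return "{}/{}-{}/base/{}".format(
        backup_path.rstrip("/"), cluster_name, db_created, backup_id
    )


def parse_backup_directory(path):
    """Split a backup directory path into its four components."""
    match = _PATH_PATTERN.match(path)
    if match is None:
        raise ValueError("not a management backup directory: " + path)
    return match.groupdict()
```

Parsing the example path yields the backup ID 20231225T094400, which is the value needed for a restore by backup ID.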
Error when restoring a management subsystem with the API discovery service
If you enable the API discovery service, the backups that you take can be restored only onto a management subsystem that also has the API discovery service enabled. If you try to restore a backup onto a management subsystem that doesn't have the API discovery service enabled, the database recovery jobs will go into an error state, and the logs will contain the following tablespaces directory error:
[Errno 30] Read-only file system: '/var/lib/postgresql/tablespaces'
The status of the management job pods will look like the following output:
management-apim-restore-1bc41b74-sjzzp 0/1 Completed 0 3h28m
management-bf631b70-db-1-full-recovery-4lbdv 0/1 Error 0 11m
management-bf631b70-db-1-full-recovery-725l9 0/1 Error 0 13m
management-bf631b70-db-1-full-recovery-8mhdk 0/1 Error 0 103s
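To confirm that a failed restore matches this symptom, you can scan the pod listing and the job logs for the two signatures shown above. The following is a sketch with hypothetical function names. It assumes the default kubectl get pods column order (NAME, READY, STATUS) and that recovery pod names contain full-recovery, as in the sample output.

```python
TABLESPACES_ERROR = (
    "[Errno 30] Read-only file system: '/var/lib/postgresql/tablespaces'"
)


def failed_recovery_pods(pods_output):
    """Return names of full-recovery pods whose STATUS column is Error."""
    failed = []
    for line in pods_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 3
            and "full-recovery" in fields[0]
            and fields[2] == "Error"
        ):
            failed.append(fields[0])
    return failed


def is_tablespaces_error(log_text):
    """Return True if the log contains the tablespaces directory error."""
    return TABLESPACES_ERROR in log_text
```

Pass the text of kubectl get pods to the first function, and the text of kubectl logs for one of the returned pods to the second.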
In this case, you must take the following steps.
- Update the ManagementCluster CR to enable the API discovery service:
  spec:
    ...
    discovery:
      enabled: true
      proxyCollectorEnabled: true
- Apply the updated CR by running the following command:
  kubectl apply -f management_cr.yaml -n <management_namespace>
  where <management_namespace> is the name of the target installation namespace in the Kubernetes cluster.
- Find the name of the EDB PostgreSQL cluster by running the following command:
  kubectl get cluster -n <namespace>
  The output will look similar to the following output:
  NAME                     AGE   INSTANCES   READY   STATUS               PRIMARY
  management-bf631b70-db   13m   1                   Setting up primary
- Wait until the EDB PostgreSQL cluster has been updated with the tablespaces configuration. You can check this status by running the following command:
  kubectl get cluster <cluster-name> -o yaml -n <namespace>
  Look in the output for the tablespaces entry, which looks like the following example:
  tablespaces:
  - name: apidiscoverysvc
    owner:
      name: apicuser
    storage:
      resizeInUseVolumes: false
      size: 10Gi
      storageClass: local-storage
    temporary: false
  Note that the cluster status will still be Setting up primary at this point.
- Delete the EDB PostgreSQL cluster by running the following command:
  kubectl delete cluster <cluster-name> -n <namespace>
- After a few moments, the EDB PostgreSQL cluster will be recreated by the IBM® API Connect operator. You can check this by running the following command:
  kubectl get cluster -n <namespace>
- The database recovery jobs will eventually be recreated, and succeed. You can then verify that the PVCs for the tablespaces have been recreated by running the following command:
  kubectl get pvc -n <namespace>
  The output of this command will show that there are one or more PVCs present, with a name similar to management-bf631b70-db-1-tbs-apidiscoverysvc.
- The API discovery service pods will now start, and the restore operation will complete.
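The two checks in the steps above (the tablespaces entry on the cluster, and the tablespace PVCs) can be scripted against the JSON output of kubectl. This is a sketch with hypothetical function names. Because the exact position of the tablespaces entry in the cluster document is not stated here, the helper searches the whole document for the first tablespaces list. The -tbs- name fragment is taken from the sample PVC name and is an assumption about the naming pattern.

```python
def find_tablespaces(node):
    """Return tablespace names from the first 'tablespaces' list found.

    node is the parsed output of
    `kubectl get cluster <cluster-name> -o json -n <namespace>`.
    """
    if isinstance(node, dict):
        entries = node.get("tablespaces")
        if isinstance(entries, list):
            return [e.get("name") for e in entries if isinstance(e, dict)]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return []
    for child in children:
        names = find_tablespaces(child)
        if names:
            return names
    return []


def tablespace_pvcs(pvc_list):
    """Return PVC names that look like tablespace volumes.

    pvc_list is the parsed output of `kubectl get pvc -o json -n <namespace>`.
    """
    names = []
    for item in pvc_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        if "-tbs-" in name:
            names.append(name)
    return names
```

An empty result from find_tablespaces means that the cluster has not yet been updated, so the cluster should not be deleted yet. An empty result from tablespace_pvcs after the recovery jobs succeed suggests that the tablespace PVCs were not recreated.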