Recover the management subsystem from backups after a disaster event.
Before you begin
To successfully recover the management subsystem, you must have
previously completed the steps in Preparing the management subsystem for disaster recovery on VMware. You must use the same
project directory that you used for your original deployment, or a restore of your project directory
backup, to ensure that configuration and secret information is transferred to the replacement
deployment.
In a clustered deployment, if any one VM is corrupted then all of the VMs in the
cluster must be redeployed. You cannot replace just a single corrupted VM in a cluster.
Important: Successful disaster recovery depends on recovery of both the
Management subsystem and the Developer Portal
subsystem. You must complete preparation steps for both subsystems in order to achieve disaster
recovery. If you have to perform a restore, you must complete the restoration of the Management
Service first, and then immediately restore the Developer Portal. Therefore, the backups of the
Management and Portal subsystems must be taken at the same time, to ensure that the Portal sites are consistent
with the Management database.
Procedure
- Determine which backup to restore from.
- Obtain a list of the available backups for your backup type:
- s3 backups
- For IBM's Cloud Object Storage, you can check currently stored backups in the COS console.
Select Buckets > Objects. See the Object Names
displayed on the <cos_name>/backup/db panel. For
example:
20200605-105429F
20200605-100008F
20200605-11040F
- For Amazon's AWS, you can check currently stored backups in the S3 console. For example, under
Amazon S3 > cluster_name > old cluster > backup > db:
20200606-144315F
20200606-145011F
- SFTP backups
View the SFTP backups available on your remote storage site:
-rw-r--r-- 1 root root 13092333 Aug 26 08:56 20200826-154646F.tgz
-rw-r--r-- 1 root root 18703758 Aug 26 09:10 20200826-160010F.tgz
-rw-r--r-- 1 root root 24318561 Aug 26 09:21 20200826-161301F.tgz
- Select the backup ID of the backup you want to restore. For example, in the sample SFTP backup
list, for the Aug 26 09:21 backup, the backup ID is 20200826-161301F.
Each filename contains the date, time, and type of the backup stored. The format of the backup ID is
YYYYMMDD-HHMMSS<F|I>. Full backups are denoted with a suffix F on the ID. Incremental backups are
denoted with a suffix I on the ID. For incremental backups, ensure each incremental backup has its
prior full backup also present in storage. You can check this by examining the ID
<prior-backup-id>_<backup-id>.
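The ID format described in this step can be checked with a short script. The following is a minimal Python sketch, not part of the product tooling. The function names are hypothetical, and the sketch assumes that an incremental backup ID is the prior full backup ID, an underscore, and then the incremental ID, as the <prior-backup-id>_<backup-id> form above suggests.

```python
import re
from datetime import datetime

# Assumed from the format described above: YYYYMMDD-HHMMSS followed by F or I,
# optionally prefixed by "<prior-backup-id>_" for incremental backups.
_ID_RE = re.compile(
    r"^(?:(?P<prior>\d{8}-\d{6}F)_)?(?P<stamp>\d{8}-\d{6})(?P<kind>[FI])$"
)


def backup_id_from_filename(filename):
    """Strip the .tgz extension shown in the sample SFTP listing."""
    return filename[:-4] if filename.endswith(".tgz") else filename


def parse_backup_id(backup_id):
    """Return (timestamp, kind, prior_full_id); raise ValueError if malformed."""
    match = _ID_RE.match(backup_id)
    if match is None:
        raise ValueError(f"not a valid backup ID: {backup_id!r}")
    timestamp = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    kind = "full" if match.group("kind") == "F" else "incremental"
    return timestamp, kind, match.group("prior")


def missing_full_backups(backup_ids):
    """List incremental IDs whose prior full backup is not in backup_ids.

    An incremental ID without a "<prior-backup-id>_" prefix is also reported,
    because its prior full backup cannot be determined from the ID alone.
    """
    available = set(backup_ids)
    missing = []
    for backup_id in backup_ids:
        _, kind, prior = parse_backup_id(backup_id)
        if kind == "incremental" and prior not in available:
            missing.append(backup_id)
    return missing


timestamp, kind, _ = parse_backup_id(backup_id_from_filename("20200826-161301F.tgz"))
print(timestamp, kind)  # 2020-08-26 16:13:01 full
```

An ID that does not match the expected pattern raises ValueError, which makes a mistyped or truncated ID visible before you attempt a restore.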
- Make sure you know the management database cluster name. You can get this name from the
original management subsystem CR.
If you are not able to recover the original management subsystem CR, you can obtain the name from
your backup configuration. Use the following steps applicable to your backup type:
Backup type: s3
- Open your IBM Cloud Object Storage or AWS S3 console and proceed to the bucket location where
the old Management subsystem backups are located.
- Download backup/db/<backup-id>/pg_data/postgresql.conf.gz. Open postgresql.conf to view the
database cluster name.
Backup type: SFTP
- Recover the Management database cluster name and siteName by examining the SFTP backup tar.
Download or move the SFTP backup tar file and decompress (untar) it.
- Open
<management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz,
which contains the management subsystem name and siteName. For example:
# Do not edit this file manually!
# It will be overwritten by Patroni!
include 'postgresql.base.conf'
archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
archive_mode = 'True'
archive_timeout = '60'
autovacuum_vacuum_cost_limit = '1000'
autovacuum_vacuum_scale_factor = '0.01'
cluster_name = 'm1-f785a3e3-postgres'
In this example, cluster_name contains both the management subsystem name and the siteName:
m1 - Management subsystem name
f785a3e3 - siteName
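Reading the cluster name out of the downloaded file can also be scripted. The following Python sketch is illustrative only. The function names are hypothetical, and the split assumes that the cluster name has the form <management-subsystem-name>-<siteName>-postgres shown in the example and that the siteName itself contains no hyphen.

```python
import gzip
import re

# Matches a line such as: cluster_name = 'm1-f785a3e3-postgres'
_CLUSTER_RE = re.compile(r"^\s*cluster_name\s*=\s*'([^']*)'", re.MULTILINE)


def read_conf_gz(path):
    """Return the decompressed text of a postgresql.conf.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


def cluster_name_from_conf(text):
    """Return the cluster_name value from postgresql.conf text."""
    match = _CLUSTER_RE.search(text)
    if match is None:
        raise ValueError("cluster_name not found in configuration text")
    return match.group(1)


def split_cluster_name(cluster_name):
    """Split '<name>-<siteName>-postgres' into (name, siteName).

    Assumption: the siteName has no hyphen, so the last hyphen before
    '-postgres' separates the subsystem name from the siteName.
    """
    suffix = "-postgres"
    if not cluster_name.endswith(suffix):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    stem = cluster_name[: -len(suffix)]
    name, separator, site_name = stem.rpartition("-")
    if not separator or not name or not site_name:
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return name, site_name


sample = "archive_mode = 'True'\ncluster_name = 'm1-f785a3e3-postgres'\n"
print(split_cluster_name(cluster_name_from_conf(sample)))  # ('m1', 'f785a3e3')
```

To use it on a real backup, pass the path of your downloaded postgresql.conf.gz to read_conf_gz and feed the result to cluster_name_from_conf.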
- Run the commands in apicup-commands.txt to restore the saved (extracted) data.
You created apicup-commands.txt when you ran dr-preparation.py as part of Preparing the management
subsystem for disaster recovery on VMware. The commands in the file create the necessary secrets.
Be sure to:
- Use an apicup binary from the version of API Connect that you want to restore to.
- Use your prior existing project directory (or a restore of your project directory backup). The
directory contains apiconnect-up-v10.yml, which has your configuration.
When the commands in apicup-commands.txt complete successfully, the Management
subsystem configuration is complete.
- Use your prior existing project directory (or a restore of your project directory backup)
to install the Management subsystem:
- Create your ISO files:
apicup subsys install mgmt --out mgmtplan-out
The --out parameter and value are required.
In this example, the ISO file is created in the myProject/mgmtplan-out directory.
Note: If your original ISO files are still available, and you haven't upgraded from
the original installation, you can reuse your original files. However, if you have upgraded your
original deployment, you must create your ISO files by using the version of apicup
that corresponds to the version of API Connect that you want to
restore to.
- Deploy the files into the replacement VMs. See Deploying the management subsystem OVA file.
- Verify the deployment. See Verify installation of the management subsystem.
Important:
For S3, the recovery remains in an intermediate state until the restore is complete, and Postgres
WAL (write-ahead log) files might accumulate and cause serious disk issues. To avoid this possibility,
continue immediately with the next step.
Note that if you delay completion of the restore:
- The health check might fail. In this case, you can still proceed to the next step and perform a
restore.
- Postgres WAL files might consume all disk space. In this case, you must either:
- Re-install the system, prepare again for disaster recovery, and perform the restore.
- Or increase disk space so that the system returns to a stable state, and then proceed with the
restore.
- Once your Management subsystem is ready, confirm that the backup ID noted in Step 1 is
present on the SFTP or S3 server.
- After a few moments, confirm that there is a ManagementBackup of type record and that its
backup ID matches the backup ID noted in Step 1.
You can list the management backups using:
apicup subsys list-backups <subsystem_name>
For example:
NAME                STATUS   ID                 CLUSTER                        SUBSYSTEM    TYPE   CR TYPE   AGE
mgmt-backup-8hqqg   Ready    20200826-161301F   management-82b290a2-postgres   management   full   record    40s
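The check in this step, matching a backup ID to the name of its backup CR, can be sketched in Python as follows. This is a hedged illustration, not product tooling. The function name is hypothetical, and the sketch assumes that the listing has the whitespace-separated layout of the sample output, with NAME, STATUS, and ID as the first three columns, and that a usable backup shows the status Ready.

```python
def find_backup_cr(listing, backup_id):
    """Return the NAME of the Ready row whose ID equals backup_id, else None.

    Assumes NAME, STATUS, and ID are the first three whitespace-separated
    columns, as in the sample list-backups output.
    """
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status, row_id = fields[0], fields[1], fields[2]
        if row_id == backup_id and status == "Ready":
            return name
    return None


sample_listing = (
    "NAME STATUS ID CLUSTER SUBSYSTEM TYPE CR TYPE AGE\n"
    "mgmt-backup-8hqqg Ready 20200826-161301F "
    "management-82b290a2-postgres management full record 40s\n"
)
print(find_backup_cr(sample_listing, "20200826-161301F"))  # mgmt-backup-8hqqg
```

The listing text can be the captured output of the list-backups command shown above. A result of None means that no matching Ready backup was found, so wait and list the backups again before you attempt the restore.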
- Perform a Management Restore using the name of the backup CR that has the ID you want to
restore.
For example, for ID 20200826-161301F the backup CR name is mgmt-backup-8hqqg.
For instructions on how to restore, see Restoring the management subsystem.
Once the Management Restore has completed and the database is running again, the data of the old
Management subsystem will be successfully restored onto the new Management subsystem. Manual and
scheduled backups should perform as normal once again.
What to do next
Complete the recovery steps for the Developer Portal subsystem on VMware. See Recovering the
Developer Portal subsystem on VMware.