Recover the management subsystem from backups after a disaster event.
Before you begin
To successfully recover the management subsystem, you must have
previously completed the steps in Preparing the management subsystem for disaster recovery on VMware. You must use the same
project directory that you used for your original deployment, or a restore of your project directory
backup, to ensure that configuration and secret information is transferred to the replacement
deployment.
In a clustered deployment, if any one VM is corrupted then all of the VMs in the
cluster must be redeployed. You cannot replace just a single corrupted VM in a cluster.
Important: Successful disaster recovery depends on recovery of both the
Management subsystem and the Developer Portal
subsystem. You must complete preparation steps for both subsystems in order to achieve disaster
recovery. If you have to perform a restore, you must complete the restoration of the Management
Service first, and then immediately restore the Developer Portal. Therefore, the backups of the
Management and Portal must be taken at the same time, to ensure that the Portal sites are consistent
with the Management database.
Procedure
- Determine which backup to restore from.
- Obtain a list of the available backups for your backup type:
- s3 backups
- For IBM® Cloud Object Storage, you can check currently stored backups in the COS console. Select Buckets > Objects. See the Object Names displayed on the <cos_name>/backup/db panel. For example:
20200605-105429F
20200605-100008F
20200605-11040F
- For Amazon AWS, you can check currently stored backups in the S3 console. For example, under
Amazon S3 > cluster_name > old cluster > backup > db:
20200606-144315F
20200606-145011F
- SFTP backups
View the SFTP backups available on your remote storage site:
-rw-r--r-- 1 root root 13092333 Aug 26 08:56 20200826-154646F.tgz
-rw-r--r-- 1 root root 18703758 Aug 26 09:10 20200826-160010F.tgz
-rw-r--r-- 1 root root 24318561 Aug 26 09:21 20200826-161301F.tgz
- Select the backup ID of the backup you want to restore. For example, in the sample SFTP backup
list, for the Aug 26 09:21 backup, the backup ID is 20200826-161301F.
Each filename contains the date, time, and type of the backup stored. The format of the backup ID is
YYYYMMDD-HHMMSS<F|I>. Full backups are denoted with a suffix F on the ID. Incremental backups are
denoted with a suffix I on the ID. For incremental backups, ensure each incremental backup has its
prior full backup also present in storage. You can check this by examining the ID
<prior-backup-id>_<backup-id>.
Note:
During the disaster recovery process, the S3 configuration detail of the older management system
is used, but the older management system must be in offline mode. The old subsystem must be offline
because you cannot have two management systems simultaneously using the same s3 bucket name in the
database backup configurations.
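The backup ID format described in this step can be checked mechanically before you commit to a restore. The following is a minimal sketch, not part of the product tooling: the function names are hypothetical, and the pattern assumes that an ID is exactly eight digits, a hyphen, six digits, and a trailing F or I, as the YYYYMMDD-HHMMSS<F|I> format implies.

```shell
# Hypothetical helper: succeeds if $1 matches YYYYMMDD-HHMMSS followed by F or I.
is_backup_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{8}-[0-9]{6}[FI]$'
}

# Hypothetical helper: prints "full" or "incremental" for a valid backup ID.
backup_type() {
  is_backup_id "$1" || return 1
  case "$1" in
    *F) echo full ;;
    *I) echo incremental ;;
  esac
}

# Hypothetical helper: derives the backup ID from an SFTP archive file name
# by stripping the .tgz extension.
backup_id_from_archive() {
  basename "$1" .tgz
}

# Hypothetical helper: prints the prior backup ID from a name of the form
# <prior-backup-id>_<backup-id>; fails if the name has no underscore.
prior_backup_id() {
  case "$1" in
    *_*) printf '%s\n' "${1%%_*}" ;;
    *) return 1 ;;
  esac
}
```

For example, `backup_type "$(backup_id_from_archive 20200826-161301F.tgz)"` reports whether the newest archive in the sample SFTP listing is a full backup. For an incremental backup, the ID printed by `prior_backup_id` should also appear in your storage listing.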
- Make sure you know the management database cluster name. Use the following steps
applicable to your backup type:
s3:
- Open your IBM Cloud® Object Storage or AWS S3 console and proceed to the bucket location where the old Management subsystem backups are located.
- Download backup/db/<backup-id>/pg_data/postgresql.conf.gz. Open postgresql.conf to view the database cluster name.
SFTP:
- Recover the Management database cluster name and siteName by examining the SFTP backup tar.
Download or move the SFTP backup tar file and decompress (untar) it.
- Open
<management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz
which contains the management subsystem name and siteName. For example:
# Do not edit this file manually!
# It will be overwritten by Patroni!
include 'postgresql.base.conf'
archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
archive_mode = 'True'
archive_timeout = '60'
autovacuum_vacuum_cost_limit = '1000'
autovacuum_vacuum_scale_factor = '0.01'
cluster_name = 'm1-f785a3e3-postgres'
In this example, cluster_name contains both the management subsystem name and the siteName:
m1 - Management subsystem name
f785a3e3 - site name
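Reading the cluster name out of the downloaded configuration file can be scripted. The sketch below is illustrative only: the function names are hypothetical, and the split assumes that cluster_name always has the form <management-subsystem-name>-<siteName>-postgres and that the site name contains no hyphens, as in the sample file.

```shell
# Hypothetical helper: reads postgresql.conf on stdin and prints the value
# of the cluster_name setting without its surrounding quotes.
cluster_name_from_conf() {
  sed -n "s/^cluster_name = '\(.*\)'.*$/\1/p"
}

# Hypothetical helper: splits a cluster name of the assumed form
# <management-subsystem-name>-<siteName>-postgres into its two parts.
split_cluster_name() {
  base="${1%-postgres}"   # drop the trailing -postgres
  printf 'subsystem=%s site=%s\n' "${base%-*}" "${base##*-}"
}

# Usage against a downloaded backup file (the path is whatever you saved it as):
#   gzip -dc postgresql.conf.gz | cluster_name_from_conf
```

Applied to the sample file, `cluster_name_from_conf` prints `m1-f785a3e3-postgres`, and `split_cluster_name m1-f785a3e3-postgres` prints `subsystem=m1 site=f785a3e3`.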
- Use your prior existing project directory, or a restore of your project directory backup,
to install the Management subsystem:
- Create your ISO files:
apicup subsys install mgmt --out mgmtplan-out
The --out parameter and value are required.
In this example, the ISO files are created in the myProject/mgmtplan-out directory.
Note: If your original ISO files are still available and you haven't upgraded from the
original installation, you can reuse them. However, if you have upgraded your original deployment,
you must create new ISO files using the version of apicup that corresponds to the
version your API Connect installation was on at the time of the disaster. For example, do not
attempt to deploy v10.0.5.1 OVAs with ISO files that were created with apicup v10.0.4.0.
- Deploy the files into the replacement VMs. See Deploying the Management subsystem OVA file.
- Verify the deployment. See Verify installation of the Management subsystem.
Important:
For S3, the recovery remains in an intermediate state until the restore is complete, and Postgres
WAL (write-ahead log) files might cause serious disk issues. To avoid this possibility, continue
immediately with the next step.
Note that if you delay completion of the restore:
- Health check might fail. In this case, you can still proceed to the next step and perform a
restore.
- Postgres WAL files might cause problems by consuming all disk space. In this case, you must either:
- Re-install the system, prepare again for disaster recovery, and perform the restore.
- Or increase disk space so that the system returns to a stable state, and then proceed with the
restore.
- Once your Management subsystem is ready, confirm that the backup ID noted in Step 1 is present on the
SFTP or S3 server.
- After a few moments, confirm that there is a ManagementBackup of type record and that its backup ID
matches the backup ID noted in Step 1.
You can list the management backups using:
apicup subsys list-backups <subsystem_name>
For example:
NAME                STATUS   ID                 CLUSTER                        SUBSYSTEM    TYPE   CR TYPE   AGE
mgmt-backup-8hqqg   Ready    20200826-161301F   management-82b290a2-postgres   management   full   record    40s
- Perform a Management Restore using the name of the backup that has the ID you want to
restore.
For example, for ID 20200826-161301F the backup name is mgmt-backup-8hqqg.
For instructions on how to restore, see Restoring the management subsystem.
Once the Management Restore has completed and the database is running again, the data of the old
Management subsystem will be successfully restored onto the new Management subsystem. Manual and
scheduled backups should perform as normal once again.
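Looking up the backup name for a given backup ID, as the restore step requires, can be done with a small filter over the list-backups output. This is a sketch under an assumption: it relies on the column layout shown in the sample output in this topic (NAME in the first field, ID in the third, CR TYPE in the seventh on data rows), which may differ in your version. The function name is hypothetical.

```shell
# Hypothetical helper: reads list-backups output on stdin and prints the NAME
# of each data row whose ID (field 3) equals $1 and whose CR TYPE (field 7)
# is "record". The header row is skipped.
backup_name_for_id() {
  awk -v id="$1" 'NR > 1 && $3 == id && $7 == "record" { print $1 }'
}

# Usage:
#   apicup subsys list-backups <subsystem_name> | backup_name_for_id 20200826-161301F
```

With the sample output, the filter prints `mgmt-backup-8hqqg`. If it prints nothing, the ManagementBackup record for that ID is not yet present.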
What to do next
You should now complete the recovery steps for the Developer Portal subsystem on VMware.
See Recovering the Developer Portal subsystem on VMware.