Recover the management subsystem from SFTP backups after a disaster event on
10.0.1.1-eus
Before you begin
To successfully recover the management subsystem, you must have
previously completed the steps in Preparing the management subsystem for disaster recovery on VMware using SFTP backups (10.0.1.1-eus). You must use the same
project directory that you used for your original deployment, to ensure that configuration and
secret information is transferred to the replacement deployment.
Important: Successful disaster recovery depends on recovery of both the
Management subsystem and the Developer Portal
subsystem. You must complete preparation steps for both subsystems in order to achieve disaster
recovery. If you have to perform a restore, you must complete the restoration of the Management
subsystem first, and then immediately restore the Developer Portal. Therefore, the backups of the
Management and Portal subsystems must be taken at the same time, to ensure that the Portal sites are consistent
with the Management database.
Procedure
- Determine which backup to restore from.
View the SFTP backups available on your remote storage site:
-rw-r--r-- 1 root root 13092333 Aug 26 08:56 20200826-154646F.tgz
-rw-r--r-- 1 root root 18703758 Aug 26 09:10 20200826-160010F.tgz
-rw-r--r-- 1 root root 24318561 Aug 26 09:21 20200826-161301F.tgz
Take note of the backup ID of the backup that you want to restore from. Each filename contains the date,
time, and type of the backup stored. You will use this ID later in the procedure.
The format of the backup ID is YYYYMMDD-HHMMSS<F|I>. For example, to use the Aug 26,
09:21 backup, its backup ID is 20200826-161301F.
- Incremental backups are denoted with a suffix I on the ID. Ensure that each
incremental backup has its prior backup also present in storage. You can check this by examining
the ID, which has the form <prior-backup-id>_<backup-id>.
- Full backups are denoted with a suffix F on the ID.
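The check for incremental backups can be sketched in Python. This is a minimal illustration, not part of the product tooling: it assumes that each backup file is named <backup-id>.tgz, as in the listing above, and that an incremental ID has the form <prior-backup-id>_<backup-id>. The function names are hypothetical.

```python
import re

# Assumed ID pattern, based on the format described above:
# an optional "<prior-backup-id>_" prefix, then YYYYMMDD-HHMMSS and F or I.
BACKUP_ID = re.compile(
    r"^(?:(?P<prior>\d{8}-\d{6}[FI])_)?(?P<stamp>\d{8}-\d{6})(?P<kind>[FI])$"
)


def backup_id_from_filename(filename):
    """Return the backup ID, assuming the file is named <backup-id>.tgz."""
    suffix = ".tgz"
    return filename[: -len(suffix)] if filename.endswith(suffix) else filename


def missing_prior_backups(filenames):
    """List incremental backup IDs whose prior backup is not in storage."""
    ids = {backup_id_from_filename(name) for name in filenames}
    missing = []
    for backup_id in sorted(ids):
        match = BACKUP_ID.match(backup_id)
        if match is None:
            continue  # not a backup ID in the assumed format
        if match.group("kind") == "I" and match.group("prior") not in ids:
            missing.append(backup_id)
    return missing
```

An empty result from missing_prior_backups suggests that every incremental backup in the listing has its prior backup available.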
- Make sure you know the management database cluster name.
You can get this name from the original management subsystem CR. You made note of this name in
Step 2.e in Preparing the management subsystem for disaster recovery on VMware using SFTP backups (10.0.1.1-eus).
If you are not able to recover the original management subsystem CR, you can also recover the
Management database cluster name and siteName by examining the SFTP backup tar:
- Download or move the SFTP backup tar file and decompress (untar) it.
- Open
<management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz
which contains the management subsystem name and siteName. For example:
# Do not edit this file manually!
# It will be overwritten by Patroni!
include 'postgresql.base.conf'
archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
archive_mode = 'True'
archive_timeout = '60'
autovacuum_vacuum_cost_limit = '1000'
autovacuum_vacuum_scale_factor = '0.01'
cluster_name = 'm1-f785a3e3-postgres'
In this example, cluster_name contains both the management subsystem name and the siteName:
m1 is the Management subsystem name
f785a3e3 is the siteName
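As an illustration, the two values can be extracted from the file contents with a few lines of Python. This sketch assumes that cluster_name always has the form <management-subsystem-name>-<siteName>-postgres and that the siteName itself contains no hyphen. The function name is hypothetical.

```python
import re


def parse_cluster_name(conf_text):
    """Return (management subsystem name, siteName) from postgresql.conf text."""
    match = re.search(
        r"^\s*cluster_name\s*=\s*'([^']+)'", conf_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("cluster_name not found")
    cluster_name = match.group(1)
    suffix = "-postgres"
    if not cluster_name.endswith(suffix):
        raise ValueError("unexpected cluster_name: " + cluster_name)
    # Split on the last hyphen so that a subsystem name may contain hyphens.
    subsystem, _, site_name = cluster_name[: -len(suffix)].rpartition("-")
    if not subsystem or not site_name:
        raise ValueError("unexpected cluster_name: " + cluster_name)
    return subsystem, site_name
```

To read the compressed file directly, the text can be obtained with gzip.open(path, "rt").read() from the Python standard library and passed to parse_cluster_name.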
- From your new project directory, install the Management subsystem as described in Deploying the Management subsystem.
Important:
- The hostnames of the endpoints cannot be changed. They must be the same in the Management
subsystem that you install now as they were in the original installation.
- The Management subsystem name must remain the same in the new installation as the original.
- Before you restore onto the new management subsystem, complete the following steps:
- SSH onto your Management appliance and remove the Management subsystem if it is already
present.
kubectl get managementcluster
NAME READY STATUS VERSION RECONCILED VERSION AGE
management 16/16 Running 10.0.1.1 10.0.1.1-706 5h28m
kubectl delete managementcluster management
managementcluster.management.apiconnect.ibm.com "management" deleted
- Apply the YAML file that contains the Management Database Encryption Secret into the cluster.
For example, where encryption-bin-secret.yaml is the local YAML file containing the
backed-up encryption secret:
kubectl create -f encryption-bin-secret.yaml -n <namespace>
Replace <namespace> with the namespace being used for the management
subsystem installation.
This command re-creates the original Management Database encryption
secret on the cluster. The secret keeps its original name.
- Apply each of the saved YAML files that contain the Management Client
Application Credential Secrets into the cluster using the following
command:
kubectl create -f <secret_name>.yaml -n <namespace>
where <secret_name>.yaml is the local YAML file containing one of the backed-up Credential
Secrets.
Repeat this for each of the backed-up Credential Secrets. These are the secrets you
saved in Step 2.b of Preparing the management subsystem for disaster recovery using SFTP backups.
These commands re-create the original
Management Client Application Credential Secrets on the cluster. Each Secret keeps its original
name.
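Because the same command is repeated for every saved file, the repetition can be scripted. The following Python sketch only builds the kubectl create commands shown above, one per file, and does not run them. The function name and the file names in the usage note are hypothetical.

```python
def kubectl_create_commands(secret_files, namespace):
    """Build one 'kubectl create -f <file> -n <namespace>' command per file."""
    return [
        ["kubectl", "create", "-f", secret_file, "-n", namespace]
        for secret_file in secret_files
    ]
```

For example, kubectl_create_commands(["management-atm-cred.yaml", "management-ui-cred.yaml"], "<namespace>") returns two argument lists. Each list can be passed to subprocess.run(command, check=True) on a host where kubectl is configured for the cluster.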
- In your project directory, create the management-extra-values.yaml file for the
Management subsystem.
- Add the encryptionSecret subsection to the spec. Its secretName is the
name of the newly created secret on the cluster containing the original Management Database
Encryption Secret from the previous step.
- Add the siteName property to the spec. In this example, 82b290a2 is the original siteName
that was noted after the installation of the original Management subsystem.
- Add the customApplicationCredentials subsection to the spec.
For each named credential, the secretName is the name of the corresponding
Secret that you created in Step 4.c.
For example:
spec:
  customApplicationCredentials:
  - name: atm-cred
    secretName: management-atm-cred
  - name: ccli-cred
    secretName: management-ccli-cred
  - name: cli-cred
    secretName: management-cli-cred
  - name: cui-cred
    secretName: management-cui-cred
  - name: dsgr-cred
    secretName: management-dsgr-cred
  - name: juhu-cred
    secretName: management-juhu-cred
  - name: ui-cred
    secretName: management-ui-cred
  encryptionSecret:
    secretName: management-enc-key
  siteName: 82b290a2
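To make the nesting of the three spec entries explicit, the file contents can be generated from the values gathered in the earlier steps. This Python sketch is an illustration only. It assumes that customApplicationCredentials, encryptionSecret, and siteName all sit directly under spec, as the steps above describe, and the function name is hypothetical.

```python
def render_extra_values(site_name, encryption_secret, credentials):
    """Render management-extra-values.yaml text.

    credentials is a list of (credential name, secret name) pairs.
    """
    lines = ["spec:", "  customApplicationCredentials:"]
    for name, secret_name in credentials:
        lines.append("  - name: " + name)
        lines.append("    secretName: " + secret_name)
    lines.append("  encryptionSecret:")
    lines.append("    secretName: " + encryption_secret)
    lines.append("  siteName: " + site_name)
    return "\n".join(lines) + "\n"
```

Writing the returned text to management-extra-values.yaml in the project directory yields a file with the same structure as the example above.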
- Set the management-extra-values.yaml file as the extra values file for your Management subsystem:
apicup subsys set [SUBSYS_NAME] extra-values-file management-extra-values.yaml
- Prepare your backup configuration before the fresh install.
The following settings can be configured with apicup subsys set [SUBSYS_NAME] [setting]=[value]:
database-backup-auth-pass
database-backup-auth-user
database-backup-host
database-backup-path
database-backup-port 22 (default)
database-backup-protocol sftp (default)
database-backup-retries 0 (default)
database-backup-schedule 0 0 * * * (default)
You can view all the management subsystem settings with the command:
apicup subsys get [SUBSYS_NAME] --validate
To review backup configuration, see Configuring backup settings during initial installation of the management subsystem (10.0.1.1-eus or greater).
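The backup settings can be turned into apicup commands in one pass. This Python sketch builds the commands in the [setting]=[value] form given above and rejects any setting name that is not in the list. It does not run apicup, and the function name and the example values in the usage note are hypothetical.

```python
# Setting names copied from the list above.
BACKUP_SETTINGS = (
    "database-backup-auth-pass",
    "database-backup-auth-user",
    "database-backup-host",
    "database-backup-path",
    "database-backup-port",
    "database-backup-protocol",
    "database-backup-retries",
    "database-backup-schedule",
)


def backup_set_commands(subsys_name, settings):
    """Build one 'apicup subsys set' command per backup setting."""
    unknown = sorted(set(settings) - set(BACKUP_SETTINGS))
    if unknown:
        raise ValueError("unknown backup settings: " + ", ".join(unknown))
    return [
        ["apicup", "subsys", "set", subsys_name, "{}={}".format(key, value)]
        for key, value in settings.items()
    ]
```

For example, backup_set_commands("mgmt", {"database-backup-host": "sftp.example.com", "database-backup-path": "/backups"}) returns two argument lists that can be run from the project directory.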
- Install the Management subsystem using apicup with the flag --skip-health-check:
apicup subsys install [SUBSYS_NAME] --skip-health-check
- Once your Management subsystem is ready, confirm that the backup ID noted in Step 1 is present on the SFTP
server.
- After a few moments, confirm that there is a ManagementBackup of type record
and that its backup ID matches the backup ID noted in Step 1.
You can list the management backups using:
apicup subsys list-backups [SUBSYS_NAME]
For example:
NAME STATUS ID CLUSTER SUBSYSTEM TYPE CR TYPE AGE
mgmt-backup-8hqqg Ready 20200826-161301F management-82b290a2-postgres management full record 40s
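Finding the backup CR for a given ID can be automated by scanning the list-backups output. This Python sketch assumes that the columns appear in the order shown in the example above, that no value contains a space, and that the CR TYPE column is the seventh field. The function name is hypothetical.

```python
def find_backup_cr(list_backups_output, backup_id):
    """Return the name of the 'record' backup CR with the given ID, or None."""
    for line in list_backups_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[0] == "NAME":
            continue  # skip the header row and incomplete lines
        name, current_id, cr_type = fields[0], fields[2], fields[6]
        if current_id == backup_id and cr_type == "record":
            return name
    return None
```

A result of None means that no matching ManagementBackup of type record was found in the output yet.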
- Perform a Management Subsystem Restore using the name of the backup CR that has the ID
you want to restore.
For example, for ID 20200826-161301F the backup CR name is mgmt-backup-8hqqg.
For instructions on how to restore, see Restoring the management subsystem.
Once the Management Restore has completed and the database is running again, the data of the old
Management subsystem will be successfully restored onto the new Management subsystem. Manual and
scheduled backups should perform as normal once again.
What to do next
You should now complete the recovery steps for the Developer Portal
subsystem on VMware. See Recovering the Developer Portal subsystem on VMware.