You can recover the management subsystem from SFTP backups after a disaster
event.
Before you begin
To successfully recover the management subsystem, you must have previously completed the steps in Preparing the management subsystem for disaster recovery.
Important: Successful disaster recovery depends on recovery of both the Management subsystem and the Developer Portal subsystem. You must complete the preparation steps for both subsystems to achieve disaster recovery. If you have to perform a restore, you must complete the restoration of the Management subsystem first, and then immediately restore the Developer Portal. The backups of the Management and Developer Portal subsystems must also be taken at the same time, to ensure that the Portal sites are consistent with the Management database.
About this task
To recover from a disaster event, you must create a new IBM API Connect installation with a
running IBM API Connect Operator.
Note: Limitation for backups created on Version 10.0.2
- If you restore a Version 10.0.2 backup onto a new Version 10.0.2 deployment, the restore might not work if the subsystem CR name exceeds 15 characters. This limitation applies only to restoring onto Version 10.0.2.
- Restoring a Version 10.0.2 backup onto a Version 10.0.3.0 or later deployment is supported when the subsystem name exceeds 15 characters, as long as the correct spec.originalUID is specified on restore. See Step 3.f.
Procedure
- Determine which backup to restore from.
View the SFTP backups available on your remote storage site:
-rw-r--r-- 1 root root 13092333 Aug 26 08:56 20200826-154646F.tgz
-rw-r--r-- 1 root root 18703758 Aug 26 09:10 20200826-160010F.tgz
-rw-r--r-- 1 root root 24318561 Aug 26 09:21 20200826-161301F.tgz
Take note of the ID of the backup that you want to restore from; you need it later in the procedure. Each filename contains the date, time, and type of the stored backup.
The format of the backup ID is YYYYMMDD-HHMMSS<F|I>. For example, to use the Aug 26, 09:21 backup, the backup ID is 20200826-161301F.
- Incremental backups are denoted by the suffix I on the ID. Ensure that the prior backup of each incremental backup is also present in storage. You can check this by examining the incremental backup ID, which has the form <prior-backup-id>_<backup-id>.
- Full backups are denoted by the suffix F on the ID.
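If you have downloaded the backup archives to a local directory, a check like the following can help with this step. It is a minimal POSIX shell sketch, not part of API Connect: it assumes that each archive is named <backup-id>.tgz, as in the listing above, and that an incremental archive is named <prior-backup-id>_<backup-id>.tgz. The function name check_backups is hypothetical.

```shell
# Hypothetical helper: classify backup archives by ID and check that each
# incremental backup's prior backup is also present.
# Assumption: archives are <backup-id>.tgz; incrementals are <prior-id>_<id>.tgz.

# Date-time part of a backup ID: YYYYMMDD-HHMMSS.
ID_GLOB='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]'

check_backups() {
  dir=$1
  for f in "$dir"/*.tgz; do
    [ -e "$f" ] || continue              # the glob matched nothing
    id=$(basename "$f" .tgz)
    last=${id##*_}                       # this backup's own ID (after any "_")
    case $last in
      ${ID_GLOB}F)
        echo "$id full" ;;
      ${ID_GLOB}I)
        case $id in
          *_*) prior=${id%_*} ;;         # ID of the backup this one builds on
          *) echo "$id incremental-missing-prior"; continue ;;
        esac
        found=no
        # The prior backup may be a full archive (<prior>.tgz) or an
        # incremental archive whose own ID is <prior> (<...>_<prior>.tgz).
        for g in "$dir/$prior.tgz" "$dir"/*_"$prior".tgz; do
          [ -e "$g" ] && found=yes
        done
        if [ "$found" = yes ]; then
          echo "$id incremental"
        else
          echo "$id incremental-missing-prior"
        fi ;;
      *)
        echo "$id unrecognized" ;;
    esac
  done
}
```

For example, check_backups /path/to/backups prints one line per archive, such as 20200826-161301F full. A line ending in incremental-missing-prior marks an incremental backup whose prior backup was not found in that directory.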
- Make sure you know the management database cluster name.
You can get this name from the original management subsystem CR. You made note of this name in
Step 2.e in Preparing the management subsystem for disaster recovery.
If you are not able to recover the original management subsystem CR, you can also recover the
Management database cluster name and siteName by examining the SFTP backup tar:
- Download or move the SFTP backup tar file and decompress (untar) it.
- Open <management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz, which contains the management subsystem name and siteName. For example:
# Do not edit this file manually!
# It will be overwritten by Patroni!
include 'postgresql.base.conf'
archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
archive_mode = 'True'
archive_timeout = '60'
autovacuum_vacuum_cost_limit = '1000'
autovacuum_vacuum_scale_factor = '0.01'
cluster_name = 'm1-f785a3e3-postgres'
In this example, cluster_name contains both the management subsystem name and the siteName:
- m1: management subsystem name
- f785a3e3: siteName
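The same values can be extracted with a short shell function. This is a sketch that assumes cluster_name has the form <management-subsystem-name>-<siteName>-postgres, as in the example above, and that the siteName itself contains no hyphen; the function name parse_cluster_name is hypothetical.

```shell
# Hypothetical helper: read cluster_name from a backup's postgresql.conf.gz
# and split it into the management subsystem name and siteName.
# Assumption: cluster_name = '<subsystem-name>-<siteName>-postgres'.
parse_cluster_name() {
  conf_gz=$1
  # Decompress to stdout and print the value between the surrounding quotes.
  cluster=$(gzip -dc "$conf_gz" | sed -n 's/^cluster_name = .\(.*\).$/\1/p')
  [ -n "$cluster" ] || { echo "cluster_name not found in $conf_gz" >&2; return 1; }
  base=${cluster%-postgres}   # e.g. m1-f785a3e3
  site=${base##*-}            # last hyphen-separated field: the siteName
  name=${base%-*}             # the rest: subsystem name (may contain hyphens)
  echo "cluster_name=$cluster"
  echo "subsystem=$name"
  echo "siteName=$site"
}
```

For example: parse_cluster_name m1-f785a3e3-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz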
- Before installing the replacement management subsystem CR:
- Apply the YAML file that contains the Management Database encryption secret to the cluster. For example, where encryption-bin-secret.yaml is the local YAML file containing the backed-up encryption secret:
kubectl create -f encryption-bin-secret.yaml -n <namespace>
Replace <namespace> with the namespace used for the management subsystem installation.
This command re-creates the original Management Database encryption secret on the cluster, with the original secret name.
- Add the following encryptionSecret subsection to the spec of the Management CR. For example, if management-enc-key is the name of the newly created secret on the cluster that contains the original Management Database encryption secret from the previous step:
encryptionSecret:
  secretName: management-enc-key
- For each saved YAML file that contains a Management Client Application Credential Secret, apply the file to the cluster with the following command:
kubectl create -f <secret_name>.yaml -n <namespace>
where <secret_name>.yaml is the local YAML file containing one of the backed-up credential secrets.
Repeat this for each of the backed-up credential secrets. These are the secrets you saved in Step 2.b in Preparing the management subsystem for disaster recovery.
These commands re-create the original Management Client Application Credential Secrets on the cluster, each with its original secret name.
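When there are several credential secret files, a loop avoids repeating the command by hand. This sketch assumes that the saved credential secret YAML files, and only those files, are in one directory with a .yaml suffix; the function name apply_saved_secrets and the KUBECTL override variable are hypothetical. Setting KUBECTL=echo prints the commands instead of running them.

```shell
# Hypothetical helper: run "kubectl create -f <file> -n <namespace>" for every
# saved credential secret file in a directory.
# Assumption: the directory holds only the backed-up credential secret YAMLs.
KUBECTL=${KUBECTL:-kubectl}   # override, e.g. KUBECTL=echo, for a dry run

apply_saved_secrets() {
  dir=$1 ns=$2
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue                        # no .yaml files found
    $KUBECTL create -f "$f" -n "$ns" || return 1   # stop at the first failure
  done
}
```

For example: apply_saved_secrets ./mgmt-cred-secrets <namespace>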
- Add the following customApplicationCredentials subsection to the spec section of the Management CR:
customApplicationCredentials:
  - name: atm-cred
    secretName: management-atm-cred
  - name: ccli-cred
    secretName: management-ccli-cred
  - name: cli-cred
    secretName: management-cli-cred
  - name: cui-cred
    secretName: management-cui-cred
  - name: dsgr-cred
    secretName: management-dsgr-cred
  - name: juhu-cred
    secretName: management-juhu-cred
  - name: ui-cred
    secretName: management-ui-cred
For each named credential above, secretName is the name of the corresponding secret that you re-created in Step 3.c.
- Add the siteName property to the spec of the Management CR. For example, if a2a5e6e2 is the original siteName that was noted after the installation of the original Management subsystem:
siteName: a2a5e6e2
- Version 10.0.3.0 or later: Add the originalUID property to the spec of the Management CR. When you re-create a system to restore a backup into it, the new CR must specify the same spec.originalUID as the system that was backed up. If the spec.originalUID in the new CR does not match the spec.originalUID of the backed-up system, the restore fails.
spec:
  originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
Note:
- For Version 10.0.3.0 or later, if you do not specify spec.originalUID in the new CR, the operator automatically sets spec.originalUID to the value of metadata.uid of the new CR. In this case, the restore fails because spec.originalUID in the saved (backed-up) CR does not match spec.originalUID in the new CR.
- The originalUID is essential only when the subsystem CR name exceeds 15 characters, or when the API Connect Cluster CR name exceeds 10 characters. As a best practice, all backups should include the originalUID for the Management subsystem.
- See also Step 2.e in
Preparing the management subsystem for disaster recovery.
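If you saved the original Management CR as a YAML file when preparing for disaster recovery, you can read the originalUID from it instead of copying it by hand. This sketch assumes that the saved file contains a single line of the form originalUID: "<uid>" under spec; the file name and the function names get_original_uid and print_original_uid_spec are hypothetical.

```shell
# Hypothetical helpers: read spec.originalUID from a saved Management CR YAML
# file and print it as a spec fragment for the new CR.
# Assumption: the file has a line like:  originalUID: "<uid>"  (quotes optional)
get_original_uid() {
  sed -n 's/^[[:space:]]*originalUID:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1" | head -n 1
}

print_original_uid_spec() {
  uid=$(get_original_uid "$1")
  [ -n "$uid" ] || { echo "originalUID not found in $1" >&2; return 1; }
  printf 'spec:\n  originalUID: "%s"\n' "$uid"
}
```

For example: print_original_uid_spec saved-mgmt-cr.yaml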
- Verify that the name of the management subsystem in the CR matches the old management subsystem name, as described in Step 2.e in Preparing the management subsystem for disaster recovery.
- Install the Management subsystem CR with the values obtained in Step 2.e in Preparing the management subsystem for disaster recovery.
Important: The endpoint hostnames cannot be changed. The Management CR YAML file that you use for this installation must specify the same endpoint hostnames as the original installation.
To review installation of the management subsystem, see Installing the Management subsystem cluster.
- Get a list of available backups and confirm that the backup ID noted in Step 1 is in the list.
The API Connect Operator automatically reads backups from the configured remote storage and populates the list of backups that are available to restore from.
$ kubectl get mgmtb
NAME                STATUS     ID                 CLUSTER                SUBSYSTEM   TYPE   CR TYPE   AGE
mgmt-backup-4z87f   Complete   20200606-145011F   m1-82b290a2-postgres   m1          full   record    11m
mgmt-backup-6bms2   Complete   20200606-144315F   m1-82b290a2-postgres   m1          full   record    11m
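To pick out the backup name for a given ID without reading the table by eye, you can filter the command output. This sketch assumes the default column layout shown above (NAME first, STATUS second, ID third, with a header row) and returns only backups whose STATUS is Complete; the function name backup_name_for_id is hypothetical.

```shell
# Hypothetical helper: print the NAME of the completed backup whose ID matches.
# Reads "kubectl get mgmtb" table output on stdin.
# Assumption: columns are NAME, STATUS, ID, ... as in the example output.
backup_name_for_id() {
  awk -v id="$1" '$1 != "NAME" && $2 == "Complete" && $3 == id { print $1 }'
}
```

For example: kubectl get mgmtb -n <namespace> | backup_name_for_id 20200606-145011F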
- Perform a Management restore by using the name of the backup that has the ID you want to restore. For example, as shown in Step 5, the backup name for ID 20200606-145011F is mgmt-backup-4z87f.
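The restore is started by creating a ManagementRestore resource that refers to the backup name. The following sketch prints such a resource. It assumes the apiVersion management.apiconnect.ibm.com/v1beta1, the kind ManagementRestore, and the spec.backupName field, so check them against the Management restore documentation for your API Connect version before use. The function name render_mgmt_restore is hypothetical.

```shell
# Hypothetical helper: print a ManagementRestore resource for a backup name.
# Assumption: apiVersion, kind, and spec.backupName match your installed CRD.
render_mgmt_restore() {
  cat <<EOF
apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementRestore
metadata:
  generateName: management-restore-
spec:
  backupName: $1
EOF
}
```

Because the resource uses metadata.generateName, create it with kubectl create rather than kubectl apply, for example: render_mgmt_restore mgmt-backup-4z87f | kubectl create -n <namespace> -f -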
- Use the following command to check the status of the restore:
kubectl get mgmtr -n <namespace>
When the Management restore completes and the database is running again, the data from the old Management subsystem is restored onto the new Management subsystem. Manual and scheduled backups then run as normal again.
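Rather than rerunning the status command by hand, you can poll it. This sketch assumes that kubectl get mgmtr <restore-name> prints the restore status in its second column, with Complete on success and Failed or Error on failure; the function name wait_for_restore and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical helper: poll a ManagementRestore until it reports Complete,
# reports a failure, or the number of checks runs out.
# Assumption: STATUS is the 2nd column of "kubectl get mgmtr" output.
KUBECTL=${KUBECTL:-kubectl}

wait_for_restore() {
  name=$1 ns=$2 tries=${3:-60} delay=${4:-30}
  i=0
  state=
  while [ "$i" -lt "$tries" ]; do
    state=$($KUBECTL get mgmtr "$name" -n "$ns" --no-headers | awk '{ print $2 }')
    case $state in
      Complete) echo "restore $name: Complete"; return 0 ;;
      Failed|Error) echo "restore $name: $state" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "restore $name: still '$state' after $tries checks" >&2
  return 1
}
```

For example: wait_for_restore <restore-name> <namespace> polls every 30 seconds for up to 60 checks by default.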
What to do next
Next, complete the recovery steps for the Developer Portal subsystem on Kubernetes; see Recovering the Developer Portal after a disaster.