Recovering from a disaster on OpenShift
Install and configure API Connect, then restore backed up data to recover the deployment after a disaster.
Before you begin
To successfully recover API Connect from a disaster, you must have previously completed the steps to prepare the Management, Portal, and Analytics subsystems for disaster recovery. Make sure that you can access the backups and deployment information because you cannot recover your deployment without them.
About this task
To recover from a disaster event, recreate your API Connect Cluster CR and add recovery details as explained in the following procedure. Use the file containing the original CR (which you saved during the preparation task) as a reference.
Procedure
-
Make a local copy of the apiconnectcluster_cr.yaml template custom
resource so that you can customize it for your deployment.
Template files are stored in helper_files.zip. For both OpenShift and CP4I, you can configure settings directly in the YAML file as explained in this task. In CP4I, you then copy the YAML code and paste it into the YAML tab in Platform Navigator.
-
Configure backup and restore settings for each subsystem.
Remember to create the backup secret for each subsystem as well, using the same name that you used in the original deployment (you can see the name in the credentials setting of each subsystem's backup section); an example of creating an S3 backup secret follows this step. The backup and restore settings, including the secrets, must exactly match the settings that you created in the original deployment. See the following topics for instructions on configuring backup and restore settings:
- Configuring backup settings for a fresh install of the Management subsystem on OpenShift or Cloud Pak for Integration
- Configuring backups for Developer Portal on OpenShift and Cloud Pak for Integration
- Configuring backup settings for Analytics on OpenShift and Cloud Pak for Integration
Note: During the disaster recovery process, the S3 configuration details of the older management system are used, but the older management system must be in offline mode. The old subsystem must be offline because two management systems cannot simultaneously use the same S3 bucket name in their database backup configurations.
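For example, the following command shows one way to recreate the Management backup secret for an S3 (object-store) configuration. This is only a sketch: the secret name (mgmt-backup-secret) and the literal keys (key, keysecret) are assumptions for a typical S3 setup, and they must match exactly what your original deployment used, as described in the topics above.
oc -n <APIC_namespace> create secret generic mgmt-backup-secret \
  --from-literal=key='<S3 access key>' \
  --from-literal=keysecret='<S3 secret access key>'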
-
Apply the Management and Portal encryption secrets to the cluster.
-
Apply the YAML file that contains the Management database encryption secret into the
cluster.
Run the following command, using the filename of the encryption secret that you saved when you prepared the Management subsystem for disaster recovery:
oc -n <APIC_namespace> apply -f mgmt-enc-key.yaml
-
Apply the YAML file that contains the Portal encryption secret into the cluster.
Run the following command, using the filename of the encryption secret that you saved when you prepared the Portal subsystem for disaster recovery:
oc -n <APIC_namespace> apply -f <portal-enc-key>.yaml
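Optionally, confirm that both encryption secrets now exist in the namespace. This check assumes that your secret names contain enc-key, as in the examples in this task:
oc -n <APIC_namespace> get secrets | grep enc-key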
-
Edit the YAML file that you copied from the template in helper_files.zip. In the following steps, you will configure additional deployment and recovery settings in the new API Connect Cluster installation CR before you install API Connect.
Important: Make sure that your new version of the API Connect Cluster installation CR uses the same value for the name setting as the original CR.
-
Add the Management and Portal encryption secrets to the API Connect Cluster CR.
Add each secret to the appropriate subsystem section of the CR (spec.management or spec.portal), as shown in the following example:
spec:
  management:
    encryption_secret:
      secret_name: mgmt-enc-key
  portal:
    encryption_secret:
      secret_name: portal-enc-key
-
Apply each of the Management client application credential secrets to the cluster.
Run the following command to apply each secret, using the filename of each client application credential secret that you saved when you prepared the Management subsystem for disaster recovery:
oc -n <APIC_namespace> apply -f <secret_name>.yaml
For example, a deployment might use the following names for the secrets:
atmCredentialSecret: management-atm-cred
consumerToolkitCredentialSecret: management-ccli-cred
consumerUICredentialSecret: management-cui-cred
designerCredentialSecret: management-dsgr-cred
juhuCredentialSecret: management-juhu-cred
toolkitCredentialSecret: management-cli-cred
uiCredentialSecret: management-ui-cred
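If you saved each credential secret in its own YAML file, you can apply them all in one pass. This sketch assumes that the files are named after the secrets (for example, management-atm-cred.yaml):
for f in management-*-cred.yaml; do
  oc -n <APIC_namespace> apply -f "$f"
done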
-
Add the credential secrets to the spec.management section of the API Connect Cluster CR. For example:
spec:
  management:
    customApplicationCredentials:
    - name: atm-cred
      secretName: management-atm-cred
    - name: ccli-cred
      secretName: management-ccli-cred
    - name: cui-cred
      secretName: management-cui-cred
    - name: dsgr-cred
      secretName: management-dsgr-cred
    - name: juhu-cred
      secretName: management-juhu-cred
    - name: cli-cred
      secretName: management-cli-cred
    - name: ui-cred
      secretName: management-ui-cred
-
Add the siteName property to the spec.management and spec.portal sections of the API Connect Cluster CR.
For example, if a2a5e6e2 is the original siteName of the Management subsystem, and 890772e3 is the original siteName of the Developer Portal subsystem, the CR looks like the following example:
spec:
  management:
    siteName: "a2a5e6e2"
  portal:
    siteName: "890772e3"
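If you are not sure of the original siteName values, you can look them up in the subsystem CRs that you saved while preparing for disaster recovery. This assumes that you saved them under the file names used later in this task (apic-cluster-name-mgt.yaml and apic-cluster-name-ptl.yaml):
grep siteName apic-cluster-name-mgt.yaml apic-cluster-name-ptl.yaml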
-
If your original CR name was 10 characters or longer, you must add the originalUID or metadata.uid value from the original Management and Portal subsystems into the appropriate sections of the installation CR.
Important: If your original CR name was 10 characters or longer and you do not add this value, the deployments and routes that are generated during installation will include a different UID, which will cause the recovery process to fail.
The IDs were created with the original deployment of the Management and Portal subsystems, and the recovered subsystems must continue to use the same IDs. You can locate the ID in the Management (apic-cluster-name-mgt.yaml) or Portal (apic-cluster-name-ptl.yaml) CR that you saved while preparing for disaster recovery. (Do not use the metadata.uid setting from the API Connect Cluster CR.)
The CR setting that contains the ID depends on the version of API Connect that you are recovering:
- Version 10.0.1.4-ifix1-eus or later: spec.originalUID
- Version 10.0.1.2-ifix2-eus or earlier: metadata.uid
Locate each of the original settings in the appropriate subsystem CR, and copy them to the corresponding sections of the new API Connect Cluster CR using the same key and value. For example, if you originally deployed version 10.0.1.2-ifix2-eus, the setting name was uid. If you are deploying Version 10.0.1.5-eus as part of disaster recovery, copy the original value and use it for the originalUID setting in the new CR. For example, in 10.0.1.5-eus:
spec:
  management:
    originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
  portal:
    originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
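To pull the ID out of the saved subsystem CRs, a simple search is usually enough. Adjust the file names to match your saved files, and use the first command if your original version was 10.0.1.4-ifix1-eus or later, or the second if it was 10.0.1.2-ifix2-eus or earlier:
grep originalUID apic-cluster-name-mgt.yaml apic-cluster-name-ptl.yaml
grep ' uid:' apic-cluster-name-mgt.yaml apic-cluster-name-ptl.yaml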
- If you installed API Connect using an S3 provider for the Management subsystem backups, add the following annotation to the apiconnectcluster CR.
If you specified SFTP or local as the backup, skip this step.
- Using the Platform UI:
- Edit the API Management instance and click the YAML tab to edit the CR.
- Add the following statement to the spec.metadata.annotations section:
apiconnect-operator/deployment-mode: disasterRecovery
For example:
metadata:
  annotations:
    apiconnect-operator/deployment-mode: disasterRecovery
- Using the CLI:
- Get the name of the CR by running the following
command:
oc get apiconnectcluster -o name -n <APIC_namespace>
The response looks like the following example:
apiconnectcluster.apiconnect.domain/instance_name
- Add the following annotation to the spec.metadata.annotations section of the CR by running the following command:
oc annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode="disasterRecovery" -n <APIC_namespace>
The response looks like the following example:
apiconnectcluster.apiconnect.domain/instance_name annotated
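Whichever method you used, you can confirm that the annotation is present before you continue. In this check, instance_name is the name returned by the oc get apiconnectcluster command:
oc -n <APIC_namespace> get apiconnectcluster/instance_name -o yaml | grep deployment-mode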
-
Verify that you correctly added all of the settings to the API Connect Cluster CR.
For example, the completed CR might look like the following example:
apiVersion: apiconnect.ibm.com/v1beta1
kind: APIConnectCluster
metadata:
  namespace: apiconnect
  name: apis-minimum
  labels:
    app.kubernetes.io/instance: apiconnect
    app.kubernetes.io/managed-by: ibm-apiconnect
    app.kubernetes.io/name: apiconnect-minimum
spec:
  license:
    accept: true
    license: L-RJON-C2YLGB
    metric: VIRTUAL_PROCESSOR_CORE
    use: nonproduction
  management:
    originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
    encryption_secret:
      secret_name: apis-minim-faaa44bf-enc-key
    siteName: "faaa44bf"
    customApplicationCredentials:
    - name: atm-cred
      secretName: apis-minim-faaa44bf-atm-cred
    - name: ccli-cred
      secretName: apis-minim-faaa44bf-ccli-cred
    - name: cli-cred
      secretName: apis-minim-faaa44bf-cli-cred
    - name: cui-cred
      secretName: apis-minim-faaa44bf-cui-cred
    - name: dsgr-cred
      secretName: apis-minim-faaa44bf-dsgr-cred
    - name: juhu-cred
      secretName: apis-minim-faaa44bf-juhu-cred
    - name: ui-cred
      secretName: apis-minim-faaa44bf-ui-cred
    analytics:
      client: {}
      ingestion: {}
    apiManagerEndpoint: {}
    cloudManagerEndpoint: {}
    consumerAPIEndpoint: {}
    platformAPIEndpoint: {}
    portal:
      admin: {}
    databaseBackup:
      path: ocp-dr-mgmt
      host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
      s3provider: ibm
      schedule: 0 1 * * *
      protocol: objstore
      credentials: mgmt-backup-secret
  analytics:
    storage:
      enabled: true
      type: unique
    databaseBackup:
      chunkSize: 1GB
      credentials: a7s-backup-secret
      host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
      path: ocp-dr-a7s
      schedule: 0 1 * * *
  storageClassName: rook-ceph-block
  profile: n1xc7.m48
  portal:
    adminClientSubjectDN: ""
    portalAdminEndpoint: {}
    portalUIEndpoint: {}
    originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
    encryption_secret:
      secret_name: apis-minim-913edd20-enc-key
    siteName: "913edd20"
    portalBackup:
      credentials: portal-backup-secret
      host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
      path: ocp-dr-portal
      protocol: objstore
      schedule: 0 1 * * *
  version: 10.0.1.5-eus
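As a quick sanity check before you install, you can confirm that the recovery-specific values are present in your edited CR file. This simply searches for the keys that were added in the previous steps (replace <cr_name>.yaml with the file name of your edited CR):
grep -E 'originalUID|siteName|secret_name|credentials' <cr_name>.yaml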
-
Prepare your environment for installing API Connect by completing the prerequisites and all
components of step 1 in Installing with the top-level CR.
After you prepare your environment, continue with this procedure to install API Connect and then restore data.
-
Install API Connect by creating the CR with the following command:
Replace <cr_name> with the value of the name setting in the API Connect Cluster CR.
oc -n <APIC_namespace> create -f <cr_name>.yaml
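While the installation runs, you can watch overall progress and pod status. The exact status columns vary by version, but the instance typically reports Ready when installation is complete:
oc get apiconnectcluster -n <APIC_namespace>
oc get pods -n <APIC_namespace>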
-
Wait for all subsystems to be created.
When the Management subsystem is installed, you might see backup job pods and stanza-create job pods in the Error state; for example:
m1-82b290a2-postgres-stanza-create-4zcgz     0/1   Error   0   35m
m1-82b290a2-postgres-full-sch-backup-2g9hm   0/1   Error   0   20m
This is expected behavior, for the following reasons:
- The stanza-create job normally expects buckets or subdirectories within buckets to be empty. However, since you configured the Management subsystem with your already-populated S3 bucket (where your backups exist), the job will enter the Error state.
- Any scheduled or manual backups will enter the Error state. Although you configured the Management subsystem with your already-populated S3 bucket, the new database isn't yet configured to write backups into remote storage.
The errors will not prevent a successful restore, so continue to the next step. (Example commands for checking the failed job pods follow this step.)
Important: For S3, the recovery remains in an intermediate state until the restore is complete, and Postgres WAL files might cause serious disk issues. To avoid this possibility, continue immediately with the next step.
Note that if you delay completion of the restore:
- The health check might fail. In this case, you can still proceed to the next step and perform a restore.
- Postgres WAL files might cause problems by consuming all disk space. In this case, you must either:
- Re-install the system, prepare again for disaster recovery, and perform the restore.
- Or increase disk space so that the system returns to a stable state, and then proceed with the restore.
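To see which of the expected job pods are failing, and to keep an eye on the storage that the Postgres WAL files consume while you complete the restore, you can run checks like the following (pod and PVC names depend on your instance name):
oc get pods -n <APIC_namespace> | grep -E 'stanza-create|backup'
oc get pvc -n <APIC_namespace>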
-
Restore the subsystems in the following sequence:
- Restore the Management subsystem.
When you perform a restore, you must complete the restoration of the Management subsystem first, and then immediately restore the Portal subsystem.
Important: Be sure to restore the Management subsystem from the backup whose name and ID you noted while preparing for disaster recovery (a sketch for listing the visible backups follows this list).
- Restore the Developer Portal subsystem.
Use the restore type all to ensure that you restore the complete subsystem and all Portal sites.
- Restore the Analytics subsystem.
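If you want to confirm which Management backups are visible to the new deployment before you trigger the restore, the following check assumes that backup records surface as ManagementBackup custom resources in your namespace (verify the resource name against the restore documentation for your version):
oc get managementbackup -n <APIC_namespace> --sort-by=.metadata.creationTimestamp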
- Update the Management OIDC credentials as explained in Configuring the OIDC credentials on OpenShift.
-
Verify that the recovery was successful:
- Ensure that you can log in to the Cloud Manager UI.
- Verify that your provider organizations exist.
- Ensure that you can log in to each Developer portal.
- Ensure that the Analytics dashboard contains all of the analytics data that you preserved.
- After the successful recovery, remove the annotation that you added to the apiconnectcluster CR in the earlier step for S3 backups.
If you skipped that step, then skip this step as well.
- Using the Platform UI:
- Edit the API Management instance and click the YAML tab to edit the CR.
- Delete the following statement from the spec.metadata.annotations section:
apiconnect-operator/deployment-mode: disasterRecovery
- Using the CLI:
- Get the name of the CR by running the following
command:
oc get apiconnectcluster -o name -n <APIC_namespace>
The response looks like the following example:
apiconnectcluster.apiconnect.domain/instance_name
- Remove the annotation by running the following command, making sure to include the trailing "-" on deployment-mode- to indicate the removal:
oc -n <APIC_namespace> annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode-
The response looks like the following example:
apiconnectcluster.apiconnect.domain/instance_name annotated