Installing a two data center deployment on Kubernetes

Additional installation instructions for a two data center disaster recovery (2DCDR) deployment on Kubernetes.

Before you begin

Ensure that you understand the concepts of 2DCDR in API Connect. For more information, see Two data center warm-standby deployment on Kubernetes and OpenShift.

Download the operator files and custom resource definitions for installation on both of your data centers. Ensure that you complete the instructions for Deploying operators and cert-manager, and set the same ingress-ca certificate in both data centers: Installing cert-manager and certificates in a two data center deployment.
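For example, a minimal sketch of one way to reuse the same ingress-ca secret in both data centers (the linked certificates topic is the authoritative procedure; <namespace> is a placeholder for your installation namespace):

  # On DC1: export the ingress-ca secret
  kubectl get secret ingress-ca -n <namespace> -o yaml > ingress-ca.yaml
  # Remove cluster-specific metadata (uid, resourceVersion, creationTimestamp) from ingress-ca.yaml,
  # then apply the file against the DC2 cluster:
  kubectl apply -f ingress-ca.yaml -n <namespace>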

Familiarize yourself with the instructions in Installing the API Connect subsystems. Follow those instructions along with the 2DCDR-specific steps in this topic.

Important: Each data center must use a different backup path for your Management backups, but both data centers must use the same backup path for portal backups. For more information, see Backup and restore requirements for a 2DCDR deployment.
Restrictions:
The following endpoints must be the same on both data centers:
Management subsystem endpoints
  • cloudManagerEndpoint
  • apiManagerEndpoint
  • platformAPIEndpoint
  • consumerAPIEndpoint
  • consumerCatalogEndpoint
Portal subsystem endpoints
  • portalAdminEndpoint
  • portalUIEndpoint
For more information about how to set the endpoints, see Installing the management subsystem on Kubernetes and Installing the developer portal subsystem on Kubernetes.
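For example (hostname and secret name are illustrative placeholders only), an endpoint fragment such as cloudManagerEndpoint must use the same hosts name in the ManagementCluster CR of both data centers:

  cloudManagerEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer
    hosts:
    - name: admin.example.com
      secretName: cm-endpoint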

About this task

A 2DCDR deployment differs from a standard API Connect deployment in that the custom resources (CRs) for the management and portal subsystems in each data center include a multiSiteHA configuration section. The multiSiteHA section defines which data center is the active and which is the warm-standby.

The following table describes the multiSiteHA configuration properties. Example multiSiteHA sections for the active and warm-standby data centers are shown in the procedure in this topic.

Table 1. multiSiteHA configuration properties

mode
  The state of the data center in the context of the 2DCDR deployment. Valid values are:
  • active - indicates the active data center.
  • passive - indicates the warm-standby data center.

replicationEndpoint
  annotations:
    cert-manager.io/issuer: ingress-issuer
  hosts:
    - name:
      secretName:
  Contains details of the external ingress name for the subsystem in the current data center in the 2DCDR deployment. The hosts name is a unique fully qualified hostname that the other data center uses to communicate with the current data center.

replicationPeerFQDN
  The ingress hostname for the other data center in the 2DCDR deployment. This information is required so that the two data centers can communicate with each other.

tlsClient:
  secretName
  The TLS client secret for the subsystem.
The siteName property is a unique descriptive name for the portal and management subsystems in each data center, for example dallas in data center 1 (DC1) and raleigh in data center 2 (DC2).
  • Set a different siteName on the active data center from the one set on the warm-standby data center.
  • siteName is used in hostnames, and so can contain only a-z and 0-9 characters.
  • You can set a siteName in the ManagementCluster or PortalCluster CRs only at installation. You cannot change the name after deployment. If you do not set a siteName at first deployment, an automated siteName is created for you.
To install a 2DCDR deployment:
  1. Set the placeholder properties in the CR files for the management and portal subsystems as detailed in Installing the API Connect subsystems, but do not run the final kubectl apply operation. Complete the additional 2DCDR configuration steps before you run kubectl apply.
  2. Set the properties in the multiSiteHA section, and synchronize the encryption secrets between the data centers, as described in the following procedure.

Procedure

  • Installing to an active data center

    The following example shows how to set the multi-site HA CR values for deploying to the active data center. In this example, the initial active data center is called dallas (DC1) and the warm-standby data center is called raleigh (DC2).

    1. Set Dallas to be active for the API Manager service on DC1.
      Edit the ManagementCluster CR file management_cr for DC1, and set the multiSiteHA properties as shown:
      siteName: dallas
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationdallas.cluster1.example.com
            secretName: dc1-mgmt-replication
        replicationPeerFQDN: mgrreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: dc1-mgmt-replication-client
    2. Set Dallas to be active for the Developer Portal service on DC1.
      Edit the PortalCluster CR file portal_cr for DC1, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationdallas.cluster1.example.com
            secretName: dallas-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: dallas
      
    3. Create a secret called ptl-encryption-key on the active and warm-standby data centers.
      The secret must use the same random string in both data centers. Run the following command:
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.
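      For example, one way (among others) to generate a compliant 64-character alphanumeric string on Linux or macOS is:
      LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64; echo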

    4. When the secrets are created on both DC1 and DC2, update the portal CR on DC1 and DC2 and add the following under the spec section:
       encryptionSecret:
          secretName: ptl-encryption-key
      Warning: Ensure that you add the secret directly under the spec section and not inside the multiSiteHA section.
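      For reference, an illustrative fragment of the portal CR showing the correct placement (other properties omitted):
       spec:
         encryptionSecret:
           secretName: ptl-encryption-key
         multiSiteHA:
           mode: active
           ...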
    5. Create a secret called mgmt-encryption-key on the active and warm-standby data centers.
      The secret must use the same random string in both data centers. Run the following command:
      Note: Do not use the same random string that you used to create the secret for ptl-encryption-key.
      kubectl create secret generic mgmt-encryption-key --from-literal=encryption_secret.bin=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.

    6. When the secrets are created on both DC1 and DC2, update the management CR on DC1 and DC2 and add the following under the spec section:
       encryptionSecret:
          secretName: mgmt-encryption-key
      Warning: Ensure that you add the secret directly under the spec section and not inside the multiSiteHA section.
    7. Apply the CR files to the subsystems in DC1.
      For example, to apply the ManagementCluster CR file to the DC1 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC1. Repeat this process to apply the updated PortalCluster CR file to DC1.
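      For example, assuming that your portal CR file is named portal_cr.yaml:
      kubectl apply -f portal_cr.yaml -n <namespace>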
      You can verify that the installation process is running with the following command:
      kubectl get ManagementCluster -n <namespace>
      Note: Until the warm-standby is installed and running, the haStatus reports PeerNotReachable.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.

    8. Set your dynamic router to direct all traffic to DC1. The endpoints that must be directed to DC1 are:
      Management subsystem endpoints
      • cloudManagerEndpoint
      • apiManagerEndpoint
      • platformAPIEndpoint
      • consumerAPIEndpoint
      • consumerCatalogEndpoint
      Portal subsystem endpoints
      • portalAdminEndpoint
      • portalUIEndpoint
  • Installing the warm-standby data center

    The following example shows how to set the multi-site HA CR values for the remote data center called raleigh (DC2) to be the warm-standby. Use the same encryption key on both sites.

    Ensure that the installation of the active data center is complete before you start the installation of the warm-standby data center.

    1. Set Raleigh to be warm-standby for the API Manager service on DC2.
      Edit the ManagementCluster CR file management_cr for DC2, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationraleigh.cluster2.example.com
            secretName: raleigh-mgr-replication-worker-1
        replicationPeerFQDN: mgrreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: mgr-replication-client
      ...
      siteName: raleigh
      
    2. Set Raleigh to be warm-standby for the Developer Portal service on DC2.
      Edit the PortalCluster CR file portal_cr for DC2, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationraleigh.cluster2.example.com
            secretName: raleigh-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: raleigh
      
    3. Apply the CR files to the subsystems in DC2.
      For example, to apply the ManagementCluster CR file to the DC2 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC2. Repeat this process to apply the updated PortalCluster CR file to DC2.
      You can verify that the installation process is running with the following command:
      kubectl get ManagementCluster -n <namespace>
      When installation is complete, haStatus reports Ready.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.

  • Validating your two data center deployment
    While the management subsystems on the warm-standby and active data centers are synchronizing their databases, the management status reports Warning, and haStatus reports Pending:
    kubectl get mgmt -n <namespace>
    
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Warning   10.0.8.0-0   10.0.8.0-0           Management is ready. HA Status Warning - see HAStatus in CR for details   8m59s
    
    status:
      haStatus
        {
          "lastTransitionTime": "2023-03-31T19:47:08Z",
          "message": "Replication not working, install or upgrade in progress.",
          "reason": "na",
          "status": "True",
          "type": "Pending"
       }
    When the management database replication between sites is complete, the management status reports Running, and status.haStatus reports Ready:
     kubectl get mgmt -n <namespace>
    
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Running   10.0.8.0-0   10.0.8.0-0           Management is ready. HA status Ready - see HAStatus in CR for details              8m59s
     kubectl get mgmt -n <namespace> -o yaml
    
    ...
    status:
      haStatus
      {
        "lastTransitionTime": "2023-03-31T19:47:08Z",
        "message": "Replication is working",
        "reason": "na",
        "status": "True",
        "type": "Ready"
      }
    If the management database replication between sites fails for any reason other than an install or upgrade being in progress, the haStatus output shows the reason, for example:
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Warning   10.0.8.0-0   10.0.8.0-0           Management is ready. HA Status Warning - see HAStatus in CR for details   8m59s
    
    status:
      haStatus
        {
          "lastTransitionTime": "2023-12-31T19:47:08Z",
          "message": "Replication not working",
          "reason": "na",
          "status": "True",
          "type": "PeerNotReachable"
       }
    If the warning persists, see Troubleshooting a two data center deployment.

    You can validate that your portal deployments are synchronizing by running kubectl get pods on both the active and warm-standby data centers. Confirm that the number and names of the pods match (the UUIDs in the names might differ on each site), and that all pods are in the Ready state.
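    For example, run the following command in both data centers (pod names, counts, and suffixes vary by deployment):
    kubectl get pods -n <namespace>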