Installing a two data center deployment on Kubernetes

Additional installation instructions for a two data center disaster recovery (2DCDR) deployment on Kubernetes.

Before you begin

Ensure that you understand the concepts of 2DCDR in API Connect. For more information, see Two data center warm-standby deployment on Kubernetes and OpenShift.

Download the operator files and custom resource definitions for installation on both of your data centers. Ensure that you complete the instructions for Deploying operators and cert-manager, and set the same ingress-ca certificate in both data centers: Installing cert-manager and certificates in a two data center deployment.
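For example, a minimal sketch of one way to reuse the same ingress-ca secret in both data centers (the linked certificates topic is the authoritative procedure; <namespace> is a placeholder for your installation namespace):

  # On DC1: export the ingress-ca secret
  kubectl get secret ingress-ca -n <namespace> -o yaml > ingress-ca.yaml
  # Remove cluster-specific metadata (uid, resourceVersion, creationTimestamp) from ingress-ca.yaml,
  # then apply the file against the DC2 cluster:
  kubectl apply -f ingress-ca.yaml -n <namespace>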

Familiarize yourself with the instructions in Installing the API Connect subsystems. Follow those instructions along with the 2DCDR-specific steps in this topic.

Important: Each data center must use a different backup path for your Management backups, but both data centers must use the same backup path for portal backups. For more information, see Backup and restore requirements for a 2DCDR deployment.
Restrictions:
The following endpoints must be the same on both data centers:
Management subsystem endpoints
  • cloudManagerEndpoint
  • apiManagerEndpoint
  • platformAPIEndpoint
  • consumerAPIEndpoint
  • consumerCatalogEndpoint
Portal subsystem endpoints
  • portalAdminEndpoint
  • portalUIEndpoint
For more information about how to set the endpoints, see Installing the management subsystem on Kubernetes and Installing the developer portal subsystem on Kubernetes.
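For example (hostname and secret name are illustrative placeholders only), an endpoint fragment such as cloudManagerEndpoint must use the same hosts name in the ManagementCluster CR of both data centers:

  cloudManagerEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer
    hosts:
    - name: admin.example.com
      secretName: cm-endpoint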

About this task

A 2DCDR deployment differs from a standard API Connect deployment in that the custom resources (CRs) for the management and portal subsystems in each data center include a multiSiteHA configuration section. The multiSiteHA section defines which data center is the active and which is the warm-standby.

The following table describes the multiSiteHA configuration properties. Example multiSiteHA sections for the active and warm-standby data centers are shown in the procedure in this topic.

Table 1. multiSiteHA configuration properties

mode
  The state of the data center in the context of the 2DCDR deployment. Valid values are:
  • active - indicates the active data center.
  • passive - indicates the warm-standby data center.

replicationEndpoint
  annotations:
    cert-manager.io/issuer: ingress-issuer
  hosts:
    - name:
      secretName:
  Contains details of the external ingress name for the subsystem in the current data center in the 2DCDR deployment. The hosts name is a unique fully qualified hostname that the other data center uses to communicate with the current data center.

replicationPeerFQDN
  The ingress hostname for the other data center in the 2DCDR deployment. This information is required so that the two data centers can communicate with each other.

tlsClient:
  secretName
  The TLS client secret for the subsystem.
The siteName property is a unique descriptive name for the portal and management subsystems in each data center, for example dallas in data center 1 (DC1) and raleigh in data center 2 (DC2).
  • Set a different siteName on the active data center from the one set on the warm-standby data center.
  • siteName is used in hostnames, and so can contain only a-z and 0-9 characters.
  • You can set a siteName in the ManagementCluster or PortalCluster CRs only at installation. You cannot change the name after deployment. If you do not set a siteName at first deployment, an automated siteName is created for you.
To install a 2DCDR deployment:
  1. Set the placeholder properties in the CR files for the management and portal subsystems as detailed in Installing the API Connect subsystems, but do not run the final kubectl apply operation. Complete the additional 2DCDR configuration steps before you run kubectl apply.
  2. Set the properties in the multiSiteHA section, and synchronize the encryption secrets between the data centers, as described in the following procedure.

Procedure

  • Installing to an active data center

    The following example shows how to set the multi-site HA CR values for deploying to the active data center. In this example, the initial active data center is called dallas (DC1) and the warm-standby data center is called raleigh (DC2).

    1. Set Dallas to be active for the API Manager service on DC1.
      Edit the ManagementCluster CR file management_cr for DC1, and set the multiSiteHA properties as shown:
      siteName: dallas
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationdallas.cluster1.example.com
            secretName: dc1-mgmt-replication
        replicationPeerFQDN: mgrreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: dc1-mgmt-replication-client
    2. Set Dallas to be active for the Developer Portal service on DC1.
      Edit the PortalCluster CR file portal_cr for DC1, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationdallas.cluster1.example.com
            secretName: dallas-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: dallas
      
    3. Create a secret called ptl-encryption-key on the active and warm-standby data centers.
      The secret must use the same random string in both data centers. Run the following command:
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.
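      For example, one way (among others) to generate a compliant 64-character alphanumeric string on Linux or macOS is:
      LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64; echo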

    4. When the secrets are created on both DC1 and DC2, update the portal CR on DC1 and DC2 and add the following under the spec section:
       encryptionSecret:
          secretName: ptl-encryption-key
      Warning: Ensure that you add the secret directly under the spec section and not inside the multiSiteHA section.
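      For reference, an illustrative fragment of the portal CR showing the correct placement (other properties omitted):
       spec:
         encryptionSecret:
           secretName: ptl-encryption-key
         multiSiteHA:
           mode: active
           ...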
    5. Create a secret called mgmt-encryption-key on the active and warm-standby data centers.
      The secret must use the same random string in both data centers. Run the following command:
      Note: Do not use the same random string that you used to create the secret for ptl-encryption-key.
      kubectl create secret generic mgmt-encryption-key --from-literal=encryption_secret.bin=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.

    6. When the secrets are created on both DC1 and DC2, update the management CR on DC1 and DC2 and add the following under the spec section:
       encryptionSecret:
          secretName: mgmt-encryption-key
      Warning: Ensure that you add the secret directly under the spec section and not inside the multiSiteHA section.
    7. Apply the CR files to the subsystems in DC1.
      For example, to apply the ManagementCluster CR file to the DC1 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC1. Repeat this process to apply the updated PortalCluster CR file to DC1.
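      For example, assuming that your portal CR file is named portal_cr.yaml:
      kubectl apply -f portal_cr.yaml -n <namespace>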
      You can verify that the installation process is running with the following command:
      kubectl get ManagementCluster -n <namespace>
      Note: Until the warm-standby is installed and running, the haStatus reports PeerNotReachable.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.

    8. Set your dynamic router to direct all traffic to DC1. The endpoints that must be directed to DC1 are:
      Management subsystem endpoints
      • cloudManagerEndpoint
      • apiManagerEndpoint
      • platformAPIEndpoint
      • consumerAPIEndpoint
      • consumerCatalogEndpoint
      Portal subsystem endpoints
      • portalAdminEndpoint
      • portalUIEndpoint
  • Installing the warm-standby data center

    The following example shows how to set the multi-site HA CR values for the remote data center called raleigh (DC2) to be the warm-standby. Use the same encryption key on both sites.

    Ensure that the installation of the active data center is complete before you start the installation of the warm-standby data center.

    1. Set Raleigh to be warm-standby for the API Manager service on DC2.
      Edit the ManagementCluster CR file management_cr for DC2, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationraleigh.cluster2.example.com
            secretName: raleigh-mgr-replication-worker-1
        replicationPeerFQDN: mgrreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: mgr-replication-client
      ...
      siteName: raleigh
      
    2. Set Raleigh to be warm-standby for the Developer Portal service on DC2.
      Edit the PortalCluster CR file portal_cr for DC2, and set the multiSiteHA properties as shown:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationraleigh.cluster2.example.com
            secretName: raleigh-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: raleigh
      
    3. Apply the CR files to the subsystems in DC2.
      For example, to apply the ManagementCluster CR file to the DC2 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC2. Repeat this process to apply the updated PortalCluster CR file to DC2.
      You can verify that the installation process is running with the following command:
      kubectl get ManagementCluster -n <namespace>
      When installation is complete, haStatus reports Ready.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.

  • Validating your two data center deployment
    While the management subsystems on the warm-standby and active data centers are synchronizing their databases, the management status reports Warning, and haStatus reports Pending:
    kubectl get mgmt -n <namespace>
    
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Warning   10.0.8.0-0   10.0.8.0-0           Management is ready. HA Status Warning - see HAStatus in CR for details   8m59s
    
    status:
      haStatus
        {
          "lastTransitionTime": "2023-03-31T19:47:08Z",
          "message": "Replication not working, install or upgrade in progress.",
          "reason": "na",
          "status": "True",
          "type": "Pending"
       }
    When the management database replication between sites is complete, the management status reports Running, and status.haStatus reports Ready:
     kubectl get mgmt -n <namespace>
    
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Running   10.0.8.0-0   10.0.8.0-0           Management is ready. HA status Ready - see HAStatus in CR for details              8m59s
     kubectl get mgmt -n <namespace> -o yaml
    
    ...
    status:
      haStatus
      {
        "lastTransitionTime": "2023-03-31T19:47:08Z",
        "message": "Replication is working",
        "reason": "na",
        "status": "True",
        "type": "Ready"
      }
    If the management database replication between sites fails for any reason other than an install or upgrade being in progress, the haStatus output shows the reason, for example:
    NAME         READY   STATUS    VERSION      RECONCILED VERSION   MESSAGE                                                                          AGE
    management   n/n     Warning   10.0.8.0-0   10.0.8.0-0           Management is ready. HA Status Warning - see HAStatus in CR for details   8m59s
    
    status:
      haStatus
        {
          "lastTransitionTime": "2023-12-31T19:47:08Z",
          "message": "Replication not working",
          "reason": "na",
          "status": "True",
          "type": "PeerNotReachable"
       }
    If the warning persists, see Troubleshooting a two data center deployment.

    You can validate that your portal deployments are synchronizing by running kubectl get pods on both the active and warm-standby data centers. Confirm that the number and names of the pods match (the UUIDs in the names might differ on each site), and that all pods are in the Ready state.
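    For example, run the following command in both data centers (pod names, counts, and suffixes vary by deployment):
    kubectl get pods -n <namespace>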