Converting a single data center to a two data center deployment on Kubernetes

Instructions on how to convert a single data center to a two data center disaster recovery (2DCDR) deployment on Kubernetes.

Before you begin

Ensure that you understand the concepts of 2DCDR in API Connect. For more information, see Two data center warm-standby deployment on Kubernetes and OpenShift.

Download the operator files and custom resource definitions for use during installation onto both of your data centers. Ensure that you complete the instructions for Deploying operators and cert-manager, and set the same ingress-ca certificate in both data centers: Installing cert-manager and certificates in a two data center deployment.

Familiarize yourself with the instructions in Installing the API Connect subsystems. Follow these instructions along with the 2DCDR specific steps in this topic.

Important: Each data center must use a different backup path for your Management backups. For portal backups, each data center must use the same backup path. For more information, see Backup and restore requirements for a 2DCDR deployment.
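A minimal sketch of the relevant backup properties, assuming the standard databaseBackup (ManagementCluster) and portalBackup (PortalCluster) sections and placeholder values; your backup server, protocol, and paths will differ:

  # ManagementCluster CR: each data center uses a different path
  databaseBackup:
    protocol: sftp
    host: backup.example.com
    path: /backups/management/dc1   # DC2 would use, for example, /backups/management/dc2

  # PortalCluster CR: both data centers use the same path
  portalBackup:
    protocol: sftp
    host: backup.example.com
    path: /backups/portal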

About this task

You can convert a single data center API Connect deployment to a 2DCDR deployment. Both data centers must have the same portal-admin and portal-www endpoints. The main difference is that the custom resource (CR) files for both the Management subsystem and the Developer Portal subsystem in each data center must include a multiSiteHA configuration section. This section determines the two data center DR deployment, and it is also used to control the deployment in the future, for example in failover and recovery situations.

The following codeblocks show examples of completed multiSiteHA sections in the PortalCluster CR files for two data centers, one active and one warm-standby. Both the ManagementCluster and the PortalCluster CR templates have a multiSiteHA section containing the placeholder $MULTI_SITE_HA, and it is this placeholder that must be replaced with the values for your deployment.
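For example, for the deployment that is used in the Procedure section of this topic, with a local data center called dallas (DC1) and a remote data center called raleigh (DC2), the completed multiSiteHA section of the PortalCluster CR for the active data center looks like this:

  multiSiteHA:
    mode: active
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: ptlreplicationdallas.cluster1.example.com
        secretName: dallas-ptl-replication-worker-1
    replicationPeerFQDN: ptlreplicationraleigh.cluster2.example.com
    tlsClient:
      secretName: ptl-replication-client

The completed section for the warm-standby data center looks like this:

  multiSiteHA:
    mode: passive
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: ptlreplicationraleigh.cluster2.example.com
        secretName: raleigh-ptl-replication-worker-1
    replicationPeerFQDN: ptlreplicationdallas.cluster1.example.com
    tlsClient:
      secretName: ptl-replication-client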

The configuration properties are described in the following table.
Table 1. The configuration properties of the multi-site HA section in the CR file
Configuration property Description
mode
  The state of the data center in the context of the two data center deployment. Valid values are:
  • active - indicates the primary data center
  • passive - indicates the warm-standby data center
replicationEndpoint
  annotations:
    cert-manager.io/issuer
  hosts:
    - name:
      secretName:
  Contains details of the external ingress name for the subsystem in the current data center in the two data center deployment. The hosts name is a unique fully qualified hostname that the other data center uses to communicate with the current data center.
replicationPeerFQDN
  The ingress hostname for the other data center in the two data center deployment. This information is required so that the two data centers can communicate with each other.
tlsClient:
  secretName
  The TLS client secret for the subsystem.
siteName
  A descriptive name for the Portal subsystem and the Management subsystem in the current data center, for example dallas. This name is used in the hostnames, and so can contain only a-z and 0-9 characters. You can set a siteName in the ManagementCluster or PortalCluster CR only when the subsystems are first deployed; you cannot change the name after deployment. If you don't set a siteName at first deployment, an automated siteName is created for you. For example, if you are converting an existing single Management cluster to a two data center Management cluster by adding the multi-site HA values, you cannot configure a siteName for the existing Management cluster.

To install a two data center disaster recovery deployment, you must set all of the placeholder properties in the CR file for both the Management and the Developer Portal subsystems as detailed in the relevant topics in the Installing the API Connect subsystems section, as well as the properties in the multi-site HA section that are detailed here. Some examples of complete CR files are shown in the Example section.

Procedure

  • Converting a single data center deployment to two data centers
    When converting a single data center deployment into a two data center deployment, it is best practice to use the existing data center as the active one, and add a new warm-standby data center elsewhere. This practice ensures that the data is retained.
    Remember:
    • Both data centers must have the same portal-admin and portal-www endpoints.
    • You cannot configure a siteName for the existing Management or Portal cluster because the siteName property can be configured only at first deployment.

    The following example shows how to update the multi-site HA CR values for making the local data center called dallas (DC1) active, and for setting the new remote data center called raleigh (DC2) as warm-standby.

  • Install cert-manager.
    Note: Run these steps on DC1 only if you don't already have cert-manager running.
    1. Obtain cert-manager.

      API Connect v10 uses cert-manager v1.12.13, which is a native Kubernetes certificate management controller.

      You can obtain cert-manager v1.12.13 from the API Connect v10 distribution helper_files.zip archive, or from https://github.com/cert-manager/cert-manager/releases/tag/v1.12.13.
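      For example, to download the manifest from GitHub (cert-manager.yaml is the standard release asset name; here it is saved with the file name that is applied in the next step):
      curl -Lo cert-manager-1.12.13.yaml https://github.com/cert-manager/cert-manager/releases/download/v1.12.13/cert-manager.yaml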

    2. Apply the cert-manager YAML file:
      kubectl apply -f cert-manager-1.12.13.yaml

      Do not specify a custom namespace.

      For more information, see https://cert-manager.io/docs/installation/.

    3. Wait for cert-manager pods to enter Running 1/1 status before proceeding. To check the status:
      kubectl get po -n cert-manager 
      There are 3 cert-manager pods in total.
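      Output similar to the following indicates that all three pods are ready (the exact pod name suffixes in your cluster will differ):
      NAME                                       READY   STATUS    RESTARTS   AGE
      cert-manager-7dd5854bb4-85cpv              1/1     Running   0          3m
      cert-manager-cainjector-64c949654c-9nm2z   1/1     Running   0          3m
      cert-manager-webhook-6b57b9b886-fzjj5      1/1     Running   0          3m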
  • Use the following steps to ensure that the ingress-ca secret is the same on both data centers.
    1. On DC1 apply the file ingress-issuer-v1-dc1.yaml:
      kubectl -n <namespace> apply -f ingress-issuer-v1-dc1.yaml
    2. Validate that the command succeeded:
      kubectl get certificates -n <namespace>
    3. Export ingress-ca secret as a yaml from DC1:
      kubectl -n <namespace> get secret ingress-ca -o yaml > ingress-ca.yaml
    4. Edit the ingress-ca.yaml file to remove all labels, annotations, creationTimestamp, resourceVersion, uid, and selfLink.
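      After editing, the file contains only the essential fields. A sketch of the expected shape (the certificate and key data in your file will differ, and your export might also retain the namespace field):
      apiVersion: v1
      kind: Secret
      type: kubernetes.io/tls
      metadata:
        name: ingress-ca
      data:
        ca.crt: <base64-encoded CA certificate>
        tls.crt: <base64-encoded certificate>
        tls.key: <base64-encoded private key>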
    5. Copy the ingress-ca.yaml from DC1 to DC2 and apply that file on DC2:
      kubectl -n <namespace> apply -f ingress-ca.yaml
      
    6. On DC2 apply the file ingress-issuer-v1-dc2.yaml:
      kubectl -n <namespace> apply -f ingress-issuer-v1-dc2.yaml
      
    7. Use the following commands to confirm that the ingress-ca secrets are the same in both data centers. On DC1, run:
      kubectl -n <namespace> get secrets ingress-ca -o yaml | grep tls.crt | awk '{print $2}' | base64 -d > /tmp/ingress.pem.dc1
    8. On DC2 run:
      kubectl -n <namespace> get secrets ingress-ca -o yaml | grep tls.crt | awk '{print $2}' | base64 -d > /tmp/ingress.pem.dc2
    9. To see the differences run:
      diff /tmp/ingress.pem.dc1 /tmp/ingress.pem.dc2 
      The files should be the same.
    10. To ensure that the certificates are working correctly and that they are using the ingress-ca secret, first get the portal-admin-client certificate file:
      kubectl -n <namespace> get secrets portal-admin-client -o yaml | grep tls.crt | awk '{print $2}' | base64 -d > /tmp/admin-client.crt
    11. Test that it is working by using OpenSSL:
      openssl verify -verbose -CAfile /tmp/ingress.pem.dc1 /tmp/admin-client.crt
      If it is working, you should see:
      /tmp/admin-client.crt: OK
  • Installing to an active data center

    The following example shows how to set the multi-site HA CR values for deploying to an active data center (DC). In this instance, we are deploying to a local data center called dallas (DC1) that has a remote data center called raleigh (DC2).

    1. Set Dallas to be active for the API Manager service on DC1.
      As the API Manager service already exists, you cannot set the siteName property. Add the multi-site HA section properties to the existing ManagementCluster CR file for DC1. For example:
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationdallas.cluster1.example.com
            secretName: dc1-mgmt-replication
        replicationPeerFQDN: mgrreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: dc1-mgmt-replication-client
    2. Set Dallas to be active for the Developer Portal service on DC1.
      As the Developer Portal service already exists, you cannot set the siteName property. Add the multi-site HA section properties to the existing PortalCluster CR file for DC1. For example:
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationdallas.cluster1.example.com
            secretName: dallas-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: ptl-replication-client
      
    3. Create a secret called ptl-encryption-key on the active and warm-standby data centers.
      You must create a secret on both the active and warm-standby sites that uses the same random string. Run the following command:
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.
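      One way to generate a suitable string and create the secret (a sketch; any method that produces 64 to 100 alphanumeric characters works):
      RANDOM_STRING=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64)
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=$RANDOM_STRING -n <namespace>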

    4. When the secrets are created on both DC1 and DC2, update the Portal CR on DC1 and DC2 to include in the spec object:
       encryptionSecret:
          secretName: ptl-encryption-key
      Warning: Ensure that you add the secret to the spec object and not to the multiSiteHA object.
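      For example, the relevant part of the PortalCluster spec then looks like this (other spec properties are omitted):
      spec:
        encryptionSecret:
          secretName: ptl-encryption-key
        multiSiteHA:
          ...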
    5. Apply the CR files to the subsystems in DC1.
      For example, to apply the ManagementCluster CR file to the DC1 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC1. Repeat this process to apply the updated PortalCluster CR file to DC1.
      You can verify that the updated CR files have been applied, and monitor the progress of the installation, by running the following command:
      kubectl get ManagementCluster -n <namespace>
      Note: Until the warm-standby is installed and running, the haStatus reports PeerNotReachable.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.

    6. Set your dynamic router to direct all traffic to DC1.
      This includes setting the four endpoints for the API Manager cluster, and the two endpoints for the Developer Portal cluster.
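      To see the endpoint hosts that the router must direct to DC1, you can list the ingresses in the installation namespace (the exact ingress names depend on your deployment):
      kubectl get ingress -n <namespace>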
  • Installing to a warm-standby data center

    The following example shows how to set the multi-site HA CR values for the remote data center called raleigh (DC2), which is being installed as the warm-standby location. You must use the same encryption key on both sites.

    1. Get the encryption key from the active Dallas (DC1) site:
      1. Run the command kubectl get mgmt -o yaml | grep enc to get the database encryption key on the active site. For example:
        $ kubectl -n <namespace> get mgmt -o yaml | grep enc
        An example of possible output:
        encryptionSecret: dallas-enc-key
        Make a note of the secret name. In this case, it's dallas-enc-key.
      2. Run kubectl get secret dallas-enc-key -o yaml. An example of possible output:
        apiVersion: v1
        data:
          encryption_secret.bin: VKBNFj7sAOizxvE1H6i+9P31oJHvWWsO+x***********************************EGe/K+x6b3D7FEWGgoyGlWBUJKB4+T21My2iR5rBTovpyLiY5g********************************************tSiRcQKegMPNBPgL829SVBCxuv3I=
        kind: Secret
        metadata:
          creationTimestamp: "2020-08-28T12:53:41Z"
          labels:
            app.kubernetes.io/instance: m1
            app.kubernetes.io/managed-by: ibm-apiconnect
            app.kubernetes.io/name: dallas-enc-key
          name: dallas-enc-key
          namespace: default
          resourceVersion: "43039"
          selfLink: /api/v1/namespaces/default/secrets/m1-enc-key
          uid: 46c92395-9cc2-4421-b2c4-48e472c0cbb1
        type: Opaque
      3. Copy the output into a new file. Delete the contents of metadata: and add back a name: attribute, in this case dc1-enc-key. An example of the format is shown here:
        apiVersion: v1
        data:
          encryption_secret.bin: VKBNFj7sAOizxvE1H6i+9P31oJHvWWsO+x***********************************EGe/K+x6b3D7FEWGgoyGlWBUJKB4+T21My2iR5rBTovpyLiY5g********************************************tSiRcQKegMPNBPgL829SVBCxuv3I= 
        kind: Secret
        metadata:
          name: dc1-enc-key
        type: Opaque
      4. Save the file. In this example, it's called dc1-enc-key.yaml.
    2. On the Raleigh (DC2) site:
      1. Copy the YAML file that you just saved, which contains the encryption_secret.bin from the active Dallas (DC1) site.
      2. Run kubectl create -f dc1-enc-key.yaml to create the same secret with the same key on the Raleigh (DC2) site. For example:
        $ kubectl -n <namespace> create -f /tmp/dc1-enc-key.yaml
        An example of possible output:
        secret/dc1-enc-key created
      3. In the ManagementCluster CR for the Raleigh (DC2) site, add the following entry to the spec:
        encryptionSecret:
            secretName: dc1-enc-key
    3. Set Raleigh to be warm-standby for the API Manager service on DC2.
      Edit the ManagementCluster CR file management_cr for DC2, and set the multi-site HA section properties, for example:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationraleigh.cluster2.example.com
            secretName: raleigh-mgr-replication-worker-1
        replicationPeerFQDN: mgrreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: mgr-replication-client
      ...
      siteName: raleigh
      
    4. Set Raleigh to be warm-standby for the Developer Portal service on DC2.
      Edit the PortalCluster CR file portal_cr for DC2, and set the multi-site HA section properties, for example:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationraleigh.cluster2.example.com
            secretName: raleigh-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: raleigh
      
    5. Ensure that you completed the steps that are in the Installing to an active data center section to create a secret called ptl-encryption-key on DC1 and DC2, and to update the Portal CR on DC1 and DC2 to include the ptl-encryption-key in the spec object.
    6. Apply the CR files to the subsystems in DC2.
      For example, to apply the ManagementCluster CR file to the DC2 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC2. Repeat this process to apply the updated PortalCluster CR file to DC2.
      You can verify that the updated CR files are applied, and monitor the progress of the installation, by running the following command:
      kubectl get ManagementCluster -n <namespace>
      When installation is complete, haStatus reports Ready.

      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.
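      For example, one way to check the reported haStatus on each site (a sketch; mgmt and ptl are the short names for the ManagementCluster and PortalCluster resources):
      kubectl get mgmt -n <namespace> -o yaml | grep haStatus
      kubectl get ptl -n <namespace> -o yaml | grep haStatus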