Installing a two data center deployment

Additional installation instructions explaining how to deploy a two data center disaster recovery (DR) deployment on Kubernetes and OpenShift.

Before you begin

Ensure that you understand the concepts of two data center disaster recovery in API Connect. For more information, see A two data center deployment strategy on Kubernetes and OpenShift.

You must download the operator files and custom resource definitions for use during installation onto both of your data centers. Ensure that you complete the instructions for Deploying operators.

Familiarize yourself with the instructions in Installing the API Connect subsystems. You will need to complete these instructions in conjunction with the additional steps in this topic that are specific to installing a two data center disaster recovery deployment.

About this task

Installing API Connect as a two data center disaster recovery deployment is very similar to a single installation, as the deployment in each data center is effectively an instance of the same API Connect deployment. The following endpoints must be the same on both data centers:
Management subsystem endpoints
  • cloudManagerEndpoint
  • apiManagerEndpoint
  • platformAPIEndpoint
  • consumerAPIEndpoint
Portal subsystem endpoints
  • portalAdminEndpoint
  • portalUIEndpoint
For more information about how to set the endpoints, see Installing the Management subsystem cluster on Kubernetes, Installing the Developer Portal subsystem on Kubernetes, or Deploying on OpenShift.
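
For illustration, the following is a minimal sketch, with placeholder host names, showing how the Management subsystem endpoints are defined identically in the ManagementCluster CR files for both data centers:
  # Identical in the DC1 and DC2 ManagementCluster CR files
  # (the host names are placeholders for your own domain)
  cloudManagerEndpoint:
    hosts:
    - name: admin.example.com      # must be the same on both data centers
      secretName: cm-endpoint
  apiManagerEndpoint:
    hosts:
    - name: manager.example.com    # must be the same on both data centers
      secretName: apim-endpoint
  # The same rule applies to platformAPIEndpoint and consumerAPIEndpoint, and to
  # portalAdminEndpoint and portalUIEndpoint in the PortalCluster CR files.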

The main difference is that the custom resource (CR) files for both the Management subsystem and the Developer Portal subsystem on each data center must include a multiSiteHA configuration section, and it's this section that determines the two data center DR deployment. It's also this section that's used to control the deployment in the future, for example in failover and recovery situations.

The following code blocks show examples of completed multiSiteHA sections in the ManagementCluster and PortalCluster CR files for two data centers, one active and one passive. Both the ManagementCluster and the PortalCluster CR templates have a multiSiteHA section that contains the placeholder $MULTI_SITE_HA, and it is this placeholder that must be replaced with the values for your deployment.

The configuration properties are described in the following table.
Table 1. The configuration properties of the multi-site HA section in the CR file

Configuration property: mode
Description: The state of the data center in the context of the two data center deployment. Valid values are:
  • active - indicates the primary data center
  • passive - indicates the standby data center

Configuration property: replicationEndpoint
  annotations:
    cert-manager.io/issuer: ingress-issuer
  hosts:
    - name:
      secretName:
Description: Contains details of the external ingress name for the subsystem in the current data center in the two data center deployment. The hosts name is a unique fully qualified hostname that the other data center uses to communicate with the current data center.

Configuration property: replicationPeerFQDN
Description: The ingress hostname for the other data center in the two data center deployment. This information is required so that the two data centers can communicate with each other.

Configuration property: tlsClient.secretName
Description: The TLS client secret for the subsystem.
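
Bringing these properties together, the following is a minimal annotated sketch of a completed multiSiteHA section; the host names and secret names are placeholders for your own values:
  multiSiteHA:
    # mode: active on the primary data center, passive on the standby data center
    mode: active
    # replicationEndpoint: the external ingress that the other data center uses
    # to communicate with this data center
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: replication-dc1.example.com
        secretName: dc1-replication
    # replicationPeerFQDN: the replication ingress host name of the other data center
    replicationPeerFQDN: replication-dc2.example.com
    # tlsClient.secretName: the TLS client secret for the subsystem
    tlsClient:
      secretName: dc1-replication-client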

The siteName property is a unique descriptive name for the Portal subsystem and the Management subsystem in each data center, for example dallas on data center 1 and raleigh on data center 2. You must set a different siteName on the active data center from the one set on the passive data center. This name is used in the host names, and so can contain only a-z and 0-9 characters. Note that you can set a siteName in the ManagementCluster or PortalCluster CR only when the subsystems are first deployed; you cannot change the name after deployment. If you don't set a siteName at first deployment, a siteName is generated automatically. For example, if you are converting an existing single Management cluster to a two data center Management cluster by adding the multi-site HA values, you cannot configure a siteName for the existing Management cluster.
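
For example, the following sketch, with illustrative site names, shows where siteName is set in the spec of each data center's CR:
  # DC1 CR (set at first deployment only; a-z and 0-9 characters only)
  spec:
    siteName: dallas

  # DC2 CR
  spec:
    siteName: raleigh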

To install a two data center disaster recovery deployment, you must set all of the placeholder properties in the CR file for both the Management and the Developer Portal subsystems as detailed in the relevant topics in the Installing the API Connect subsystems section, as well as the properties in the multi-site HA section that are detailed here. Some examples of complete CR files are shown in the Example section.

See the following instructions for information about how to configure a two data center DR deployment:

Procedure

  • Installing to an active data center

    The following example shows how to set the multi-site HA CR values for deploying to an active data center (DC). In this instance, we are deploying to a local data center called dallas (DC1) that has a remote data center called raleigh (DC2).

    1. Set Dallas to be active for the API Manager service on DC1.
      Edit the ManagementCluster CR file management_cr for DC1, and set the multi-site HA section properties, for example:
      siteName: dallas
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationdallas.cluster1.example.com
            secretName: dc1-mgmt-replication
        replicationPeerFQDN: mgrreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: dc1-mgmt-replication-client
    2. Set Dallas to be active for the Developer Portal service on DC1.
      Edit the PortalCluster CR file portal_cr for DC1, and set the multi-site HA section properties, for example:
      multiSiteHA:
        mode: active
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationdallas.cluster1.example.com
            secretName: dallas-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationraleigh.cluster2.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: dallas
      
    3. Create a secret called ptl-encryption-key on the active and passive data centers.
      You must create the secret on both the active and passive sites by using the same random string. Run the following command on each site (for one way to generate a compliant string, see the sketch at the end of these steps):
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.

    4. When the secrets are created on both DC1 and DC2, update the Portal CR on DC1 and DC2 to include the following in the spec object:
      encryptionSecret:
        secretName: ptl-encryption-key
      Warning: Ensure that you add the secret to the spec object and not to the multiSiteHA object.
    5. Create a secret called mgmt-encryption-key on the active and passive data centers.
      You must create the secret on both the active and passive sites by using the same random string. Run the following command on each site (the sketch at the end of these steps also covers this secret):
      Note: You must not use the same random string that you used to create the secret for ptl-encryption-key.
      kubectl create secret generic mgmt-encryption-key --from-literal=encryption_secret.bin=<RANDOM STRING> -n <namespace>

      The string can consist of uppercase letters, lowercase letters, and numbers, and must be at least 64 characters and no more than 100 characters.

    6. When the secrets are created on both DC1 and DC2, update the Management CR on DC1 and DC2 to include the following in the spec object:
      encryptionSecret:
        secretName: mgmt-encryption-key
      Warning: Ensure that you add the secret to the spec object and not to the multiSiteHA object.
    7. Apply the CR files to the subsystems in DC1.
      For example, to apply the ManagementCluster CR file to the DC1 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC1. Repeat this process to apply the updated PortalCluster CR file to DC1.
      You can verify that the updated CR files have been applied by running the following command:
      kubectl get ManagementCluster -n <namespace>
      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.
    8. Set your dynamic router to direct all traffic to DC1.
      This includes setting the four endpoints for the API Manager cluster, and the two endpoints for the Developer Portal cluster.
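
    The following is a minimal sketch, referenced in steps 3 and 5, of one way to generate compliant random strings and create both encryption-key secrets; the namespace is a placeholder, and any other method that produces a 64 to 100 character alphanumeric string is equally valid:
      # Generate two different alphanumeric strings of 72 characters
      # (any length from 64 to 100 characters is acceptable).
      PTL_KEY=$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 72)
      MGMT_KEY=$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 72)

      # Run these commands against both DC1 and DC2, switching the kubectl
      # context between data centers, so that each secret has the same value
      # on both sites.
      kubectl create secret generic ptl-encryption-key --from-literal=encryption_secret=$PTL_KEY -n <namespace>
      kubectl create secret generic mgmt-encryption-key --from-literal=encryption_secret.bin=$MGMT_KEY -n <namespace>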
  • Installing to a passive data center

    The following example shows how to set the multi-site HA CR values for the remote data center called raleigh (DC2), which is being installed as the passive, standby location. You must use the same encryption key on both sites.

    Ensure that the installation of the active data center is complete before you start the installation of the passive data center.

    1. Set Raleigh to be passive for the API Manager service on DC2.
      Edit the ManagementCluster CR file management_cr for DC2, and set the multi-site HA section properties, for example:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: mgrreplicationraleigh.cluster2.example.com
            secretName: raleigh-mgr-replication-worker-1
        replicationPeerFQDN: mgrreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: mgr-replication-client
      ...
      siteName: raleigh
      
    2. Set Raleigh to be passive for the Developer Portal service on DC2.
      Edit the PortalCluster CR file portal_cr for DC2, and set the multi-site HA section properties, for example:
      multiSiteHA:
        mode: passive
        replicationEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: ptlreplicationraleigh.cluster2.example.com
            secretName: raleigh-ptl-replication-worker-1
        replicationPeerFQDN: ptlreplicationdallas.cluster1.example.com
        tlsClient:
          secretName: ptl-replication-client
      ...
      siteName: raleigh
      
    3. Ensure that you completed the steps that are in the Installing to an active data center section to create a secret called ptl-encryption-key on DC1 and DC2, and to update the Portal CR on DC1 and DC2 to include the ptl-encryption-key in the spec object.
    4. Apply the CR files to the subsystems in DC2.
      For example, to apply the ManagementCluster CR file to the DC2 cluster, run the following command:
      kubectl apply -f management_cr.yaml -n <namespace>
      Where <namespace> is the target installation namespace in the Kubernetes cluster for DC2. Repeat this process to apply the updated PortalCluster CR file to DC2.
      You can verify that the updated CR files are applied by running the following command:
      kubectl get ManagementCluster -n <namespace>
      Change ManagementCluster to PortalCluster to verify the Developer Portal installation. For more information, see Installing the API Connect subsystems.
  • Validating your two data center deployment

    To validate that your two data center deployment is working, execute a test failover, and confirm that all of the expected data is present and correct on the newly active site. See Failure handling of a two data center deployment for details.
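
    As a quick check, you can also inspect the haMode value that is reported in the status section of the Portal CR on each data center (see the status examples later in this topic). For example, assuming the default resource name portal that is used in the examples:
      # Run on each data center; the active site reports "active" and the
      # passive site reports "passive" (or a "progressing to ..." value while
      # the pods are still starting).
      kubectl get PortalCluster portal -n <namespace> -o jsonpath='{.status.haMode}'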

Example

Example of a ManagementCluster CR file for an active data center:
apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementCluster
metadata:
  name: management
  labels: {
    app.kubernetes.io/instance: "management",
    app.kubernetes.io/managed-by: "ibm-apiconnect",
    app.kubernetes.io/name: "management"
  }
spec:
  siteName: dc1
  encryptionSecret:
    secretName: dc1-enc-key
  multiSiteHA:
    mode: active
    replicationEndpoint:
      hosts:
      - name: mgrreplicationfrankfurt.apic-2ha-frankfur-683198-xxxxxxx950db7077951d4-0000.eu-de.containers.appdomain.cloud
        secretName: dc1-mgmt-replication
    replicationPeerFQDN: mgrreplicationdallas.apic-2ha-dallas-xxxxxxx950db7077951d4-0000.us-south.containers.appdomain.cloud
    tlsClient:
      secretName: dc1-mgmt-replication-client
  version: 10.0.1.0
  imagePullSecrets:
  - apic-registry-secret
  imageRegistry: apic-dev-docker-local.artifactory.swg-devops.com
  profile: n3xc4.m16
  portal:
    admin:
      secretName: portal-admin-client
  analytics:
    client:
      secretName: analytics-client-client
    ingestion:
      secretName: analytics-ingestion-client
  gateway:
    client:
      secretName: gateway-client-client
  cloudManagerEndpoint:
    hosts:
    - name: admin.ha-demo.apic.test.appdomain.cloud
      secretName: cm-endpoint
  apiManagerEndpoint:
    hosts:
    - name: manager.ha-demo.apic.test.appdomain.cloud
      secretName: apim-endpoint
  platformAPIEndpoint:
    hosts:
    - name: api.ha-demo.apic.test.appdomain.cloud
      secretName: api-endpoint
  consumerAPIEndpoint:
    hosts:
    - name: consumer.ha-demo.apic.test.appdomain.cloud
      secretName: consumer-endpoint
  databaseVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
  microServiceSecurity: custom
  license:
    accept: true
    use: production
Example of a ManagementCluster CR file for a passive data center:
apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementCluster
metadata:
  name: management
  labels: {
    app.kubernetes.io/instance: "management",
    app.kubernetes.io/managed-by: "ibm-apiconnect",
    app.kubernetes.io/name: "management"
  }
spec:
  siteName: dc2
  encryptionSecret:
    secretName: dc1-enc-key
  multiSiteHA:
    mode: passive
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: mgrreplicationdallas.apic-2ha-dallas-xxxxxxxe950db7077951d4-0000.us-south.containers.appdomain.cloud
        secretName: dc2-mgmt-replication
    replicationPeerFQDN: mgrreplicationfrankfurt.apic-2ha-frankfur-683198-xxxxxxxe950db7077951d4-0000.eu-de.containers.appdomain.cloud
    tlsClient:
      secretName: dc2-mgmt-replication-client
  version: 10.0.1.0
  imagePullSecrets:
  - apic-registry-secret
  imageRegistry: apic-dev-docker-local.artifactory.swg-devops.com
  profile: n3xc4.m16
  portal:
    admin:
      secretName: portal-admin-client
  analytics:
    client:
      secretName: analytics-client-client
    ingestion:
      secretName: analytics-ingestion-client
  gateway:
    client:
      secretName: gateway-client-client
  cloudManagerEndpoint:
    hosts:
    - name: admin.ha-demo.apic.test.appdomain.cloud
      secretName: cm-endpoint
  apiManagerEndpoint:
    hosts:
    - name: manager.ha-demo.apic.test.appdomain.cloud
      secretName: apim-endpoint
  platformAPIEndpoint:
    hosts:
    - name: api.ha-demo.apic.test.appdomain.cloud
      secretName: api-endpoint
  consumerAPIEndpoint:
    hosts:
    - name: consumer.ha-demo.apic.test.appdomain.cloud
      secretName: consumer-endpoint
  databaseVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
  microServiceSecurity: custom
  license:
    accept: true
    use: production
Example of a PortalCluster CR file for an active data center:
apiVersion: portal.apiconnect.mycompany.com/v1
kind: PortalCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":[truncated]}
  creationTimestamp: "2020-04-09T18:25:44Z"
  generation: 10
  name: portal
  namespace: cvs
  resourceVersion: "16603004"
  selfLink: /apis/portal.apiconnect.mycompany.com/v1/namespaces/cvs/portalclusters/portal
  uid: xxxx333a-ff80-4059-b9c7-6718c1c3f393
spec:
  adminVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 20Gi
  appVersion: 10.0.0.0
  backupVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 30Gi
  databaseLogsVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 12Gi
  databaseVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 30Gi
  encryptionSecret: ptl-encryption-key
  imagePullSecrets:
  - apic-registry-secret
  imageRegistry: apic-dev-docker-local.artifactory.swg-devops.com
  license: production
  multiSiteHA:
    mode: active
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud
        secretName: ptl-replication
    tlsClient:
      secretName: ptl-replication-client
  overrides:
    microservices:
      db-remote:
        hostAliases:
        - hostnames:
          - ptl-replication.cvs.apic-2ha-frankfur-683198-xxxx14539d6e75f74e950db7077951d4-0000.eu-de.containers.appdomain.cloud
          ip: 10.123.133.xxx
      www:
        containers:
          admin:
            image: portal-admin:master-xxxxe9777692cf513bc8b366e5fa445e982bcbed-3766
          web:
            image: portal-web:master-xxxxe9777692cf513bc8b366e5fa445e982bcbed-3766
      www-remote:
        hostAliases:
        - hostnames:
          - ptl-replication.cvs.apic-2ha-frankfur-683198-xxxx14539d6e75f74e950db7077951d4-0000.eu-de.containers.appdomain.cloud
          ip: 10.123.133.xxx
  portalAdminEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer
    hosts:
    - name: api.portal.ha-demo.apic.test.appdomain.cloud
      secret: portal-admin
  portalAdminSecret: portal-admin-client
  portalUIEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer
    hosts:
    - name: portal.ha-demo.apic.test.appdomain.cloud
      secret: portal-web
  profile: Prod
  siteName: dallas
  webVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 20Gi
Example of a PortalCluster CR file for a passive data center:
apiVersion: portal.apiconnect.mycompany.com/v1
kind: PortalCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":[truncated]}
  creationTimestamp: "2020-04-09T18:28:30Z"
  generation: 9
  name: portal
  namespace: cvs
  resourceVersion: "7946061"
  selfLink: /apis/portal.apiconnect.mycompany.com/v1/namespaces/cvs/portalclusters/portal
  uid: xxxxe850-68b4-4077-ac15-746d08f802f9
spec:
  adminVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 20Gi
  appVersion: 10.0.0.0
  backupVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 30Gi
  databaseLogsVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 12Gi
  databaseVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 30Gi
  encryptionSecret: ptl-encryption-key
  imagePullSecrets:
  - apic-registry-secret
  imageRegistry: apic-dev-docker-local.artifactory.swg-devops.com
  license: production
  multiSiteHA:
    mode: passive
    replicationEndpoint:
      annotations:
        cert-manager.io/issuer: ingress-issuer
      hosts:
      - name: ptl-replication.cvs.apic-2ha-frankfur-683198-xxxx14539d6e75f74e950db7077951d4-0000.eu-de.containers.appdomain.cloud
        secretName: ptl-replication
    replicationPeerFQDN: ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud
    tlsClient:
      secretName: ptl-replication-client
  overrides:
    microservices:
      db-remote:
        hostAliases:
        - hostnames:
          - ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud
          ip: 10.208.75.xxx
      www:
        containers:
          admin:
            image: portal-admin:master-xxxxe9777692cf513bc8b366e5fa445e982bcbed-3766
          web:
            image: portal-web:master-xxxxe9777692cf513bc8b366e5fa445e982bcbed-3766
      www-remote:
        hostAliases:
        - hostnames:
          - ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud
          ip: 10.208.75.xxx
  portalAdminEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer
    hosts:
    - name: api.portal.ha-demo.apic.test.appdomain.cloud
      secret: portal-admin
  portalAdminSecret: portal-admin-client
  portalUIEndpoint:
    annotations:
      cert-manager.io/issuer: ingress-issuer 
    hosts:
    - name: portal.ha-demo.apic.test.appdomain.cloud
      secret: portal-web
  profile: Prod
  siteName: frankfurt
  webVolumeClaimTemplate:
    storageClassName: apic-mz-ibmc-block-gold-hourly
    volumeSize: 20Gi
Example of the status section of a PortalCluster CR file for an active data center:

status:
  conditions:
  - lastTransitionTime: "2020-04-27T08:00:25Z"
    message: 4/6
    status: "False"
    type: Ready
  customImages: true
  dbCASecret: portal-db-ca
  encryptionSecret: ptl-encryption-key
  endpoints:
    portal-director: https://api.portal.ha-demo.cloud/
    portal-web: https://portal.ha-demo.cloud/
    replication: https://ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud/
  localMultiSiteHAPeerConfigHash: UM9VtuWDdp/328n0oCl/GQGFzy34o1V9sH2OyXxKMHw= 
  haMode: progressing to active (ready for traffic)
  microServiceSecurity: custom
  multiSiteHA:
    dbPodScale: 3
    remoteSiteDeploymentName: portal
    remoteSiteName: frankfurt
    replicationPeer: https://ptl-replication.cvs.apic-2ha-frankfur-683198-xxxx14539d6e75f74e950db7077951d4-0000.eu-de.containers.appdomain.cloud/
    wwwPodScale: 3
  phase: 0
  serviceCASecret: portal-ca
  serviceClientSecret: portal-client
  serviceServerSecret: portal-server
  services:
    db: portal-dallas-db
    db-remote: portal-frankfurt-db
    nginx: portal-nginx
    tunnel: portal-tunnel
    www: portal-dallas-www
    www-remote: portal-frankfurt-www
  versions:
    available:
      channels:
      - name: "10"
      - name: "10.0"
      - name: 10.0.0
      - name: 10.0.0.0
      versions:
      - name: 10.0.0.0-1038
    reconciled: 10.0.0.0-1038
Meaning of the haMode for the active data center:
  • active - this is the active data center, or the only data center, and all pods are ready and accepting traffic.
  • progressing to active - this is the active data center, or the only data center, and there are no pods ready yet.
  • progressing to active (ready for traffic) - this is the active data center, or the only data center, and there is at least one pod of each type, www, db, and nginx, ready, so the data center is accepting traffic.
Example of the status section of a PortalCluster CR file for a passive data center:

status:
  conditions:
  - lastTransitionTime: "2020-04-21T10:54:54Z"
    message: 6/6
    status: "True"
    type: Ready
  customImages: true
  dbCASecret: portal-db-ca
  encryptionSecret: ptl-encryption-key
  endpoints:
    portal-director: https://api.portal.ha-demo.cloud/
    portal-web: https://portal.ha-demo.cloud/
    replication: https://ptl-replication.cvs.apic-2ha-frankfur-683198-xxxx14539d6e75f74e950db7077951d4-0000.eu-de.containers.appdomain.cloud/
  localMultiSiteHAPeerConfigHash: VguODE74TkiS3LCc5ytQiaF8100PXMHUrVBtb+PbKOg=
  haMode: progressing to passive (ready for traffic)
  microServiceSecurity: custom
  multiSiteHA:
    dbPodScale: 3
    remoteSiteDeploymentName: portal
    remoteSiteName: dallas
    replicationPeer: https://ptl-replication.cvs.apic-2ha-dallas-144307-xxxx14539d6e75f74e950db7077951d4-0000.us-south.containers.appdomain.cloud/
    wwwPodScale: 3
  phase: 0
  serviceCASecret: portal-ca
  serviceClientSecret: portal-client
  serviceServerSecret: portal-server
  services:
    db: portal-frankfurt-db
    db-remote: portal-dallas-db
    nginx: portal-nginx
    tunnel: portal-tunnel
    www: portal-frankfurt-www
    www-remote: portal-dallas-www
  versions:
    available:
      channels:
      - name: "10"
      - name: "10.0"
      - name: 10.0.0
      - name: 10.0.0.0
      versions:
      - name: 10.0.0.0-1038
    reconciled: 10.0.0.0-1038
Meaning of the haMode for the passive data center:
  • passive - this is the passive data center, and all pods are ready and accepting traffic.
  • progressing to passive - this is the passive data center, and there are no pods ready yet.
  • progressing to passive (ready for traffic) - this is the passive data center, and there is at least one pod of each type, www, db, and nginx, ready, so the data center is accepting traffic.