Maintaining a two data center deployment

How to maintain a two data center disaster recovery (DR) deployment on Kubernetes and OpenShift, including information about normal operation, failure handling, and recovery.

The following information explains how data flows to and from the two API Connect subsystems, the API Manager and the Developer Portal, when the two data center DR deployment is running in normal operational mode. The two subsystems have different internal topologies and operate slightly differently, so they are described separately.

For a general overview of how endpoints and the traffic between ingresses flow in API Connect, see Deployment overview for endpoints and certificates.

For general information about two data center disaster recovery in API Connect, see A two data center deployment strategy on Kubernetes and OpenShift.

Normal operation for the API Manager

In normal operation, each data center contains an API Manager service that includes one primary database pod and some (usually two) secondary pods. In addition to the cluster replication that occurs within each data center from the primary pod to the secondary pods, Publish/Subscribe logical replication carries changes from the active data center to the passive one, as shown in the following diagram.

Diagram of normal operation for the API Manager.

API Manager data flows

The following data flows exist to and from the API Manager.

Data traffic from the API Manager to the DataPower® gateway director to publish APIs to the gateway via webhooks, and to update the gateway with consumer subscriptions, credentials, and so on. This connection does not use mutual TLS (mTLS) by default, but mTLS can be enabled; see TLS profiles overview for more information.
Data traffic from the API Manager to the Developer Portal director endpoint. This traffic contains both REST calls to create and delete Portal sites, and also webhook traffic to send the Portal updated data when changes are made in API Manager. This connection is secured by mTLS.
Incoming data traffic to the consumer API endpoint of the API Manager, in order to carry out consumer actions such as authenticating a login, creating a new application, subscribing to a product, and so on. These calls can come from the Portal, or from direct REST calls made by a consumer or a custom application.
Incoming data traffic to the platform API endpoint of the API Manager. These calls could come from the Portal, the gateway, and possibly from custom applications.
Incoming web UI data traffic to the Cloud and Manager UI endpoints of the API Manager.
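As an aid to auditing these flows (for example, when reviewing firewall rules or TLS profiles), the list above can be captured as data. The following is a minimal sketch only: the `Flow` class, its field names, and the short endpoint labels are illustrative assumptions and are not part of any API Connect API. The `mtls` field is set to `None` wherever the text above does not state the default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    """One data flow from the list above (hypothetical model, not an API Connect type)."""
    source: str
    target: str
    purpose: str
    mtls: Optional[bool]  # True/False as stated in the text; None where the text is silent


API_MANAGER_FLOWS: List[Flow] = [
    Flow("API Manager", "Gateway director",
         "publish APIs; update subscriptions and credentials", False),
    Flow("API Manager", "Portal director",
         "create/delete Portal sites; webhook updates", True),
    Flow("Portal, consumer, or custom application", "Consumer API endpoint",
         "consumer actions", None),
    Flow("Portal, gateway, or custom application", "Platform API endpoint",
         "platform API calls", None),
    Flow("Web UI user", "Cloud and Manager UI endpoints",
         "web UI traffic", None),
]


def outbound(flows: List[Flow]) -> List[Flow]:
    """Flows that originate at the API Manager."""
    return [f for f in flows if f.source == "API Manager"]


def mtls_by_default(flows: List[Flow]) -> List[str]:
    """Targets whose connection is stated to be secured by mTLS by default."""
    return [f.target for f in flows if f.mtls is True]
```

Calling `mtls_by_default(API_MANAGER_FLOWS)` returns only the Portal director entry, which matches the statement that the gateway director connection does not use mTLS unless it is enabled.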

Normal operation for the Developer Portal

An API Connect deployment can have multiple Developer Portal services, each of which can be deployed to the primary data center, to the secondary data center, or to both. If a Developer Portal service is deployed to only one data center, it has no disaster recovery strategy. Assuming that the Portal service is deployed to both data centers, each data center contains multiple Portal pods. The ideal configuration is two or three Portal pods in each data center; with more than three, the cost of data replication outweighs any benefit in additional load handling.

The Developer Portal cluster replication has two components: a database and a filesystem. These components are independent, and both replicate in a star model whereby each pod replicates changes to every other pod. The database also has the added advantage of supporting segment replication, which can be used to optimize the replication across data centers. However, the filesystem replication, which uses CSync2, does not currently support segments.
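To illustrate why replication cost rises quickly with pod count, the following sketch counts directed replication links when every pod replicates to every other pod and no segment optimization applies (as described above for the filesystem). The function name and the returned keys are illustrative assumptions; the arithmetic is simply n × (n − 1) for n pods in total.

```python
def replication_links(pods_per_dc: int, data_centers: int = 2) -> dict:
    """Count directed pod-to-pod replication links with no segment optimization."""
    total_pods = pods_per_dc * data_centers
    # Every pod replicates to every other pod.
    all_links = total_pods * (total_pods - 1)
    # Links whose source and target are in the same data center.
    local_links = data_centers * pods_per_dc * (pods_per_dc - 1)
    return {
        "pods": total_pods,
        "links": all_links,
        "cross_dc_links": all_links - local_links,
    }


for n in (2, 3, 4):
    print(n, replication_links(n))
# 2 {'pods': 4, 'links': 12, 'cross_dc_links': 8}
# 3 {'pods': 6, 'links': 30, 'cross_dc_links': 18}
# 4 {'pods': 8, 'links': 56, 'cross_dc_links': 32}
```

Under these assumptions, going from three to four pods per data center nearly doubles the number of links (30 to 56), and most of the added links cross between data centers.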

Because all Developer Portal pods are active primary servers, writes that occur in the passive data center are not a concern. If any traffic is routed to the passive data center, the resulting changes are replicated back to the active data center, and data consistency is maintained.

The following diagram gives an overview of the normal operation for the Developer Portal.

Diagram of normal operation for the Developer Portal.

Developer Portal data flows

The following data flows exist to and from the Developer Portal.

Data traffic from the API Manager to the Developer Portal director endpoint. This traffic contains both REST calls to create and delete Portal sites, and also webhook traffic to send the Portal updated data when changes are made in API Manager. This connection is secured by mTLS.
Data traffic from the Developer Portal to the consumer API endpoint of the API Manager, in order to carry out consumer actions such as authenticating a login, creating a new application, subscribing to a product, and so on.
Data traffic from the Developer Portal to the Analytics client endpoint, in order to display the Portal consumer analytics. However, if there is more than one analytics service in use on a Catalog, then Portal consumer analytics is automatically disabled.
There is no data traffic from the Developer Portal to the DataPower Gateway, or vice versa.
Incoming web UI data traffic to the Developer Portal on the Portal UI endpoint. This data is user traffic from the browsers of Developer Portal consumers.
All calls between the API Manager and the Developer Portal, in either direction, must go through a dynamic router so that each service can fail over between the data centers independently of the other.
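The independent failover requirement in the last item can be pictured with a small model. This is a conceptual sketch only, not how any particular router or load balancer is configured: the `DynamicRouter` class, the service names, the data center labels, and the `example.com` host names are all hypothetical. The point it shows is that callers address a service by name, and the router decides which data center currently serves it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DynamicRouter:
    """Hypothetical model of a router that tracks the active data center per service."""
    endpoints: Dict[str, Dict[str, str]]  # service -> {data center -> URL}
    active: Dict[str, str]                # service -> currently active data center

    def resolve(self, service: str) -> str:
        """Return the URL of the service in whichever data center is active for it."""
        return self.endpoints[service][self.active[service]]

    def fail_over(self, service: str, to_dc: str) -> None:
        """Switch one service to another data center without affecting other services."""
        if to_dc not in self.endpoints[service]:
            raise ValueError(f"{service} is not deployed in {to_dc}")
        self.active[service] = to_dc


router = DynamicRouter(
    endpoints={
        "api-manager": {"dc1": "https://mgmt.dc1.example.com",
                        "dc2": "https://mgmt.dc2.example.com"},
        "portal": {"dc1": "https://portal.dc1.example.com",
                   "dc2": "https://portal.dc2.example.com"},
    },
    active={"api-manager": "dc1", "portal": "dc1"},
)

# Fail over only the Portal; the API Manager stays where it is.
router.fail_over("portal", "dc2")
```

After the call to `fail_over`, `router.resolve("portal")` points at the second data center while `router.resolve("api-manager")` still points at the first, so neither service needs to know where the other is currently active.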

See the following topics for information about failure handling and recovery.