Two data center warm-standby deployment strategy on VMware

An overview of the two data center disaster recovery deployment strategy in API Connect.

The two data center disaster recovery warm-standby deployment configuration (referred to as 2DCDR) provides continuous replication of the management and portal subsystem databases to a warm-standby deployment in another data center.
Important: The 2DCDR deployment adds complexity to the API Connect installation procedure and to all subsequent maintenance procedures, such as backup, restore, and upgrade. Before you decide to use the 2DCDR deployment strategy, review the Multiple data center deployment strategies topic to see whether one of the alternatives is better suited to your requirements.

Key points of the two data center disaster recovery (DR) solution

  • Two data center DR is an active/warm-standby deployment for the management and portal subsystems, and must use manual failover.
  • Two DataPower® Gateway subsystems must be deployed to provide high availability for the gateway service. However, this scenario doesn't provide high availability for the analytics service.
  • If high availability is required for the analytics service, two analytics subsystems must be configured, one per gateway subsystem, but with this configuration Developer Portal analytics isn't possible.
  • Data consistency is prioritized over data availability.
  • Latency between the two data centers must be less than 80 ms.
  • Replication of the management database is asynchronous, so if the active data center fails, the most recent updates might not have been replicated to the warm-standby data center.
  • Replication of the portal database is synchronous, which is why the latency between the two data centers must be less than 80 ms.
  • The management and the portal subsystems in the two data centers must use the same deployment profile.
  • You cannot use the Automated API behavior testing application (see Installing the Automated API behavior testing application) in a two data center disaster recovery configuration.
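The asynchronous and synchronous replication points above describe a trade-off that a toy model can make concrete. The following Python sketch is not how API Connect replicates its databases; `ToyReplicatedDB` and `latency_ok` are hypothetical, in-memory illustrations. It shows why an asynchronous write can be lost when the active data center fails, and why a synchronous write adds one inter-data-center round trip to every commit.

```python
from dataclasses import dataclass, field

MAX_RTT_MS = 80.0  # inter-data-center latency must be below this value


@dataclass
class ToyReplicatedDB:
    """Hypothetical toy model of a database replicating to a warm standby.

    Illustrative only: this is not the API Connect replication mechanism.
    """

    synchronous: bool
    rtt_ms: float  # assumed round-trip time between the data centers
    primary: list = field(default_factory=list)
    standby: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # committed but not yet shipped

    def write(self, record: str) -> float:
        """Commit a record and return the extra commit latency in ms."""
        self.primary.append(record)
        if self.synchronous:
            # The commit waits for the standby to acknowledge: one round trip.
            self.standby.append(record)
            return self.rtt_ms
        # Asynchronous: the commit returns at once and is shipped later.
        self.pending.append(record)
        return 0.0

    def ship_pending(self) -> None:
        """Background replication catches up (only matters in async mode)."""
        self.standby.extend(self.pending)
        self.pending.clear()

    def fail_active(self) -> list:
        """Simulate losing the active data center; return the lost records."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


def latency_ok(rtt_ms: float) -> bool:
    """True if a measured round-trip time meets the stated latency limit."""
    return rtt_ms < MAX_RTT_MS


# Management-style (asynchronous): the unshipped update is lost on failure.
mgmt = ToyReplicatedDB(synchronous=False, rtt_ms=40.0)
mgmt.write("api-v1")
mgmt.ship_pending()
mgmt.write("api-v2")
print(mgmt.fail_active())  # ['api-v2']

# Portal-style (synchronous): nothing is lost, but each commit pays the RTT.
portal = ToyReplicatedDB(synchronous=True, rtt_ms=40.0)
print(portal.write("page-1"), portal.fail_active())  # 40.0 []
```

In the synchronous case, every increase in `rtt_ms` is paid on every portal write, which is the intuition behind the latency limit.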

Deployment architecture

A two data center deployment model is optimized for data consistency over data availability, and it must use manual failover when a fault occurs. For high availability of the management and portal subsystems, use a three replica deployment profile. For more information about deployment profiles, see Planning your deployment topology and profiles.

To achieve high availability for the DataPower Gateway, you must deploy two gateway subsystems: one in the active data center and a separate one in the warm-standby data center. Publish all Products and APIs to both gateway subsystems. Because the gateway subsystems are independent, an issue in one of them does not affect the other. A global dynamic router can then route traffic to either gateway subsystem. If high availability is also required for the analytics service, two analytics subsystems must be configured, one per gateway subsystem, but with this configuration Developer Portal analytics isn't possible.
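Because all Products and APIs are published to both gateway subsystems, a global dynamic router only has to decide which gateway subsystem is healthy. The following Python sketch is a minimal illustration of that decision, assuming the router runs some health check per gateway subsystem; `GatewaySubsystem`, `pick_gateway`, and the example URLs are hypothetical and not part of API Connect.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class GatewaySubsystem:
    name: str      # hypothetical label, for example "dc1-gateway"
    endpoint: str  # hypothetical API invocation URL


def pick_gateway(
    gateways: Sequence[GatewaySubsystem],
    is_healthy: Callable[[GatewaySubsystem], bool],
) -> Optional[GatewaySubsystem]:
    """Return the first healthy gateway subsystem in preference order.

    Because every Product and API is published to both gateway subsystems,
    either one can serve any API call, so the router only has to consider
    health. Returns None if neither subsystem is healthy.
    """
    for gateway in gateways:
        if is_healthy(gateway):
            return gateway
    return None


dc1 = GatewaySubsystem("dc1-gateway", "https://gw.dc1.example.com")
dc2 = GatewaySubsystem("dc2-gateway", "https://gw.dc2.example.com")
unhealthy = {"dc1-gateway"}  # pretend the DC1 health check is failing
print(pick_gateway([dc1, dc2], lambda gw: gw.name not in unhealthy).name)  # dc2-gateway
```

The health check itself (what the router probes and how often) is a property of your routing device, not of this sketch.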

Dynamic routing
Note: A dynamic routing device, such as a load balancer, is required to route traffic to either data center. However, neither this device nor its configuration is part of the API Connect offering. Contact IBM Services if you require assistance with configuring a dynamic router.

The dynamic router configuration must handle the traffic between the subsystems and between the data centers. For example, consumer API calls from the portal subsystem to the management subsystem must go through a dynamic router so that the portal can use a single endpoint regardless of which data center the management subsystem is active in. The same applies to calls to the Platform and Admin APIs from the other subsystems, and to incoming UI traffic for the portal UI, Cloud Manager UI, and API Manager UI.
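The single-endpoint requirement amounts to a routing table: each external host name stays fixed, and only the backend behind it changes at failover. The following Python sketch models that table, assuming hypothetical host names and backend addresses; `DynamicRouter` is an illustrative name, not a product API, and a real router would express the same mapping in its own configuration.

```python
from typing import Dict

# Hypothetical external host names mapped to hypothetical per-DC backends.
ROUTES: Dict[str, Dict[str, str]] = {
    "consumer-api.example.com": {"dc1": "mgmt.dc1.internal:443", "dc2": "mgmt.dc2.internal:443"},
    "platform-api.example.com": {"dc1": "mgmt.dc1.internal:443", "dc2": "mgmt.dc2.internal:443"},
    "portal.example.com": {"dc1": "portal.dc1.internal:443", "dc2": "portal.dc2.internal:443"},
}


class DynamicRouter:
    """Hypothetical router: one stable host name per endpoint, two DCs behind it."""

    def __init__(self, routes: Dict[str, Dict[str, str]], active_dc: str = "dc1") -> None:
        self.routes = routes
        self.active_dc = active_dc

    def backend_for(self, hostname: str) -> str:
        """Return the backend in the active data center for a host name."""
        if hostname not in self.routes:
            raise LookupError(f"no route for {hostname!r}")
        return self.routes[hostname][self.active_dc]

    def fail_over(self, target_dc: str) -> None:
        """Manual failover: repoint every endpoint at target_dc in one step."""
        missing = [h for h, per_dc in self.routes.items() if target_dc not in per_dc]
        if missing:
            raise ValueError(f"{target_dc!r} has no backend for {missing}")
        self.active_dc = target_dc


router = DynamicRouter(ROUTES)
print(router.backend_for("consumer-api.example.com"))  # mgmt.dc1.internal:443
router.fail_over("dc2")
print(router.backend_for("consumer-api.example.com"))  # mgmt.dc2.internal:443
```

Switching every endpoint together in `fail_over` reflects that the portal, gateway, and UIs all need to reach the same active management subsystem.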

The dynamic router must support SSL passthrough so that it can route the mutual TLS (mTLS) connections between the management subsystem and the portal, and between the management subsystem and the DataPower Gateway. The router should not do TLS termination; it should do layer 4 routing based on the Server Name Indication (SNI) in the TLS handshake.
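SNI-based layer 4 routing works because the client sends the requested host name in plaintext in its TLS ClientHello (the `server_name` extension defined in RFC 6066) before any keys are negotiated, so the router can choose a backend without decrypting anything. The following Python sketch of `extract_sni` (a hypothetical helper, not part of API Connect or any router product) parses that field. It assumes the complete ClientHello has already been buffered and fits in the first TLS record, and that Encrypted Client Hello is not in use.

```python
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello record, or None.

    Only the plaintext ClientHello is read, so the router never needs the
    subsystems' certificates or keys and the mTLS session stays end to end.
    """
    try:
        # TLS record header: content type (1), version (2), length (2).
        if data[0] != 0x16:  # 0x16 = handshake record
            return None
        pos = 5
        # Handshake header: type (1) must be ClientHello (0x01), length (3).
        if data[pos] != 0x01:
            return None
        pos += 4
        pos += 2 + 32              # client_version + random
        pos += 1 + data[pos]       # session_id (1-byte length prefix)
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len      # cipher_suites (2-byte length prefix)
        pos += 1 + data[pos]       # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # server_name_list length (2), name_type (1), name length (2).
                # Only the first entry is inspected in this sketch.
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                if name_type == 0:  # host_name
                    return data[pos + 5 : pos + 5 + name_len].decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed ClientHello
    return None
```

In a passthrough proxy, you might peek at the first bytes of each accepted connection (for example with `sock.recv(4096, socket.MSG_PEEK)`), call `extract_sni`, look up the backend for that host name, and then copy bytes in both directions unchanged. Production routers implement this natively; the sketch only shows why no TLS termination is needed.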

If you do not want to use mTLS between API Connect subsystems, you can enable JWT security instead (see Enable JWT security instead of mTLS).

Service status

When a failure occurs, it is common practice to display an interstitial system outage web page. The dynamic router can be configured to display this page while the manual failover to the warm-standby data center is in progress.
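The outage page behavior can be thought of as a router state that overrides normal forwarding. The following Python sketch is a hypothetical decision function, not a feature of API Connect: while failover is in progress it answers with HTTP 503 Service Unavailable and a `Retry-After` value, which is the standard HTTP way to signal a temporary outage; otherwise it forwards to the active backend. It assumes the router can answer UI requests itself during the outage window, which for HTTPS means the router completes the TLS handshake for those UI host names; that is a design decision for your routing device.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class RouterState(Enum):
    NORMAL = "normal"
    FAILOVER_IN_PROGRESS = "failover-in-progress"


@dataclass(frozen=True)
class Proxy:
    backend: str  # forward the request to this backend unchanged


@dataclass(frozen=True)
class ServeOutagePage:
    status: int
    body: str
    retry_after_s: int  # sent to the client as an HTTP Retry-After header


# Hypothetical page content; supply your own outage page.
OUTAGE_PAGE = "<html><body><h1>Service temporarily unavailable</h1></body></html>"


def decide(
    state: RouterState, active_backend: str, retry_after_s: int = 300
) -> Union[Proxy, ServeOutagePage]:
    """Serve the outage page while failover is in progress, else proxy."""
    if state is RouterState.FAILOVER_IN_PROGRESS:
        return ServeOutagePage(503, OUTAGE_PAGE, retry_after_s)
    return Proxy(active_backend)


print(decide(RouterState.FAILOVER_IN_PROGRESS, "mgmt.dc1.internal:443").status)  # 503
```

An operator would set the state to `FAILOVER_IN_PROGRESS` when starting the manual failover and back to `NORMAL` once the warm-standby data center is active.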

Deployment profiles

Both one replica and three replica deployment profiles can be used with two data center DR, but they must be the same at each data center. For more information about the deployment profiles, see VMware deployment overview and requirements.
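Because a profile mismatch between data centers is easy to introduce and hard to spot later, a pre-installation check can help. The following Python sketch is a hypothetical helper (`check_profiles` is not an API Connect tool) that compares the recorded management and portal profiles for the two data centers; the profile labels in the example are placeholders, not real API Connect profile names.

```python
from typing import Dict, List

SUBSYSTEMS = ("management", "portal")


def check_profiles(dc1: Dict[str, str], dc2: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the profiles match."""
    problems: List[str] = []
    for subsystem in SUBSYSTEMS:
        a, b = dc1.get(subsystem), dc2.get(subsystem)
        if a is None or b is None:
            problems.append(f"{subsystem}: no profile recorded for one data center")
        elif a != b:
            problems.append(f"{subsystem}: {a!r} in DC1 but {b!r} in DC2")
    return problems


# Placeholder profile labels, not real API Connect profile names.
dc1 = {"management": "three-replica", "portal": "three-replica"}
dc2 = {"management": "three-replica", "portal": "one-replica"}
print(check_profiles(dc1, dc2))
# ["portal: 'three-replica' in DC1 but 'one-replica' in DC2"]
```

Returning a list of problems rather than stopping at the first one lets you fix every mismatch in a single pass before installation.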

For more information about a two data center DR deployment, see the following topics:
VMware