High availability disaster recovery

Learn how to prevent unplanned outages. High availability disaster recovery (HADR) is supported for a hybrid deployment. It is not supported in a full cloud deployment.

HADR features

A HADR hybrid deployment is composed of cloud native components on Red Hat® OpenShift® Container Platform along with an on-premises installation that has multiple IBM® Netcool®/OMNIbus WebGUI instances.

The on-premises WebGUI or DASH servers can be set up for load balancing by using an HTTP Server that distributes the on-premises UI load. If the primary WebGUI fails, users are routed seamlessly to the backup WebGUI.

For disaster recovery, automatic and manual failover and failback between Netcool Operations Insight® deployments are supported. If the primary ObjectServer fails, the secondary ObjectServer takes over. In a HADR hybrid deployment, only cloud native analytics policies are pushed to the backup cluster through the backup or restore pods. Event and topology data is not synchronized across the Cassandra instances because the Cassandra instances do not communicate with each other.

HADR features include:
  • Supporting continuous grouping of events between two hybrid deployments.
  • Allowing more than one WebGUI instance to connect to the same hybrid deployment.
  • Supporting automatic and manual failover and failback between deployments.
  • Backing up and restoring cloud native analytics policies.

A general overview of the HADR architecture is presented in Figure 1. For more information, see Setting up high availability disaster recovery in a hybrid deployment.
Figure 1. HADR architecture on a Netcool Operations Insight hybrid deployment

On-premises WebGUI access is through the HTTP load balancer. The HTTP load balancer enables high availability by distributing the workload among the WebGUI instances.
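
To picture what the load balancer provides, the following minimal Python sketch approximates round-robin distribution with failover to reachable instances. It is a conceptual illustration only, not the HTTP load balancer's actual implementation, and the WebGUI URLs are placeholder values.

    # Conceptual sketch only: round-robin distribution with failover to
    # reachable WebGUI instances. The URLs are placeholders, and this is
    # not the HTTP load balancer's actual implementation.
    import itertools
    import urllib.error
    import urllib.request

    WEBGUI_INSTANCES = [
        "https://webgui-primary.example.com:16311",  # placeholder
        "https://webgui-backup.example.com:16311",   # placeholder
    ]

    _rotation = itertools.cycle(WEBGUI_INSTANCES)

    def is_reachable(url, timeout=5):
        """Return True if the instance answers; only connection failures count as down."""
        try:
            urllib.request.urlopen(url, timeout=timeout)
            return True
        except urllib.error.HTTPError:
            return True   # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            return False  # the instance is unreachable

    def next_instance():
        """Return the next reachable WebGUI instance in round-robin order."""
        for _ in range(len(WEBGUI_INSTANCES)):
            candidate = next(_rotation)
            if is_reachable(candidate):
                return candidate
        raise RuntimeError("No WebGUI instance is reachable")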

DASH is set up to use single sign-on (SSO) with the ObjectServer as the repository to store the OAuth tokens. Public-private key pairs on each DASH instance confirm the validity of the LTPA tokens.

ObjectServer traffic flows between the on-premises aggregation ObjectServer and the WebGUI instances. The traffic includes UI configuration metadata, authentication, and event data.

The console integration with the on-premises HTTP load balancer is updated by the active deployment. At any one time, either the primary or the backup cloud deployment updates the console integration.

Certificate authority (CA) signed certificates allow communication between the WebGUI instances. These CA-signed certificates are loaded into the HTTP load balancer and are also added to the user-certificates configmap. The common UI services load the CA-signed certificates from the configmap for the cluster connection to the HTTP load balancer.
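
As a rough illustration of this certificate flow, the following Python sketch shows a client that trusts the HTTP load balancer by loading a CA certificate, as it might be mounted from the user-certificates configmap. The mount path and load balancer URL are assumptions for the example only.

    # Conceptual sketch only: trust the HTTP load balancer by loading a CA
    # certificate, as it might be mounted from the user-certificates
    # configmap. The mount path and URL below are placeholder values.
    import ssl
    import urllib.request

    CA_BUNDLE = "/etc/user-certificates/loadbalancer-ca.crt"    # placeholder mount path
    LOAD_BALANCER_URL = "https://webgui-lb.example.com:16311/"   # placeholder URL

    # Build a TLS context that trusts only the CA that signed the load
    # balancer's certificate.
    tls_context = ssl.create_default_context(cafile=CA_BUNDLE)

    with urllib.request.urlopen(LOAD_BALANCER_URL, context=tls_context) as response:
        print(response.status, response.reason)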

The HAProxy directs users to the currently active deployment. The cloud NOI UI components query the HAProxy to determine the OAuth token for the associated WebGUI instance.

The coordinator service in the backup deployment tries to connect to the coordinator service in the primary deployment through the HAProxy to determine the state of the primary deployment. If the primary coordinator service is not reachable, the backup coordinator service initiates failover.
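
The failover decision can be pictured with the following minimal Python sketch, which polls the primary coordinator through the HAProxy and fails over after repeated failures. The health URL, polling interval, and failure threshold are assumptions for illustration; this is not the coordinator service's actual API.

    # Conceptual sketch only: approximates the failover decision described
    # above. The health URL, polling interval, and failure threshold are
    # assumptions, not the coordinator service's actual API or settings.
    import time
    import urllib.error
    import urllib.request

    PRIMARY_HEALTH_URL = "https://haproxy.example.com/coordinator/health"  # placeholder
    POLL_INTERVAL_SECONDS = 30
    FAILURE_THRESHOLD = 3  # consecutive failed checks before failing over

    def primary_is_reachable():
        """Check the primary coordinator service through the HAProxy."""
        try:
            with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=10) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def initiate_failover():
        """Hypothetical hook: promote the backup deployment to active."""
        print("Primary deployment unreachable; backup deployment takes over.")

    def monitor_primary():
        """Poll the primary coordinator and fail over after repeated failures."""
        failures = 0
        while True:
            failures = 0 if primary_is_reachable() else failures + 1
            if failures >= FAILURE_THRESHOLD:
                initiate_failover()
                return
            time.sleep(POLL_INTERVAL_SECONDS)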