Outage prevention
Learn about the solutions that are available to prevent planned and unplanned outages.
Comparison of solutions
Capability | Full cloud stack | Hybrid |
---|---|---|
High Availability Disaster Recovery (HADR) | No | Yes. See Setting up high availability disaster recovery in a hybrid deployment. |
Geo-redundancy | Yes, as a technology preview in a full cloud deployment. See Full cloud geo-redundant deployments. Note: Geo-redundancy is deprecated in version 1.6.13. | Yes, as a technology preview in a hybrid cloud deployment. See Hybrid geo-redundant deployments. |
Geo-redundancy
In a geo-redundant deployment, the primary deployment is located on one Red Hat® OpenShift® Container Platform cluster and the secondary deployment is located on a different cluster. The individual Cassandra data centers in each deployment are replicated to synchronize the event and topology data across the clusters. Geo-redundancy is available only as a technology preview in a full or hybrid cloud deployment. For more information, see Full cloud geo-redundant deployments and Hybrid geo-redundant deployments.
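For illustration, the following sketch shows how cross-data-center replication of this kind is typically declared in Cassandra: a keyspace that uses NetworkTopologyStrategy with one replication entry per data center. The contact point, keyspace name, and data center names here are assumptions for the sketch, not values from the product.

```python
# Illustrative sketch only: a keyspace replicated across two Cassandra data
# centers, the mechanism by which event and topology data can stay in sync
# between clusters. All names below are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.primary.example.com"])  # hypothetical contact point
session = cluster.connect()

# Each data center keeps full replicas (replication factor 3 per DC here),
# so either site can serve reads if the other becomes unreachable.
session.execute(
    """
    CREATE KEYSPACE IF NOT EXISTS noi_events
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_primary': 3,
        'dc_secondary': 3
    }
    """
)
```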
High availability disaster recovery
A high availability disaster recovery (HADR) hybrid deployment is composed of cloud native components on Red Hat OpenShift Container Platform along with an on-premises installation that has multiple IBM® Netcool®/OMNIbus WebGUI instances.
The on-premises WebGUI or DASH servers can be set up for load balancing by using an HTTP Server that balances the on-premises UI load. If the primary WebGUI fails, users are routed seamlessly to the backup WebGUI.
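The following minimal sketch illustrates that failover behavior: requests go to the primary WebGUI first and fall back to the backup if the primary does not respond. In a real deployment the HTTP Server performs this routing, not application code, and the host names are hypothetical.

```python
# Minimal sketch of failover-aware routing across WebGUI instances.
import requests

WEBGUI_INSTANCES = [
    "https://webgui-primary.example.com",  # hypothetical primary
    "https://webgui-backup.example.com",   # hypothetical backup
]

def route_request(path: str) -> requests.Response:
    last_error = None
    for base_url in WEBGUI_INSTANCES:
        try:
            # A short timeout keeps failover quick when an instance is down.
            response = requests.get(base_url + path, timeout=5)
            if response.ok:
                return response
        except requests.RequestException as err:
            last_error = err  # remember the failure and try the next instance
    raise RuntimeError("No WebGUI instance is reachable") from last_error
```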
For disaster recovery, automatic and manual failover and failback between Netcool Operations Insight® deployments are supported. If the primary ObjectServer fails, the secondary ObjectServer takes over; a sketch of this failover-pair behavior follows the list below. In a HADR hybrid deployment, only cloud native analytics policies are pushed to the backup cluster, through the backup or restore pods. No event or topology data is synchronized across the Cassandra instances, because the Cassandra instances do not communicate with each other. A HADR hybrid deployment provides the following capabilities:
- Continuous grouping of events between two hybrid deployments.
- Connection of more than one WebGUI instance to the same hybrid deployment.
- Automatic and manual failover and failback between deployments.
- Backup and restore of cloud native analytics policies.
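The following sketch illustrates the failover-pair connection logic mentioned above: a client tries the primary aggregation ObjectServer and falls back to the backup member when the primary is unreachable. The host names and port are assumptions; real Netcool/OMNIbus clients resolve the failover pair from their server definitions rather than hard-coded addresses.

```python
# Sketch of failover-pair connection logic; hosts and port are hypothetical.
import socket

FAILOVER_PAIR = [("agg-primary.example.com", 4100),
                 ("agg-backup.example.com", 4100)]

def connect_objectserver() -> socket.socket:
    """Return a connection to the first reachable member of the pair."""
    for host, port in FAILOVER_PAIR:
        try:
            return socket.create_connection((host, port), timeout=5)
        except OSError:
            continue  # this member is down; try the next one
    raise ConnectionError("Neither member of the failover pair is reachable")
```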
On-premises WebGUI access is through the HTTP load balancer. The HTTP load balancer enables high availability by distributing the workload among the WebGUI instances.
DASH is set up to use single sign-on (SSO) with the ObjectServer as the repository to store the OAuth tokens. Public-private key pairs on each DASH instance confirm the validity of the LTPA tokens.
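As an illustration of that validation step, the sketch below verifies a signed token with a public key, which is the principle behind the LTPA token check. It does not implement the actual LTPA token format; an RSA key, the token layout, and the helper name are assumptions.

```python
# Illustrative public-key signature check, analogous in principle to how a
# DASH instance confirms that a token was issued by a trusted peer.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def token_is_valid(payload: bytes, signature: bytes, public_key_pem: bytes) -> bool:
    public_key = serialization.load_pem_public_key(public_key_pem)
    try:
        # The issuer signed the payload with its private key; any peer that
        # holds the matching public key can confirm it was not forged.
        public_key.verify(signature, payload, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False
```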
ObjectServer traffic flows between the on-premises aggregation ObjectServer and the WebGUI instances. The traffic includes UI configuration metadata, authentication, and event data.
The console integration with the on-premises HTTP load balancer is updated by the active deployment. At any one time, either the primary or the backup cloud deployment updates the console integration.
Certificate authority (CA) signed certificates allow communication between the WebGUI instances. These CA signed certificates are loaded into the HTTP load balancer and are also added to the user-certificates configmap. The common UI services load the CA signed certificates from the configmap for the cluster connection to the HTTP load balancer.
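A minimal sketch of the client side of this arrangement, assuming the configmap is mounted into the pod at a hypothetical path: the service reads the CA bundle from that path and uses it to verify the TLS connection to the HTTP load balancer. The mount path and URL are assumptions.

```python
# Sketch: trust the CA certificates from the mounted user-certificates
# configmap when connecting to the HTTP load balancer over TLS.
import requests

CA_BUNDLE = "/etc/user-certificates/ca.crt"  # assumed configmap mount path

response = requests.get(
    "https://http-load-balancer.example.com/ibm/console",  # hypothetical URL
    verify=CA_BUNDLE,  # accept only certificates signed by the loaded CA
    timeout=10,
)
response.raise_for_status()
```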
The HAProxy directs users to the currently active deployment. The cloud NOI UI components query the HAProxy to determine the OAuth token for the associated WebGUI instance.
The coordinator service in the backup deployment tries to connect to the coordinator service in the primary deployment through the HAProxy to determine the state of the primary deployment. If the primary coordinator service is not reachable, the backup coordinator service initiates the failover.
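The following sketch illustrates the health check and failover decision described above: the backup coordinator polls the primary coordinator through the HAProxy and starts a failover after repeated failures. The health endpoint, poll interval, and failure threshold are assumptions for illustration only.

```python
# Sketch of the backup coordinator's health-check loop; all values hypothetical.
import time
import requests

PRIMARY_HEALTH_URL = "https://haproxy.example.com/coordinator/health"
FAILURE_THRESHOLD = 3   # consecutive failures before failing over
POLL_INTERVAL_SECS = 30

def trigger_failover() -> None:
    # Placeholder for the backup deployment's promotion logic.
    print("Primary coordinator unreachable: backup deployment taking over")

def monitor_primary() -> None:
    failures = 0
    while True:
        try:
            requests.get(PRIMARY_HEALTH_URL, timeout=5).raise_for_status()
            failures = 0  # primary is healthy; reset the counter
        except requests.RequestException:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                trigger_failover()
                return
        time.sleep(POLL_INTERVAL_SECS)
```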