High availability overview
Cloud Pak System provides a high availability framework that eliminates single points of failure and provides peer-to-peer failover for multiple Platform System Managers.
High availability for management nodes
High availability for the management nodes is a preconfigured feature that relies on a primary election model. Each management node that can be a primary is a candidate to become the primary management node. A management node's candidacy depends on whether its Platform System Manager virtual machine is eligible for election. There should be only one primary management node at a time. By default, the primary management node is the node that is powered on first. After a primary management node is established, the primary and secondary management nodes communicate with each other continuously to ensure that one of them holds the primary role. If a primary management node cannot be detected, one of the nodes assumes the primary role, as determined by the Platform System Manager election model. When a management node becomes the primary, it is responsible for managing the primary Platform System Manager virtual machine.
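The election behavior described above can be illustrated with a small sketch. This is not the product's implementation. It is a minimal model that assumes two rules taken from the text: the node that is powered on first becomes the primary by default, and another eligible node takes over when the primary cannot be detected. The names `ManagementNode`, `elect_primary`, `eligible`, and `reachable` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagementNode:
    # All fields are hypothetical and used only for this illustration.
    name: str
    powered_on_at: Optional[float]  # None means the node is powered off
    eligible: bool = True           # its Platform System Manager VM can be elected
    reachable: bool = True          # the peer node can still detect this node


def elect_primary(nodes: List[ManagementNode],
                  current: Optional[ManagementNode] = None) -> Optional[ManagementNode]:
    """Return the node that should hold the primary role, or None."""
    # Keep the current primary for as long as it is powered on and detectable.
    if current is not None and current.reachable and current.powered_on_at is not None:
        return current
    # Otherwise, consider every powered-on, detectable, eligible node.
    candidates = [
        n for n in nodes
        if n.eligible and n.reachable and n.powered_on_at is not None
    ]
    if not candidates:
        return None
    # Default rule from the text: the node that was powered on first wins.
    return min(candidates, key=lambda n: n.powered_on_at)


node_a = ManagementNode("node-a", powered_on_at=100.0)
node_b = ManagementNode("node-b", powered_on_at=105.0)

primary = elect_primary([node_a, node_b])
print(primary.name)  # node-a, because it was powered on first

node_a.reachable = False  # simulate a primary that can no longer be detected
primary = elect_primary([node_a, node_b], current=primary)
print(primary.name)  # node-b assumes the primary role
```

In this sketch, the established primary keeps its role until it can no longer be detected, which keeps the role from switching between nodes unnecessarily. How the actual system detects a missing primary and breaks ties is determined by the Platform System Manager election model and is not modeled here.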
Management node availability
- Redundant hardware, such as networking, storage, compute, and power supplies.
- No single points of failure for cloud groups with active high availability containing two management nodes.
- Virtual machine instances remain available during system maintenance updates or hardware failures, leveraging reserved capacity and mobility actions within the system.
- Additional capacity can be added and used with no service interruption.