High availability for a Db2 Warehouse MPP deployment
For an MPP deployment, Db2® Warehouse provides high availability (HA), so that your data warehouse can continue operating if failures occur.
The HA solution is based on a heartbeat mechanism, automatic restart of services, and node failover. The heartbeat detects when a node, a database partition, or the web console is down, and the cluster manager takes the appropriate action. For example, the cluster manager attempts to restart any failed data partitions or the web console. Figure 1 shows a Db2 Warehouse HA group in a healthy state. The file system is not part of the HA group, so use whichever HA solution is appropriate for the file system technology that you are using. Similarly, you can use a method such as a load balancer to make head node failures transparent to connected applications.
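The detect-then-restart behavior described above can be modeled as a small simulation. This is a conceptual sketch only, not Db2 Warehouse's implementation: the `HeartbeatMonitor` class, the `handle_failures` function, the component names, and the 30-second interval are all hypothetical. It assumes that a component counts as down when its last heartbeat is older than the interval, and that the cluster manager first tries a restart before treating the component as failed.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of heartbeat tracking; not Db2 Warehouse code."""

    interval: float  # seconds a component may stay silent before it counts as down
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, component: str, now: float | None = None) -> None:
        # Record a heartbeat from a node, a partition, or the web console.
        self.last_seen[component] = time.monotonic() if now is None else now

    def down_components(self, now: float | None = None) -> list[str]:
        # A component is down if its last heartbeat is older than the interval.
        now = time.monotonic() if now is None else now
        return sorted(c for c, t in self.last_seen.items() if now - t > self.interval)


def handle_failures(
    monitor: HeartbeatMonitor,
    restart: Callable[[str], bool],
    now: float | None = None,
) -> list[str]:
    """Try to restart each down component; return the ones that stayed down."""
    still_down = []
    for component in monitor.down_components(now):
        if restart(component):
            monitor.beat(component, now)  # a successful restart resumes heartbeats
        else:
            still_down.append(component)  # these would proceed to failover
    return still_down


if __name__ == "__main__":
    m = HeartbeatMonitor(interval=30.0)
    for c in ("node1", "node2", "webconsole"):
        m.beat(c, now=0.0)
    m.beat("node1", now=40.0)
    m.beat("webconsole", now=40.0)
    print(m.down_components(now=45.0))  # ['node2']
    print(handle_failures(m, restart=lambda c: False, now=45.0))  # ['node2']
```

Passing `now` explicitly keeps the sketch deterministic for testing; in a live monitor you would omit it and let `time.monotonic()` supply the clock.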
If a data node fails and does not restart within the heartbeat interval, all services on that node are stopped. The data partitions that are assigned to that node, along with their workload, are automatically redistributed across the surviving nodes in the cluster. Failed nodes are not reintegrated automatically; you must perform some manual steps to have a failed node rejoin the cluster.
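To make the redistribution step concrete, the following sketch moves a failed node's data partitions onto the surviving nodes. The placement policy here (least-loaded survivor first, ties broken by node name) is an assumption chosen for illustration; the text above does not describe how Db2 Warehouse actually decides where each partition goes. The `redistribute` function and the node and partition identifiers are hypothetical.

```python
def redistribute(assignment: dict[str, list[int]], failed: str) -> dict[str, list[int]]:
    """Reassign a failed node's partitions to the surviving nodes (hypothetical policy)."""
    # Copy the survivors' current partition lists so the input is not mutated.
    survivors = {node: list(parts) for node, parts in assignment.items() if node != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the partitions")
    for partition in sorted(assignment.get(failed, [])):
        # Assumed policy: give each partition to the least-loaded survivor,
        # breaking ties by node name so the result is deterministic.
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(partition)
    return survivors


if __name__ == "__main__":
    before = {"node1": [0, 1], "node2": [2, 3], "node3": [4, 5]}
    print(redistribute(before, failed="node2"))
    # {'node1': [0, 1, 2], 'node3': [4, 5, 3]}
```

Note that the failed node simply disappears from the returned mapping; in this model, having it rejoin would be a separate, explicit step, which mirrors the manual reintegration the text describes.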
If the head node fails and does not restart within the heartbeat interval, its data partitions are redistributed and an election occurs. In the election, a new head node is selected from the first seven active data nodes in the cluster. As Figure 2 shows, the web console is then restarted on the new head node.
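The candidate rule for the election ("the first seven active data nodes") can be sketched as a filter over an ordered node list. This is a hypothetical model: `head_candidates`, `elect_head`, and the node names are invented for illustration, "first" is assumed to mean the order in which data nodes are listed in the cluster, and because the text does not say how the winner is chosen among the candidates, the sketch simply assumes the first candidate wins.

```python
def head_candidates(data_nodes: list[str], active: set[str], limit: int = 7) -> list[str]:
    """Return the first `limit` active data nodes, in cluster order."""
    return [node for node in data_nodes if node in active][:limit]


def elect_head(data_nodes: list[str], active: set[str]) -> str:
    """Pick a new head node from the candidates (hypothetical winner rule)."""
    candidates = head_candidates(data_nodes, active)
    if not candidates:
        raise RuntimeError("no active data node can become the head node")
    # Assumption for illustration only: the first candidate wins the election.
    return candidates[0]


if __name__ == "__main__":
    data_nodes = [f"node{i}" for i in range(2, 11)]  # node2 .. node10
    active = set(data_nodes) - {"node3"}             # node3 is also down
    print(head_candidates(data_nodes, active))
    # ['node2', 'node4', 'node5', 'node6', 'node7', 'node8', 'node9']
    new_head = elect_head(data_nodes, active)
    print(f"web console restarts on {new_head}")  # web console restarts on node2
```

Limiting the candidate pool keeps the election small even in a large cluster, while skipping inactive nodes ensures that a node which is itself down cannot be chosen.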
After a head node failover, if the original head node becomes reachable again, restart the system so that the original head node becomes the current head node again.