Built-in high availability feature for IBM Db2 Warehouse MPP deployments

For MPP deployments, Db2® Warehouse provides a built-in, out-of-the-box high availability (HA) feature that enables your data warehouse to continue to operate even after failures occur.

The HA capability is based on a heartbeat mechanism, automatic restart of services, and node failover. The heartbeat detects when a node, a database partition, or the web console is down, and the cluster manager takes the appropriate action. For instance, the cluster manager attempts to restart any failed data partitions or the web console. Figure 1 shows a Db2 Warehouse HA group in a healthy state. The file system is not part of the HA group, so use whatever HA solution is appropriate for the file system technology that you're using. Similarly, use a method such as a load balancer to make head node failures transparent to connected applications.
Figure 1. Steady state for HA group
Applications and web console connected to head node, with heartbeat monitoring HA group

If a data node fails and does not restart within the heartbeat interval, all services are stopped on that node. The data partitions (and their workload) that are assigned to that node are automatically redistributed across the surviving nodes in the cluster. Failed nodes are not reintegrated automatically; you must perform manual steps to have a failed node rejoin the cluster.
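
The redistribution step can be pictured with a small Python sketch. It is an illustration only and is not the Db2 Warehouse implementation: the function name redistribute, the node names, and the "give each orphaned partition to the least-loaded survivor" policy are all assumptions, because this topic does not describe how the cluster manager chooses target nodes.

```python
def redistribute(assignment, failed_node):
    """Return a new node -> partitions mapping without failed_node.

    Hypothetical policy (an assumption, not documented product behavior):
    each orphaned partition goes to the surviving node that currently
    holds the fewest partitions, with ties broken by node name.
    """
    # Copy the partition lists of every node except the failed one.
    survivors = {
        node: list(parts)
        for node, parts in assignment.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving nodes to take over partitions")

    for partition in assignment.get(failed_node, []):
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(partition)
    return survivors


# Three hypothetical nodes with four data partitions each.
cluster = {
    "node1": [0, 1, 2, 3],
    "node2": [4, 5, 6, 7],
    "node3": [8, 9, 10, 11],
}
after_failure = redistribute(cluster, "node3")
```

In this sketch, the four partitions from node3 end up split evenly between node1 and node2, so each surviving node carries six partitions instead of four. That extra load per node is why a failed node should be brought back into the cluster promptly.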

If the head node fails and does not restart within the heartbeat interval, its data partitions are redistributed, and an election occurs. In the election, a new head node is selected from the first seven active data nodes in the cluster. As you can see in Figure 2, the web console is restarted on the new head node.
Figure 2. HA group after head node failover
Head node is down, so web console and applications connect to a new head node, and data partitions are reassigned.
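
The election rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the product's election logic: the function name elect_head, the is_active callback, and the choice of the first eligible candidate as the winner are hypothetical. The only detail taken from this topic is that the new head node comes from the first seven active data nodes.

```python
def elect_head(data_nodes, is_active, candidate_pool=7):
    """Pick a new head node from the first `candidate_pool` active data nodes.

    data_nodes: node names in cluster order (hypothetical representation).
    is_active:  callable that returns True if a node is currently reachable.
    The winner is simply the first eligible candidate; how the real
    election chooses among candidates is not described in this topic.
    """
    candidates = [n for n in data_nodes if is_active(n)][:candidate_pool]
    if not candidates:
        raise RuntimeError("no active data node is available for election")
    return candidates[0]


# Hypothetical cluster: data1 is down along with the failed head node.
nodes = ["data1", "data2", "data3", "data4"]
down = {"data1"}
new_head = elect_head(nodes, lambda n: n not in down)
```

Here data2 becomes the new head node, because it is the first data node in cluster order that is still active.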

After a head node failover, if the original head node becomes reachable again, restarting the system causes the original head node to become the current head node again.