Built-in high availability feature for IBM Db2 Warehouse MPP deployments
For MPP deployments, Db2® Warehouse provides a built-in, out-of-the-box high availability (HA) feature that enables your data warehouse to continue to operate even after failures occur.
The HA capability is based on a heartbeat mechanism, automatic restart of services, and node failover. The heartbeat detects when a node, a database partition, or the web console is down, and the cluster manager takes the appropriate action. For example, the cluster manager attempts to restart any failed data partitions or the web console. Figure 1 shows a Db2 Warehouse HA group in a healthy state. The file system is not part of the HA group, so use an HA solution that is appropriate for the file system technology that you are using. Similarly, use a method such as a load balancer to make head node failures transparent to connected applications.
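The heartbeat check and restart attempt described above can be modeled in a few lines. This is a minimal Python sketch, not the Db2 Warehouse implementation: the `Component` class, the `check_heartbeats` and `handle_failures` functions, and the 10-second interval used in the example are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical model of a monitored component; not a Db2 Warehouse API.
@dataclass
class Component:
    name: str              # e.g. "node2", "partition 5", "web console"
    last_heartbeat: float  # time of the most recent heartbeat, in seconds
    restart_attempts: int = 0


def check_heartbeats(components, now, interval):
    """Return the components whose last heartbeat is older than `interval` seconds."""
    return [c for c in components if now - c.last_heartbeat > interval]


def handle_failures(failed, restart):
    """Try to restart each failed component once.

    `restart` is a caller-supplied function that returns True on success.
    Returns the components that are still down after the attempt.
    """
    still_down = []
    for c in failed:
        c.restart_attempts += 1
        if not restart(c):
            still_down.append(c)
    return still_down


components = [
    Component("node1", last_heartbeat=100.0),
    Component("node2", last_heartbeat=80.0),
    Component("web console", last_heartbeat=95.0),
]
failed = check_heartbeats(components, now=100.0, interval=10.0)
print([c.name for c in failed])  # ['node2']

# Simulate a restart that does not succeed; a data node that stays down
# would then be handled by node failover.
still_down = handle_failures(failed, restart=lambda c: False)
print([c.name for c in still_down])  # ['node2']
```

Passing `restart` in as a function keeps the detection logic separate from whatever mechanism actually restarts a partition or the web console.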
If a data node fails and does not restart within the heartbeat interval, all services on that node are stopped. The data partitions (and their workload) that are assigned to that node are automatically redistributed across the surviving nodes in the cluster. Failed nodes are not reintegrated automatically; you must perform manual steps to have a failed node rejoin the cluster.
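The redistribution step can be illustrated with a small placement function. This Python sketch assumes a simple "least-loaded surviving node first" policy; the actual placement policy that Db2 Warehouse uses is not described here, and the `redistribute` function and node names are hypothetical.

```python
def redistribute(assignment, failed_node):
    """Move the failed node's data partitions to the surviving nodes.

    `assignment` maps node name -> list of partition numbers. Each orphaned
    partition goes to the surviving node that currently holds the fewest
    partitions (ties broken by node name), which keeps the load roughly even.
    Returns a new mapping without the failed node; the input is not modified.
    """
    survivors = {n: list(p) for n, p in assignment.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the partitions")
    for part in sorted(assignment.get(failed_node, [])):
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(part)
    return survivors


assignment = {"node1": [0, 1], "node2": [2, 3], "node3": [4, 5]}
print(redistribute(assignment, "node2"))
# {'node1': [0, 1, 2], 'node3': [4, 5, 3]}
```

The sketch only removes the failed node; consistent with the text, it has no step that adds the node back, because rejoining the cluster requires manual action.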
If the head node fails and does not restart within the heartbeat interval, its data partitions are redistributed and an election occurs. In the election, a new head node is selected from the first seven active data nodes in the cluster. As shown in Figure 2, the web console is restarted on the new head node.
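The candidate rule for the election can be sketched as follows. Only the limit of seven active data nodes comes from the text; the `elect_head` function, the node names, and the choice of the first candidate as the winner are assumptions made for this illustration, because the text does not say how one candidate is chosen.

```python
MAX_CANDIDATES = 7  # from the text: the first seven active data nodes are eligible


def elect_head(data_nodes, active):
    """Pick a new head node after the current head node fails.

    `data_nodes` lists node names in cluster order; `active` is the set of
    nodes that are still up. Returns (new_head, candidates).
    """
    candidates = [n for n in data_nodes if n in active][:MAX_CANDIDATES]
    if not candidates:
        raise RuntimeError("no active data node can become the head node")
    # Assumption: the first eligible candidate wins the election.
    return candidates[0], candidates


nodes = [f"node{i}" for i in range(1, 11)]  # node1 ... node10, in cluster order
active = set(nodes) - {"node1", "node2"}    # node1 was the head node; node2 is also down
head, candidates = elect_head(nodes, active)
print(head)             # node3
print(len(candidates))  # 7
```

In this example node10 is active but is not a candidate, because seven active nodes appear before it in cluster order.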
After a head node failover, if the original head node becomes reachable again, restarting the system causes the original head node to become the current head node again.
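The failback rule can be expressed as a small decision function that is evaluated when the system is restarted. In this Python sketch, `head_after_restart` and its parameters are hypothetical, and the fallback branch (the elected head node keeps the role if the original head node is still unreachable) is an assumption that the text does not state.

```python
def head_after_restart(original_head, failover_head, reachable):
    """Return the node that acts as the head node after a system restart.

    original_head: the head node before the failover
    failover_head: the head node elected during the failover
    reachable: set of node names that can be reached at restart time
    """
    if original_head in reachable:
        # Documented behavior: the original head node becomes the head node again.
        return original_head
    # Assumption: otherwise the elected head node keeps the role.
    return failover_head


print(head_after_restart("node1", "node3", {"node1", "node3"}))  # node1
print(head_after_restart("node1", "node3", {"node3"}))           # node3
```

The function models only what happens at restart; the text describes no automatic failback while the system keeps running.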