Implementing a high availability solution does not prevent
hardware or software failures. However, having redundant systems
and a failover mechanism enables your solution to detect and respond
to failures, and reroute workload so that user applications are still
able to do work.
Procedure
When a failure occurs, your database solution must do
the following:
- Detect the failure.
Failover software
can use heartbeat monitoring to confirm the availability of system
components. A heartbeat monitor listens for regular communication
from all the components of the system. If the heartbeat monitor stops
hearing from a component, the heartbeat monitor signals to the system
that the component has failed.
- Respond to the failure: failover.
- Identify, bring online, and initialize a secondary component
to take over operations for the failed component.
- Reroute workload to the secondary component.
- Remove the failed component from the system.
- Recover from the failure.
If a primary
database server fails, the first priority is to redirect clients to
an alternate server or to fail over to a standby database so that client
applications can do their work with as little interruption as possible.
Once that failover succeeds, you must repair whatever went wrong
on the failed database server so that it can be reintegrated
into the solution. Repairing the failed database server might just
mean restarting it.
- Return to normal operations.
Once the
failed database system is repaired, you must integrate it back into
the database solution. You could reintegrate a failed primary database
as the standby database for the database that took over as the primary
database when the failure occurred. You could also force the repaired
database server to take over as the primary database server again.
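The steps above can be illustrated with a small sketch. It is not Db2 code and does not use any Db2 API. The `HeartbeatMonitor` and `Cluster` classes, their methods, and the server names are hypothetical and exist only to show the sequence of detecting a failure, failing over, and reintegrating the repaired server as a standby. The sketch assumes a single timeout value and takes timestamps as arguments so that the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: records the last heartbeat time per component."""
    timeout: float  # seconds of silence before a component counts as failed
    last_seen: dict = field(default_factory=dict)

    def beat(self, component: str, now: float) -> None:
        # Record that the component was heard from at time `now`.
        self.last_seen[component] = now

    def failed(self, now: float) -> list:
        # A component has failed if it has been silent longer than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


@dataclass
class Cluster:
    """Hypothetical model of one primary server and its standby servers."""
    primary: str
    standbys: list
    removed: list = field(default_factory=list)

    def fail_over(self) -> str:
        # Bring a secondary online, reroute work to it, remove the failed one.
        if not self.standbys:
            raise RuntimeError("no standby available to take over")
        failed_server = self.primary
        self.primary = self.standbys.pop(0)
        self.removed.append(failed_server)
        return self.primary

    def reintegrate(self, server: str) -> None:
        # Return a repaired server to the solution as a standby.
        self.removed.remove(server)
        self.standbys.append(server)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=5.0)
    cluster = Cluster(primary="db-a", standbys=["db-b"])
    monitor.beat("db-a", now=0.0)
    monitor.beat("db-b", now=8.0)
    if cluster.primary in monitor.failed(now=10.0):
        print("new primary:", cluster.fail_over())
    cluster.reintegrate("db-a")
    print("standbys:", cluster.standbys)
```

In this sketch the repaired server rejoins as a standby, which corresponds to the first of the two options described above. Forcing it to take over as the primary again would be a second, deliberate role switch.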
What to do next
Db2® database can
perform some of these steps for you. For example:
- The Db2 High
Availability Disaster Recovery (HADR) heartbeat monitor element,
hadr_heartbeat, can detect when a primary database has failed.
- Db2 client
reroute can transfer workload from a failed database server to a secondary one.
- The Db2 fault
monitor can restart a database instance that terminates unexpectedly.