Member restart is the process of restarting the database server processes
on a failed member and performing member crash recovery on each database
that requires it.
If a software or hardware failure on a host causes a member to fail,
DB2® cluster services detects the failure and automatically restarts the
member.
The member restart can be either a local restart, meaning that the member
is restarted on its original host (its home host), or a restart light,
meaning that it is restarted on a different host:
- Local restart
- If a member fails because of a software failure but the member's home
host is still active, DB2 cluster services attempts a local restart. The
local member restart uses a reduced memory model so that the in-flight
data is recovered to a consistent state quickly. Once the in-flight data
has been recovered, the database's normal memory model is initialized for
full transaction processing.
- Restart light
- If a member's home host is inactive or if a local restart attempt fails,
the member is automatically restarted as a guest member on another member's
home host using a minimal amount of resources. A member running in
restart light mode does not process new transactions because its sole
purpose is to perform member crash recovery.
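The choice between the two restart modes can be summarized in a short
Python sketch. This is only an illustration of the rules described above,
not DB2 code: the names RestartMode, choose_restart_mode, and
accepts_new_transactions are hypothetical and do not correspond to any
DB2 interface.

```python
from enum import Enum


class RestartMode(Enum):
    LOCAL = "local restart"
    RESTART_LIGHT = "restart light"


def choose_restart_mode(home_host_active: bool,
                        local_restart_failed: bool = False) -> RestartMode:
    """Pick a restart mode for a failed member (illustrative model only)."""
    # A local restart is attempted only while the home host is still active
    # and no earlier local restart attempt has failed.
    if home_host_active and not local_restart_failed:
        return RestartMode.LOCAL
    # Otherwise the member is restarted as a guest on another member's host.
    return RestartMode.RESTART_LIGHT


def accepts_new_transactions(mode: RestartMode, recovery_complete: bool) -> bool:
    """Return True if the restarted member can process new transactions."""
    # A restart light member exists only to perform member crash recovery,
    # so it never processes new transactions in this model.
    return mode is RestartMode.LOCAL and recovery_complete
```

For example, choose_restart_mode(home_host_active=False) yields
RestartMode.RESTART_LIGHT, and a member in that mode does not take new
transactions even after its recovery has finished.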
Multiple member failures are generally recoverable with independent,
concurrent member restart recoveries; as a result, a group restart is not
usually required. The database remains open for access through the
surviving members. Only data that was in-flight on the failed members is
unavailable for the duration of the member restarts.
Member crash recovery
Member crash recovery rolls back incomplete transactions and completes
committed transactions as part of member restart. This ensures that the
database data modified by this member is brought to a consistent state.
If indoubt transactions exist at the end of member crash recovery,
the member is made available to resolve them.
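The three outcomes described above can be illustrated with a toy model in
Python. The log format here is an assumption made for the example (a list
of (transaction_id, record_type) tuples with the record types "update",
"prepare", and "commit"), and classify_transactions is a hypothetical
helper; real DB2 log records and recovery logic are far more involved.

```python
def classify_transactions(log):
    """Classify each transaction found in a member's log stream.

    `log` is a list of (transaction_id, record_type) tuples in log order.
    The record types "update", "prepare", and "commit" are a simplified,
    hypothetical format used only for this sketch.
    """
    last_state = {}
    for txid, record_type in log:
        if record_type == "commit":
            last_state[txid] = "commit"
        elif record_type == "prepare" and last_state.get(txid) != "commit":
            last_state[txid] = "prepare"
        else:
            # First sighting of a transaction starts it in the "update" state.
            last_state.setdefault(txid, "update")

    result = {"complete": [], "roll_back": [], "indoubt": []}
    for txid, state in last_state.items():
        if state == "commit":
            result["complete"].append(txid)    # committed: finish applying it
        elif state == "prepare":
            result["indoubt"].append(txid)     # prepared, outcome unknown
        else:
            result["roll_back"].append(txid)   # incomplete: undo its changes
    return result
```

In this model, a transaction that reached "commit" is completed, one that
reached only "prepare" is left indoubt for later resolution, and any other
transaction is rolled back.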
Member crash recovery is performed when a viable cluster caching facility
is still available, the database on a member is activated, and the
member's log stream is found to be inconsistent on disk because of an
abnormal termination. In most cases, member crash recovery is invoked
automatically through member restart and the automatic recovery agent.
The automatic recovery agent, which is started when the instance is
brought up, takes action when it detects that a database on the member
is inconsistent.
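The conditions for member crash recovery can be written as a single
predicate. This is a minimal Python sketch of the rule as stated in this
section; needs_member_crash_recovery and its parameters are hypothetical
names, and the sketch does not model what happens when no viable cluster
caching facility is available.

```python
def needs_member_crash_recovery(cf_available: bool,
                                database_activated: bool,
                                log_stream_consistent: bool) -> bool:
    """Return True when all three conditions described above hold."""
    # 1. A viable cluster caching facility is still available.
    # 2. The database on the member is being activated.
    # 3. The member's log stream is inconsistent on disk.
    return cf_available and database_activated and not log_stream_consistent
```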
Once member crash recovery of a database completes, the member can accept
incoming connection requests from applications, provided that the member
was restarted on its original host.