Before you start
IBM DB2 pureScale Feature for Enterprise Server Edition offers clustering technology that helps deliver high availability and exceptional scalability transparent to applications, and brings best-of-breed architecture to the distributed platform. The DB2 pureScale Feature enables the database to continue processing through unplanned outages and provides nearly unlimited capacity for any transactional workload. Scaling your system is simply a matter of connecting a host and issuing two simple commands. The cluster-based, shared-disk architecture of the DB2 pureScale Feature also helps reduce costs through efficient use of system resources.
The DB2 pureScale Feature combines several tightly integrated software components, which are installed and configured automatically when you deploy the DB2 pureScale Feature. You interact with components such as the DB2 cluster manager and DB2 cluster services through DB2 administration views and commands, such as db2instance, db2icrt, db2iupdt, and the db2cluster tool. The db2cluster tool also provides options for troubleshooting and problem determination. Additionally, the messages that are generated by the subsystems of the DB2 cluster manager are an excellent source of information for problem determination. For example, the resource managers of the resource classes utilized by DB2 cluster services each write status information to their log files. The db2diag log files also provide useful information. Often, messages in the db2diag log files explain the reason for a failure and give advice on how to resolve it.
DB2 cluster services automatically handles the majority of run-time failures. However, some types of failure require you to take action, for example when a power cord is unplugged from a host or a network cable is disconnected. If DB2 cluster services cannot resolve a failure automatically, it sets an alert field to notify the DBA that a problem requires attention. DBAs can see the alert when they check the status of the DB2 instance, as shown later.
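The alert check described above can be scripted. The following is a minimal sketch, assuming the instance status is available as a whitespace-delimited text table with a header row that contains an ALERT column. The helper name rows_with_alert, the sample table, its column set, the state values, and the host names are all made up for illustration; they are not captured output from a real cluster, and the real layout may differ.

```python
def rows_with_alert(status_text):
    """Return the table rows whose ALERT column is YES.

    Assumption: a whitespace-delimited table whose first line (after
    blank and dashed separator lines are dropped) is the header.
    """
    # Drop blank lines and separator lines made only of dashes and spaces.
    lines = [ln for ln in status_text.splitlines()
             if ln.strip() and not set(ln.strip()) <= {"-", " "}]
    if not lines:
        return []
    header = lines[0].split()
    if "ALERT" not in header:
        raise ValueError("no ALERT column in status text")
    flagged = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        row = dict(zip(header, fields))
        if row["ALERT"].upper() == "YES":
            flagged.append(row)
    return flagged


# Hypothetical sample text, invented for this sketch.
SAMPLE = """\
ID  TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
--  ----    -----    ---------  ------------  -----
0   MEMBER  STARTED  hostA      hostA         NO
1   MEMBER  ERROR    hostB      hostB         YES
128 CF      PRIMARY  hostC      hostC         NO
"""

for row in rows_with_alert(SAMPLE):
    print(row["ID"], row["TYPE"], row["STATE"])
```

In practice the text would come from whatever status command or administration view you use; the parsing step stays the same as long as the ALERT column is present.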
Understanding the DB2 pureScale Feature resource model
The Version 9.8 DB2 pureScale Feature resource model differs from the resource model used in an HA DB2 instance in Version 9.7 single-partition and multi-partition database environments. For additional information on HA DB2 instances in DB2 versions prior to the Version 9.8 DB2 pureScale Feature, refer to the background information links in the Resources section at the end of the tutorial.
The new resource model implemented in Version 9.8 DB2 pureScale Feature is necessary to represent cluster caching facilities (CFs) and the shared clustered file system.
In a DB2 pureScale shared data instance, one CF fulfills the primary role and contains the currently active data for the shared data instance. The second CF maintains a copy of pertinent information so that the primary role can be recovered immediately.
The new resource model allows IBM Tivoli® System Automation for Multiplatforms (Tivoli SA MP) to appropriately automate the movement of the primary role in case of failure of the primary CF node.
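The movement of the primary role can be pictured with a small model. This is a sketch only, not how Tivoli SA MP is implemented: the CF class, the role labels "PRIMARY" and "SECONDARY", the healthy flag, and the function move_primary_role are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CF:
    host: str
    role: str              # "PRIMARY" or "SECONDARY" (labels chosen for this sketch)
    healthy: bool = True


def move_primary_role(cfs):
    """Return the CF that holds the primary role after any needed move.

    If the current primary is healthy, nothing changes. Otherwise the
    first healthy secondary is promoted. Returns None when no healthy
    CF is available to hold the primary role.
    """
    primary = next((cf for cf in cfs if cf.role == "PRIMARY"), None)
    if primary is not None and primary.healthy:
        return primary
    candidate = next(
        (cf for cf in cfs if cf.role == "SECONDARY" and cf.healthy), None)
    if candidate is None:
        return None
    if primary is not None:
        primary.role = "SECONDARY"   # the failed CF gives up the primary role
    candidate.role = "PRIMARY"
    return candidate


cfs = [CF("hostC", "PRIMARY"), CF("hostD", "SECONDARY")]
cfs[0].healthy = False               # simulate failure of the primary CF node
print(move_primary_role(cfs).host)
```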
DB2 cluster services includes three major components:
- Cluster manager: Tivoli SA MP, which includes Reliable Scalable Cluster Technology (RSCT)
- Shared clustered file system: IBM General Parallel File System (GPFS)
- DB2 cluster administration: DB2 commands and administration views for managing and monitoring the cluster
Figure 1. DB2 Cluster services
DB2 cluster services provides the essential infrastructure that makes the shared data instance highly available, with automatic failover and restart in place as soon as the instance has been created.
DB2 cluster elements are representations of entities that are monitored and whose status changes are managed by DB2 cluster services. For the purposes of this tutorial, we will address three types of DB2 cluster elements:
- Hosts: A host can be a physical machine, LPAR (Logical Partition of a physical machine), or a virtual machine.
- DB2 members: A DB2 member is the core processing engine and normally resides on its home host. The home host of a DB2 member is the host name that was provided as the member's location when the member was added to the DB2 shared data instance. A DB2 member has a single home host. DB2 members can accept client connections only when they are running on their home host.
- Cluster caching facilities (CFs): The cluster caching facility (CF) is a software application managed by DB2 cluster services that provides internal operational services for a DB2 shared data instance.
There is not necessarily a one-to-one mapping between DB2 cluster elements and the underlying cluster manager resources and resource groups.
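The relationship between hosts and members described in the list above can be captured in a few lines. This is an illustrative data model under stated assumptions: the class names, field names, and host names are hypothetical and do not correspond to any DB2 catalog or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    kind: str              # "physical", "LPAR", or "virtual"


@dataclass
class Member:
    member_id: int
    home_host: Host        # set once, when the member is added to the instance
    current_host: Host     # where the member is running right now

    def accepts_client_connections(self):
        # Per the text: only while running on its home host.
        return self.current_host == self.home_host


host_a = Host("hostA", "LPAR")
host_b = Host("hostB", "virtual")

member = Member(member_id=0, home_host=host_a, current_host=host_a)
print(member.accepts_client_connections())

member.current_host = host_b        # member is now running away from home
print(member.accepts_client_connections())
```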
Understanding how the DB2 pureScale Feature automatically handles failure
When a failure occurs in the DB2 pureScale instance, DB2 cluster services automatically attempts to restart the failed resources. When and where the restart occurs depends on different factors, such as the type of resource that failed and the point in the resource life cycle at which the failure occurred.
If a software or hardware failure on a host causes a DB2 member to fail, DB2 cluster services automatically restarts the member. A DB2 member is restarted either on the same host (local restart) or, if that fails, on a different host (member restart in restart light mode). Restarting a member on another host is called failover.
Member restart includes restarting failed DB2 processes and performing member crash recovery (undoing or reapplying log transactions) in order to roll back any 'in-flight' transactions and to free any locks held by them. Member restart also ensures that updated pages have been written to the CF.
When a member is restarted on a different host in restart light mode, minimal resources are used on the new host (which is the home host of another DB2 member). A member running in restart light mode does not process new transactions, because its sole purpose is to perform member crash recovery. The databases on the failed member are recovered to a point of consistency as quickly as possible. This enables other active members to access and change database objects that were locked by the abnormally terminated member. All in-flight transactions from the failed member are rolled back and all locks that were held at the time of the abnormal termination of the member are released. Although the member does not accept new transactions, it remains available for resolution of in-doubt transactions.
When a DB2 member has failed over to a new host, the total processing capability of the whole cluster is reduced temporarily. When the home host is active and available again, the DB2 member automatically fails back to the home host and is restarted there. The cluster's processing capability is restored as soon as the DB2 member has failed back and restarted on its home host. Transactions on all other DB2 members are not affected during the failback process.
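The restart and failback rules above can be summarized as a small decision sketch. It is a simplified model of the behavior described in this section, not the logic DB2 cluster services actually runs; the function names, the mode strings, and the host names are assumptions made for illustration.

```python
def restart_member(home_host, local_restart_ok, other_hosts):
    """Decide where a failed member restarts.

    Returns (host, mode), where mode is "local" for a restart on the
    home host, "restart_light" for a restart on another host, or
    "none" when no host is available.
    """
    if local_restart_ok:
        return home_host, "local"
    for host in other_hosts:
        if host != home_host:
            return host, "restart_light"
    return None, "none"


def accepts_new_transactions(mode):
    # A member in restart light mode only performs crash recovery.
    return mode == "local"


def next_action(mode, home_host_available):
    """A member in restart light mode fails back once its home host returns."""
    if mode == "restart_light" and home_host_available:
        return "fail_back"
    return "stay"


host, mode = restart_member("hostA", local_restart_ok=False,
                            other_hosts=["hostB", "hostC"])
print(host, mode, accepts_new_transactions(mode))
print(next_action(mode, home_host_available=True))
```

Keeping the placement decision separate from the failback decision mirrors the text: where a member restarts depends on whether the local restart succeeded, while failback depends only on the home host becoming available again.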



