Solving problems in the DB2 pureScale cluster services environment

A systematic approach to diagnostic information gathering and problem determination

Oleg Tyschenko (tyschenko@ie.ibm.com), DB2 pureScale Development Team Leader, IBM
Oleg Tyschenko leads the DB2 engine development team effort in Ireland and is a member of the DB2 High Availability team for DB2 pureScale development. He has more than fifteen years of technical and engineering experience in information management and related technologies. He is currently involved in a number of DB2 pureScale deployment and proof-of-concept projects across Europe. His areas of expertise include the DB2 pureScale Feature, DB2 high availability and cluster management, Tivoli SA for Multiplatforms, and GPFS deployment. Oleg holds a degree in computer science and an MBA degree from the Warwick Business School (UK).
Massimiliano Gallo (gallomas@ie.ibm.com), DB2 Functional Test Engineer, IBM
Massimiliano Gallo is a senior member of the DB2 Functional Verification Test team in Dublin and has worked on DB2 pureScale testing for the last several years. He has more than 10 years of experience in software development and product testing, working on a broad range of technologies. His areas of expertise include the DB2 pureScale Feature, DB2 high availability and cluster management, and Tivoli SA for Multiplatforms. Massimiliano holds an M.Sc. in Aerospace Engineering from the Sapienza University of Rome.

Summary:  This tutorial guides DBAs and system administrators in problem determination for IBM® DB2® pureScale® cluster services. As you deploy IBM DB2 pureScale Feature for DB2 Enterprise Server Edition systems into production, you need to acquire appropriate problem determination skills. This tutorial explains how to gather diagnostic information when failures occur, and it provides background to help you understand the tightly integrated subcomponents of the DB2 pureScale Feature, such as the Cluster Caching Facility (CF), General Parallel File System (GPFS), Reliable Scalable Cluster Technology (RSCT), and IBM Tivoli System Automation for Multiplatforms (Tivoli SA MP).

Date:  18 Aug 2011
Level:  Intermediate

Before you start

Introduction

IBM DB2 pureScale Feature for Enterprise Server Edition offers clustering technology that helps deliver high availability and exceptional scalability transparent to applications, and brings best-of-breed architecture to the distributed platform. The DB2 pureScale Feature enables the database to continue processing through unplanned outages and provides nearly unlimited capacity for any transactional workload. Scaling your system is simply a matter of connecting a host and issuing two simple commands. The cluster-based, shared-disk architecture of the DB2 pureScale Feature also helps reduce costs through efficient use of system resources.

The DB2 pureScale Feature combines several tightly integrated software components, which are installed and configured automatically when you deploy the DB2 pureScale Feature. You interact with components such as the DB2 cluster manager and DB2 cluster services through DB2 administration views and commands, such as db2instance, db2icrt, db2iupdt, and the db2cluster tool. The db2cluster tool also provides options for troubleshooting and problem determination. Additionally, the messages that are generated by the subsystems of the DB2 cluster manager are an excellent source of information for problem determination. For example, the resource managers of the resource classes utilized by DB2 cluster services each write status information to their log files. The db2diag log files also provide useful information. Often, messages in the db2diag log files explain the reason for a failure and give advice on how to resolve it.

DB2 cluster services automatically handles the majority of run-time failures. However, some types of failures require you to take action, for example when a power cord is unplugged from a host or a network cable is disconnected. If DB2 cluster services cannot resolve a failure automatically, an alert field is set to notify the DBA that a problem requires attention. DBAs can see the alert when they check the status of the DB2 instance, as shown later.
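
As a rough illustration of acting on the alert field, the following Python sketch scans status text for rows whose ALERT column is YES. The sample text and its column layout are assumptions made for this example (a simplified table with ID, TYPE, STATE, HOME_HOST, CURRENT_HOST, and ALERT columns), not captured output, and the function name `rows_with_alert` is hypothetical. Adjust the parsing to whatever your status command actually prints.

```python
# Hypothetical, simplified status text; the real layout may differ.
SAMPLE_STATUS = """\
ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
--   ----    -----    ---------  ------------  -----
0    MEMBER  STARTED  hostA      hostA         NO
1    MEMBER  ERROR    hostB      hostB         YES
128  CF      PRIMARY  hostD      hostD         NO
129  CF      PEER     hostE      hostE         NO
"""


def rows_with_alert(status_text):
    """Return a list of dicts for rows whose ALERT column is YES."""
    lines = [ln for ln in status_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    alerts = []
    for line in lines[1:]:
        fields = line.split()
        # Skip the dashed separator row and rows that do not match the header.
        if len(fields) != len(header) or set(fields[0]) == {"-"}:
            continue
        row = dict(zip(header, fields))
        if row["ALERT"].upper() == "YES":
            alerts.append(row)
    return alerts


if __name__ == "__main__":
    for row in rows_with_alert(SAMPLE_STATUS):
        print(f"{row['TYPE']} {row['ID']} on {row['CURRENT_HOST']} needs attention")
```

A check like this could run on a schedule so that an alert requiring manual action is noticed promptly rather than at the next manual status check.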

Understanding the DB2 pureScale Feature resource model

The resource model of the Version 9.8 DB2 pureScale Feature differs from the one used by an HA DB2 instance in Version 9.7 single-partition and multi-partition database environments. For more information on HA DB2 instances in DB2 versions before the Version 9.8 DB2 pureScale Feature, refer to the background information links in the Resources section at the end of the tutorial.

The new resource model implemented in Version 9.8 DB2 pureScale Feature is necessary to represent cluster caching facilities (CFs) and the shared clustered file system.

In a DB2 pureScale shared data instance, one CF fulfills the primary role and contains the currently active data for the shared data instance. The second CF maintains a copy of the pertinent information so that the primary role can be recovered immediately.

The new resource model allows IBM Tivoli® System Automation for Multiplatforms (Tivoli SA MP) to appropriately automate the movement of the primary role in case of failure of the primary CF node.
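
The movement of the primary role can be pictured with a small Python sketch. It is a toy model, not how Tivoli SA MP is implemented: the `CF` class, the `move_primary_role` function, the role strings, and the host names are all hypothetical. The model only captures the idea that the second CF takes over the primary role when the primary CF's node fails.

```python
from dataclasses import dataclass


@dataclass
class CF:
    host: str             # hypothetical host name
    role: str             # "PRIMARY" or "SECONDARY" in this toy model
    healthy: bool = True


def move_primary_role(cfs):
    """Promote a healthy secondary CF if the primary CF is unhealthy.

    Returns the CF that holds the primary role afterwards, or None if
    no healthy CF is available to take it.
    """
    primary = next((cf for cf in cfs if cf.role == "PRIMARY"), None)
    if primary is not None and primary.healthy:
        return primary  # the primary is fine, nothing to move
    candidate = next(
        (cf for cf in cfs if cf.role == "SECONDARY" and cf.healthy), None
    )
    if candidate is None:
        return None
    if primary is not None:
        # Assumption of this model: the failed CF gives up the primary
        # role and would rejoin as the secondary once repaired.
        primary.role = "SECONDARY"
    candidate.role = "PRIMARY"
    return candidate
```

For example, with `CF("hostD", "PRIMARY")` and `CF("hostE", "SECONDARY")`, marking the first CF unhealthy and calling `move_primary_role` leaves the CF on `hostE` holding the primary role.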

DB2 cluster services includes three major components:

  • Cluster manager: Tivoli SA MP, which includes Reliable Scalable Cluster Technology (RSCT)
  • Shared clustered file system: IBM General Parallel File System (GPFS)
  • DB2 cluster administration: DB2 commands and administration views for managing and monitoring the cluster

Figure 1. DB2 cluster services
Diagram shows client workstations connected to a DB2 data server, which has a primary CF, a secondary CF, members, DB2 cluster services, and a shared file system

DB2 cluster services provides the essential infrastructure that makes the shared data instance highly available, with automatic failover and restart in effect as soon as the instance has been created.

DB2 cluster elements are representations of entities that are monitored and whose status changes are managed by DB2 cluster services. For the purposes of this tutorial, we will address three types of DB2 cluster elements:

  • Hosts: A host can be a physical machine, LPAR (Logical Partition of a physical machine), or a virtual machine.
  • DB2 members: A DB2 member is the core processing engine and normally resides on its home host. The home host of a DB2 member is the host name that was provided as the member's location when the member was added to the DB2 shared data instance. A DB2 member has a single home host. DB2 members can accept client connections only when they are running on their home host.
  • Cluster caching facilities (CFs): The cluster caching facility (CF) is a software application managed by DB2 cluster services that provides internal operational services for a DB2 shared data instance.

There is not necessarily a one-to-one mapping between DB2 cluster elements and the underlying cluster manager resources and resource groups.
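
The three element types can be sketched as plain Python data classes. This is only an illustrative model for reasoning about the definitions above; the class and field names are hypothetical and do not correspond to any DB2 API or to the underlying cluster manager resources.

```python
from dataclasses import dataclass
from enum import Enum


class HostKind(Enum):
    PHYSICAL = "physical machine"
    LPAR = "logical partition"
    VIRTUAL = "virtual machine"


@dataclass(frozen=True)
class Host:
    name: str
    kind: HostKind


@dataclass
class Member:
    member_id: int
    home_host: Host      # fixed when the member is added to the instance
    current_host: Host   # where the member is running right now

    def accepts_client_connections(self) -> bool:
        # Per the text, a member accepts client connections only
        # when it is running on its home host.
        return self.current_host == self.home_host


@dataclass
class ClusterCachingFacility:
    cf_id: int
    host: Host
```

Keeping `home_host` and `current_host` as separate fields mirrors the distinction the tutorial draws between where a member belongs and where it happens to be running after a failure.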

Understanding how the DB2 pureScale Feature automatically handles failure

When a failure occurs in the DB2 pureScale instance, DB2 cluster services automatically attempts to restart the failed resources. When and where the restart occurs depends on different factors, such as the type of resource that failed and the point in the resource life cycle at which the failure occurred.

If a software or hardware failure on a host causes a DB2 member to fail, DB2 cluster services automatically restarts the member. A DB2 member is restarted either on the same host (local restart) or, if that fails, on a different host (member restart in restart light mode). Restarting a member on another host is called failover.

Member restart includes restarting failed DB2 processes and performing member crash recovery (undoing or reapplying log transactions) in order to roll back any 'in-flight' transactions and to free any locks held by them. Member restart also ensures that updated pages have been written to the CF.

When a member is restarted on a different host in restart light mode, minimal resources are used on the new host (which is the home host of another DB2 member). A member running in restart light mode does not process new transactions, because its sole purpose is to perform member crash recovery. The databases on the failed member are recovered to a point of consistency as quickly as possible. This enables other active members to access and change database objects that were locked by the abnormally terminated member. All in-flight transactions from the failed member are rolled back, and all locks that were held at the time of the abnormal termination of the member are released. Although the member does not accept new transactions, it remains available for resolution of in-doubt transactions.

When a DB2 member has failed over to a new host, the total processing capability of the whole cluster is reduced temporarily. When the home host is active and available again, the DB2 member automatically fails back to the home host and is restarted there. The cluster's processing capability is restored as soon as the DB2 member has failed back and restarted on its home host. Transactions on all other DB2 members are not affected during the failback process.
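
The restart, failover, and failback rules described in this section can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not the logic DB2 cluster services actually runs: the function names and parameters are hypothetical, and picking the first available other host is an arbitrary choice, because the text does not describe how the target host is selected.

```python
def plan_member_restart(home_host, home_host_up, local_restart_ok, other_hosts_up):
    """Return (host, mode) for restarting a failed member, or None.

    mode is "local" for a restart on the home host and "light" for a
    restart in restart light mode on another host (failover).
    """
    if home_host_up and local_restart_ok:
        return (home_host, "local")
    # Assumption: take the first available host that is not the home host.
    for host in other_hosts_up:
        if host != home_host:
            return (host, "light")
    return None


def should_fail_back(home_host, current_host, home_host_up):
    """A member running away from home fails back once its home host is up."""
    return current_host != home_host and home_host_up


def accepts_new_transactions(mode):
    # A member in restart light mode only performs member crash recovery.
    return mode == "local"
```

For instance, `plan_member_restart("hostA", True, False, ["hostB", "hostC"])` models a failed local restart followed by a restart light on `hostB`, and `should_fail_back("hostA", "hostB", True)` models the automatic failback once `hostA` is available again.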
