Collecting diagnostic data for specific performance problems

Collecting diagnostic data that is as specific as possible to the performance problem helps you avoid the impact of collecting the full set of performance-related diagnostic data, which is especially important on a system that is already resource constrained. You can collect problem-specific diagnostic data for performance problems related to processor usage, memory, or database connections.

You can start collecting diagnostic data for these problems as soon as you observe one or more of the problem symptoms. To collect the data required to diagnose the problem, use the db2fodc command together with one of the -connections, -cpu, or -memory parameters.

For intermittent problems, you can also create a threshold rule with the db2fodc -detect command that detects a specific problem condition and starts the collection of diagnostic data when the trigger conditions you specify are met.

Symptoms

Depending on the specific problem that exists, you might observe some of the following symptoms on your data server:
  • For performance problems related to database connections, you might observe sudden spikes in the number of applications in the executing or compiling state, or you might find that new database connections are being denied.
  • For performance problems related to processor usage, you might observe high processor utilization rates, a high number of running processes, or high processor wait times.
  • For performance problems related to memory usage, you might observe that no free memory is available, that swap space is being used at a high rate, that excessive paging is taking place, or you might suspect a memory leak.

Diagnosing the problem

To check for application connections to the database, you can issue the db2pd -applications command. Apart from showing how many connections to the database exist, the Status field can tell you whether the applications are in the executing or compiling state. If the number of applications connected to the database continues to grow over time without ever decreasing, or if the number of application connections spikes suddenly, you might have a connection-related problem that needs to be diagnosed further.
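The two warning signs described above (steady growth and sudden spikes) can be sketched as a small check over connection counts that you have sampled at regular intervals, for example by counting the rows returned by repeated db2pd -applications calls. This is only an illustration: the function name, the labels it returns, and the spike_factor value are assumptions, not part of any Db2 tool.

```python
def connection_trend(counts, spike_factor=2.0):
    """Classify a series of connection counts sampled at regular intervals.

    spike_factor is a hypothetical tuning value: a sample that is at least
    this many times the previous sample is treated as a sudden spike.
    """
    if len(counts) < 2:
        return "steady"
    pairs = list(zip(counts, counts[1:]))
    # A sudden jump between two consecutive samples counts as a spike.
    if any(prev > 0 and cur >= spike_factor * prev for prev, cur in pairs):
        return "spike"
    # Growth that never decreases and ends higher than it started.
    if all(cur >= prev for prev, cur in pairs) and counts[-1] > counts[0]:
        return "growing"
    return "steady"


print(connection_trend([40, 42, 47, 55, 61]))   # never decreases
print(connection_trend([40, 41, 120, 60]))      # jumps from 41 to 120
print(connection_trend([40, 38, 41, 39]))       # fluctuates
```

A result of "growing" or "spike" suggests only that the connection counts are worth a closer look; it does not replace the diagnostic data that db2fodc collects.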

On UNIX and Linux® operating systems, you can check whether you have a potential problem related to processor usage by issuing the vmstat command with the appropriate parameters for the specific operating system to return the user processor usage value (us). You can also check the value returned for sy to obtain system processor usage, and combine the us and sy values to obtain the total user and system processor usage. Issue the vmstat command several times to reach a more accurate conclusion about whether the problem symptoms persist over time. If the us value is frequently greater than 90%, you might have a problem related to processor usage that needs to be diagnosed further. On Windows operating systems, you can use the db2pd -vmstat command to obtain similar information.
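As a rough illustration of checking the us value over several samples, the following sketch reads captured vmstat output, finds the us column from the header line (column positions differ between operating systems), and reports the share of samples above 90%. The function names and sample figures are assumptions made for this example, and the sketch assumes that every column in a data line is a whole number.

```python
def user_cpu_samples(vmstat_output):
    """Extract the 'us' column values from captured vmstat text."""
    us_index = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "us" in fields:
            # Header line: remember where the us column sits.
            us_index = fields.index("us")
            continue
        is_data = fields and all(f.isdigit() for f in fields)
        if us_index is not None and is_data and len(fields) > us_index:
            samples.append(int(fields[us_index]))
    return samples


def share_above(samples, threshold=90):
    """Fraction of samples strictly greater than the threshold."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > threshold) / len(samples)


# Sample text in the Linux vmstat layout; the numbers are invented.
captured = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 803212  10240 912340    0    0     5    12  110  230 95  3  2  0  0
 3  0      0 802100  10240 912344    0    0     0     8  120  250 97  2  1  0  0
 1  0      0 801950  10240 912350    0    0     0     0   90  180 40  5 55  0  0
 4  0      0 801700  10240 912352    0    0     0     4  130  260 93  4  3  0  0
"""
values = user_cpu_samples(captured)
print(values, share_above(values))
```

On Linux, the first data line of vmstat output reports averages since the last reboot, so you might want to discard the first sample before judging whether usage is frequently high.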

To check whether you have a potential memory-related problem, you can issue the appropriate operating system command that returns the used swap space value (on Linux operating systems, for example, you can use the free command). If the used swap space value is consistently greater than 90%, you might have a memory-related performance problem.
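The swap check can be sketched as follows for Linux, assuming the output of the free command was captured without the human-readable option, so that the Swap: line carries plain numbers in the order total, used, free. The function name and the sample figures are illustrative assumptions.

```python
def swap_used_percent(free_output):
    """Return used swap as a percentage of total swap from `free` text."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            total, used = int(fields[1]), int(fields[2])
            # A system without swap configured reports a total of 0.
            return 100.0 * used / total if total else 0.0
    raise ValueError("no Swap: line found in the supplied text")


# Sample text in the layout printed by the Linux free command;
# the numbers are invented for this example.
captured_free = """\
              total        used        free      shared  buff/cache   available
Mem:        8000000     3000000     1000000      100000     4000000     4500000
Swap:       2000000     1900000      100000
"""
print(swap_used_percent(captured_free))
```

A single reading above 90% is not conclusive; the guidance above refers to a value that stays above 90% consistently, so repeat the check over time.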

Resolving the problem

The following examples use the db2fodc command together with the -connections, -cpu, -memory, and -detect parameters. You can run these commands yourself, but be aware of the potential impact: on a very resource-constrained system, even very problem-specific diagnostic data collection can place additional demands on the system that might not be acceptable. It is preferable to run these commands under the guidance of IBM support. Typically, the collected data is sent to IBM technical support for analysis.
  • To collect diagnostic data for problems related to database connections, issue the following command:
    db2fodc -connections
    This command collects connection-related diagnostic data and places it in a FODC_Connections_timestamp_member directory, where timestamp is the time when the db2fodc -connections command was executed and member is the member or members the collection was performed for.
  • To collect diagnostic data for problems related to processor usage when you are already observing related problem symptoms, you can issue the following command:
    db2fodc -cpu
    The db2fodc -cpu command collects processor-related diagnostic data and places it in a FODC_Cpu_timestamp_member directory, where timestamp is the time when the db2fodc -cpu command was executed and member is the member or members the collection was performed for. As an alternative, when the problem is intermittent or if you want to configure your system ahead of time to collect diagnostic data when a specific problem condition exists, you can issue a variation of the following command:
    db2fodc -cpu basic -detect us">=90" rqueue"<=1000" condition="and" interval="2" sleeptime="500" triggercount="3" iteration="4" duration="500" -member all
    The -detect parameter with the threshold rules specified delays the collection of processor-related information until the trigger conditions specified by the threshold rule are detected. You can specify your own rules for the -detect parameter to determine when to start diagnostic data collection. In the previous example, the conditions for user processor usage and the run queue must both be met three times (triggercount="3"), checked at 2-second intervals (interval="2"). This means that the trigger conditions must exist for 6 seconds in total to trigger diagnostic data collection on all members (a trigger count of 3 x 2-second intervals = a trigger condition that must exist for 6 seconds). The iteration option specifies that trigger condition detection followed by diagnostic data collection is performed up to four times (iteration="4"), with a sleep time of 500 seconds between each iteration.
  • To collect diagnostic data for problems related to memory usage when you are already observing related problem symptoms, you can issue the following command:
    db2fodc -memory
    The db2fodc -memory command collects memory-related diagnostic data and places it in a FODC_Memory_timestamp_member directory, where timestamp is the time when the db2fodc -memory command was executed and member is the member or members the collection was performed for. As an alternative, when the problem is intermittent or if you want to configure your system ahead of time to collect diagnostic data when a specific problem condition exists, you can issue a variation of the following command:
    db2fodc -memory basic -detect free"<=10" connections">=1000" sleeptime="1" iteration="10" interval="10" triggercount="4" duration="5" -member 3
    The -detect parameter with the rules specified delays collection until the conditions specified by the rules are detected. In this example, the trigger conditions for free memory and the number of connections to the database must exist for 40 seconds to trigger diagnostic data collection on member 3 (a trigger count of 4 x 10-second intervals = 40 seconds total). Up to ten iterations of detection and diagnostic data collection can be performed, with detection enabled over a duration of 5 hours.
  • To detect trigger conditions without collecting diagnostic data, use the db2fodc -detect command together with the nocollect option. An entry is logged in the db2diag log files each time the specified problem condition is detected as a threshold hit. If you choose not to collect diagnostic data, you must use the db2diag log entry to determine whether a problem condition that you created a threshold rule for was detected and to see the details of each threshold hit.
    2011-05-31-15.53.42.639270-240 I2341E790 LEVEL: Event
    PID : 13279 TID : 47188095859472 PROC : db2fodc
    INSTANCE: kunxu NODE : 000
    FUNCTION: Db2, RAS/PD component, pdFodcDetectAndRunCollection, probe:100
    CHANGE :
    
    Hostname: hotel36 Member(s): 0 Iteration: 0
    Thresholds hit 0: cs(0)>=0 us(0)<=50 rQueue(0)>=0 free(7980936)>=100 avm(7297432)>=1 swapused(0)>=0 UOW_Executing(4)>=3
    Thresholds hit 1: cs(0)>=0 us(0)<=50 rQueue(0)>=0 free(7981000)>=100 avm(7297528)>=1 swapused(0)>=0 UOW_Executing(4)>=3
    Thresholds hit 2: cs(0)>=0 us(0)<=50 rQueue(0)>=0 bQueue(1)>=1 free(7981420)>=100 avm(7297596)>=1 swapused(0)>=0 UOW_Executing(4)>=3
    Thresholds hit 3: cs(0)>=0 us(0)<=50 rQueue(0)>=0 free(7981436)>=100 avm(7297668)>=1 swapused(0)>=0 UOW_Executing(4)>=3
    In this example, the problem condition specified is detected as a threshold hit four times on member 0. Both the values you specified for each threshold rule and the actual values detected are logged.
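If you need to review many such entries, the Thresholds hit lines can be read programmatically. The following sketch assumes the line layout shown in the sample entry above, where each rule appears as name(actual value) followed by a comparison operator and the value you specified. The function name and the regular expressions are illustrative assumptions, not part of any Db2 tool, and other Db2 versions might format these lines differently.

```python
import re

# Matches "Thresholds hit <n>: <rules...>" as in the sample log entry.
HIT_LINE = re.compile(r"Thresholds hit (\d+):\s*(.*)")
# Matches one rule such as "free(7980936)>=100".
RULE = re.compile(r"(\w+)\((\d+)\)(>=|<=|>|<)(\d+)")


def parse_threshold_hits(entry):
    """Map each hit number to a list of (name, actual, operator, specified)."""
    hits = {}
    for line in entry.splitlines():
        match = HIT_LINE.search(line)
        if match:
            hits[int(match.group(1))] = [
                (name, int(actual), op, int(specified))
                for name, actual, op, specified in RULE.findall(match.group(2))
            ]
    return hits


# Two lines copied from the sample entry above.
sample_entry = """\
Hostname: hotel36 Member(s): 0 Iteration: 0
Thresholds hit 0: cs(0)>=0 us(0)<=50 rQueue(0)>=0 free(7980936)>=100 avm(7297432)>=1 swapused(0)>=0 UOW_Executing(4)>=3
Thresholds hit 2: cs(0)>=0 us(0)<=50 rQueue(0)>=0 bQueue(1)>=1 free(7981420)>=100 avm(7297596)>=1 swapused(0)>=0 UOW_Executing(4)>=3
"""
for number, rules in parse_threshold_hits(sample_entry).items():
    print(number, rules)
```

Each tuple keeps the detected value next to the value from the threshold rule, which mirrors how both are logged in the entry.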