Combining Db2 database and OS diagnostics

Diagnosing some problems related to memory, swap files, CPU, disk storage, and other resources requires a thorough understanding of how a given operating system manages these resources. At a minimum, defining resource-related problems requires knowing how much of that resource exists, and what resource limits might exist per user. (The relevant limits are typically for the user ID of the Db2 instance owner.)

Here is some of the important configuration information that you must obtain:

  • Operating system patch level, installed software, and upgrade history
  • Number of CPUs
  • Amount of RAM
  • Swap and file cache settings
  • User data and file resource limits and per user process limit
  • IPC resource limits (message queues, shared memory segments, semaphores)
  • Type of disk storage
  • What else is the machine used for? Does your Db2 server compete for resources?
  • Where does authentication occur?

Most platforms have straightforward commands for retrieving resource information. However, you will rarely be required to obtain that information manually, since the db2support utility collects this data and much more. The detailed_system_info.html file produced by db2support (when the options -s and -m are specified) contains the syntax for many of the operating system commands used to collect this information.

The following exercises are intended to help you discover system configuration and user environment information in various Db2 diagnostic files. The first exercise familiarizes you with the steps involved in running the db2support utility. Subsequent exercises cover trap files, which provide more generated data about the Db2 server that can be useful in understanding the user environment and resource limits.

Exercise 1: Running the db2support command

  1. Start the Db2 instance with the db2start command.
  2. Assuming you already have the SAMPLE database available, create a directory for storing the output from db2support.
  3. Change to that directory and issue:
    db2support <directory> -d sample -s -m
  4. Review the console output, especially the types of information that are collected.

    You should see output like this (when run on Windows):

    ...
    Collecting "System files"
         "db2cache.prf"
         "db2cos9402136.0"
         "db2cos9402840.0"
         "db2dbamr.prf"
         "db2diag.bak"
         "db2eventlog.000"
         "db2misc.prf"
         "db2nodes.cfg"
         "db2profile.bat"
         "db2systm"
         "db2tools.prf"
         "HealthRulesV82.reg"
         "db2dasdiag.log"
         ...
    Collecting "Detailed operating system and hardware information"
    Collecting "System resource info (disk, CPU, memory)"
    Collecting "Operating system and level"
    Collecting "JDK Level"
    Collecting "Db2 Release Info"
    Collecting "Db2 install path info"
    Collecting "Registry info"
    ...
    Creating final output archive
         "db2support.html"
         "db2_sqllib_directory.txt"
         "detailed_system_info.html"
         "db2supp_system.zip"
         "dbm_detailed.supp_cfg"
         "db2diag.log"
    db2support is now complete.
     An archive file has been produced: "db2support.zip"
  5. Now use a Web browser to view the detailed_system_info.html file. On each of your systems, identify the following information:
    • Number of CPUs
    • Operating system level
    • User environment
    • User resource limits (UNIX ulimit command)

Exercise 2: Locating environment information in a Db2 trap file

  1. Ensure a Db2 instance is started, then issue
    db2pd -stack all

    The call stacks are placed in files in the diagnostic directory (as defined by the diagpath database manager configuration parameter).

  2. Locate the following in one of the trap files:
    • Db2 code level
    • Data seg top (this is the maximum private address space that has been required)
    • Cur data size (this is the maximum private address space limit)
    • Cur core size (this is the maximum core file limit)
    • Signal Handlers (this information might not appear in all trap files)
    • Environment variables (this information might not appear in all trap files)
    • map output (shows loaded libraries)

Example trap file from Windows (truncated):

...
<DB2TrapFile version="1.0">
<Trap>
<Header>
Db2 build information: Db2 v9.7.800.683 n130210 SQL09078
timestamp: 2013-03-15-10.32.37.894000
uname: S:Windows
comment: IP23428
process id: 7224
thread id: 6032
</Header>
<SystemInformation>
Number of Processors: 2
Processor Type: AMD64 Family 6 Model 44 Stepping 2
OS Version: Microsoft Windows Longhorn, Service Pack 1 (6.1)
Current Build: 7601
</SystemInformation>
<MemoryInformation>
<Usage>
Physical Memory:    8191 total,    2545 free.
Virtual Memory : 8388607 total, 8387728 free.
Paging File    :   16381 total,   11030 free.
Ext. Virtual   :       0 free.
</Usage>
</MemoryInformation>
<EnvironmentVariables>
M![CDATA[
[e] DB2PATH=D:\SQLLIB
[n] DB2INSTPROF=C:\ProgramData\IBM\DB2\db2build
[g] DB2_EXTSECURITY=YES
[g] DB2_COMMON_APP_DATA_PATH=C:\ProgramData
[g] DB2SYSTEM=JTANG
[g] DB2PATH=D:\SQLLIB
[g] DB2INSTDEF=DB2
[g] DB2ADMINSERVER=DB2DAS00
]]></EnvironmentVariables>

Correlating Db2 and system events or errors

System messages and error logs are too often ignored. You can save hours, days, and even weeks on the time it takes to solve a problem if you take the time to perform one simple task at the initial stage of problem definition and investigation. That task is to compare entries in different logs and take note of any that appear to be related both in time and in terms of what resource the entries are referring to.

While not always relevant to problem diagnosis, in many cases the best clue is readily available in the system logs. If you can correlate a reported system problem with Db2 errors, you will have often identified what is directly causing the Db2 symptom. Obvious examples are disk errors, network errors, and hardware errors. Not so obvious are problems reported on different machines, for example domain controllers which can affect connection time or authentication.

System logs can be investigated in order to assess stability, especially when problems are reported on brand new systems. Intermittent traps occurring in common applications can be a sign that there is an underlying hardware problem.

Here is some other information provided by system logs.

  • Significant events such as when the system was rebooted
  • Chronology of Db2 traps on the system (and errors, traps, or exceptions from other software that is failing)
  • Kernel panics, out-of-filesystem-space, and out-of-swap-space errors (which can prevent the system from creating or forking a new process)

System logs can help to rule out crash entries in the db2diag log files as causes for concern. If you see a crash entry in Db2 administration notification or Db2 diagnostic logs with no preceding errors, the Db2 crash recovery is likely a result of a system shutdown.

This principle of correlating information extends to logs from any source and to any identifiable user symptoms. For example, it can be very useful to identify and document correlating entries from another application's log even if you can't fully interpret them.

The summation of this information is a very complete understanding of your server and of all of the varied events which are occurring at the time of the problem.