Problem Determination: Traps

Traps are a specific type of crash, a term used to broadly describe any situation in which Db2® stops after encountering an unexpected condition. A trap occurs when a thread receives a signal or exception as the result of an instruction that cannot be executed by the operating system. For example, invalid memory access or a stack overflow could cause a trap. A trap is a specific term that should not be confused with "panic", "shutdown", "stop", or the more generic term "crash".

A trap falls into one of two categories:

  1. "Instance crash," in which the entire database manager (DBM) shuts down. When an instance crash occurs, all database connections are terminated, processing halts completely, and the Db2 engine processes disappear. If db2start needs to be run, you are experiencing an instance crash.
  2. "Database crash," in which only a specific database shuts down. All connections to that database are terminated, but the Db2 engine and other databases continue functioning normally.
Common errors associated with database crashes include:
  • SQL1224N - The database manager refuses new requests, has terminated all requests in progress or has terminated your particular request due to a problem with it.
  • SQL1032N - No start database manager command was issued.
The following example caused a trap in older versions of Db2:
$ db2 "connect to sample"
SQL authorization ID = DB2INST1
Local database alias = SAMPLE
$ db2 "alter tablespace IBMDB2SAMPLEREL managed by automatic storage"
DB20000I The SQL command completed successfully.

$ db2 "alter tablespace IBMDB2SAMPLEREL lower high water mark"
DB20000I The SQL command completed successfully.

$ db2 "list tables"
SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032

Determining the place of failure

Db2 consists of two main component areas:
  1. Program executable files, usually containing basic functionality.
  2. Application libraries, which make up the majority of the code.

Db2 executables and libraries call C runtime libraries, the helper libraries that allow the developer to utilize standard functionality and APIs. This code is not owned or created by Db2 development.

Each component (executable or library) is contained within its own address space, assigned by the operating system.

To determine the place of failure, you need three pieces of information: the name of the executable or library in which the thread was executing, the address range at which that executable or library was loaded, and the offset of the failing instruction relative to the start of that executable or library.
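
The lookup described above can be sketched in Python. This is a minimal illustration, assuming the loaded-library map has already been parsed into (name, start, end) entries. The function name locate_address, the component names, and all addresses are made up for the example and do not come from a real trap file.

```python
from typing import List, Optional, Tuple

# One entry per loaded executable or library: (name, start address, end address).
LibraryMap = List[Tuple[str, int, int]]


def locate_address(address: int, library_map: LibraryMap) -> Optional[Tuple[str, int]]:
    """Return (component name, offset from its load address), or None if unmapped."""
    for name, start, end in library_map:
        # Ranges are treated as half-open: start <= address < end.
        if start <= address < end:
            return name, address - start
    return None


# Hypothetical map; the names and addresses are assumptions for illustration only.
example_map: LibraryMap = [
    ("db2sysc", 0x0000000000400000, 0x0000000000410000),
    ("libdb2e.so.1", 0x00007F0000000000, 0x00007F0004000000),
    ("libc.so.6", 0x00007F0010000000, 0x00007F0010200000),
]

hit = locate_address(0x00007F0000123456, example_map)
if hit is not None:
    component, offset = hit
    print(f"{component} + 0x{offset:x}")  # prints: libdb2e.so.1 + 0x123456
```

The offset is what stays stable between runs: the load address of a library can change each time the process starts, but the offset of an instruction within the library does not.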

Trap files

A trap file is a "snapshot" of a Db2 engine dispatchable unit (EDU). The file reflects the state of the EDU at the moment the trap file is generated. Db2 generates a trap file automatically when an exception occurs and processing is forced to stop.

On Unix/Linux, trap files are text-based and use the following naming convention:
<pid>.<eduid>.<node>.trap.txt 
On Windows, trap files are binary and can be converted to text with the formatting utility (db2xprt). Windows trap files use the following naming convention:
<pid>.<tid>.trap.bin
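
When collecting many trap files, it can help to split the file names into their parts. The following Python sketch follows the two naming conventions above. It assumes every component is a run of decimal digits, which the conventions do not state explicitly; the function name parse_trap_name and the sample file names are hypothetical.

```python
import re
from typing import Dict, Optional

# Patterns mirror <pid>.<eduid>.<node>.trap.txt and <pid>.<tid>.trap.bin.
# Assumption: each component consists of decimal digits only.
UNIX_TRAP = re.compile(r"^(?P<pid>\d+)\.(?P<eduid>\d+)\.(?P<node>\d+)\.trap\.txt$")
WINDOWS_TRAP = re.compile(r"^(?P<pid>\d+)\.(?P<tid>\d+)\.trap\.bin$")


def parse_trap_name(filename: str) -> Optional[Dict[str, str]]:
    """Split a trap file name into its components, or return None if it does not match."""
    for platform, pattern in (("unix", UNIX_TRAP), ("windows", WINDOWS_TRAP)):
        match = pattern.match(filename)
        if match:
            fields = match.groupdict()
            fields["platform"] = platform
            return fields
    return None


# Hypothetical file names for illustration.
print(parse_trap_name("12345.678.000.trap.txt"))
print(parse_trap_name("4321.8765.trap.bin"))
print(parse_trap_name("db2diag.log"))
```

Values are kept as strings so that leading zeros in the node number are preserved.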

Trap files contain, among other things:

  • Db2 build date
  • Db2 version number
  • Operating system version
  • Time of the dump
  • Signal or exception which generated the dump
  • Process and thread ID
  • Loaded libraries (commonly referred to as the “map”)
  • Address of signal handlers
  • Register dumps
  • Detailed call stack
  • A dump of the operating system memory sets
  • Latch information for the EDU
  • Locks being waited on
  • Assembly code dump

Pay special attention to the signal or exception which generated the dump, process and thread ID, register dumps, and call stack.

Reading the call stack

A call stack is a sequence of Db2 function or routine calls, leading to the moment of the outage. Reading the stack from the bottom up can help you determine where the trap occurred. Here’s an example call stack:
0 strcpy – Function on top of stack
1 sqlbObjFileName – Potential problematic function to begin searching on
2 sqlbSMSOpenContainer
3 sqlbSMSGetOpenInfo
4 sqlbSMSDeleteObject
5 sqldDropObj
6 sqldDropTable
7 sqlbPFPrefetcherEntryPoint
8 sqloCreateEDU
9 sqloRunGDS
10 sqloInitEDUServices
11 sqloRunInstance
12 DB2main
13 main

The most recent call on this stack is strcpy, so the trap occurred there. Because strcpy is a C runtime function rather than Db2 code, the issue is most likely tied to its caller, sqlbObjFileName, the frame directly below it.
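
The same reasoning can be sketched in Python: walk the stack from the most recent call and stop at the first frame that is not a C runtime function. This is only an illustration. The set of runtime function names is an assumed, incomplete list, and the helper name first_db2_frame is hypothetical.

```python
from typing import List, Optional

# Assumed, incomplete set of C runtime function names; extend as needed.
C_RUNTIME_FUNCTIONS = {"strcpy", "strlen", "memcpy", "memset", "malloc", "free"}


def first_db2_frame(stack: List[str]) -> Optional[str]:
    """Return the most recent frame that is not a known C runtime function.

    The stack is ordered as in the trap file: index 0 is the most recent call.
    """
    for function in stack:
        if function not in C_RUNTIME_FUNCTIONS:
            return function
    return None


# The first frames of the example call stack, most recent call first.
stack = [
    "strcpy",
    "sqlbObjFileName",
    "sqlbSMSOpenContainer",
    "sqlbSMSGetOpenInfo",
    "sqlbSMSDeleteObject",
]

print(first_db2_frame(stack))  # prints: sqlbObjFileName
```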

Common Trap Signals/Exceptions

Unix/Linux

  • UNIX/Linux signal IDs: SIGILL(4), SIGFPE(8), SIGTRAP(5), SIGBUS(10, Linux: 7), SIGSEGV(11), SIGKILL(9)
  • Description: Instance trap. Bad programming, HW errors, invalid memory access, stack and heap collisions, problems with vendor libraries, OS problems. The instance shuts down.

Windows

  • Windows exceptions: ACCESS_VIOLATION (0xC0000005), ILLEGAL_INSTRUCTION (0xC000001D), INTEGER_DIVIDE_BY_ZERO (0xC0000094), PRIVILEGED_INSTRUCTION (0xC0000096), STACK_OVERFLOW (0xC00000FD)
  • Description: Instance trap. Bad programming, HW errors, invalid memory access, stack overflows, problems with vendor libraries, OS problems. The instance shuts down.
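
When scanning trap files from several platforms, a small lookup table saves repeated reference to the lists above. The Python sketch below contains only the signal numbers and exception codes given in this section; the function names are hypothetical, and any number not listed here returns None rather than a guess.

```python
from typing import Optional

# Signal numbers as listed above (UNIX numbering, with SIGBUS as 10).
UNIX_SIGNALS = {
    4: "SIGILL",
    5: "SIGTRAP",
    8: "SIGFPE",
    9: "SIGKILL",
    10: "SIGBUS",
    11: "SIGSEGV",
}

# Linux numbers SIGBUS as 7, so 10 does not mean SIGBUS there.
LINUX_SIGNALS = {number: name for number, name in UNIX_SIGNALS.items() if name != "SIGBUS"}
LINUX_SIGNALS[7] = "SIGBUS"

# Windows exception codes as listed above.
WINDOWS_EXCEPTIONS = {
    0xC0000005: "ACCESS_VIOLATION",
    0xC000001D: "ILLEGAL_INSTRUCTION",
    0xC0000094: "INTEGER_DIVIDE_BY_ZERO",
    0xC0000096: "PRIVILEGED_INSTRUCTION",
    0xC00000FD: "STACK_OVERFLOW",
}


def trap_signal_name(number: int, linux: bool = False) -> Optional[str]:
    """Return the trap signal name for a signal number, or None if it is not listed."""
    table = LINUX_SIGNALS if linux else UNIX_SIGNALS
    return table.get(number)


def trap_exception_name(code: int) -> Optional[str]:
    """Return the Windows exception name for a code, or None if it is not listed."""
    return WINDOWS_EXCEPTIONS.get(code)


print(trap_signal_name(11))              # prints: SIGSEGV
print(trap_signal_name(7, linux=True))   # prints: SIGBUS
print(trap_exception_name(0xC0000005))   # prints: ACCESS_VIOLATION
```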

Prefixes

Db2 routines use a prefix that points to the area the routine belongs to:
Prefix               Description
sql, squ             Backup and Restore
sqb                  Buffer Pool Services: buffer pools, data storage management, table spaces, containers, I/O, prefetching, page cleaning
sqf                  Configuration - database, database manager, configuration settings
sqd, sqdx, sqdl      Data Management Services: tables, records, long field and lob columns, REORG TABLE utility
sqp, sqdz            Data Protection Services: logging, crash recovery, rollforward
hdr                  High Availability Disaster Recovery (HADR)
sqx                  Index Manager
sqrl                 Catalog Cache and Catalog Services
sqng                 Code Generation (SQL Compiler)
squ, sqi, squs, sqs  Load, Sort, Import, Export
sqpl                 Locking
sqno, sqnx, sqdes    Optimizer
sqo, sqz, oss        Operating System Services: AIX, Linux

Core files

On UNIX-based systems, the operating system generates a core file when Db2 terminates abnormally. The core file includes most or all Db2 memory allocations, which you may need for problem analysis. By default, Db2 core files are located in the following path: $HOME/sqllib/db2dump/<core_directory>.

If the core file ulimit parameter is set to unlimited, Db2 will override this with a smaller number unless instructed otherwise (with DB2FODC). This behavior prevents filling up the file system if an outage happens and a core needs to be generated.
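
To see which core file limit a shell environment hands to the processes it starts, the limit can be read with Python's standard resource module (UNIX only). This is a sketch with a hypothetical helper name, describe_core_limit. It reports the limit of the Python process itself, so it is only meaningful when run in the same environment as the instance owner, and it does not show any override that Db2 applies internally.

```python
import resource


def describe_core_limit() -> str:
    """Describe the soft and hard core file size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value: int) -> str:
        # RLIM_INFINITY is the value that "ulimit -c unlimited" produces.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return f"soft={fmt(soft)}, hard={fmt(hard)}"


print(describe_core_limit())
```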

On Windows systems, the equivalent of a core file is called a process (mini) dump. Process dumps can be configured at the operating system level or by using advanced debugging tools (such as ADPlus, WinDbg, or Userdump).

DB2SLEEP

For problems you can reproduce, it can be useful to “freeze” the instance while the problem is occurring. Set the DB2SLEEP registry variable to achieve this effect:
db2set DB2SLEEP=ON
To resume the sleeping instance, use this command:
db2pdcfg -wakeupinstance

When enabled, DB2SLEEP suspends the instance after creating the FODC package, meaning the problematic process/EDU will still exist. When the instance is frozen using DB2SLEEP, you will not be able to execute any SQL commands or establish new connections to the database. However, you can collect snapshots with the db2pd command, or attach to the sleeping instance using a debugger.

Debuggers

These frequently used debug commands can help you with Db2 issues:
Action             dbx                       gdb                       WinDbg
Attach to process  dbx [-a pid] prog [core]  gdb [prog [core|procID]]  windbg [-p pid | -z core | prog]
Call stack         where                     bt, where                 kb, kp, kd
Registers          registers                 info registers            r
Loaded libraries   map                       info sharedlibrary        lm
Running threads    thread                    info threads              ~
Switch thread      thread <tid>              thread <tid>              ~ <tid>
Switch frame       frame <n>                 frame <n>                 .frame <n>
Examine memory     x <addr>/<fmt>            x/<fmt> <addr>            dw, db, dc <addr>
Disassemble        listi <addr>              disas <addr>              u <addr>
Print expression   print <exp>               print <exp>               ? <exp>