Problem Determination: Traps
Traps are a specific type of crash, a term used to broadly describe any situation in which Db2® stops after encountering an unexpected condition. A trap occurs when a thread receives a signal or exception as the result of an instruction that cannot be executed by the operating system. For example, invalid memory access or a stack overflow could cause a trap. A trap is a specific term that should not be confused with "panic", "shutdown", "stop", or the more generic term "crash".
A trap can fall into two categories:
- "Instance crash," in which the entire database manager (DBM) shuts down. When an instance crash occurs, all database connections are terminated, processing halts completely, and the Db2 engine processes disappear. If db2start needs to be run, you are experiencing an instance crash.
- "Database crash," in which only a specific database shuts down. All connections to that database are terminated, but the Db2 engine and other databases continue functioning normally.
- SQL1224N - The database manager refuses new requests, has terminated all requests in progress or has terminated your particular request due to a problem with it.
- SQL1032N - No start database manager command was issued.
$ db2 "connect to sample"
SQL authorization ID = DB2INST1
Local database alias = SAMPLE
$ db2 "alter tablespace IBMDB2SAMPLEREL managed by automatic storage"
DB20000I The SQL command completed successfully.
$ db2 "alter tablespace IBMDB2SAMPLEREL lower high water mark"
DB20000I The SQL command completed successfully.
$ db2 "list tables"
SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032
Determining the place of failure
- Program executable files, usually containing basic functionality.
- Application libraries, which make up the majority of the code.
Db2 executables and libraries call C runtime libraries, the helper libraries that allow the developer to utilize standard functionality and APIs. This code is not owned or created by Db2 development.
Each component (executable or library) is contained within its own address space, assigned by the operating system.
To determine the place of failure, you need to know the name of the executable or library you are executing in, the address range in which that executable or library has been loaded, and the offset at which you are executing, relative to the beginning of that executable or library.
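The lookup described above can be sketched in a few lines of Python. This is only an illustration: the `Mapping` type, the `locate` helper, and the load map values are hypothetical and are not taken from a real trap file.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Mapping(NamedTuple):
    name: str   # executable or library name
    start: int  # first address of the loaded image
    end: int    # one past the last address of the loaded image


def locate(address: int, mappings: Sequence[Mapping]) -> Optional[Tuple[str, int]]:
    """Return (name, offset) for the mapping that contains address, or None."""
    for m in mappings:
        if m.start <= address < m.end:
            # The offset is measured from the start of the loaded image.
            return m.name, address - m.start
    return None


# Hypothetical load map; the names and addresses are made up for illustration.
load_map = [
    Mapping("db2sysc", 0x0000000000400000, 0x0000000000500000),
    Mapping("libdb2e.so.1", 0x00007F0000000000, 0x00007F0004000000),
]

hit = locate(0x00007F0000123456, load_map)
print(hit[0], hex(hit[1]))  # libdb2e.so.1 0x123456
```

The name and offset pair is what stays stable between runs, because the load address of a library can differ from one process to the next.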
Trap files
A trap file is a "snapshot" of a Db2 EDU. The file reflects the Db2 EDU state at the moment the trap file is generated. A trap file will be generated automatically if an exception occurs and processing is forced to stop.
<pid>.<eduid>.<node>.trap.txt
<pid>.<tid>.trap.bin
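When many trap files accumulate, it can help to split their names back into the fields shown in the two patterns above. The sketch below assumes that every field is a decimal number, which the patterns themselves do not guarantee; `parse_trap_name` is a hypothetical helper, not a Db2 tool.

```python
import re

# Patterns mirror <pid>.<eduid>.<node>.trap.txt and <pid>.<tid>.trap.bin.
# Assumption: every field consists of decimal digits only.
TXT_PATTERN = re.compile(r"^(?P<pid>\d+)\.(?P<eduid>\d+)\.(?P<node>\d+)\.trap\.txt$")
BIN_PATTERN = re.compile(r"^(?P<pid>\d+)\.(?P<tid>\d+)\.trap\.bin$")


def parse_trap_name(filename):
    """Return the numeric fields of a trap file name, or None if it does not match."""
    for kind, pattern in (("txt", TXT_PATTERN), ("bin", BIN_PATTERN)):
        match = pattern.match(filename)
        if match:
            fields = {key: int(value) for key, value in match.groupdict().items()}
            fields["kind"] = kind
            return fields
    return None


# Made-up file names, used only to exercise the parser.
print(parse_trap_name("12345.67.000.trap.txt"))
print(parse_trap_name("12345.678.trap.bin"))
```

Grouping the results by `pid` then shows which trap files belong to the same process.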
Trap files include, among other things:
- Db2 build date
- Db2 version number
- Operating system version
- Time of the dump
- Signal or exception which generated the dump
- Process and thread ID
- Loaded libraries (commonly referred to as the “map”)
- Address of signal handlers
- Register dumps
- Detailed call stack
- A dump of the operating system memory sets
- Latch information for the EDU
- Locks being waited on
- Assembly code dump
Pay special attention to the signal or exception which generated the dump, process and thread ID, register dumps, and call stack.
Reading the call stack
0 strcpy – Function on top of stack
1 sqlbObjFileName – Potential problematic function to begin searching on
2 sqlbSMSOpenContainer
3 sqlbSMSGetOpenInfo
4 sqlbSMSDeleteObject
5 sqldDropObj
6 sqldDropTable
7 sqlbPFPrefetcherEntryPoint
8 sqloCreateEDU
9 sqloRunGDS
10 sqloInitEDUServices
11 sqloRunInstance
12 DB2main
13 main
The most recent call on this stack is strcpy, indicating that the problem occurs here. In this case, the issue is most likely tied to the caller directly below it, sqlbObjFileName.
- UNIX/Linux
UNIX/Linux Signal ID | Description |
---|---|
SIGILL(4), SIGFPE(8), SIGTRAP(5), SIGBUS(10, Linux: 7), SIGSEGV(11), SIGKILL(9) | Instance trap. Bad programming, HW errors, invalid memory access, stack and heap collisions, problems with vendor libraries, OS problems. The instance shuts down. |
- Windows
Windows Exception | Description |
---|---|
ACCESS_VIOLATION - (0xC0000005), ILLEGAL_INSTRUCTION - (0xC000001D), INTEGER_DIVIDE_BY_ZERO - (0xC0000094), PRIVILEGED_INSTRUCTION - (0xC0000096), STACK_OVERFLOW - (0xC00000FD) | Instance trap. Bad programming, HW errors, invalid memory access, stack overflows, problems with vendor libraries, OS problems. The instance shuts down. |
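The UNIX/Linux signal numbers listed above can be captured in a small lookup, which is handy when a trap file or log reports only the number. This is a minimal sketch: `trap_signal_name` is a hypothetical helper, and it knows only the numbers given in the table, including the different SIGBUS value on Linux.

```python
# Signal numbers exactly as listed in the UNIX/Linux table above.
# SIGBUS is handled separately because its number is 7 on Linux and 10 elsewhere.
TRAP_SIGNALS = {
    4: "SIGILL",
    5: "SIGTRAP",
    8: "SIGFPE",
    9: "SIGKILL",
    11: "SIGSEGV",
}


def trap_signal_name(number, linux=True):
    """Return the signal name if the table lists it as an instance trap, else None."""
    sigbus_number = 7 if linux else 10
    if number == sigbus_number:
        return "SIGBUS"
    return TRAP_SIGNALS.get(number)


print(trap_signal_name(11))               # SIGSEGV
print(trap_signal_name(7))                # SIGBUS
print(trap_signal_name(10, linux=False))  # SIGBUS
print(trap_signal_name(15))               # None
```

A result of `None` means only that the table does not list the signal.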
Prefixes
Prefix | Description |
---|---|
sql, squ | Backup and Restore |
sqb | Buffer Pool Services: buffer pools, data storage management, table spaces, containers, I/O, prefetching, page cleaning |
sqf | Configuration - database, database manager, configuration settings |
sqd, sqdx, sqdl | Data Management Services: tables, records, long field and lob columns, REORG TABLE utility |
sqp, sqdz | Data Protection Services: logging, crash recovery, rollforward |
hdr | High Availability Disaster Recovery (HADR) |
sqx | Index Manager |
sqrl | Catalog Cache and Catalog Services |
sqng | Code Generation (SQL Compiler) |
squ, sqi, squs, sqs | Load, Sort, Import, Export |
sqpl | Locking |
sqno, sqnx, sqdes | Optimizer |
sqo, sqz, oss | Operating System Services: AIX, Linux |
Core files
For UNIX-based systems, when Db2 terminates abnormally, the operating system generates a core file. The core file includes most or all Db2 memory allocations, which you may need for problem analysis. By default, Db2 core files are located in the following path: $HOME/sqllib/db2dump/<core_directory>
If the core file ulimit parameter is set to unlimited, Db2 will override this with a smaller number unless instructed otherwise (with DB2FODC). This behavior prevents filling up the file system if an outage happens and a core needs to be generated.
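To see what the core file ulimit looks like from a process's point of view, the standard `resource` module can be queried on UNIX-like systems. The sketch below reports the limit of the Python process that runs it, not of the Db2 engine, and it does not reflect any override that Db2 applies; `describe_core_limit` is a hypothetical helper.

```python
import resource  # available on UNIX-like systems only


def describe_core_limit():
    """Return the (soft, hard) core file size limits of this process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value):
        # RLIM_INFINITY is how the operating system reports "unlimited".
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return fmt(soft), fmt(hard)


soft_limit, hard_limit = describe_core_limit()
print("core soft limit:", soft_limit)
print("core hard limit:", hard_limit)
```

Run under the instance owner's login shell, this shows the limit that shell would pass on to processes it starts.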
For Windows systems, the core file is called a process (mini) dump. Process dumps can be configured at the operating system level or by using advanced debug techniques (such as ADPlus, WinDbg, or Userdump).
DB2SLEEP
For problems you can reproduce, it can be useful to “freeze” the instance while the problem is occurring. Set DB2SLEEP with the db2set command to achieve this effect:
db2set DB2SLEEP=ON
To wake up the frozen instance afterward:
db2pdcfg -wakeupinstance
When enabled, DB2SLEEP suspends the instance after creating the FODC package, meaning the problematic process/EDU will still exist. When the instance is frozen using DB2SLEEP, you will not be able to execute any SQL commands or establish new connections to the database. However, you can collect snapshots with the db2pd command, or attach to the sleeping instance using a debugger.
Debuggers
Action | dbx | gdb | Windbg |
---|---|---|---|
Attach to process | dbx [-a pid] prog [core] | gdb [prog [core \| procID]] | windbg [-p pid \| -z core \| prog] |
Call stack | where | bt, where | kb, kp, kd |
Registers | registers | info registers | r |
Loaded libraries | map | info sharedlibrary | lm |
Running threads | thread | info threads | ~ |
Switch thread | thread <tid> | thread <tid> | ~<tid>s |
Switch frame | frame <n> | frame <n> | .frame <n> |
Examine memory | x <addr>/<fmt> | x/<fmt> <addr> | dw, db, dc <addr> |
Disassemble | listi <addr> | disas <addr> | u <addr> |
Print expression | print <exp> | print <exp> | ? <exp> |