Troubleshooting
Problem
Symptom
The key entries in db2diag.log look like this:
-------------------------------------------------------------------------------------------------------------------------------------------------------
2010-10-09-01.06.16.347313+660 E13087E552 LEVEL: Severe
PID : 1120 TID : 46912711420224 PROC : db2wdog 0
INSTANCE: db2inst1 NODE : 000
EDUID : 2 EDUNAME: db2wdog 0
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:20
MESSAGE : ADM0503C An unexpected internal processing error has occurred.
ALL DB2 PROCESSES ASSOCIATED WITH THIS INSTANCE HAVE BEEN SHUTDOWN.
Diagnostic information has been recorded. Contact IBM Support for further assistance.
2010-10-09-01.06.17.119332+660 E13640E422 LEVEL: Error
PID : 1120 TID : 46912711420224 PROC : db2wdog 0
INSTANCE: db2inst1 NODE : 000
EDUID : 2 EDUNAME: db2wdog 0
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:21
DATA #1 : Process ID, 4 bytes
1122
DATA #2 : Hexdump, 8 bytes
0x00002AAAB77FC378 : 0201 0000 0900 0000
-------------------------------------------------------------------------------------------------------------------------------------------------------
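In a large db2diag.log, whose location is set by the DIAGPATH database manager configuration parameter, entries like these can be hard to find by eye. The following minimal Python sketch lists the timestamp and level of every record that mentions ADM0503C. It assumes, based only on the excerpt above, that each record starts with a timestamp header line containing "LEVEL:". The helper names are hypothetical and are not part of any DB2 tool.

```python
import re
import sys

# Assumed record layout (taken from the excerpt): each record begins with a
# header such as "2010-10-09-01.06.16.347313+660 E13087E552 LEVEL: Severe".
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d+[+-]\d+)\s+\S+\s+LEVEL:\s*(\w+)",
    re.MULTILINE,
)


def split_records(text):
    """Split db2diag.log text into (timestamp, level, record_text) tuples."""
    matches = list(HEADER.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # A record runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append((m.group(1), m.group(2), text[m.start():end]))
    return records


def find_message(text, msg_id="ADM0503C"):
    """Return (timestamp, level) for every record whose text contains msg_id."""
    return [(ts, level) for ts, level, body in split_records(text) if msg_id in body]


def main(argv):
    if len(argv) != 2:
        print("usage: find_adm0503c.py PATH_TO_db2diag.log")
        return
    path = argv[1]
    try:
        with open(path, errors="replace") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    for ts, level in find_message(text):
        print(f"{ts} {level}: ADM0503C recorded")


if __name__ == "__main__":
    main(sys.argv)
```

The timestamps this reports can then be compared with the time of any OOM-killer messages in the operating system log.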
Cause
The Linux kernel handles memory exhaustion with the OOM (Out-Of-Memory) killer. When invoked, the OOM killer terminates processes to free enough memory to keep the system operational. In this scenario, the OOM killer terminated process 1126 (db2sysc), and the DB2 watchdog (db2wdog) then shut down all DB2 processes for the instance, as recorded in db2diag.log. The OOM killer is invoked when all available memory, including swap space, has been allocated; this condition can be verified with the free command.
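The free command reports figures read from /proc/meminfo. The following minimal Python sketch performs a similar check. Its "exhausted" test is an assumed heuristic: available RAM is below an arbitrary 100 MiB threshold (low_mem_kb) and no swap is free. This is not a rule the kernel uses. Because free and /proc/meminfo show only the current state, they confirm the condition only while it persists. After the OOM killer has released memory, the kernel log is the more reliable evidence.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_and_swap_exhausted(info, low_mem_kb=100 * 1024):
    """Assumed heuristic: RAM is nearly gone and no swap space is free."""
    # MemAvailable exists only on newer kernels (3.14+); fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return available < low_mem_kb and info.get("SwapFree", 0) == 0


def main(path="/proc/meminfo"):
    try:
        with open(path) as f:
            info = parse_meminfo(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    available = info.get("MemAvailable", info.get("MemFree", 0))
    print(f"available: {available} kB, "
          f"swap free: {info.get('SwapFree', 0)} of {info.get('SwapTotal', 0)} kB")
    if memory_and_swap_exhausted(info):
        print("memory and swap appear to be exhausted")


if __name__ == "__main__":
    main()
```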
Environment
- This issue occurs only for DB2 running on supported Linux platforms.
Diagnosing The Problem
Evidence of the OOM killer can be found in the operating system log /var/log/messages or in the output of the dmesg command. An out-of-memory condition means that all available memory, including swap space, has been allocated.
The following is an example excerpt from /var/log/messages:
-------------------------------------------------------------------------------------------------------------------------------------------------------
Oct 9 01:06:09 lqportdb1 kernel: db2sysc invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Oct 9 01:06:09 lqportdb1 kernel: Call Trace: <ffffffff8015d94e>{oom_kill_process+87}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8015dd82>{out_of_memory+299} <ffffffff8015f96b>{__alloc_pages+600}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff801612b8>{__do_page_cache_readahead+265} <ffffffff80137446>{del_timer_sync+12}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff802e4431>{schedule_timeout+146} <ffffffff88012c00>{:dm_mod:dm_any_congested+61}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8015d079>{filemap_nopage+336} <ffffffff8016b068>{__handle_mm_fault+830}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff801455d8>{lock_hrtimer_base+37} <ffffffff802e78bf>{do_page_fault+2919}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff802e4870>{schedule_hrtimer+49} <ffffffff8014580d>{hrtimer_nanosleep+130}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8010a883>{error_exit+0}
.
.
Oct 9 01:06:09 lqportdb1 kernel: Free swap = 0kB
Oct 9 01:06:09 lqportdb1 kernel: Total swap = 4194296kB
Oct 9 01:06:09 lqportdb1 kernel: Free swap: 0kB
Oct 9 01:06:09 lqportdb1 kernel: 2099200 pages of RAM
Oct 9 01:06:09 lqportdb1 kernel: 41113 reserved pages
Oct 9 01:06:09 lqportdb1 kernel: 69027 pages shared
Oct 9 01:06:09 lqportdb1 kernel: 191 pages swap cached
Oct 9 01:06:09 lqportdb1 kernel: Out of Memory: Kill process 1121 (db2syscr) score 79429 and children.
Oct 9 01:06:09 lqportdb1 kernel: Out of memory: Killed process 1126 (db2sysc).
-------------------------------------------------------------------------------------------------------------------------------------------------------
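To search a long kernel log for these footprints, the following minimal Python sketch reports which process invoked the OOM killer and which PIDs it selected or killed. The input can be /var/log/messages, which may require root access to read, or dmesg output piped in with "-". The regular expressions are assumptions based on the message forms in the excerpt above. Other kernel versions may word these messages differently. The script name scan_oom.py is hypothetical.

```python
import re
import sys

# Matches both forms seen in the excerpt:
#   "Out of Memory: Kill process 1121 (db2syscr) score 79429 and children."
#   "Out of memory: Killed process 1126 (db2sysc)."
OOM_KILL = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE)
# Matches "db2sysc invoked oom-killer: ...", naming the process that triggered it.
OOM_INVOKED = re.compile(r"(\S+) invoked oom-killer")


def scan_kernel_log(text):
    """Return (invokers, kills): names of processes that invoked the OOM killer,
    and (pid, name) pairs it selected or killed, both in log order."""
    invokers = [m.group(1) for m in OOM_INVOKED.finditer(text)]
    kills = [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(text)]
    return invokers, kills


def main(paths):
    if not paths:
        print("usage: scan_oom.py LOGFILE... ('-' reads standard input)")
        return
    for path in paths:
        try:
            if path == "-":
                text = sys.stdin.read()
            else:
                with open(path, errors="replace") as f:
                    text = f.read()
        except OSError as exc:
            print(f"cannot read {path}: {exc}")
            continue
        invokers, kills = scan_kernel_log(text)
        for name in invokers:
            print(f"{path}: {name} invoked the OOM killer")
        for pid, name in kills:
            print(f"{path}: OOM killer selected or killed PID {pid} ({name})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, "dmesg | python scan_oom.py -" checks the current kernel ring buffer. If nothing is reported, continue with the steps below.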
Resolving The Problem
If there is no evidence of the OOM killer, troubleshoot further by using one of the following Linux tools to identify the process that is sending SIGKILL:
a) auditctl
OR
b) Install and configure Red Hat Linux SystemTap, then run the following script, "SYSTEMTAP: KILL() [WHO KILLED MY PROCESS?]":
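#! /usr/bin/env stap
/*
 * signal2.st: Report the sender of every SIGKILL sent with kill(2).
 *
 * Run as user 'root' using the following command line:
 *
 *   stap -o signal2.out signal2.st
 *
 * dalla
 */
probe syscall.kill
{
    /* In this probe, sig and pid are the arguments of kill(2): the signal
     * number and the target PID. execname(), pid() and tid() identify the
     * sending process and thread. */
    if (sig == 9) {    /* 9 = SIGKILL */
        printf("[%s - %d - %d] sent SIGKILL to pid %d\n",
               execname(), pid(), tid(), pid);
    }
}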
Sample Run
# stap -o signal2.out signal2.st
$ ps -elf|grep db2inst1
4 S root 28185 28125 0 80 0 - 56581 poll_s 13:53 pts/1 00:00:00 sudo su - db2inst1
4 S root 28202 28185 0 80 0 - 55144 do_wai 13:53 pts/1 00:00:00 su - db2inst1
4 S db2inst1 28203 28202 0 80 0 - 29113 do_wai 13:53 pts/1 00:00:00 -bash
4 S root 28317 1 2 80 0 - 330940 futex_ 13:53 pts/1 00:00:00 db2wdog 0 [db2inst1]
4 S db2inst1 28319 28317 4 80 0 - 413578 futex_ 13:53 pts/1 00:00:00 db2sysc 0
(removed some output)
$ kill -9 28319
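The kill -9 command was issued from another session while stap was running. The stap output file signal2.out shows that bash (PID 28203, the db2inst1 login shell) sent SIGKILL to db2sysc (PID 28319). The last two entries show PID 28317 (execname db2syscr, listed by ps as db2wdog 0) sending SIGKILL to PIDs 28339 and 28329: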
[sshd - 27830 - 27830] sent SIGKILL to pid 27833
[dbus-daemon - 769 - 769] sent SIGKILL to pid 28147
[bash - 28203 - 28203] sent SIGKILL to pid 28319
[db2syscr - 28317 - 28318] sent SIGKILL to pid 28339
[db2syscr - 28317 - 28318] sent SIGKILL to pid 28329
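When the stap output contains many unrelated entries, as in the sample above, the following minimal Python sketch filters it for a single target PID. It assumes the exact "[execname - pid - tid] sent SIGKILL to pid N" format produced by the printf in signal2.st. The script name who_killed.py is hypothetical.

```python
import re
import sys

# One line of signal2.st output, e.g. "[bash - 28203 - 28203] sent SIGKILL to pid 28319".
ENTRY = re.compile(r"^\[(.+) - (\d+) - (\d+)\] sent SIGKILL to pid (\d+)[ \t]*$", re.MULTILINE)


def sigkill_senders(text, target_pid):
    """Return (execname, pid, tid) for every sender that SIGKILLed target_pid."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in ENTRY.finditer(text)
        if int(m.group(4)) == target_pid
    ]


def main(argv):
    if len(argv) != 3 or not argv[2].isdigit():
        print("usage: who_killed.py STAP_OUTPUT_FILE PID")
        return
    path, target = argv[1], int(argv[2])
    try:
        with open(path, errors="replace") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    senders = sigkill_senders(text, target)
    if not senders:
        print(f"no SIGKILL to PID {target} recorded in {path}")
    for name, pid, tid in senders:
        print(f"{name} (PID {pid}, TID {tid}) sent SIGKILL to PID {target}")


if __name__ == "__main__":
    main(sys.argv)
```

For the sample above, "python who_killed.py signal2.out 28319" would report the bash process with PID 28203.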
Document Information
Modified date: 04 January 2019
UID: swg21449871