APAR status
Closed as fixed if next.
Error description
Defunct processes keep appearing on Linux/PPC after db2start and go away after db2stop.

1-bluebear [dbguest4] $ uname -a
Linux bluebear 2.4.21-278-pseries64 #1 SMP Mon Mar 7 08:52:23 UTC 2005 ppc64 unknown
1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
1-bluebear [dbguest4] $ strace -o out -f db2start
SQL1063N DB2START processing was successful.
<strace hangs here>
<suspend strace here>
1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
dbguest4 8299 8296 0 19:12 pts/0 00:00:00 [sh] <defunct>
dbguest4 8310 8307 0 19:12 pts/0 00:00:00 [sh] <defunct>
1-bluebear [dbguest4] $ ps -ef | grep -E "[8]296|[8]307"
dbguest4 8296 1 0 19:12 pts/0 00:00:00 db2wdog
dbguest4 8299 8296 0 19:12 pts/0 00:00:00 [sh] <defunct>
dbguest4 8301 8296 0 19:12 pts/0 00:00:00 db2sysc
dbguest4 8307 8301 0 19:12 pts/0 00:00:00 db2hmon ,0,0,0,1,0,0,0,1e014,2,0,1,9fe0,0x21000000,0x21000000,15fc000,25 18801d,2,29cb0078
dbguest4 8310 8307 0 19:12 pts/0 00:00:00 [sh] <defunct>
dbguest4 8312 8307 0 19:12 pts/0 00:00:00 db2hmon ,0,0,0,1,0,0,0,1e014,2,0,1,9fe0,0x21000000,0x21000000,15fc000,25 18801d,2,29cb0078

It seems the db2wdog process (PID 8296) is blocked waiting for the db2sysc process (PID 8301) to exit, so it leaves PID 8299 hanging around:

1-bluebear [dbguest4] $ grep "^8296 " out | tail
8296 fcntl64(4, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
8296 rt_sigprocmask(SIG_SETMASK, [INT ALRM CHLD RTMIN], NULL, 8) = 0
8296 fstat64(4, {st_mode=S_IFREG|0666, st_size=52240, ...}) = 0
8296 _llseek(4, 0, [52240], SEEK_END) = 0
8296 _llseek(4, 0, [52240], SEEK_CUR) = 0
8296 write(4, "\n2007-09-26-19.12.46.232155-240 "..., 347) = 347
8296 close(4) = 0
8296 rt_sigprocmask(SIG_SETMASK, [INT RTMIN], NULL, 8) = 0
8296 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
8296 wait4(8301, <unfinished ...>

The db2hmon process (PID 8307) is less clear; it is not blocked waiting on any PID:

1-bluebear [dbguest4] $ grep "^8307 " out | grep -i wait
8307 wait4(8308, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8308
1-bluebear [dbguest4] $

After a db2stop in another telnet session, all defunct processes are gone:

1-bluebear [dbguest4] $ j
[1] + 8279 Stopped strace -o out -f db2start
1-bluebear [dbguest4] $ fg
strace -o out -f db2start
1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
1-bluebear [dbguest4] $

But strace does not show the defunct PIDs, 8299 and 8310, ever being waited on:

1-bluebear [dbguest4] $ grep "^8296 " out | grep -i wait
8296 wait4(8297, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8297
8296 wait4(8301, <unfinished ...>
8296 <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 0], WUNTRACED, NULL) = 8301
1-bluebear [dbguest4] $ grep "^8307 " out | grep -i wait
8307 wait4(8308, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8308
8307 wait4(8312, NULL, __WCLONE, NULL) = -1 ECHILD (No child processes)
1-bluebear [dbguest4] $

It seems we missed wait() calls for those child PIDs.
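The mechanism described above (a child exits, the parent never calls wait() for it, and the child stays in the process table as defunct until it is reaped) can be reproduced outside DB2. The following is a minimal sketch, assuming a Linux host with /proc mounted; it is not DB2 code, and the helper names proc_state and demo are hypothetical, introduced only for illustration.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running parent code

    # Parent has not called wait yet, so the exited child should become
    # a zombie (state "Z", shown as <defunct> by ps). Poll briefly for it.
    deadline = time.time() + 5
    state_before = proc_state(pid)
    while state_before != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state_before = proc_state(pid)

    # Reaping the child removes its process-table entry.
    reaped_pid, _status = os.waitpid(pid, 0)
    state_after = proc_state(pid)
    return state_before, reaped_pid == pid, state_after


if __name__ == "__main__":
    print(demo())
```

In this sketch the zombie disappears as soon as the parent calls waitpid. A zombie is also cleaned up when its parent exits and init reaps it, which would be consistent with the defunct entries vanishing after db2stop.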
Local fix
No impact. A defunct process only remains as an entry in the OS process table and consumes no resources beyond that entry.
Problem summary
Please see above.
Problem conclusion
Temporary fix
N/A
Comments
APAR Information
APAR number
LI72619
Reported component name
DB2 UDB ESE LIN
Reported component ID
5765F4106
Reported release
820
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2007-10-29
Closed date
2009-06-04
Last modified date
2009-06-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
R910 PSY
UP
Document Information
Modified date:
04 June 2009