IBM Support

LI72619: DEFUNCT PROCESSES KEEP APPEARING ON LINUX / PPC AFTER DB2START


APAR status

  • Closed as fixed if next.

Error description

  • Defunct processes keep appearing on Linux / PPC after
    db2start and go away after db2stop.
    
    
    1-bluebear [dbguest4] $ uname -a
    Linux bluebear 2.4.21-278-pseries64 #1 SMP Mon Mar 7 08:52:23 UTC 2005 ppc64 unknown
    1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
    1-bluebear [dbguest4] $ strace -o out -f db2start
    SQL1063N DB2START processing was successful.
    
    < strace hangs here >
    
    < strace suspended here >
    1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
    dbguest4 8299 8296 0 19:12 pts/0 00:00:00 [sh] <defunct>
    dbguest4 8310 8307 0 19:12 pts/0 00:00:00 [sh] <defunct>
    1-bluebear [dbguest4] $ ps -ef | grep -E "[8]296|[8]307"
    dbguest4 8296 1 0 19:12 pts/0 00:00:00 db2wdog
    dbguest4 8299 8296 0 19:12 pts/0 00:00:00 [sh] <defunct>
    dbguest4 8301 8296 0 19:12 pts/0 00:00:00 db2sysc
    dbguest4 8307 8301 0 19:12 pts/0 00:00:00 db2hmon ,0,0,0,1,0,0,0,1e014,2,0,1,9fe0,0x21000000,0x21000000,15fc000,2518801d,2,29cb0078
    dbguest4 8310 8307 0 19:12 pts/0 00:00:00 [sh] <defunct>
    dbguest4 8312 8307 0 19:12 pts/0 00:00:00 db2hmon ,0,0,0,1,0,0,0,1e014,2,0,1,9fe0,0x21000000,0x21000000,15fc000,2518801d,2,29cb0078
    
    
    The db2wdog process appears to be blocked waiting for the db2sysc
    process to exit, which leaves PID 8299 unreaped:
    
    1-bluebear [dbguest4] $ grep "^8296 " out | tail
    8296 fcntl64(4, F_SETLK, {type=F_WRLCK, whence=SEEK_SET,
    start=0, len=0}) = 0
    8296 rt_sigprocmask(SIG_SETMASK, [INT ALRM CHLD RTMIN], NULL, 8)
    = 0
    8296 fstat64(4, {st_mode=S_IFREG|0666, st_size=52240, ...}) = 0
    8296 _llseek(4, 0, [52240], SEEK_END) = 0
    8296 _llseek(4, 0, [52240], SEEK_CUR) = 0
    8296 write(4, "\n2007-09-26-19.12.46.232155-240 "..., 347) = 347
    8296 close(4) = 0
    8296 rt_sigprocmask(SIG_SETMASK, [INT RTMIN], NULL, 8) = 0
    8296 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
    8296 wait4(8301, <unfinished ...>
    
    The db2hmon process (PID 8307) is less clear; it is not blocked
    waiting on any PID:
    1-bluebear [dbguest4] $ grep "^8307 " out | grep -i wait
    8307 wait4(8308, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL)
    = 8308
    1-bluebear [dbguest4] $
    
    After a db2stop in another telnet session, all defunct processes
    are gone:
    
    1-bluebear [dbguest4] $ j
    [1] + 8279 Stopped strace -o out -f db2start
    1-bluebear [dbguest4] $ fg
    strace -o out -f db2start
    1-bluebear [dbguest4] $ ps -ef | grep [d]efunct | grep dbguest4
    1-bluebear [dbguest4] $
    
    However, the strace output contains no wait4() call for the
    defunct PIDs 8299 and 8310:
    
    1-bluebear [dbguest4] $ grep "^8296 " out | grep -i wait
    8296 wait4(8297, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL)
    = 8297
    8296 wait4(8301, <unfinished ...>
    8296 <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 0],
    WUNTRACED, NULL) = 8301
    1-bluebear [dbguest4] $ grep "^8307 " out | grep -i wait
    8307 wait4(8308, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL)
    = 8308
    8307 wait4(8312, NULL, __WCLONE, NULL) = -1 ECHILD (No child
    processes)
    1-bluebear [dbguest4] $
    
    It seems the wait() calls for those child PIDs are missing.
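
The mechanism described above can be reproduced in isolation. The following is a minimal Python sketch, not DB2 code. It assumes a Linux system with /proc mounted, and the helper names child_state and demo_defunct_child are hypothetical. A child that has exited stays in state Z (shown by ps as <defunct>) until its parent calls wait on it. A second wait on the same PID then fails with ECHILD, the same errno that appears in the trace above, though not necessarily for the same reason.

```python
import os
import time


def child_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_defunct_child():
    """Fork a child, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup

    # Parent: poll until the kernel reports the exited child as a zombie.
    deadline = time.time() + 5
    state = child_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    # Reaping the child removes its process table entry.
    reaped_pid, status = os.waitpid(pid, 0)

    # A second wait on the same PID fails because nothing is left to reap.
    try:
        os.waitpid(pid, 0)
        second_wait = "succeeded"
    except ChildProcessError:
        second_wait = "ECHILD"

    return state, reaped_pid == pid, os.WEXITSTATUS(status), second_wait


if __name__ == "__main__":
    print(demo_defunct_child())
```

If the parent never reaches the waitpid call, for example because it is blocked waiting on a different child as db2wdog appears to be, the entry stays in state Z until the parent itself exits.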
    

Local fix

  • No impact.
    A defunct process only occupies an entry in the OS process table
    and does not otherwise consume resources.
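
To check whether such entries are accumulating, the process table can be scanned directly. This is a hedged sketch that assumes the Linux /proc layout; parse_stat and find_defunct are hypothetical helper names, and the result is comparable to the `ps -ef | grep [d]efunct` check used in the error description.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # The command name is parenthesised and may itself contain spaces or
    # parentheses, so split on the last ")" rather than on whitespace.
    head, tail = line.rsplit(")", 1)
    pid, comm = head.split(" (", 1)
    fields = tail.split()
    return int(pid), comm, fields[0], int(fields[1])


def find_defunct(proc_root="/proc"):
    """Return (pid, comm, ppid) for every process currently in state Z."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process vanished or is not readable
        if state == "Z":
            found.append((pid, comm, ppid))
    return found


if __name__ == "__main__":
    for pid, comm, ppid in find_defunct():
        print(f"{pid} [{comm}] <defunct> parent={ppid}")
```

The parent PID is reported because a defunct entry can only be cleared by its parent reaping it or by the parent exiting.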
    

Problem summary

  • Please see above.
    

Problem conclusion

Temporary fix

  • N/A
    

Comments

APAR Information

  • APAR number

    LI72619

  • Reported component name

    DB2 UDB ESE LIN

  • Reported component ID

    5765F4106

  • Reported release

    820

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2007-10-29

  • Closed date

    2009-06-04

  • Last modified date

    2009-06-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • R910 PSY

       UP


Document Information

Modified date:
04 June 2009