IBM Support

IC89548: STALLED WDISERVICE TERMINATION MAY RESULT IN WDISERVER TERMINATING ON AIX


APAR status

  • Closed as program error.

Error description

  • Environment:  AIX  WDI 3.3
    
    The WDIServer process crashes and dumps core when it tries to
    terminate slow translators. Instead of terminating the
    translator, it ends up terminating itself.
    
    This is an excerpt from WDIServer_Out showing the WDIServer
    crashing:
    
    19:37:28 16449646:(D) WDISystemMonitor::Health Check Trigger:state/action/reports:Active/CarryOn/2
    19:37:28 16449646:(D) WDIQResourceMgr::checking for terminated children
    19:37:28 16449646:(D) WDIQResourceMgr:: Obs     task    state   action          Q       rpts
    19:37:28 16449646:(D) WDIQResourceMgr:: 32      10      Slow    CarryOn         NOBAT_T 156952
    19:37:28 16449646:(D) WDIQResourceMgr:: 26      14      Slow    CarryOn         PLAININ 291415
    19:37:28 16449646:(D) WDIQResourceMgr:: 6       107     Slow    Shutdown        EDIIN   0
    19:37:28 16449646:(I) WDIResourceManager::slow queue monitor task/process/WDITransCmdQ open:107/61931694/0
    19:37:28 16449646:(D) WDIResourceManager::slow obs./timeout/cycle time*count:6/60000/2500*5
    19:37:28 16449646:(I) 19:37:28 16449646:(D) WDIStatus(0/110)::change action from/to: CarryOn/Shutdown
    19:37:28 16449646:(E) WDIServer:: terminating from signal/pid (6/16449646)
    19:40:18 51839030:(I) ------- Trigger message loaded -----
    19:40:18 51839030:(I) QName       =EDIIN
    19:40:18 51839030:(I) ProcessName =WDI.TRANSLATOR.PROC
    19:40:18 51839030:(I) EnvData     =NumThreads(1) Timeout(10000)
    
    19:40:18 51839030:(I) TriggerData =NumThreads(5) Timeout(60000) applid(NOSTORE)
    19:40:18 51839030:(I) UserData    =
    
    19:40:18 51839030:(I) ------- End of trigger message ----------
    19:40:18 51839030:(I) calling trigger exit
    
    19:40:18 51839030:(I) WDICmd:: [MonQ   ] to Q [WDIAdapterCmd] [AppQMgrName(GTWPXF2)AppQName(EDIIN)NumThreads(5)Timeout(60000)ApplId(NOSTORE)Convert(Y)]
    19:45:29 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:29 51839030:(D) WDIStatus(1/54)::change action from/to: CarryOn/Pause
    19:45:31 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:34 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:36 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:36 51839030:(D) WDIStatus(1/51839030)::change state from/to Active/Halted
    19:45:36 51839030:(D) WDIStatus(1/51839030)::change state from/to Halted/Stopped
    
    This is an excerpt from the stack trace of the core file:
    
    $ dbx /opt/IBM/WDIServer/V3.3/bin/WDIServer core
    Type 'help' for help.
    warning: The core file is not a fullcore. Some info may
    not be available.
    [using memory image in core]
    reading symbolic information ...warning: no source compiled with -g
    
    
    IOT/Abort trap in pthread_kill at 0xd05098c0
    0xd05098c0 (pthread_kill+0xa0) 80410014         lwz   r2,0x14(r1)
    (dbx) where
    pthread_kill(??, ??) at 0xd05098c0
    _p_raise(??) at 0xd0508d28
    raise.raise(??) at 0xd01373e0
    abort() at 0xd01c576c
    xehInterpretSavedSigaction() at 0xd4619574
    xehExceptionHandler() at 0xd461e2f0
    _doprnt(??, ??, ??) at 0xd014a9f4
    vfprintf(??, ??, ??) at 0xd014317c
    WDILog.WDILog::writeLog(const char*,char*,char)() at 0x100102b8
    WDILog::logEvent(const char*,...)() at 0x1000feac
    WDITranslatorProc::killQMonitor()() at 0x10015624
    WDIQResourceMgr::checkHealthStatus()() at 0x10012490
    WDISystemMonitor::doHealthCheck()() at 0x10003018
    WDISystemMonitor::startListner()() at 0x10002a34
    main() at 0x100020d4
    (dbx)
    
    Note:  Level 2 support could not re-create this problem.
    
    Files on RTPGSA:
    
    core.stacktrace.txt
    wdiserver_out.zip
    problem.txt
    
    
    
    Keywords:  WDIServer WDIService termination wdi 3.3 aix
    terminating crash core
    

Local fix

Problem summary

  • WDIServer incorrectly sent signal 15 (SIGTERM) to process ID 0
    on AIX when a WDIService had closed its Command Queue
    without transitioning to an inactive (stopped, idle, or
    crashed) state and had then been requested to terminate
    (action is Shutdown) because of excessive processing time.
    WDI has been changed to guard against situations where this
    failing state might exist and to never send a signal to
    process 0 under any condition.
    This prevents the symptom, although the underlying cause may
    persist.
    

Problem conclusion

  • WDIServer incorrectly sent signal 15 (SIGTERM) to process ID 0
    on AIX when a WDIService had closed its Command Queue
    without transitioning to an inactive (stopped, idle, or
    crashed) state and had then been requested to terminate
    (action is Shutdown) because of excessive processing time.
    WDI has been changed to guard against situations where this
    failing state might exist and to never send a signal to
    process 0 under any condition.
    This prevents the symptom, although the underlying cause may
    persist.
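
The kind of guard described above can be sketched as follows. On POSIX systems, kill() with a process ID of 0 signals every process in the caller's process group, the caller included, which is why a stale or unset PID is dangerous to pass along. This is only an illustration in Python, not the WDI source (the APAR lists the affected module as WDISERVR CPP); the function name `safe_terminate` and its injectable `kill` parameter are hypothetical.

```python
import os
import signal


def safe_terminate(pid, sig=signal.SIGTERM, kill=os.kill):
    """Send sig to pid only when pid names a single, other process.

    Returns True if the signal was sent, False if the request was
    refused or the target no longer exists. (Hypothetical helper.)
    """
    # pid 0 targets the caller's whole process group; negative values
    # target a process group or, for -1, every process the caller may
    # signal. Refuse all of them.
    if pid <= 0:
        return False
    # Never signal ourselves by accident.
    if pid == os.getpid():
        return False
    try:
        kill(pid, sig)
    except ProcessLookupError:
        # The target has already exited; nothing is left to terminate.
        return False
    return True
```

A caller that tracks child PIDs would invoke `safe_terminate(child_pid)` and treat a False result as "nothing was signalled", for example when the child's PID was never recorded and is still 0.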
    

Temporary fix

Comments

  • WDI 3.3 open system only
    CMVC defect P8012334
    

APAR Information

  • APAR number

    IC89548

  • Reported component name

    WEBS DI MP

  • Reported component ID

    5724C5003

  • Reported release

    330

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-01-15

  • Closed date

    2013-02-13

  • Last modified date

    2013-03-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • WDISERVR CPP
    

Fix information

  • Fixed component name

    WEBS DI MP

  • Fixed component ID

    5724C5003

Applicable component levels

  • R330 PSY

       UP


Document Information

Modified date:
08 March 2013