APAR status
Closed as program error.
Error description
Environment: AIX, WDI 3.3

The WDIServer process crashes and produces a core dump when it tries to terminate slow translators. It appears to get confused and ends up terminating itself.

This is an excerpt from WDIServer_Out showing the WDIServer crash:

    19:37:28 16449646:(D) WDISystemMonitor::Health Check Trigger:state/action/reports:Active/CarryOn/2
    19:37:28 16449646:(D) WDIQResourceMgr::checking for terminated children
    19:37:28 16449646:(D) WDIQResourceMgr:: Obs task state action Q rpts
    19:37:28 16449646:(D) WDIQResourceMgr:: 32 10 Slow CarryOn NOBAT_T 156952
    19:37:28 16449646:(D) WDIQResourceMgr:: 26 14 Slow CarryOn PLAININ 291415
    19:37:28 16449646:(D) WDIQResourceMgr:: 6 107 Slow Shutdown EDIIN 0
    19:37:28 16449646:(I) WDIResourceManager::slow queue monitor task/process/WDITransCmdQ open:107/61931694/0
    19:37:28 16449646:(D) WDIResourceManager::slow obs./timeout/cycle time*count:6/60000/2500*5
    19:37:28 16449646:(I)
    19:37:28 16449646:(D) WDIStatus(0/110)::change action from/to: CarryOn/Shutdown
    19:37:28 16449646:(E) WDIServer:: terminating from signal/pid (6/16449646)
    19:40:18 51839030:(I) ------- Trigger message loaded -----
    19:40:18 51839030:(I) QName =EDIIN
    19:40:18 51839030:(I) ProcessName =WDI.TRANSLATOR.PROC
    19:40:18 51839030:(I) EnvData =NumThreads(1) Timeout(10000)
    19:40:18 51839030:(I) TriggerData =NumThreads(5) Timeout(60000) applid(NOSTORE)
    19:40:18 51839030:(I) UserData =
    19:40:18 51839030:(I) ------- End of trigger message ----------
    19:40:18 51839030:(I) calling trigger exit
    19:40:18 51839030:(I) WDICmd:: [MonQ ] to Q [WDIAdapterCmd] [AppQMgrName(GTWPXF2)AppQName(EDIIN)NumThreads(5)Timeout(60000)ApplId(NOSTORE)Convert(Y)]
    19:45:29 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:29 51839030:(D) WDIStatus(1/54)::change action from/to: CarryOn/Pause
    19:45:31 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:34 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:36 51839030:(E) Trigger Monitor found closed INITQ: WDIAdapterCmd
    19:45:36 51839030:(D) WDIStatus(1/51839030)::change state from/to Active/Halted
    19:45:36 51839030:(D) WDIStatus(1/51839030)::change state from/to Halted/Stopped

This is an excerpt from the stack trace of the core:

    $ dbx /opt/IBM/WDIServer/V3.3/bin/WDIServer core
    Type 'help' for help.
    warning: The core file is not a fullcore. Some info may not be available.
    [using memory image in core]
    reading symbolic information ...warning: no source compiled with -g

    IOT/Abort trap in pthread_kill at 0xd05098c0
    0xd05098c0 (pthread_kill+0xa0) 80410014 lwz r2,0x14(r1)
    (dbx) where
    pthread_kill(??, ??) at 0xd05098c0
    _p_raise(??) at 0xd0508d28
    raise.raise(??) at 0xd01373e0
    abort() at 0xd01c576c
    xehInterpretSavedSigaction() at 0xd4619574
    xehExceptionHandler() at 0xd461e2f0
    _doprnt(??, ??, ??) at 0xd014a9f4
    vfprintf(??, ??, ??) at 0xd014317c
    WDILog.WDILog::writeLog(const char*,char*,char)() at 0x100102b8
    WDILog::logEvent(const char*,...)() at 0x1000feac
    WDITranslatorProc::killQMonitor()() at 0x10015624
    WDIQResourceMgr::checkHealthStatus()() at 0x10012490
    WDISystemMonitor::doHealthCheck()() at 0x10003018
    WDISystemMonitor::startListner()() at 0x10002a34
    main() at 0x100020d4
    (dbx)

Note: Level 2 support was not able to recreate this problem.

Files on RTPGSA:
    core.stacktrace.txt
    wdiserver_out.zip
    problem.txt

Keywords: WDIServer WDIService termination wdi 3.3 aix terminating crash core
Local fix
Problem summary
WDIServer incorrectly sent signal 15 (SIGTERM) to process ID 0 on AIX when the WDIService had closed its Command Queue without transitioning to an inactive (stopped, idle, or crashed) state and had been requested to terminate (action Shutdown) because of excessive processing time. Because a signal sent to process ID 0 is delivered to every process in the sender's process group, this could terminate WDIServer itself. WDI has been changed to protect against situations in which this failing state might exist and to never send a signal to process ID 0 under any condition. This prevents the symptom, although the underlying cause may persist.
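The self-termination follows from standard POSIX kill() semantics: a pid argument of 0 addresses every process in the sender's process group, and the sender is a member of that group. The C sketch below is illustrative only and is not taken from WDIServer. It demonstrates the behavior safely: a forked child first moves into its own process group, then sends SIGUSR1 (rather than SIGTERM) to process ID 0 and reports whether it received the signal itself. The function name pid_zero_hits_sender is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler when this process itself receives SIGUSR1. */
static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int sig) {
    (void)sig;
    got_signal = 1;
}

/* Hypothetical demo: returns 1 if a process that calls kill(0, SIGUSR1)
   receives the signal itself, 0 if it does not, and -1 on setup failure. */
static int pid_zero_hits_sender(void) {
    pid_t child = fork();
    if (child < 0) {
        return -1;
    }
    if (child == 0) {
        /* Child: move into a new, private process group so that kill(0, ...)
           cannot reach the parent or whatever launched it. */
        if (setpgid(0, 0) != 0) {
            _exit(2);
        }
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_usr1;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGUSR1, &sa, NULL) != 0) {
            _exit(2);
        }
        /* pid 0 means "every process in my process group", which includes me.
           For a single-threaded process with the signal unblocked, POSIX
           requires delivery before kill() returns. */
        if (kill(0, SIGUSR1) != 0) {
            _exit(2);
        }
        _exit(got_signal ? 0 : 1);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status)) {
        return -1;
    }
    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

On a POSIX-conforming system such as AIX or Linux, pid_zero_hits_sender() should return 1. With SIGTERM and its default action in place of SIGUSR1, the same call would end the sending process.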
Problem conclusion
WDIServer incorrectly sent signal 15 (SIGTERM) to process ID 0 on AIX when the WDIService had closed its Command Queue without transitioning to an inactive (stopped, idle, or crashed) state and had been requested to terminate (action Shutdown) because of excessive processing time. WDI has been changed to protect against situations in which this failing state might exist and to never send a signal to process ID 0 under any condition. This prevents the symptom, although the underlying cause may persist.
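The APAR does not publish the corrected WDIServer code. As a minimal sketch, assuming a supervisor that stores a translator's process ID in a pid_t and later stops it with kill(), the guard could be centralized in one helper that refuses any non-positive PID. The names terminate_child and terminate_and_reap are hypothetical and are not WDIServer functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send SIGTERM to exactly one process.
   kill() treats pid 0 as "my whole process group", pid -1 as "every
   process I may signal", and other negative values as a process group,
   so anything <= 0 is rejected instead of being passed through. */
static int terminate_child(pid_t pid) {
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    return kill(pid, SIGTERM);
}

/* Hypothetical helper: terminate a child and reap it so it does not
   remain as a zombie. Returns 0 on success and stores the wait status
   in *status; returns -1 on any failure. */
static int terminate_and_reap(pid_t pid, int *status) {
    if (terminate_child(pid) != 0) {
        return -1;
    }
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

Validating the PID at the single place where kill() is called means that a stale, zeroed, or uninitialized value fails with EINVAL instead of signaling the caller's whole process group.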
Temporary fix
Comments
WDI 3.3 open systems only. CMVC defect P8012334.
APAR Information
APAR number
IC89548
Reported component name
WEBS DI MP
Reported component ID
5724C5003
Reported release
330
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-01-15
Closed date
2013-02-13
Last modified date
2013-03-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
WDISERVR CPP
Fix information
Fixed component name
WEBS DI MP
Fixed component ID
5724C5003
Applicable component levels
R330 PSY
UP
Document Information
Modified date:
08 March 2013