A readme is available
Fixes are available
APAR status
Closed as program error.
Error description
Environment: 6.30 FP7 UNIX OS Agent on AIX only

Problem Description:
On AIX, the Monitoring Agent for UNIX OS sometimes hangs at startup when running 6.30 FP7. The version of GSKit in 6.30 FP7 includes updates that cause a deadlock between a GSKit thread and a Watchdog thread. As a result, the agent hangs and neither returns results nor connects to TEMS. This is a timing issue and does not always occur.

Related Files and Output:
With tracing set to (UNIT:kgl ALL), the agent RAS1 log file contains the message:

(59CA9623.64B9-1:kglcry.c,2985,"initializeICC") Calling ICC_Init for GSKit 8.

but does not contain the message:

(59CA9623.79CC-1:kglcry.c,2989,"initializeICC") ICC_Init completed. Context returned 110E5B4F0

If there is a core file or a process stack trace (e.g. taken with "procstack -p <pid of kuxagent>"), it will show two threads with "setlocale" in the trace. One is from GSKit and originates from ICC_Init. The other originates from Watchdog in kca_mbstowcs.

(dbx) where all

Thread $t1
warning: Thread is in kernel mode, not all registers can be accessed.
  .() at 0x0
  _rec_mutex_lock(??) at 0x90000000002ae08
  setlocale(??, ??) at 0x9000000000170b0    <== setlocale
  expand_catname(??, ??, ??) at 0x900000000043b04
  catopen(??, ??) at 0x900000000044c7c
  __strerror(??, ??, ??) at 0x900000000059b34
  strerror(??) at 0x90000000005a0e4
  build_SYS_str_reasons() at 0x900000003cd0110
  ERR_load_ERR_strings() at 0x900000003cd1f80
  ERR_load_crypto_strings() at 0x900000003e044cc
  iccLoadErr(??) at 0x900000003cbc0f4
  OpenSSL_Init(??, ??) at 0x900000003cbc2b4
  ICCLoad() at 0x900000003cbc9b4
  iccSLInit() at 0x900000003e1b90c
  mod_init1(??, ??) at 0x9fffffff0002a50
  usl_init_mods(??, ??) at 0x9fffffff0003c30
  uload(??, ??, ??, ??, ??, ??) at 0x9fffffff00023e0
  load1(??, ??, ??, ??) at 0x9000000000006f4
  load(??, ??, ??) at 0x900000000001770
  loadAndInit(??, ??, ??) at 0x9000000000eadac
  dlopen(??, ??) at 0x900000000090f88
  ICC_LoadLibrary(??) at 0x900000003623714
  ICCN_Init(??, ??) at 0x900000003623c30
  ICC_Init(??, ??) at 0x900000003618bb8
  initializeICC(0xfffffffffff8890) at 0x900000002f383dc
  CRY_RAND() at 0x900000002f3ba28
  AccessAuthorizationGroupProfile::addUserToAADB(char*,int,char*)(0x110bd8e50, 0xfffffffffffd2a8, 0x700000007, 0xfffffffffffd2b1) at 0x9000000033cc284
  kgeaagpx.AAGPUserEnd(void*,const char*)(0xfffffffffffa318, 0x110bdaa30) at 0x9000000033d69f0
  endKGEelement(void*,const char*)(0xfffffffffffa318, 0x110bdaa30) at 0x9000000033d5c44
  doContent(0x110bd9670, 0x0, 0x9001000a075f570, 0x9000000033f3e6a, 0x9000000033f3e7a, 0x0) at 0x900000002ee3950
  contentProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002ee9bd8
  doProlog(0x110bd9670, 0x9001000a075f570, 0x9000000033f3af4, 0x9000000033f3e7a, 0x1d0000001d, 0x9000000033f3af4, 0x0) at 0x900000002eea4bc
  prologProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002eec02c
  prologInitProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002eec180
  XML1_Parse(0x110bd9670, 0x9000000033f3af4, 0x38600000386, 0x100000001) at 0x900000002ee6d98
  KGE_AccessAuthorizationGroupPolicyProcessor(KGE_XMLreq_t*)(0xfffffffffffa318) at 0x9000000033d5124
  AccessAuthorizationGroupProfile::AccessAuthorizationGroupProfile(char*)(0x110bd8e50, 0x110bd8a0f) at 0x9000000033d172c
  KGE_InitAccessAuthorizationGroupProfile(void*)(0x110bd8a0f) at 0x9000000033d1ae4
  BSS1_InitializeOnce(0x9001000a088b72c, 0x9001000a08925b0, 0x110bd8a0f, 0x9000000033f43ac, 0x9f0000009f) at 0x900000002ead320
  KGE_GetAAGP(char*)(0x110bd8a0f) at 0x9000000033d0d3c
  kgeaagpa.KGE_AAGP_CheckAuthorization(0x110bd86f0) at 0x9000000033ef000
  kramain(0x100000001, 0xfffffffffffe360) at 0x900000002983ba8
  main(argc = 1, argv = 0x0fffffffffffe360, env = 0x0fffffffffffe370), line 1149 in "kuxmain.cpp"

Thread $t18
warning: Thread is in kernel mode, not all registers can be accessed.
  .() at 0x0
  _rec_mutex_lock(??) at 0x90000000002ae08
  __modinit_lock(??) at 0x90000000003d6b0
  load1(??, ??, ??, ??) at 0x900000000000644
  load(??, ??, ??) at 0x900000000001770
  __lc_load@AF5_1(??, ??, ??) at 0x900000000201058
  load_locale(??, ??, ??) at 0x9000000000136fc
  setlocale(??, ??) at 0x9000000000166cc    <== setlocale
  kca_mbstowcs(char*)(__classReturn = &(...), str = "amqzmur0"), line 49 in "kcautil.cpp"
  unnamed block in KcaCmdAIX::getRunningProcesses(std::vector<KcaProcess*,std::allocator<KcaProcess*> >&)(this = 0x0000000110bb13f0, procList = &(...)), line 363 in "kcacmdaix.cpp"
  KcaCmdAIX::getRunningProcesses(std::vector<KcaProcess*,std::allocator<KcaProcess*> >&)(this = 0x0000000110bb13f0, procList = &(...)), line 363 in "kcacmdaix.cpp"
  Controller::initialDiscovery()(this = 0x0000000110f9baf0), line 348 in "kcactrl.cpp"
  Controller::PASThreadExecution()(this = 0x0000000110f9baf0), line 4822 in "kcactrl.cpp"
  PASThreadEntry(param = (nil)), line 4944 in "kcactrl.cpp"
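The two symptoms described above (an ICC_Init call with no matching completion message, and two threads stopped in setlocale) can be checked mechanically. The following Python sketch is illustrative only and is not part of any IBM tooling. The function names are hypothetical, the marker strings are taken from the RAS1 messages quoted above, and the thread-header pattern assumes the dbx "where all" layout shown in this APAR; procstack output uses a different header format and would need a different pattern.

```python
import re

# Marker strings taken from the RAS1 messages quoted in this APAR.
START_MARKER = "Calling ICC_Init"
DONE_MARKER = "ICC_Init completed"


def icc_init_incomplete(log_lines):
    """Return True if the last ICC_Init call has no completion message after it."""
    started = False
    for line in log_lines:
        if START_MARKER in line:
            started = True
        elif DONE_MARKER in line:
            started = False
    return started


def threads_in_setlocale(stack_text):
    """Return dbx thread labels (e.g. "$t1") whose trace has a setlocale frame.

    Assumes dbx-style "Thread $tN" headers, as in the trace shown above.
    """
    hits = []
    current = None
    for token in re.finditer(r"Thread (\$t\d+)|\bsetlocale\(", stack_text):
        if token.group(1):
            current = token.group(1)          # a new thread header starts here
        elif current is not None and current not in hits:
            hits.append(current)              # setlocale frame in this thread
    return hits
```

A result of True from icc_init_incomplete on a log that has stopped growing, together with two thread labels from threads_in_setlocale, is consistent with the deadlock described here. A healthy agent that is still starting can briefly show the same log state, so the log check alone is not conclusive.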
Local fix
The temporary workaround is to disable Watchdog:

1. Stop the OS Agent (if it is hung, it must be stopped with force).
2. Edit the $CANDLEHOME/config/ux.ini file.
   Comment out the line:
   # KCA_CAP_DIR=$CANDLEHOME$/config/CAP:/opt/IBM/CAP
   Add the line:
   KCA_CAP_DIR=
3. Restart the OS Agent.

This avoids the conflict that causes the agent to hang.
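The ux.ini edit in step 2 can be scripted when many agents need the workaround. The Python sketch below is a minimal illustration, not an IBM-supplied procedure. The function name disable_watchdog is hypothetical, and the sketch assumes that ux.ini is a flat list of KEY=value lines in which the position of KCA_CAP_DIR does not matter. It only transforms the file text; stopping and restarting the agent (steps 1 and 3) still have to be done separately.

```python
def disable_watchdog(ini_text):
    """Comment out active KCA_CAP_DIR lines and append an empty KCA_CAP_DIR=."""
    out = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("KCA_CAP_DIR="):
            if stripped == "KCA_CAP_DIR=":
                continue  # drop an existing empty setting; it is re-added below
            out.append("# " + line)  # keep the original value as a comment
        else:
            out.append(line)
    out.append("KCA_CAP_DIR=")
    return "\n".join(out) + "\n"
```

To use it, read $CANDLEHOME/config/ux.ini, keep a backup copy, pass the text through the function, and write the result back. Running the function on an already modified file returns the same text, so it is safe to apply more than once.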
Problem summary
On AIX, the 6.30 FP7 Monitoring Agent for UNIX OS sometimes hangs at startup. The version of GSKit provided with 6.30 FP7 includes updates that cause a deadlock between a GSKit thread and a Watchdog thread. As a result, the agent hangs and neither returns results nor connects to TEMS. This is a timing issue and does not always occur.
Problem conclusion
The version of GSKit provided in 6.30 FP7 SP1 was uplifted and the deadlock issue was resolved.

The fix for this APAR is contained in the following maintenance packages:

| service pack    | 6.3.0.7-TIV-ITM-SP0001
| provisional fix | 6.3.x-TIV-ITM-GSK-8.0.50.84-IJ00337

http://www.ibm.com/support/docview.wss?uid=swg24044365
Temporary fix
Comments
APAR Information
APAR number
IJ00337
Reported component name
ITM AGENT UNIX
Reported component ID
5724C040U
Reported release
630
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-09-26
Closed date
2018-01-17
Last modified date
2019-05-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TEMA
Fixed component ID
5724C04TE
Applicable component levels
Business Unit: IBM Software w/o TPS (BU059)
Product: Tivoli Monitoring (SSTFXA)
Platform: Platform Independent (PF025)
Version: 630
Line of Business: Automation (LOB45)
Document Information
Modified date:
08 March 2023