Fixes are available
APAR status
Closed as program error.
Error description
Severity: 2
Approver: BEH
Compid: 5724K1000 Tivoli Universal Agent
Abstract: UA: Switching TEMS every 10 mins may result in UA crash
Environment: ITM 6.2.3 GA / AIX 6.1
Problem Description: Universal Agent (kuma620) crashes due to frequent TEMS switching.
Detailed Recreation Procedure:
1. Build two TEMS with hot-standby configuration
2. Make UA connect to primary TEMS
3. Recycle primary TEMS frequently
4. UA sometimes crashes when switching TEMS
Related Files and Output: With KBB_RAS1=(UNIT:kum ALL), you can see that UA crashes in KUM0_FormatDataField().
Local fix
Problem summary
On very rare occasions, the Universal Agent process has crashed while processing Situation Requests or the Start/Stop of Situations. The sole condition that causes this crash is the repetitive switching of the Universal Agent from a primary monitoring server to a mirror monitoring server. The crash has only been encountered by deliberately stopping the primary monitoring server, waiting 10 minutes, starting the primary monitoring server again, and then repeating the stop/start sequence over and over. This cycling of the primary monitoring server must be performed numerous times before the crash occurs. Among the thousands of deployed Universal Agents, there has been a single reported case of this problem.
The problem occurs because a critical internal data structure is not thread safe. This exposure exists only during the repetitive switching of the Universal Agent from a primary monitoring server to a mirror monitoring server.
A new environment variable has been added to enable this APAR fix in your environment. See the "Install Actions" section of the APAR conclusion for more details.
Problem conclusion
Install Actions
Two steps were taken to address the thread safety exposure. First, a mutex lock was implemented on the internal data structure; this step is the fix that addresses this APAR. Second, a new Universal Agent environment variable named KUMA_DCHCLIENT_LOCK was introduced to arm or disarm the mutex lock. By default this environment variable is not defined, so the mutex lock added in the first step is disabled and is not used.
To arm the mutex lock, and so apply this APAR fix, add the following environment variable to um.ini (UNIX/Linux) or KUMENV (Windows):
KUMA_DCHCLIENT_LOCK=Y
Because this problem has been encountered so rarely, and because DCH client<->DCH Server thread interactions are so fundamentally critical to the Universal Agent, it is highly recommended that users NOT enable the mutex lock by declaring KUMA_DCHCLIENT_LOCK=Y unless they do in fact encounter this problem, which would make them only the second user to do so.
The fix for this APAR is contained in the following maintenance packages:
| fix pack | 6.2.3-TIV-ITM-FP0002
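The opt-in locking described under Install Actions can be illustrated with a minimal sketch. It is not the Universal Agent's code: the class ClientTable, its methods, and the exact-match test on "Y" are hypothetical and chosen only for illustration. The only name taken from this APAR is the variable KUMA_DCHCLIENT_LOCK. The sketch shows a shared structure whose lock is armed only when the variable is set, and is otherwise a no-op.

```python
import os
import threading
from contextlib import nullcontext


class ClientTable:
    """Hypothetical stand-in for the shared internal data structure."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        # Assumption: only the exact value "Y" arms the lock. When the
        # variable is undefined the lock stays disarmed, matching the
        # default-off behaviour described in the APAR.
        self.lock_armed = env.get("KUMA_DCHCLIENT_LOCK") == "Y"
        # A disarmed guard is a no-op context manager, so the access
        # paths below are identical in both modes.
        self._guard = threading.Lock() if self.lock_armed else nullcontext()
        self._entries = {}

    def update(self, key, value):
        # Writers take the guard before touching the shared dictionary.
        with self._guard:
            self._entries[key] = value

    def snapshot(self):
        # Readers take the same guard and return a copy.
        with self._guard:
            return dict(self._entries)
```

With the variable unset, ClientTable(env={}).lock_armed is False and every access runs unguarded. With env={"KUMA_DCHCLIENT_LOCK": "Y"} every access is serialized through the one lock. This is the trade-off the APAR describes: the guarded path removes the exposure but changes thread interactions on a critical structure.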
Temporary fix
Comments
APAR Information
APAR number
IV23876
Reported component name
UNIVERSAL AGENT
Reported component ID
5724K1000
Reported release
623
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-06-27
Closed date
2012-07-19
Last modified date
2012-10-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
UNIVERSAL AGENT
Fixed component ID
5724K1000
Applicable component levels
R623 PSY
UP
Document Information
Modified date:
08 October 2012