Fixes are available
APAR status
Closed as program error.
Error description
REL: 6.30.02.02

Problem: If the UNIX OS agent fails to read the /proc/PID/psinfo file for a process that it has just confirmed to be running, the raw data for that process is missing, and the agent may then crash. The read failure can usually be explained as a timing issue: the process dies between the agent's check and the file access, which is a very rare condition. However, the agent code is not robust enough to handle this condition, and serviceability is poor because the reason for the failure (which might differ from the timing issue above) is not logged in the RAS1 logs.

Affected Platforms / Versions: This issue affects only 6.30 FP2 IF2, because the new dynamic memory management of process data introduced in IF2 makes the agent more sensitive to uninitialized data. It is not expected to depend on the Solaris version.

Diagnostics: Set
KBB_RAS1=ERROR (UNIT:kux04 ALL) (UNIT:get_p ALL) (UNIT:proc ALL)
The RAS1 logs end abnormally in "get_processes".

Example: In the normal case, the trace looks as follows:
(52CD6D18.445F-D:get_process-sol.cpp,619,"get_processes") Reading directory 28081. Previous directory 29290
(52CD6D18.4460-D:get_process-sol.cpp,259,"get_psinfo") Entry
(52CD6D18.4461-D:get_process-sol.cpp,344,"get_psinfo") Pid: 28081, CPU time: 0.038925100
....
(52CD6D18.4490-D:get_ps_table.cpp,1118,"display_ras1_rawps") Entry

In the crash case, "get_psinfo" does not appear between the "get_processes" and "display_ras1_rawps" tracepoints, as here:
(52CC052B.23ED-5:get_process-sol.cpp,619,"get_processes") Reading directory 21661. Previous directory 2053
(52CC052B.23EE-5:get_ps_table.cpp,1118,"display_ras1_rawps") Entry

Initial Impact: High, the agent is no longer running

Additional Keywords: UNIXPS 6.3.0.2-TIV-ITM_UNIX-IF0002
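The crash signature described under Diagnostics (a "get_processes" tracepoint followed by "display_ras1_rawps" with no "get_psinfo" in between) can be searched for in a collected RAS1 log with a short script. The following is a minimal sketch, not an IBM-supplied tool: the trace line layout is inferred only from the excerpts in this APAR, and the function name find_skipped_psinfo is hypothetical.

```python
import re

# Assumed RAS1 trace line layout, inferred from the excerpts in this APAR:
#   (timestamp-thread:source_file,line,"function") message
TRACE_RE = re.compile(
    r'^\((?P<stamp>[0-9A-Fa-f]+\.[0-9A-Fa-f]+)-(?P<thread>[0-9A-Fa-f]+):'
    r'(?P<file>[^,]+),(?P<line>\d+),"(?P<func>[^"]+)"\)\s*(?P<msg>.*)$'
)
READING_RE = re.compile(r'Reading directory (\d+)')


def find_skipped_psinfo(lines):
    """Return (pid, timestamp) pairs where "get_processes" announced a PID
    and "display_ras1_rawps" followed on the same thread without any
    "get_psinfo" tracepoint in between."""
    pending = {}  # thread id -> (pid, timestamp) awaiting a get_psinfo entry
    hits = []
    for raw in lines:
        m = TRACE_RE.match(raw.strip())
        if not m:
            continue  # not a trace line in the assumed layout
        thread, func = m['thread'], m['func']
        if func == 'get_processes':
            r = READING_RE.search(m['msg'])
            if r:
                pending[thread] = (int(r.group(1)), m['stamp'])
        elif func == 'get_psinfo':
            # psinfo was read for the pending PID: the normal case
            pending.pop(thread, None)
        elif func == 'display_ras1_rawps' and thread in pending:
            # no get_psinfo seen since the directory was announced
            hits.append(pending.pop(thread))
    return hits
```

To use it, pass the lines of a log file opened in text mode; each returned pair names the PID whose psinfo read was skipped and the timestamp of the corresponding "get_processes" tracepoint.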
Local fix
Problem summary
On Solaris, the Monitoring Agent for UNIX OS may crash while monitoring processes. In a highly dynamic environment with hundreds of running processes that frequently stop and start, the agent may fail to read one of the files in the /proc file system that report metrics for the specified process ID (PID), because that PID no longer exists. This condition is not handled correctly and may lead to a crash because memory is left uninitialized.

The following is an example of the call stack:
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cMProcessUtilsOcmd_to_display6Fpc1i_i_+0x184
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cRsave_process_data6FpnIps_tableii_v_+0x9c0
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cSget_entire_process6FpnIps_table__i_+0x1c0
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cMget_ps_table6FpnIps_table__i_+0x53c
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cJProcUsageGsample6MpnDstdDmap4CxnOProcessCpuInfo_n0BEless4Cx__n0BJallocator4n0BEpair4Ckxn0C___v_+0x134
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cZProcessStatisticsTemplateGupdate6MnDstdMbasic_string4Ccn0BLchar_traits4Cc__n0BJallocator4Cc_v_+0x434
/opt/IBM/ITM/sol296/ux/bin/kuxagent:__1cSupdateProcessStats6Fpv0_+0x36c
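The race described in the summary above (a PID is listed, then disappears before its /proc file is read) can be illustrated with a small defensive sampler. This is only a sketch of the general pattern, not the agent's actual code or the content of the fix: the function names read_proc_file and sample_processes, the logger name, and the proc_root parameter are all hypothetical. A failed read is logged with its reason and the process is skipped, so no partially filled record is stored.

```python
import errno
import logging
import os

log = logging.getLogger("proc_sampler")  # hypothetical logger name


def read_proc_file(pid, name="psinfo", proc_root="/proc"):
    """Return the raw bytes of <proc_root>/<pid>/<name>, or None if the
    file cannot be read (for example because the process has exited)."""
    path = os.path.join(proc_root, str(pid), name)
    try:
        with open(path, "rb") as fh:
            return fh.read()
    except OSError as exc:
        # Record why the read failed; it may be something other than the
        # process exiting between the directory listing and the open.
        reason = errno.errorcode.get(exc.errno, "unknown")
        log.warning("cannot read %s: %s (%s)", path, exc.strerror, reason)
        return None


def sample_processes(proc_root="/proc"):
    """Collect raw data for every numeric directory under proc_root,
    skipping any process whose file could not be read."""
    samples = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        data = read_proc_file(entry, proc_root=proc_root)
        if data is None:
            continue  # skip rather than keep a half-initialized record
        samples[int(entry)] = data
    return samples
```

The proc_root parameter exists only so that the sketch can be exercised against a temporary directory instead of a live /proc file system.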
Problem conclusion
Added logic to make the agent more robust under the above conditions.

The fix for this APAR will be contained in the following maintenance packages:

| FixPack | 6.3.0-TIV-ITM-FP0003
| InterimFix | 6.3.0.1-TIV-ITM_UNIX-IF0002
Temporary fix
Comments
APAR Information
APAR number
IV54057
Reported component name
ITM AGENT UNIX
Reported component ID
5724C040U
Reported release
630
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-01-16
Closed date
2014-01-24
Last modified date
2014-08-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
ITM AGENT UNIX
Fixed component ID
5724C040U
Applicable component levels
R623 PSY UP
R630 PSY UP
R610 PSN UP
R620 PSN UP
R621 PSN UP
R622 PSN UP
Document Information
Modified date:
30 December 2022