Fixes are available
IBM Tivoli Monitoring: Linux(R) OS Agent 6.2.3.4-TIV-ITM_LINUX-IF0001
IBM Tivoli Monitoring: Linux(R) OS Agent 6.3.0.5-TIV-ITM_LINUX-IF0003
IBM Tivoli Monitoring: Linux(R) OS Agent 6.3.0.6-TIV-ITM_LINUX-IF0001
IBM Tivoli Monitoring: Linux(R) OS Agent 6.2.3.3-TIV-ITM_LINUX-IF0007
IBM Tivoli Monitoring 6.3.0 Fix Pack 7 (6.3.0-TIV-ITM-FP0007)
IBM Tivoli Monitoring: Linux(R) OS Agent 6.2.3.5-TIV-ITM_LINUX-IF0004
APAR status
Closed as program error.
Error description
Random crashes may occur in the Linux OS Agent while collecting metrics.
Affected Platforms / Versions: This issue affects all Linux OS Agent versions since 6.23 FP1 and does not depend on the Linux version or architecture.
Diagnostics: RAS1 logs at ERROR level show a sequence like this before ending:
    filestats.cpp,314,"executeStatfsInSeparateThread") WARNING: The statfs timeout expired!
    filestats.cpp,315,"executeStatfsInSeparateThread") WARNING: The mounted file system "/example/file/system is probably unreachable
    filestats.cpp,123,"GetFileStats") statfs64 timed out for /example/file/system
With more detailed RAS1 logging enabled, the RAS1 logs end with the "exec_statfs" thread:
    filestats.cpp,180,"exec_statfs") statfs64 executed successfully for "/example/file/system"
    filestats.cpp,204,"exec_statfs") Exit: 0x0
The traceback of the coring thread varies, because the issue results from memory corruption that occurs when the call to statfs64() returns and the "exec_statfs" thread exits after the calling thread has already timed out.
Initial Impact: High - monitoring agent crashes.
Additional Keywords: KLZDISK core dump sigsegv klzagent
Local Fix: Increase KBB_NFS_TIMEOUT to prevent timeout of statfs64
Local fix
Increase the KBB_NFS_TIMEOUT setting to a value larger than the time the statfs64 call takes to return. The setting is made in lz.ini, and the value is specified in seconds.
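To choose a value for KBB_NFS_TIMEOUT, it can help to know how long a file system statistics call actually takes on each mount point. The following is a minimal Python sketch, not part of the agent or of any IBM tooling. It assumes that os.statvfs() is a reasonable stand-in for the agent's statfs64 call. The function names time_statvfs and mount_points, the 30-second give-up limit, and the use of /proc/mounts are illustrative choices.

```python
import os
import threading
import time


def time_statvfs(path, give_up_after=30.0):
    """Return the seconds one os.statvfs(path) call took.

    Returns None if the call raised OSError or did not finish within
    give_up_after seconds (an arbitrary limit chosen for this sketch).
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.statvfs(path)
            result["elapsed"] = time.monotonic() - start
        except OSError:
            pass  # unreachable or missing path: leave result empty

    # Daemon thread so a hung file system cannot block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(give_up_after)
    return result.get("elapsed")


def mount_points(mounts_file="/proc/mounts"):
    """Return the mount points (second field) from a /proc/mounts-style file.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does
    not decode such escapes.
    """
    points = []
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                points.append(fields[1])
    return points
```

Calling time_statvfs() for each entry returned by mount_points() gives a rough per-file-system response time. A mount point that returns None, or a time close to the current timeout, is a likely candidate for the statfs64 timeout described in this APAR.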
Problem summary
Monitoring Agent for Linux OS core dumps while monitoring disks.
The Monitoring Agent for Linux OS can crash at random during Disk attribute group data collection. The cause is memory corruption triggered by a timing condition between two internal agent threads. The condition can occur when one or more file systems do not respond within the default 2-second timeout for the statfs64 system call.
Together, APARs IV78033 and IV80542 also address high agent memory usage, especially when the agent runs on ICP nodes.
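The timing condition described above involves a caller that gives up after a timeout and a worker thread that finishes later. The following Python sketch illustrates that general pattern only. It is not the agent's code, which is C++, and Python cannot reproduce the memory corruption itself. The names StatResult and call_with_timeout are hypothetical. The point it shows is that each call gets its own result holder, which stays valid for as long as either thread still references it, so a late-finishing worker never writes to storage the caller has already discarded.

```python
import threading
import time


class StatResult:
    """Per-call result holder shared by the caller and its worker thread."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None


def call_with_timeout(func, arg, timeout=2.0):
    """Run func(arg) in a worker thread and wait up to timeout seconds.

    Returns (True, value) if the worker finished in time, else (False, None).
    The default of 2 seconds mirrors the default timeout mentioned above.
    """
    result = StatResult()  # kept alive while either thread references it

    def worker():
        # A late worker still writes only to its own result holder.
        result.value = func(arg)
        result.done.set()

    threading.Thread(target=worker, daemon=True).start()
    finished = result.done.wait(timeout)
    return finished, (result.value if finished else None)


def slow_lookup(path):
    """Hypothetical stand-in for a file system call that responds slowly."""
    time.sleep(0.2)
    return "stats for " + path
```

With these definitions, call_with_timeout(slow_lookup, "/example/file/system", timeout=0.05) reports a timeout, and the worker completes harmlessly a moment later. In a language with manual memory management, the equivalent safeguard is to keep the shared buffer allocated until the worker has exited, rather than tying it to the caller's stack frame.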
Problem conclusion
Code was updated to avoid the memory corruption. The fix for this APAR will be contained in the following maintenance packages:
| FixPack    | 6.3.0-TIV-ITM-FP0007         |
| InterimFix | 6.3.0.6-TIV-ITM_LINUX-IF0001 |
| InterimFix | 6.3.0.5-TIV-ITM_LINUX-IF0003 |
| InterimFix | 6.2.3.5-TIV-ITM_LINUX-IF0004 |
| InterimFix | 6.2.3.4-TIV-ITM_LINUX-IF0001 |
Temporary fix
Comments
APAR Information
APAR number
IV80542
Reported component name
ITM AGENT LINUX
Reported component ID
5724C04LN
Reported release
623
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-01-19
Closed date
2016-02-15
Last modified date
2019-04-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
ITM AGENT LINUX
Fixed component ID
5724C04LN
Applicable component levels
Business Unit: IBM Software w/o TPS (BU059)
Product: Tivoli Monitoring (SSTFXA)
Platform: Platform Independent (PF025)
Version: 623
Line of Business: Automation (LOB45)
Document Information
Modified date:
08 March 2023