APAR status
Closed as program error.
Error description
Problem Description: When a situation or historical collection is collecting data for the Linux File Information attribute group where a filename is not specified, and the agent encounters the pthread_create error, the agent goes into a loop trying to collect the data, resulting in high CPU for the agent process.

(6177F69A.40B6-6:klz20agt.cpp,1424,"executeSystemCallInSeparateThread")WARNING: pthread_create failed with error code 11 (Resource temporarily unavailable)

The error above occurs when there are insufficient resources to create another thread. A system-imposed limit on the number of threads was encountered. A Technote has been written which contains details of a SLES 12 environment setting that causes the resource issue. See the technote for more details.

High CPU running IBM Tivoli Monitoring Linux OS Agent on SLES 12 after upgrade to 6.30 FP7 SP7 (or later): https://www.ibm.com/support/pages/node/6552616

Related Files and Output:
------------------------------------
In the agent RAS1 log file <hostname>_lz_klzagent_<timestamp>-<nn>.log, with tracing set to (UNIT:klz20agt ALL), the following WARNING messages will be displayed, followed by the executeSystemCallInSeparateThread(READDIR) call for the same path over and over.
(6177F69A.40B2-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
(6177F69A.40B3-6:klz20agt.cpp,1324,"executeSystemCallInSeparateThread")Entry
(6177F69A.40B4-6:klz20agt.cpp,1337,"executeSystemCallInSeparateThread")name passed in:/opt/IBM/ITM/bin/
(6177F69A.40B5-6:klz20agt.cpp,1384,"executeSystemCallInSeparateThread")Thread for file "/opt/IBM/ITM/bin/" found in cache
(6177F69A.40B6-6:klz20agt.cpp,1424,"executeSystemCallInSeparateThread")WARNING: pthread_create failed with error code 11 (Resource temporarily unavailable)
(6177F69A.40B7-6:klz20agt.cpp,1425,"executeSystemCallInSeparateThread")Exit: 0xFFFFFFFF
(6177F69A.40B8-6:klz20agt.cpp,801,"getFileInfo") readdir failed for path /opt/IBM/ITM/bin/. file:itmCANDLEDATA_xx.sh errno = 11; Resource temporarily unavailable
(6177F69A.40B9-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
(6177F69A.40BA-6:klz20agt.cpp,1324,"executeSystemCallInSeparateThread")Entry
(6177F69A.40BB-6:klz20agt.cpp,1337,"executeSystemCallInSeparateThread")name passed in:/opt/IBM/ITM/bin/
(6177F69A.40BC-6:klz20agt.cpp,1384,"executeSystemCallInSeparateThread")Thread for file "/opt/IBM/ITM/bin/" found in cache
(6177F69A.40BD-6:klz20agt.cpp,1396,"executeSystemCallInSeparateThread")The thread for file "/opt/IBM/ITM/bin/" is probably in h
(6177F69A.40BE-6:klz20agt.cpp,1397,"executeSystemCallInSeparateThread")Exit: 0xFFFFFFFE
(6177F69A.40BF-6:klz20agt.cpp,793,"getFileInfo") readdir timed out for path:/opt/IBM/ITM/bin/ file:itmCANDLEDATA_xx.sh situation:CC_Linux_LZ_Chk777W
(6177F69A.40C0-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
(6177F69A.40C1-6:klz20agt.cpp,1324,"executeSystemCallInSeparateThread")Entry
(6177F69A.40C2-6:klz20agt.cpp,1337,"executeSystemCallInSeparateThread")name passed in:/opt/IBM/ITM/bin/
(6177F69A.40C3-6:klz20agt.cpp,1384,"executeSystemCallInSeparateThread")Thread for file "/opt/IBM/ITM/bin/" found in cache
(6177F69A.40C4-6:klz20agt.cpp,1396,"executeSystemCallInSeparateThread")The thread for file "/opt/IBM/ITM/bin/" is probably in h
(6177F69A.40C5-6:klz20agt.cpp,1397,"executeSystemCallInSeparateThread")Exit: 0xFFFFFFFE
(6177F69A.40C6-6:klz20agt.cpp,793,"getFileInfo") readdir timed out for path:/opt/IBM/ITM/bin/ file:itmCANDLEDATA_xx.sh situation:CC_Linux_LZ_Chk777W
(6177F69A.40C7-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
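To check whether an agent log shows the pattern in the excerpt above, a small script can count the pthread_create failures and the repeated READDIR calls per path. The following is a minimal sketch, not an IBM tool: the function names `summarize` and `suspect_paths` and the threshold of 3 are assumptions made for illustration, and the message patterns are taken only from the excerpt in this APAR.

```python
import re
from collections import Counter

# Message fragments as they appear in the RAS1 excerpt of this APAR.
PTHREAD_FAIL = re.compile(r"pthread_create failed with error code (\d+)")
READDIR_CALL = re.compile(
    r"calling executeSystemCallInSeparateThread\(READDIR\) for path=(\S+)"
)

def summarize(text):
    """Count pthread_create failures by error code and READDIR calls by path."""
    failures = Counter(int(code) for code in PTHREAD_FAIL.findall(text))
    calls = Counter(READDIR_CALL.findall(text))
    return failures, calls

def suspect_paths(text, threshold=3):
    """Return paths requested at least `threshold` times in a log that also
    contains a pthread_create failure (hypothetical heuristic)."""
    failures, calls = summarize(text)
    if not failures:
        return []
    return sorted(path for path, count in calls.items() if count >= threshold)

# A few entries copied from the excerpt, used as sample input.
SAMPLE = r'''
(6177F69A.40B2-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
(6177F69A.40B6-6:klz20agt.cpp,1424,"executeSystemCallInSeparateThread")WARNING: pthread_create failed with error code 11 (Resource temporarily unavailable)
(6177F69A.40B9-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
(6177F69A.40C0-6:klz20agt.cpp,762,"getFileInfo") calling executeSystemCallInSeparateThread(READDIR) for path=/opt/IBM/ITM/bin/
'''

if __name__ == "__main__":
    print(summarize(SAMPLE))
    print(suspect_paths(SAMPLE))
```

To scan a real log, read the file contents into a string and pass it to `suspect_paths`. The patterns search the whole text, so they work whether the entries are one per line or run together.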
Local fix
1. Turn off any situations for Linux File Information (LNXFILE) where the filename is not specified (e.g. only the path is specified). The looping problem would not occur if the path and filename are both specified.
2. The documentation for the error encountered (EAGAIN) for the pthread_create indicates the following possible causes. Check these settings to see if they are set too low and adjust.
   A system-imposed limit on the number of threads was encountered. There are a number of limits that may trigger this error:
   - the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached;
   - the kernel's system-wide limit on the number of processes and threads, /proc/sys/kernel/threads-max, was reached (see proc(5)); or
   - the maximum number of PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)).
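The three limits named in step 2 can be read on a Linux host with a few lines of Python. This is a minimal sketch: the helper names `read_int` and `thread_limits` are hypothetical, and the RLIMIT_NPROC values reported are those of the user running the script, so it should be run as the user that owns the agent process.

```python
import resource
from pathlib import Path

def read_int(path):
    """Return the integer stored in a /proc file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None

def thread_limits():
    """Collect the limits that can cause pthread_create to fail with EAGAIN."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "rlimit_nproc_soft": soft,   # per-user process and thread limit (soft)
        "rlimit_nproc_hard": hard,   # per-user process and thread limit (hard)
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
    }

def describe(value):
    """Render a limit value; RLIM_INFINITY means no limit is set."""
    if value is None:
        return "unavailable"
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value)

if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {describe(value)}")
```

The script only reports the values. Whether a value is too low depends on how many processes and threads the user already has running on the host.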
Problem summary
High CPU when collecting Linux File Information data after a resource error.

When a situation or historical collection is collecting data for the Linux File Information attribute group where a filename is not specified, and the agent encounters the pthread_create error, the agent goes into a loop trying to collect the data, resulting in high CPU for the agent process.

(6177F69A.40B6-6:klz20agt.cpp,1424,"executeSystemCallInSeparateThread")WARNING: pthread_create failed with error code 11 (Resource temporarily unavailable)

The error above occurs when there are insufficient resources to create another thread. A system-imposed limit on the number of threads was encountered.
Problem conclusion
The code has been updated to handle the error condition so it does not result in the code looping.

The fix for this APAR is contained in the following maintenance packages:
  service pack: 6.3.0.7-TIV-ITM-SP0012
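The APAR does not describe how the agent code was changed. As a general illustration of handling a thread-creation failure without looping, the sketch below bounds the number of attempts and then reports failure to the caller. It is not the agent's code (the agent is written in C++), and the function name `run_with_bounded_retries` and its parameters are hypothetical.

```python
import threading
import time

def run_with_bounded_retries(start, max_attempts=3, delay_seconds=0.5,
                             sleep=time.sleep):
    """Call start() up to max_attempts times.

    Returns True as soon as one call succeeds, or False if every attempt
    raised RuntimeError. CPython raises RuntimeError ("can't start new
    thread") from Thread.start() when the thread cannot be created.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start()
            return True
        except RuntimeError:
            # Wait a little longer after each failure, but do not sleep
            # after the final attempt.
            if attempt < max_attempts:
                sleep(delay_seconds * attempt)
    return False

if __name__ == "__main__":
    worker = threading.Thread(target=lambda: None)
    started = run_with_bounded_retries(worker.start)
    if started:
        worker.join()
    print("started:", started)
```

The point of the pattern is that a persistent resource error ends in a reported failure after a fixed number of attempts, rather than an immediate retry of the same request.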
Temporary fix
Comments
APAR Information
APAR number
IJ35907
Reported component name
ITM AGENT LINUX
Reported component ID
5724C04LN
Reported release
630
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-11-02
Closed date
2022-05-02
Last modified date
2022-05-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
ITM AGENT LINUX
Fixed component ID
5724C04LN
Applicable component levels
Business Unit: IBM Software w/o TPS (BU059)
Product: Tivoli Monitoring (SSTFXA)
Platform: Platform Independent (PF025)
Version: 630
Line of Business: Automation (LOB45)
Document Information
Modified date:
08 March 2023