Technical Blog Post
Abstract
ITM Agent Insights: 50 Shades of Blue – Detecting APAR IV81845
Body
One of the things that never ceases to surprise me about working in support is the way different individuals describe the symptoms of the same problem. The following APAR is a good example. It was first identified in February 2016, but still surfaces on a regular basis.
APAR IV81845: THE IFSTAT DAEMON BECOMES UNRESPONSIVE ON AIX.
http://www.ibm.com/support/docview.wss?uid=swg1IV81845
This blog helps you identify and resolve the problem sooner, without increasing the trace level as the APAR recommends.
On the surface, the web page describes how the ifstat daemon, one of the UX agent’s seven processes, becomes unresponsive while running in the background, and it provides trace levels that can be increased to help identify the problem. The steps below instead show how to examine the logs already on your system, with default tracing in place, and determine whether APAR IV81845 is the cause of your problem.
You also need to be aware that the APAR text contains a typo. If you search the text for "Details:", you will find this line:
Gather logs with the following trace in lz.ini
The typo is the file name, lz.ini. LZ is the product code for the Linux OS agent; the product code for the UNIX OS agent is UX. To set the traces as suggested in the link, you need to modify the \IBM\ITM\conf\ux.ini file, not lz.ini, and restart the agent.
Here are some criteria that can help you determine whether you are encountering APAR IV81845:
1. You are running an ITM UX agent at a level prior to these:
IBM Tivoli Monitoring: Unix(R) OS Agent 6.3.0.6-TIV-ITM_UNIX-IF0001
IBM Tivoli Monitoring: Unix(R) OS Agent 6.3.0.4-TIV-ITM_UNIX-IF0004
IBM Tivoli Monitoring 6.3.0 Fix Pack 7 (6.3.0-TIV-ITM-FP0007)
2. You are running AIX 6.1 or 7.1. The problem appears to have surfaced in AIX 6.1 TL9 SP6 (6100-09-06-1543) but has also been reported on AIX 7.1.
To check your version of AIX run this command:
oslevel -s
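If you collect the oslevel -s output from many servers, a small script can split it into its parts. The following is a minimal Python sketch, assuming the output has the four-field shape shown above (6100-09-06-1543). The names AixLevel, parse_oslevel, and is_reported_release are my own, not part of any IBM tool.

```python
import re
from typing import NamedTuple


class AixLevel(NamedTuple):
    release: str           # "6.1" for a leading 6100
    technology_level: int  # 9 for TL9
    service_pack: int      # 6 for SP6
    build: str             # trailing four-digit build stamp, kept as text


def parse_oslevel(output: str) -> AixLevel:
    """Parse a string such as '6100-09-06-1543' printed by `oslevel -s`."""
    match = re.fullmatch(r"(\d)(\d)\d{2}-(\d{2})-(\d{2})-(\d{4})", output.strip())
    if match is None:
        raise ValueError(f"unrecognised oslevel output: {output!r}")
    major, minor, tl, sp, build = match.groups()
    return AixLevel(f"{major}.{minor}", int(tl), int(sp), build)


def is_reported_release(level: AixLevel) -> bool:
    """True for the AIX releases this post names (6.1 and 7.1)."""
    return level.release in ("6.1", "7.1")


if __name__ == "__main__":
    print(parse_oslevel("6100-09-06-1543"))
```

A match on the release alone does not prove you are hitting the APAR; it only tells you the remaining checks are worth doing.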
3. You are experiencing a symptom similar to any of the following problem descriptions, which customers have used to report APAR IV81845:
UX OS agent real time and history data is not collecting
UX Agent issue on AIX Server
UX Agent hung & hourly situation did not trigger
UX agent connected but not displaying metrics
UX agent restarts itself
tacmd Executecommand doesn't work
UX agent no alerts
UX agent offline
4. Check the following logs to determine whether you are encountering IV81845. I find the search capabilities in Windows to be lacking. More efficient and effective tools, such as UltraSearch, Everything, or Agent Ransack, are available online and can search all the logs at once. Make certain to use only tools that are approved by your company.
In the \IBM\ITM\logs\logs\ directory look for logs with names similar to the following:
<HOSTNAME>_ux_ifstat_<HEXDATE>-##.log,
<HOSTNAME>_ux_stat_daemon_<HEXDATE>-##.log, and
<HOSTNAME>_ux_kuxagent_<HEXDATE>-##.log
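If you have copied the logs directory to a workstation, a short script can sort the files by component. The following is a minimal Python sketch, not an IBM utility. It assumes that ## is a numeric sequence number and that <HEXDATE> is an eight-digit hexadecimal count of seconds since 1 January 1970 UTC; both are my assumptions about the naming scheme, and the names LogName, classify_log, and find_logs are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, NamedTuple, Optional

# <HOSTNAME>_ux_<component>_<HEXDATE>-##.log
LOG_NAME = re.compile(
    r"^(?P<host>.+)_ux_(?P<component>ifstat|stat_daemon|kuxagent)_"
    r"(?P<hexdate>[0-9A-Fa-f]{8})-(?P<seq>\d+)\.log$"
)


class LogName(NamedTuple):
    host: str
    component: str
    started: datetime  # HEXDATE decoded as epoch seconds (assumption)
    sequence: int


def classify_log(file_name: str) -> Optional[LogName]:
    """Split a log file name into its parts, or return None if it does not fit."""
    match = LOG_NAME.match(file_name)
    if match is None:
        return None
    started = datetime.fromtimestamp(int(match["hexdate"], 16), tz=timezone.utc)
    return LogName(match["host"], match["component"], started, int(match["seq"]))


def find_logs(log_dir: str) -> Dict[str, List[Path]]:
    """Group the matching log files in log_dir by component name."""
    found: Dict[str, List[Path]] = {}
    for path in sorted(Path(log_dir).glob("*_ux_*.log")):
        info = classify_log(path.name)
        if info is not None:
            found.setdefault(info.component, []).append(path)
    return found
```

Calling find_logs on your copy of the logs directory returns a dictionary keyed by ifstat, stat_daemon, and kuxagent, so you can open the newest file for each component first.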
** You may be encountering this APAR if you see messages like these in the <HOSTNAME>_ux_ifstat_<HEXDATE>-##.log files:
(5A7A4EFF.0004-1:ifstat-bsd.cpp,588,"main") IOCTL error retrieving netmask for en2; errno: 68
(5A7A4EFF.0005-1:ifstat-bsd.cpp,337,"setGateway") Interface 'en2' not found in map
(5A7A4F1D.0000-1:ifstat-bsd.cpp,262,"getDomain") Unable to find domain
(5A7A4F1D.0001-1:ifstat-bsd.cpp,588,"main") IOCTL error retrieving netmask for en2; errno: 68
OR
(5A7A4F1D.0001-1:ifstat-bsd.cpp,588,"main") IOCTL error retrieving netmask for 0; errno:6
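Searching for these messages by eye gets tedious when the log is large. Here is a minimal Python sketch that counts them. The three patterns are taken from the ifstat excerpts above; the names IFSTAT_SIGNATURES and count_ifstat_signatures are my own, and the sketch assumes each message sits on a single line of the log.

```python
import re
from collections import Counter
from typing import Iterable

# Message fragments copied from the ifstat log excerpts in this post.
IFSTAT_SIGNATURES = {
    "netmask_ioctl_error": re.compile(
        r"IOCTL error retrieving netmask for (\S+); errno:\s*(\d+)"
    ),
    "interface_not_in_map": re.compile(r"Interface '([^']+)' not found in map"),
    "domain_not_found": re.compile(r"Unable to find domain"),
}


def count_ifstat_signatures(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known ifstat message."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in IFSTAT_SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_iv81845(counts: Counter) -> bool:
    """Heuristic only: any netmask IOCTL error is worth a closer look."""
    return counts["netmask_ioctl_error"] > 0
```

To use it, open the ifstat log and pass the file object to count_ifstat_signatures. Treat a positive result as a reason to check the other two logs, not as proof on its own.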
** The log <HOSTNAME>_ux_stat_daemon_<HEXDATE>-##.log may show a lot of messages like:
(5963E573.0000-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 184671131
(5963E573.0001-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: -1572282712
(5963E573.0002-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 214256915
(5963E591.0000-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 5303
(5963E591.0001-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 8891
(5963E591.0002-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 650
(5963E5AF.0000-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 997294254
(5963E5AF.0001-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 755970048
(5963E5AF.0002-1:stat_daemon_ifstat.cpp,558,"mapInterfaceTypeToEnum") Unknown Interface Mapping for Interface Type: 207524978
AND
(585C1B8B.0000-1:stat_daemon.cpp,350,"stop_exit") **** UNIX OS Agent stat_daemon terminated ****
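The stat_daemon log can hold thousands of these mapping messages, so a count is more useful than reading them. The following is a minimal Python sketch that summarizes the two things to look for: how many "Unknown Interface Mapping" messages there are, and whether the termination banner is present. The function name and the keys of the result are my own. The patterns tolerate a line break inside a message, in case your copy of the log is wrapped.

```python
import re
from typing import Dict

UNKNOWN_TYPE = re.compile(
    r"Unknown\s+Interface\s+Mapping\s+for\s+Interface\s+Type:\s*(-?\d+)"
)
TERMINATED = re.compile(r"UNIX\s+OS\s+Agent\s+stat_daemon\s+terminated")


def summarize_stat_daemon_log(text: str) -> Dict[str, object]:
    """Summarize the IV81845-style messages found in a stat_daemon log."""
    types = [int(value) for value in UNKNOWN_TYPE.findall(text)]
    return {
        "unknown_mapping_count": len(types),    # total messages seen
        "distinct_types": len(set(types)),      # how many different values
        "terminated": TERMINATED.search(text) is not None,
    }
```

In the excerpts above the interface type values look random and change from sample to sample, so a large distinct_types count alongside the termination banner resembles what customers reported.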
** Also look at the <HOSTNAME>_ux_kuxagent_<HEXDATE>-##.log; it may show messages like these:
(585AE2D1.0019-11:sock.c,344,"fd_read") ERROR: read failed for socket 6 with errno 0 (Error 0)
(585AE2D1.001A-11:sock.c,346,"fd_read") ERROR: buf: 110061FA0, len: 1144, num: 0 bytes, amount_left: 1144 bytes
(585AE2D1.001B-11:kuxmain.cpp,641,"handle_subdaemon_failure()") ERROR: stat_daemon processing terminated!
(585AE2D1.001C-11:kuxmain.cpp,642,"handle_subdaemon_failure()") ERROR: The metrics from this data collector will not be available anymore
(585AE2D1.001D-11:kuxmain.cpp,643,"handle_subdaemon_failure()") ERROR: Please contact IBM support to determine the cause of the stat_daemon failure
(585AE2D1.001E-11:kuxmain.cpp,658,"handle_subdaemon_failure()") INFO: socket 6 closed
(585AE2D1.001F-11:sock.c,492,"fd_write") ERROR: write failed for socket 12 with errno 32 (Broken pipe)
(585AE2D1.0020-11:sock.c,494,"fd_write") ERROR: buf: 110061AB0, len: 12, num: -1 bytes, amount_left: 12 bytes
(585AE2D1.0021-11:kuxmain.cpp,641,"handle_subdaemon_failure()") ERROR: aixdp_daemon processing terminated!
(585AE2D1.0022-11:kuxmain.cpp,642,"handle_subdaemon_failure()") ERROR: The metrics from this data collector will not be available anymore
(585AE2D1.0023-11:kuxmain.cpp,643,"handle_subdaemon_failure()") ERROR: Please contact IBM support to determine the cause of the aixdp_daemon failure
(585AE2D1.0024-11:kuxmain.cpp,658,"handle_subdaemon_failure()") INFO: socket 12 closed
(585AE2D1.0025-11:sock.c,263,"fd_read") ERROR: The socket provided is null!
(585AE2D1.0026-11:kux14agt.cpp,623,"TakeSample") No Physical Memory data available
(585AE2D1.0027-11:sock.c,404,"fd_write") ERROR: The socket provided is null!
(585AE2D1.0028-11:kux14agt.cpp,675,"TakeSample") No Virtual Memory data available
(585AE2D1.0029-11:kux14agt.cpp,711,"TakeSample") No Paging Space data available
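The key lines in the kuxagent log are the ones where handle_subdaemon_failure() reports that a daemon's processing terminated. Here is a minimal Python sketch that pulls out which daemons failed and when. It assumes the eight hex digits at the start of each entry are seconds since 1 January 1970 UTC, which is my reading of the log prefix rather than something the APAR states. The names SubdaemonFailure and find_subdaemon_failures are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple

# Matches e.g.
# (585AE2D1.001B-11:kuxmain.cpp,641,"handle_subdaemon_failure()") ERROR: stat_daemon processing terminated!
FAILURE = re.compile(
    r"\((?P<stamp>[0-9A-Fa-f]{8})\.[0-9A-Fa-f]{4}-[0-9A-Fa-f]+:"
    r"kuxmain\.cpp,\d+,\"handle_subdaemon_failure\(\)\"\)\s+ERROR:\s+"
    r"(?P<daemon>\w+)\s+processing\s+terminated!"
)


class SubdaemonFailure(NamedTuple):
    daemon: str
    when: datetime  # prefix decoded as epoch seconds (assumption)


def find_subdaemon_failures(text: str) -> List[SubdaemonFailure]:
    """List each daemon that kuxagent reported as terminated, in log order."""
    failures = []
    for match in FAILURE.finditer(text):
        when = datetime.fromtimestamp(int(match["stamp"], 16), tz=timezone.utc)
        failures.append(SubdaemonFailure(match["daemon"], when))
    return failures
```

Run against the excerpt above, this reports stat_daemon first and aixdp_daemon second, which matches the order in which the sockets were closed.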
***
If you think you are encountering APAR IV81845 after reviewing the steps described above, go to the following link and apply the solution that best fits your environment.
http://www.ibm.com/support/docview.wss?uid=swg1IV81845
Hopefully these steps can help you identify and resolve this surprisingly common problem.
If you have any questions or concerns, please contact IBM Support for assistance.
LZ
Additional ITM Agent Insights and IBM Tivoli Monitoring Agent blogs are indexed under ITM Agent Insights: Introduction.
UID
ibm11084563