IBM Support

Unix OS Agent in VIOS unexpectedly hangs

Technical Blog Post


Abstract

Unix OS Agent in VIOS unexpectedly hangs

Body

The Virtual I/O Server (VIOS) is software that runs in a dedicated IBM System p logical partition and is used to configure and share physical I/O resources among all the other logical partitions
of the server.
As a result, it has visibility of all the physical and logical storage resources defined on the machine.

On the VIOS you can run a pre-installed IBM Tivoli Monitoring (ITM) agent called the VIOS Agent, but you can also run the Unix OS Agent, as on any other AIX LPAR.

Occasionally the Unix OS Agent, once started, shows no data in the TEP even though it is correctly registered and online on the TEMS/TEPS.

When this happens to a Unix OS Agent running on a VIOS, the likely cause is an unhandled exception in the aixdp_daemon process.
In the agent logs (files named <hostname>_ux_<epochtime>.log, for example myvios_ux_1495095330.log) you can see that aixdp_daemon generates a stack trace like this one:


   
**** Fatal Error (11) Detected in kuxagent or a helper binary ****  
**** Stacktrace in standard logs. Enable KBB_SIG1=-dumpoff in ini for  core dumps ****  
+++PARALLEL TOOLS CONSORTIUM LIGHTWEIGHT COREFILE FORMAT version 1.0
+++LCB 1.0 Thu May 18 10:30:50 2017 Generated by IBM AIX 6.1  
#  
+++ID Node 0 Process 19988686 Thread 1  
***FAULT "SIGSEGV - Segmentation violation"  
+++STACK  
leftmost : 0x0000000c  
malloc_y : 0x00000534  
malloc_common@AF104_86 : 0x00000028  
get_cu_vtargets : 0x00000228  
init_odm : 0x0000016c  
create_cu_hashtbl : 0x000001a4  
get_diskstats : 0x000004d8  
adp_get_diskstats : 0x00000108  
dt_CollectDiskData : 0x0000010c  
dt_CollectData : 0x000002b4  
ux_CollectData : 0x00000110
reply_disks_request__FiPv : 1361 # in file <aixdp_daemon.cpp>  
reply_data_request__FiPv : 349 # in file <aixdp_daemon.cpp>  
process_data__FiPv : 298 # in file <aixdp_daemon.cpp>  
main : 192 # in file <aixdp_daemon.cpp>  
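
When many logs have to be checked, a small script can locate the newest ux log for a host and extract the fault and the stack frames from it. The following is a minimal Python sketch, not an IBM tool: the helper names newest_ux_log and parse_stack are hypothetical, the default log directory /opt/IBM/ITM/logs is an assumption, and the parsing assumes the <hostname>_ux_<epochtime>.log naming scheme and the lightweight corefile layout shown above.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical helpers; they assume the <hostname>_ux_<epochtime>.log naming
# scheme and the lightweight corefile layout shown in this article.
LOG_NAME = re.compile(r"^(?P<host>.+)_ux_(?P<epoch>\d+)\.log$")


class Frame(NamedTuple):
    function: str
    location: str  # hex offset, or "<line> # in file <source>"


def newest_ux_log(log_dir: str, hostname: str) -> Optional[Path]:
    """Return the ux log for `hostname` with the highest epoch time, or None."""
    best, best_epoch = None, -1
    for path in Path(log_dir).glob(f"{hostname}_ux_*.log"):
        m = LOG_NAME.match(path.name)
        if m and m["host"] == hostname and int(m["epoch"]) > best_epoch:
            best, best_epoch = path, int(m["epoch"])
    return best


def parse_stack(text: str) -> Tuple[Optional[str], List[Frame]]:
    """Extract the first ***FAULT description and its +++STACK frames."""
    fault: Optional[str] = None
    frames: List[Frame] = []
    in_stack = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("***FAULT"):
            m = re.search(r'"([^"]*)"', line)
            fault = m.group(1) if m else line
        elif line.startswith("+++STACK"):
            in_stack = True
        elif in_stack:
            if " : " not in line:
                break  # first line that is not "function : location" ends the stack
            func, loc = line.split(" : ", 1)
            frames.append(Frame(func.strip(), loc.strip()))
    return fault, frames


if __name__ == "__main__":
    # Assumed default ITM log directory; adjust to your installation.
    log = newest_ux_log("/opt/IBM/ITM/logs", "myvios")
    if log is not None:
        fault, frames = parse_stack(log.read_text(errors="replace"))
        print(fault)
        for frame in frames:
            print(f"  {frame.function} : {frame.location}")
```

For the trace above, the frames from malloc_y down to get_cu_vtargets are the ones that point at memory allocation during disk discovery.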

 



In this condition, the aixdp_daemon process does not terminate after the fault; it stays alive and causes the remaining subprocesses and agent threads to hang.

If you set KBB_SIG1=-dumpoff, a core file is generated and the aixdp_daemon process exits, freeing the remaining Unix OS Agent processes so that
they can work and return data.

Continuing the analysis, the last line written to the log by the crashing process is:
   
(592FC9B1.07A2-1:dkstats.c,1984,"get_cu_vtargets") Entry  
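
The entry above appears to follow a regular layout: a hexadecimal time stamp and sequence number, a thread number, then the source file, line number and function name, followed by the message. A minimal Python sketch that splits such lines into fields and, applied to the lines written by the crashing process, returns the last one (the call in progress when it failed) could look like this. The layout is inferred from this single example, treating the leading hex field as UNIX epoch seconds is an assumption, and parse_ras1_line and last_entry are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional

# Layout inferred from the single example in this article (an assumption):
# (<hex time>.<hex seq>-<thread>:<source file>,<line>,"<function>") <message>
RAS1_LINE = re.compile(
    r'^\((?P<time>[0-9A-Fa-f]+)\.(?P<seq>[0-9A-Fa-f]+)-(?P<thread>[^:]+):'
    r'(?P<file>[^,]+),(?P<line>\d+),"(?P<func>[^"]*)"\)\s*(?P<msg>.*)$'
)


class TraceEntry(NamedTuple):
    time: datetime
    thread: str
    source: str
    line: int
    function: str
    message: str


def parse_ras1_line(text: str) -> Optional[TraceEntry]:
    """Split one trace line into fields, or return None if it does not match."""
    m = RAS1_LINE.match(text.strip())
    if m is None:
        return None
    # Assumption: the leading hex field is the UNIX epoch time in seconds.
    when = datetime.fromtimestamp(int(m["time"], 16), tz=timezone.utc)
    return TraceEntry(when, m["thread"], m["file"], int(m["line"]), m["func"], m["msg"])


def last_entry(lines: Iterable[str]) -> Optional[TraceEntry]:
    """Return the last parseable entry, e.g. the call in progress at a crash."""
    result = None
    for raw in lines:
        entry = parse_ras1_line(raw)
        if entry is not None:
            result = entry
    return result
```

For the line above, the sketch reports source dkstats.c, line 1984, function get_cu_vtargets and message Entry, which matches the get_cu_vtargets frame in the stack trace.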
 
Because the code was trying to allocate more memory (malloc_y and malloc_common sit at the top of the stack, called from get_cu_vtargets), the problem is quite likely caused by the large number of disk entries.

Because this is a VIOS, aixdp_daemon discovers a large number of I/O resources, which can exhaust system resources or hit process limits;
the segmentation violation above is the visible symptom.
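
To get an idea of the limits involved, you can look at the soft and hard resource limits in effect. The Python sketch below is illustrative and not part of ITM; it reads limits with the standard resource module. It reports the limits of the Python process itself, which match what the agent inherits only if the agent is started by the same user from a comparable environment; that is an assumption, and the limits of an already running agent may differ. Running ulimit -a in the agent user's shell gives similar information.

```python
import resource
from typing import Dict, Tuple

# Limits most relevant to a process that allocates a lot of memory (values in bytes).
# getattr() is used because not every RLIMIT_* constant exists on every OS.
LIMITS = {
    "data segment": "RLIMIT_DATA",
    "stack": "RLIMIT_STACK",
    "address space": "RLIMIT_AS",
}


def fmt(value: int) -> str:
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> Dict[str, Tuple[str, str]]:
    """Return {label: (soft, hard)} for this process's resource limits."""
    report = {}
    for label, name in LIMITS.items():
        const = getattr(resource, name, None)
        if const is None:
            continue
        soft, hard = resource.getrlimit(const)
        report[label] = (fmt(soft), fmt(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:<14} soft={soft:>20} hard={hard:>20}")
```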

Because this is a VIOS, a VIOS agent is most likely already running on the same system.
The simplest fix is to turn off AIX data collection in the ux agent, since it collects redundant data that the VIOS agent (va) already collects.
The Unix OS Agent (ux) and the VIOS agent (va) share a common AIX code base called aixDataProvider.
The va agent runs this code, just as the AIX Premium agent does on AIX LPARs that are not VIOS, but it also collects additional VIOS-specific data.
So there is no need to collect this data with both the va and ux agents, and the va agent is specifically designed for VIOS.
Running both the va and ux agents on a VIOS is unusual, but if you need the ux agent on the VIOS for any reason and you experience the problem described in this article, you can disable the aixdp_daemon process.
   
To do this, edit the /opt/IBM/ITM/config/ux.ini file and change:

KUX_AIXDP=true

to:

KUX_AIXDP=false

Then restart the ux agent.
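
If you need to apply this change on several VIOS partitions, a script can make it repeatable. A minimal Python sketch follows; set_ini_value and update_ini_file are hypothetical helpers that assume a simple KEY=value layout, and the script keeps a .bak copy of the file before writing.

```python
import shutil
from pathlib import Path


def set_ini_value(text: str, key: str, value: str) -> str:
    """Return text with every uncommented KEY=... line replaced by KEY=value.

    The key is compared with the text before the first '=' (whitespace
    stripped). If the key is not present, KEY=value is appended.
    """
    out, found = [], False
    for line in text.splitlines():
        is_comment = line.lstrip().startswith("#")
        name = line.split("=", 1)[0].strip()
        if "=" in line and not is_comment and name == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


def update_ini_file(path: str, key: str, value: str) -> None:
    """Copy path to path.bak, then write the file with key set to value."""
    p = Path(path)
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(set_ini_value(p.read_text(), key, value))


if __name__ == "__main__":
    # File path taken from this article; adjust if your ITM home differs.
    update_ini_file("/opt/IBM/ITM/config/ux.ini", "KUX_AIXDP", "false")
```

The same helper could set KBB_SIG1=-dumpoff instead, if you prefer the core-dump behavior described earlier. The change takes effect only after the agent restart, typically done with itmcmd agent stop ux followed by itmcmd agent start ux.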

The Unix OS Agent then starts without the aixdp_daemon process and can collect and display all the other metrics in the TEP.
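
After the restart, you can confirm that the agent process (kuxagent, as named in the error message above) is running while aixdp_daemon is not. A minimal Python sketch using ps -e -o comm follows; the helper names are hypothetical, and the parsing assumes one command name per line after a single header line.

```python
import subprocess
from typing import List


def command_names(ps_output: str) -> List[str]:
    """Turn `ps -e -o comm` output into command names, skipping the header line."""
    lines = [line.strip() for line in ps_output.splitlines() if line.strip()]
    return lines[1:]


def is_running(name: str, ps_output: str) -> bool:
    """True if any command's base name (text after the last '/') equals name."""
    return any(cmd.rsplit("/", 1)[-1] == name for cmd in command_names(ps_output))


if __name__ == "__main__":
    ps = subprocess.run(["ps", "-e", "-o", "comm"],
                        capture_output=True, text=True, check=True).stdout
    for proc in ("kuxagent", "aixdp_daemon"):
        print(f"{proc}: {'running' if is_running(proc, ps) else 'not running'}")
```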

Thanks for reading.

 

 


[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSVJUL","label":"IBM Application Performance Management"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

UID

ibm11277068