IBM Support

IJ44934: ON UNIX, AGENT SOMETIMES CRASHES WHEN MONITORING AIX NETWORK ADAPTERS

APAR status

  • Closed as program error.

Error description

  • The code collects the device data in two parts.  It calls
    "entstat" to collect the totals data.  Then, for the attributes
    that are rates (e.g. per_sec), the values are calculated by
    calling "entstat" again.
    
    The problem occurs when the first call to "entstat" returns a
    device name, but the second call to "entstat" (made to get the
    rate data) returns no data for that device.  The code handles
    this scenario; however, a pointer was incremented before it
    should have been.
    
    Environment:  UNIX OS Agent:  ITM 6.30 FP7 SP2 and Higher
    
    Description:  On UNIX, the agent sometimes crashes when
    collecting data for the AIX Network Adapters attribute group. A
    core file is generated; however, it was observed that Watchdog
    detected the crash and restarted the monitoring agent.
    
    Workaround: Stop situations and historical data collection for
    the AIX Network Adapters attribute group.
    
    Logs:
    With tracing set to ERROR, this message appears towards the end
    of the log:
    (63B4C96E.0000-8:kux45agt.cpp,322,"TakeSample") WARNING: Network
    Adapter Rates data not found for key:'ent6@pci3
    
    Core file:
    A core file may be generated in the $CANDLEHOME/<interp>/ux/bin
    directory.  When dbx (on AIX) was run against the core file the
    following stack trace was reported listing "line 397 in
    kux45agt.cpp".
    
    (dbx) where
    pthread_kill(??, ??) at 0x900000000624418
    _p_raise(??) at 0x900000000623ca4
    raise.raise(??) at 0x9000000000cde08
    abort() at 0x9000000000f8708
    std::myabort()() at 0x900000000e5858c
    std::terminate()() at 0x900000000e58468
    exceptio.std::myabort().terminate()() at 0x900000000e5838c
    __DoThrowV6() at 0x900000000e5b9f0
    kux45agt.std::_Tree<std::_Tmap_traits<std::_LFS_ON::basic_strin
    g<char,std::char_traits<char>,std::allocator<char>>,aixnetadpt_t
    <char,std::char_traits<char>,std::allocator<char>> >,std::alloca
    har,std::char_traits<char>,std::allocator<char>>,aixnetadpt_tota
    _traits<std::_LFS_ON::basic_string<char,std::char_traits<char>,
    std::allocator<char>>,aixnetadpt_total_data_t*,std::less<std::_L
    <char,std::char_traits<char>,std::allocator<char>> >,std::alloca
    d::char_traits<char>,std::allocator<char>>,aixnetadpt_total_data
    0x0000000110062740, __classReturn = &(...), _P = (...)), line
    142 in "xtree.t"
    unnamed block in omunx_aixnetadpt_agent::TakeSample()(this =
    0x0000000110bf41b0), line 397 in "kux45agt.cpp"
    unnamed block in omunx_aixnetadpt_agent::TakeSample()(this =
    0x0000000110bf41b0), line 397 in "kux45agt.cpp"
    unnamed block in omunx_aixnetadpt_agent::TakeSample()(this =
    0x0000000110bf41b0), line 397 in "kux45agt.cpp"
    omunx_aixnetadpt_agent::TakeSample()(this = 0x0000000110bf41b0),
    line 397 in "kux45agt.cpp"
    ctira::DriveDataCollection()(0x110bf41b0) at 0x9000000053f18b8
    TableManager::checkForExpiredRequests(long)(0x110cb1530,
    0x63bdb448) at 0x9000000054014f0
    TableManager::timeout(CTRA_Timerspec_*)(0x110cb1678) at
    0x9000000054037c8
    CTRA_timer_base::TimerCallbackHandler()(0x110bf64b0) at
    0x90000000540a9e4
    krabutmr.Handler_base(void*)(0x110bf64b0) at 0x90000000540b14c
    krabuptm.CTRA_timer_task(void*)(0x110bf90f0) at
    0x90000000540deb0
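
    The two-pass collection described above can be illustrated with
    a minimal sketch.  This is not the agent's source code: the
    types TotalData, RateData and Row, the function mergeSamples,
    and the wording of the warning are hypothetical stand-ins chosen
    for illustration.  The sketch assumes the totals from the first
    "entstat" pass are held in a list and the rates from the second
    pass in a map keyed by device name.  It shows one defensive way
    to handle a device that has totals but no rates: look the key up
    without throwing, log a warning, skip the device, and advance
    the cursor in exactly one place.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the agent's internal types (not from kux45agt.cpp).
struct TotalData { std::string device; long bytesTotal; };
struct RateData  { double bytesPerSec; };
struct Row       { std::string device; long bytesTotal; double bytesPerSec; };

// Merge the first-pass totals with the second-pass rates.
// A device that has totals but no rates is logged and skipped.
std::vector<Row> mergeSamples(const std::vector<TotalData>& totals,
                              const std::map<std::string, RateData>& rates) {
    std::vector<Row> rows;
    rows.reserve(totals.size());
    // The loop header is the only place where the cursor advances.
    for (std::size_t i = 0; i < totals.size(); ++i) {
        const TotalData& t = totals[i];
        // find() returns end() on a miss instead of throwing.
        const auto it = rates.find(t.device);
        if (it == rates.end()) {
            std::cerr << "WARNING: rates data not found for key: "
                      << t.device << '\n';
            continue;  // skip this device; the cursor is untouched here
        }
        rows.push_back(Row{t.device, t.bytesTotal, it->second.bytesPerSec});
    }
    return rows;
}
```

    With totals for ent0, ent6 and ent7 but rates only for ent0 and
    ent7, this sketch returns two rows and writes one warning for
    ent6, rather than terminating on the missing key.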
    

Local fix

  • Workaround: Stop situations and historical data collection for
    the AIX Network Adapters attribute group.
    

Problem summary

  • On UNIX, agent sometimes crashes when monitoring AIX Network
    Adapters attribute group.
    
    
    On UNIX, the agent sometimes crashes when collecting data for
    the AIX Network Adapters attribute group.  A core file is
    generated; however, it was observed that Watchdog detected the
    crash and restarted the monitoring agent.
    

Problem conclusion

  • The issue occurs in the code path where the device totals have
    been collected (using entstat), but the second call to entstat,
    made to calculate the rate (per_sec) values, returns no data
    for a device name.  The code handles this scenario; however,
    the pointer was incremented too early.  The code has been
    updated to handle the pointer update correctly for this
    scenario.
    
    
    The fix for this APAR is contained in the following maintenance
    packages:
    
       Service pack: 6.3.0.7-TIV-ITM-SP0014
    

Temporary fix

  • Stop situations and historical collections for AIX Network
    Adapters attribute group.
    

Comments

APAR Information

  • APAR number

    IJ44934

  • Reported component name

    ITM AGENT UNIX

  • Reported component ID

    5724C040U

  • Reported release

    630

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2023-01-11

  • Closed date

    2023-04-17

  • Last modified date

    2023-04-17

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    ITM AGENT UNIX

  • Fixed component ID

    5724C040U

Applicable component levels

  • Business unit

    Software (BU029)

  • Product

    IBM Tivoli Monitoring V6 (SSZ8F3)

  • Platform

    Platform Independent (PF025)

  • Version

    630

Document Information

Modified date:
18 April 2023