IBM Support

IV78033: LINUX OS AGENT RANDOMLY GOES IN HANG DURING DISK DATA COLLECTION

APAR status

  • Closed as program error.

Error description

  • REL: Linux OS Agent 6.30 FP5 IF01
    Problem: The internal thread that collects data for the "Disk
    Usage Trends" attribute group is not synchronized with requests
    on the "Linux Disk" attribute group, although the two groups
    share the same object. As a result, the agent may crash while
    responding to a query or a situation on the "Linux Disk"
    attribute group. The event is rare because the "Disk Usage
    Trends" thread runs only once per hour by default.
    Affected Platforms / Versions:
    This issue may affect all Linux OS agent versions since
    6.30 FP5 IF01.
    Diagnostics:
    *** glibc detected *** /opt/IBM/ITM//lx8266/lz/bin/klzagent: double free or corruption (fasttop): 0x00002b24801c2570 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x75e66)[0x2b24617f4e66]
    /usr/lib64/libstdc++.so.6(_ZNSsD1Ev+0x39)[0x2b246107b4c9]
    /opt/IBM/ITM//lx8266/lz/bin/klzagent(_ZN7CmntentD1Ev+0x44)[0x52ef38]
    /opt/IBM/ITM//lx8266/lz/bin/klzagent(_ZSt8_DestroyI7CmntentEvPT_+0x15)[0x531199]
    /opt/IBM/ITM//lx8266/lz/bin/klzagent(_ZSt13__destroy_auxIN9__gnu_cxx17__normal_iteratorIP7CmntentSt6vectorIS2_SaIS2_EEEEEvT_S8_12__false_type+0x32)[0x53138a]
    Additional Keywords:
    IV71612
    Disk
    Disk Trends
    6.30.05.01
    KLZ_DISK_SAMPLE_HRS
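
    The frames in the backtrace above are mangled C++ symbols. The
    following is a minimal sketch, not part of the APAR, of how they
    can be demangled with the GCC ABI helper abi::__cxa_demangle.
    The function name demangle is an illustrative assumption. For
    example, _ZN7CmntentD1Ev is the destructor Cmntent::~Cmntent().

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of a symbol taken from
// a backtrace frame, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc; the caller must release it with free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        std::free(raw);  // free(nullptr) is a no-op
        return mangled;
    }
    assert(raw != nullptr);  // status 0 means a buffer was returned
    std::string result(raw);
    std::free(raw);
    return result;
}
```

    Calling demangle("_ZN7CmntentD1Ev") shows that the double free
    is reported while a Cmntent object is being destroyed as part of
    a vector cleanup.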
    

Local fix

  • Increase the value of KLZ_DISK_SAMPLE_HRS to reduce the
    possibility of memory corruption
    

Problem summary

  • Monitoring Agent for Linux OS randomly hangs during Disk data
    collection.
    
    The internal thread that collects data for the "Disk Usage
    Trends" attribute group is not synchronized with requests on the
    "Linux Disk" attribute group, although the two groups share the
    same object. The agent can therefore hang while responding to a
    query or a situation on the "Linux Disk" attribute group. The
    event is rare because the "Disk Usage Trends" thread runs only
    once per hour by default.
    

Problem conclusion

  • The mutex was made recursive, and a lock/unlock pair was added
    around every call coming from the two attribute groups.
    
    The fix for this APAR is included in the following maintenance
    vehicles:
    
       | fix pack | 6.3.0-TIV-ITM-FP0007 |
       | interim fix | 6.3.0.5-TIV-ITM_LINUX-IF0002 |
       | interim fix | 6.3.0.6-TIV-ITM_LINUX-IF0001 |
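
    The recursive-mutex change described in this problem conclusion
    can be illustrated with a minimal sketch. This is not the agent
    source code: the names MountEntry and SharedDiskTable, and the
    methods refresh, clear, and snapshot, are assumptions chosen for
    illustration. The sketch only shows why a recursive mutex lets
    every entry point take the lock, even when one locked call
    invokes another.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the mount-entry object seen in the backtrace.
struct MountEntry {
    std::string mount_point;
};

// Hypothetical object shared by the two attribute groups.
class SharedDiskTable {
public:
    // Path used by the periodic "Disk Usage Trends" sampling thread.
    void refresh(const std::vector<std::string>& mount_points) {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        clear();  // locks the same mutex again; allowed because it is recursive
        for (const std::string& mp : mount_points) {
            entries_.push_back(MountEntry{mp});
        }
        assert(entries_.size() == mount_points.size());
    }

    // May be called directly or from refresh(); takes the lock either way.
    void clear() {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        entries_.clear();
    }

    // Path used when answering a "Linux Disk" query or situation.
    // Returns a copy so callers never read the shared vector unlocked.
    std::vector<MountEntry> snapshot() const {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return entries_;
    }

private:
    mutable std::recursive_mutex mutex_;
    std::vector<MountEntry> entries_;
};
```

    With a non-recursive mutex, the nested clear() call inside
    refresh() would attempt to lock a mutex the thread already
    holds. Without any lock, a query could copy or destroy entries
    while the sampling thread rebuilds them, which is the kind of
    race that can end in a double free.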
    

Temporary fix

  • Set the environment variable KLZ_DISK_SAMPLE_HRS to a very large
    number of seconds to reduce the likelihood of this rare
    condition. For example, KLZ_DISK_SAMPLE_HRS=86400 makes the
    "Disk Usage Trends" thread run once per day.
    

Comments

APAR Information

  • APAR number

    IV78033

  • Reported component name

    ITM AGENT LINUX

  • Reported component ID

    5724C04LN

  • Reported release

    630

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-10-14

  • Closed date

    2015-10-29

  • Last modified date

    2017-01-06

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    ITM AGENT LINUX

  • Fixed component ID

    5724C04LN

Applicable component levels

  • R630 PSY

       UP

Document Information

Modified date:
06 January 2017