IBM Support

IZ40989: SSM can hang on Solaris if 20 or more partitions reside on unresponsive disks

APAR status

  • Closed as program error.

Error description

  • The SSMv4 agent hangs when there are 20 or more disk partitions
    on one or more of the disks being monitored.
    

Local fix

  • Add the following variable to init.cfg
    before starting the SSM:
    
    HostresProbeDisks=off
    
    This disables physical hardware probing (hrDiskStorage and
    hrPartition), which is what triggers the problem. It does not
    disable other features such as filesystem reporting.
    
    This workaround works only on Fix Pack 4 and later.
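Before restarting the agent, you can check that the workaround is actually in place by reading the setting back out of the file. The following is a minimal Python sketch under assumptions the APAR does not confirm: init.cfg holds one `Name=value` pair per line, lines beginning with `#` are comments, and a later entry overrides an earlier one. `read_probe_setting` and `disk_probing_enabled` are hypothetical helpers and are not part of SSM.

```python
# Hypothetical helpers. Assumed init.cfg format: one Name=value pair per line,
# '#' comment lines, last occurrence wins. None of this is taken from SSM itself.

def read_probe_setting(text):
    """Return the HostresProbeDisks value from init.cfg-style text, or None if unset."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, val = line.partition("=")
        if sep and key.strip() == "HostresProbeDisks":
            value = val.strip()  # a later entry overrides an earlier one
    return value


def disk_probing_enabled(setting):
    """hrDiskStorage and hrPartition are probed unless the setting is 'off'.

    Filesystem reporting is not governed by this setting.
    """
    return setting != "off"


cfg = "# local fix for IZ40989\nHostresProbeDisks=off\n"
setting = read_probe_setting(cfg)
print(setting, disk_probing_enabled(setting))  # prints: off False
```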
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Netcool/SSM 4.0 on Solaris
    ****************************************************************
    PROBLEM DESCRIPTION:
    hostres would get stuck in a feedback loop if the host had more
    than 20 partitions residing on disks that were not responding
    (slow or faulty). The loop made hostres update so frequently
    that it could not respond to any further SNMP requests.
    ****************************************************************
    RECOMMENDATION:
    Upgrade to Fix Pack 7 for SSM 4.0.
    ****************************************************************
    

Problem conclusion

  • Fixed in two ways in the Solaris HrPartitionTable code:
    1. The open() calls on the partition devices are now started in
    parallel, not one after another as before. The results of the
    parallel calls are then gathered to populate the table. This
    gives slow disks, or large numbers of disks and partitions, time
    to return results more responsively, and it shortens any hang
    or avoids it altogether.
    2. The HrPartitionTable update code now aborts the update
    (un-hanging the agent) if it is still processing after 30
    seconds. If HostresProbeDisks is not set, it then switches all
    disk probing (including hrPartition) off until the user
    re-enables it manually. As a result, once the 30-second hang has
    occurred, it does not recur unless the user allows it. If the
    user has enabled probing manually (init.cfg:
    HostresProbeDisks=on), the code never switches disk probing off
    automatically.
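The two changes can be sketched together in Python. This is an illustration, not IBM's Solaris implementation. It makes several assumptions: each partition device gets its own thread, `concurrent.futures.wait` with a 30-second timeout stands in for the abort, and the HostresProbeDisks setting is modelled as `None` (unset), `"on"`, or `"off"`. It also treats an aborted update as "keep the previous table", signalled by returning `None`. The names `probe_partition` and `update_partition_table` are hypothetical.

```python
import concurrent.futures
import os

PROBE_TIMEOUT_S = 30.0  # abort threshold stated in the APAR


def probe_partition(path):
    """Hypothetical probe: open the partition device read-only, then close it."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return path, "error: " + (exc.strerror or str(exc))
    os.close(fd)
    return path, "ok"


def update_partition_table(paths, setting, probe=probe_partition,
                           timeout=PROBE_TIMEOUT_S):
    """Return (rows, setting); rows is None when the update was aborted.

    setting models HostresProbeDisks: None (unset), "on" or "off".
    """
    if setting == "off":
        return {}, setting  # disk probing disabled: nothing to query
    # Fix 1: start every open() at once instead of one after another.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(paths)))
    futures = [pool.submit(probe, p) for p in paths]
    done, not_done = concurrent.futures.wait(futures, timeout=timeout)
    # Do not block on stragglers. Python cannot cancel a thread that is
    # already stuck in open(), so such a thread keeps running in the background.
    pool.shutdown(wait=False)
    if not_done:
        # Fix 2: abort this update. Unless the user explicitly set "on",
        # switch disk probing off until it is re-enabled manually.
        return None, ("off" if setting is None else setting)
    return dict(f.result() for f in done), setting
```

The probe is passed in as a parameter, so the timeout and auto-disable logic can be exercised without real disk devices. The returned setting is the value a caller would persist so that probing stays off after the first hang.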
    
    The fix for this APAR is contained in the following maintenance
    packages:
    Fix pack: 4.0.0-TIV-SSM-FP0007
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ40989

  • Reported component name

    NETCOOL SYS SVC

  • Reported component ID

    5724P4300

  • Reported release

    400

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2009-01-19

  • Closed date

    2009-02-10

  • Last modified date

    2009-02-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    NETCOOL SYS SVC

  • Fixed component ID

    5724P4300

Applicable component levels

  • R400 PSN

       UP


Document Information

Modified date:
10 February 2009