IBM Support

IZ42267: MULTIPLE SOLARIS OS AGENTS EXECUTING.


APAR status

  • Closed as program error.

Error description

  • Environment:
    ITM 6.2 FP1  Solaris OS Agent 6.2 FP1
      Do you think latest patch applied was involved?  No.
    
    Problem Description:
      When running the Solaris OS Agent, the forked child process
      spawned by the Agent occasionally deadlocks waiting for a
      mutex to become available. The OS Agent continues to spawn
      additional children, all of which wait on the same mutex.
    
    Detailed Recreation Procedure:
      Start the Solaris OS Agent, running all factory provided
      Situations. Wait for condition to arise, potentially days.
    
    Related Files and Output:
      Log files are available on ECuRep under this PMR.
    
    Approver:
      RL
    

Local fix

Problem summary

  • When running the Solaris OS Agent and executing situations,
    multiple copies of an OS Agent process sometimes exist at the
    same time in a suspended wait state.
    
    The Solaris OS Agent spawns external programs to collect its
    metrics, such as ifconfig, stat_daemon, vm_stat, proc_stat, and
    nfs_stat.  To do this, the OS Agent creates a copy of its
    process using the Unix fork() system call and directs the
    child process to execute the external program.  Because forking
    a process copies all of the parent's memory, excluding mutexes,
    the state of the mutex's memory location in the child is
    indeterminate.  As a result, a child process can wait for the
    mutex to clear when it is not truly set, because of the value
    left in that memory location.  When the parent process forks
    itself again to spawn another external program, the new child
    can also wait indefinitely for the mutex to clear, which never
    happens.  The result is many OS Agent "children" forked by the
    parent OS Agent process, all waiting for the mutex to clear so
    that they can spawn their external programs.
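
    The hang described above can be illustrated with a small sketch.
    It is not the OS Agent's code: list_lock, holder_thread, and
    child_sees_locked_mutex are hypothetical names, and the sketch
    assumes one common way a forked child can inherit a mutex that
    looks locked on POSIX systems, namely that another thread in the
    parent holds it at the moment of fork(). The child uses
    pthread_mutex_trylock() so that the example reports the condition
    instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a lock guarding shared state in the parent. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static int ready_pipe[2];    /* helper -> main: "the lock is held"  */
static int release_pipe[2];  /* main -> helper: "you may unlock"    */

/* Helper thread: takes the lock and keeps it until told to release. */
static void *holder_thread(void *arg)
{
    char byte = 'x';
    int rc;

    (void)arg;
    rc = pthread_mutex_lock(&list_lock);
    assert(rc == 0);
    if (write(ready_pipe[1], &byte, 1) != 1)
        return NULL;
    if (read(release_pipe[0], &byte, 1) != 1)
        return NULL;
    pthread_mutex_unlock(&list_lock);
    return NULL;
}

/* Returns 1 if the forked child found the mutex locked, 0 if it was
 * free, and -1 if the demonstration could not be set up. */
int child_sees_locked_mutex(void)
{
    pthread_t tid;
    char byte = 'x';
    int status = 0;
    int result = -1;
    pid_t pid;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0)
        return -1;

    /* Wait until the helper thread definitely owns the lock. */
    if (read(ready_pipe[0], &byte, 1) != 1)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, but the copied
         * mutex still records "locked". A blocking lock call at this
         * point would never return. */
        int rc = pthread_mutex_trylock(&list_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper release the lock, then reap the child. */
    if (write(release_pipe[1], &byte, 1) != 1)
        return -1;
    pthread_join(tid, NULL);
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    close(ready_pipe[0]);
    close(ready_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);
    return result;
}
```

    On a typical POSIX threads implementation, a call to
    child_sees_locked_mutex() is expected to return 1. If the child
    called pthread_mutex_lock() instead, it would block forever,
    which matches the suspended children described in this APAR.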
    

Problem conclusion

  • When the OS Agent is forked to create a child OS Agent process,
    all of the parent process's memory is duplicated except for the
    state of mutexes.  In the OS Agent child process, the KBB_RAS1=
    environment variable is set to a new trace file name.  This
    redirects the RAS1 tracing output of the spawned program to
    that trace file instead of the original trace file reserved
    for the parent OS Agent.
    
    To assign this environment variable, the NLS2_toUTF16 and
    NLS2_fromUTF16 functions are used to construct the string that
    is assigned to it.  These functions use a mutex to guard a
    linked list that is used in validating the NLS2_Locale object.
    It is this mutex that can, under rare circumstances, be in an
    indeterminate state when the child OS Agent is forked.  Even if
    the mutex is unlocked in the parent OS Agent process, the child
    can be forked with the mutex appearing to be locked.  When that
    child attempts to acquire the mutex, it waits forever for a
    lock that is never released.
    
    
    The fix for this APAR is contained in the following maintenance
    packages:
    
       | fix pack | 6.2.0-TIV-ITM-FP0003
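
    This APAR does not describe how the fix pack changes the code.
    As an illustration only, one general way to avoid taking locks
    in a forked child is to build the environment string in the
    parent before fork() and pass it to execve(), so that the child
    does nothing but exec. The sketch below assumes this approach;
    spawn_with_trace_env and its parameters are hypothetical names,
    and the value given to KBB_RAS1 is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with an environment that contains only
 * KBB_RAS1=<trace_value>. Returns the program's exit status,
 * or -1 if it could not be started or did not exit normally. */
int spawn_with_trace_env(const char *trace_value, char *const argv[])
{
    char env_entry[512];
    char *envp[2];
    int status = 0;
    int len;
    pid_t pid;

    assert(trace_value != NULL && argv != NULL && argv[0] != NULL);

    /* Build "KBB_RAS1=<value>" in the parent, before fork(), so any
     * locks used by string handling are taken and released here. */
    len = snprintf(env_entry, sizeof env_entry, "KBB_RAS1=%s", trace_value);
    if (len < 0 || (size_t)len >= sizeof env_entry)
        return -1;
    envp[0] = env_entry;
    envp[1] = NULL;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: exec immediately; no locks, no allocation. */
        execve(argv[0], argv, envp);
        _exit(127);  /* reached only if execve() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    A real caller would normally copy the rest of the parent's
    environment into envp as well; the sketch passes a single
    variable to keep the example short.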
    

Temporary fix

Comments

APAR Information

  • APAR number

    IZ42267

  • Reported component name

    TEMS

  • Reported component ID

    5724C04MS

  • Reported release

    620

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2009-01-21

  • Closed date

    2009-04-22

  • Last modified date

    2009-04-22

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TEMS

  • Fixed component ID

    5724C04MS

Applicable component levels

  • R620 PSY

       UP


Document Information

Modified date:
22 April 2009