IBM Support

JR42993: DSGetLinkInfo returns incorrect row counts for parallel jobs on heavily loaded Windows server

APAR status

  • Closed as program error.

Error description

  • DSGetLinkInfo intermittently returns incorrect row counts when a
    large number of multi-instance parallel jobs are running
    concurrently on a Windows system. This problem is related to
    the rapid recycling of process ID (PID) numbers on Windows.
    DataStage keeps a cache of stage information for jobs, which is
    supposed to go back about 10 minutes. The PID is part of
    the internal key used to access the information. If a PID is
    recycled too quickly, the cache holds inconsistent information.
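
The failure mode described above can be sketched in a few lines of Python. This is only an illustration, not DataStage code: the class name PidKeyedCache, the PID value, and the row counts are hypothetical, and the 600-second retention simply mirrors the "about 10 minutes" mentioned in the description.

```python
CACHE_TTL_SECONDS = 600  # assumption: mirrors the ~10 minute cache window


class PidKeyedCache:
    """Toy monitoring cache keyed only by process ID (hypothetical)."""

    def __init__(self):
        self._entries = {}  # pid -> (row_count, time_stored)

    def record(self, pid, row_count, now):
        # Store the latest row count reported under this PID.
        self._entries[pid] = (row_count, now)

    def lookup(self, pid, now):
        # Return the cached row count, or None if missing or expired.
        entry = self._entries.get(pid)
        if entry is None:
            return None
        row_count, stored_at = entry
        if now - stored_at > CACHE_TTL_SECONDS:
            return None
        return row_count


cache = PidKeyedCache()

# An earlier job run reports 1000 rows under PID 4120 at t=0.
cache.record(pid=4120, row_count=1000, now=0)

# 30 seconds later Windows hands PID 4120 to a new job run that has
# not reported anything yet. A lookup by PID alone returns the old value.
stale = cache.lookup(pid=4120, now=30)

# Once the retention window has passed, the old entry is no longer served.
expired = cache.lookup(pid=4120, now=700)
```

Because the key carries nothing but the PID, the cache cannot tell the second run from the first until the old entry ages out.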
    

Local fix

  • No local fix is available.
    As a workaround, running fewer instances of the same job eliminates
    this problem. However, this limits job throughput.
    

Problem summary

  • The customer has a complex environment where many multi-instance
    parallel jobs are run simultaneously and the results of job runs
    are retrieved using DSGetLinkInfo.
    
    For parallel jobs, row count data is obtained as follows:
    - We launch the job as an osh process and capture the process id.
      The runtime osh processes register with JobMonApp using the
      process id as a job id and send it row counts and other
      monitoring data.
    - We launch DSD.OshMonitor, passing the process id as an argument.
      This connects to JobMonApp and requests information about the
      running job identified by the process id.
      The returned data is stored in the RT_STATUS? file, from where
      it is retrieved by DSGetLinkInfo.
    
    Analysis of debug trace data for JobMonApp and DSD.OshMonitor
    shows that JobMonApp returns information about an earlier
    job run that used the same process id, which results in
    DSD.OshMonitor reporting incorrect link row counts.
    
    Recommendation:
    Apply patch JR42993
    

Problem conclusion

  • Changed code to optionally construct a job id that is guaranteed
    to be unique. This has to be explicitly enabled by adding the
    environment variable DS_GENERATE_UNIQUE_JOBID with a value of 1.
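
As a rough illustration of the opt-in behavior, the Python sketch below builds a job key that stays unique even when a PID is reused. The APAR does not say how the fix constructs its unique job id, so the PID-plus-counter scheme, the function name job_key, and the PID value are assumptions made for this example; only the variable name DS_GENERATE_UNIQUE_JOBID and the value 1 come from the text above.

```python
import itertools
import os

# Assumption: a per-process counter stands in for whatever the real fix uses.
_sequence = itertools.count(1)


def job_key(pid, unique=None):
    """Return the key under which a job run registers (hypothetical)."""
    if unique is None:
        # Opt-in switch named in the APAR; unset means the old behavior.
        unique = os.environ.get("DS_GENERATE_UNIQUE_JOBID") == "1"
    if not unique:
        return str(pid)  # old behavior: the key is just the PID
    return f"{pid}-{next(_sequence)}"  # PID plus a never-repeating suffix


# Two runs that happen to receive the same recycled PID.
first = job_key(4120, unique=True)
second = job_key(4120, unique=True)

# Without the option, both runs collapse onto the same key.
legacy_a = job_key(4120, unique=False)
legacy_b = job_key(4120, unique=False)
```

With the option enabled the two runs get distinct keys, so monitoring data from an earlier run can no longer be returned for a later one; with it disabled, both runs share the key "4120".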
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR42993

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    850

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-06-04

  • Closed date

    2012-07-03

  • Last modified date

    2012-07-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WIS DATASTAGE

  • Fixed component ID

    5724Q36DS

Applicable component levels

  • R850 PSY

       UP

Document Information

Modified date:
09 July 2012