APAR status
Closed as program error.
Error description
DSGetLinkInfo intermittently returns incorrect row counts when a large number of multi-instance parallel jobs are running concurrently on a Windows system. This problem is related to the rapid recycling of process ID (PID) numbers on Windows. DataStage keeps a cache of stage information for jobs, which is supposed to go back about 10 minutes. The PID is part of the internal key used to access this information. If a PID is recycled too quickly, the cache holds inconsistent information.
Local fix
N/A. Running fewer instances of the same job will eliminate this problem; however, this will limit job throughput.
Problem summary
The customer has a complex environment where many multi-instance parallel jobs are run simultaneously and the results of job runs are retrieved using DSGetLinkInfo.
For parallel jobs, row count data is obtained as follows:
- We launch the job as an osh process and capture the process ID. The runtime osh processes register with JobMonApp using the process ID as a job ID and send it row count and other monitoring data.
- We launch DSD.OshMonitor, passing the process ID as an argument. This connects to JobMonApp and requests information about the running job identified by the process ID. The returned data is stored in the RT_STATUS? file, from where it is retrieved by DSGetLinkInfo.
Analysis of debug trace data for JobMonApp and DSD.OshMonitor shows that JobMonApp is returning information about an earlier job run that used the same process ID, and this results in DSD.OshMonitor reporting incorrect link row counts.
Recommendation: Apply patch JR42993.
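The failure mode described above can be illustrated with a small sketch. This is not DataStage code: MonitorCache, report, lookup and the per-launch sequence number are hypothetical names chosen for illustration, and the retention window is simply the "about 10 minutes" quoted in the error description. The sketch only shows why a key based on the PID alone can return an earlier run's row count, and why a key that differs per launch does not.

```python
import itertools

RETENTION_SECONDS = 600  # about 10 minutes, per the error description


class MonitorCache:
    """Hypothetical cache of row counts keyed by a job id."""

    def __init__(self):
        self._entries = {}  # job id -> (time stored, row count)

    def report(self, job_id, row_count, now):
        self._entries[job_id] = (now, row_count)

    def lookup(self, job_id, now):
        entry = self._entries.get(job_id)
        if entry is None or now - entry[0] > RETENTION_SECONDS:
            return None  # nothing known, or the entry has aged out
        return entry[1]


def demo_pid_reuse():
    cache = MonitorCache()
    # Run A gets PID 4242, reports 1000 rows, and exits.
    cache.report(4242, 1000, now=0)
    # Thirty seconds later the OS hands PID 4242 to run B.
    # A monitor asks about run B before B has reported anything.
    return cache.lookup(4242, now=30)


def demo_unique_ids():
    cache = MonitorCache()
    counter = itertools.count(1)
    run_a = (4242, next(counter))  # PID plus a per-launch sequence number
    cache.report(run_a, 1000, now=0)
    run_b = (4242, next(counter))  # same PID, different key
    return cache.lookup(run_b, now=30)


print(demo_pid_reuse())   # 1000: run A's count is returned for run B
print(demo_unique_ids())  # None: no stale data is returned for run B
```

In the first case the recycled PID still matches an unexpired entry, so the earlier run's count is returned. In the second case the lookup finds nothing until run B reports its own data.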
Problem conclusion
Changed code to optionally construct a job ID that is guaranteed to be unique. This has to be explicitly enabled by adding an environment variable DS_GENERATE_UNIQUE_JOBID with a value of 1.
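As a rough illustration of an opt-in switch of this kind, the sketch below selects between a PID-only job ID and a unique one based on the DS_GENERATE_UNIQUE_JOBID variable. The function name make_job_id and the "PID plus UUID" format are assumptions made for the example; the format of the job ID actually generated by the patch is not described in this APAR.

```python
import os
import uuid


def make_job_id(pid, environ=None):
    """Return a job id for a launched process (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if environ.get("DS_GENERATE_UNIQUE_JOBID") == "1":
        # Assumption: any suffix that never repeats would do; a UUID is used here.
        return f"{pid}-{uuid.uuid4().hex}"
    # Default behaviour in this sketch: the PID alone, which can repeat.
    return str(pid)


# Without the variable, two launches that share a PID share a job id.
print(make_job_id(4242, {}) == make_job_id(4242, {}))

# With the variable set to 1, the same PID yields distinct job ids.
enabled = {"DS_GENERATE_UNIQUE_JOBID": "1"}
print(make_job_id(4242, enabled) != make_job_id(4242, enabled))
```

Both comparisons print True. Keeping the PID-only form as the default mirrors the APAR's statement that the new behaviour must be explicitly enabled.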
Temporary fix
Comments
APAR Information
APAR number
JR42993
Reported component name
WIS DATASTAGE
Reported component ID
5724Q36DS
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-06-04
Closed date
2012-07-03
Last modified date
2012-07-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WIS DATASTAGE
Fixed component ID
5724Q36DS
Applicable component levels
R850 PSY
UP
Document Information
Modified date:
09 July 2012