JR35910: AFTER APPLYING FP1, SOME JOBS HAVE INTERMITTENTLY THE FOLLOWING FATAL ERRORS.

A fix is available

Download Fix Pack 2 for InfoSphere Information Server Version 8.1

APAR status

Closed as program error.

Error description

[Problem]
After applying FP1, some jobs intermittently fail with the
following FATAL errors. (The messages were translated from
Japanese to English, so the wording might not be exact.)
The jobs ran successfully before FP1 was applied.


   Item #: 22
   Event ID: 153
   Timestamp: 2010-01-26 17:32:26
   Type: FATAL
   Username: dsadm
   Message ID: IIS-DSEE-TFIO-00231
   Message: /gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1:
Configured timeout of 600 seconds reached for accepting player
connections for pid 13,636. Pending fifo count: 0. Pending
shared memory count: 1.  This is most likely due to the failure
of an upstream operator.

   Item #: 23
   Event ID: 154
   Timestamp: 2010-01-26 17:32:26
   Type: FATAL
   Username: dsadm
   Message ID: IIS-DSEE-TFPM-00123
   Message: /gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1: Fatal
Error: Cannot start  ORCHESTRATE network connection on Node
node2 (gpfds). APT_PMConnectionSetup::acceptConnection: Cannot
accept the connection.

[Additional info.]
- The same issue happens on several jobs.
- The issue is intermittent. Sometimes the job aborts and
sometimes it finishes without any problem, even though the
same job and the same data are used.
- The error message reports a 600-second timeout, but the job
fails well before 600 seconds have elapsed.
- With a 1-node configuration, the issue did not occur in 10
test runs. The issue occurs when the number of nodes is more
than 2.
- Still being confirmed: whether anything else changed on the
system around the time FP1 was applied.
- A job design that reproduces the issue has been requested.

Local fix

Problem summary

When a multi-node APT_CONFIG_FILE is used, one or more jobs may
abort with the following error even though far less than 10
minutes (600 seconds) have elapsed.

Message ID: IIS-DSEE-TFIO-00231
Message: <the-stage-name with node-number>: Configured timeout
of 600 seconds reached for accepting player connections for pid
<the-pid>. Pending fifo count: 0. Pending shared memory count:
1.  This is most likely due to the failure of an upstream
operator.
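
To spot this symptom in exported job logs, the numeric fields of the message can be pulled out with a regular expression. The following is a minimal sketch, not part of the fix: the function name `parse_tfio_00231` is hypothetical, and the pattern assumes the English wording quoted in this APAR (which was translated from Japanese, so the wording in a real log may differ).

```python
import re

# Pattern for the body of message IIS-DSEE-TFIO-00231 as quoted in this APAR.
# Assumption: the log uses exactly this English wording.
TFIO_00231 = re.compile(
    r"Configured timeout of (?P<timeout>[\d,]+) seconds reached for accepting "
    r"player connections for pid (?P<pid>[\d,]+)\. "
    r"Pending fifo count: (?P<fifo>\d+)\. "
    r"Pending shared memory count: (?P<shm>\d+)\."
)


def parse_tfio_00231(message):
    """Return the numeric fields of a TFIO-00231 message, or None if no match."""
    # Log viewers may wrap the message across lines; collapse whitespace first.
    flat = " ".join(message.split())
    match = TFIO_00231.search(flat)
    if match is None:
        return None
    # Numbers may carry thousands separators (for example "13,636").
    return {key: int(value.replace(",", "")) for key, value in match.groupdict().items()}


# Sample text copied from the error description in this APAR.
sample = """/gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1:
Configured timeout of 600 seconds reached for accepting player
connections for pid 13,636. Pending fifo count: 0. Pending
shared memory count: 1.  This is most likely due to the failure
of an upstream operator."""

fields = parse_tfio_00231(sample)
print(fields)
```

Comparing the reported timeout value with the time the job had actually been running when it aborted is one way to tell this defect apart from a genuine 600-second timeout.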

Problem conclusion

```
Install the patch.
```

Temporary fix

```
Use a 1-node configuration file.
```
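
As an illustration of the workaround, the sketch below generates the text of a configuration file that defines a single node. It is only an example under stated assumptions: the helper `one_node_config` is hypothetical, the disk and scratch paths are placeholders, and the fastname `gpfds` is taken from the log excerpt in the error description. Substitute the values from the configuration file already in use on the affected system.

```python
def one_node_config(fastname, disk, scratch, node="node1"):
    """Return the text of a parallel configuration file with a single node."""
    return (
        "{\n"
        f'  node "{node}"\n'
        "  {\n"
        f'    fastname "{fastname}"\n'
        '    pools ""\n'
        f'    resource disk "{disk}" {{pools ""}}\n'
        f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
        "  }\n"
        "}\n"
    )


# Placeholder paths (assumptions); replace them with the real resource paths.
config_text = one_node_config("gpfds", "/tmp/ds/datasets", "/tmp/ds/scratch")
print(config_text)
```

The generated text would be saved to a file and APT_CONFIG_FILE pointed at that file for the affected jobs until the patch is installed.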

Comments

APAR Information

APAR number
JR35910
Reported component name
WIS DATASTAGE
Reported component ID
5724Q36DS
Reported release
810
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-03-14
Closed date
2011-05-13
Last modified date
2011-05-13

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WIS DATASTAGE
Fixed component ID
5724Q36DS

Applicable component levels

R810 PSY
UP
R850 PSY
UP


Document Information

Modified date:
12 October 2021
