Parallel job ends with an APT_PMsyncWithSectionLeaders: Non-zero status 4 error.


The InfoSphere® DataStage® Parallel job log shows the following information:
APT_PMsyncWithSectionLeaders: non-zero status 4 from APT_PMpollUntilZero
broadcastStepIR: step timed out sending 66,135,142-byte score; status = 4
Error during score broadcast or reload. Score size is 66,135,142 bytes
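If you need to find this failure across many job logs, you can scan exported log text for these messages. The following Python sketch assumes the log text is already available as a string (how you export it is up to you) and that the message wording matches the lines above; the wording may differ between releases. The function name `parse_score_timeout` is hypothetical.

```python
import re
from typing import Optional

# Patterns assume the message wording shown in the job log above.
SYNC_RE = re.compile(r"APT_PMsyncWithSectionLeaders: non-zero status (\d+)", re.IGNORECASE)
SCORE_RE = re.compile(r"Score size is ([\d,]+) bytes")


def parse_score_timeout(log_text: str) -> Optional[dict]:
    """Return the status code and score size if the log shows this error, else None."""
    sync = SYNC_RE.search(log_text)
    if sync is None:
        return None
    size = SCORE_RE.search(log_text)
    return {
        "status": int(sync.group(1)),
        # Strip thousands separators before converting the byte count.
        "score_bytes": int(size.group(1).replace(",", "")) if size else None,
    }


if __name__ == "__main__":
    sample = (
        "APT_PMsyncWithSectionLeaders: non-zero status 4 from APT_PMpollUntilZero\n"
        "broadcastStepIR: step timed out sending 66,135,142-byte score; status = 4\n"
        "Error during score broadcast or reload. Score size is 66,135,142 bytes\n"
    )
    print(parse_score_timeout(sample))  # {'status': 4, 'score_bytes': 66135142}
```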


This error usually indicates a resource problem: the conductor timed out while sending the score to the section leaders, so the score was not loaded within the allowed time.

Resolving the problem

The APT_PM_NODE_TIMEOUT environment variable controls the number of seconds that the conductor waits for a section leader to start and load a score before it decides that something failed. The default is 30 seconds for starting a section leader process and 120 seconds for loading a score. Set the following environment variable at the project level:
APT_PM_NODE_TIMEOUT=300
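The tuning procedure in this section (start at 300 seconds, and go to 600 only if 300 is not enough) can be expressed as a small decision helper. This is a hypothetical sketch, not a DataStage tool: `next_node_timeout` takes the currently configured value as a string and returns the next value to try, or `None` when both recommended values have been tried and checking node resources is the next step.

```python
from typing import Optional

# Recommended APT_PM_NODE_TIMEOUT values from this technote, in order.
RECOMMENDED_STEPS = (300, 600)


def next_node_timeout(current: Optional[str]) -> Optional[int]:
    """Suggest the next APT_PM_NODE_TIMEOUT value (seconds), or None if exhausted."""
    try:
        value = int(current) if current else 0
    except ValueError:
        value = 0  # Treat an unparsable setting as if it were unset.
    for step in RECOMMENDED_STEPS:
        if value < step:
            return step
    return None  # Already at 600 or more: investigate resources instead.


if __name__ == "__main__":
    for configured in (None, "300", "600"):
        print(configured, "->", next_node_timeout(configured))
```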

If 300 does not resolve the problem, you can increase the value to 600. If increasing APT_PM_NODE_TIMEOUT does not correct the issue, monitor processor, memory, swap, and disk space usage while the job is running. Also check with your network administrator whether the nodes are on a SAN or an NFS mount.
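A minimal Python sketch of the monitoring step follows, assuming Linux nodes: it reads /proc/meminfo and /proc/mounts and calls os.getloadavg, none of which is available on Windows. For a directory you choose, it samples the load average, free disk space, available memory, and swap usage, and reports the directory's filesystem type, which shows an NFS mount as `nfs` or `nfs4`. A SAN volume usually appears as a local block device with an ordinary filesystem type, so this check cannot identify one; confirm that with your network administrator. The helper names, sample count, and interval are hypothetical choices.

```python
import os
import shutil
import sys
import time
from typing import Dict


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def fstype_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point that contains path."""
    best, fstype = "", "unknown"
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mnt, kind = parts[1], parts[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) > len(best):
            best, fstype = mnt, kind
    return fstype


def snapshot(path: str) -> Dict[str, object]:
    """Take one resource sample on the node where this script runs."""
    real = os.path.realpath(path)
    load1, _, _ = os.getloadavg()
    disk = shutil.disk_usage(real)
    mem = parse_meminfo(read_text("/proc/meminfo"))
    return {
        "load_1min": round(load1, 2),
        "cpus": os.cpu_count(),
        "disk_free_pct": round(100.0 * disk.free / disk.total, 1),
        "mem_available_kb": mem.get("MemAvailable"),
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
        "fstype": fstype_for(real, read_text("/proc/mounts")),
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    for _ in range(12):  # 12 samples, 5 seconds apart (arbitrary choice)
        print(snapshot(target))
        time.sleep(5)
```

Run it on each node while the job is running, passing the directory you want to check, so that a spike in load, a full disk, low available memory, or heavy swap use lines up with the time of the failure.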