IBM Support

DataStage client crashes frequently with different action codes and a "The connection is broken (81002)" error

Troubleshooting


Problem

The DataStage client crashes frequently with error messages such as the following:

  - Error calling subroutine: DSR_NLS (Action=4); check DataStage is set up correctly in project (The connection is broken (81002))
  - Error calling subroutine: *DataStage*DSR_PROJECT (Action=7); check DataStage is set up correctly in project . (The connection is broken (81002))
  - Error calling subroutine: DSR_EXECJOB (Action=30), check DataStage is set up correctly in project: (The connection is broken (81002))

Note: Other action numbers, such as 2, 3, 5, 9, 24, 27, and 31, have also been observed.

Diagnosing The Problem

DataStage clients use an RPC mechanism to call helper subroutines on the DSEngine server.

The 81002 error indicates that one of those RPC calls failed because the previously established socket connection had been broken.

This can occur either because the server-side process associated with the connection (dsapi_slave) has crashed or because of a network failure.

If the 81002 error occurs and the dsapi_slave process is still running, a network failure is the more likely cause.

The fact that the failures occur from different actions on different helper subroutines also suggests a network failure.
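
The check described above can be sketched as a small shell filter. This is a minimal sketch under stated assumptions, not an IBM-supplied tool: the function name diagnose_81002 is hypothetical, and it only looks for any dsapi_slave entry in a process listing. On an engine tier that serves several clients, you would still need to identify the dsapi_slave process that belongs to the affected session.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataStage): reads a process listing on
# stdin and reports which failure mode the listing points to.
diagnose_81002() {
    if grep -q 'dsapi_slave'; then
        echo "dsapi_slave is still running: suspect a network failure"
    else
        echo "no dsapi_slave found: the server-side process may have crashed"
    fi
}

# Typical use on the engine tier right after a client reports 81002:
#   ps -ef | diagnose_81002

# Self-contained demonstration with a made-up one-line listing:
printf 'dsadm 4242 1 0 10:00 ? 00:00:01 dsapi_slave\n' | diagnose_81002
```

The function reads from stdin rather than calling ps itself, so the same filter can be applied to a saved process listing collected at the time of the failure.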

Another diagnosis applies when the DataStage Designer crashes with the error:



Error calling subroutine: DSR_EXECJOB (Action=30), check DataStage is set up correctly in project

and the following error occurs in DataStage Director:

Error calling subroutine: *DataStage*DSR_PROJECT (Action=7); check
DataStage is set up correctly in project

Error calling subroutine: *DataStage*DSR.ADMIN (Action=36); check
DataStage is set up correctly in project

In this case, the DSCheckSum routine failed to write to the ?.checksum file.

The usual cause is that the owner of the dsapi_slave process (that is, the operating system user that has been credential mapped to an Information Server user) does not have permission to access the ?.checksum file. For example, the operating system user behind the Designer session belongs to a different operating system group from the dsadm user.

The server-side trace shows the following:

2013-02-21 14:19:51: DSR_EXECJOB IN =
Arg1=30
Arg2=DW_TRF_ACTIVE_MAP_MST_TI_R
Arg3=0
\Jobs\TEST\JHK\ONEID

Program "DSCheckSum": Line 71, WRITE failure.

Resolving The Problem

To resolve the issue:

  1. If the arguments to the dscs process indicate that the idle timeout is set to unlimited, these processes persist until the DataStage services are restarted. To remove orphaned client connections, restart the DSEngine services.
  2. If the 81002 errors occur after the client has been left idle for some time, a firewall might be dropping connections that appear idle. Lower the TCP keepalive time (the net.ipv4.tcp_keepalive_time kernel parameter on Linux) so that keepalive probes are sent before the firewall's idle timeout expires.
    For example:

    # sysctl -w net.ipv4.tcp_keepalive_time=600
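
Before changing the value, it can help to compare the current keepalive time with the firewall's idle timeout. The following is a minimal sketch for Linux only: the function name keepalive_beats_firewall is hypothetical, and the firewall timeout is a value you must obtain from your network team because the operating system does not know it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when keepalive probes would start before
# the firewall drops the idle connection.
#   $1 = TCP keepalive time in seconds
#   $2 = firewall idle timeout in seconds (assumed to be known to you)
keepalive_beats_firewall() {
    [ "$1" -lt "$2" ]
}

# On Linux the current value is exposed under /proc; the usual kernel
# default is 7200 seconds.
proc_file=/proc/sys/net/ipv4/tcp_keepalive_time
if [ -r "$proc_file" ]; then
    echo "current tcp_keepalive_time: $(cat "$proc_file")"
fi

# Example: 600 seconds against an assumed 3600-second firewall timeout.
if keepalive_beats_firewall 600 3600; then
    echo "600s keepalive is below the firewall timeout"
fi
```

A value set with sysctl -w lasts only until the next reboot. To make it persistent, the same setting is normally added to /etc/sysctl.conf.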


For a similar issue, "DataStage Client gets disconnected after long period of inactivity", refer to this technote:



http://www.ibm.com/support/docview.wss?uid=swg21515279


To check and correct the group ownership of the ?.checksum file:
  1. Change directory to /.../RT_BPnnn.O
  2. Run the command ls -l

    For example, the output includes a line such as:
    -rwxrwxr-x 1 o3442kdh dwadmin 11 Feb 25 11:18 ?.checksum
  3. Change the primary group of the operating system user to match the primary group of the dsadm user.
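
The comparison behind these steps can be sketched in shell. This is an illustrative sketch, not an IBM-supplied procedure: the function names group_owner and same_primary_group are hypothetical, and the user name in the usage comment is taken from the sample listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the group-owner column (4th field) of each
# `ls -l` entry read on stdin. Lines with fewer than 9 fields, such as the
# "total" line, are skipped.
group_owner() {
    awk 'NF >= 9 { print $4 }'
}

# Hypothetical helper: succeed if users $1 and $2 share a primary group.
# Returns 2 if either user cannot be looked up.
same_primary_group() {
    g1=$(id -gn "$1") || return 2
    g2=$(id -gn "$2") || return 2
    [ "$g1" = "$g2" ]
}

# Demonstration with the sample listing line:
echo '-rwxrwxr-x 1 o3442kdh dwadmin 11 Feb 25 11:18 ?.checksum' | group_owner

# On the engine tier, the mapped user could then be compared with dsadm:
#   same_primary_group o3442kdh dsadm || echo "primary groups differ"
```

If the primary groups differ, the change in step 3 is typically made by an administrator with a command such as usermod -g on Linux. Which group to assign depends on your installation, so confirm it against the dsadm user first.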

Product: IBM InfoSphere DataStage
Version: 8.5
Platform: Windows

Document Information

Modified date:
16 June 2018

UID

swg21589157