Process Recovery

Sterling Connect:Direct® provides facilities to recover from most errors that occur during Process execution. Recovery from the point of failure is usually accomplished quickly. The following types of errors can occur during normal operation:

Sterling Connect:Direct provides the following facilities to address errors:

Facility Description
Session establishment retry: When one or more Processes run with a node, Sterling Connect:Direct establishes a session with that node and begins execution. If Sterling Connect:Direct cannot start the session, Sterling Connect:Direct retries the session establishment. The initialization parameters, MAXRETRIES and WTRETRIES, determine the number of retries and the interval between retries.

If Sterling Connect:Direct cannot establish a session after all retries are exhausted, the Process is placed in the Hold queue in the TCQ with a status of Waiting for Connection (WC). When a session is later established with the other node, Sterling Connect:Direct scans all other Processes and, after the previous Process finishes, executes the highest-priority Process.
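
The retry behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Sterling Connect:Direct code: the function name, the RetryOutcome type, and the "EXECUTING" label are hypothetical, and the sketch assumes that the first connection attempt is not counted as a retry. The max_retries and wait_seconds arguments stand in for the MAXRETRIES and WTRETRIES initialization parameters.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryOutcome:
    connected: bool
    attempts: int   # total connection attempts, including the first one
    status: str     # "WC" (Waiting for Connection) comes from the text above;
                    # "EXECUTING" is a hypothetical label used only in this sketch


def establish_session(
    connect: Callable[[], bool],
    max_retries: int,                       # stands in for MAXRETRIES
    wait_seconds: float,                    # stands in for WTRETRIES
    sleep: Callable[[float], None] = time.sleep,
) -> RetryOutcome:
    """Try to start a session, retrying up to max_retries times."""
    attempts = 0
    for attempt in range(max_retries + 1):  # assumption: one initial try, then the retries
        attempts += 1
        if connect():
            return RetryOutcome(True, attempts, "EXECUTING")
        if attempt < max_retries:
            sleep(wait_seconds)             # wait out the retry interval
    # All retries are exhausted: the Process would be held with status WC.
    return RetryOutcome(False, attempts, "WC")
```

Passing the sleep function as an argument keeps the sketch easy to exercise without real delays.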

VTAM automatic session retry: If Process execution is interrupted because of a VTAM session failure, Sterling Connect:Direct automatically attempts to restart the session. This recovery facility uses the same parameter values as the session establishment retry facility.

If Sterling Connect:Direct cannot establish the session, the Process that is executing and any other Processes that are ready to run with the other node are placed in the Hold queue with a status of Waiting for Connection (WC).

TCQ/TCX Repair Utility: When the TCQ becomes corrupt because of an outage or other circumstance, Sterling Connect:Direct may abend in production or during the next DTF initialization. The Sterling Connect:Direct administrator can use the TCQ/TCX repair utility to remove ambiguous or corrupt data. Repairing the TCQ avoids having to cold start the DTF and reinitialize the TCQ, which would lose any Processes left in the TCQ.
Process step checkpoint: As a Process executes, Sterling Connect:Direct records in the TCQ which step is executing. If Process execution is interrupted for any reason, the Process is held in the TCQ. When the Process is available for execution again, Sterling Connect:Direct automatically begins execution at that step.
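
The idea of a step checkpoint can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the product's implementation: a plain dictionary stands in for the TCQ, the run_process function and its arguments are hypothetical, and each step is modeled as a callable that raises an exception when it is interrupted.

```python
from typing import Callable, Dict, List


def run_process(
    steps: List[Callable[[], None]],
    tcq: Dict[str, int],      # hypothetical stand-in for the TCQ: Process name -> step index
    name: str,
) -> None:
    """Run the steps of a Process, resuming at the recorded step if one exists."""
    start = tcq.get(name, 0)              # resume point, or 0 for a new Process
    for index in range(start, len(steps)):
        tcq[name] = index                 # record which step is about to execute
        steps[index]()                    # an exception here leaves the record in place
    tcq.pop(name, None)                   # all steps finished: nothing left to resume
```

If a step raises, the entry for that Process stays in the dictionary, so calling run_process again with the same name starts at the interrupted step rather than at the first step.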
COPY statement checkpoint/restart: For physical sequential files and partitioned data sets, Sterling Connect:Direct collects positioning checkpoint information at specified intervals as a COPY statement executes. Checkpoints are taken for each member that is transferred within a PDS, regardless of the checkpoint interval. If the copying procedure is interrupted for any reason, you can restart it at the last checkpoint position.
Note: Whenever a Process step is interrupted and restarted, some data will be retransmitted. Statistics records for the Process step will reflect the actual bytes transferred, and not the size of the file.

The COPY statement checkpoint/restart works in conjunction with step restart. The restart is automatic if Sterling Connect:Direct can reestablish a session based on the initialization parameter values for MAXRETRIES and WTRETRIES. See COPY Statement Checkpoint/Restart Facility for more information.

The CHANGE PROCESS command can also invoke the checkpoint/restart facility. See Controlling Processes with Commands for instructions on how to use the CHANGE PROCESS command.

Note: Checkpoint/restart is not supported for I/O exits at this time.
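
The following Python sketch illustrates why a restarted COPY step retransmits some data and why the bytes transferred can exceed the file size. It models a single sequential file only and is not the product's implementation: the function, the CopyResult type, and the fail_after argument (used to simulate an interruption) are all hypothetical, and the sketch assumes that data written after the last checkpoint is discarded on restart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyResult:
    completed: bool
    checkpoint: int   # offset of the last checkpoint taken
    bytes_sent: int   # bytes transmitted during this attempt


def copy_with_checkpoints(
    source: bytes,
    dest: bytearray,
    block_size: int,
    interval: int,                      # checkpoint interval in bytes
    restart_at: int = 0,                # last checkpoint position from a prior attempt
    fail_after: Optional[int] = None,   # simulate an interruption after this many bytes
) -> CopyResult:
    del dest[restart_at:]               # assumption: drop data past the last checkpoint
    position = checkpoint = restart_at
    sent = 0
    while position < len(source):
        if fail_after is not None and sent >= fail_after:
            return CopyResult(False, checkpoint, sent)   # interrupted
        block = source[position:position + block_size]
        dest.extend(block)
        position += len(block)
        sent += len(block)
        if position - checkpoint >= interval:
            checkpoint = position       # record a new restart position
    return CopyResult(True, position, sent)
```

For example, with a 100-byte source, 10-byte blocks, a 30-byte checkpoint interval, and an interruption after 50 bytes, the first attempt ends with its last checkpoint at offset 30. Restarting from offset 30 sends the remaining 70 bytes, so the two attempts together transmit 120 bytes for a 100-byte file.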