Process Recovery

IBM® Connect:Direct® provides facilities to recover from most errors that occur during Process execution. Recovery from the point of failure is usually accomplished quickly. The following types of errors can occur during normal operation:

  • A link failure terminates a session between IBM Connect:Direct systems
  • A file I/O error occurs during Process execution
  • IBM Connect:Direct abends because of a hardware or other error
  • The Transmission Control Queue (TCQ) becomes corrupted

IBM Connect:Direct provides the following facilities to address errors:

  • Session establishment retry: When one or more Processes run with a node, IBM Connect:Direct establishes a session with that node and begins execution. If IBM Connect:Direct cannot start the session, it retries the session establishment. The initialization parameters MAXRETRIES and WTRETRIES determine the number of retries and the interval between retries.

    If IBM Connect:Direct cannot establish a session after all retries are exhausted, the Process is placed in the Hold queue in the TCQ with a status of Waiting for Connection (WC). When a session is established with the other node, all other Processes are scanned, and the highest-priority Process is executed after the previous Process finishes.

  • VTAM automatic session retry: If Process execution is interrupted because of a VTAM session failure, IBM Connect:Direct automatically attempts to restart the session. This recovery facility uses the same parameter values as the session establishment retry facility.

    If IBM Connect:Direct cannot establish the session, the Process that is executing and any other Processes that are ready to run with the other node are placed in the Hold queue with a status of Waiting for Connection (WC).

  • TCQ/TCX Repair Utility: When the TCQ becomes corrupted because of an outage or other circumstance, IBM Connect:Direct may abend in production or during the next DTF initialization. The IBM Connect:Direct administrator can use the TCQ/TCX repair utility to remove ambiguous or corrupt data. This avoids a cold start of the DTF and reinitialization of the TCQ, which would lose any Processes left in the TCQ.

  • Process step checkpoint: As a Process executes, IBM Connect:Direct records in the TCQ which step is executing. If Process execution is interrupted for any reason, the Process is held in the TCQ. When the Process is available for execution again, IBM Connect:Direct automatically begins execution at that step.

  • COPY statement checkpoint/restart: For physical sequential files and partitioned data sets, IBM Connect:Direct collects positioning checkpoint information at specified intervals as a COPY statement executes. Checkpoints are taken for each member that is transferred within a PDS, regardless of the checkpoint interval. If the copy is interrupted for any reason, you can restart it at the last checkpoint position.

    Note: Whenever a Process step is interrupted and restarted, some data is retransmitted. Statistics records for the Process step reflect the actual number of bytes transferred, not the size of the file.

    The COPY statement checkpoint/restart works in conjunction with step restart. The restart is automatic if IBM Connect:Direct can reestablish a session based on the initialization parameter values for MAXRETRIES and WTRETRIES. See COPY Statement Checkpoint/Restart Facility for more information.

    The CHANGE PROCESS command can also invoke the checkpoint/restart facility. See Controlling Processes with Commands for instructions on how to use the CHANGE PROCESS command.

    Note: Checkpoint/restart is not supported for I/O exits at this time.
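
The way session retry and COPY checkpoint/restart work together can be illustrated with a small Python sketch. It is a simplified model, not a description of how IBM Connect:Direct is implemented. The names TransferState, copy_step, and run_with_retries, the byte-based checkpoint interval, and the simulated link failure are all hypothetical. The max_retries and wait_seconds arguments only loosely mirror the roles of MAXRETRIES and WTRETRIES, and the "held" result stands in for a Process placed in the Hold queue.

```python
import time
from dataclasses import dataclass


class LinkFailure(Exception):
    """Simulated session failure (hypothetical, for illustration only)."""


@dataclass
class TransferState:
    checkpoint: int = 0   # last recorded restart position, in bytes
    bytes_sent: int = 0   # every byte transmitted, including retransmissions


def copy_step(data, received, state, interval, block, fail_at=None):
    """Copy data starting at the last checkpoint, recording new checkpoints."""
    pos = state.checkpoint
    del received[pos:]  # the receiver discards anything past the checkpoint
    while pos < len(data):
        if fail_at is not None and pos >= fail_at:
            raise LinkFailure(f"link lost at byte {pos}")
        chunk = data[pos:pos + block]
        received.extend(chunk)
        pos += len(chunk)
        state.bytes_sent += len(chunk)
        # Record a checkpoint once at least `interval` bytes have gone by.
        if pos - state.checkpoint >= interval:
            state.checkpoint = pos
    state.checkpoint = pos  # the step finished; nothing is left to resend


def run_with_retries(data, interval, block, max_retries,
                     failures=(), wait_seconds=0.0):
    """Run the copy, restarting from the last checkpoint after each failure.

    `failures` lists the byte offsets at which successive attempts lose
    the link; it exists only to drive the simulation.
    """
    state = TransferState()
    received = bytearray()
    pending = list(failures)
    for attempt in range(max_retries + 1):
        fail_at = pending.pop(0) if pending else None
        try:
            copy_step(data, received, state, interval, block, fail_at)
            return bytes(received), state, "completed"
        except LinkFailure:
            if attempt < max_retries:
                time.sleep(wait_seconds)  # wait before the next retry
    # Retries exhausted: the checkpoint is kept so a later restart can resume.
    return bytes(received), state, "held"


if __name__ == "__main__":
    payload = bytes(range(20))
    result, final_state, status = run_with_retries(
        payload, interval=8, block=4, max_retries=1, failures=[12])
    print(status, len(result), final_state.bytes_sent)
```

In this run the link drops at byte 12, after a checkpoint was recorded at byte 8. The restart resumes from byte 8, so 4 bytes are sent twice and the sketch counts 24 bytes transmitted for a 20-byte file. This matches the note above: the transferred byte count in the statistics can exceed the file size.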