Several facilities are provided for Process recovery after a system malfunction. The purpose of Process recovery is to resume execution as quickly as possible and to minimize redundant data transmission after a system failure. The following Connect:Direct® for UNIX facilities are available to enable Process recovery:
- Process step restart—As a Process runs, its steps are recorded in the Transmission Control Queue (TCQ). If a Process is interrupted for any reason, it is held in the TCQ. When you release the Process to continue running, it automatically resumes at the step where it halted.
- Automatic session retry—Two sets of connection retry parameters are defined in the remote node information record of the network map file: short-term and long-term. If you do not specify a value for these parameters in the remote node information record, the default values from the local.node entry of the network map file are used. The short-term parameters allow immediate retry attempts. The long-term parameters are used after all short-term retries have been attempted. Long-term retries assume that the connection problem cannot be fixed quickly, so they occur at longer intervals, which avoids the overhead of frequent connection attempts.
- Checkpoint restart—This feature is available with the copy statement.
Checkpoint restart can be configured explicitly within a copy step through the ckpt parameter. If it is not configured in the copy step, it can be configured in the initialization parameters file through the ckpt.interval parameter.
- Run Task restart—If a Process is interrupted when a run task on an SNODE step is executing, Connect:Direct for UNIX attempts to synchronize the previous run task step on the SNODE with the current run task step. Synchronization occurs in one of the following ways:
- If the SNODE is executing the task when the Process is restarted, it waits for the task to complete, and then responds to the PNODE with the task completion status. Processing continues.
- If the SNODE task completes before the Process is restarted, it saves the task results. When the Process is restarted, the SNODE reports the results, and processing continues.
If synchronization fails, Connect:Direct for UNIX reads the restart parameter in the run task step or in the initialization parameters file to determine whether to perform the run task step again. The restart parameter on the run task step overrides the setting in the initialization parameters file.
For example, if the SNODE loses the run task step results due to a Connect:Direct for UNIX cold restart, Connect:Direct for UNIX checks the value defined in the restart parameter to determine whether to perform the run task again.
Run task restart works differently when Connect:Direct for UNIX runs behind a connection load balancer.
- Interruption of Process activity when the SNODE is a Connect:Direct for UNIX node—When the SNODE is a Connect:Direct for UNIX node, the Process is placed in the Wait queue in WS status if the PNODE interrupts Process activity (by issuing a command to suspend Process activity or by deleting an executing Process), or if a link fails or an I/O error occurs during a transfer.
If Process activity does not continue, you must manually delete the Process from the TCQ. You cannot issue a change process command from the SNODE to continue Process activity; the Process can only be restarted by the PNODE, which is always in control of the session.
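The precedence and ordering rules described in this section can be sketched in a few lines of Python. This is an illustrative model only, not Connect:Direct code: the class `RetryParams`, its field names, the functions `resolve_retry`, `retry_schedule`, and `rerun_run_task`, and all numeric values are hypothetical and are not product parameter names or defaults. The sketch assumes that an unset value is represented as `None`.

```python
from dataclasses import dataclass, fields
from typing import Iterator, Optional, Tuple


@dataclass
class RetryParams:
    """Hypothetical container; field names are illustrative only."""
    short_attempts: Optional[int] = None
    short_wait_s: Optional[int] = None
    long_attempts: Optional[int] = None
    long_wait_s: Optional[int] = None


def resolve_retry(remote: RetryParams, local_node: RetryParams) -> RetryParams:
    """Use the remote node record value when set, else the local.node default."""
    merged = {}
    for f in fields(RetryParams):
        value = getattr(remote, f.name)
        merged[f.name] = value if value is not None else getattr(local_node, f.name)
    return RetryParams(**merged)


def retry_schedule(p: RetryParams) -> Iterator[Tuple[str, int]]:
    """Yield (phase, wait_seconds): all short-term retries, then long-term ones."""
    for _ in range(p.short_attempts or 0):
        yield ("short", p.short_wait_s or 0)
    for _ in range(p.long_attempts or 0):
        yield ("long", p.long_wait_s or 0)


def rerun_run_task(step_restart: Optional[bool], initparm_restart: bool) -> bool:
    """Consulted only after synchronization fails: the step-level setting wins."""
    return initparm_restart if step_restart is None else step_restart


# Illustrative values, not product defaults.
local_defaults = RetryParams(3, 10, 2, 600)
remote_record = RetryParams(short_attempts=1)  # overrides a single value
plan = list(retry_schedule(resolve_retry(remote_record, local_defaults)))
# plan == [("short", 10), ("long", 600), ("long", 600)]
```

In this model, the remote node record overrides only the values it sets, and the run task step's restart setting is consulted before the initialization parameters file, mirroring the two override rules stated above.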