How Checkpoint Restart and Retry Processing Work

The checkpoint restart feature lets copy operations that fail in certain ways, for example because of network errors, be restarted. The copy operation resumes from the most recent checkpointed location instead of starting over from the beginning of the file transfer. Checkpoint restart behavior is controlled by the PNODE. SNODE settings have no effect.

To enable this feature, specify a value greater than 0 on the PNODE for one of the following parameters (a value of 0 disables automatic retries):

  • Number of long-term session retry attempts (for the Connect:Direct Server Adapter)
  • Number of short-term session retry attempts (for the Connect:Direct Server Adapter)
  • Number of Retry attempts for establishing a session (in the Connect:Direct Server Begin Session Service used in a particular business process)

You must also specify the CheckpointInterval in the CopyTo or CopyFrom service used in your business process.
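
The idea of resuming from a checkpoint can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name, the in-memory buffers, and the fail_at argument (which simulates a network error) are hypothetical and exist only for this example. The checkpoint interval is expressed here in bytes purely for simplicity.

```python
import io


def copy_with_checkpoints(source, target, checkpoint_interval,
                          start_offset=0, fail_at=None):
    """Copy source to target, checkpointing every checkpoint_interval bytes.

    Returns (last_checkpoint, completed). fail_at is a hypothetical knob that
    simulates a network error once that byte offset is reached.
    """
    source.seek(start_offset)
    target.seek(start_offset)
    target.truncate()  # discard anything written after the last checkpoint
    last_checkpoint = start_offset
    while True:
        chunk = source.read(checkpoint_interval)
        if not chunk:
            return last_checkpoint, True
        end = last_checkpoint + len(chunk)
        if fail_at is not None and end > fail_at:
            # Part of the interval arrives, but no checkpoint is taken for it.
            target.write(chunk[: fail_at - last_checkpoint])
            return last_checkpoint, False
        target.write(chunk)
        last_checkpoint = end  # record progress at the interval boundary


data = bytes(range(256)) * 4                 # 1024 bytes of sample payload
src, dst = io.BytesIO(data), io.BytesIO()

# First attempt fails partway through the fourth 100-byte interval.
checkpoint, done = copy_with_checkpoints(src, dst, 100, fail_at=350)

# The retry resumes from the checkpoint instead of from byte 0.
resumed_from = checkpoint
checkpoint, done = copy_with_checkpoints(src, dst, 100, start_offset=resumed_from)
```

In this sketch, data written after the last checkpoint is discarded and sent again on restart, so a smaller interval means less data is re-sent after a failure, at the cost of recording checkpoints more often.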

Note: Certain errors cause the operation to fail immediately without any retries, for example, a required parameter that is missing or invalid, or a remote node that is not in the netmap.

The Interval between long-term session attempts (minutes) and Number of long-term session retry attempts fields both govern long-term attempts to establish and re-establish a session. To retry over a shorter timeframe, specify the following fields when configuring the Connect:Direct Server Adapter:

  • Number of short-term session retry attempts
  • Interval between short-term session attempts (seconds)

If both the short-term and long-term retry parameters are specified, the short-term values are used first. If a session has still not been established (or re-established) after the short-term attempts, the long-term values are then used.
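
The ordering described above can be sketched in Python as a list of planned waits. The function name and return shape are assumptions made for this illustration only. The units follow the field labels: seconds for short-term intervals and minutes for long-term intervals.

```python
def retry_schedule(short_attempts, short_interval_seconds,
                   long_attempts, long_interval_minutes):
    """Return the planned waits as (phase, wait_in_seconds) tuples.

    Short-term attempts come first. Long-term attempts follow only after the
    short-term attempts are exhausted. A count of 0 skips that phase.
    """
    schedule = [("short-term", short_interval_seconds)] * short_attempts
    schedule += [("long-term", long_interval_minutes * 60)] * long_attempts
    return schedule


# Example values chosen for illustration, not product defaults.
plan = retry_schedule(short_attempts=3, short_interval_seconds=10,
                      long_attempts=2, long_interval_minutes=5)
```

With these example values, the plan contains three 10-second short-term waits followed by two 300-second long-term waits. In practice the sequence stops as soon as a session is established.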

Checkpoint restart is supported for both inbound and outbound file transfers. This feature is supported only for documents stored in the file system. It is not supported when document storage is set to Database or when document encryption is enabled in Sterling B2B Integrator. For more information, see Enable Document Encryption for File System Documents.

Checkpoint restart can be enabled regardless of whether Sterling Connect:Direct® Secure Plus, compression, or any other communications session factor is enabled.

In a file transfer, the PNODE determines whether checkpointing will be performed and sets the checkpoint interval. These two parameters can be set as a global default (usually in an initialization parameter or property file) or overridden in the Sterling Connect:Direct Process (Copy step) or Sterling B2B Integrator business process.

The checkpoint information is kept on the target node (the node receiving the file). When the file transfer completes successfully, this temporary record is destroyed. When a file transfer fails, the checkpoint information is retained on the target system for a default length of time (30 days is common). In Sterling B2B Integrator, this retention period is configured with the ckptRemoveDate property in the noapp.properties property file, located in the properties subdirectory of the installation.
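
The lifecycle of a checkpoint record can be sketched as follows. This is a simplified Python model, not the product's storage code: the class, its methods, and the retention_days argument are hypothetical, and the sketch makes no assumption about the actual format or units of the ckptRemoveDate property.

```python
from datetime import datetime, timedelta


class CheckpointStore:
    """Toy model of checkpoint records kept on the target node."""

    def __init__(self, retention_days=30):
        self.retention = timedelta(days=retention_days)
        self.records = {}  # transfer_id -> (offset, time the record was saved)

    def record(self, transfer_id, offset, now):
        """Save or update the checkpoint for a transfer in progress."""
        self.records[transfer_id] = (offset, now)

    def complete(self, transfer_id):
        """A successful transfer destroys its temporary record."""
        self.records.pop(transfer_id, None)

    def purge_expired(self, now):
        """Drop records of failed transfers older than the retention period."""
        self.records = {
            tid: (offset, saved_at)
            for tid, (offset, saved_at) in self.records.items()
            if now - saved_at <= self.retention
        }

    def lookup(self, transfer_id):
        """Return the checkpointed offset, or None if no record exists."""
        entry = self.records.get(transfer_id)
        return entry[0] if entry else None
```

A restarted transfer would call lookup to find where to resume. If the record has already been purged, there is nothing to resume from and the transfer starts from the beginning.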

If a remote Sterling Connect:Direct server or the network fails during a copy operation, the Connect:Direct Server Adapter goes into retry mode using its own default long-term and short-term values, waits the specified amount of time, and then resumes the copy. These values can be overridden by the following values, if specified, in the Connect:Direct Server Begin Session service:

  • Number of long-term session retry attempts (BeginSessionMaxRetries)
  • Interval between long-term session attempts (BeginSessionRetryInterval)
  • Number of short-term session retry attempts (ShortTermMaxRetries)
  • Interval between short-term session attempts (ShortTermRetryInterval)
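
The override rule can be sketched in Python as a simple merge. The function name is hypothetical, the numeric values are invented for illustration and are not product defaults, and the Begin Session parameter names are reused as dictionary keys for the adapter defaults only for convenience.

```python
RETRY_KEYS = (
    "BeginSessionMaxRetries",
    "BeginSessionRetryInterval",
    "ShortTermMaxRetries",
    "ShortTermRetryInterval",
)


def effective_retry_settings(adapter_defaults, begin_session):
    """Use a Begin Session value when it is specified, else the adapter default.

    A key that is missing from begin_session, or set to None, counts as
    "not specified" in this sketch.
    """
    settings = {}
    for key in RETRY_KEYS:
        override = begin_session.get(key)
        settings[key] = override if override is not None else adapter_defaults[key]
    return settings


# Illustrative values only.
adapter_defaults = {
    "BeginSessionMaxRetries": 6,
    "BeginSessionRetryInterval": 10,
    "ShortTermMaxRetries": 3,
    "ShortTermRetryInterval": 30,
}
begin_session = {"ShortTermMaxRetries": 5, "BeginSessionRetryInterval": None}

settings = effective_retry_settings(adapter_defaults, begin_session)
```

Here only ShortTermMaxRetries is overridden. The other three settings fall back to the adapter's values.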