Commit-failed recovery

Commit failure support is provided only by CICS® file control, because it is the only CICS component that needs this support.

A commit failure is a failure that occurs during the commit stage of a unit of work (either following the prepare phase of two-phase commit, or following backout of the unit of work). The unit of work has not yet completed, and the commit must be retried successfully before the recovery manager can forget about the unit of work.

When a failure occurs during file control’s commit processing, CICS ensures that all the unit of work log records for updates made to data sets that have suffered the commit failure are kept by the recovery manager. Preserving the log records ensures that the commit processing for the unit of work can be retried later when conditions are favorable.

The most likely cause of a file control commit failure from which a unit of work can recover is that the SMSVSAM server is not available when file control attempts to release the RLS locks. When other SMSVSAM servers in the sysplex detect that a server has failed, they retain, on its behalf, all the active exclusive locks that the failed server held. Therefore, CICS does not need to retain locks explicitly when a commit failure occurs. When the SMSVSAM server becomes available again, the commit is automatically retried.
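The retain-and-retry behavior described above can be pictured with a small model. This is a minimal sketch, not CICS code: the names UnitOfWork, RecoveryManager, pending_log_records, and on_server_available are hypothetical, and the single server_available flag stands in for the state of the SMSVSAM server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnitOfWork:
    uow_id: str
    # Log records still awaiting a successful commit, keyed by data set name.
    pending_log_records: Dict[str, List[str]] = field(default_factory=dict)


class RecoveryManager:
    """Toy model: keeps commit-failed units of work until a retry succeeds."""

    def __init__(self) -> None:
        self.server_available = True
        self.commit_failed: Dict[str, UnitOfWork] = {}

    def commit(self, uow: UnitOfWork) -> bool:
        if not self.server_available:
            # Commit failure: keep the log records so the commit can be retried.
            self.commit_failed[uow.uow_id] = uow
            return False
        # Commit succeeded: the unit of work can now be forgotten.
        uow.pending_log_records.clear()
        self.commit_failed.pop(uow.uow_id, None)
        return True

    def on_server_available(self) -> List[str]:
        """Retry every commit-failed unit of work; return the ids that completed."""
        self.server_available = True
        completed = []
        for uow in list(self.commit_failed.values()):
            if self.commit(uow):
                completed.append(uow.uow_id)
        return completed
```

In this model nothing is discarded on failure. The unit of work and its log records stay with the recovery manager until a later commit attempt succeeds.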

However, a file control commit failure can also result from some other error, either when CICS is attempting to release RLS locks during commit processing, or when it is attempting to convert some of the locks into retained locks during the commit processing that follows a backout failure. In this case, it may be necessary to retry the commit explicitly using the SET DSNAME RETRY command. Such failures should be rare, and may indicate a more serious problem.
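An explicit retry works per data set: every unit of work with a commit failure against the named data set is driven through commit again. The following is a minimal sketch of that idea only. The function retry_data_set, the failures mapping, and the try_commit callable are hypothetical illustrations and are not part of any CICS interface.

```python
def retry_data_set(failures, dsname, try_commit):
    """Retry commit for each failed unit of work that touches dsname.

    failures:   dict mapping uow id -> set of data set names with commit failures
    try_commit: hypothetical callable (uow_id, dsname) -> bool, True on success
    Returns the uow ids that still have a failure against dsname.
    """
    still_failed = []
    for uow_id, data_sets in failures.items():
        if dsname not in data_sets:
            continue
        if try_commit(uow_id, dsname):
            data_sets.discard(dsname)
        else:
            still_failed.append(uow_id)
    # Forget units of work that have no failed data sets left.
    for uow_id in [u for u, ds in failures.items() if not ds]:
        del failures[uow_id]
    return still_failed
```

A unit of work that still appears in the returned list after a retry is the kind of persistent failure that may point to a more serious problem.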

A unit of work that has not performed any recoverable work, but has performed repeatable reads, can also suffer a commit failure. If the SMSVSAM server fails while holding locks for repeatable read requests, it is possible to access the records when the server recovers, because all repeatable read locks are released at the point of failure. If the commit failure is not due to a server failure, the locks are held as active shared locks. The INQUIRE UOWDSNFAIL command distinguishes between a commit failure where recoverable work was performed, and one for which only repeatable read locks were held.
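The two distinctions in this paragraph can be summarized as a small decision model. This is a sketch under stated assumptions: the member names COMMITFAIL and RRCOMMITFAIL are assumed to correspond to REASON values returned by INQUIRE UOWDSNFAIL and should be checked against the command reference for your CICS release. The functions classify and repeatable_read_locks_after_failure are hypothetical.

```python
from enum import Enum


class Reason(Enum):
    # Assumed to mirror INQUIRE UOWDSNFAIL REASON values; verify before relying on them.
    COMMITFAIL = "recoverable work was performed"
    RRCOMMITFAIL = "only repeatable read locks were held"


def classify(recoverable_updates: int) -> Reason:
    """Distinguish the two kinds of commit failure for a unit of work."""
    return Reason.COMMITFAIL if recoverable_updates > 0 else Reason.RRCOMMITFAIL


def repeatable_read_locks_after_failure(server_failed: bool) -> str:
    """State of repeatable read locks after a commit failure."""
    # A server failure releases repeatable read locks at the point of failure;
    # any other cause leaves them held as active shared locks.
    return "released" if server_failed else "held as active shared locks"
```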