Effect of problems with the system log

A key value of CICS® is its ability to implement its transactional recovery commitments and thus safeguard the integrity of recoverable data updated by CICS applications.

This ability relies on logging before-images and other information to the system log. However, the system log itself might suffer software-related or hardware-related problems, including failures in the CICS recovery manager, the CICS logger domain, or the z/OS® system logger. Although problems with these components are unlikely, you must understand what actions to take to minimize their impact.

If the CICS log manager detects an error in the system log that indicates previously logged data has been lost, it initiates a shutdown of the region. This action minimizes the number of transactions that fail after a problem with the log is detected and therefore minimizes the data integrity exposure.

Any problem with the system log that indicates that CICS might not be able to access all previously logged data invalidates the log. In this case, you can perform only a diagnostic run or an initial start of the region to which the system log belongs.

A system log is completely invalidated by errors of this kind because CICS can no longer rely on the data it previously logged being available for recovery processing. For example, the most recently logged records might be unavailable, so the most recent units of work cannot be recovered. However, data might be missing from any part of the system log, and CICS cannot examine the log to determine exactly what is missing, because the remaining log data might appear internally consistent even when CICS has detected that some data has been lost.

CICS issues the following messages as it reads the log during system initialization, except when START=INITIAL is specified. They can help you identify which units of work were recovered:
DFHRM0402
This message is issued for each unit of work when it is first encountered on the log.
DFHRM0403 and DFHRM0404
One of these messages is issued for each unit of work when its context is found. The message reports the state of the unit of work.
DFHRM0405
This message is issued when a complete keypoint has been recovered from the log.

If message DFHRM0402 is issued for a unit of work and is matched by message DFHRM0403 or DFHRM0404, you can be sure of the state of that unit of work. If you see message DFHRM0405, you can use the preceding messages to determine which units of work are incomplete, and you can also be sure that no unit of work is completely missing.
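The matching rules for these messages can be expressed as a small bookkeeping routine. The following is a minimal sketch, assuming that you have already extracted each relevant message from the job log into a simplified record that holds the message ID, a unit-of-work identifier, and the state text reported by DFHRM0403 or DFHRM0404. The names `RmMessage`, `UowSummary`, and `summarize` are hypothetical. The sketch does not parse real CICS message text, and it does not interpret the reported state.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


# Hypothetical, simplified record for one recovery manager message.
# Real CICS message text carries more detail; this sketch does not parse it.
@dataclass(frozen=True)
class RmMessage:
    msg_id: str                   # for example "DFHRM0402"
    uow_id: Optional[str] = None  # unit of work identifier (None for DFHRM0405)
    state: Optional[str] = None   # state text reported by DFHRM0403/DFHRM0404


@dataclass
class UowSummary:
    known_state: Dict[str, Optional[str]] = field(default_factory=dict)  # UOW -> reported state
    unmatched: Set[str] = field(default_factory=set)  # DFHRM0402 seen, no DFHRM0403/0404
    keypoint_recovered: bool = False                  # DFHRM0405 seen


def summarize(messages: Iterable[RmMessage]) -> UowSummary:
    """Track which units of work have a reported state, in message order."""
    summary = UowSummary()
    for msg in messages:
        if msg.msg_id == "DFHRM0402":
            # Unit of work first encountered on the log: its state is not yet known.
            if msg.uow_id not in summary.known_state:
                summary.unmatched.add(msg.uow_id)
        elif msg.msg_id in ("DFHRM0403", "DFHRM0404"):
            # Context found: record the state exactly as the message reports it.
            summary.unmatched.discard(msg.uow_id)
            summary.known_state[msg.uow_id] = msg.state
        elif msg.msg_id == "DFHRM0405":
            # A complete keypoint has been recovered from the log.
            summary.keypoint_recovered = True
    return summary


log = [
    RmMessage("DFHRM0402", "UOW-A"),
    RmMessage("DFHRM0402", "UOW-B"),
    RmMessage("DFHRM0403", "UOW-A", "state-as-reported"),
    RmMessage("DFHRM0405"),
]
result = summarize(log)
print(result.known_state)         # {'UOW-A': 'state-as-reported'}
print(result.unmatched)           # {'UOW-B'}
print(result.keypoint_recovered)  # True
```

In this illustration, UOW-A has a DFHRM0402 message matched by DFHRM0403, so its state is known. UOW-B has only DFHRM0402, so the sketch lists it as unmatched. Because DFHRM0405 was seen, the preceding messages can be relied on as a complete picture.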

A second class of system log problem does not indicate any loss of previously logged data. For example, access to the log stream might be lost because the z/OS system logger address space has terminated. A problem of this class causes CICS to terminate immediately, because a subsequent emergency restart will probably succeed once the cause of the problem has been resolved.
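The two classes of system log problem lead to different CICS actions and restart options. The following sketch summarizes them as a lookup table. The enum values and the `Outcome` and `outcome_for` names are hypothetical labels chosen for illustration. They are not CICS interfaces, and the strings simply restate the behavior described in this topic.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple


class LogProblem(Enum):
    """Hypothetical labels for the two classes of system log problem."""
    DATA_LOSS = auto()    # previously logged data might have been lost
    ACCESS_LOSS = auto()  # e.g. system logger address space ended; no data loss indicated


@dataclass(frozen=True)
class Outcome:
    cics_action: str                      # what CICS does when it detects the problem
    log_invalidated: bool                 # whether the system log can still be used for recovery
    next_start_options: Tuple[str, ...]   # what you can do next with the region


def outcome_for(problem: LogProblem) -> Outcome:
    if problem is LogProblem.DATA_LOSS:
        # CICS can no longer rely on the logged data, so the log is invalidated.
        return Outcome("shutdown of the region", True, ("diagnostic run", "initial start"))
    # No data loss indicated: an emergency restart will probably succeed
    # once the cause of the problem has been resolved.
    return Outcome("immediate termination", False, ("emergency restart",))


for p in LogProblem:
    print(p.name, outcome_for(p))
```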

For information about how to deal with system log problems, see "Some conditions that cause CICS log manager error messages".