Controlling the recording of hard machine check interruptions

You can use the MODE command to control the recording or monitoring of hard machine-check interruptions.
 
MODE {PD}[,INTERVAL={nnnnn}][,RECORD={nnn}][,CPU={x  }]
     {SD}           {300  }          {ALL}       {ALL}
     {IV}
     {TC}
     {PT}
     {CC}
     {PS}
     {AD}
     {SL}
     {SC}
     {SS}
     {IC}
     {CO}
     {CS}
 

The parameters are:

PD
Instruction-processing damage machine checks are to be monitored in the specified mode.
SD
System damage machine checks are to be monitored in the specified mode.
IV
Machine checks indicating invalid PSW or registers are to be monitored in the specified mode.
TC
Machine checks indicating TOD clock damage are to be monitored in the specified mode.
PT
Machine checks indicating processor timer damage are to be monitored in the specified mode.
CC
Machine checks indicating clock comparator damage are to be monitored in the specified mode.
PS
Machine checks indicating primary clock synchronization are to be monitored in the specified mode.
AD
Machine checks indicating ETR attachment are to be monitored in the specified mode.
SL
Machine checks indicating switch to local synchronization are to be monitored in the specified mode.
SC
Machine checks indicating ETR synchronization checks are to be monitored in the specified mode.
SS
Machine checks indicating STP synchronization checks are to be monitored in the specified mode.
IC
Machine checks indicating an STP island condition are to be monitored in the specified mode.
CO
Machine checks indicating an STP configuration change are to be monitored in the specified mode.
CS
Machine checks indicating an STP clock source error condition are to be monitored in the specified mode.
INTERVAL=nnnnn
This parameter is used together with the RECORD=nnn parameter. It defines the length, in seconds, of the interval over which hard machine-check interruptions are counted. If the interval elapses before the specified number of interruptions of the specified type occur on the specified processor, the count for that type of interruption is reset to zero and counting starts again. If the specified number of hard machine-check interruptions does occur within the interval, the system either performs a timer-related recovery action or invokes alternate CPU recovery (ACR) to take the failing processor offline. If the INTERVAL parameter is omitted, INTERVAL=300 is assumed.
RECORD=nnn
After the specified number (1 to 999) of hard machine checks of the specified type occur on the specified processor within the specified interval, the system either performs a timer-related recovery action or invokes alternate CPU recovery (ACR) to take the failing processor offline. All interruptions of that type occurring on that processor are recorded on the logrec data set until the specified number is reached. If no number is specified, or if the RECORD parameter is omitted, the system uses the following default settings:
  • RECORD=16 for PD
  • RECORD=25 for SL
  • RECORD=20 for SC
  • RECORD=10 for SS, IC, CO, and CS
  • RECORD=5 for all others
RECORD=ALL
All hard machine-check interruptions of the specified type occurring on the specified processor are to be recorded on the logrec data set. The system no longer monitors the frequency of hard machine-check interruptions of that type on that processor.
CPU=x
The address (0, 1, 2, 3, ...) of the processor to be monitored in the specified mode. If the CPU parameter is omitted, CPU=ALL is assumed.
CPU=ALL
All processors in the system are to be monitored in the specified mode.
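The syntax and defaults described above can be combined into a small parser. The following Python sketch, using the hypothetical name `parse_mode_command`, reads the text of a MODE command and returns the settings that would be in effect, applying INTERVAL=300, the per-type RECORD defaults, and CPU=ALL when those parameters are omitted. It checks only the rules stated in this section; treating INTERVAL=0 as invalid is an assumption, and the real command processor may accept or reject other forms.

```python
from typing import Dict, Union

# Hypothetical helper names; this is not a z/OS interface.
TYPE_CODES = ("PD", "SD", "IV", "TC", "PT", "CC", "PS", "AD",
              "SL", "SC", "SS", "IC", "CO", "CS")
# Default RECORD thresholds from the RECORD=nnn description; others use 5.
DEFAULT_RECORD = {"PD": 16, "SL": 25, "SC": 20,
                  "SS": 10, "IC": 10, "CO": 10, "CS": 10}


def parse_mode_command(text: str) -> Dict[str, Union[str, int]]:
    """Parse MODE command text and fill in the documented defaults."""
    verb, _, operands = text.strip().partition(" ")
    if verb.upper() != "MODE":
        raise ValueError("not a MODE command")
    fields = [f.strip().upper() for f in operands.split(",")]
    code = fields[0]
    if code not in TYPE_CODES:
        raise ValueError(f"unknown machine-check type: {code!r}")
    settings: Dict[str, Union[str, int]] = {
        "type": code,
        "interval": 300,                        # INTERVAL omitted
        "record": DEFAULT_RECORD.get(code, 5),  # RECORD omitted or no number
        "cpu": "ALL",                           # CPU omitted
    }
    for field in fields[1:]:
        key, _, value = field.partition("=")
        if key == "INTERVAL" and value.isdigit() and int(value) >= 1:
            settings["interval"] = int(value)  # assumed: positive seconds
        elif key == "RECORD" and value == "":
            pass  # RECORD without a number keeps the default
        elif key == "RECORD" and value == "ALL":
            settings["record"] = "ALL"
        elif key == "RECORD" and value.isdigit() and 1 <= int(value) <= 999:
            settings["record"] = int(value)
        elif key == "CPU" and value.isalnum():
            settings["cpu"] = value  # a processor address, or ALL
        else:
            raise ValueError(f"invalid operand: {field!r}")
    return settings


print(parse_mode_command("mode  pd,record=7,interval=600,cpu=0"))
# {'type': 'PD', 'interval': 600, 'record': 7, 'cpu': '0'}
print(parse_mode_command("mode sd"))
# {'type': 'SD', 'interval': 300, 'record': 5, 'cpu': 'ALL'}
```

Applied to the command in Example 3, the sketch reports RECORD=5 and INTERVAL=300, the defaults that example relies on.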

Example 1:

Monitor instruction-processing-damage machine-check interruptions on processor 0. If seven of these interruptions occur in 600 seconds on processor 0, invoke ACR to take processor 0 offline.
mode  pd,record=7,interval=600,cpu=0

Example 2:

Record on the logrec data set all machine-check interruptions that indicate an invalid PSW or registers, on every processor in the system, without monitoring their frequency.
MODE  IV,CPU=ALL,RECORD=ALL

Example 3:

Monitor the frequency of system damage machine-check interruptions on all processors, using the default values of five for the RECORD= parameter and 300 for the INTERVAL= parameter. After five system damage machine checks have occurred on a given processor within five minutes (300 seconds), invoke ACR to take that processor offline.
mode sd
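The monitoring behavior in these examples, and the counting rule described under INTERVAL and RECORD, can be modeled for one interruption type on one processor. This Python sketch is illustrative only: `HardMachineCheckMonitor` is a hypothetical name, and the sketch assumes that the interval starts at the first interruption counted after a reset and that counting restarts after the recovery action. The description above does not specify either point, and z/OS may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardMachineCheckMonitor:
    """Hypothetical model of the RECORD/INTERVAL rule for one interruption
    type on one processor. Assumes the interval starts at the first
    interruption counted after a reset."""
    record: int = 5            # RECORD=nnn threshold
    interval: float = 300.0    # INTERVAL=nnnnn, in seconds
    count: int = 0
    window_start: Optional[float] = None

    def interruption(self, timestamp: float) -> bool:
        """Count one interruption; return True when the threshold is reached
        within the interval (the point of recovery action or ACR)."""
        if (self.window_start is None
                or timestamp - self.window_start > self.interval):
            # No interval running, or it elapsed first: count from zero.
            self.count = 0
            self.window_start = timestamp
        self.count += 1
        if self.count >= self.record:
            # Assumption: counting restarts after the recovery action.
            self.count = 0
            self.window_start = None
            return True
        return False


# Example 3 defaults (RECORD=5, INTERVAL=300): five checks within 280 seconds.
burst = HardMachineCheckMonitor(record=5, interval=300)
print([burst.interruption(t) for t in (0, 40, 90, 150, 280)])
# [False, False, False, False, True]

# The same five checks spread 100 seconds apart never reach the threshold.
spread = HardMachineCheckMonitor(record=5, interval=300)
print([spread.interruption(t) for t in (0, 100, 200, 300, 400)])
# [False, False, False, False, False]
```

In the second run, the interval that began at time 0 expires before the fifth interruption arrives at 400 seconds, so counting starts again from zero and no recovery action is triggered.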