Deallocation error log entries
Three different error log messages are associated with CPU deallocation.
The following are examples.
- errpt short format - summary
- The following is an example of entries displayed by the errpt command (without options):
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
804E987A   1008161399 I O proc4         CPU DEALLOCATED
8470267F   1008161299 T S proc4         CPU DEALLOCATION ABORTED
1B963892   1008160299 P H proc4         CPU FAILURE PREDICTED
#
- If processor deallocation is enabled, a CPU FAILURE PREDICTED message is always followed by either a CPU DEALLOCATED message or a CPU DEALLOCATION ABORTED message.
- If processor deallocation is not enabled, only the CPU FAILURE PREDICTED message is logged. Enabling processor deallocation any time after one or more CPU FAILURE PREDICTED messages have been logged initiates the deallocation process and results in a success or failure error log entry, as described above, for each processor reported failing.
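The pairing rule above can be checked mechanically. The following is a minimal Python sketch, not part of the operating system, that parses short-format lines like those in the sample and reports which outcome entries were logged for each processor with a predicted failure. The function names parse_errpt_summary and deallocation_outcomes are hypothetical, and the sketch assumes the six-column layout of the sample output.

```python
from collections import defaultdict

# Sample taken from the short-format listing in this section.
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
804E987A   1008161399 I O proc4         CPU DEALLOCATED
8470267F   1008161299 T S proc4         CPU DEALLOCATION ABORTED
1B963892   1008160299 P H proc4         CPU FAILURE PREDICTED
"""

KEYS = ("identifier", "timestamp", "type", "class", "resource", "description")


def parse_errpt_summary(text):
    """Split short-format lines into dictionaries, skipping the header
    and any line that does not have six columns (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        # The description is the last column and may contain spaces.
        fields = line.strip().split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        entries.append(dict(zip(KEYS, fields)))
    return entries


def deallocation_outcomes(entries):
    """Map each resource with a predicted failure to the outcome
    descriptions logged for it, in listing order."""
    predicted = set()
    outcomes = defaultdict(list)
    for entry in entries:
        description = entry["description"]
        if description == "CPU FAILURE PREDICTED":
            predicted.add(entry["resource"])
        elif description in ("CPU DEALLOCATED", "CPU DEALLOCATION ABORTED"):
            outcomes[entry["resource"]].append(description)
    return {resource: outcomes.get(resource, []) for resource in predicted}


if __name__ == "__main__":
    print(deallocation_outcomes(parse_errpt_summary(SAMPLE)))
```

An empty list for a resource means that only the prediction was logged, which is what the text above describes for a system where processor deallocation is not enabled.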
- errpt long format - detailed description
- The following is the form of output obtained with errpt -a:
CPU_FAIL_PREDICTED
Error description: Predictive Processor Failure
This error indicates that the hardware detected that a processor has a high probability of failing in the near future. It is always logged, whether or not processor deallocation is enabled.
DETAIL DATA: Physical processor number, location
Example error log entry - long form:
LABEL: CPU_FAIL_PREDICTED
IDENTIFIER: 1655419A
Date/Time: Thu Sep 30 13:42:11
Sequence Number: 53
Machine Id: 00002F0E4C00
Node Id: auntbea
Class: H
Type: PEND
Resource Name: proc25
Resource Class: processor
Resource Type: proc_rspc
Location: 00-25
Description
CPU FAILURE PREDICTED
Probable Causes
CPU FAILURE
Failure Causes
CPU FAILURE
Recommended Actions
ENSURE CPU GARD MODE IS ENABLED
RUN SYSTEM DIAGNOSTICS.
Detail Data
PROBLEM DATA
0144 1000 0000 003A 8E00 9100 1842 1100 1999 0930 4019 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 4942 4D00 5531 2E31 2D50 312D 4332 0000
0002 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
... ... ... ... ...
CPU_DEALLOC_SUCCESS
Error Description: A processor has been successfully deallocated after detection of a predictive processor failure. This message is logged when processor deallocation is enabled, and when the CPU has been successfully deallocated.
DETAIL DATA: Logical CPU number of deallocated processor.
Example: error log entry - long form:
In this example, proc24 was successfully deallocated and was logical CPU 0 when the failure occurred.
LABEL: CPU_DEALLOC_SUCCESS
IDENTIFIER: 804E987A
Date/Time: Thu Sep 30 13:44:13
Sequence Number: 63
Machine Id: 00002F0E4C00
Node Id: auntbea
Class: O
Type: INFO
Resource Name: proc24
Description
CPU DEALLOCATED
Recommended Actions
MAINTENANCE IS REQUIRED BECAUSE OF CPU FAILURE
Detail Data
LOGICAL DEALLOCATED CPU NUMBER
0
CPU_DEALLOC_FAIL
Error Description: A processor deallocation, due to a predictive processor failure, was not successful. This message is logged when CPU deallocation is enabled, and when the CPU has not been successfully deallocated.
DETAIL DATA:
Reason code, logical CPU number, and additional information depending on the type of failure.
The reason code is a numeric hexadecimal value. The possible reason codes are:
- 2: One or more processes/threads remain bound to the last logical CPU. In this case, the detailed data give the PIDs of the offending processes.
- 3: A registered driver or kernel extension returned an error when notified. In this case, the detailed data field contains the name of the offending driver or kernel extension (ASCII encoded).
- 4: Deallocating a processor would leave the machine with fewer than two available CPUs. This operating system does not deallocate more than N-2 processors on an N-way machine. This avoids confusing applications or kernel extensions that use the total number of available processors to determine whether they are running on a uniprocessor (UP) system, where it is safe to skip the use of multiprocessor locks, or on a symmetric multiprocessor (SMP) system.
- 200 (0xC8): Processor deallocation is disabled (the ODM attribute cpuguard has a value of disable). You normally do not see this error unless you start ha_star manually.
Examples: error log entries - long format
Example 1:
In this example, the deallocation for proc26 failed.
LABEL: CPU_DEALLOC_ABORTED
IDENTIFIER: 8470267F
Date/Time: Thu Sep 30 13:41:10
Sequence Number: 50
Machine Id: 00002F0E4C00
Node Id: auntbea
Class: S
Type: TEMP
Resource Name: proc26
Description
CPU DEALLOCATION ABORTED
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MAINTENANCE IS REQUIRED BECAUSE OF CPU FAILURE
SEE USER DOCUMENTATION FOR CPU GARD
Detail Data
DEALLOCATION ABORTED CAUSE
0000 0003
DEALLOCATION ABORTED DATA
6676 6861 6568 3200
The reason code 3 means that a kernel extension returned an error to the kernel notification routine. The DEALLOCATION ABORTED DATA above spells fvhaeh2, which is the name the extension used when registering with the kernel.
Example 2:
In this example, the deallocation for proc19 failed.
LABEL: CPU_DEALLOC_ABORTED
IDENTIFIER: 8470267F
Date/Time: Thu Sep 30 14:00:22
Sequence Number: 71
Machine Id: 00002F0E4C00
Node Id: auntbea
Class: S
Type: TEMP
Resource Name: proc19
Description
CPU DEALLOCATION ABORTED
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MAINTENANCE IS REQUIRED BECAUSE OF CPU FAILURE; SEE USER DOCUMENTATION FOR CPU GARD
Detail Data
DEALLOCATION ABORTED CAUSE
0000 0002
DEALLOCATION ABORTED DATA
0000 0000 0000 4F4A
The reason code 2 indicates that one or more threads were bound to the last logical processor and did not unbind after receiving the SIGCPUFAIL signal. The DEALLOCATION ABORTED DATA shows that these threads belonged to process 0x4F4A.
Options of the ps command (-o THREAD, -o BND) allow you to list all threads or processes along with the number of the CPU they are bound to, when applicable.
Example 3:
In this example, the deallocation of proc2 failed because there were two or fewer active processors at the time of failure (reason code 4).
LABEL: CPU_DEALLOC_ABORTED
IDENTIFIER: 8470267F
Date/Time: Thu Sep 30 14:37:34
Sequence Number: 106
Machine Id: 00002F0E4C00
Node Id: auntbea
Class: S
Type: TEMP
Resource Name: proc2
Description
CPU DEALLOCATION ABORTED
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MAINTENANCE IS REQUIRED BECAUSE OF CPU FAILURE
SEE USER DOCUMENTATION FOR CPU GARD
Detail Data
DEALLOCATION ABORTED CAUSE
0000 0004
DEALLOCATION ABORTED DATA
0000 0000 0000 0000
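The Detail Data fields in the three examples can be decoded by hand, but a short script makes the steps explicit. The following Python sketch is an illustration only: decode_abort and REASONS are hypothetical names, and treating the whole DEALLOCATION ABORTED DATA field as a single big-endian PID for reason code 2 is an assumption drawn from Example 2, not a documented layout (the field may carry several PIDs).

```python
# Reason codes as listed in this section (summaries paraphrased).
REASONS = {
    0x2: "processes/threads remain bound to the last logical CPU",
    0x3: "a registered driver or kernel extension returned an error",
    0x4: "deallocation would leave fewer than two available CPUs",
    0xC8: "processor deallocation is disabled",
}


def hex_words_to_bytes(words):
    """Convert space-separated hex words such as '0000 0003' to bytes."""
    return bytes.fromhex(words.replace(" ", ""))


def decode_abort(cause, data):
    """Return (reason code, summary, decoded detail) for one entry.

    cause and data are the hex strings printed under
    DEALLOCATION ABORTED CAUSE and DEALLOCATION ABORTED DATA.
    """
    code = int.from_bytes(hex_words_to_bytes(cause), "big")
    raw = hex_words_to_bytes(data)
    detail = None
    if code == 0x3:
        # ASCII-encoded extension name, terminated by a NUL byte.
        detail = raw.split(b"\x00", 1)[0].decode("ascii")
    elif code == 0x2:
        # Assumption: the field holds one PID as a big-endian integer.
        detail = int.from_bytes(raw, "big")
    return code, REASONS.get(code, "unknown reason code"), detail


if __name__ == "__main__":
    print(decode_abort("0000 0003", "6676 6861 6568 3200"))  # Example 1
    print(decode_abort("0000 0002", "0000 0000 0000 4F4A"))  # Example 2
    print(decode_abort("0000 0004", "0000 0000 0000 0000"))  # Example 3
```

For Example 1 the decoded detail is the string fvhaeh2, and for Example 2 it is the integer 0x4F4A (20298 in decimal), which can then be compared with the PIDs reported by the ps command.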