RAS Kernel Services
The Reliability, Availability, and Serviceability (RAS) kernel services are used to record the occurrence of hardware or software failures and to capture data about these failures.
The panic kernel service is called when a catastrophic failure occurs and the system can no longer operate. The panic service performs a system dump, which captures the data areas that are registered in the Master Dump Table. The kernel and kernel extensions use the dmp_ctl kernel service to add entries to and delete entries from the Master Dump Table, and to record dump routine failures.
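The relationship between registered entries and the dump can be sketched with a small user-space model. This is not the dmp_ctl interface: the names dump_table, dump_fn, dump_table_add, dump_table_del, dump_table_collect, and sample_dump are hypothetical and exist only for illustration. The model assumes that each registered entry is a routine that reports one data area, and that a dump walks the table and copies every reported area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DUMP_TABLE_MAX 8            /* arbitrary capacity for this model */

/* Hypothetical entry type: returns the data area to capture and
 * stores its length in *len. */
typedef const void *(*dump_fn)(size_t *len);

struct dump_table {
    dump_fn entries[DUMP_TABLE_MAX];
    size_t  count;
};

/* Add an entry. Returns 0 on success, -1 if the table is full or the
 * routine is already registered. */
int dump_table_add(struct dump_table *t, dump_fn fn)
{
    assert(t != NULL);
    if (fn == NULL || t->count == DUMP_TABLE_MAX)
        return -1;
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i] == fn)
            return -1;
    t->entries[t->count++] = fn;
    return 0;
}

/* Delete an entry. Returns 0 on success, -1 if it was not registered. */
int dump_table_del(struct dump_table *t, dump_fn fn)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i] == fn) {
            t->entries[i] = t->entries[--t->count];  /* fill the hole */
            return 0;
        }
    }
    return -1;
}

/* Model of a dump: copy every registered area into out and return the
 * number of bytes written. Areas that do not fit are skipped. */
size_t dump_table_collect(const struct dump_table *t, char *out, size_t cap)
{
    size_t used = 0;
    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < t->count; i++) {
        size_t len = 0;
        const void *area = t->entries[i](&len);
        if (area == NULL || len > cap - used)
            continue;
        memcpy(out + used, area, len);
        used += len;
    }
    return used;
}

/* Hypothetical data area owned by a kernel extension. */
static const char sample_state[] = "drv0";

const void *sample_dump(size_t *len)
{
    *len = sizeof sample_state;     /* includes the terminating NUL */
    return sample_state;
}
```

In this model, a kernel extension would add sample_dump when it is configured and delete it before it is unloaded, so that a dump never calls a routine that no longer exists.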
The errsave and errlast kernel services are called to record an entry in the system error log when a hardware or software failure is detected.
The trcgenk and trcgenkt kernel services are used along with the trchook subroutine to record selected system events in the event-tracing facility.
The ras_register and ras_unregister kernel services register and unregister RAS handlers for a specific component. These handlers are called by the kernel when the system needs to communicate various RAS commands to each component.
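The register, dispatch, and unregister flow described above can be modeled in user space as follows. This sketch does not reproduce the ras_register or ras_unregister signatures. Every name in it (ras_cmd_model, ras_handler_model, component_register, component_unregister, broadcast_command, sample_handler) and the two commands are hypothetical, chosen only to show a per-component handler receiving commands from a central dispatcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical commands; the real set of RAS commands is not modeled. */
enum ras_cmd_model { CMD_TRACE_ON, CMD_TRACE_OFF };

/* Hypothetical handler type: returns 0 when the command was handled. */
typedef int (*ras_handler_model)(enum ras_cmd_model cmd, void *private_data);

struct component_model {
    const char        *name;
    ras_handler_model  handler;
    void              *private_data;
    int                in_use;
};

#define MAX_COMPONENTS 4            /* arbitrary capacity for this model */
static struct component_model components[MAX_COMPONENTS];

/* Register a handler for a component. Returns a slot number that is
 * later passed to component_unregister, or -1 if no slot is free. */
int component_register(const char *name, ras_handler_model handler,
                       void *private_data)
{
    assert(name != NULL && handler != NULL);
    for (int i = 0; i < MAX_COMPONENTS; i++) {
        if (!components[i].in_use) {
            components[i].name = name;
            components[i].handler = handler;
            components[i].private_data = private_data;
            components[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Unregister a component. Returns 0 on success, -1 on a bad slot. */
int component_unregister(int slot)
{
    if (slot < 0 || slot >= MAX_COMPONENTS || !components[slot].in_use)
        return -1;
    components[slot].in_use = 0;
    return 0;
}

/* Model of the dispatcher: send one command to every registered
 * component and return how many handlers reported success. */
int broadcast_command(enum ras_cmd_model cmd)
{
    int handled = 0;
    for (int i = 0; i < MAX_COMPONENTS; i++) {
        if (components[i].in_use &&
            components[i].handler(cmd, components[i].private_data) == 0)
            handled++;
    }
    return handled;
}

/* Hypothetical handler: private_data points at a tracing flag. */
int sample_handler(enum ras_cmd_model cmd, void *private_data)
{
    int *tracing = private_data;
    *tracing = (cmd == CMD_TRACE_ON);
    return 0;
}
```

Passing private data at registration time lets one handler routine serve several component instances, because each call receives the state that belongs to the instance being addressed.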
The register_HA_handler and unregister_HA_handler kernel services are used to register high availability event handlers for kernel extensions that need to be aware of events such as processor deallocation.
One of the RAS features is a service that monitors for excessive periods of interrupt disablement on a processor, and logs these events to the error log. The disablement_checking_suspend and disablement_checking_resume services exempt a code segment from this detection.
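The suspend and resume bracket can be illustrated with a minimal user-space model. The structure disable_monitor and the functions monitor_suspend, monitor_resume, and monitor_observe are hypothetical, and the design in which suspend returns the previous state for resume to restore is an assumption of this sketch, not a statement about the disablement_checking_suspend and disablement_checking_resume signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-processor monitor state. */
struct disable_monitor {
    unsigned threshold_ticks;   /* longest disabled period that is tolerated */
    int      checking;          /* nonzero while detection is active */
    unsigned logged;            /* number of events written to the log */
};

/* Turn detection off and return the previous state, so that nested
 * brackets restore correctly. */
int monitor_suspend(struct disable_monitor *m)
{
    assert(m != NULL);
    int previous = m->checking;
    m->checking = 0;
    return previous;
}

/* Restore the state that the matching monitor_suspend returned. */
void monitor_resume(struct disable_monitor *m, int previous)
{
    assert(m != NULL);
    m->checking = previous;
}

/* Report one disabled period; log it only when detection is active
 * and the period exceeds the threshold. */
void monitor_observe(struct disable_monitor *m, unsigned ticks)
{
    assert(m != NULL);
    if (m->checking && ticks > m->threshold_ticks)
        m->logged++;
}
```

Because resume restores the saved state instead of unconditionally enabling detection, an inner bracket that ends inside an outer one leaves detection suspended until the outer bracket also ends.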