Managing system hangs
System hang management helps keep mission-critical applications running continuously and improves application availability. System hang detection alerts the system administrator to possible problems and then lets the administrator log in as root or reboot the system to resolve the problem.
shconf command
The shconf command is invoked when System Hang Detection is enabled. It configures which events are monitored and what actions are taken if those events occur. You can specify any of the following actions, along with the priority level to check, the timeout (how long no process or thread may execute at a lower or equal priority before the action is taken), the terminal device for the warning action, and the getty command action:
- Log an error in the errlog file
- Display a warning message on the system console (alphanumeric console) or on a specified TTY
- Reboot the system
- Give a special getty that allows the user to log in as root and run commands
- Launch a command
For the Launch a command and Give a special getty options, system hang detection starts the specified command or the special getty command at the highest priority. The special getty prints a warning message stating that it is a recovering getty running at priority 0. The following table lists the actions and their default parameters for priority hang detection. Only one action is enabled for each type of detection.
Option | Enablement | Priority | Timeout (seconds) |
---|---|---|---|
Log an error in errlog file | disabled | 60 | 120 |
Display a warning message | disabled | 60 | 120 |
Give a recovering getty | enabled | 60 | 120 |
Launch a command | disabled | 60 | 120 |
Reboot the system | disabled | 39 | 300 |
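The trigger rule for priority hang detection can be sketched in Python. This is a minimal model under stated assumptions, not the shdaemon implementation: it assumes the detector knows the last time any process or thread ran at each priority value, and it uses the AIX convention (consistent with the recovering getty running at priority 0, the highest priority) that a larger number means a lower priority. The names `PriorityAction`, `DEFAULT_PRIORITY_ACTIONS`, and `triggered_actions` are hypothetical; the values mirror the defaults in the table above.

```python
from dataclasses import dataclass

# Hypothetical model of priority hang detection, using the default table values.
# Convention assumed: 0 is the highest priority; larger values are lower priorities.

@dataclass(frozen=True)
class PriorityAction:
    name: str
    enabled: bool
    priority: int   # priority level to check
    timeout_s: int  # seconds with no run at this or a lower priority

DEFAULT_PRIORITY_ACTIONS = [
    PriorityAction("Log an error in errlog file", False, 60, 120),
    PriorityAction("Display a warning message", False, 60, 120),
    PriorityAction("Give a recovering getty", True, 60, 120),
    PriorityAction("Launch a command", False, 60, 120),
    PriorityAction("Reboot the system", False, 39, 300),
]

def triggered_actions(actions, last_run, now):
    """Return the names of enabled actions whose hang condition holds.

    last_run maps a priority value to the last time (in seconds) any process
    or thread ran at that priority. An action fires when nothing ran at its
    priority or any lower priority (value >= action.priority) for at least
    timeout_s seconds.
    """
    fired = []
    for action in actions:
        if not action.enabled:
            continue
        recent = [t for prio, t in last_run.items() if prio >= action.priority]
        latest = max(recent, default=float("-inf"))  # never ran: treat as hung
        if now - latest >= action.timeout_s:
            fired.append(action.name)
    return fired

# A thread at priority 40 ran recently, but priorities 60 and lower have been
# idle for 130 seconds, which exceeds the 120-second getty timeout.
last_run = {60: 100.0, 100: 150.0, 40: 275.0}
print(triggered_actions(DEFAULT_PRIORITY_ACTIONS, last_run, now=280.0))
# ['Give a recovering getty']
```

In this model, recent activity at priority 40 does not suppress the getty action, because 40 is a higher priority than the checked level of 60; only work at the checked level or below counts as progress.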
For lost I/O detection, you can set the timeout value and enable the following actions:
Option | Enablement |
---|---|
Display a warning message | disabled |
Reboot the system | disabled |
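Lost I/O detection can be modeled the same way. The sketch below is an assumption-laden illustration: it treats a lost I/O as a request that has not completed within the user-set timeout, and the names `LostIOConfig`, `lost_io_actions`, and the request IDs are hypothetical. No default timeout is assumed; the 600-second value in the usage line is an arbitrary example.

```python
from dataclasses import dataclass

# Hypothetical model of lost I/O detection. The two actions and their
# disabled defaults come from the table; everything else is illustrative.

@dataclass
class LostIOConfig:
    timeout_s: int                 # user-set timeout (no default assumed)
    warning_enabled: bool = False  # "Display a warning message" (default: disabled)
    reboot_enabled: bool = False   # "Reboot the system" (default: disabled)

def lost_io_actions(config, pending_io, now):
    """Return (lost request IDs, enabled actions) for overdue I/O requests.

    pending_io maps a hypothetical request ID to the time it was issued.
    """
    lost = sorted(req for req, issued in pending_io.items()
                  if now - issued >= config.timeout_s)
    actions = []
    if lost:
        if config.warning_enabled:
            actions.append("Display a warning message")
        if config.reboot_enabled:
            actions.append("Reboot the system")
    return lost, actions

cfg = LostIOConfig(timeout_s=600, warning_enabled=True)  # example values only
print(lost_io_actions(cfg, {"io-1": 0.0, "io-2": 500.0}, now=700.0))
# (['io-1'], ['Display a warning message'])
```

With both actions left at their disabled defaults, the model still identifies overdue requests but takes no action, which matches the table's default settings.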
shdaemon daemon
The shdaemon daemon is a process that is launched by init and runs at priority 0 (zero). It handles system hang detection by retrieving the configuration information, initializing its working structures, and starting the detection timers set by the user.
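The three startup steps described above can be sketched as a simple timer queue. This is a hypothetical illustration, not the real shdaemon internals: the configuration format, the detector names, the 600-second lost I/O timeout, and the simulated clock are all assumptions, and a real daemon would evaluate the hang condition where this sketch only counts checks.

```python
import heapq

# Hypothetical sketch of: retrieve configuration, initialize working
# structures, start detection timers. Uses a simulated clock, no sleeping.

def retrieve_configuration():
    # Stand-in for reading the settings made with shconf (format assumed).
    return {
        "priority_hang": {"enabled": True, "timeout_s": 120},
        "lost_io": {"enabled": False, "timeout_s": 600},
    }

def initialize_structures(config):
    # One working record per enabled detector.
    return {name: {"timeout_s": c["timeout_s"], "checks": 0}
            for name, c in config.items() if c["enabled"]}

def start_timers(structures, now=0):
    # Min-heap of (deadline, detector name): the earliest deadline is first.
    timers = [(now + s["timeout_s"], name) for name, s in structures.items()]
    heapq.heapify(timers)
    return timers

def run_until(structures, timers, end_time):
    """Fire expired timers in deadline order up to end_time, re-arming each."""
    fired = []
    while timers and timers[0][0] <= end_time:
        deadline, name = heapq.heappop(timers)
        structures[name]["checks"] += 1  # a real detector would evaluate here
        fired.append((deadline, name))
        heapq.heappush(timers, (deadline + structures[name]["timeout_s"], name))
    return fired

config = retrieve_configuration()
structures = initialize_structures(config)
timers = start_timers(structures)
print(run_until(structures, timers, end_time=400))
# [(120, 'priority_hang'), (240, 'priority_hang'), (360, 'priority_hang')]
```

A heap keeps the next expiring timer at the front, so a single loop can serve any number of detectors with different timeouts; disabled detectors never get a working record or a timer.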