Question & Answer
Question
The system may still respond to ping, but active login sessions stop responding and new ones cannot be opened.
Answer
These types of problems are often difficult to resolve from the software side alone. In our experience, a true hard lockup can be a hardware problem. Make sure that all firmware, BIOS, and diagnostics have been examined and/or updated.
Your first step should be to confirm the state of the machine. Some suggestions:
- Is the system pingable?
- Get kdump set up, configured, and tested before the next occurrence.
Sysrq:
Sysrq gives you a "backdoor" into the kernel to gather information or trigger a crashdump if the kernel is still alive, i.e., pingable.
See /usr/src/<kernelversion>/Documentation/sysrq.txt for details and additional options.
Make sure that /etc/sysctl.conf has the following:
kernel.sysrq = 1
kernel.panic_on_oops = 1 (Best Practice)
- Then run sysctl -p
- You can confirm the sysctl values with the following command: sysctl -A | less
- You can then test it by echoing a command key into /proc/sysrq-trigger, e.g.:
echo m > /proc/sysrq-trigger
- Memory status information will be logged to /var/log/messages.
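Before relying on Sysrq during a hang, it can help to confirm that the settings above are actually present in the configuration file. The following is a minimal sketch, not a supported tool: the function names has_setting and check_sysrq_conf are made up for illustration, and the script only reads the file (it does not run sysctl -p for you).

```shell
#!/bin/sh
# has_setting FILE KEY VALUE (hypothetical helper):
# succeeds if FILE contains an uncommented "KEY = VALUE" line,
# with or without spaces around the equals sign.
has_setting() {
    # Escape the dots in the key so they match literally in the regex.
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*$3[[:space:]]*\$" "$1"
}

# check_sysrq_conf [FILE] (hypothetical helper):
# reports whether the two settings from this document are present.
# FILE defaults to /etc/sysctl.conf.
check_sysrq_conf() {
    conf="${1:-/etc/sysctl.conf}"
    rc=0
    for pair in kernel.sysrq=1 kernel.panic_on_oops=1; do
        key="${pair%%=*}"
        val="${pair#*=}"
        if has_setting "$conf" "$key" "$val"; then
            echo "ok: $key = $val"
        else
            echo "missing: $key = $val"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_sysrq_conf with no argument checks /etc/sysctl.conf; a nonzero exit status means at least one of the two lines is absent or commented out.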
- If the system hangs, go to the console and use the Alt-Sysrq sequences below. The Sysrq key is often labeled PrintScreen.
Alt-Sysrq-c (trigger a crashdump)
NOTE: On System p (ppc64) systems, the Alt-Sysrq sequence will test the crashdump just as expected. In the event of a real panic, the debugger (mon>) will be entered and you will need to type "X" to trigger the dump.
Alt-Sysrq-p (registers and flags of the current CPU)
Alt-Sysrq-t (task list with stacks)
Alt-Sysrq-m (memory)
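If the console keyboard is not reachable but a root shell is still responding, the same command keys can be written to /proc/sysrq-trigger, as in the echo m test earlier. The sketch below is an illustration under assumptions: sysrq_send and the SYSRQ_TRIGGER variable are hypothetical names, and the function deliberately accepts only the informational keys (m, p, t) so that a typo cannot crash the machine with c.

```shell
#!/bin/sh
# SYSRQ_TRIGGER is a hypothetical override so the function can be tried
# against an ordinary file; on a real system it is /proc/sysrq-trigger
# and writing to it requires root.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# sysrq_send KEY (hypothetical helper):
# writes one Sysrq command key to the trigger file.
sysrq_send() {
    key="$1"
    case "$key" in
        m|p|t) ;;   # informational dumps only
        *)
            echo "refusing key: $key" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$key" > "$SYSRQ_TRIGGER"
}
```

For example, running sysrq_send m followed by sysrq_send t as root should log memory and task information in the same place as the echo m test. Triggering the crashdump (c) is left as a deliberate manual step.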
Setting kernel.unknown_nmi_panic allows you to trigger an NMI and an Oops if your hardware has an NMI button. This is suggested when the system is not pingable or otherwise accessible. It is intended to help debug what are generally hardware problems. Normally you would enable it only after a discussion with Customer Support.
Before you enable kernel.unknown_nmi_panic, check to see if nmi_watchdog is enabled by doing the following:
cat /proc/interrupts | grep NMI
If there are nonzero values, you will need to disable nmi_watchdog in the bootloader. Edit the kernel command line to include:
nmi_watchdog=0
Make sure to reboot after the bootloader change and check to make sure it's disabled with:
cat /proc/interrupts | grep NMI
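The NMI line in /proc/interrupts has one counter per CPU, so on a large system it is easy to miss a single nonzero value. As a rough sketch of the check described above, the following sums the per-CPU counters; nmi_total is a hypothetical helper name, and the rule "nonzero means nmi_watchdog is active" is taken from this document rather than verified by the script.

```shell
#!/bin/sh
# nmi_total [FILE] (hypothetical helper):
# prints the sum of the per-CPU NMI counters.
# FILE defaults to /proc/interrupts.
nmi_total() {
    file="${1:-/proc/interrupts}"
    awk '
        $1 == "NMI:" {
            # Add up the numeric fields; skip the trailing description text.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) sum += $i
        }
        END { print sum + 0 }
    ' "$file"
}
```

If nmi_total prints 0 after the reboot, no CPU has recorded an NMI since boot; any other value means at least one counter is nonzero.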
Then, edit /etc/sysctl.conf to include the following:
kernel.unknown_nmi_panic = 1
- Then run sysctl -p
- You can confirm the sysctl values with the following command: sysctl -A | less
NOTE: unknown_nmi_panic is incompatible with nmi_watchdog and the Oracle hangcheck_timer. Please contact Service for additional information.
For Red Hat and SUSE, provide:
The vmcore from the /var/crash/* directory.
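To locate the dump before sending it, something like the following may be enough. It is only a sketch: list_vmcores is a hypothetical helper name, and it assumes the dump files are named vmcore (or start with that name) somewhere below /var/crash, which depends on how kdump is configured on your distribution.

```shell
#!/bin/sh
# list_vmcores [DIR] (hypothetical helper):
# prints every file below DIR whose name starts with "vmcore".
# DIR defaults to /var/crash. The pattern also matches companion files
# whose names begin with "vmcore".
list_vmcores() {
    crash_dir="${1:-/var/crash}"
    find "$crash_dir" -type f -name 'vmcore*' 2>/dev/null | sort
}
```

Running list_vmcores with no argument prints one path per line; empty output means no dump was found under /var/crash.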
Document Information
Modified date:
12 August 2021
UID
isg3T1010236