News
Abstract
While running resource-intensive workloads on an IBM Power11 9080-HEU server with 64 TB of memory and 256 cores (2048 logical processors), hard lockups (and in some cases, soft lockups) were seen. The watchdog messages are shown in the Symptoms section.
Content
Linux Releases Affected
SUSE Linux Enterprise Server (SLES) 15, Service Pack (SP) 6
Red Hat Enterprise Linux (RHEL) 10
IBM Systems Affected
All IBM Power Systems
Symptoms
The following hard lockups (and in some cases, soft lockups) were seen:
[ C1629] watchdog: CPU 1629 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x8e4/0x14b0
[ C1629] watchdog: CPU 1629 TB:2071965991551168, last heartbeat TB:2071960063310583 (11578ms ago)
[ C1467] watchdog: CPU 1467 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x8e4/0x14b0
[ C1467] watchdog: CPU 1467 TB:2071965996668630, last heartbeat TB:2071959863658451 (11978ms ago)
Message from syslogd@localhost at Mar 25 06:14:10 ...
kernel:[ C1629] watchdog: CPU 1629 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x8e4/0x14b0
Message from syslogd@localhost at Mar 25 06:14:10 ...
kernel:[ C1629] watchdog: CPU 1629 TB:2071965991551168, last heartbeat TB:2071960063310583 (11578ms ago)
Message from syslogd@localhost at Mar 25 06:14:10 ...
kernel:[ C1467] watchdog: CPU 1467 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x8e4/0x14b0
Message from syslogd@localhost at Mar 25 06:14:10 ...
kernel:[ C1467] watchdog: CPU 1467 TB:2071965996668630, last heartbeat TB:2071959863658451 (11978ms ago)
[ C1491] watchdog: CPU 1491 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x8e4/0x14b0
[ C1491] watchdog: CPU 1491 TB:2071966022290386, last heartbeat TB:2071959889241715 (11978ms ago)
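The elapsed time reported in each watchdog message above can be cross-checked from its two timebase (TB) values. The following is a minimal sketch, assuming the Power timebase runs at 512 MHz (512000 ticks per millisecond); tb_delta_ms is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Assumption: the timebase ticks at 512 MHz, that is, 512000 ticks per ms.
TB_TICKS_PER_MS=512000

# tb_delta_ms (hypothetical helper): whole milliseconds elapsed between the
# current TB reading ($1) and the last heartbeat TB reading ($2).
tb_delta_ms() {
    echo $(( ($1 - $2) / $TB_TICKS_PER_MS ))
}

tb_delta_ms 2071965991551168 2071960063310583   # CPU 1629, prints 11578
tb_delta_ms 2071965996668630 2071959863658451   # CPU 1467, prints 11978
tb_delta_ms 2071966022290386 2071959889241715   # CPU 1491, prints 11978
```

Under that assumption, the results match the "(11578ms ago)" and "(11978ms ago)" values in the log, so each CPU went more than 11 seconds without a heartbeat.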
Soft and hard lockups are seen on a system with the processor and memory resources described in the Abstract when millions of parallel memory-intensive and CPU-intensive threads are running. Due to a limitation, the min_free_kbytes value, which controls the memory reclaim process, cannot be set to 10% of the available memory.
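To give a sense of scale, the arithmetic below compares 10% of the installed memory with the min_free_kbytes value used in the Workaround section. It is only a sketch: it assumes that "64 TB" means 64 TiB and that values are counted in units of 1024 bytes, and it does not attempt to explain the limitation itself.

```shell
#!/bin/sh
# Assumption: "64 TB" is 64 TiB, expressed here in kB (units of 1024 bytes).
total_kb=$(( 64 * 1024 * 1024 * 1024 ))

# 10% of total memory, in kB (integer division).
ten_percent_kb=$(( $total_kb / 10 ))

# Value used with sysctl vm.min_free_kbytes in the Workaround section.
suggested_kb=1294967300

# Share of total memory in tenths of a percent, truncated, not rounded.
share=$(( $suggested_kb * 1000 / $total_kb ))

echo "total memory:  $total_kb kB"
echo "10% of total:  $ten_percent_kb kB"
echo "suggested:     $suggested_kb kB ($(( $share / 10 )).$(( $share % 10 ))% of total)"
```

With these assumptions, 10% of memory is 6871947673 kB, while the suggested value corresponds to a little under 2% of the total.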
Workaround
Any of the following options can help to mitigate the issue:
- Increase the watchdog_thresh from 10 to at least 20. It might need to be increased more based on the load. You can run the following command (use an even higher value if needed):
  # echo 20 > /proc/sys/kernel/watchdog_thresh
- Increase the resource limits (ulimits), both hard and soft limits.
- Increase the wq_watchdog thresh from current 30 to 60 or 90 by running the following command:
  # echo 60 > /sys/module/workqueue/parameters/watchdog_thresh
  Or disable it by running the following command:
  # echo 0 > /sys/module/workqueue/parameters/watchdog_thresh
- Increase the min_free_kbytes to a significant value by running the following command:
  # sysctl vm.min_free_kbytes=1294967300
- Ensure console messages are minimal by running the following commands:
  # echo 4 > /proc/sys/kernel/printk
  # echo N | tee /sys/module/printk/parameters/*
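Before and after applying any of the options above, it can be useful to record the current values of the tunables. The following read-only sketch prints each one; show_tunable is a hypothetical helper, and /proc/sys/vm/min_free_kbytes is the file behind the vm.min_free_kbytes sysctl. Files that do not exist on a given kernel are reported as "n/a".

```shell
#!/bin/sh
# show_tunable (hypothetical helper): print "path = value" for a readable
# file, or "path = n/a" if the file is missing. It never writes anything.
show_tunable() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = n/a\n' "$1"
    fi
}

show_tunable /proc/sys/kernel/watchdog_thresh
show_tunable /sys/module/workqueue/parameters/watchdog_thresh
show_tunable /proc/sys/vm/min_free_kbytes
show_tunable /proc/sys/kernel/printk
```

Values written with echo or sysctl as shown in the options above apply to the running system only, so comparing this output after a reboot shows whether a setting needs to be applied again.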
Fix Outlook
A fix will be provided in a future release.
I/O device impacted
Not Applicable
Document Information
Modified date:
29 July 2025
UID
ibm17240412