IBM Support

QEMU guest boot fails with continuous soft-lockups for certain custom NUMA topologies

Flashes (Alerts)


Abstract

QEMU guests on PPC64 might hang during startup with soft-lockups if the NUMA topology creates partially overlapping CPU masks.

Content

Linux Releases Affected

SLES-16 PPC64LE Kernels 6.12.0-160000.16-default or later.

IBM Systems Affected

Any IBM Power System LPAR that supports and runs a KVM guest.

Symptoms

The guest system hangs during early boot.

The guest might stop responding during startup and show the following lockup messages:

Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000000250000 ...
[   24.403004][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[   48.502675][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [swapper/0:1]
[   60.770656][    C0] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   60.771038][    C0] rcu:     (detected by 0, t=6002 jiffies, g=-1171, q=2 ncpus=8)
[   60.771257][    C0] rcu: All QSes seen, last rcu_sched kthread activity 6002 (4294943345-4294937343), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   60.771673][    C0] rcu: rcu_sched kthread starved for 6002 jiffies! g-1171 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[   60.771818][    C0] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.

Root cause

Before the kernel change in commit f55dac1dafb3334be1 (referred to in this document as "the fix"), the kernel correctly identified partial overlaps between CPU masks and treated the topology as invalid. It returned early from `build_sched_domains()` to prevent the creation of invalid domains.

After the fix, the scheduler sometimes compares the wrong CPU masks, which allows partial overlaps to pass the check. The result is an invalid domain hierarchy, which leads to soft-lockups.

A partial overlap occurs when two domains share some CPUs but are not fully nested or fully separate.
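The definition above can be illustrated with a minimal Python sketch. The function name `classify_masks` and the use of Python sets to stand in for CPU masks are assumptions made for illustration; this is not the kernel's implementation.

```python
def classify_masks(a, b):
    """Classify the relationship between two CPU masks given as sets of CPU ids.

    Hypothetical helper for illustration only.
    """
    a, b = set(a), set(b)
    if not (a & b):
        return "disjoint"          # no CPU in common: fully separate
    if a <= b or b <= a:
        return "nested"            # one mask contains the other (or they are equal)
    return "partial overlap"       # some CPUs shared, but neither contains the other


print(classify_masks({0, 1, 2}, {0, 1, 3}))   # partial overlap
print(classify_masks({0, 1}, {0, 1, 2, 3}))   # nested
print(classify_masks({0, 1}, {2, 3}))         # disjoint
```

Only the third outcome, "partial overlap", corresponds to the invalid case described here.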

Example

```
-numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=10 -numa dist,src=0,dst=2,val=10 -numa dist,src=0,dst=3,val=10
-numa dist,src=1,dst=0,val=10 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=10 -numa dist,src=1,dst=3,val=10
-numa dist,src=2,dst=0,val=10 -numa dist,src=2,dst=1,val=10 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=11
-numa dist,src=3,dst=0,val=10 -numa dist,src=3,dst=1,val=10 -numa dist,src=3,dst=2,val=11 -numa dist,src=3,dst=3,val=10
```

Node 2 and Node 3 partially overlap:

Node 2: CPU mask {0,1,2}

Node 3: CPU mask {0,1,3}

This topology should be rejected, but after the fix it is treated as valid, which leads to soft-lockups.
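As a rough sketch of where the two masks in this example come from, the following Python snippet groups, for each node, every node that lies within distance 10 of it. It assumes one CPU per node, so node ids double as CPU ids, and it assumes masks are built with a simple "distance <= level" rule. The names `DIST`, `masks_within`, and `partial_overlaps` are hypothetical and the code is not taken from the kernel.

```python
# Distance matrix from the example: DIST[src][dst].
DIST = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 11],
    [10, 10, 11, 10],
]


def masks_within(dist, limit):
    """For each node, return the set of nodes whose distance from it is <= limit."""
    return [{dst for dst, d in enumerate(row) if d <= limit} for row in dist]


def partial_overlaps(masks):
    """Return the (i, j) node pairs whose masks share members but are not nested."""
    pairs = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            a, b = masks[i], masks[j]
            if (a & b) and not (a <= b or b <= a):
                pairs.append((i, j))
    return pairs


masks = masks_within(DIST, 10)
print(sorted(masks[2]), sorted(masks[3]))   # [0, 1, 2] [0, 1, 3]
print(partial_overlaps(masks))              # [(2, 3)]
```

Under these assumptions, only the pair of Node 2 and Node 3 is flagged. The masks for Node 0 and Node 1 contain all four nodes, so every other mask is nested inside them.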

Note: This is a synthetic scenario, uncommon on real hardware. Physical NUMA systems usually have hierarchical or symmetric distances.

Workaround

Using symmetric NUMA distances between nodes, along with NUMA topologies where CPU masks either fully overlap or are disjoint, can help prevent this issue.
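One way to apply this workaround is to check a planned distance matrix before starting the guest. The Python sketch below is a hypothetical pre-flight check, not an IBM, SUSE, or QEMU tool. It assumes one CPU per node and a "distance <= level" rule for building masks. The matrix `SAFE` and the function names are illustrative, and the `-numa dist` argument format is copied from the example in this document.

```python
def is_symmetric(dist):
    """True if dist[i][j] == dist[j][i] for every pair of nodes."""
    n = len(dist)
    return all(dist[i][j] == dist[j][i] for i in range(n) for j in range(n))


def find_partial_overlaps(dist):
    """Return (level, node_a, node_b) for every pair of masks that partially overlap."""
    problems = []
    # Check each distinct distance value that appears in the matrix.
    for level in sorted({d for row in dist for d in row}):
        masks = [{dst for dst, d in enumerate(row) if d <= level} for row in dist]
        for a in range(len(masks)):
            for b in range(a + 1, len(masks)):
                shared = masks[a] & masks[b]
                nested = masks[a] <= masks[b] or masks[b] <= masks[a]
                if shared and not nested:
                    problems.append((level, a, b))
    return problems


def numa_dist_args(dist):
    """Format the matrix as '-numa dist,src=..,dst=..,val=..' arguments."""
    return [
        f"-numa dist,src={src},dst={dst},val={val}"
        for src, row in enumerate(dist)
        for dst, val in enumerate(row)
    ]


# Illustrative hierarchical topology: nodes {0,1} and {2,3} form two close pairs.
SAFE = [
    [10, 20, 40, 40],
    [20, 10, 40, 40],
    [40, 40, 10, 20],
    [40, 40, 20, 10],
]

if is_symmetric(SAFE) and not find_partial_overlaps(SAFE):
    print(" ".join(numa_dist_args(SAFE)))
else:
    print("Topology has asymmetric distances or partially overlapping masks")
```

In this illustrative matrix, the masks at each distance level are either identical, nested, or disjoint, so the check passes and the arguments are printed.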

Fix Outlook

SUSE mirrored bug number: 1246843

The correction for this issue is available in the upstream master kernel:

https://github.com/torvalds/linux/commit/661f951e371cc134ea31c84238dbdc9a898b8403

A fix will be provided in a future release.

I/O device impacted

None.


Document Information

Modified date:
10 November 2025

UID

ibm17244965