Flashes (Alerts)
Abstract
If DLPAR operations cause CPU soft lockups, Power Systems with Emulex FC adapters might crash due to out-of-memory errors. This failure has been observed on Red Hat Enterprise Linux 9.
Content
Power Systems with Emulex FC adapters
When CPUs are added or removed through DLPAR operations on a system with an Emulex FC adapter installed, the lpfc driver might fail to register the addition of new CPUs or the removal of active CPUs. The system can then hit a soft lockup and subsequently crash, producing traces similar to the following:
watchdog: BUG: soft lockup - CPU#81 stuck for 26s! [kworker/81:1H:1036]
NIP [c0080000071c59e8] lpfc_sli4_process_eq+0x50/0x200 [lpfc]
LR [c0080000071e16f0] lpfc_sli4_poll_hbtimer+0x78/0xe0 [lpfc]
Call Trace:
[c00000005475b5d0] [c00000039a990480] 0xc00000039a990480 (unreliable)
[c00000005475b630] [c0080000071e16f0] lpfc_sli4_poll_hbtimer+0x78/0xe0 [lpfc]
[c00000005475b670] [c000000000238050] call_timer_fn+0x50/0x1c0
[c00000005475b700] [c0000000002384e4] __run_timers.part.0+0x324/0x480
[c00000005475b7e0] [c000000000238694] run_timer_softirq+0x54/0xa0
[c00000005475b810] [c000000000ece3cc] __do_softirq+0x15c/0x3e0
[c00000005475b910] [c000000000158ac8] __irq_exit_rcu+0x158/0x190
[c00000005475b940] [c000000000158d00] irq_exit+0x20/0x40
[c00000005475b960] [c00000000002805c] timer_interrupt+0x14c/0x2b0
[c00000005475b9c0] [c000000000016dc4] replay_soft_interrupts+0x134/0x2f0
[c00000005475bbb0] [c000000000017088] arch_local_irq_restore+0x108/0x170
[c00000005475bbe0] [c000000000ecdcc0] _raw_spin_unlock_irqrestore+0x80/0xb0
[c00000005475bc10] [c000000000964a94] mix_interrupt_randomness+0xe4/0x1b0
[c00000005475bc70] [c00000000017ad98] process_one_work+0x298/0x580
[c00000005475bd10] [c00000000017b128] worker_thread+0xa8/0x630
[c00000005475bda0] [c000000000188428] kthread+0x1b8/0x1c0
[c00000005475be10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
Oops: Kernel access of bad area, sig: 11 [#1]
NIP [c0000000004ebf3c] deactivate_slab+0x15c/0x6f0
LR [c0000000004ec790] flush_cpu_slab+0x90/0x130
Call Trace:
[c000000591b07af0] [c000000000ec4ab0] schedule+0x60/0x110 (unreliable)
[c000000591b07c30] [c0000000004ec790] flush_cpu_slab+0x90/0x130
[c000000591b07c70] [c00000000017ad98] process_one_work+0x298/0x580
[c000000591b07d10] [c00000000017b128] worker_thread+0xa8/0x630
[c000000591b07da0] [c000000000188428] kthread+0x1b8/0x1c0
[c000000591b07e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
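To check whether a saved kernel log (for example, the output of `dmesg` or `journalctl -k`) resembles the signature above, the following minimal Python sketch looks for soft-lockup watchdog messages whose subsequent trace lines contain `lpfc_sli4_*` frames. The function name `find_lpfc_soft_lockups` and the matching heuristic are hypothetical illustrations, not an IBM or Red Hat tool, and a match alone does not confirm this specific issue.

```python
import re

# Kernel soft-lockup watchdog message, for example:
# "watchdog: BUG: soft lockup - CPU#81 stuck for 26s! [kworker/81:1H:1036]"
SOFT_LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")

# Heuristic (assumption): lpfc SLI-4 frames such as lpfc_sli4_process_eq or
# lpfc_sli4_poll_hbtimer appearing in the lines that follow the message.
LPFC_FRAME_RE = re.compile(r"\blpfc_sli4_\w+\+0x[0-9a-f]+/0x[0-9a-f]+ \[lpfc\]")


def find_lpfc_soft_lockups(log_text: str) -> list[tuple[int, int]]:
    """Return (cpu, seconds) for each soft lockup whose trace mentions lpfc.

    A lockup's trace is taken to be every line after its watchdog message,
    up to the next watchdog message or the end of the log.
    """
    matches = []
    current = None      # (cpu, seconds) of the lockup currently being scanned
    seen_lpfc = False   # whether an lpfc_sli4_* frame followed that message
    for line in log_text.splitlines():
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            # A new watchdog message closes the previous lockup's trace.
            if current is not None and seen_lpfc:
                matches.append(current)
            current = (int(m.group(1)), int(m.group(2)))
            seen_lpfc = False
        elif current is not None and LPFC_FRAME_RE.search(line):
            seen_lpfc = True
    # Close the trace of the last lockup in the log.
    if current is not None and seen_lpfc:
        matches.append(current)
    return matches
```

Applied to the first two lines of the trace above (the watchdog message and the `lpfc_sli4_process_eq` NIP line), this returns `[(81, 26)]`. Because a trace is attributed to the most recent watchdog message, frames from a later Oops in the same log are counted toward the preceding lockup.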
No workaround is currently available. Rather than using DLPAR operations, shut down the logical partition before adding CPUs to or removing CPUs from its configuration. If a CPU soft lockup occurs, do not perform further DLPAR operations until the error is resolved.
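Because this advice applies to partitions where the Emulex driver is active, a pre-check before a CPU change might look at whether the `lpfc` module is listed in `/proc/modules`. The sketch below is a hypothetical helper under stated assumptions: the name `lpfc_module_loaded` is illustrative, and it only detects lpfc when the driver is loaded as a module (a driver built into the kernel does not appear in `/proc/modules`).

```python
import os


def lpfc_module_loaded(proc_modules_text: str) -> bool:
    """Return True if lpfc is listed in /proc/modules-format text.

    Each /proc/modules line begins with the module name, followed by
    whitespace-separated fields (size, use count, dependents, state, address).
    Assumption: lpfc is built as a loadable module, not into the kernel.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Only the first field is the module name; "lpfc" may also appear
        # later on a line as a dependency of another module.
        if fields and fields[0] == "lpfc":
            return True
    return False


if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):
        with open(path) as f:
            if lpfc_module_loaded(f.read()):
                print("lpfc is loaded: shut down the LPAR instead of using DLPAR for CPU changes.")
            else:
                print("lpfc is not listed in /proc/modules.")
```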
Document Information
Modified date: 20 February 2023
UID: ibm16955489