IBM Support

Dump collection failed due to out of memory (OOM) error

Flashes (Alerts)


Abstract

The recommended crashkernel memory reservation for kernel dump (kdump) and firmware-assisted dump (FADump) operations is based on a few assumptions about the resources available for a given system random access memory (RAM) size. If a Power system fails to capture dumps and logs OOM errors even with the recommended crashkernel memory reservations for kdump and FADump, the CPU core count is possibly higher than average for that system RAM size.

For example, when a system with more than 40 CPU cores and less than 128 GB of RAM crashes, the kdump and FADump operations are likely to fail to collect the dump. They fail in such cases because the recommended crashkernel memory size is not sufficient for the capture kernel.

Content

Linux Releases Affected

Red Hat Enterprise Linux (RHEL) 8.6, 8.7, 8.8, 8.9
Red Hat Enterprise Linux (RHEL) 9.1, 9.2, 9.3

SUSE Linux Enterprise Server (SLES) 15 SP1, SP2, SP3, SP4, SP5, SP6

 
IBM Systems Affected

Power10 systems

Symptoms

If the capture kernel boots with the following backtrace when a system crashes, then the dump capture is likely to fail:

[    4.467979] swapper/2 invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0
[    4.467992] CPU: 2 PID: 1 Comm: swapper/2 Not tainted 5.14.0-362.el9.ppc64le #1
[    4.467999] Call Trace:
[    4.468002] [c00000001684b3e0] [c000000010897fa0] dump_stack_lvl+0x74/0xa8 (unreliable)
[    4.468019] [c00000001684b420] [c000000010441a78] dump_header+0x64/0x250
[    4.468026] [c00000001684b4a0] [c000000010441910] out_of_memory+0x3d0/0x440
[    4.468030] [c00000001684b530] [c0000000104d1aa4] __alloc_pages_may_oom+0x154/0x230
[    4.468035] [c00000001684b5d0] [c0000000104d262c] __alloc_pages_slowpath.constprop.0+0x78c/0xb40
[    4.468038] [c00000001684b720] [c0000000104d2bf0] __alloc_pages+0x210/0x2b0
[    4.468042] [c00000001684b7b0] [c0000000105064c0] alloc_page_interleave+0x30/0xb0
[    4.468047] [c00000001684b7e0] [c00000001051b928] allocate_slab+0x4e8/0x570
[    4.468051] [c00000001684b850] [c00000001051fff8] ___slab_alloc+0x468/0x8c0
[    4.468055] [c00000001684b960] [c000000010523b44] kmem_cache_alloc+0x1e4/0x620
[    4.468058] [c00000001684b9c0] [c000000010138444] create_events_from_catalog.constprop.0+0x44/0xd50
[    4.468065] [c00000001684bb50] [c0000000101393c4] hv_24x7_init+0xe4/0x260
[    4.468068] [c00000001684bbd0] [c000000010012120] do_one_initcall+0x60/0x2c0
[    4.468072] [c00000001684bca0] [c0000000120053c4] do_initcalls+0x13c/0x190
[    4.468079] [c00000001684bd50] [c0000000120056f4] kernel_init_freeable+0x240/0x2b4
[    4.468082] [c00000001684bdb0] [c000000010012730] kernel_init+0x30/0x1a0
[    4.468085] [c00000001684be10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[    4.468089] Mem-Info:
[    4.468093] active_anon:0 inactive_anon:658 isolated_anon:0
A slight change in system configuration can lead to the same problem with a different backtrace. However, any backtrace that includes an out_of_memory function call, or any log line that shows the oom-killer being invoked, indicates that the dump capture is likely to fail.
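Because the exact backtrace can vary, it can help to search the capture kernel console log for the two markers named above instead of comparing whole traces. The following is a minimal sketch only: the function name find_oom_lines and the regular expression are illustrative assumptions, not part of any IBM, Red Hat, or SUSE tool, and the caller is assumed to supply the log as text.

```python
import re

# Markers taken from the symptom description: an oom-killer invocation,
# or an out_of_memory frame in the call trace (frames appear as "name+0x...").
# The pattern and the function name are illustrative assumptions.
OOM_PATTERN = re.compile(r"invoked oom-killer|\bout_of_memory\+")


def find_oom_lines(log_text):
    """Return the log lines that contain either OOM marker."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    # Three lines copied from the backtrace shown in this document.
    sample = (
        "[    4.467979] swapper/2 invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0\n"
        "[    4.468019] [c00000001684b420] [c000000010441a78] dump_header+0x64/0x250\n"
        "[    4.468026] [c00000001684b4a0] [c000000010441910] out_of_memory+0x3d0/0x440\n"
    )
    for line in find_oom_lines(sample):
        print(line)
```

A non-empty result suggests that the capture kernel ran out of memory and that the crashkernel reservation should be reviewed.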
 
Workaround
As a workaround, you can revise the recommended crashkernel memory reservation for kdump and FADump by using the following formula:
 
Revised crashkernel size = Recommended crashkernel size + (Number of CPU cores * X)
where X = 12 MB for kdump and 18 MB for FADump
 
For example, if the recommended crashkernel size for a system is 2048 M and the system has 50 CPU cores, then the revised crashkernel value is calculated as follows:
  • For kdump: Revised crashkernel size = 2048 M + (50*12 M) = 2648 M
  • For FADump: Revised crashkernel size = 2048 M + (50*18 M) = 2848 M
 
Fix Outlook
Red Hat Bug: 2236564
SUSE: SLES 15 SP7

I/O device impacted

None

 


Document Information

Modified date:
13 March 2026

UID

ibm17060322