Flashes (Alerts)
Abstract
When you perform Dynamic Logical Partitioning (DLPAR) remove operations on I/O devices that are assigned to Linux logical partitions, including SR-IOV logical ports, or perform live partition mobility on a Linux logical partition that is configured with hybrid network virtualization (HNV), undetected data corruption or a crash of the logical partition can occur.
Content
Linux Releases Affected
SUSE Linux Enterprise Server (SLES) 12, Service Pack (SP) 2
SUSE Linux Enterprise Server 12, Service Pack 3
SUSE Linux Enterprise Server 12, Service Pack 4
SUSE Linux Enterprise Server 12, Service Pack 5
SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 15, Service Pack 1
SUSE Linux Enterprise Server 15, Service Pack 2
SUSE Linux Enterprise Server 15, Service Pack 3
Red Hat Enterprise Linux (RHEL) 7.9, for Power
Red Hat Enterprise Linux 7.9, for Power LE
Red Hat Enterprise Linux 8.1, for Power LE
Red Hat Enterprise Linux 8.2, for Power LE
Red Hat Enterprise Linux 8.4, for Power LE
Red Hat Enterprise Linux 8.5, for Power LE
IBM Systems Affected
Linux logical partitions that run on any PowerVM based POWER8, POWER9, or Power10 system.
I/O Devices Affected
SR-IOV logical ports and dedicated I/O adapters and devices that are assigned directly to a Linux logical partition. Virtual devices, such as vNIC, virtual Ethernet, virtual SCSI, and virtual Fibre Channel are all unaffected.
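To see which devices in a Linux LPAR fall into the affected category, you can inventory its PCI devices. The sketch below assumes that, on PowerVM, adapters assigned directly to the LPAR and SR-IOV logical ports appear under /sys/bus/pci/devices, while virtual devices (virtual Ethernet, vSCSI, vFC, vNIC) sit on the separate VIO bus and so are not listed. The function names are illustrative. The sketch only lists devices; it does not detect the defect itself.

```python
"""List PCI devices visible in a Linux LPAR (rough inventory sketch).

Assumption: directly assigned adapters and SR-IOV logical ports appear under
/sys/bus/pci/devices; virtual I/O devices live on the VIO bus instead.
"""

import os
from pathlib import Path


def read_attr(dev: Path, name: str) -> str:
    """Return a sysfs attribute's stripped contents, or "" if it is missing."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return ""


def list_pci_devices(root: str = "/sys/bus/pci/devices") -> list[dict[str, str]]:
    """Return one dict (address, vendor, device, driver) per PCI device under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []  # no PCI bus visible, e.g. an LPAR with only virtual I/O
    devices = []
    for dev in sorted(base.iterdir()):
        driver_link = dev / "driver"
        # The "driver" entry is a symlink whose last path component is the driver name.
        driver = os.path.basename(os.readlink(driver_link)) if driver_link.is_symlink() else ""
        devices.append({
            "address": dev.name,
            "vendor": read_attr(dev, "vendor"),
            "device": read_attr(dev, "device"),
            "driver": driver,
        })
    return devices


if __name__ == "__main__":
    for d in list_pci_devices():
        print(f"{d['address']}  vendor={d['vendor']} device={d['device']} driver={d['driver'] or '-'}")
```

An empty result suggests the LPAR uses only virtual I/O, which this alert lists as unaffected.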
Description
This issue can manifest as undetected memory corruption, which can later cause instability in the LPAR. The following is an example of a crash observed as a result of this issue:
[ 702.753929] BUG: Unable to handle kernel data access on read at 0xe9834db192293fdf
[ 702.753960] Faulting instruction address: 0xc0000000001689d4
[ 702.753967] Oops: Kernel access of bad area, sig: 11 [#1]
[ 702.753977] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 702.753989] Modules linked in: rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng nvme nvme_core drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc ibmveth scsi_transport_fc vmx_crypto dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse overlay squashfs loop
[ 702.754073] CPU: 2 PID: 3357 Comm: drmgr Not tainted 5.14.0-48.el9.ppc64le #1
[ 702.754085] NIP: c0000000001689d4 LR: c000000000168998 CTR: 0000000000000000
[ 702.754096] REGS: c00000000d26b620 TRAP: 0380 Not tainted (5.14.0-48.el9.ppc64le)
[ 702.754106] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84008228 XER: 20040006
[ 702.754122] CFAR: c000000000168a3c IRQMASK: 0
[ 702.754122] GPR00: c000000000168998 c00000000d26b8c0 c000000002a46a00 c000000002b39ce8
[ 702.754122] GPR04: c0000000467b59d0 0000000000001e56 c0000000092e8e00 000004bfffffffff
[ 702.754122] GPR08: e9834db192293fdf 000004c000000000 000004ffffffffff 0000000000008000
[ 702.754122] GPR12: 0000000000000000 c00000137faf3700 0000000000000000 00000001372af0f0
[ 702.754122] GPR16: c00000000255e998 0000000000000000 c00000000115da90 c00000000115da58
[ 702.754122] GPR20: c00000000299f9d0 c00000000115da20 c00000000115da18 ffffffffffffffff
[ 702.754122] GPR24: c00000000115dad8 c00000000299fa08 0000000000000000 c00000001b194000
[ 702.754122] GPR28: 0000000000000005 0000000000000000 c0000000467b59d0 c00000000255e998
[ 702.754218] NIP [c0000000001689d4] request_resource+0x74/0x110
[ 702.754235] LR [c000000000168998] request_resource+0x38/0x110
[ 702.754245] Call Trace:
[ 702.754248] [c00000000d26b8c0] [c000000000079078] pcibios_setup_bus_self+0x78/0x90 (unreliable)
[ 702.754264] [c00000000d26b8f0] [c000000000076988] pcibios_allocate_bus_resources+0x208/0x410
[ 702.754276] [c00000000d26b9d0] [c000000000076d50] pcibios_finish_adding_to_bus+0x30/0xe0
[ 702.754289] [c00000000d26ba40] [c000000000104d18] init_phb_dynamic+0xb8/0x100
[ 702.754302] [c00000000d26bab0] [c008000003030624] dlpar_add_slot+0x18c/0x380 [rpadlpar_io]
[ 702.754325] [c00000000d26bbe0] [c0000000008177ec] kobj_attr_store+0x2c/0x50
[ 702.754336] [c00000000d26bc00] [c00000000063df64] sysfs_kf_write+0x64/0x80
[ 702.754348] [c00000000d26bc20] [c00000000063d2c8] kernfs_fop_write_iter+0x1b8/0x2a0
[ 702.754358] [c00000000d26bc70] [c000000000542dfc] new_sync_write+0x11c/0x1c0
[ 702.754371] [c00000000d26bd10] [c000000000546084] vfs_write+0x2a4/0x370
[ 702.754382] [c00000000d26bd60] [c000000000546454] ksys_write+0x84/0x140
[ 702.754392] [c00000000d26bdb0] [c000000000030830] system_call_exception+0x160/0x300
[ 702.754402] [c00000000d26be10] [c00000000000c168] system_call_vectored_common+0xe8/0x278
Workaround
Until the fix is applied, avoid performing DLPAR remove operations on SR-IOV logical ports and dedicated I/O devices that are assigned to a Linux logical partition, and avoid live partition mobility for Linux logical partitions that use hybrid network virtualization.
Fix Outlook
A fix is already available for the following Linux releases:
SLES 12 SP2: 4.4.121-92.175.2
SLES 12 SP3: 4.4.180-94.164.3
SLES 12 SP4: 4.12.14-95.99.3
SLES 12 SP5: 4.12.14-122.116.1
SLES 15 SP1: 4.12.14-150100.197.114.2
SLES 15 SP2: 5.3.18-150200.24.115.1
SLES 15 SP3: 5.3.18-150300.59.63.1
For all other releases, if a fix is needed immediately, you can open a support ticket with Red Hat or SUSE and request a hotfix, referencing the Bugzilla numbers listed below. IBM is working with Red Hat and SUSE to release a fix for this issue as part of a future RHEL and SUSE maintenance release.
Red Hat Bugzillas 2073707, 2081418
SUSE Bugzilla 1198660
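To check whether an installed SLES kernel package is at or above one of the fixed levels listed above, compare its version-release string against the fixed level for the same release stream (for example, SLES 15 SP3 against the SLES 15 SP3 entry). The string can be obtained with a query such as `rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-default`. The sketch below is a simplified comparison, not RPM's own algorithm: it assumes purely numeric, dot-separated fields and ignores epochs and letters. The function names are illustrative.

```python
"""Simplified comparison of kernel package VERSION-RELEASE strings.

Assumption: strings look like "5.3.18-150300.59.63.1" (numeric dot-separated
fields, one "-" between version and release). RPM's real ordering also
handles letters, tildes, and epochs; this sketch does not.
"""


def parse_version_release(text: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Split "VERSION-RELEASE" into (version_fields, release_fields) of ints."""
    version, _, release = text.strip().partition("-")
    version_fields = tuple(int(f) for f in version.split("."))
    release_fields = tuple(int(f) for f in release.split(".")) if release else ()
    return version_fields, release_fields


def is_at_or_above(installed: str, fixed: str) -> bool:
    """True if `installed` sorts at or after `fixed`, comparing fields left to right."""
    return parse_version_release(installed) >= parse_version_release(fixed)


if __name__ == "__main__":
    # Fixed level listed above for SLES 15 SP3.
    fixed = "5.3.18-150300.59.63.1"
    for installed in ("5.3.18-150300.59.60.1", "5.3.18-150300.59.63.1", "5.3.18-150300.59.68.1"):
        print(installed, "OK" if is_at_or_above(installed, fixed) else "needs update")
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "59.100" would sort before "59.63".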
Document Information
Modified date: 12 September 2022
UID: ibm16592577