Flashes (Alerts)
Abstract
Live Partition Mobility (LPM) of a large-memory logical partition (LPAR) that uses a Hybrid Network Virtualization (HNV) device might fail and cause the source LPAR to reboot.
Content
Linux Releases Affected
Red Hat Enterprise Linux 8.x, for Power LE
Red Hat Enterprise Linux 9.x, for Power LE
SUSE Linux Enterprise Server 15
IBM Systems Affected
Linux logical partitions that run on any PowerVM-based POWER9 or Power10 system.
Symptoms
LPM of a large-configuration LPAR with HNV devices might cause the source LPAR to reboot. Network drivers use the page_pool interface to manage DMA mappings. When LPM starts, the page_pool interface asynchronously unmaps the DMA-mapped pages for the device. This asynchronous unmapping can cause the source LPAR to reboot.
If the source LPAR reboots, then the following messages are displayed on the console:
[ 3610.403820] tce_freemulti_pSeriesLP: 48 callbacks suppressed
[ 3610.403833] tce_freemulti_pSeriesLP: plpar_tce_stuff failed
[ 3610.403869] rc = -4
[ 3610.403872] index = 0x70000016
[ 3610.403876] limit = 0x1
[ 3610.403879] tce = 0x80000061ee00000
[ 3610.403882] pgshift = 0x10
[ 3610.403884] npages = 0x1
[ 3610.403887] tbl = 000000003a6a2145
[ 3610.403912] CPU: 86 PID: 97129 Comm: kworker/86:2 Kdump: loaded Tainted: G E 6.4.0-623164-default #1 SLE15-SP6 763d454e096eda7d91355fd5b171013052d83ed3
[ 3610.403928] Hardware name: IBM,9080-M9S POWER9 (raw) 0x4e2101 0xf000005 of:IBM,FW950.80 (VH950_131) hv:phyp pSeries
[ 3610.403937] Workqueue: events page_pool_release_retry
[ 3610.404003] Call Trace:
[ 3610.404006] dump_stack_lvl+0x6c/0x9c (unreliable)
[ 3610.404039] tce_freemulti_pSeriesLP+0x1e8/0x1f0
[ 3610.404070] __iommu_free+0x118/0x220
[ 3610.404086] iommu_free+0x28/0x70
[ 3610.404106] dma_iommu_unmap_page+0x24/0x40
[ 3610.404113] dma_unmap_page_attrs+0x1ac/0x1e0
[ 3610.404139] page_pool_return_page+0x58/0x1b0
[ 3610.404146] page_pool_release+0x10c/0x270
[ 3610.404152] page_pool_release_retry+0x2c/0x110
[ 3610.404159] process_one_work+0x314/0x620
[ 3610.404173] worker_thread+0x78/0x620
[ 3610.404179] kthread+0x148/0x150
[ 3610.404188] start_kernel_thread+0x14/0x18
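To check whether a saved console log matches this signature, a minimal sketch follows. The function name and the log path in the usage line are hypothetical, and the sketch only looks for two strings taken from the messages above: the tce_freemulti_pSeriesLP failure and the page_pool_release_retry workqueue.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM tool): returns 0 if the log file in $1
# contains both markers quoted in this flash, and non-zero otherwise.
has_lpm_tce_signature() {
    log_file="$1"
    grep -q 'tce_freemulti_pSeriesLP: plpar_tce_stuff failed' "$log_file" &&
        grep -q 'page_pool_release_retry' "$log_file"
}
```

For example, `has_lpm_tce_signature /path/to/console.log && echo "signature found"`. A match suggests this issue but does not confirm it.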
Workaround
Before you initiate the LPM, unbind the SR-IOV device from the HNV bonding.
- Set the SR-IOV link to down. This initiates the failover to the vNIC/vETH.
ip link set down dev <sriov-interface>
- Wait for 5 minutes. This gives enough time for all the buffers that are mapped to the SR-IOV device to be flushed out.
- Initiate the LPM.
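The first two steps of the workaround can be sketched as a small shell function. This is an illustration only, not an IBM-provided script. The function name prepare_for_lpm and the IP_CMD override are assumptions of this sketch; the 300-second default corresponds to the 5-minute wait. Run it as root on the source LPAR, then initiate the LPM as usual.

```shell
#!/bin/sh
# Hypothetical helper: bring the SR-IOV link down, then wait so that buffers
# mapped to the SR-IOV device can be flushed before the LPM is initiated.
prepare_for_lpm() {
    iface="$1"
    wait_secs="${2:-300}"   # 5 minutes, as the workaround recommends
    if [ -z "$iface" ]; then
        echo "usage: prepare_for_lpm <sriov-interface> [wait-seconds]" >&2
        return 2
    fi
    # IP_CMD is an assumption of this sketch; it defaults to the real ip command.
    "${IP_CMD:-ip}" link set down dev "$iface" || return 1
    sleep "$wait_secs"
    echo "$iface is down; waited ${wait_secs}s for buffers to drain. Initiate the LPM now."
}
```

For example, `prepare_for_lpm eth13` uses the default wait. The function does not start the LPM itself.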
To determine the SR-IOV interface on the LPAR
The SR-IOV interface mentioned in the workaround, which is bonded to the HNV, can be determined by completing the following steps:
Note: The bonding device and the interfaces that are used in the following example are from a test LPAR. You can substitute the test LPAR interfaces with the source LPAR interfaces.
- Get the interface from the HNV bonding by running the following command:
# cat /proc/net/bonding/bond46ea3b82
Ethernet Channel Bonding Driver: v5.14.21-150500.55.73-default

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac follow)
Primary Slave: eth13 (primary_reselect always)
Currently Active Slave: eth13 <----- SRIOV interface
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: eth6 <------ vNIC
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8a:86:8e:57:a2:03
Slave queue ID: 0

Slave Interface: eth13
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8a:86:8f:20:a9:00
Slave queue ID: 0
- Check if eth13 is an SRIOV interface while eth6 is a vNIC by running the following commands:
# ethtool -i eth13 |grep driver
driver: mlx5_core
# ethtool -i eth6 |grep driver
driver: ibmvnic
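The two lookups above can be combined into one pass over the bonding status file. The following is a sketch under stated assumptions: the function names and the ETHTOOL_CMD override are hypothetical, and the script only prints each slave interface with its driver. In the example above, ibmvnic identifies the vNIC, and the other slave (mlx5_core in this case) is the SR-IOV interface.

```shell
#!/bin/sh
# Print the name of each slave interface listed in a bonding status file.
list_bond_slaves() {
    awk '/^Slave Interface:/ { print $3 }' "$1"
}

# Print the driver that ethtool reports for one interface.
# ETHTOOL_CMD is an assumption of this sketch; it defaults to ethtool.
slave_driver() {
    "${ETHTOOL_CMD:-ethtool}" -i "$1" | awk '/^driver:/ { print $2 }'
}

# Print "<interface> <driver>" for every slave of the given bond file.
show_bond_slave_drivers() {
    bond_file="$1"
    for iface in $(list_bond_slaves "$bond_file"); do
        printf '%s %s\n' "$iface" "$(slave_driver "$iface")"
    done
}
```

For example, `show_bond_slave_drivers /proc/net/bonding/bond46ea3b82`, substituting the bonding device of the source LPAR.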
Fix Outlook
IBM is working to include a fix in future Red Hat and SUSE updates.
I/O device impacted
Hybrid Network Virtualization devices
Document Information
Modified date:
12 November 2024
UID
ibm17175372