Flashes (Alerts)
Abstract
During PCIe error injection testing on Power10 systems, it has been observed that recoverable PCIe errors are occasionally not recovered by the Linux® OS, and the affected adapter is taken offline.
Content
Linux releases affected:
Red Hat® Enterprise Linux 8.2
Red Hat Enterprise Linux 8.4
SUSE Linux Enterprise Server 12, Service Pack 5
SUSE Linux Enterprise Server 15, Service Pack 3

IBM systems affected:
All IBM Power10 systems

Symptoms
When PCIe error recovery is attempted, the following messages appear in the kernel log and the adapter is taken offline:

[ 7668.221060] bnx2x: [bnx2x_io_slot_reset:14359(enP21p1s0f1)]IO slot reset initializing...
[ 7668.221124] bnx2x 0015:01:00.1: enabling device (0140 -> 0142)
[ 7668.225177] bnx2x: [bnx2x_io_slot_reset:14375(enP21p1s0f1)]IO slot reset --> driver unload
[ 7862.256292] INFO: task kworker/u48:1:14577 blocked for more than 120 seconds.
[ 7862.256303] Tainted: G W --------- - - 4.18.0-305.el8.ppc64le #1
[ 7862.256305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7862.256307] kworker/u48:1 D 0 14577 2 0x00000888
[ 7862.256313] Workqueue: netns cleanup_net
[ 7862.256315] Call Trace:
[ 7862.256318] [c00000017ae736a0] [c0000000001a2d88] kthread+0x8/0x1c0 (unreliable)
[ 7862.256322] [c00000017ae73880] [c000000000018400] __switch_to+0x2e0/0x500
[ 7862.256325] [c00000017ae738e0] [c000000000ed90a8] __schedule+0x2f8/0x9c0
[ 7862.256328] [c00000017ae739b0] [c000000000ed97d8] schedule+0x68/0x130
[ 7862.256331] [c00000017ae739e0] [c000000000ed9f30] schedule_preempt_disabled+0x20/0x30
[ 7862.256334] [c00000017ae73a00] [c000000000edba18] __mutex_lock.isra.1+0x388/0x760
[ 7862.256337] [c00000017ae73aa0] [c000000000c4a1d8] rtnl_lock+0x28/0x40
[ 7862.256340] [c00000017ae73ac0] [c000000000c2da1c] default_device_exit+0x3c/0x1a0
[ 7862.256342] [c00000017ae73b70] [c000000000c17134] cleanup_net+0x404/0x720
[ 7862.256345] [c00000017ae73c60] [c0000000001981b4] process_one_work+0x304/0x5d0
[ 7862.256347] [c00000017ae73d00] [c000000000198cfc] worker_thread+0xcc/0x7a0
[ 7862.256349] [c00000017ae73db0] [c0000000001a2f30] kthread+0x1b0/0x1c0
[ 7862.256353] [c00000017ae73e20] [c00000000000b7d8] ret_from_kernel_thread+0x5c/0x64
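To help spot this symptom in a saved kernel log, the following is a minimal sketch, not an IBM-provided tool. It assumes the bnx2x message format shown in the example above (other adapter drivers print different text), and it treats the combination of an "IO slot reset --> driver unload" message and a hung task blocked in rtnl_lock as the signature. The function name suspect_interfaces and the patterns are hypothetical and for illustration only.

```python
import re
from typing import Iterable, List

# Hypothetical patterns copied from the example bnx2x kernel log in this flash.
# Other drivers log different messages; adjust these for your adapter.
UNLOAD_RE = re.compile(r"\(([^)]+)\)\]IO slot reset --> driver unload")
HUNG_RE = re.compile(r"blocked for more than \d+ seconds")
RTNL_RE = re.compile(r"\brtnl_lock\+")


def suspect_interfaces(log_lines: Iterable[str]) -> List[str]:
    """Return interfaces matching the symptom pattern, in first-seen order.

    An interface is reported only if its slot reset ended in a driver unload
    AND the same log later shows a hung task whose call trace includes
    rtnl_lock, as in the example log. Otherwise an empty list is returned.
    """
    unloaded: List[str] = []
    saw_hung_task = False
    saw_rtnl_after_hang = False
    for line in log_lines:
        match = UNLOAD_RE.search(line)
        if match and match.group(1) not in unloaded:
            unloaded.append(match.group(1))
        if HUNG_RE.search(line):
            saw_hung_task = True
        elif saw_hung_task and RTNL_RE.search(line):
            saw_rtnl_after_hang = True
    return unloaded if saw_rtnl_after_hang else []
```

For example, the function could be given the lines of dmesg output or of a saved /var/log/messages file; any interface it returns is a candidate for the workaround below. A match only suggests this issue and does not confirm it.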
Workaround
If a device is taken offline because of this issue, it can be recovered by using DLPAR to remove the device from the LPAR and then add it back.

Fix Outlook
IBM is working with Red Hat and SUSE to release a fix for this issue. The fix is targeted for the next minor release of Red Hat Enterprise Linux and SUSE Linux Enterprise Server.
Document Information
Modified date:
23 September 2021
UID
ibm16490839