APAR status
Closed as program error.
Error description
The kernel panics around the time of stopping or starting the Spectrum Scale daemon. The back trace from the crash looks like this:
[12334.372189] CPU: 28 PID: 244084 Comm: mmccr Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1160.66.1.el7.x86_64 #1
[12334.385060] Hardware name: Lenovo ThinkSystem SR650 - XXXX
[12334.396680] task: ffff9cb339e54200 ti: ffff9cb322958000 task.ti: ffff9cb322958000
[12334.405012] RIP: 0010:[<ffffffffa404ba86>]  [<ffffffffa404ba86>] filp_close+0x26/0x90
[12334.413747] RSP: 0018:ffff9cb32295bef8  EFLAGS: 00010282
[12334.419660] RAX: ffffffffc08470e0 RBX: ffff9cb31a95e600 RCX: ffff9cb31a95e600
[12334.427606] RDX: 0000000000000000 RSI: ffff9cb37b4a0500 RDI: ffff9cb31a95e600
[12334.435550] RBP: ffff9cb32295bf10 R08: 0000000000000000 R09: 0000561807f21fc0
[12334.443495] R10: 0000000000000022 R11: 0000000000000246 R12: ffff9cb37b4a0500
[12334.451438] R13: ffff9cb37b4a0540 R14: 0000000000000000 R15: 0000000000000000
[12334.459385] FS:  00007fdb88800780(0000) GS:ffff9cb39d100000(0000) knlGS:0000000000000000
[12334.468394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12334.474790] CR2: ffffffffc0847140 CR3: 0000002f65680000 CR4: 00000000005607e0
[12334.482734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12334.490678] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[12334.498623] PKRU: 55555554
[12334.501633] Call Trace:
[12334.504359]  [<ffffffffa406f8ac>] __close_fd+0x8c/0xb0
[12334.510082]  [<ffffffffa404d513>] SyS_close+0x23/0x50
[12334.515707]  [<ffffffffa4599f92>] system_call_fastpath+0x25/0x2a
[12334.522386] Code: ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 47 38 48 89 fb 48 85 c0 74 5b 48 8b 47 28 49 89 f4 48 85 c0 74 4a <48> 8b 40 60 45 31 ed 48 85 c0 74 08 e8 a9 c2 14 00 41 89 c5 f6
[12334.544214] RIP  [<ffffffffa404ba86>] filp_close+0x26/0x90
[12334.550332]  RSP <ffff9cb32295bef8>
[12334.554213] CR2: ffffffffc0847140
Reported in: Spectrum Scale 5.1.3.1, RHEL 7.9, kernel 3.10.0-1160.66.1.el7.x86_64
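As a reading aid for the back trace, the short Python sketch below checks one piece of arithmetic: whether the faulting address in CR2 equals RAX plus a small offset. It assumes that the instruction bytes marked with <48> in the Code: line (48 8b 40 60) decode to a load from RAX + 0x60. That decoding is an interpretation of the trace, not something the APAR states. If it holds, filp_close was following a pointer held in RAX when the fault occurred, which would be consistent with the module-unload problem described in the problem summary.

```python
# Register values copied from the back trace above.
RAX = 0xffffffffc08470e0
CR2 = 0xffffffffc0847140

# Assumption: the faulting instruction bytes "48 8b 40 60" (marked <48> in the
# Code: line) decode to a load from the address RAX + 0x60.
DISPLACEMENT = 0x60

fault_address = RAX + DISPLACEMENT
matches_cr2 = fault_address == CR2

print(hex(fault_address), matches_cr2)
```

The same check can be repeated on a new dump by replacing the two register values.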
Local fix
Problem summary
Parts of GPFS are kernel modules that are loaded at startup and used by other components. The tracedev module did not handle its usage counters correctly, so the module could be unloaded while it was still in use, resulting in a kernel crash. One case where this can happen is running the "mmvdisk server configure" and "mmvdisk server unconfigure" commands with the --recycle option.
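The general mechanism can be illustrated with a small model. The Python sketch below is not GPFS or kernel code. The class Module and the helpers open_device and close_device are hypothetical names invented for this illustration. It only shows why a usage counter that is not taken on open allows an unload while a handle is still open, and how taking the reference prevents that.

```python
class Module:
    """Toy stand-in for a loadable kernel module with a usage counter."""

    def __init__(self, name):
        self.name = name
        self.use_count = 0
        self.loaded = True

    def get(self):
        """Take a reference; fails if the module is already unloaded."""
        if not self.loaded:
            return False
        self.use_count += 1
        return True

    def put(self):
        """Drop a reference taken earlier with get()."""
        assert self.use_count > 0, "put() without matching get()"
        self.use_count -= 1

    def unload(self):
        """Unload only when no user holds a reference."""
        if self.use_count > 0:
            return False
        self.loaded = False
        return True


def open_device(module, take_reference):
    """Open a device; a correct open takes a module reference first."""
    if take_reference and not module.get():
        raise RuntimeError("module not loaded")
    return {"module": module, "holds_reference": take_reference}


def close_device(handle):
    """Close a device; touching an unloaded module models the crash."""
    module = handle["module"]
    if not module.loaded:
        raise RuntimeError("use after unload (a real kernel would crash here)")
    if handle["holds_reference"]:
        module.put()


# Buggy path: no reference is taken, so unload succeeds with a handle open.
buggy = Module("tracedev")
buggy_handle = open_device(buggy, take_reference=False)
buggy_unloaded = buggy.unload()

# Correct path: the open handle pins the module until it is closed.
fixed = Module("tracedev")
fixed_handle = open_device(fixed, take_reference=True)
unload_while_open = fixed.unload()
close_device(fixed_handle)
unload_after_close = fixed.unload()
```

In the buggy path, a later close_device(buggy_handle) raises, which stands in for the crash in filp_close. In the correct path, the unload is refused until the handle has been closed.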
Problem conclusion
This problem is fixed in 5.1.5 PTF 1.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Avoids the kernel crash by handling the usage counters of the tracedev module correctly.
Work Around: Avoid stopping GPFS immediately after starting up.
Problem trigger: Run GPFS shutdown and startup. This is a rare problem, so the shutdown and startup sequence or the mentioned "mmvdisk server" commands must be run in a loop to trigger it.
Symptom: Abend/Crash
Platforms affected: ALL Linux OS environments
Functional Area affected: All Scale Users
Customer Impact: Suggested
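For automation that starts and stops GPFS, the workaround above could be enforced with a small guard. The Python sketch below is only an illustration. The APAR does not say how long to wait, so MIN_UPTIME_SECONDS is an arbitrary placeholder, and the function names are hypothetical. The sketch does not call any GPFS command. It only computes and applies the delay.

```python
import time

# Assumption: the APAR gives no number; 120 seconds is an arbitrary placeholder.
MIN_UPTIME_SECONDS = 120.0


def remaining_wait(started_at, now, min_uptime=MIN_UPTIME_SECONDS):
    """Return how many seconds must still pass before a shutdown is allowed."""
    return max(0.0, min_uptime - (now - started_at))


def wait_before_shutdown(started_at, sleep=time.sleep, clock=time.monotonic):
    """Block until the minimum uptime has elapsed; return the delay applied."""
    delay = remaining_wait(started_at, clock())
    if delay > 0:
        sleep(delay)
    return delay
```

A script would record clock() right after starting GPFS and call wait_before_shutdown with that value before stopping it.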
Temporary fix
Comments
APAR Information
APAR number
IJ41553
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
513
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-08-04
Closed date
2022-08-23
Last modified date
2022-08-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Document Information
Modified date:
23 August 2022