APAR status
Closed as program error.
Error description
Node randomly crashed at the following place.
[282129.573157] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4
[282129.581099] IP: [<ffffffffc3430ac8>] _Z9gpfsFsyncP13gpfsVfsData_tP9MMFSVInfoP9cxiNode_tiP10ext_cred_t+0x2f8/0x370 [mmfs26]
[282129.592252] PGD 800000210d04e067 PUD 2cf5d99067 PMD 0
[282129.597536] Oops: 0000 [#1] SMP
[282129.600890] Modules linked in: stap_netlog(OE) nfs_layout_nfsv41_files cts rpcsec_gss_krb5 nfsv4 dns_resolver tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace fscache isofs loop mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp bridge stp llc proclog_dd7de3(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) dell_rbu nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) dcdbas skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me pcspkr sg i2c_i801 mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad sch_fq_codel binfmt_misc auth_rpcgss sunrpc ip_tables xfs dm_thin_pool dm_persistent_data
[282129.673879] dm_bio_prison dm_bufio libcrc32c mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core(OE) ttm mlxfw(OE) ahci psample ptp pps_core drm libahci crct10dif_pclmul auxiliary(OE) nvme devlink crct10dif_common crc32c_intel libata megaraid_sas(OE) nvme_core mlx_compat(OE) drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: stap_netlog]
[282129.717874] CPU: 20 PID: 27804 Comm: python Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.49.1.el7.x86_64 #1
[282129.729501] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
[282129.737057] task: ffff9fcddbbd5280 ti: ffff9fd2550e4000 task.ti: ffff9fd2550e4000
[282129.744605] RIP: 0010:[<ffffffffc3430ac8>] [<ffffffffc3430ac8>] _Z9gpfsFsyncP13gpfsVfsData_tP9MMFSVInfoP9cxiNode_tiP10ext_cred_t+0x2f8/0x370 [mmfs26]
[282129.758172] RSP: 0018:ffff9fd2550e7cf0 EFLAGS: 00010286
[282129.763558] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
[282129.770758] RDX: ffff9fcddbbd58f8 RSI: 0000000000006c9c RDI: ffffffffc334bea8
[282129.777961] RBP: ffff9fd2550e7db0 R08: 0000000000000000 R09: 0000000000000005
[282129.785164] R10: 0000000000000001 R11: 0000000000000208 R12: ffffffffffffffff
[282129.792367] R13: ffff9fd2550e7d28 R14: ffff9ffeec67af48 R15: 0000000000800000
[282129.799567] FS: 00007fdbeba37740(0000) GS:ffff9fda7f480000(0000) knlGS:0000000000000000
[282129.807721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[282129.813541] CR2: 00000000000000c4 CR3: 0000002022544000 CR4: 00000000007607e0
[282129.820742] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[282129.827943] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[282129.835138] PKRU: 55555554
[282129.837931] Call Trace:
[282129.840475] [<ffffffffc3337321>] fsyncInternal.constprop.120+0x101/0x210 [mmfslinux]
[282129.848368] [<ffffffffbaac6f4b>] ? wake_up_atomic_t+0x2b/0x30
[282129.854279] [<ffffffffc0c0897c>] ? nfs_file_fsync+0x9c/0x1b0 [nfs]
[282129.860617] [<ffffffffc333754b>] gpfs_f_flush+0xab/0xc0 [mmfslinux]
[282129.867044] [<ffffffffbac4ba77>] filp_close+0x37/0x90
[282129.872257] [<ffffffffbac6fa2c>] __close_fd+0x8c/0xb0
[282129.877472] [<ffffffffbac4d5a3>] SyS_close+0x23/0x50
[282129.882600] [<ffffffffbb195f92>] system_call_fastpath+0x25/0x2a
[282129.909001] RIP [<ffffffffc3430ac8>] _Z9gpfsFsyncP13gpfsVfsData_tP9MMFSVInfoP9cxiNode_tiP10ext_cred_t+0x2f8/0x370 [mmfs26]
[282129.920223] RSP <ffff9fd2550e7cf0>
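The faulting symbol in the IP/RIP lines is a C++ name mangled under the Itanium ABI. As a reading aid, the sketch below decodes it (written without any embedded whitespace) into a readable signature. It is a minimal illustration in Python, not a full demangler: the helper name demangle_simple is hypothetical, and it handles only a plain function name followed by int (i) or pointer-to-named-type (P<length><name>) parameters, which is all this symbol uses.

```python
import re


def demangle_simple(symbol: str) -> str:
    """Demangle a small subset of Itanium C++ ABI names.

    Supported: _Z<len><function name> followed by parameters that are
    either 'i' (int) or 'P<len><type name>' (pointer to a named type).
    Anything else raises ValueError.
    """
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-mangled name")

    def read_name(pos: int):
        # A source name is encoded as a decimal length followed by that many characters.
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"expected a length at offset {pos}")
        start = pos + match.end()
        end = start + int(match.group())
        if end > len(symbol):
            raise ValueError("name runs past the end of the symbol")
        return symbol[start:end], end

    func, pos = read_name(2)
    params = []
    while pos < len(symbol):
        code = symbol[pos]
        if code == "i":
            params.append("int")
            pos += 1
        elif code == "P":
            name, pos = read_name(pos + 1)
            params.append(name + "*")
        else:
            raise ValueError(f"unsupported type code {code!r} at offset {pos}")
    return f"{func}({', '.join(params)})"


if __name__ == "__main__":
    print(demangle_simple(
        "_Z9gpfsFsyncP13gpfsVfsData_tP9MMFSVInfoP9cxiNode_tiP10ext_cred_t"
    ))
```

The decoded form, gpfsFsync(gpfsVfsData_t*, MMFSVInfo*, cxiNode_t*, int, ext_cred_t*), shows that the fault occurred inside the fsync routine of the mmfs26 module, reached from the close path listed in the call trace.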
Local fix
Problem summary
The code path that flushes file data to disk (gpfsFsync) did not properly check for a stale file system, resulting in a NULL pointer dereference and a node crash.
Problem conclusion
This problem is fixed in 5.1.2 PTF 4.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Node does not crash in this scenario
Work Around: N/A
Problem trigger: With a file descriptor opened and kept open, have the file system go stale (e.g. by restarting the daemon). Then issue a request to flush the file's data (or rely on the implicit flushOnClose).
Symptom: Abend/Crash
Platforms affected: ALL Linux OS environments
Functional Area affected: All Scale Users
Customer Impact: High Importance
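The trigger sequence described above can be sketched from user space as follows. This is a minimal illustration in Python (the crashed process in the trace was python), not a supported reproducer: the function name hold_then_flush and the pause callback are hypothetical, and making the file system stale (e.g. restarting the daemon) has to be done externally while the descriptor is held open. On an unfixed level the flush or close may crash the node, so this should only be tried on a test system. That a fixed level reports an error from the flush is an assumption; the APAR only states that the node no longer crashes.

```python
import os
import tempfile


def hold_then_flush(path, pause):
    """Open `path`, hold the descriptor across `pause()`, then flush and close.

    Returns a list of (step, errno) pairs; errno is None when the step succeeded.
    """
    results = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"data written before the file system goes stale\n")
        # The file system would be made stale externally during this pause.
        pause()
        try:
            os.fsync(fd)  # explicit flush request
            results.append(("fsync", None))
        except OSError as exc:
            results.append(("fsync", exc.errno))
    finally:
        try:
            os.close(fd)  # close also flushes implicitly (flushOnClose)
            results.append(("close", None))
        except OSError as exc:
            results.append(("close", exc.errno))
    return results


if __name__ == "__main__":
    # Harmless demo on a local temporary file with no pause.
    with tempfile.TemporaryDirectory() as tmp:
        print(hold_then_flush(os.path.join(tmp, "probe.dat"), pause=lambda: None))
```

Against a real mount, pause could be something like lambda: input("make the file system stale, then press Enter"), so the operator controls when the flush is issued.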
Temporary fix
Comments
APAR Information
APAR number
IJ37068
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-01-10
Closed date
2022-03-22
Last modified date
2022-03-22
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"512","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
23 March 2022