APAR status
Closed as program error.
Error description
mmmount hit logAssert: poolDataP[poolIndex].poolId == poolId
After a storage pool is deleted and a new storage pool is created, mmmount on Spectrum Scale V5.0.5/V5.1.0 may hit the following assert due to a pool ID conflict:
2021-12-13_00:14:54.346-0500: mounting /dev/fs3
2021-12-13_00:14:54.350-0500: [I] Command: mount ess3
2021-12-13_00:14:54.364-0500: [X] logAssertFailed: poolDataP[poolIndex].poolId == poolId
2021-12-13_00:14:54.364-0500: [X] return code 0, reason code 0, log record tag 0
2021-12-13_00:14:55.087-0500: [X] *** Assert exp(poolDataP[poolIndex].poolId == poolId) in line 1762 of file /build/ode/tac503ptf3/export/ppc64le-linux/usr/include/mmfs/SGDesc.h
2021-12-13_00:14:55.087-0500: [E] *** Traceback:
2021-12-13_00:14:55.087-0500: [E] 2:0x1081E7F4 logAssertFailed + 0x484 at ??:0
2021-12-13_00:14:55.087-0500: [E] 3:0x1070687C StripeGroupDesc::getStoragePoolBlockMapType(int) + 9C at ??:0
2021-12-13_00:14:55.087-0500: [E] 4:0x107B0140 StripeGroupDesc::readFromBuf(char *, int, int*) + 80 at ??:0
2021-12-13_00:14:55.087-0500: [E] 4:0x107B0140 StripeGroup::sgdesc_update(char *, int, int*) + 80 at ??:0
2021-12-13_00:14:55.087-0500: [E] 6:0x1090FC80 StripeGroup::SGMount(int, unsigned int, unsigned int*, allocStatCounters*, int, int) + 0xDE0 at ??:0
2021-12-13_00:14:55.087-0500: [E] 7:0x105742B0 SFSOpenFS(StripeGroup*, OpenFSForWhat, unsigned int, int, unsigned int) + 0x1640 at ??:0
2021-12-13_00:14:55.087-0500: [E] 8:0x10577F28 SFSMountFS(StripeGroup*, unsigned int, unsigned int, int, char*, int, int, unsigned int) + 0x428 at ??:0
2021-12-13_00:14:55.087-0500: [E] 9:0x105CAC2C HandleMBMount(MBMountParms*) + 0x16C at ??:0
2021-12-13_00:14:55.087-0500: [E] 10:0x102C8588 Mailbox::msgHandlerBody(void*) + 0x3D8 at ??:0
2021-12-13_00:14:55.087-0500: [E] 11:0x102A5688 Thread::callBody(Thread*) + 0x118 at ??:0
2021-12-13_00:14:55.087-0500: [E] 12:0x1028F4EC Thread::callBodyWrapper(Thread*) + 0x11C at ??:0
2021-12-13_00:14:55.087-0500: [E] 13:0x3FFFA28E8B94 start_thread + 0x104 at ??:0
2021-12-13_00:14:55.087-0500: [E] 14:0x3FFFA25285F4 __clone + 0xE4 at ??:0
mmfsd: /build/ode/tac503ptf3/export/ppc64le-linux/usr/include/mmfs/SGDesc.h:1762: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion 'poolDataP[poolIndex].poolId == poolId' failed.
2021-12-13_00:14:55.087-0500: [E] Signal 6 at location 0x3FFFA243FAF0 in process 529571, link reg 0x3FFFA2441E6C.
2021-12-13_00:14:55.087-0500: [I] nip 0x00003FFFA243FAF0 msr 0x900000010280F033
2021-12-13_00:14:55.087-0500: [I] ctr 0x0000000000000000 link 0x00003FFFA2441E6C
2021-12-13_00:14:55.087-0500: [I] xer 0x0000000000000000 ccr 0x0000000044224844
Reported in: Spectrum Scale V5.0.5/V5.1.0
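To check whether a node hit this particular assert, the daemon log can be scanned for the signature shown in the excerpt. The following is a minimal sketch in Python, not an IBM-provided tool. The function name find_pool_id_asserts is hypothetical, and the sample lines are copied from the log in this APAR. On a real node the text would normally be read from the GPFS daemon log (commonly /var/adm/ras/mmfs.log.latest; the exact location is an assumption about the installation).

```python
import re

# Signature of the assert reported in this APAR (taken from the log excerpt).
ASSERT_SIGNATURE = "logAssertFailed: poolDataP[poolIndex].poolId == poolId"

# Timestamp prefix as it appears in the excerpt, e.g. 2021-12-13_00:14:54.364-0500:
TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}):"
)


def find_pool_id_asserts(log_text):
    """Return the timestamps of log lines that contain the assert signature.

    A line that carries the signature but no recognizable timestamp
    contributes None to the result.
    """
    hits = []
    for line in log_text.splitlines():
        if ASSERT_SIGNATURE in line:
            match = TIMESTAMP_RE.match(line.strip())
            hits.append(match.group(1) if match else None)
    return hits


# Sample lines copied from the log excerpt in this APAR.
SAMPLE_LOG = """\
2021-12-13_00:14:54.346-0500: mounting /dev/fs3
2021-12-13_00:14:54.350-0500: [I] Command: mount ess3
2021-12-13_00:14:54.364-0500: [X] logAssertFailed: poolDataP[poolIndex].poolId == poolId
2021-12-13_00:14:54.364-0500: [X] return code 0, reason code 0, log record tag 0
"""

if __name__ == "__main__":
    print(find_pool_id_asserts(SAMPLE_LOG))
```

A non-empty result only indicates that the same assert expression was logged; it does not by itself confirm that the cause matches this APAR.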
Local fix
Problem summary
The GPFS daemon could assert while mounting the file system on a client node with a code level prior to V5.1.1.0. This can happen only if a new storage pool is being created by mmadddisk and a storage pool had been deleted in the past via mmdeldisk.
Problem conclusion
This problem is fixed in 5.0.5 PTF 12.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Prevents an unexpected GPFS daemon assert that could occur during mmmount.
Work Around: Increase the number of disks being added via the mmadddisk command, or avoid creating a new storage pool.
Problem trigger: Creating a new storage pool via the mmadddisk command, then mounting the file system from a client node with a code level prior to V5.1.1.0.
Symptom: Abend/Crash
Platforms affected: ALL Operating System environments
Functional Area affected: All Scale Users
Customer Impact: Critical
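The version conditions in this APAR can be turned into a rough screening check for client nodes. This is a sketch under stated assumptions, not an official compatibility rule: the helper names parse_level and may_be_affected are hypothetical, and it assumes that 5.0.5 PTF 12 corresponds to the version string 5.0.5.12. It uses only the two levels named in this APAR (client levels prior to V5.1.1.0 can hit the problem, and the fix is delivered in 5.0.5 PTF 12) and makes no claim about fix levels in other release streams.

```python
def parse_level(version):
    """Turn a string such as 'V5.0.5.3' or '5.1.0' into a 4-part integer tuple."""
    parts = [int(p) for p in version.lstrip("Vv").split(".")]
    parts += [0] * (4 - len(parts))  # pad missing fields with zeros
    return tuple(parts[:4])


def may_be_affected(version):
    """Rough screen based only on the levels named in this APAR."""
    level = parse_level(version)
    if level >= (5, 1, 1, 0):
        # The trigger requires a client with a code level prior to V5.1.1.0.
        return False
    if level[:3] == (5, 0, 5) and level[3] >= 12:
        # Assumption: 5.0.5.12 and later 5.0.5 levels contain the PTF 12 fix.
        return False
    return True


if __name__ == "__main__":
    for v in ("5.0.5.3", "5.0.5.12", "V5.1.0.0", "5.1.1.0"):
        print(v, may_be_affected(v))
```

A True result means only that the level falls in the range described here; the assert additionally requires the storage pool delete and create sequence described under Problem trigger.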
Temporary fix
Comments
APAR Information
APAR number
IJ36842
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
505
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-12-21
Closed date
2022-01-12
Last modified date
2022-01-12
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"505","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
13 January 2022