APAR status
Closed as program error.
Error description
When mmadddisk and "mmrestripefs -b" are running at the same time, the mmfsd daemon might hit the following logAssert:

2020-04-29_08:31:34.122+1000: [X] logAssertFailed: lhsP->size == rhsP->size && lhsP->size == resultP->size
2020-04-29_08:31:34.122+1000: [X] return code 0, reason code 0, log record tag 0
2020-04-29_08:31:56.392+1000: [E] *** Traceback:
2020-04-29_08:31:56.392+1000: [E] 2:0x55B3D3392BC8 logAssertFailed + 0x418 at ??:0
2020-04-29_08:31:56.392+1000: [E] 3:0x55B3D3F412F3 Bitmap::ANDNOT(Bitmap const*, Bitmap const*, Bitmap*) + 0x53 at ??:0
2020-04-29_08:31:56.392+1000: [E] 4:0x55B3D31647F5 FastRebalanceContext::checkDiskAddrAndReselectDisk(fsDiskAddr, int, unsigned int, int, int*) + 0x555 at ??:0
2020-04-29_08:31:56.392+1000: [E] 5:0x55B3D3164D3B doFastRebalanceCheck(StripeGroup*, int, fsDiskAddr, int, unsigned int, unsigned int*, int*) + 0x1CB at ??:0
2020-04-29_08:31:56.392+1000: [E] 6:0x55B3D31122D0 FileMetadata::doneRepairFile(FSOperation*, SGPoolIdList const&, SGPoolIdList const&, BrokenAddrMsg*, fileRepairInfo*, long long*, Errno*, long long, repCompInfo*) + 0xEB0 at ??:0
2020-04-29_08:31:56.392+1000: [E] 7:0x55B3D316CE21 SFSRepairFileDone(StripeGroup*, SGPoolIdList const&, SGPoolIdList const&, FileUID, PitFileWorkTypes, fileRepairInfo*, PitFile*, long long*, Errno*, BrokenAddrMsg*, long long, repCompInfo*) + 0xB11 at ??:0
2020-04-29_08:31:56.392+1000: [E] 8:0x55B3D316D3BD RepairInode(StripeGroup*, SGPoolIdList const&, SGPoolIdList const&, long long, Inode*, unsigned int*, GenNum*, unsigned int, fileRepairInfo*, PitFile*, PitFileWorkTypes, long long, long long*, long long*, unsigned int*, long long, ErrorDetails*, Errno, BrokenAddrMsg*, long long, repCompInfo*) + 0x58D at ??:0
2020-04-29_08:31:56.392+1000: [E] 9:0x55B3D349EB68 PitSlave::doWork(PitSlaveWorkItem*, ErrorDetails*, void*) + 0x428 at ??:0
2020-04-29_08:31:56.392+1000: [E] 10:0x55B3D34A0CD6 PitSlave::recordCompletedWork(PitSlaveWorkItem*, void*) + 0x1026 at ??:0
2020-04-29_08:31:56.392+1000: [E] 11:0x55B3D34A60E5 PitSlave::pitWorkerMain() + 0x225 at ??:0
2020-04-29_08:31:56.392+1000: [E] 12:0x55B3D34A68D8 PitWorkerThreadBody(void*) + 0xF8 at ??:0
2020-04-29_08:31:56.392+1000: [E] 13:0x55B3D2E9AD23 Thread::callBody(Thread*) + 0x63 at ??:0
2020-04-29_08:31:56.392+1000: [E] 14:0x55B3D2E885B2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
2020-04-29_08:31:56.392+1000: [E] 15:0x7FDF42810EA5 start_thread + 0xC5 at ??:0
2020-04-29_08:31:56.392+1000: [E] 16:0x7FDF419148CD __clone + 0x6D at ??:0
mmfsd: /project/spreltac502/build/rtac502s003a/src/avs/fs/mmfs/ts/classes/basic/bitmap.C:841: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion 'lhsP->size == rhsP->size && lhsP->size == resultP->size' failed.

Reported in: Spectrum Scale 5.0.5 on RHEL7
Local fix
Avoid running "mmadddisk" and "mmrestripefs -b" at the same time.
Problem summary
logAssertFailed (lhsP->size == rhsP->size && lhsP->size == resultP->size)
When dataOnly or metadataOnly disks are added while mmrestripefs with the -b option is rebalancing the file system, the rebalancing threads can hit this assert in a disk bitmap operation, because adding the disks increases the size of the disk bitmap.
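The size mismatch described in the problem summary can be illustrated with a small model. This is only a sketch: the Bitmap class, the andnot function, and the simulate helper are hypothetical stand-ins written for this note, not the Spectrum Scale implementation, and the disk counts (4 and 6) are made up. It assumes one bit per disk and that the rebalance thread holds bitmaps sized before the disks were added.

```python
class Bitmap:
    """Hypothetical fixed-size bitmap with one bit per disk."""

    def __init__(self, size, bits=0):
        self.size = size
        self.bits = bits & ((1 << size) - 1)


def andnot(lhs, rhs, result):
    """result = lhs AND NOT rhs, with the same size precondition as the assert."""
    if not (lhs.size == rhs.size and lhs.size == result.size):
        raise AssertionError(
            "lhsP->size == rhsP->size && lhsP->size == resultP->size")
    result.bits = lhs.bits & ~rhs.bits & ((1 << lhs.size) - 1)


def simulate(disks_added_during_rebalance):
    # The rebalance thread built these bitmaps when there were 4 disks.
    candidates = Bitmap(4, 0b1111)
    result = Bitmap(4)
    # Assumption: a second bitmap is sized from the current disk count,
    # which grows from 4 to 6 if disks are added in the meantime.
    current_disks = 6 if disks_added_during_rebalance else 4
    excluded = Bitmap(current_disks, 0b0010)
    try:
        andnot(candidates, excluded, result)
    except AssertionError:
        return "assert"
    return format(result.bits, "04b")
```

With no disks added, simulate(False) completes the operation; with disks added mid-rebalance, simulate(True) trips the size check, which corresponds to the daemon assert in the log.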
Problem conclusion
This problem is fixed in 5.1.5.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Avoids a crash of the mmfsd daemon process.
Work around: Do not add dataOnly or metadataOnly disks to the file system while an mmrestripefs -b command is in progress.
Problem trigger: Adding dataOnly or metadataOnly disks while rebalancing is in progress.
Symptom: The mmfsd daemon process dies.
Platforms affected: All Operating Systems
Functional Area affected: mmrestripefs command with the -b option
Customer Impact: Medium
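On levels without the fix, the workaround can be backed by a simple pre-check before mmadddisk is run. The following is a minimal sketch, not an IBM-provided tool: the function names are hypothetical, and it assumes that a rebalance shows up in the local process list as an mmrestripefs command line containing -b. It inspects only the node it runs on, so a rebalance started from another node is not detected and still has to be ruled out by the administrator.

```python
import subprocess


def rebalance_in_progress(command_lines):
    """Return True if any command line looks like 'mmrestripefs ... -b'."""
    for line in command_lines:
        args = line.split()
        # Compare basenames so that full paths to the command also match.
        is_restripe = any(a.rsplit("/", 1)[-1] == "mmrestripefs" for a in args)
        if is_restripe and "-b" in args:
            return True
    return False


def local_command_lines():
    """Command lines of all processes on this node, as reported by ps."""
    out = subprocess.run(["ps", "-eo", "args="],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def safe_to_add_disks(command_lines=None):
    """True if no local 'mmrestripefs -b' was found (local node only)."""
    if command_lines is None:
        command_lines = local_command_lines()
    return not rebalance_in_progress(command_lines)
```

Calling safe_to_add_disks() with no argument checks the local process list. The check is deliberately conservative: any command line that mentions mmrestripefs together with -b is treated as a running rebalance.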
Temporary fix
Comments
APAR Information
APAR number
IJ34399
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
505
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-08-12
Closed date
2022-07-20
Last modified date
2022-09-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Document Information
Modified date:
08 September 2022