IJ34399: LOGASSERT WHEN MMADDDISK AND MMRESTRIPEFS -B ARE RUNNING

APAR status

Closed as program error.

Error description

When mmadddisk and "mmrestripefs -b" run at the same time, the mmfsd daemon might hit the following logAssert:

2020-04-29_08:31:34.122+1000: [X] logAssertFailed: lhsP->size == rhsP->size && lhsP->size == resultP->size
2020-04-29_08:31:34.122+1000: [X] return code 0, reason code 0, log record tag 0
2020-04-29_08:31:56.392+1000: [E] *** Traceback:
2020-04-29_08:31:56.392+1000: [E]     2:0x55B3D3392BC8 logAssertFailed + 0x418 at ??:0
2020-04-29_08:31:56.392+1000: [E]     3:0x55B3D3F412F3 Bitmap::ANDNOT(Bitmap const*, Bitmap const*, Bitmap*) + 0x53 at ??:0
2020-04-29_08:31:56.392+1000: [E]     4:0x55B3D31647F5 FastRebalanceContext::checkDiskAddrAndReselectDisk(fsDiskAddr, int, unsigned int, int, int*) + 0x555 at ??:0
2020-04-29_08:31:56.392+1000: [E]     5:0x55B3D3164D3B doFastRebalanceCheck(StripeGroup*, int, fsDiskAddr, int, unsigned int, unsigned int*, int*) + 0x1CB at ??:0
2020-04-29_08:31:56.392+1000: [E]     6:0x55B3D31122D0 FileMetadata::doneRepairFile(FSOperation*, SGPoolIdList const&, SGPoolIdList const&, BrokenAddrMsg*, fileRepairInfo*, long long*, Errno*, long long, repCompInfo*) + 0xEB0 at ??:0
2020-04-29_08:31:56.392+1000: [E]     7:0x55B3D316CE21 SFSRepairFileDone(StripeGroup*, SGPoolIdList const&, SGPoolIdList const&, FileUID, PitFileWorkTypes, fileRepairInfo*, PitFile*, long long*, Errno*, BrokenAddrMsg*, long long, repCompInfo*) + 0xB11 at ??:0
2020-04-29_08:31:56.392+1000: [E]     8:0x55B3D316D3BD RepairInode(StripeGroup*, SGPoolIdList const&, SGPoolIdList const&, long long, Inode*, unsigned int*, GenNum*, unsigned int, fileRepairInfo*, PitFile*, PitFileWorkTypes, long long, long long*, long long*, unsigned int*, long long, ErrorDetails*, Errno, BrokenAddrMsg*, long long, repCompInfo*) + 0x58D at ??:0
2020-04-29_08:31:56.392+1000: [E]     9:0x55B3D349EB68 PitSlave::doWork(PitSlaveWorkItem*, ErrorDetails*, void*) + 0x428 at ??:0
2020-04-29_08:31:56.392+1000: [E]     10:0x55B3D34A0CD6 PitSlave::recordCompletedWork(PitSlaveWorkItem*, void*) + 0x1026 at ??:0
2020-04-29_08:31:56.392+1000: [E]     11:0x55B3D34A60E5 PitSlave::pitWorkerMain() + 0x225 at ??:0
2020-04-29_08:31:56.392+1000: [E]     12:0x55B3D34A68D8 PitWorkerThreadBody(void*) + 0xF8 at ??:0
2020-04-29_08:31:56.392+1000: [E]     13:0x55B3D2E9AD23 Thread::callBody(Thread*) + 0x63 at ??:0
2020-04-29_08:31:56.392+1000: [E]     14:0x55B3D2E885B2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
2020-04-29_08:31:56.392+1000: [E]     15:0x7FDF42810EA5 start_thread + 0xC5 at ??:0
2020-04-29_08:31:56.392+1000: [E]     16:0x7FDF419148CD __clone + 0x6D at ??:0
mmfsd: /project/spreltac502/build/rtac502s003a/src/avs/fs/mmfs/ts/classes/basic/bitmap.C:841: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion 'lhsP->size == rhsP->size && lhsP->size == resultP->size' failed.


Reported in:
Spectrum Scale 5.0.5 on RHEL7

Local fix

Avoid running "mmadddisk" and "mmrestripefs -b" at the same time.

Problem summary

logAssertFailed
(lhsP->size == rhsP->size && lhsP->size == resultP->size)
If dataOnly or metadataOnly disks are added while mmrestripefs -b is rebalancing the file system, the rebalancing threads can hit this assert in a disk bitmap operation. Adding disks increases the size of the disk bitmap, so the bitmaps passed to the operation (Bitmap::ANDNOT in the traceback) no longer have the same size.
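To illustrate the failure mechanism, the following minimal Python sketch models the size precondition checked by the assert. It assumes that a rebalancing thread works with a bitmap sized when its step began, while mmadddisk enlarges the live disk bitmap in the meantime. LogAssertFailed, DiskBitmap, grow, and_not, and rebalance_step are hypothetical names for illustration only; they are not IBM Storage Scale internals (the code in the traceback is C++).

```python
# Hypothetical model of the IJ34399 failure; all names are illustrative.


class LogAssertFailed(Exception):
    """Stands in for the daemon's logAssert (hypothetical)."""


class DiskBitmap:
    """One bit per disk in the file system (hypothetical model)."""

    def __init__(self, num_disks: int) -> None:
        self.size = num_disks
        self.bits = [False] * num_disks

    def grow(self, new_size: int) -> None:
        # Adding disks (as mmadddisk does) enlarges the bitmap.
        self.bits.extend([False] * (new_size - self.size))
        self.size = new_size


def and_not(lhs: DiskBitmap, rhs: DiskBitmap, result: DiskBitmap) -> None:
    """result = lhs AND NOT rhs, guarded by the precondition from the log."""
    # Mirrors: lhsP->size == rhsP->size && lhsP->size == resultP->size
    if not (lhs.size == rhs.size and lhs.size == result.size):
        raise LogAssertFailed(
            f"bitmap sizes differ: lhs={lhs.size} rhs={rhs.size} result={result.size}"
        )
    for i in range(lhs.size):
        result.bits[i] = lhs.bits[i] and not rhs.bits[i]


def rebalance_step(disks_added_during_step: int) -> bool:
    """Return True if the bitmap operation succeeds, False if the assert fires."""
    working = DiskBitmap(4)       # copy sized when the rebalance step started
    working.bits[1] = True        # e.g. a disk this thread wants to exclude
    live = DiskBitmap(4)          # live bitmap that a disk addition can enlarge
    live.grow(live.size + disks_added_during_step)
    result = DiskBitmap(live.size)
    try:
        and_not(live, working, result)
    except LogAssertFailed:
        return False
    return True


print(rebalance_step(0))  # prints True: no disks added, sizes match
print(rebalance_step(2))  # prints False: two disks added mid-step, sizes differ
```

With no disks added, all three bitmaps have the same size and the step completes. Adding disks between sizing the working copy and performing the operation produces exactly the mismatch that the daemon's assert rejects.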

Problem conclusion

This problem is fixed in 5.1.5.
To see all Spectrum Scale APARs and their respective fix solutions, refer to the page
https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html


Benefits of the solution:
Avoids a crash of the mmfsd daemon process.

Workaround:
Do not add dataOnly or metadataOnly disks to the file system while an mmrestripefs -b command is in progress.
Problem trigger:
Adding dataOnly or metadataOnly disks while rebalancing is in progress.
Symptom: The mmfsd daemon process dies.
Platforms affected: All Operating Systems
Functional Area affected: mmrestripefs command with -b option
Customer Impact: Medium
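The workaround amounts to keeping disk additions and the rebalancing threads' bitmap operations from overlapping. The following Python sketch shows that idea as mutual exclusion with threading.Lock: a rebalancing step holds the lock from the moment it sizes its working copy until its bitmap operation is done, and adding disks must take the same lock. DiskState, add_disks, rebalance_step, and run are hypothetical names. The APAR does not describe how the 5.1.5 fix works, so this illustrates only the principle behind the workaround, not IBM's implementation.

```python
import threading

# Hypothetical illustration of serializing disk additions with rebalancing.
# This is not IBM Storage Scale code; the APAR does not document the fix.


class DiskState:
    def __init__(self, num_disks: int, serialize: bool) -> None:
        self.size = num_disks          # size of the live disk bitmap
        self.serialize = serialize     # hold the lock across a rebalance step?
        self.lock = threading.Lock()   # guards changes to self.size
        self.mismatches = 0            # steps that saw the size change mid-step
        self._count_lock = threading.Lock()

    def add_disks(self, count: int) -> None:
        # Growing the bitmap always happens under the lock.
        with self.lock:
            self.size += count

    def rebalance_step(self) -> None:
        if self.serialize:
            # add_disks cannot run while this step holds the lock.
            with self.lock:
                self._step()
        else:
            # Unprotected: add_disks may run between the two reads in _step.
            self._step()

    def _step(self) -> None:
        working_size = self.size       # working copy sized at step start
        # ... per-block rebalancing work would happen here ...
        live_size = self.size          # size at the time of the bitmap operation
        if working_size != live_size:
            with self._count_lock:
                self.mismatches += 1


def run(serialize: bool, workers: int = 4, steps: int = 1000, adds: int = 50) -> DiskState:
    state = DiskState(8, serialize)

    def rebalancer() -> None:
        for _ in range(steps):
            state.rebalance_step()

    def adder() -> None:
        for _ in range(adds):
            state.add_disks(1)

    threads = [threading.Thread(target=rebalancer) for _ in range(workers)]
    threads.append(threading.Thread(target=adder))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


print(run(serialize=True).mismatches)  # prints 0
```

With serialize=True, no step can observe the size changing, so the mismatch count is always zero. With serialize=False, a mismatch is possible but depends on thread scheduling, which is consistent with the APAR's wording that the daemon "might" hit the assert.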

Temporary fix

Comments

APAR Information

APAR number: IJ34399
Reported component name: SPEC SCALE DME
Reported component ID: 5737F34AP
Reported release: 505
Status: CLOSED PER
PE: NoPE
HIPER: NoHIPER
Special Attention: NoSpecatt / Xsystem
Submitted date: 2021-08-12
Closed date: 2022-07-20
Last modified date: 2022-09-08

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name: SPEC SCALE DME
Fixed component ID: 5737F34AP

Applicable component levels

Business unit: IBM Infrastructure w/TPS (BU058)
Product: STXKQY
Platform: Platform Independent (PF025)
Version: 505
Line of business: Storage (LOB26)

Document Information

Modified date: 08 September 2022
