chris87

Assert with GPFS 3.3.0.26

2012-11-04T13:39:36Z
Hello,

I have a problem with a GPFS cluster of 16 nodes, all connected via FC to an IBM V7000 storage system (all nodes have the NSDs directly configured).

For a couple of years the cluster ran GPFS 3.2.0.29. As a first step, I have now upgraded it to 3.3.0.26.

The new version has now been running for 2 weeks, and I am getting the following errors on several nodes:


Sun Nov  4 11:38:13.088 2012: logAssertFailed: !"oldDiskAddrP == NULL || oldDiskAddrFound.compAddr(*oldDiskAddrP)"
Sun Nov  4 11:38:13.089 2012: return code 0, reason code 0, log record tag 0
Sun Nov  4 11:38:13.453 2012: *** Assert exp(!"oldDiskAddrP == NULL || oldDiskAddrFound.compAddr(*oldDiskAddrP)") in line 8181 of file /project/sprelche/build/rches026a/src/avs/fs/mmfs/ts/fs/metadata.C
Sun Nov  4 11:38:13.454 2012: *** Traceback:
Sun Nov  4 11:38:13.455 2012:         2:0x56C972 FileMetadata::updateDataBlockDiskAddr(long long, fsDiskAddr const*, fsDiskAddr const&, int, indBlockDesc*, unsigned int*) + 0xE52
Sun Nov  4 11:38:13.456 2012:         3:0x4AC165 BufferDesc::commitAssigned(int, unsigned int) + 0x1B5
Sun Nov  4 11:38:13.455 2012:         4:0x4AFA49 BufferDesc::flushBuffer(int) + 0x609
Sun Nov  4 11:38:13.456 2012:         5:0x59E2CA OpenFile::handleBufferFlush(int, BufferDesc*, long long, long long, long long, long long, ByteRange const&, int*, int*, ByteRange*, long long*) + 0x76A
Sun Nov  4 11:38:13.455 2012:         6:0x59EBD6 OpenFile::flushAllBuffers(int, ByteRange const&, int*, int*, ByteRange*) + 0x1B6
Sun Nov  4 11:38:13.456 2012:         7:0x59F807 FileMetadata::flushFile(int, ByteRange const&, int*, int*, ByteRange*) + 0x877
Sun Nov  4 11:38:13.455 2012:         8:0x5A075C SFSSyncFile(StripeGroup*, long long, int, int, ByteRange const&, OpenFile*) + 0x2EC
Sun Nov  4 11:38:13.456 2012:         9:0x5983AA HandleMBFSyncFile(MBFSyncFileParms*) + 0xCA
Sun Nov  4 11:38:13.455 2012:         10:0x451069 Mailbox::msgHandlerBody(void*) + 0x2B9
Sun Nov  4 11:38:13.456 2012:         11:0x448CF8 Thread::callBody(Thread*) + 0x108
Sun Nov  4 11:38:13.455 2012:         12:0x442ACD Thread::callBodyWrapper(Thread*) + 0x8D
Sun Nov  4 11:38:13.456 2012:         13:0x7FBFC3197F1A start_thread + 0x8A
Sun Nov  4 11:38:13.455 2012:         14:0x7FBFC2BBD5D2 _end + 0x7FBFC207E00A
mmfsd: /project/sprelche/build/rches026a/src/avs/fs/mmfs/ts/fs/metadata.C:8181: void logAssertFailed(unsigned int, const char*, unsigned int, int, int, unsigned int, const char*, const char*): Assertion `!"oldDiskAddrP == NULL || oldDiskAddrFound.compAddr(*oldDiskAddrP)"' failed.
Sun Nov  4 11:38:13.456 2012: Signal 6 at location 0x7FBFC2B2307B in process 7373, link reg 0xFFFFFFFFFFFFFFFF.
Sun Nov  4 11:38:13.455 2012: rax    0x0000000000000000  rbx    0x000000004055A540
Sun Nov  4 11:38:13.456 2012: rcx    0xFFFFFFFFFFFFFFFF  rdx    0x0000000000000006
Sun Nov  4 11:38:13.455 2012: rsp    0x000000004055A498  rbp    0x000000004055C960
Sun Nov  4 11:38:13.456 2012: rsi    0x0000000000001CED  rdi    0x0000000000001CCD
Sun Nov  4 11:38:13.455 2012: r8     0x0000000000001CED  r9     0x0000000000000006
Sun Nov  4 11:38:13.456 2012: r10    0x0000000000000008  r11    0x0000000000000202
Sun Nov  4 11:38:13.455 2012: r12    0x00007FFF94B49E94  r13    0x0000000000915C78
Sun Nov  4 11:38:13.456 2012: r14    0x0000000000001FF5  r15    0x0000000000914EE0
Sun Nov  4 11:38:13.455 2012: rip    0x00007FBFC2B2307B  eflags 0x0000000000000202
Sun Nov  4 11:38:13.456 2012: csgsfs 0x0000000000000033  err    0x0000000000000000
Sun Nov  4 11:38:13.455 2012: trapno 0x0000000000000000  oldmsk 0x0000000010017807
Sun Nov  4 11:38:13.456 2012: cr2    0x0000000000000000
Sun Nov  4 11:38:13.841 2012: Traceback:
Sun Nov  4 11:38:13.845 2012: 0:00007FBFC2B2307B _end + 7FBFC1FE3AB3
Sun Nov  4 11:38:13.846 2012: 1:00007FBFC2B2484E _end + 7FBFC1FE5286
Sun Nov  4 11:38:13.845 2012: 2:00007FBFC2B1CAF4 _end + 7FBFC1FDD52C
Sun Nov  4 11:38:13.846 2012: 3:0000000000687017 logAssertFailed + 197
Sun Nov  4 11:38:13.845 2012: 4:000000000056C972 FileMetadata::updateDataBlockDiskAddr(long long, fsDiskAddr const*, fsDiskAddr const&, int, indBlockDesc*, unsigned int*) + E52
Sun Nov  4 11:38:13.846 2012: 5:00000000004AC165 BufferDesc::commitAssigned(int, unsigned int) + 1B5
Sun Nov  4 11:38:13.845 2012: 6:00000000004AFA49 BufferDesc::flushBuffer(int) + 609
Sun Nov  4 11:38:13.846 2012: 7:000000000059E2CA OpenFile::handleBufferFlush(int, BufferDesc*, long long, long long, long long, long long, ByteRange const&, int*, int*, ByteRange*, long long*) + 76A
Sun Nov  4 11:38:13.845 2012: 8:000000000059EBD6 OpenFile::flushAllBuffers(int, ByteRange const&, int*, int*, ByteRange*) + 1B6
Sun Nov  4 11:38:13.846 2012: 9:000000000059F807 FileMetadata::flushFile(int, ByteRange const&, int*, int*, ByteRange*) + 877
Sun Nov  4 11:38:13.845 2012: 10:00000000005A075C SFSSyncFile(StripeGroup*, long long, int, int, ByteRange const&, OpenFile*) + 2EC
Sun Nov  4 11:38:13.846 2012: 11:00000000005983AA HandleMBFSyncFile(MBFSyncFileParms*) + CA
Sun Nov  4 11:38:13.845 2012: 12:0000000000451069 Mailbox::msgHandlerBody(void*) + 2B9
Sun Nov  4 11:38:13.846 2012: 13:0000000000448CF8 Thread::callBody(Thread*) + 108
Sun Nov  4 11:38:13.845 2012: 14:0000000000442ACD Thread::callBodyWrapper(Thread*) + 8D
Sun Nov  4 11:38:13.846 2012: 15:00007FBFC3197F1A start_thread + 8A
Sun Nov  4 11:38:13.877 2012: Signal 6 at location 0x7FBFC2B85DF5 in process 7373, link reg 0xFFFFFFFFFFFFFFFF.
Sun Nov  4 11:38:13.878 2012: mmfsd is shutting down.
Sun Nov  4 11:38:13.877 2012: Reason for shutdown: Signal handler entered
Sun Nov  4 11:38:13 CET 2012: mmcommon mmfsdown invoked.  Subsystem: mmfs  Status: active
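Since the assert shows up on several nodes, it may help to confirm that every node failed on the same check. One option is to pull the assertion location out of each node's GPFS log (usually /var/adm/ras/mmfs.log.latest, but please verify this on your installation) and compare. Below is a minimal Python sketch, assuming the `*** Assert exp(...) in line N of file PATH` entry format shown in the log above. The function names `find_asserts` and `group_by_assert` are hypothetical helpers, not part of GPFS.

```python
import re
from collections import defaultdict

# Matches the "*** Assert exp(EXPR) in line N of file PATH" entry that mmfsd
# logs before its traceback (format assumed from the log quoted above).
# DOTALL lets EXPR survive a stray line break introduced by copy/paste.
ASSERT_RE = re.compile(
    r"\*\*\* Assert exp\((?P<expr>.+?)\) in line (?P<line>\d+) of file (?P<file>\S+)",
    re.DOTALL,
)


def find_asserts(log_text):
    """Return a list of (source file basename, line number, expression) tuples."""
    hits = []
    for m in ASSERT_RE.finditer(log_text):
        # Remove any line breaks (and surrounding spaces) inside the expression.
        expr = re.sub(r"\s*\n\s*", "", m.group("expr"))
        basename = m.group("file").rsplit("/", 1)[-1]
        hits.append((basename, int(m.group("line")), expr))
    return hits


def group_by_assert(logs):
    """Map each distinct assert to the set of node names whose log contains it.

    `logs` is a dict of {node_name: log_text}; collecting the log text from
    each node is left to the caller.
    """
    nodes_per_assert = defaultdict(set)
    for node, text in logs.items():
        for hit in find_asserts(text):
            nodes_per_assert[hit].add(node)
    return dict(nodes_per_assert)
```

If every affected node maps to the same `("metadata.C", 8181, ...)` key, that points to a single recurring failure in the same code path rather than several unrelated problems, which is useful information to include in a support request.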


Does anyone know what causes this problem, or what I can do to prevent it?

Thanks for any replies.

Regards, Christian