BinLv

One question on mutex

2013-05-10T03:39:14Z

Our application needs to call the DB2 API and CLI from multiple threads, with the calls serialized.
All of these threads use a single mutex, hMutex1, to control access.
However, our logs show that both threads acquired hMutex1 at the same time, which is strange.

The crash stack information:
Segmentation fault in sqleUCfreeDiagInfo(db2UCinterface*,db2UCdiagnosticsInfo**) at 0x900000010393274 ($t12)
0x900000010393274 (sqleUCfreeDiagInfo(db2UCinterface*,db2UCdiagnosticsInfo**)+0xb0) e81b0020          ld   r0,0x20(r27)
(dbx) thread current
 thread  state-k     wchan            state-u    k-tid mode held scope function
>$t12    run                          running  53084355   k   no   sys  sqleUCfreeDiagInfo(db2UCinterface*,db2UCdiagnosticsInfo**)
(dbx) where
sqleUCfreeDiagInfo(db2UCinterface*,db2UCdiagnosticsInfo**)(??, ??) at 0x900000010393274
sqleUCappConnect(0xa, 0x4000, 0x2000, 0x0) at 0x90000001002cddc
CLI_sqlConnect(CLI_CONNECTINFO*,sqlca*,CLI_ERRORHEADERINFO*)(??, ??, ??) at 0x9000000101b3188
SQLConnect2(CLI_CONNECTINFO*,unsigned char*,short,unsigned char*,short,unsigned char*,short,unsigned char*,short,unsigned char)(0x112753080, 0x403f, 0x8000000000008, 0x112753180, 0x8000000000008, 0x0, 0x90000000002b1f8, 0xb0c) at 0x9000000101b0b10
SQLDriverConnect2.fdpr.chunk.2(CLI_CONNECTINFO*,void*,unsigned char*,short,unsigned char*,short,short*,unsigned short,unsigned char,unsigned char,CLI_ERRORHEADERINFO*)(0x112753080, 0x1109cd008, 0x1d0000001d, 0x112753180, 0x9001000a1cdea78, 0x40000, 0x0, 0x40000) at 0x9000000101cc924
SQLConnect1(CLI_CONNECTINFO*,unsigned char*,short,unsigned char*,short,unsigned char*,short)(0x112753080, 0x97020301000198d0, 0x9001000a13043d0, 0x9001000a13043a0, 0x40000, 0x0, 0x0) at 0x9000000101c6140
SQLConnect(0x100000001, 0x50400000504, 0xfffdfffffffffffd, 0x700000007, 0xfffdfffffffffffd, 0x0, 0xfffdfffffffffffd) at 0x9000000101c6a48
SQLConnection::initConnection(char*,char*,char*,char*)(0x111aabd70, 0x11162dcf1, 0x110210411, 0x110210511, 0x0) at 0x100033de8
SQLConnection::SQLConnection(char*,char*,char*,char*)(0x111aabd70, 0x11162dcf1, 0x110210411, 0x110210511, 0x0) at 0x10003b8a4
DiagnosticLogThread(void*)(0x110210410) at 0x10000a6e8
(dbx) thread current 13
(dbx) where
_global_lock_common(??, ??, ??) at 0x90000000079c85c
_mutex_lock(??, ??, ??) at 0x9000000007a9aa0
cliutl.sqloxltc_app@AF112_5(??) at 0x9000000100e06b0
sqlerInvokeKnownProcedure(unsigned int,sqlda*,sqlca*)(0x1c0000001c, 0x92, 0x0) at 0x90000001014698c
sqleMappingFnClient.fdpr.chunk.1(db2UCinterface*,sqlca*)(??, ??) at 0x900000010146c78
sqleriar(SQLE_DB2RA_T*)(??) at 0x900000010144f08
sqlmhReg.@91@sqlm_send_snapshot_db2ra(unsigned int,unsigned int,void*,sqlm_collected*,void*,unsigned char,short,unsigned int,sqlmStreamFlags*,sqlca*)(0xa0000000a, 0x114, 0x0, 0x110ca0ce8, 0x110ca0d00, 0x0, 0x0, 0x0) at 0x9000000101f4744
db2GetSnapshot(??, ??, ??) at 0x9000000101f0c94
CandleDb2SnapShot::GetDatabaseSnapshot(int*,int,int*,int,int,char**,int)(0x111b41650, 0x110ca1828, 0x200000002, 0x110ca1830, 0x600000006, 0xffffffffffffffff, 0x0, 0x0) at 0x10005b774
kud00_kudappl00_agent::TakeSample()(0x111402eb0) at 0x10015399c
ctira::DriveDataCollection()(0x111402eb0) at 0x90000000f0194b4
TableManager::checkForExpiredRequests(long)(0x110ccadb0, 0x51834d94) at 0x90000000f00b114
TableManager::timeout(CTRA_Timerspec_*)(0x110ccaef8) at 0x90000000f00cd2c
CTRA_timer_base::TimerCallbackHandler()(0x110209770) at 0x90000000eff0a38
krabutmr.Handler_base(void*)(0x110209770) at 0x90000000eff117c
krabuptm.CTRA_timer_task(void*)(0x110209ad0) at 0x90000000eff3660
(dbx)
 

thread 12:
(2013/05/03,05:39:32.11905-4:kuddlagt.cpp,1904,"DiagnosticLogThread") Before Locking the mutex...
(2013/05/03,05:39:32.11906-4:kuddlagt.cpp,1906,"DiagnosticLogThread") after locking the mutex...
(2013/05/03,05:39:32.11907-4:kudcussql.cpp,359,"kud00_customized_sql_status::getInstance") Entry

thread 13:
(2013/05/03,05:39:32.1173A-B:kuda4agt.cpp,98,"TakeSample") Entry
...
(2013/05/03,05:39:32.1173E-B:kuda4agt.cpp,160,"TakeSample") Before Locking the mutex...
(2013/05/03,05:39:32.1173F-B:kuda4agt.cpp,162,"TakeSample") after locking the mutex...
(2013/05/03,05:39:32.118C1-B:kuda4agt.cpp,203,"TakeSample") The instance name is db2inst4
(2013/05/03,05:39:32.118C2-B:kuda4agt.cpp,209,"TakeSample") Acquiring snapshot for partition -1
...
(2013/05/03,05:39:32.BC02-B:cssmain.cpp,1295,"GetDatabaseSnapshot") Entry
...
(2013/05/03,05:39:32.1192E-B:cssmain.cpp,1397,"GetDatabaseSnapshot") Calling db2GetSnapshotSize : db2iversion 10, db2version 10010000
(2013/05/03,05:39:32.1192F-B:cssmain.cpp,1400,"GetDatabaseSnapshot") Calling db2GetSnapshotSize : objtype_copy[0] = 65546
(2013/05/03,05:39:32.11930-B:cssmain.cpp,1400,"GetDatabaseSnapshot") Calling db2GetSnapshotSize : objtype_copy[1] = 65566

I'm sure of the following:
1) In thread 12, SQLConnection::SQLConnection is called only after hMutex1 has been locked successfully. If the lock had failed, a different log message would have been written.
2) In thread 13, CandleDb2SnapShot::GetDatabaseSnapshot is also called only after hMutex1 has been locked successfully.

So I'm confused about how both threads managed to acquire hMutex1 at the same time.

My OS is AIX 7.1.

Is there any way to check this? Thanks.