IBM Support

What do these db2diag.log messages "mc_dispatch() call came back with error" mean?

Technical Blog Post


Abstract

What do these db2diag.log messages "mc_dispatch() call came back with error" mean?

Body

What do the following repetitive messages (in db2diag.log) mean?

2015-12-15-23.25.30.209153-300 I101318E684           LEVEL: Info
PID     : 9697                 TID : 140736926312192 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 33                   EDUNAME: db2ard 1
FUNCTION: DB2 UDB, base sys utilities, sqeARChainBuilder::processChain, probe:900
DATA #1 : String, 40 bytes
Autorestart thread chain processing ends
DATA #2 : signed integer, 4 bytes
0
DATA #3 : Boolean, 1 bytes
false
DATA #4 : Boolean, 1 bytes
false
DATA #5 : Boolean, 1 bytes
false
DATA #6 : unsigned integer, 2 bytes
32
DATA #7 : Boolean, 1 bytes
true
DATA #8 : Boolean, 1 bytes
true
DATA #9 : unsigned integer, 4 bytes

2015-12-15-23.28.41.217662-300 I102003E1343          LEVEL: Severe
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3633
MESSAGE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"
DATA #1 : String, 39 bytes
mc_dispatch() call came back with error
DATA #2 : rsct error, PD_TYPE_SQLHA_RSCT_ERROR_RETURN, 4 bytes
0x00000003
DATA #3 : String, 66 bytes
2610-603 The session with the RMC subsystem has been interrupted.
DATA #4 : String, 27 bytes
Instancename, Member Number
DATA #5 : String, 8 bytes
db2inst1
DATA #6 : signed integer, 2 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00007FFFC69400E7 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x4C7
  [1] 0x00007FFFC693F9E9 rocmMemberHCAMonitor + 0x39
  [2] 0x000000000041EADB _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x10B
  [3] 0x00007FFFF1590A27 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
  [4] 0x00007FFFF0D5C3A3 sqloEDUEntry + 0x303
  [5] 0x000000373AC07A51 /lib64/libpthread.so.0 + 0x7A51
  [6] 0x000000373A8E893D clone + 0x6D

2015-12-15-23.28.41.219398-300 I103347E1176          LEVEL: Severe
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3633
MESSAGE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"
DATA #1 : String, 39 bytes
mc_dispatch() call came back with error
DATA #2 : String, 27 bytes
Instancename, Member Number
DATA #3 : String, 8 bytes
db2inst1
DATA #4 : signed integer, 2 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00007FFFC693FDD8 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x1B8
  [1] 0x00007FFFC693F9E9 rocmMemberHCAMonitor + 0x39
  [2] 0x000000000041EADB _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x10B
  [3] 0x00007FFFF1590A27 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
  [4] 0x00007FFFF0D5C3A3 sqloEDUEntry + 0x303
  [5] 0x000000373AC07A51 /lib64/libpthread.so.0 + 0x7A51
  [6] 0x000000373A8E893D clone + 0x6D

2015-12-15-23.28.41.220944-300 I104524E445           LEVEL: Event
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3693
DATA #1 : rsct error, PD_TYPE_SQLHA_RSCT_ERROR_RETURN, 4 bytes
0x00000000

2015-12-15-23.28.42.221493-300 I104970E1333          LEVEL: Severe
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3608
MESSAGE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"
DATA #1 : String, 27 bytes
Error starting RSCT session
DATA #2 : rsct error, PD_TYPE_SQLHA_RSCT_ERROR_RETURN, 4 bytes
0x00000002
DATA #3 : String, 68 bytes
2610-602 A session could not be established with the RMC subsystem.
DATA #4 : String, 27 bytes
Instancename, Member Number
DATA #5 : String, 8 bytes
db2inst1
DATA #6 : signed integer, 2 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00007FFFC69400E7 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x4C7
  [1] 0x00007FFFC693F9E9 rocmMemberHCAMonitor + 0x39
  [2] 0x000000000041EADB _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x10B
  [3] 0x00007FFFF1590A27 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
  [4] 0x00007FFFF0D5C3A3 sqloEDUEntry + 0x303
  [5] 0x000000373AC07A51 /lib64/libpthread.so.0 + 0x7A51
  [6] 0x000000373A8E893D clone + 0x6D

Carried on for a few minutes then :

2015-12-15-23.28.47.239430-300 I121029E1164          LEVEL: Severe
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3608
MESSAGE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"
DATA #1 : String, 27 bytes
Error starting RSCT session
DATA #2 : String, 27 bytes
Instancename, Member Number
DATA #3 : String, 8 bytes
db2inst1
DATA #4 : signed integer, 2 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00007FFFC693FDD8 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x1B8
  [1] 0x00007FFFC693F9E9 rocmMemberHCAMonitor + 0x39
  [2] 0x000000000041EADB _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x10B
  [3] 0x00007FFFF1590A27 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
  [4] 0x00007FFFF0D5C3A3 sqloEDUEntry + 0x303
  [5] 0x000000373AC07A51 /lib64/libpthread.so.0 + 0x7A51
  [6] 0x000000373A8E893D clone + 0x6D

2015-12-15-23.28.47.240659-300 I122194E445           LEVEL: Event
PID     : 9697                 TID : 140736968255232 PROC : db2sysc 1
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
EDUID   : 23                   EDUNAME: db2clstrRscMon 1
FUNCTION: DB2 UDB, high avail services, rocmMemberHCAMonitorStartSessionRegisterAndDispatch, probe:3693
DATA #1 : rsct error, PD_TYPE_SQLHA_RSCT_ERROR_RETURN, 4 bytes
0x00000000

2015-12-15-23.29.15.594177-300 I122640E532           LEVEL: Event
PID     : 35180                TID : 46912839558752  PROC : db2rocme 1 [db2inst1]
INSTANCE: db2inst1             NODE : 001
HOSTNAME: XXX.XXX.XXX
FUNCTION: DB2 UDB, high avail services, rocmGenerateEvent, probe:888
MESSAGE : Completed fast member departure. FMD valid flag and start/end times
          as follows.
DATA #1 : Boolean, 1 bytes
true
DATA #2 : String, 26 bytes
12/15/2015 23:29:15.164410
DATA #3 : String, 26 bytes
12/15/2015 23:29:15.593999

The "mc_*" calls are used to establish RSCT monitoring of the HCAs on the server (Highspeed Communincation Adapter ports). The code calling rocmMemberHCAMonitorStartSessionRegisterAndDispatch, rocmMemberHCAMonitor, states that establishing this RSCT monitoring can encounter timing issues wrt the underlying RSCT infrastructure. This is a transient state. Due to this transient state, DB2 has built in retries in order to ensure the monitoring is established. DB2 will retry up to 30 times to establish the HCA monitoring. With each iteration that does fail, logging will occur as seen in above db2diag.log snip.
Later in db2diag.log we see that DB2 successfully registered this HCA monitoring on the eighth attempt.

DB2 is working as designed for this transient RSCT state.

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11140472