IBM Support

IT29277: IN PURESCALE, WHILE USING I/O DRAWER, DB2 MEMBER MAY GO DOWN WHEN THERE IS A PROBLEM WITH A ROCE PORT

APAR status

  • Closed as program error.

Error description

  • In a pureScale environment that uses an I/O drawer, a problem with
    a RoCE port that is configured for HA may cause one of the Db2
    members to go down.
    
    In this case, the db2diag.log shows the following entries:
    
    2018-09-07-05.27.43.138008+540 I2379A709 LEVEL: Severe
    PID : 15597810 TID : 139862 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
    AUTHID : DB2INST2 HOSTNAME: host21
    EDUID : 139862 EDUNAME: db2agent (MYDB1) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    xport_send: dat_ep_post_rdma_write of the MCB failed:
    0x80040000. EP: 0x1111177d0
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    ...
    2018-09-07-05.27.43.152685+540 I8731A746 LEVEL: Error
    PID : 15597810 TID : 102875 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    HOSTNAME: host21
    EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (MYDB1) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    link_status_write: do_dequeue for link status Buffer FAILED dest
    Address: 0x111b86f68 RKEY = 0x4ee00 len = 4, src Address: 0x
    121146ac LKEY = 0x36700 len = 4 status = 0x80090020, ep =
    0x12114c50
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.154096+540 I10195A6128 LEVEL: Event
    PID : 15597810 TID : 102875 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    HOSTNAME: host21
    EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (MYDB1) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
    MESSAGE : CA RC= 2148073504
    DATA #1 : String, 59 bytes
    Detected broken XI connection, attempt reset operation now.
    DATA #2 : Codepath, 8 bytes
    7:15
    DATA #3 : unsigned integer, 8 bytes
    1
    DATA #4 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
    2
    DATA #5 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
    129
    DATA #6 : String, 49 bytes
    current xi cf-server/member-devname/adapter-index
    DATA #7 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 13
    bytes
    host22-en1
    DATA #8 : SAL Member Device Name,
    PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
    hca0
    DATA #9 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    0
    ...
    2018-09-07-05.27.43.156303+540 I17603A738 LEVEL: Error
    PID : 15597810 TID : 101309 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    HOSTNAME: host21
    EDUID : 101309 EDUNAME: db2LLMn2 (MYDB1) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    link_status_write: do_dequeue for link status Buffer FAILED dest
    Address: 0x111b882e8 RKEY = 0x10500 len = 4, src Address: 0x
    185ac29c LKEY = 0x16800 len = 4 status = 0x80090020, ep =
    0x185bd5d0
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.161216+540 I21396A630 LEVEL: Error
    PID : 15597810 TID : 101309 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    HOSTNAME: host21
    EDUID : 101309 EDUNAME: db2LLMn2 (MYDB1) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    notify_disconnect(close): dat_ep_disconnect failed: 0x80030000,
    EP: 0x1185bd5d0 Token: 0x1a000
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.167388+540 I30439A4907 LEVEL: Event
    PID : 15597810 TID : 102106 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    HOSTNAME: host21
    EDUID : 102106 EDUNAME: db2XInot SCA 2-0 (MYDB1) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
    MESSAGE : CA RC= 2148073504
    DATA #1 : String, 59 bytes
    Detected broken XI connection, attempt reset operation now.
    ...
    2018-09-07-05.27.43.185042+540 E53804A4857 LEVEL: Error
    PID : 15597810 TID : 139862 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : MYDB1
    APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
    AUTHID : DB2INST2 HOSTNAME: host21
    EDUID : 139862 EDUNAME: db2agent (MYDB1) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF,
    SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementQueryKillConnection,
    probe:12678
    MESSAGE : ECF=0x94C6004D=-1798963123
    DATA #1 : CF RC, PD_TYPE_SD_CF_RC, 4 bytes
    2147876941
    
    The stack file shows the following stack of functions:
    
    <StackTrace>
    -------Frame------ ------Function + Offset------
    0x090000000057FF14 pthread_kill + 0xD4
    0x090000000057F764 _p_raise + 0x44
    0x0900000000039E68 raise + 0x48
    0x0900000000056864 abort + 0xC4
    0x0900000004A59CF8 sqloExitEDU + 0x298
    0x0900000004ABE0DC sqle_panic__Fi + 0x71C
    0x090000000534DC54
    SAL_ResetXiConnection__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU +
    0x3D54
    0x090000000B4C985C
    SAL_CheckXiLink__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU + 0xC9C
    0x090000000B4C9CF4 RunEDU__17SAL_XI_RECONN_EDUFv + 0x34
    0x0900000004B5EFA0 EDUDriver__9sqzEDUObjFv + 0x2E0
    0x0900000004A53694 sqloEDUEntry + 0x374
    </StackTrace>
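
The db2diag.log entries above report the same return codes in two notations: the "status = 0x80090020" field is hexadecimal, while "CA RC= 2148073504" and the "CF RC" value 2147876941 are decimal, and the ECF value is printed both as hexadecimal and as a signed 32-bit integer. The following minimal Python sketch shows how the notations relate, which can help when matching entries against each other. The function names to_cf_hex and to_signed32 are hypothetical helpers, not part of Db2, and this APAR does not state which notation db2diag -cfrc expects.

```python
def to_cf_hex(rc: int) -> str:
    """Render a return code as an unsigned 32-bit hexadecimal string."""
    # Masking to 32 bits also maps negative (signed) values to their
    # unsigned equivalent.
    return f"0x{rc & 0xFFFFFFFF:08X}"


def to_signed32(rc: int) -> int:
    """Interpret a 32-bit value as a signed integer (two's complement)."""
    rc &= 0xFFFFFFFF
    return rc - 0x100000000 if rc & 0x80000000 else rc


# Values taken from the log excerpt in this APAR.
print(to_cf_hex(2148073504))    # 0x80090020 (matches "status = 0x80090020")
print(to_cf_hex(2147876941))    # 0x8006004D
print(to_signed32(0x94C6004D))  # -1798963123 (matches the ECF line)
```

Read this way, the "CA RC" reported by SAL_CheckXiLink is the same code as the status in the preceding link_status_write messages.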
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 5 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 5
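
As a rough illustration of the "Db2 11.1 Mod 4 Fixpack 5 or higher" condition, the sketch below compares an installed level against the first fixed level. It assumes the level is already available as a (version, release, modification, fix pack) tuple; how that tuple is obtained is not covered here, and the names Level, FIRST_FIXED and includes_fix are hypothetical. Because this APAR only names the 11.1 stream, the sketch deliberately returns None for any other version or release rather than guessing.

```python
from typing import Optional, Tuple

# (version, release, modification, fix pack)
Level = Tuple[int, int, int, int]

# First fixed level according to this APAR: Db2 11.1 Mod 4 Fixpack 5.
FIRST_FIXED: Level = (11, 1, 4, 5)


def includes_fix(level: Level) -> Optional[bool]:
    """Return True/False for 11.1 levels, None for other streams.

    Other version/release streams are not described in this APAR,
    so no conclusion is drawn for them.
    """
    if level[:2] != FIRST_FIXED[:2]:
        return None
    # Tuples compare element by element: modification first, then fix pack.
    return level >= FIRST_FIXED


print(includes_fix((11, 1, 4, 4)))  # False
print(includes_fix((11, 1, 4, 5)))  # True
print(includes_fix((11, 5, 0, 0)))  # None
```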
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT29277

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-05-28

  • Closed date

    2020-01-16

  • Last modified date

    2022-03-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN

       UP

Document Information

Modified date:
04 May 2022