IBM Support

IT29277: IN PURESCALE, WHILE USING I/O DRAWER, DB2 MEMBER MAY GO DOWN WHEN THERE IS A PROBLEM WITH A ROCE PORT

APAR status

  • Closed as program error.

Error description

  • In a pureScale environment that uses an I/O drawer, a problem
    with a RoCE port that is configured for HA may cause one of the
    Db2 members to go down.
    
    In this case, the db2diag.log shows the following entries:
    
    2018-09-07-05.27.43.138008+540 I2379A709 LEVEL: Severe
    PID : 15597810 TID : 139862 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
    AUTHID : DB2IRS HOSTNAME: host21
    EDUID : 139862 EDUNAME: db2agent (DUIT) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    xport_send: dat_ep_post_rdma_write of the MCB failed:
    0x80040000. EP: 0x1111177d0
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    ...
    2018-09-07-05.27.43.152685+540 I8731A746 LEVEL: Error
    PID : 15597810 TID : 102875 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    HOSTNAME: host21
    EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (DUIT) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    link_status_write: do_dequeue for link status Buffer FAILED dest
    Address: 0x111b86f68 RKEY = 0x4ee00 len = 4, src Address: 0x
    121146ac LKEY = 0x36700 len = 4 status = 0x80090020, ep =
    0x12114c50
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.154096+540 I10195A6128 LEVEL: Event
    PID : 15597810 TID : 102875 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    HOSTNAME: host21
    EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (DUIT) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
    MESSAGE : CA RC= 2148073504
    DATA #1 : String, 59 bytes
    Detected broken XI connection, attempt reset operation now.
    DATA #2 : Codepath, 8 bytes
    7:15
    DATA #3 : unsigned integer, 8 bytes
    1
    DATA #4 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
    2
    DATA #5 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
    129
    DATA #6 : String, 49 bytes
    current xi cf-server/member-devname/adapter-index
    DATA #7 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 13
    bytes
    host22-en1
    DATA #8 : SAL Member Device Name,
    PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
    hca0
    DATA #9 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    0
    ...
    2018-09-07-05.27.43.156303+540 I17603A738 LEVEL: Error
    PID : 15597810 TID : 101309 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    HOSTNAME: host21
    EDUID : 101309 EDUNAME: db2LLMn2 (DUIT) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    link_status_write: do_dequeue for link status Buffer FAILED dest
    Address: 0x111b882e8 RKEY = 0x10500 len = 4, src Address: 0x
    185ac29c LKEY = 0x16800 len = 4 status = 0x80090020, ep =
    0x185bd5d0
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.161216+540 I21396A630 LEVEL: Error
    PID : 15597810 TID : 101309 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    HOSTNAME: host21
    EDUID : 101309 EDUNAME: db2LLMn2 (DUIT) 0
    FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
    DATA #1 : <preformatted>
    notify_disconnect(close): dat_ep_disconnect failed: 0x80030000,
    EP: 0x1185bd5d0 Token: 0x1a000
    DATA #1 : <preformatted>
    If a CF return code is displayed above and you wish to get
    more information then please run the following command:
    db2diag -cfrc <CF_errcode>
    ...
    2018-09-07-05.27.43.167388+540 I30439A4907 LEVEL: Event
    PID : 15597810 TID : 102106 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    HOSTNAME: host21
    EDUID : 102106 EDUNAME: db2XInot SCA 2-0 (DUIT) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
    MESSAGE : CA RC= 2148073504
    DATA #1 : String, 59 bytes
    Detected broken XI connection, attempt reset operation now.
    ...
    2018-09-07-05.27.43.185042+540 E53804A4857 LEVEL: Error
    PID : 15597810 TID : 139862 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : DUIT
    APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
    AUTHID : DB2IRS HOSTNAME: host21
    EDUID : 139862 EDUNAME: db2agent (DUIT) 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF,
    SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementQueryKillConnection,
    probe:12678
    MESSAGE : ECF=0x94C6004D=-1798963123
    DATA #1 : CF RC, PD_TYPE_SD_CF_RC, 4 bytes
    2147876941
    
    The stack file shows the following stack of functions:
    
    <StackTrace>
    -------Frame------ ------Function + Offset------
    0x090000000057FF14 pthread_kill + 0xD4
    0x090000000057F764 _p_raise + 0x44
    0x0900000000039E68 raise + 0x48
    0x0900000000056864 abort + 0xC4
    0x0900000004A59CF8 sqloExitEDU + 0x298
    0x0900000004ABE0DC sqle_panic__Fi + 0x71C
    0x090000000534DC54
    SAL_ResetXiConnection__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU +
    0x3D54
    0x090000000B4C985C
    SAL_CheckXiLink__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU + 0xC9C
    0x090000000B4C9CF4 RunEDU__17SAL_XI_RECONN_EDUFv + 0x34
    0x0900000004B5EFA0 EDUDriver__9sqzEDUObjFv + 0x2E0
    0x0900000004A53694 sqloEDUEntry + 0x374
    </StackTrace>
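The db2diag.log entries above report the same return codes in different notations: decimal in the MESSAGE and CF RC fields, hexadecimal in the preformatted text, and both unsigned hexadecimal and signed decimal in the ECF field. The following Python sketch converts between these notations so that the entries can be cross-referenced. The helper names are hypothetical and are not part of Db2. The source does not state which notation `db2diag -cfrc` expects, so check the command documentation before passing a converted value to it.

```python
def to_hex32(value: int) -> str:
    """Format a return code as an unsigned 32-bit hexadecimal string."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of a value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


if __name__ == "__main__":
    # "CA RC= 2148073504" logged by SAL_CheckXiLink
    print(to_hex32(2148073504))     # 0x80090020
    # "CF RC ... 2147876941" logged by SAL_ManagementQueryKillConnection
    print(to_hex32(2147876941))     # 0x8006004D
    # "ECF=0x94C6004D=-1798963123" from the same entry
    print(to_signed32(0x94C6004D))  # -1798963123
```

The first value, 0x80090020, is the same status that the link_status_write messages report, which ties the "Detected broken XI connection" events to the failed link status writes.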
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 5 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 5
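To check whether an installed level already includes this fix, a version string can be compared with the fix level. The following Python sketch assumes that the level is available as a four-part version string in which Mod 4 Fixpack 5 of Db2 11.1 appears as 11.1.4.5. Both that mapping and the function names are assumptions for illustration and do not come from this APAR. The comparison is restricted to 11.1 levels because this APAR names only the 11.1 fix.

```python
import re

# Assumed four-part form of "Db2 11.1 Mod 4 Fixpack 5"
FIX_LEVEL = (11, 1, 4, 5)


def parse_level(text: str) -> tuple:
    """Extract the first four-part version number found in the text."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no four-part version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def includes_fix(text: str) -> bool:
    """Return True if an 11.1 level is at or above the assumed fix level."""
    level = parse_level(text)
    if level[:2] != FIX_LEVEL[:2]:
        raise ValueError("this sketch compares Db2 11.1 levels only")
    return level >= FIX_LEVEL


if __name__ == "__main__":
    print(includes_fix("DB2 v11.1.4.4"))  # False
    print(includes_fix("DB2 v11.1.4.5"))  # True
```

Tuples compare element by element, so a later mod or fix pack within 11.1 also evaluates as including the fix.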
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT29277

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-05-28

  • Closed date

    2020-01-16

  • Last modified date

    2020-01-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN

       UP

Document Information

Modified date:
16 January 2020