IBM Support

IC65284: RECOVERY ERRORS ON SDS CLONE CAUSED BY DISK FLUSHER ACTIVITY ON PRIMARY


APAR status

  • Closed as program error.

Error description

  • On SDS secondary servers, several different recovery errors can
    appear sporadically if disk flushers run on the primary between
    checkpoints. Most, but probably not all, of these problems are also
    related to earlier B-tree scanner activity on the primary.
    
    Some of the possible symptoms:
    
    15:41:04  Assert Warning: Error during recovery left index inconsistent.
    15:41:04  IBM Informix Dynamic Server Version 11.50.FC5
    15:41:04   Who: Session(19, informix@sirius, 0, 10a9da050)
                    Thread(47, xchg_1.1, 10a9a1988, 5)
                    File: rskey.c Line: 1655
    15:41:04   Results: Index 'd:"michaelm".t#ix_c' is now unusable
    15:41:04   Action: Run 'oncheck -cI d:"michaelm".t#ix_c'
    15:41:04  stack trace for pid 29595 written to /chunks/michaelm/af/af.4179480
    15:41:05   See Also: /chunks/michaelm/af/af.4179480
    15:41:05  Error during recovery left index inconsistent.
    15:41:05  Rollforward of log record failed. iserrno = 105
    15:41:05  Log Record: log = 47, pos = 0x23100f4, type = OLDRSAM:ADDITEM(28), trans = 22
    15:41:05  Rollforward of log record failed. iserrno = 105
    15:41:05  Log Record: log = 47, pos = 0x23100f4, type = OLDRSAM:ADDITEM(28), trans = 22
    
    15:41:04  Stack for thread: 47 xchg_1.1
    
     base: 0x000000010bef1000
      len:   69632
       pc: 0x0000000100dd0e40
      tos: 0x000000010beff8e1
    state: running
       vp: 5
    
     ucontext: <NULL>
      siginfo: <NULL>
    
    oninit :: afstack + 0x5c sp=0x10bf000e0(...)
    oninit :: afhandler + 0xd68 sp=0x10bf005f0 delta_sp=1296(...)
    oninit :: afwarn_interface + 0x3c sp=0x10bf00d70 delta_sp=1920(...)
    oninit :: kybad + 0xbf8 sp=0x10bf00e30 delta_sp=192(...)
    oninit :: doadditem + 0x9d8 sp=0x10bf012a0 delta_sp=1136(...)
    oninit :: plogredo + 0x185c sp=0x10bf01760 delta_sp=1216(...)
    oninit :: rlogm_redo + 0x284 sp=0x10bf019c0 delta_sp=608(...)
    oninit :: next_recvr + 0x96c sp=0x10bf01a90 delta_sp=208(...)
    oninit :: prod_loop2 + 0x3c sp=0x10bf01c40 delta_sp=432(...)
    oninit :: producer_thread + 0x2b0 sp=0x10bf01d00 delta_sp=192(...)
    oninit :: startup + 0xa8 sp=0x10bf01e50 delta_sp=336(...)
    
    16:01:37  Assert Failed: Page Check Error in dobtmerge:bad right btree node
    16:01:37  IBM Informix Dynamic Server Version 11.50.FC5
    16:01:37   Who: Session(19, informix@sirius, 0, 10a9e8dd0)
                    Thread(53, xchg_1.7, 10a9a4b68, 9)
                    File: rsdebug.c Line: 1105
    16:01:37   Results: Possible inconsistencies in index 'd:"michaelm".t012#ix_t012_c'
    16:01:37   Action: Run 'oncheck -cI d:"michaelm".t012#ix_t012_c'
    16:01:37  stack trace for pid 690 written to /chunks/michaelm/af/af.41d9951
    16:01:37   See Also: /chunks/michaelm/af/af.41d9951
    16:01:44  Page Check Error in dobtmerge:bad right btree node
    16:01:44  Assert Warning: Error during recovery left index inconsistent.
    16:01:44  IBM Informix Dynamic Server Version 11.50.FC5
    16:01:44   Who: Session(19, informix@sirius, 0, 10a9e8dd0)
                    Thread(53, xchg_1.7, 10a9a4b68, 9)
                    File: rskey.c Line: 1655
    16:01:44   Results: Index 'd:"michaelm".t012#ix_t012_c' is now unusable
    16:01:44   Action: Run 'oncheck -cI d:"michaelm".t012#ix_t012_c'
    16:01:44  stack trace for pid 690 written to /chunks/michaelm/af/af.41d9951
    16:01:45   See Also: /chunks/michaelm/af/af.41d9951
    16:01:45  Error during recovery left index inconsistent.
    16:01:46  Rollforward of log record failed. iserrno = 105
    16:01:46  Log Record: log = 50, pos = 0x497e788, type = OLDRSAM:BTMERGE(59), trans = 19
    16:01:46  Rollforward of log record failed. iserrno = 105
    16:01:46  Log Record: log = 50, pos = 0x497e788, type = OLDRSAM:BTMERGE(59), trans = 19
    
    16:01:37  Stack for thread: 53 xchg_1.7
    
     base: 0x000000010bf89000
      len:   69632
       pc: 0x0000000100dd0e40
      tos: 0x000000010bf97c51
    state: running
       vp: 9
    
     ucontext: <NULL>
      siginfo: <NULL>
    
    oninit :: afstack + 0x5c sp=0x10bf98450(...)
    oninit :: afhandler + 0xd68 sp=0x10bf98960 delta_sp=1296(...)
    oninit :: affail_interface + 0x3c sp=0x10bf990e0 delta_sp=1920(...)
    oninit :: bffail + 0x6b4 sp=0x10bf991a0 delta_sp=192(...)
    oninit :: dobtmerge + 0x5bc sp=0x10bf99250 delta_sp=176(...)
    oninit :: plogredo + 0x1d78 sp=0x10bf99760 delta_sp=1296(...)
    oninit :: rlogm_redo + 0x284 sp=0x10bf999c0 delta_sp=608(...)
    oninit :: next_recvr + 0x96c sp=0x10bf99a90 delta_sp=208(...)
    oninit :: prod_loop2 + 0x3c sp=0x10bf99c40 delta_sp=432(...)
    oninit :: producer_thread + 0x2b0 sp=0x10bf99d00 delta_sp=192(...)
    oninit :: startup + 0xa8 sp=0x10bf99e50 delta_sp=336(...)
    
    This defect is a possible cause of any recovery error on an SDS
    clone that occurs while there is flusher activity on the primary.
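When checking whether a recovery failure on an SDS clone matches this APAR, it can help to list the distinct log records that failed to roll forward. The following is a minimal Python sketch, assuming each "Log Record:" line sits on a single line in the message log in the format shown in the excerpts above. The names LOG_RECORD and failed_log_records are hypothetical, and the regular expression covers only the fields that appear in these excerpts.

```python
import re

# Matches one "Log Record:" line (assumed to be unwrapped), e.g.
# Log Record: log = 47, pos = 0x23100f4, type = OLDRSAM:ADDITEM(28), trans = 22
LOG_RECORD = re.compile(
    r"Log Record: log = (?P<log>\d+), pos = (?P<pos>0x[0-9a-fA-F]+), "
    r"type = (?P<type>\w+:\w+)\((?P<code>\d+)\), trans = (?P<trans>\d+)"
)


def failed_log_records(message_log: str) -> list[dict]:
    """Return distinct failed log records, in the order they first appear."""
    seen = set()
    records = []
    for m in LOG_RECORD.finditer(message_log):
        key = (int(m["log"]), int(m["pos"], 16))
        if key in seen:
            # In the excerpts above, each failure is reported twice.
            continue
        seen.add(key)
        records.append({
            "log": key[0],
            "pos": key[1],
            "type": m["type"],
            "code": int(m["code"]),
            "trans": int(m["trans"]),
        })
    return records


if __name__ == "__main__":
    # Lines taken from the excerpts in this APAR.
    sample = (
        "15:41:05  Log Record: log = 47, pos = 0x23100f4, type = OLDRSAM:ADDITEM(28), trans = 22\n"
        "15:41:05  Log Record: log = 47, pos = 0x23100f4, type = OLDRSAM:ADDITEM(28), trans = 22\n"
        "16:01:46  Log Record: log = 50, pos = 0x497e788, type = OLDRSAM:BTMERGE(59), trans = 19\n"
    )
    for rec in failed_log_records(sample):
        print(rec["log"], hex(rec["pos"]), rec["type"], rec["trans"])
```

Keying on (log, pos) rather than on the whole line keeps the result stable even if timestamps differ between the repeated reports.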
    

Local fix

  • Configure the LRU and checkpoint parameters so that no flusher
    activity occurs on the primary (check the LRU Writes column of
    onstat -F).
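To confirm that the local fix is effective, one approach is to capture onstat -F on the primary twice, some time apart, and compare the LRU Writes counter. The Python sketch below performs that comparison on captured text. It assumes a layout (a header line with column names separated by two or more spaces, followed by a line of integer values in the same order); the function names lru_writes and flushers_ran and the sample strings are hypothetical, so check the parsing against output from your own server.

```python
import re

# Assumed layout of the relevant part of `onstat -F` output: a header line
# whose column names are separated by two or more spaces, followed by one
# line of integer counters in the same column order.


def lru_writes(onstat_f_output: str) -> int:
    """Return the LRU Writes counter from captured `onstat -F` text."""
    lines = onstat_f_output.splitlines()
    for i, line in enumerate(lines):
        if "LRU Writes" in line:
            headers = re.split(r"\s{2,}", line.strip())
            values = lines[i + 1].split()
            return int(values[headers.index("LRU Writes")])
    raise ValueError("no 'LRU Writes' column found in onstat -F output")


def flushers_ran(before: str, after: str) -> bool:
    """True if the LRU Writes counter grew between two samples.

    If statistics are zeroed between the samples (for example with
    onstat -z), the comparison is meaningless.
    """
    return lru_writes(after) > lru_writes(before)


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    before = "Fg Writes     LRU Writes    Chunk Writes\n0             0             512\n"
    after = "Fg Writes     LRU Writes    Chunk Writes\n0             37            640\n"
    print(lru_writes(after), flushers_ran(before, after))
```

Searching for the header line rather than a fixed line number lets the parser skip the banner line that onstat prints first.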
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All users on SDS                                             *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * Log roll forward errors on Shared Disk Secondary nodes when  *
    * there are moderate to high levels of index updates.          *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to 11.50.xC7 or later.                               *
    ****************************************************************
    

Problem conclusion

  • Problem first fixed in 11.50.xC7.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC65284

  • Reported component name

    IBM IDS ENTRP E

  • Reported component ID

    5724L2304

  • Reported release

    B15

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2009-12-22

  • Closed date

    2010-10-01

  • Last modified date

    2010-10-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM IDS ENTRP E

  • Fixed component ID

    5724L2304

Applicable component levels

  • RB15 PSY UP


Document Information

Modified date:
01 October 2010