IBM Support

IC68160: ONTAPE -L CORRUPTS SECONDARY

Subscribe to this APAR

By subscribing, you receive periodic emails alerting you to the status of the APAR, along with a link to the fix after it becomes available. You can track this item individually or track all items by product.

Notify me when this APAR changes.

Notify me when an APAR for this component changes.

 

APAR status

  • Closed as program error.

Error description

  • A secondary server needs failure recovery,  and it cannot reach
    the primary server.
    when starting the secondary we get:
    
    14:50:08  Physical Recovery Started at Page (1:8442).
    14:50:08  Physical Recovery Complete: 0 Pages Examined, 0 Pages
    Restored.
    14:50:08  DR: Trying to connect to primary server = ol_pr1150fc6
    14:50:09  DR: Cannot connect to primary server
    14:50:09  DR: Turned off on secondary server
    14:50:09  Dataskip is now OFF for all dbspaces
    14:50:09  Restartable Restore has been ENABLED
    14:50:09  Recovery Mode
    
    
    the secondary hangs there.
    
    now we do an ontape -l  to get it online:
    C:\sqldists\1150fc6>ontape -l
    
    Roll forward should start with log number 111
    Restore will use log backup file
    c:\ltape\chokmah_9_Log0000000111. Press Return to continue ...
    
    Rollforward log file c:\ltape\chokmah_9_Log0000000111 ...
    buc_fe.c : Archive API processing failed at line 908 for msgtype
    Do you want to suspend (y), commit (n) or abort (CTRL/C) logical
    restore?
    
    this will lead to the following:
    14:53:58  10 recovery worker threads will be started.
    14:53:58  Start Logical Recovery - Start Log 111, End Log ?
    14:53:58  Starting Log Position - 111 0x436110
    14:54:02  rsrecvr.c 4585 set EOF
    14:54:02  No logical log restore will be performed.
    14:54:02  Clearing the physical and logical logs has started
    14:54:04  Cleared 103 MB of the physical and logical logs in 2
    seconds
    14:54:05  Physical Recovery Started at Page (1:7142).
    14:54:05  Physical Recovery Complete: 0 Pages Examined, 0 Pages
    Restored.
    14:54:05  Logical Recovery Started.
    14:54:05  10 recovery worker threads will be started.
    14:54:05  Started processing open transactions on secondary
    during startup
    14:54:06  Finished processing open transactions on secondary
    during startup.
    14:54:06  DR: HDR secondary server operational
    14:54:06  Logical Recovery has reached the transaction cleanup
    phase.
    14:54:06  Logical Recovery Complete.
       13 Committed, 1 Rolled Back, 0 Open, 0 Bad Locks
    
    14:54:06  Begin recreating indexes deferred during recovery.
    14:54:06  Bringing system to Quiescent Mode with no Logical
    Restore.
    
    14:54:07  Quiescent Mode
    14:54:07  Completed recreating indexes.
    14:54:08  Checkpoint Completed:  duration was 0 seconds.
    14:54:08  Tue Feb 23 - loguniq 111, logpos 0x453018, timestamp:
    0x1ae7cb2 Interval: 2964
    
    14:54:08  Maximum server connections 1
    14:54:08  Checkpoint Statistics - Avg. Txn Block Time 0.000, #
    Txns blocked 1, Plog used
    
    ontape will bring the engine to quiescent mode
    if the primary comes online again, the secondary states:
    
    15:49:32  DR: Secondary server connected
    15:49:32  Updates from secondary allowed
    15:49:32  Updates from secondary allowed
    15:49:32  DR: Secondary server needs failure recovery
    
    15:49:35  DR: Start failure recovery from tape ...
    15:49:40  DR: Multi-User mode
    15:49:41  Recovery Mode
    15:49:44  Updates from secondary currently not allowed
    15:49:44  Updates from secondary currently not allowed
    15:49:44  Updates from secondary currently not allowed
    15:49:44  Updates from secondary currently not allowed
    
    
    if you do an ontape -l the following will happen:
    
    15:55:30  DR: Start failure recovery from tape ...
    15:55:31  Logical Recovery Started.
    15:55:31  10 recovery worker threads will be started.
    15:55:31  Start Logical Recovery - Start Log 111, End Log ?
    15:55:31  Starting Log Position - 111 0x453018
    15:55:34  Requested shared memory segment size rounded from
    532596KB to 532608KB
    15:55:34  Dynamically allocated new virtual shared memory
    segment (size 532608KB)
    15:55:34  Memory sizes:resident:47616 KB, virtual:565312 KB, no
    SHMTOTAL limit
    15:55:34  Checkpoint Record not Found in Logical Log.
    15:55:34  Assert Failed: Invalid Mutex Type
    15:55:34  IBM Informix Dynamic Server Version 11.50.FC6
    15:55:34   Who: Session(15, informix@HDR1, 1540,
    0000000000000000)
                    Thread(38, ontape, 0, 1)
                    File: mt.c Line: 8617
    15:55:34  stack trace for pid 1568 written to
    C:\sqldists\1150fc6\tmp\af.40eec66
    15:55:42   See Also: C:\sqldists\1150fc6\tmp\af.40eec66,
    shmem.40eec66.0
    15:55:43  Updates from secondary currently not allowed
    15:55:43  Updates from secondary currently not allowed
    15:55:43  Updates from secondary currently not allowed
    15:55:43  Updates from secondary currently not allowed
    
    this is the stack of the af file:
    
    15:55:34  Stack for thread: 38 ontape
    
     base: 0x0000000083870000
      len:   69632
       pc: 0x0000000140b58cd4
      tos: 0x000000008387edf0
    state: running
       vp: 1
    
    0x0000000140B59C1A (oninit)afhandler
    0x0000000140B5A4AF (oninit)afcrash_interface
    0x0000000140B4A18C (oninit)mt_lock
    0x000000014057EF88 (oninit)_igetstreamQ
    0x000000014070F6C2 (oninit)rswrite_lgr
    0x0000000140288A5F (oninit)tbj_log_write_restore
    0x0000000140061228 (oninit)sqmain
    0x0000000140BD0D20 (oninit)listen_verify
    0x0000000140BD220E (oninit)spawn_thread
    0x0000000140B8616A (oninit)startup
    IFX_ONTAPE_FILE_PRE
    
    and the ontape will also fail with an error message
    C:\Windows\system32>ontape -l
    
    Roll forward should start with log number 111
    Restore will use log backup file
    c:\ltape\chokmah_9_Log0000000111. Press Return to continue ...
    
    Rollforward log file c:\ltape\chokmah_9_Log0000000111 ...
    buc_fe.c : Archive API processing failed at line 908 for msgtype
    Do you want to suspend (y), commit (n) or abort (CTRL/C) logical
    restore?
    
    Do you want to suspend (y), commit (n) or abort (CTRL/C) logical
    restore?
    y
    
    Program over.
    
    
    the secondary is now corrupted, it cannot even be patched, it
    needs to be re-initialized.
    
    ontape should check whether or not it is running on a secondary
    and not do the cleaning of
    logical logs and going to quiescent mode.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of HDR environment.                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * Performing log restore (ontape -l, or via onbar) on a HDR    *
    * secondary server when the primary server is unavailable can  *
    * cause the HDR pair to fail synchronization when the primary  *
    * is back online.                                              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to 11.50.xC9 when available.                         *
    ****************************************************************
    

Problem conclusion

  • Problem fixed in 11.50.xC9.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC68160

  • Reported component name

    IBM IDS ENTRP E

  • Reported component ID

    5724L2304

  • Reported release

    B15

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-04-25

  • Closed date

    2011-09-27

  • Last modified date

    2011-09-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM IDS ENTRP E

  • Fixed component ID

    5724L2304

Applicable component levels

  • RB15 PSY

       UP

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSGU8G","label":"Informix Servers"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"B15","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
27 September 2011