IBM Support

IC95562: EVEN AFTER FIX OF APAR IC79798 UPDATABLE HDR SECONDARIES CAN GET WHERE THEY NEVER GET OUT OF UPDATES BLOCKED STATE


APAR status

  • Closed as program error.

Error description

  • The fix for IC79798 on updatable HDR secondaries tied the SMX
    timeouts and the HDR ping timeouts together internally. Once an
    HDR pair is up and operational with the fix for IC79798, the
    servers' HDR ping timeouts and SMX thread timeouts should be in
    sync. However, during the reconnect phase of HDR (when the
    primary is shipping log files to the secondary to bring it up to
    the current position), the primary server does not send pings,
    so no ping timeout can occur. If a temporary network glitch
    happens in this phase, HDR stays connected, log shipping
    finishes, and the servers catch up and enter an operational
    state (at which point HDR pinging starts). But if the glitch
    lasts long enough that the servers would have ping timed out
    (had the primary been pinging), the SMX timeout can still
    occur. If it does, the updatable secondary is then stuck in its
    updates blocked state.
    
    This is what you would see in the MSGPATH files on the primary
    and secondary servers:
    
    Primary:
    
    08:04:48  DR: Primary server connected
    08:04:48  DR: Using default behavior of failure-recovering
    Secondary server
    
    08:04:50  DR: Sending log 4, size 2500 pages, 100.00 percent
    used
    08:04:51  DR: Sending log 5, size 2500 pages, 100.00 percent
    used
    08:04:53  DR: Sending log 6, size 2500 pages, 100.00 percent
    used
    08:04:55  DR: Sending log 7, size 2500 pages, 100.00 percent
    used
    08:04:57  DR: Sending log 8, size 2500 pages, 100.00 percent
    used
    08:07:00  SMX thread is exiting
    08:07:00  SMX thread is exiting
    08:07:01  DR: Sending log 9, size 2500 pages, 100.00 percent
    used
    08:07:02  DR: Sending log 10, size 2500 pages, 100.00 percent
    used
    08:07:05  Logical Log 11 Complete, timestamp: 0x7f90f.
    08:07:05  DR: Sending log 11, size 2500 pages, 30.40 percent
    used
    08:07:05  DR: Sending log 12 (current), size 2500 pages, 0.16
    percent used
    08:07:07  DR: Sending Logical Logs Completed
    08:07:08  DR: Primary server operational
    
    The "SMX thread is exiting" messages appear at 08:07:00. The
    network issue began after 08:04:57 and resolved itself at
    08:07:01 (when the next "Sending log" message appears).
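
The pause in log shipping described above can be located mechanically. The following is a minimal sketch, not an IBM-provided tool: the function name `largest_send_gap` is hypothetical, and it assumes each message starts with an HH:MM:SS timestamp, that all lines come from the same day, and that wrapped continuation lines (such as "used") carry no timestamp.

```python
import re
from datetime import datetime

# Matches the start of a "DR: Sending log N" message in the primary's log.
# "DR: Sending Logical Logs Completed" is deliberately not matched.
SENDING = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+DR: Sending log (\d+)")


def largest_send_gap(lines):
    """Return (gap_seconds, log_before, log_after) for the longest pause
    between consecutive "DR: Sending log" messages, or None if fewer
    than two such messages are present."""
    prev = None   # (timestamp, log number) of the previous match
    best = None   # longest gap seen so far
    for line in lines:
        m = SENDING.match(line.strip())
        if not m:
            continue  # other messages and wrapped continuation lines
        ts = datetime.strptime(m.group(1), "%H:%M:%S")
        log_no = int(m.group(2))
        if prev is not None:
            gap = int((ts - prev[0]).total_seconds())
            if best is None or gap > best[0]:
                best = (gap, prev[1], log_no)
        prev = (ts, log_no)
    return best


primary_log = [
    "08:04:55  DR: Sending log 7, size 2500 pages, 100.00 percent",
    "used",
    "08:04:57  DR: Sending log 8, size 2500 pages, 100.00 percent",
    "used",
    "08:07:00  SMX thread is exiting",
    "08:07:01  DR: Sending log 9, size 2500 pages, 100.00 percent",
    "used",
]
print(largest_send_gap(primary_log))  # (124, 8, 9)
```

For the excerpt above, the longest pause is the 124 seconds between log 8 and log 9, which matches the window in which the network issue occurred.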
    
    Secondary:
    08:04:47  DR: Secondary server connected
    ...
    08:04:49  DR: Failure recovery from disk in progress ...
    08:04:49  Logical Recovery Started.
    08:04:49  17 recovery worker threads will be started.
    08:04:49  Start Logical Recovery - Start Log 4, End Log ?
    08:04:49  Starting Log Position - 4 0x1e018
    08:04:50  Started processing open transactions on secondary
    during startup
    08:04:50  Finished processing open transactions on secondary
    during startup.
    08:04:50  DR: HDR secondary server operational
    08:04:50  Memory sizes:resident:217760 KB, virtual:57232 KB, no
    SHMTOTAL limit
    08:04:52  Logical Log 4 Complete, timestamp: 0x35e48.
    08:04:54  Logical Log 5 Complete, timestamp: 0x42e3e.
    08:04:56  Logical Log 6 Complete, timestamp: 0x4e8b3.
    08:04:58  Logical Log 7 Complete, timestamp: 0x599f9.
    08:06:00  SMX thread is exiting because the timeout period of 10
    seconds has elapsed. Use
    the IFX_SMX_TIMEOUT environment variable to set the timeout
    period.
    08:06:02  SMX thread is exiting because the timeout period of 10
    seconds has elapsed. Use
    the IFX_SMX_TIMEOUT environment variable to set the timeout
    period.
    08:07:00  Updates from secondary currently not allowed
    08:07:00  Updates from secondary currently not allowed
    08:07:01  Logical Log 8 Complete, timestamp: 0x64b5a.
    08:07:02  Logical Log 9 Complete, timestamp: 0x7181c.
    08:07:05  Logical Log 10 Complete, timestamp: 0x7e02e.
    08:07:06  Logical Log 11 Complete, timestamp: 0x7f940.
    08:07:08  B-tree scanners disabled.
    08:07:10  Checkpoint Completed:  duration was 0 seconds.
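
The two secondary-side symptoms in this excerpt (the SMX timeout message and the "Updates from secondary currently not allowed" message) can be counted with a short script. This is a minimal sketch under stated assumptions: the function name `summarize_secondary_log` is hypothetical, the message wording is taken from the excerpt above, and finding these messages only suggests the condition described here; it does not prove that the secondary is stuck.

```python
import re

# Message wording as it appears in the excerpt above.
SMX_TIMEOUT = re.compile(
    r"SMX thread is exiting because the timeout period of (\d+) seconds"
)
BLOCKED = "Updates from secondary currently not allowed"


def summarize_secondary_log(text):
    """Count SMX timeout and updates-blocked messages in log text.

    Whitespace is collapsed first so that messages wrapped across
    lines, as in the excerpt, are still recognized."""
    flat = " ".join(text.split())
    timeouts = [int(n) for n in SMX_TIMEOUT.findall(flat)]
    return {
        "smx_timeout_count": len(timeouts),
        "smx_timeout_seconds": sorted(set(timeouts)),
        "updates_blocked_count": flat.count(BLOCKED),
    }


secondary_log = """
08:06:00  SMX thread is exiting because the timeout period of 10
seconds has elapsed. Use
the IFX_SMX_TIMEOUT environment variable to set the timeout
period.
08:07:00  Updates from secondary currently not allowed
08:07:01  Logical Log 8 Complete, timestamp: 0x64b5a.
"""
print(summarize_secondary_log(secondary_log))
```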
    
    DRTIMEOUT was set to 10, so if the primary server had been
    pinging, it would have timed out after 40 seconds of
    interruption. Instead, the interruption went on for ~2 minutes
    (120 seconds) without an HDR ping timeout, because the primary
    does not send pings until after it logs its "DR: Primary server
    operational" message to MSGPATH.
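
The timeout arithmetic above can be written out as a small sketch. The multiplier of 4 is an assumption inferred from the figures in this description (a DRTIMEOUT of 10 giving a 40-second ping timeout), and both function names are hypothetical.

```python
def hdr_ping_timeout_seconds(drtimeout, multiplier=4):
    """Seconds of interruption after which an HDR ping timeout would
    occur. The multiplier of 4 is an assumption inferred from the
    description (DRTIMEOUT 10 -> 40 seconds)."""
    return drtimeout * multiplier


def exceeds_ping_timeout(outage_seconds, drtimeout, multiplier=4):
    """True if an outage lasted longer than the HDR ping timeout that
    would have applied had the primary been pinging."""
    return outage_seconds > hdr_ping_timeout_seconds(drtimeout, multiplier)


print(hdr_ping_timeout_seconds(10))   # 40
print(exceeds_ping_timeout(120, 10))  # True
```

With the values in this report, the outage of about 120 seconds is three times the 40-second ping timeout, so a pinging primary would have dropped the connection well before log shipping resumed.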
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of a HDR with UPDATABLE_SECONDARY > 0 in certain cases *
    * when temporary network interruptions occur.                  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Update to IDS-11.70.xC8                                      *
    ****************************************************************
    

Problem conclusion

  • Problem Fixed In IDS-11.70.xC8
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC95562

  • Reported component name

    INFORMIX SERVER

  • Reported component ID

    5725A3900

  • Reported release

    B70

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2013-08-28

  • Closed date

    2014-02-26

  • Last modified date

    2024-09-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFORMIX SERVER

  • Fixed component ID

    5725A3900

Applicable component levels


Document Information

Modified date:
24 September 2024