IBM Support

PH58963: INTEGRATED SYNCHRONIZATION HANGS DUE TO A 'JAVA HEAP OUT OF MEMORY' SITUATION OF THE REPLICATION PROCESS


APAR status

  • Closed as documentation error.

Error description

  • A very high workload (for example, a very long-running query) can
    impact the processing of Integrated Synchronization: replication
    latency can increase, and the Java heap memory available to
    the replication process can be consumed entirely.
    In such an 'out of memory' situation of the JVM (Java Virtual
    Machine), the replication (instance) process does not shut down
    completely. As a result, the replication instance hangs and no
    longer responds.
    The DSNX881I 3003 message can indicate a hanging
    replication instance (message details: "The replication status for
    DB2 location (subscription ) is missing, check that the
    replication capture agent: is running, has valid credentials, is
    attached to DB2 and is reachable under <> from the Accelerator
    network.")
    
    When restarting the replication instance, the following
    information is seen:
    "The Insync process terminated due to Java heap storage
    shortage:
    2023-08-31 23:45:40 SEVERE
    dwa.replication.apply.utilities.TerminatingUncaughtExceptionHandler
    uncaughtException: Thread TransactionAgeWarnThread threw an
    exception. Triggering instance restart.
    java.lang.OutOfMemoryError: Java heap space"
    
    The behavior of Integrated Synchronization in JVM 'out of
    memory' situations will be improved with Accelerator
    maintenance level 7.5.13.
    
    Additional keywords:
    TS014054011 Insync JVM OutOfMemoryError DSNX881I 3003 hang
    GH../Everest/Customer-Cases/issues/610
    DT257688
    

Local fix

Problem summary

  • Problem Summary:
    When the Java Virtual Machine (JVM) runs out of memory, the
    replication (instance) process does not shut down completely.
    As a result, the replication instance hangs and no longer
    responds.
    
    Users Affected:
    Users of Integrated Synchronization
    
    Problem Scenario:
    If Integrated Synchronization is processing an extremely
    heavy workload, or a large number of open transactions are being
    tracked, the Java heap memory available to the replication
    process can be consumed entirely. When this happens, the
    replication instance hangs and no longer responds.
    
    Problem Symptoms:
    The DSNX881I 3003 message can indicate a hanging
    replication instance (message details: "The replication status for
    DB2 location (subscription ) is missing, check that the
    replication capture agent: is running, has valid credentials, is
    attached to DB2 and is reachable under <> from the Accelerator
    network.")
    
    When restarting the replication instance, the following
    information is seen:
    "The Insync process terminated due to Java heap storage
    shortage:
    2023-08-31 23:45:40 SEVERE
    dwa.replication.apply.utilities.TerminatingUncaughtExceptionHandler
    uncaughtException: Thread TransactionAgeWarnThread threw an
    exception. Triggering instance restart.
    java.lang.OutOfMemoryError: Java heap space".
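
When reviewing a replication log for this symptom, a plain text filter on the error line is usually enough. The following is a minimal sketch under stated assumptions: the class name OomLogCheck, the method countHeapErrors, and the idea of passing the log as a list of lines are hypothetical and chosen for this example. Only the marker string is taken from the log excerpt above.

```java
import java.util.List;

/** Hypothetical helper: counts heap-exhaustion errors in log lines. */
final class OomLogCheck {
    /** Marker text as it appears in the log excerpt of this APAR. */
    static final String OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space";

    private OomLogCheck() {
        // utility class, not meant to be instantiated
    }

    /** Returns how many of the given lines contain the heap-space error. */
    static long countHeapErrors(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(OOM_MARKER))
                .count();
    }
}
```

A non-zero count suggests that the instance restart was triggered by heap exhaustion rather than by another failure.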
    

Problem conclusion

  • The issue has been fixed with Accelerator maintenance level
    7.5.13.
    
    A new JVM parameter ensures that the JVM shuts down on the
    very first occurrence of the out-of-memory error. As a result,
    the internal synchronization process can no longer get stuck in
    an unresponsive state.
    
    Upgrade your Accelerator maintenance level accordingly.
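
This APAR does not name the new JVM parameter. Purely as an illustration of the fail-fast idea, the sketch below shows an uncaught-exception handler that ends the process as soon as an OutOfMemoryError reaches it. Everything in it is an assumption made for this example: the class name FailFastOomHandler, the exit code 3, and the injectable exit action are hypothetical and are not taken from the Accelerator code.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch: end the process on the first OutOfMemoryError. */
final class FailFastOomHandler implements Thread.UncaughtExceptionHandler {
    /** Exit code chosen for this example only. */
    static final int OOM_EXIT_CODE = 3;

    private final IntConsumer exitAction;

    /** Default: halt the JVM immediately, without running shutdown hooks. */
    FailFastOomHandler() {
        this(code -> Runtime.getRuntime().halt(code));
    }

    /** Lets the caller (for example a test) decide what "exit" means. */
    FailFastOomHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    /** Returns true if t or one of its causes is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        // The depth limit guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOutOfMemory(error)) {
            // Fail fast instead of leaving a half-alive process behind.
            exitAction.accept(OOM_EXIT_CODE);
        }
        // Other errors are left to the application's own handling.
    }
}
```

A handler like this would be registered with Thread.setDefaultUncaughtExceptionHandler(new FailFastOomHandler()). It only sees errors that propagate out of a thread, which is one reason a JVM-level option is the more robust choice. HotSpot-based JVMs, for example, offer -XX:+ExitOnOutOfMemoryError. Whether the fix uses that option or a different one is not stated here.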
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH58963

  • Reported component name

    ANYTCS ACCLTR Z

  • Reported component ID

    5697DA700

  • Reported release

    750

  • Status

    CLOSED DOC

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2023-12-31

  • Closed date

    2024-08-26

  • Last modified date

    2024-10-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
10 October 2024