IBM Support

PH25073: JAVA I/O ENGINE OUT OF MEMORY ISSUES CAN CAUSE SYSTEM INSTABILITY (SQL0973N, SQL5199N RC 1, SQL5199N RC 4)


APAR status

  • Closed as fixed if next.

Error description

  • In Big SQL, when an out of memory error occurs in the Java I/O
    engine, the Java DFSIO FMP is marked unstable to prevent
    additional requests from being sent to that Java process, and a
    new Java DFSIO FMP is created for new requests. In most cases,
    the outstanding work being handled by the Java process that
    detected the out of memory condition completes successfully,
    which is the desired behavior. In some cases, however, the out
    of memory condition leaves the Java process in a state where
    communication with it is no longer possible, which prevents the
    corresponding Java DFSIO FMP from terminating successfully. If
    the Java I/O engine continues to hit these severe out of memory
    conditions, multiple Java DFSIO FMPs can end up in an unstable
    state, and all requests then fail with SQL5199N, reason code 4,
    indicating that a 3rd Java DFSIO FMP will not be created. This
    generally indicates that the memory allocated to Big SQL
    (INSTANCE_MEMORY) is not sufficient for the customer's current
    workload, that a rogue query is causing the OOM, or that there
    was some other spike in the workload.
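
The lifecycle described above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Big SQL's implementation: the names FmpPool, Fmp, Sql5199Error and MAX_UNSTABLE_FMPS are hypothetical, and the model starts the replacement FMP lazily on the next request, whereas the description says Big SQL starts it immediately.

```python
from dataclasses import dataclass

# Assumption taken from the description: once two FMPs are unstable,
# a 3rd Java DFSIO FMP is not created.
MAX_UNSTABLE_FMPS = 2


class Sql5199Error(Exception):
    """Hypothetical stand-in for an SQL5199N failure."""

    def __init__(self, reason_code):
        super().__init__(f"SQL5199N reason code {reason_code}")
        self.reason_code = reason_code


@dataclass
class Fmp:
    fmp_id: int
    unstable: bool = False
    in_flight: int = 0


class FmpPool:
    """Toy model: one FMP accepts new work, unstable ones only drain."""

    def __init__(self):
        self._next_id = 1
        self.active = None
        self.unstable = []

    def _start_fmp(self):
        if len(self.unstable) >= MAX_UNSTABLE_FMPS:
            raise Sql5199Error(4)  # a further FMP is not allowed
        fmp = Fmp(self._next_id)
        self._next_id += 1
        return fmp

    def acquire(self):
        """Assign a new request to the FMP that accepts new work."""
        if self.active is None:
            self.active = self._start_fmp()
        self.active.in_flight += 1
        return self.active

    def report_out_of_memory(self, fmp):
        """Mark the FMP unstable so it receives no new requests."""
        if not fmp.unstable:
            fmp.unstable = True
            self.unstable.append(fmp)
            if self.active is fmp:
                self.active = None

    def release(self, fmp):
        """Finish one request; a drained unstable FMP terminates cleanly."""
        fmp.in_flight -= 1
        if fmp.unstable and fmp.in_flight == 0:
            self.unstable.remove(fmp)
```

In this model, an unstable FMP whose in-flight request never completes (the hung Java process case) stays in the unstable list forever, so two such FMPs are enough to make every later acquire() call fail with reason code 4.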
    
    In these cases, the user application may see the following types
    of errors:
    
    (1) SQL0973N with a heap name token of BigSQL IO:
    This indicates that the Java process has encountered an out of
    memory error. Big SQL attempts to be resilient to this error: it
    marks the corresponding Java DFSIO FMP unstable to prevent more
    work from being assigned to that process, but leaves the process
    up and running so that in-flight work being done by that Java
    process has a chance to complete successfully. Big SQL
    immediately starts another Java DFSIO FMP and corresponding Java
    process for new work. In most cases, the in-flight work
    completes successfully and the unstable Java DFSIO FMP and Java
    process terminate cleanly. The SQL0973N only indicates that the
    Java I/O engine process experienced an out of memory condition;
    it does not indicate an issue with Big SQL. It could be an
    indication of a heavy workload, too much concurrency, or a
    poorly written SQL statement.
    
    The following error will be returned to the application:

    SQL0973N Not enough storage is available in the "BigSQL IO" heap
    or stack to process the statement. SQLSTATE=57011
    
    The following will be logged in the db2diag.log:

    2020-04-29-14.26.20.744456-420 I49008E783 LEVEL: Severe
    PID : 18400 TID : 140426844530432 PROC : db2fmp (
    INSTANCE: bigsql NODE : 002
    HOSTNAME: 54kam4.fyre.ibm.com
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerFmpGenerateCommonReply, probe:896
    MESSAGE : A Java error occurred. The FMP will be marked unstable.
    DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
     sqlcaid : SQLCA sqlcabc: 136 sqlcode: -973 sqlerrml: 9
     sqlerrmc: BigSQL IO
     sqlerrp : SQLER000
     sqlerrd : (1) 0x822401EF (2) 0x00000000 (3) 0x00000000
               (4) 0x00000000 (5) 0x00000000 (6) 0x00000002
     sqlwarn : (1) (2) (3) (4) (5) (6)
               (7) (8) (9) (10) (11)
     sqlstate: 57011
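
When checking whether a system has hit this condition, it can help to scan db2diag.log for records like the one above. The following is a minimal sketch, assuming each record starts with a timestamp line and that each field (FUNCTION, MESSAGE, the SQLCA values) sits on its own line, as in the records shown in this description. The helper names split_records and summarize are hypothetical, and the sample text is abbreviated from those records.

```python
import re

# A record is assumed to start with a timestamp such as
# 2020-04-29-14.26.20.744456-420 at the beginning of a line.
RECORD_START = re.compile(
    r"^[ \t]*\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3}\s",
    re.MULTILINE,
)


def split_records(text):
    """Split raw db2diag.log text into one string per record."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def summarize(record):
    """Pull out the fields that matter for the errors in this APAR."""

    def grab(pattern):
        match = re.search(pattern, record)
        return match.group(1) if match else None

    sqlcode = grab(r"sqlcode:\s*(-?\d+)")
    return {
        "timestamp": record.split()[0],
        "level": grab(r"LEVEL:\s*(\w+)"),
        "function": grab(r"FUNCTION:\s*(.+)"),
        "sqlcode": int(sqlcode) if sqlcode is not None else None,
        "marked_unstable": "FMP will be marked unstable" in record,
    }


SAMPLE = """\
2020-04-29-14.26.20.744456-420 I49008E783 LEVEL: Severe
FUNCTION: DB2 UDB, routine_infrastructure, sqlerFmpGenerateCommonReply, probe:896
MESSAGE : A Java error occurred. The FMP will be marked unstable.
 sqlcaid : SQLCA sqlcabc: 136 sqlcode: -973 sqlerrml: 9

2020-04-26-18.34.33.914878-240 I6080192E750 LEVEL: Error
FUNCTION: DB2 UDB, routine_infrastructure, sqlerTerminateFmp, probe:217
MESSAGE : ZRC=0x822401F0=-2111569424=SQLER_EXT_TABLE_FMP_FAILED
"""

summaries = [summarize(r) for r in split_records(SAMPLE)]
unstable_events = [s for s in summaries if s["marked_unstable"]]
```

Counting the entries in unstable_events over a time window gives a rough view of how often the Java I/O engine is running out of memory.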
    
    (2) SQL5199N with a reason code token of 1:
    This indicates that there was some issue with communication with
    the Java DFSIO FMP (and/or corresponding Java process). This
    error can follow the above error (SQL0973N / BigSQL IO) when a
    Java DFSIO FMP has been marked unstable and is in such a state
    that communication with the Java process is no longer possible.
    Any in-flight scans that had work assigned to the now unstable
    Java DFSIO FMP will fail with this error, but new requests will
    be successful.
    
    The following error will be returned to the application:

    SQL5199N The statement failed because a connection to a
    Hadoop/External Table I/O component could not be established or
    maintained. Hadoop/External Table I/O component name: "Java
    DFSIO". Reason code: "1". Database partition number: "1".
    SQLSTATE=57067
    
    The following will be logged in the db2diag.log:

    2020-04-26-18.34.33.914878-240 I6080192E750 LEVEL: Error
    PID : 545632 TID : 140381878478592 PROC : db2sysc 1
    INSTANCE: mcknight NODE : 001 DB : BIGSQL
    APPHDL : 0-80 APPID: *N0.mcknight.200426205913
    UOWID : 122 ACTID: 1
    AUTHID : MCKNIGHT HOSTNAME: hotellnxbi01
    EDUID : 7938 EDUNAME: db2agnts (BIGSQL) 1
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerTerminateFmp, probe:217
    MESSAGE : ZRC=0x822401F0=-2111569424=SQLER_EXT_TABLE_FMP_FAILED
              "A DB2 Fenced Mode Process abnormally terminated."
    DATA #1 : String, 38 bytes
    SQL5199N: Communication error with FMP
    DATA #2 : String, 10 bytes
    Java DFSIO
    DATA #3 : String, 1 bytes
    1
    
    (3) SQL5199N with a reason code token of 4:
    This indicates that 2 Java DFSIO FMPs have been marked unstable
    and that an attempt was made to create a 3rd Java DFSIO FMP,
    which is not allowed. Two FMPs being unstable at the same time
    indicates that the system is in a bad state, and that the memory
    allocated to Big SQL (INSTANCE_MEMORY) is not sufficient to
    handle the workload. Consider increasing the amount of
    INSTANCE_MEMORY that is allocated to Big SQL.
    
    The following error will be returned to the application:

    SQL5199N The statement failed because a connection to a
    Hadoop/External Table I/O component could not be established or
    maintained. Hadoop/External Table I/O component name: "Java
    DFSIO". Reason code: "4". Database partition number: "1".
    SQLSTATE=57067
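
Because reason code 1 and reason code 4 call for different reactions, an application may want to pull the tokens out of the SQL5199N message text. The sketch below assumes the message layout shown in this description. The names Sql5199, parse_sql5199 and suggested_action are hypothetical, and the mapping to "retry" and "escalate" is an interpretation of the text above (new requests succeed after reason code 1, while reason code 4 points to a system in a bad state), not documented Big SQL guidance.

```python
import re
from typing import NamedTuple, Optional


class Sql5199(NamedTuple):
    component: str
    reason_code: int
    partition: int


# Tolerates line breaks inside the message, since clients often wrap it.
_SQL5199 = re.compile(
    r'SQL5199N.*?component\s+name:\s*"([^"]*)"\.\s*'
    r'Reason\s+code:\s*"(\d+)"\.\s*'
    r'Database\s+partition\s+number:\s*"(\d+)"',
    re.DOTALL,
)


def parse_sql5199(message: str) -> Optional[Sql5199]:
    """Return the message tokens, or None if the text does not match."""
    match = _SQL5199.search(message)
    if match is None:
        return None
    component = " ".join(match.group(1).split())  # undo line wrapping
    return Sql5199(component, int(match.group(2)), int(match.group(3)))


def suggested_action(error: Sql5199) -> str:
    """Assumed policy, derived from the description rather than from IBM."""
    if error.reason_code == 1:
        return "retry"      # in-flight work failed; new requests succeed
    if error.reason_code == 4:
        return "escalate"   # two unstable FMPs; review INSTANCE_MEMORY
    return "unknown"
```

For the reason code 4 message shown above, parse_sql5199 yields the component "Java DFSIO", reason code 4 and partition 1, and suggested_action returns "escalate". A message whose tokens are empty, as in the MESSAGE field of the db2diag.log record for this error, does not match and yields None.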
    
    The following will be logged in the db2diag.log:

    2020-05-01-15.33.19.759664-240 I23692222E1299 LEVEL: Error
    PID : 3165878 TID : 140145676248832 PROC : db2sysc 1
    INSTANCE: mcknight NODE : 001 DB : BIGSQL
    APPHDL : 0-4242 APPID: *N0.mcknight.200501181009
    UOWID : 937 ACTID: 1
    AUTHID : MCKNIGHT HOSTNAME: hotellnxbi01
    EDUID : 327 EDUNAME: db2agntp (BIGSQL) 1
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerGetFmpFromPool, probe:2591
    MESSAGE : ZRC=0xFFFFEBB1=-5199
              SQL5199N The statement failed because a connection to a
              Hadoop/External Table I/O component could not be established
              or maintained. Hadoop/External Table I/O component name: "".
              Reason code: "". Database partition number: "".

    DATA #1 : String, 38 bytes
    SQL5199N: Unable to start Java I/O FMP
    DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
     sqlcaid : SQLCA sqlcabc: 136 sqlcode: -5199 sqlerrml: 14
     sqlerrmc: Java DFSIO 4 1
     sqlerrp : SQLERFME
     sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
               (4) 0x00000000 (5) 0x00000000 (6) 0x00000001
     sqlwarn : (1) (2) (3) (4) (5) (6)
               (7) (8) (9) (10) (11)
     sqlstate: 57067
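
The ZRC values in the MESSAGE fields above are printed both in hexadecimal and as a signed decimal. Reading the hexadecimal value as a 32-bit two's complement integer reproduces the decimal values in both records. The 32-bit interpretation is inferred from those two logged pairs, and the function name zrc_to_signed is hypothetical.

```python
def zrc_to_signed(zrc_hex: str) -> int:
    """Interpret a ZRC such as '0xFFFFEBB1' as a signed 32-bit integer."""
    value = int(zrc_hex, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {zrc_hex}")
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return value - (1 << 32) if value & 0x80000000 else value


print(zrc_to_signed("0x822401F0"))  # -2111569424 (SQLER_EXT_TABLE_FMP_FAILED)
print(zrc_to_signed("0xFFFFEBB1"))  # -5199, matching sqlcode -5199
```

The second value shows that a ZRC can carry the SQLCODE directly, which makes the hexadecimal form easy to map back to SQL5199N when only the ZRC is available.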
    
    To help prevent the system from becoming completely unstable and
    requiring a restart of Big SQL, the handling of the Java DFSIO
    FMP has been changed: if the Java process detects an out of
    memory condition, the Java DFSIO FMP will be forcefully
    terminated after a short period of time. This reduces the
    likelihood that all requests fail with SQL5199N, reason code 4,
    and that Big SQL has to be restarted.
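
The general pattern behind this change, giving a process a short grace period to exit on its own and then killing it, can be sketched as follows. This is a generic illustration using an ordinary child process, not the Big SQL code: the function name terminate_after_grace is hypothetical, and the length of the grace period is a parameter of the sketch because the fix description does not state the value Big SQL uses.

```python
import subprocess


def terminate_after_grace(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Wait up to grace_seconds for proc to exit, then kill it.

    Returns "exited" if the process finished on its own (the in-flight
    work had time to complete) and "killed" if it had to be terminated.
    """
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()   # forceful termination once the grace period is over
        proc.wait()   # reap the process so it does not linger
        return "killed"
```

A process that is hung and cannot be communicated with ends up in the "killed" branch, so it no longer counts against the limit on unstable processes. Any work it was still holding fails at that point, which corresponds to the SQL5199N reason code 1 failures for in-flight statements.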
    
    NOTE: If the memory allocated to Big SQL is not sufficient for
    the workload being run, statements can still fail with the
    errors shown above, as well as with other memory related errors.
    In fact, if a Java DFSIO FMP is terminated due to a Java out of
    memory condition, several statements may then fail with a
    SQL5199N error, reason code 1, because any work that is
    in-flight when the FMP is terminated will fail.
    

Local fix

  • Apply latest patch
    

Problem summary

  • Please see problem description.
    

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    PH25073

  • Reported component name

    IBM BIG SQL

  • Reported component ID

    5737E7400

  • Reported release

    504

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-05-05

  • Closed date

    2020-09-09

  • Last modified date

    2020-09-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
10 September 2020