APAR status
Closed as fixed if next.
Error description
In Big SQL, when the Java I/O engine hits an out-of-memory error, the Java DFSIO FMP is correctly marked unstable so that no additional requests are sent to that Java process, and a new Java DFSIO FMP is created for new requests. In most cases, the outstanding work being handled by the Java process that detected the out-of-memory condition can still complete successfully, which is the desired behavior. In some cases, however, the out-of-memory condition leaves the Java process in a state where communication with it is no longer possible, which prevents the corresponding Java DFSIO FMP from terminating successfully. If the Java I/O engine keeps hitting these severe out-of-memory conditions, multiple Java DFSIO FMPs can be left in an unstable state. All requests then fail with error SQL5199N, reason code 4, indicating that a third Java DFSIO FMP will not be created. This is generally an indication that the memory allocated to Big SQL (INSTANCE_MEMORY) is not sufficient for the customer's current workload, that a rogue query is causing the out-of-memory (OOM) condition, or that there is some other spike in the workload.
In these cases, the user application may see the following types of errors:
(1) SQL0973N with a heap name token of "BigSQL IO"
This indicates that the Java process has encountered an out-of-memory error. Big SQL attempts to be resilient to this error: it marks the corresponding Java DFSIO FMP unstable to prevent more work from being assigned to that process, but leaves the process running in the hope that the in-flight work being done by that Java process can complete successfully. Big SQL immediately starts another Java DFSIO FMP and a corresponding Java process for new work. In most cases, the in-flight work completes successfully and the unstable Java DFSIO FMP and Java process terminate cleanly.
The SQL0973N error only indicates that the Java I/O engine process experienced an out-of-memory condition; it does not indicate a problem with Big SQL itself. It can be a sign of a heavy workload, too much concurrency, or a poorly written SQL statement.
The following error is returned to the application:
    SQL0973N Not enough storage is available in the "BigSQL IO" heap or stack to process the statement. SQLSTATE=57011
The following is logged in the db2diag.log:
    2020-04-29-14.26.20.744456-420 I49008E783          LEVEL: Severe
    PID     : 18400                TID : 140426844530432 PROC : db2fmp (
    INSTANCE: bigsql               NODE : 002
    HOSTNAME: 54kam4.fyre.ibm.com
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerFmpGenerateCommonReply, probe:896
    MESSAGE : A Java error occurred. The FMP will be marked unstable.
    DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
     sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -973   sqlerrml: 9
     sqlerrmc: BigSQL IO
     sqlerrp : SQLER000
     sqlerrd : (1) 0x822401EF      (2) 0x00000000      (3) 0x00000000
               (4) 0x00000000      (5) 0x00000000      (6) 0x00000002
     sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
               (7)      (8)      (9)      (10)       (11)
     sqlstate: 57011
(2) SQL5199N with a reason code token of 1
This indicates that there was a problem communicating with the Java DFSIO FMP and/or its corresponding Java process. This error can follow the SQL0973N ("BigSQL IO") error above when a Java DFSIO FMP has been marked unstable and is in a state where communication with its Java process is no longer possible. Any in-flight scans that had work assigned to the now-unstable Java DFSIO FMP fail with this error, but new requests succeed.
The following error is returned to the application:
    SQL5199N The statement failed because a connection to a Hadoop/External Table I/O component could not be established or maintained. Hadoop/External Table I/O component name: "Java DFSIO". Reason code: "1". Database partition number: "1". SQLSTATE=57067
The following is logged in the db2diag.log:
    2020-04-26-18.34.33.914878-240 I6080192E750          LEVEL: Error
    PID     : 545632               TID : 140381878478592 PROC : db2sysc 1
    INSTANCE: mcknight             NODE : 001            DB   : BIGSQL
    APPHDL  : 0-80                 APPID: *N0.mcknight.200426205913
    UOWID   : 122                  ACTID: 1
    AUTHID  : MCKNIGHT             HOSTNAME: hotellnxbi01
    EDUID   : 7938                 EDUNAME: db2agnts (BIGSQL) 1
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerTerminateFmp, probe:217
    MESSAGE : ZRC=0x822401F0=-2111569424=SQLER_EXT_TABLE_FMP_FAILED
              "A DB2 Fenced Mode Process abnormally terminated."
    DATA #1 : String, 38 bytes
    SQL5199N: Communication error with FMP
    DATA #2 : String, 10 bytes
    Java DFSIO
    DATA #3 : String, 1 bytes
    1
(3) SQL5199N with a reason code token of 4
This indicates that two Java DFSIO FMPs have already been marked unstable and an attempt has been made to create a third Java DFSIO FMP, which is not allowed. Having two unstable FMPs indicates that the system is in a bad state. This is an indication that the memory allocated to Big SQL (INSTANCE_MEMORY) is not sufficient to handle the workload; consider increasing the amount of INSTANCE_MEMORY allocated to Big SQL.
The following error is returned to the application:
    SQL5199N The statement failed because a connection to a Hadoop/External Table I/O component could not be established or maintained. Hadoop/External Table I/O component name: "Java DFSIO". Reason code: "4". Database partition number: "1". SQLSTATE=57067
The following is logged in the db2diag.log:
    2020-05-01-15.33.19.759664-240 I23692222E1299        LEVEL: Error
    PID     : 3165878              TID : 140145676248832 PROC : db2sysc 1
    INSTANCE: mcknight             NODE : 001            DB   : BIGSQL
    APPHDL  : 0-4242               APPID: *N0.mcknight.200501181009
    UOWID   : 937                  ACTID: 1
    AUTHID  : MCKNIGHT             HOSTNAME: hotellnxbi01
    EDUID   : 327                  EDUNAME: db2agntp (BIGSQL) 1
    FUNCTION: DB2 UDB, routine_infrastructure, sqlerGetFmpFromPool, probe:2591
    MESSAGE : ZRC=0xFFFFEBB1=-5199
              SQL5199N The statement failed because a connection to a Hadoop/External Table I/O component could not be established or maintained. Hadoop/External Table I/O component name: "". Reason code: "". Database partition number: "".
    DATA #1 : String, 38 bytes
    SQL5199N: Unable to start Java I/O FMP
    DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
     sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -5199   sqlerrml: 14
     sqlerrmc: Java DFSIO 4 1
     sqlerrp : SQLERFME
     sqlerrd : (1) 0x00000000      (2) 0x00000000      (3) 0x00000000
               (4) 0x00000000      (5) 0x00000000      (6) 0x00000001
     sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
               (7)      (8)      (9)      (10)       (11)
     sqlstate: 57067
To help prevent the system from becoming completely unstable and requiring a restart of Big SQL, the handling of the Java DFSIO FMP has been changed so that, if the Java process detects an out-of-memory condition, the Java DFSIO FMP is forcefully terminated after a short period of time. This reduces the likelihood of the customer getting SQL5199N, reason code 4, for all requests and having to restart Big SQL.
NOTE: If the memory allocated to Big SQL is not sufficient for the workload being run, statements can still fail with the errors shown above, as well as with other memory-related errors. In fact, when a Java DFSIO FMP is terminated because of a Java out-of-memory condition, several statements may then fail with SQL5199N, reason code 1, because any in-flight work on that FMP fails when the FMP is terminated.
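The FMP handling described above, before and after the fix, can be sketched as a small Python model. This is a hypothetical simplification, not Big SQL's implementation: the names `FmpPool`, `JavaDfsioFmp`, `on_out_of_memory`, and `simulate` are invented for illustration, and both clean termination and the "short period of time" before forced termination are modeled as immediate removal. Only the limit of two unstable FMPs and the reason codes come from this APAR.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the Java DFSIO FMP handling described in
# this APAR. None of these names are Big SQL internals.

MAX_UNSTABLE_FMPS = 2  # with two unstable FMPs, a third FMP is not created


class SQL5199N(Exception):
    """Stand-in for SQL5199N, carrying the documented reason code."""

    def __init__(self, reason_code: int) -> None:
        super().__init__(f"SQL5199N reason code {reason_code}")
        self.reason_code = reason_code


@dataclass
class JavaDfsioFmp:
    fmp_id: int
    unstable: bool = False


@dataclass
class FmpPool:
    force_terminate_on_oom: bool  # False: before the fix; True: after the fix
    fmps: List[JavaDfsioFmp] = field(default_factory=list)
    next_id: int = 1

    def get_fmp(self) -> JavaDfsioFmp:
        """Return a stable FMP for new work, starting a new one if needed."""
        for fmp in self.fmps:
            if not fmp.unstable:
                return fmp
        if sum(f.unstable for f in self.fmps) >= MAX_UNSTABLE_FMPS:
            raise SQL5199N(4)  # a third Java DFSIO FMP is not allowed
        fmp = JavaDfsioFmp(self.next_id)
        self.next_id += 1
        self.fmps.append(fmp)
        return fmp

    def on_out_of_memory(self, fmp: JavaDfsioFmp, process_hung: bool) -> None:
        """SQL0973N ("BigSQL IO"): stop routing new work to this FMP."""
        fmp.unstable = True
        if not process_hung:
            # In-flight work finishes and the FMP terminates cleanly
            # (modeled here as immediate removal).
            self.fmps.remove(fmp)
        elif self.force_terminate_on_oom:
            # After the fix: the FMP is forcefully terminated; its in-flight
            # work would fail with SQL5199N reason code 1 (not modeled here).
            self.fmps.remove(fmp)
        # Otherwise (before the fix) the hung FMP stays in the pool as unstable.


def simulate(force_terminate_on_oom: bool, severe_ooms: int) -> str:
    """Hit `severe_ooms` hung-process OOMs, then submit one new request."""
    pool = FmpPool(force_terminate_on_oom=force_terminate_on_oom)
    try:
        for _ in range(severe_ooms):
            pool.on_out_of_memory(pool.get_fmp(), process_hung=True)
        pool.get_fmp()
        return "new request accepted"
    except SQL5199N as err:
        return f"SQL5199N reason code {err.reason_code}"


if __name__ == "__main__":
    print("before fix:", simulate(force_terminate_on_oom=False, severe_ooms=2))
    print("after fix: ", simulate(force_terminate_on_oom=True, severe_ooms=2))
```

In this model, two severe out-of-memory conditions are enough to make every later request fail with reason code 4 before the fix, while with forced termination a hung FMP no longer counts toward the two-unstable limit, at the cost of failing its in-flight work with reason code 1.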
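When triaging, it can help to map the application-side messages listed above to the three documented cases. The following sketch assumes the message text is available as a single string (for example, copied from a client log); the function name `classify_bigsql_io_error` and its summary strings are hypothetical, and only the message formats are taken from this APAR.

```python
import re

# Hypothetical triage helper. Only the message formats come from this APAR;
# the function name and summary strings are illustrative.
SQL0973N_BIGSQL_IO = re.compile(r'SQL0973N .*"BigSQL IO" heap')
SQL5199N_JAVA_DFSIO = re.compile(
    r'SQL5199N .*component name: "Java DFSIO"\. Reason code: "(\d+)"'
)

SQL5199N_REASONS = {
    "1": "communication with the Java DFSIO FMP failed; in-flight scans on it fail",
    "4": "two Java DFSIO FMPs are unstable; a third is not created",
}


def classify_bigsql_io_error(message: str) -> str:
    """Map an application error message to the case it matches in this APAR."""
    if SQL0973N_BIGSQL_IO.search(message):
        return "Java I/O engine out of memory; FMP marked unstable"
    match = SQL5199N_JAVA_DFSIO.search(message)
    if match:
        return SQL5199N_REASONS.get(match.group(1), "other SQL5199N reason code")
    return "not one of the cases described in this APAR"


if __name__ == "__main__":
    msg = (
        "SQL5199N The statement failed because a connection to a "
        "Hadoop/External Table I/O component could not be established or "
        'maintained. Hadoop/External Table I/O component name: "Java DFSIO". '
        'Reason code: "4". Database partition number: "1". SQLSTATE=57067'
    )
    print(classify_bigsql_io_error(msg))
```

Matching on the component name "Java DFSIO" keeps the helper from misclassifying SQL5199N errors raised for other Hadoop/External Table I/O components.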
Local fix
Apply latest patch
Problem summary
See the error description.
Problem conclusion
Temporary fix
Comments
APAR Information
APAR number
PH25073
Reported component name
IBM BIG SQL
Reported component ID
5737E7400
Reported release
504
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-05-05
Closed date
2020-09-09
Last modified date
2020-09-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Product: IBM Db2 Big SQL (SSCRJT)
Version: 504
Platform: Platform Independent
Line of business: Data and AI
Business unit: IBM Software w/o TPS
Document Information
Modified date:
10 September 2020