OOM due to concurrent INSERTs and SELECTs on Datalake tables
The system might run out of memory (OOM) when many concurrent INSERT and SELECT statements run against Datalake and Iceberg tables.
Symptoms
You might see the following errors:
Server error message: Not enough storage is available in the "BigSQL IO" heap or stack to process the statement..
SQLCODE=-973, SQLSTATE=57011, DRIVER=4.24.92
Server error message: The statement failed because a connection to a Hadoop/External Table I/O component could not be established or maintained. Hadoop/External Table I/O component name: "SCHEDULER". Reason code: "1". Database partition number: "0"..
SQLCODE=-5199, SQLSTATE=57067, DRIVER=4.24.92
Server error message: The statement failed because of a communication error with a Db2 Big SQL component. Db2 Big SQL component name: "SCHEDULER". Reason code: "1". Log entry identifier: "DB2"..
SQLCODE=-5197, SQLSTATE=57066, DRIVER=4.24.92
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL5105N The statement failed because a Big SQL component encountered an
error. Component receiving the error: "BigSQL IO". Component returning the
error: "UNKNOWN". Log entry identifier: "[BSL-1-4b4bcfa]". Reason: "Buffer
size too small. size = ". SQLSTATE=58040
You might also see errors like the following in the bigsql.log file on the host:
2023-03-03T05:48:23,768 ERROR com.ibm.biginsights.bigsql.dfsrw.reader.DfsBaseReader [Master-1-S:41.1.1000003.10.0.7926] : [BSL-1-6a603ab58] Exception raised by Reader at node: 1 Scan ID: S:41.1.1000003.10.0.7926 Table: gt.atable20 Spark: false VORC: false VPQ: true VAVRO: false VTEXT: false VRCFILE: false VANALYZE: false ICEBERG: true
Exception Label: UNMAPPED(java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.lang.StringBuilder.<init>(StringBuilder.java:129) ~[?:2.9 (04-27-2022)]
at java.lang.StringBuilder.<init>(StringBuilder.java:106) ~[?:2.9 (04-27-2022)]
at org.apache.hadoop.fs.s3a.Invoker.toDescription(Invoker.java:509) ~[hadoop-aws-3.1.1.7.1.6.0-297.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:312) ~[hadoop-aws-3.1.1.7.1.6.0-297.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:286) ~[hadoop-aws-3.1.1.7.1.6.0-297.jar:?]
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:477) ~[hadoop-aws-3.1.1.7.1.6.0-297.jar:?]
at java.io.DataInputStream.read(DataInputStream.java:160) ~[?:1.8.0]
at org.apache.parquet.io.DelegatingSeekableInputStream.read(DelegatingSeekableInputStream.java:66) ~[parquet-common.jar:1.10.99.7.1.6.0-297]
at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ~[parquet-format-structures.jar:1.10.99.7.1.6.0-297]
at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[parquet-format-structures.jar:1.10.99.7.1.6.0-297]
at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637) ~[parquet-format-structures.jar:1.10.99.7.1.6.0-297]
2023-05-07T11:28:14,587 ERROR com.ibm.biginsights.bigsql.dfsrw.reader.DfsBaseReader [Master-1-S:26.1001.1.0.0.28220] : [BSL-1-7f5f82afb] Exception raised by Reader at node: 1 Scan ID: S:26.1001.1.0.0.28220 Table: gt05c.table94_integrity Spark: false VORC: true VPQ: false VAVRO: false VTEXT: false VRCFILE: false VANALYZE: false ICEBERG: false
Exception Label: UNMAPPED(java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 1099203 in footer)
java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 1099203 in footer
Causes
These OOM issues occur when too many concurrent clients are running in the system. When processes run out of memory, the I/O engine FMP buffers can become unstable, which causes errors or, in some cases, shuts the component down.
Resolving the problem
To mitigate the problem, reduce the number of concurrent clients, especially those running INSERT statements.
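One way to cap INSERT concurrency is to throttle it on the client side rather than letting every application thread open a statement at once. The following is a minimal sketch of that idea, assuming a hypothetical `execute_stmt` callable that stands in for however your application submits SQL to Big SQL (for example, a `cursor.execute` call from an `ibm_db` or JDBC-based driver); the cap value is an assumed tuning knob, not a recommended setting.

```python
import threading

# Assumed tuning knob: the highest number of simultaneous INSERT
# sessions the cluster's memory headroom can sustain. Tune for your
# environment; 4 here is only an illustration.
MAX_CONCURRENT_INSERTS = 4

# Semaphore gate: at most MAX_CONCURRENT_INSERTS threads may hold it.
_insert_gate = threading.Semaphore(MAX_CONCURRENT_INSERTS)

def run_insert(execute_stmt, stmt):
    """Run one INSERT, blocking while the concurrency cap is reached.

    `execute_stmt` is a placeholder for the function that actually
    submits SQL to the database (driver-specific, not shown here).
    """
    with _insert_gate:           # released automatically, even on error
        return execute_stmt(stmt)
```

Worker threads that call `run_insert` queue up behind the semaphore instead of all hitting the I/O engine at once, which bounds the peak memory demand the concurrent INSERTs can generate.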