SELECT might fail when run with INSERT on a Datalake table
The SELECT statement might fail with [SQLCODE=-5105] [SQLSTATE=58040] when there is concurrent insert into the same Datalake table.
Symptoms
You might see the following
error:
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL5105N The statement failed because a Big SQL component encountered an error.
Component receiving the error: "BigSQL IO". Component returning the error: "UNKNOWN".
Log entry identifier: "[BSL-2-72a8b078d]". Reason: "No such file or directory: s3a".
SQLSTATE=58040You might see the following type of error in the bigsql.log on the
host:
2023-03-28T23:26:07,304 ERROR com.ibm.biginsights.bigsql.dfsrw.reader.DfsBaseReader [Master-2-S:22.1001.1.0.0.1534] : [BSL-2-72a8b07c8] Exception raised by Reader at node: 2 Scan ID: S:22.1001.1.0.0.1534 Table: entrepot.mix_olap18 Spark: false VORC: false VPQ: true VAVRO: false VTEXT: false VRCFILE: false VANALYZE: false ICEBERG: false
Exception Label: UNMAPPED(java.io.FileNotFoundException: No such file or directory: s3a://qabucket4/rlthomas/March28/odf/greentea/28032023/ENTREPOT/MIX_OLAP18/_TEMP_1680044525412_793414083_20230328112554334/i_1680044525412_793414083_20230328112554334_2.0)Causes
When a table is in the middle of processing an INSERT statement and a SELECT statement is run on the same table, the scan might pick up the temp file as a file to read, but it later disappears when the commit for write is finalized.
Resolving the problem
Run the INSERT statement and populate the Datalake or Iceberg tables with data first. Then, run the SELECT statement workloads on the tables. This prevents concurrent running of INSERT and SELECT statements on the same table.