Concurrent INSERTs might fail on an Iceberg table

An INSERT statement might fail with [SQLCODE=-5105] [SQLSTATE=58040] when multiple clients run concurrent inserts into the same Iceberg table.

Symptoms

You might see the following error:
DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL5105N  The statement failed because a Big SQL component encountered an 
error.  Component receiving the error: "SCHEDULER".  Component returning the 
error: "SCHEDULER".  Log entry identifier: "[SCL-0-19cb855]".  Reason: "".  
SQLSTATE=58040
You might also see errors like the following in the bigsql-sched.log file on the scheduler host:
2023-05-26T23:57:59,326 ERROR com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext [TThreadPoolServer WorkerProcess-%d] : [SCL-0-5a7f689d] Failed to commit job for INSERT id: SQLZ_BIG_SQL_SCHEDULER_SCAN_ID(appHandle:SQLZ_APP_HDL(coordAgentIndex:16614, coor_node:0), privateScanSourceNode:1001, numLogicalNodes:0, coordWorkItemCounter:16, hadoopScanID:1, subtaskId:0, opType:INSERT).
org.apache.iceberg.exceptions.CommitFailedException: Timed out after 184718 ms waiting for lock on gt.t_icem1169
        at org.apache.iceberg.hive.HiveCommitLock.acquireLockFromHms(HiveCommitLock.java:174) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.hive.HiveCommitLock.acquire(HiveCommitLock.java:104) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:170) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:403) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:402) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:394) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:277) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
        at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commitIcebergExecute(DfsWriterCommitHandler.java:1026) ~[bigsql-dfsrw.jar:?]
        at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commit(DfsWriterCommitHandler.java:509) ~[bigsql-dfsrw.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.performFinalize(DfsInsertContext.java:302) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.finalize(DfsInsertContext.java:230) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.SplitScheduler.dropContext(SplitScheduler.java:131) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.SchedulerServiceHandler.endQuery(SchedulerServiceHandler.java:559) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3672) ~[bigsql-thrift.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3651) ~[bigsql-thrift.jar:?]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.14.1.jar:0.14.1]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.14.1.jar:0.14.1]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) ~[libthrift-0.14.1.jar:0.14.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) ~[?:1.8.0]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:1.8.0]
        at java.lang.Thread.run(Thread.java:825) ~[?:2.9 (09-29-2022)]
2023-05-30T05:37:29,176 ERROR com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext [TThreadPoolServer WorkerProcess-%d] : [SCL-0-6b294e58] Failed to commit job for INSERT id: SQLZ_BIG_SQL_SCHEDULER_SCAN_ID(appHandle:SQLZ_APP_HDL(coordAgentIndex:406, coor_node:0), privateScanSourceNode:1001, numLogicalNodes:0, coordWorkItemCounter:31, hadoopScanID:1, subtaskId:0, opType:INSERT).
com.ibm.biginsights.bigsql.dfsrw.DfsException: null
        at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commit(DfsWriterCommitHandler.java:623) ~[bigsql-dfsrw.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.performFinalize(DfsInsertContext.java:302) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.finalize(DfsInsertContext.java:230) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.SplitScheduler.dropContext(SplitScheduler.java:131) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.SchedulerServiceHandler.endQuery(SchedulerServiceHandler.java:559) ~[bigsql-sched.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3672) ~[bigsql-thrift.jar:?]
        at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3651) ~[bigsql-thrift.jar:?]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.14.1.jar:0.14.1]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.14.1.jar:0.14.1]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) ~[libthrift-0.14.1.jar:0.14.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) ~[?:1.8.0]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:1.8.0]
        at java.lang.Thread.run(Thread.java:825) ~[?:2.9 (09-29-2022)]

Causes

The suspected cause of these issues is lock contention between concurrent write operations on the table. Before an Iceberg commit can update the table metadata, it must acquire a table lock from the Hive Metastore (see HiveCommitLock.acquireLockFromHms in the stack trace). The failure occurs when that lock acquisition times out, and the timeout value is not configurable.
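The contention pattern can be illustrated with a plain in-process lock standing in for the Hive Metastore table lock. This is a simulation only, not Big SQL or Iceberg code; the hold and wait durations are illustrative values, whereas the real metastore lock timeout is fixed:

```python
import threading
import time

# Stand-in for the exclusive Hive Metastore table lock that each
# Iceberg commit must hold while it updates the table metadata.
table_lock = threading.Lock()

def commit(hold_seconds, wait_timeout):
    """Simulate one writer's commit attempt.

    Returns True on success, False if the wait for the lock times out
    (the analogue of CommitFailedException: "Timed out ... waiting for lock").
    """
    if not table_lock.acquire(timeout=wait_timeout):
        return False
    try:
        time.sleep(hold_seconds)  # metadata update while holding the lock
        return True
    finally:
        table_lock.release()

results = {}

def worker(i):
    # Each writer holds the lock far longer than the others are willing
    # to wait, so only the first acquirer can succeed.
    results[i] = commit(hold_seconds=0.5, wait_timeout=0.05)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # some writers succeed, the rest time out
```

With four writers and a short wait timeout, the writers that lose the race fail in the same way that concurrent INSERT clients fail against the metastore lock.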

Resolving the problem

Run fewer concurrent insert clients on Iceberg tables to avoid a lock timeout. Reducing concurrency lowers the probability that multiple clients try to write to a single table at the same time.
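One way to reduce concurrency without fully serializing the workload is to cap the number of simultaneous INSERT sessions on the application side. The following is a hypothetical client-side sketch, not a Big SQL feature: run_insert is a placeholder for whatever driver call your application uses to execute the statement, and the limit of 2 is an assumed value that you would tune for your workload:

```python
import threading

# Hypothetical client-side throttle: allow at most MAX_WRITERS concurrent
# INSERT sessions against one Iceberg table, so commits are less likely to
# wait on the metastore lock long enough to time out.
MAX_WRITERS = 2
_writer_slots = threading.BoundedSemaphore(MAX_WRITERS)

def run_insert(statement):
    # Placeholder: replace with your actual driver call
    # (for example, a CLI/JDBC/ODBC execute).
    return "executed: " + statement

def throttled_insert(statement):
    # Blocks until one of the MAX_WRITERS slots is free, then runs the insert.
    with _writer_slots:
        return run_insert(statement)

# Example: eight application threads, but at most MAX_WRITERS run at once.
out = []

def client(i):
    out.append(throttled_insert(f"INSERT INTO gt.t_icem1169 VALUES ({i})"))

threads = [threading.Thread(target=client, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight inserts still run; the semaphore only spreads them out so that fewer commits contend for the table lock at the same moment.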