Concurrent INSERTs might fail on an Iceberg table
The INSERT statement might fail with [SQLCODE=-5105] [SQLSTATE=58040] when multiple clients run concurrent INSERT statements against the same Iceberg table.
Symptoms
You might see the following error:
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL5105N The statement failed because a Big SQL component encountered an
error. Component receiving the error: "SCHEDULER". Component returning the
error: "SCHEDULER". Log entry identifier: "[SCL-0-19cb855]". Reason: "".
SQLSTATE=58040

You might see the following types of errors in the bigsql-sched.log file on the
scheduler host:
2023-05-26T23:57:59,326 ERROR com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext [TThreadPoolServer WorkerProcess-%d] : [SCL-0-5a7f689d] Failed to commit job for INSERT id: SQLZ_BIG_SQL_SCHEDULER_SCAN_ID(appHandle:SQLZ_APP_HDL(coordAgentIndex:16614, coor_node:0), privateScanSourceNode:1001, numLogicalNodes:0, coordWorkItemCounter:16, hadoopScanID:1, subtaskId:0, opType:INSERT).
org.apache.iceberg.exceptions.CommitFailedException: Timed out after 184718 ms waiting for lock on gt.t_icem1169
at org.apache.iceberg.hive.HiveCommitLock.acquireLockFromHms(HiveCommitLock.java:174) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.hive.HiveCommitLock.acquire(HiveCommitLock.java:104) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:170) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:403) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:402) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:394) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:277) ~[hive-iceberg-handler-4.0.0-alpha-2-12-IBM.jar:4.0.0-alpha-2]
at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commitIcebergExecute(DfsWriterCommitHandler.java:1026) ~[bigsql-dfsrw.jar:?]
at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commit(DfsWriterCommitHandler.java:509) ~[bigsql-dfsrw.jar:?]
at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.performFinalize(DfsInsertContext.java:302) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.finalize(DfsInsertContext.java:230) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.SplitScheduler.dropContext(SplitScheduler.java:131) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.SchedulerServiceHandler.endQuery(SchedulerServiceHandler.java:559) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3672) ~[bigsql-thrift.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3651) ~[bigsql-thrift.jar:?]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.14.1.jar:0.14.1]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.14.1.jar:0.14.1]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) ~[libthrift-0.14.1.jar:0.14.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) ~[?:1.8.0]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:1.8.0]
at java.lang.Thread.run(Thread.java:825) ~[?:2.9 (09-29-2022)]

2023-05-30T05:37:29,176 ERROR com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext [TThreadPoolServer WorkerProcess-%d] : [SCL-0-6b294e58] Failed to commit job for INSERT id: SQLZ_BIG_SQL_SCHEDULER_SCAN_ID(appHandle:SQLZ_APP_HDL(coordAgentIndex:406, coor_node:0), privateScanSourceNode:1001, numLogicalNodes:0, coordWorkItemCounter:31, hadoopScanID:1, subtaskId:0, opType:INSERT).
com.ibm.biginsights.bigsql.dfsrw.DfsException: null
at com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsWriterCommitHandler.commit(DfsWriterCommitHandler.java:623) ~[bigsql-dfsrw.jar:?]
at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.performFinalize(DfsInsertContext.java:302) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.javarw.DfsInsertContext.finalize(DfsInsertContext.java:230) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.SplitScheduler.dropContext(SplitScheduler.java:131) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.SchedulerServiceHandler.endQuery(SchedulerServiceHandler.java:559) ~[bigsql-sched.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3672) ~[bigsql-thrift.jar:?]
at com.ibm.biginsights.bigsql.scheduler.server.thrift.SchedulerService$Processor$endQuery.getResult(SchedulerService.java:3651) ~[bigsql-thrift.jar:?]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.14.1.jar:0.14.1]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.14.1.jar:0.14.1]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248) ~[libthrift-0.14.1.jar:0.14.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) ~[?:1.8.0]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:1.8.0]
at java.lang.Thread.run(Thread.java:825) ~[?:2.9 (09-29-2022)]

Causes
The suspected cause of these failures is lock contention between concurrent write operations on the same table: each commit must acquire a table lock from the Hive metastore, and the INSERT fails when that lock acquisition times out. The timeout value is not configurable.
Resolving the problem
Run fewer concurrent INSERT clients against Iceberg tables to avoid the lock timeout. This reduces the probability that multiple clients try to write to the same table at the same time.
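One way to reduce concurrency without changing the overall workload is to gate INSERT submissions on the client side. The following is a minimal sketch, assuming your application issues INSERT statements from multiple threads; `run_insert`, `MAX_CONCURRENT_INSERTS`, and `throttled_insert` are hypothetical names, not part of any Big SQL or Iceberg API, and the right limit depends on your workload and commit latency.

```python
import threading

# Hypothetical sketch: cap how many clients insert into one Iceberg table
# at a time, so that Hive metastore lock waits stay short.
MAX_CONCURRENT_INSERTS = 2  # tune for your workload; lower = less contention
_insert_gate = threading.BoundedSemaphore(MAX_CONCURRENT_INSERTS)

def throttled_insert(run_insert, *args, **kwargs):
    """Run an INSERT callable while holding a slot in the shared gate.

    `run_insert` stands in for whatever call your client uses to execute
    the INSERT statement (for example, a JDBC/ODBC execute wrapper).
    """
    with _insert_gate:
        return run_insert(*args, **kwargs)
```

Threads that cannot acquire a slot wait in the client instead of queuing on the metastore lock, so each commit that does run is less likely to hit the non-configurable lock timeout.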