Bypassing temporary directories to increase the performance of insert operations into object stores

You can bypass temporary directories when you insert data into tables that are located on object stores.

This feature requires the Db2® Big SQL 7.2 service on IBM Cloud Pak® for Data 4.0.

When INSERT statements are running, Db2 Big SQL stores newly created files in temporary directories by default. The temporary directories are stored on the table's data source, but separate from the table's location. If file creation is successful, the Db2 Big SQL scheduler performs an atomic commit operation to move those files into the table's location without impacting ongoing queries. This approach works well on data sources, such as HDFS, that provide atomic rename operations. It does not perform as well on data sources that lack atomic rename operations, which is the case for most object store platforms, such as Amazon Simple Storage Service (S3) or IBM Cloud® Object Storage (COS). When you insert data into tables on such object store platforms, the insert commit phase can take an excessive amount of time, proportional to the amount of committed data.

To increase insert performance in such cases, you can bypass temporary directories when you insert data into tables that are located on object stores. The data is written directly into a table's location. Enabling this feature can significantly improve ingestion performance and reduce the total number of operations that must be run against the object stores. When this feature is enabled, queries that run concurrently with insert operations can read partial results from the ongoing insert operations.

To enable this feature, set the value of the bigsql.insert.temporary.directory.bypass.enabled property in the bigsql-conf.xml configuration file to true. The default value is false.
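For example, the property entry in bigsql-conf.xml would follow the usual Hadoop-style configuration format (the property name and file name are from this document; the exact location of bigsql-conf.xml depends on your deployment):

```xml
<!-- bigsql-conf.xml: enable bypassing temporary directories for
     inserts into tables that are located on object stores -->
<property>
  <name>bigsql.insert.temporary.directory.bypass.enabled</name>
  <value>true</value>
</property>
```

After changing this property, restart the Db2 Big SQL service so that the new value takes effect, as is typical for changes to service configuration files.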