Bypassing temporary directories to increase the performance of insert operations into object stores

You can bypass temporary directories when you insert data into tables that are located on object stores.

Db2® Big SQL 7.2 and later service This feature requires the Db2 Big SQL 7.2 and later service on IBM Cloud Pak® for Data 4.0 and later.

When INSERT statements are running, Db2 Big SQL stores newly created files in temporary directories by default. The temporary directories are stored on the table's data source, but separate from the table's location. If file creation is successful, the Db2 Big SQL scheduler performs an atomic commit operation to move those files into the table's location without impacting ongoing queries. This works well on data sources, such as HDFS, that provide atomic rename operations. It does not perform as well on data sources that do not provide atomic rename operations. This is the case for most object store platforms, such as Amazon Simple Storage Service (S3) or IBM Cloud® Object Storage (COS). When inserting data into tables on such object store platforms, the insert commit phase can take an excessive amount of time, which is proportional to the amount of committed data.

To increase insert performance in such cases, you can bypass temporary directories when you insert data into tables that are located on object stores. The data is written directly into a table's location. Enabling this feature can significantly improve ingestion performance and reduce the total number of operations that must be run against the object stores. When this feature is enabled, queries that run concurrently with insert operations can read partial results from the ongoing insert operations.

Enabling temporary storage bypass is not compatible with the background automatic partition discovery feature. As a result, partition discovery must be disabled for the table being inserted with bypass temporary storage.

To enable the temporary storage bypass feature, do the following steps:

  1. Enable the temporary storage bypass feature by running the following command:
    db2uctl adm bigsql config --key --value BYPASS

    For more information about updating the Db2 Big SQL configuration on IBM Cloud Pak for Data, see Updating the Db2 Big SQL and Hadoop configuration.

  2. Disable partition discovery for the tables being inserted to by running the following statement:
    ALTER TABLE <schema>.<table> SET TBLPROPERTIES ('discover.partitions' = 'false')