Parallel INSERT in column-organized tables

There are multiple ways of populating a table. For example, the LOAD utility is efficient but is not logged and does not allow concurrent modifications to the table. However, the INSERT statement, on the other hand, is better suited when concurrent access is needed and is logged, but is generally not as fast as LOAD. Parallel INSERT that is supported for a column-organized table as a target can help speed up a bulk INSERT.

INSERT statements can be parallelized in a system with database partitioning. However, in a non-partitioned database, or when more cores are present than database partitions, parallelism is not used to its fullest extent. When intra-query parallelism is enabled, even though the access and processing of the source can be adequately parallelized, the operation of inserting the data into the target table is serialized.

An enhancement to allow parallel insert into column-organized tables is enabled in V11.1.2.2. (In V11.1.1.1, it is also possible to enable parallel insert for experimentation in a test environment with: db2set DB2_EXTENDED_OPTIMIZATION=CDE_PAR_IUD -im). The performance of inserting data in parallel in column-organized tables can be significantly improved. The improved performance depends on how many cores there are in a non-partitioned database or how many cores there are per database partition.

Parallel INSERT overview

The parallel INSERT feature significantly improves the performance of individual INSERT statements and adds to the innovative and powerful technology of BLU Acceleration. In a partitioned database system (DPF), some parallelization at the database partition level is already present even with the feature disabled. Additionally, the parallel INSERT feature parallelizes the INSERT within each database partition when Intra-Query parallelism is configured.

In prior versions of Db2® and when the parallel INSERT feature is disabled, while the query portion of the INSERT statement is parallelized, the INSERT itself is executed serially. With this feature enabled, the INSERT portion of the query uses multiple threads to insert the data to speed up the elapsed time of the statement. This parallel processing is extended to associated operations, which include compression, locking, and logging. The degree of parallelism is automatically chosen appropriate by the optimizer.

INSERT Parallelization applies to both regular column-organized tables and NOT LOGGED column-organized Declared Global Temporary Tables (DGTTs). The source of the INSERT can be row-organized tables or column-organized tables. Only column-organized tables as targets are parallelized.

The DEGREE of parallelism that is chosen for the INSERT by the optimizer is shown in the RETURN operator of the EXPLAIN_ARGUMENT table.

Table 1. Description of the DEGREE argument type
ARGUMENT_TYPE ARGUMENT_VALUE DESCRIPTION
DEGREE INTEGER If the RETURN operator represents the return from column-organized data process of the insertion, update, or deletion of rows, the DEGREE argument indicates the number of column-organized processing subagents that are used to process the insert, update, or delete operations in parallel.

Scenarios where parallel INSERT cannot happen

Parallelism is used for large bulk INSERT because the implementation is such that each parallel LOAD agent puts data into its own pages. As a compromise between performance and disk space wastage, parallelism is not enabled for INSERT statements with a few rows inserted. Such small amounts of data that are entered into separate pages will fragment the table.

INSERT from a VALUES clause is a common scenario that falls under this category. Given that the VALUES clause in a single statement typically contains few rows, INSERT from VALUES is almost always not parallelized. The INGEST command is similar. In the background, this command is sent as multiple individual INSERT with VALUES statements and is not likely to be parallelized within a database partition.

Even for bulk INSERT from regular source tables, Db2 might not know how many rows are going to be produced from the source subquery. The Db2 optimizer decides whether the INSERT is parallelized and to what degree it parallelizes. If it estimates that the number of rows to be inserted is not large enough, it will not parallelize the INSERT portion of the statement.

This is relevant to scenarios where statistics cannot be collected on the source tables or with complex subqueries where the optimizer can underestimate the number of rows in the result of the subquery. As such, it is important to perform RUNSTATS on the source tables if they were not collected or the tables were significantly changed since the last RUNSTATS. It might also be necessary to collect Column Group Statistics as part of the RUNSTATS if multiple predicates exist on a table. Here, 'multiple predicates' can be interpreted as multiple local predicates or multiple join predicates between two tables on columns that might be statistically correlated.

Scenarios where parallel INSERT is not supported

Parallel INSERT is not supported for the following scenarios:

  • Parallel INSERT into row-organized tables is not supported. The source of the INSERT can be either row or column-organized tables.
  • IMPORT is not parallelized as it uses one INSERT statement per row.
  • Compound statements that have nested INSERTs are not parallelized. The MERGE statement and other statements using the data-change-table-reference clause are also not parallelized.
  • A transaction that does INSERT after an uncommitted DELETE or UPDATE requires a COMMIT before the INSERT in order to be parallelized.

Example

A sequence of operations with LOCK TABLE where the existence of a modifying statement (DELETE) before the INSERT prevents parallelism:
  1. LOCK TABLE X IN EXCLUSIVE MODE
  2. DELETE * FROM X
  3. INSERT INTO X SELECT * FROM Y [Not Parallelized]
  4. COMMIT