Load considerations for MDC and ITC tables

If you roll data in to your data warehouse on a regular basis, you can use multidimensional clustering (MDC) tables to your advantage. In MDC tables, load first reuses previously emptied blocks in the table before extending the table and adding new blocks for the remaining data.

After you delete a set of data, for example, all the data for a month, you can use the load utility to roll in the next month of data and it can reuse the blocks that were emptied after the (committed) deletion. You can also choose to use the MDC rollout feature with deferred cleanup. After the rollout, which is also a deletion, is committed, the blocks are not free and cannot yet be reused. A background process is invoked to maintain the record ID (RID) based indexes. When the maintenance is complete, the blocks are freed and can be reused. For insert time clustering (ITC) tables, blocks that are not in use are reused where possible before the table is extended. This includes blocks that were reclaimed. Rollout is not supported on ITC tables.

When loading data into MDC tables, the input data can be either sorted or unsorted. If unsorted, and the table has more than one dimension, consider doing the following:

  • Increase the util_heap_sz configuration parameter.

    To improve the performance of the load utility when loading MDC tables, increase the util_heap_sz database configuration parameter value. The mdc-load algorithm performs better when more memory is available to the utility. This reduces disk I/O during the clustering of data that is performed during the load phase. If the LOAD command is being used to load several MDC tables concurrently, util_heap_sz must be increased accordingly.

  • Increase the value given with the DATA BUFFER clause of the LOAD command.

    Increasing this value affects a single load request. The utility heap size must be large enough to accommodate the possibility of multiple concurrent load requests. Beginning in version 9.5, the value of the DATA BUFFER parameter of the LOAD command can temporarily exceed util_heap_sz if more memory is available in the system.

  • Ensure the page size used for the buffer pool is the same as the largest page size for the temporary table space.

    During the load phase, extra logging for the maintenance of the block map is performed. There are approximately two extra log records per extent allocated. To ensure good performance, the logbufsz database configuration parameter must be set to a value that takes this into account.

The following restrictions apply when loading data into MDC or ITC tables:
  • The SAVECOUNT parameter in the LOAD command is not supported.
  • The totalfreespace file type modifier is not supported since these tables manage their own free space.
  • The anyorder file type modifier is required for MDC or ITC tables. If a load is executed into an MDC or ITC table without the anyorder modifier, it is explicitly enabled by the utility.
When using the LOAD command with an MDC or ITC table, violations of unique constraints are handled as follows:
  • If the table included a unique key before the load operation and duplicate records are loaded into the table, the original record remains and the new records are deleted during the delete phase.
  • If the table did not include a unique key prior to the load operation and both a unique key and duplicate records are loaded into the table, only one of the records with the unique key is loaded and the others are deleted during the delete phase.
    Note: There is no explicit technique for determining which record is loaded and which is deleted.

Load begins at a block boundary, so it is best used for data belonging to new cells, for the initial populating of a table, and for loading additional data into ITC tables.

MDC and ITC load operations always have a build phase since all MDC and ITC tables have block indexes.