Ingest utility
The ingest utility (sometimes referred to as continuous data ingest, or CDI) is a high-speed client-side Db2® utility that streams data from files and pipes into Db2 target tables. Because the ingest utility can move large amounts of real-time data without locking the target table, you do not need to choose between the data currency and availability.
The ingest utility ingests pre-processed data directly or from files output by ETL tools or other means. It can run continually and thus it can process a continuous data stream through pipes. The data is ingested at speeds that are high enough to populate even large databases in partitioned database environments.
An INGEST command updates the target table with low latency in a single step. The ingest utility uses row locking, so it has minimal interference with other user activities on the same table.
With this utility, you can perform DML operations on a table using a SQL-like interface without locking the target table. These ingest operations support the following SQL statements: INSERT, UPDATE, MERGE, REPLACE, and DELETE. The ingest utility also supports the use of SQL expressions to build individual column values from more than one data field.
- Commit by time or number of rows. You can use the commit_count ingest configuration parameter to have commit frequency determined by the number of written rows or use the default commit_period ingest configuration parameter to have commit frequency determined by a specified time.
- Support for copying rejected records to a file or table, or discarding them. You can specify what the INGEST command does with rows rejected by the ingest utility (using the DUMPFILE parameter) or by Db2 (using the EXCEPTION TABLE parameter).
- Support for restart and recovery. By default, all INGEST commands are restartable from the last commit point. In addition, the ingest utility attempts to recover from certain errors if you have set the retry_count ingest configuration parameter.
- Delimited text
- Positional text and binary
- Columns in various orders and formats
- multidimensional clustering (MDC) and insert time clustering (ITC) tables
- range-partitioned tables
- range-clustered tables (RCT)
- materialized query tables (MQTs) that are defined as MAINTAINED BY USER, including summary tables
- temporal tables
- updatable views (except typed views)
- 1. Transport
- The transporters read from the data source and put records on the formatter queues. For INSERT and MERGE operations, there is one transporter thread for each input source (for example, one thread for each input file). For UPDATE and DELETE operations, there is only one transporter thread.
- 2. Format
- The formatters parse each record, convert the data into the format that Db2 database systems require, and put each formatted record on one of the flusher queues for that record's partition. The number of formatter threads is specified by the num_formatters configuration parameter. The default is (number of logical CPUs)/2.
- 3. Flush
- The flushers issue the SQL statements to perform the operations on the Db2 tables. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter. The default is max( 1, ((number of logical CPUs)/2)/(number of partitions) ).