The ingest utility (sometimes referred to as continuous
data ingest, or CDI) is a high-speed client-side DB2® utility that streams data from files and
pipes into DB2 target
tables. Because the ingest utility can move large amounts of real-time
data without locking the target table, you do not need to choose between
the data currency and availability.
The ingest utility ingests pre-processed data
directly or from files output by ETL tools or other means. It can
run continually and thus it can process a continuous data stream through
pipes. The data is ingested at speeds that
are high enough to populate even large databases in partitioned database environments.
An INGEST command updates the target table with
low latency in a single step. The ingest utility uses row locking,
so it has minimal interference with other user activities on the same
table.
With this utility, you can perform DML operations on a table using
a SQL-like interface without locking the target table. These ingest
operations support the following SQL statements: INSERT, UPDATE, MERGE,
REPLACE, and DELETE. The ingest utility also supports the use of SQL
expressions to build individual column values from more than one data
field.
Other important features of the ingest utility include:
- Commit by time or number of rows. You can use the commit_count ingest
configuration parameter to have commit frequency determined by the
number of written rows or use the default commit_period ingest
configuration parameter to have commit frequency determined by a specified
time.
- Support for copying rejected records to a file or table, or
discarding them. You can specify what the INGEST command
does with rows rejected by the ingest utility (using the DUMPFILE parameter)
or by DB2 (using the EXCEPTION
TABLE parameter).
- Support for restart and recovery. By default,
all INGEST commands are restartable from the last
commit point. In addition, the ingest utility attempts to recover
from certain errors if you have set the retry_count ingest
configuration parameter.
The
INGEST command supports the following input
data formats:
- Delimited text
- Positional text and binary
- Columns in various orders and formats
In addition to regular tables and nicknames, the
INGEST command
supports the following table types:
- multidimensional clustering (MDC) and insert time clustering (ITC)
tables
- range-partitioned tables
- range-clustered tables (RCT)
- materialized query tables (MQTs) that are defined as MAINTAINED
BY USER, including summary tables
- temporal tables
- updatable views (except typed views)
A single INGEST command goes
through three major phases: - 1. Transport
- The transporters read from the data source and put records on
the formatter queues. For INSERT and MERGE operations, there is one
transporter thread for each input source (for example, one thread
for each input file). For UPDATE and DELETE operations, there is only
one transporter thread.
- 2. Format
- The formatters parse each record, convert the data into the format
that DB2 database systems require,
and put each formatted record on one of the flusher queues for that
record's partition. The number of formatter threads is specified
by the num_formatters configuration parameter.
The default is (number of logical CPUs)/2.
- 3. Flush
- The flushers issue the SQL statements to perform the operations
on the DB2 tables. The number
of flushers for each partition is specified by the num_flushers_per_partition configuration
parameter. The default is max( 1, ((number of logical CPUs)/2)/(number
of partitions) ).