Iceberg table file maintenance
When working with Apache Iceberg tables, each INSERT operation will produce at least one new data file per table partition that is touched by the insert.
Similarly, a DELETE operation will write a delete file for every impacted table partition (typically positional delete files containing a list of deleted records (rows) in the data files).
SELECT operations on the table require data and delete files to be opened and read (and merged),
so a large number of (potentially small) files will impact the read performance for a table. In
order to maintain good read performance, file maintenance needs to be done with operation such as:
- Consolidating small data files into fewer files with a certain target file size
- Resolving delete files in a way that data files are rewritten without the deleted records. This will reduce the overall size of all data files and allows the delete files to be removed, thus reducing the number of files that need to be read
Db2 offers table functions that can be used to generate reports about a tables condition and a stored procedure to execute maintenance operations for consolidating a tables data and delete files. See REWRITE_TABLE_FILES, TABLE_FILES, TABLE_PARTITIONS, and DELETE_ORPHAN_FILES for more information.