Schema evolution support

Schema evolution is the process of modifying the structure of a database over time as new requirements and data models emerge. It involves changing the schema of tables to accommodate new data elements or changes to existing ones.

Db2 provides extensive support for schema evolution changes on local Db2 tables to accommodate the changing of the table while preserving the integrity and consistency of the data. Support for schema evolution of Datalake tables is more complex and more problematic given that the data is stored externally and is not owned by Db2.

Schema evolution can be challenging, particularly in large or complex databases, so it is recommended to handle such changes very carefully. Some of the challenges of schema evolution include:
  1. Data loss: Changes to the schema can result in the loss of data or errors accessing the data if not managed carefully.
  2. Compatibility issues: Changes to the schema can cause compatibility issues with existing applications.
  3. Performance issues: Changes to the schema can impact the performance of reads and writes to the table.
  4. Data inconsistencies: Changes to the schema can result in data inconsistencies if not managed carefully, which can lead to inaccurate or incomplete data.
For externally managed Datalake tables, schema evolution changes are quite restricted. Only the following changes are allowed:
  • Adding/dropping/altering Informational constraints
    • Unique constraint
      • Unique columns
      • Primary key
    • Referential (foreign key) constraint
    • Check constraint
  • Setting most column options
    • Column constraints
    • Column comment
  • Setting NOT NULL/NULL for a column
  • Enabling/disabling RCAC

For more information on externally managed tables see, Terminology.

Schema evolution changes for Iceberg tables are more restrictive than for Hive Datalake tables most likely to prevent issues with the integrity and consistency of the data. Below is the list of supported datatype changes:
  • For Iceberg tables, the only supported data type changes that are permitted for numeric data types are those that support expanding the size of data type. For example, changing a column defined as an INTEGER to a BIGINT. For DECIMAL types, the precision can be expanded, but the scale cannot be changed.
  • The length of string types can be changed to make the defined length either larger or smaller. Note that if the stored data is larger than the defined length of the column, the data will be truncated when it is returned. An error will occur if the data on insert is larger than the defined length. This applies to all string types.
  • The precision of a TIMESTAMP column can also be changed.

The management of partitioning information for Iceberg tables is different than for Hive Datalake tables. Although additional schema evolution changes are supported for Hive Datalake tables, it is recommended to adhere to those supported for Iceberg tables.

For more details regarding changes to the schema can be made on Datalake tables, see ALTER DATALAKE TABLE statement.