ALTER DATALAKE TABLE statement

The ALTER DATALAKE TABLE statement alters the definition of a data table.

Invocation

This statement can only be executed as a dynamic statement using EXECUTE IMMEDIATE. It is an executable statement that can be dynamically prepared only if DYNAMICRULES run behavior is in effect for the package (SQLSTATE 42509).

Authorization

The privileges that are held by the authorization ID of the statement must include at least one of the following authorities:

ALTER privilege on the table to be altered
CONTROL privilege on the table to be altered
ALTERIN privilege on the schema of the table
SCHEMAADM authority on the schema of the table
DBADM authority

To alter a table to ACTIVATE and DEACTIVATE row and column access control, the privileges that are held by the authorization ID of the statement must include the SECADM authority.

Syntax

Description

table-nameThe table-name must identify an existing Datalake table.

ADD column-definition

Defines the attributes of a new column. The column-name cannot be qualified, and the same name cannot be used for more than one column of the table. For a description of the valid data types, see data-types.

If a column is added to a table on which a mask or a permission is defined, or to a table that is referenced in the definition of a mask or a permission, that mask or permission is invalidated. Access to a table that activates column access control and a defined invalid mask on it is blocked until the invalid mask is either disabled, dropped, or re-created (SQLSTATE 560D0). Access to a table that activates row access control and a defined invalid row permission on it is blocked until the invalid permission is either disabled, dropped, or re-created (SQLSTATE 560D0).

column-optionsYou can define column options including a comment, the nullability attribute, or constraints.

COMMENT column-comment

A comment that provides additional information about the column and which is stored in the internal Hive metastore.

NULL | NOT NULL

NOT NULL specifies that the column will not contain null values, which can be problematic for tables which contain existing data, so this is not recommended. The Datalake table is external and it is not modified as part of the ALTER processing. As a result, errors can occur when reading existing table data which does not have data for the new column (SQL5104N or SQL5105N). The nullability attribute of a column is enforced during an INSERT operation. Do not assign a null value to a column that is defined as NOT NULL (SQLSTATE 23502)

CONSTRAINT constraint-name

Specifies the name of a constraint. You can define the following constraints on a column:

PRIMARY KEY: A primary key must be unique and must have the NOT NULL attribute. A table cannot have more than one primary key.
UNIQUE: A unique key prevents duplicate values in one or more NOT NULL columns of a table. A table can have multiple unique keys.
REFERENCES table-name: Ensures that required table relationships are maintained and that data entry rules are followed. The column on which this parameter is specified becomes the foreign key to the referenced parent table table-name.

constraint-attributesDefines attributes that are associated with primary key, unique, referential integrity, or check constraints.

NOT ENFORCED

Specifies that the constraint is not enforced by the database manager. A primary key or unique constraint cannot be NOT ENFORCED if there is a dependent ENFORCED referential constraint.

TRUSTED | NOT TRUSTED

Specifies whether the data can be trusted to conform to the constraint.

TRUSTED: Specifies that the data can be trusted to conform to the constraint. This is the default. Use this option only if the table data is independently known to conform to the constraint; otherwise, unpredictable results might occur.
NOT TRUSTED: Specifies that the data cannot be trusted to conform to the constraint. This option is valid only for referential integrity constraints (SQLSTATE 42613). Use this option when the data conforms to the constraint for most rows, but it is not independently known whether all current and future rows will conform. If a constraint is both not trusted and enabled for query optimization, it will not be used for any optimization that depends on the data being in complete conformity with the constraint.

ADD constraint

Constraints are not enforced, but are used to help optimize query performance.

unique-constraintDefines a unique key. This constraint does not allow duplicate values in one or more columns of the table. A table can have multiple unique keys. The columns that are specified in a unique constraint must be defined as NOT NULL.

referential-constraintDefines a referential constraint.

CONSTRAINT constraint-name: Specifies a name for the referential constraint.
FOREIGN KEY column-name, ): The foreign key of the referential constraint is composed of the identified columns. Each name in the list of column names must identify a column of the parent table, and the same column must not be identified more than once. The number of foreign key columns must be the same as the number of parent key columns, and the data types of the corresponding columns must be compatible (SQLSTATE 42830).

check-constraintDefines a check constraint.

constraint-nameSpecifies a name for the check constraint.

CHECK check-condition

Sets restrictions on data that is added to a specific table. This parameter is enforced during INSERT operations or on data that is made available outside of the database manager (by using Hive operations, for example).

search-condition

The search-condition must be true or unknown for every row of the table, and has the following restrictions:

A column reference must pertain to a column of the table that is being created.
The search condition cannot contain a TYPE predicate.
The search condition cannot contain any of the following items (SQLSTATE 42621):
- Subqueries
- CAST specifications with a SCOPE clause
- Column functions
- Functions that are not deterministic
- Functions that are defined to have an external action
- User-defined functions that are defined with either MODIFIES SQL or READS SQL DATA
- OLAP specifications
- Global variables
- References to generated columns other than the identity column
- An error tolerant nested table expression

Check constraints with search-condition are enforced when rows in the table are inserted or updated. A check constraint that is defined on a table automatically applies to all subtables of that table.

functional-dependency

Defines a functional dependency between columns or sets of columns.

The parent set of columns precedes the DETERMINED BY clause, and the child set of columns follows the DETERMINED BY clause. All restrictions on search-condition apply to both parent set and child set columns, and only simple column references are allowed in the column sets (SQLSTATE 42621). The same column must not be identified more than once in a functional dependency (SQLSTATE 42709). No column in the child set of columns can be a nullable column (SQLSTATE 42621).

A functional dependency is not enforced by the database manager during normal operations such as insert, update, or delete. The functional dependency might be used during query rewrite for optimization. Incorrect results might be returned if the integrity of a functional dependency is not maintained.

ADD data-partition

Specifies a new partition definition. For Datalake tables, a partition is a split of table data into file path "directories" based on the partitioning values in the table. The table must already have one or more partitioning columns specified. All columns that are referenced in the partitioning key must be specified in the new partition value. If you specify a set of partition column values that already exists, an error is returned.

IF NOT EXISTS

Checks whether the specified partition already exists.

PARTITION (column-name=constant-value)

Specifies a column name and a value on which you define the partition. The following data types cannot be used as partitioning columns:

BINARY
BOOLEAN
DOUBLE
FLOAT
REAL
VARBINARY

Important: Although partitioning by TIMESTAMP is supported, it is not recommended. Partitioning on a TIMESTAMP value will result in many small files which can cause performances issues on reads and writes to the table.

Important: A partitioning key value cannot include any trailing blank spaces. This could be a problem if you plan to partition on a CHAR data type, because these values are always padded to their full length with blank spaces. Db2 Big SQL also does not support partitioning on a zero-length string.

DROP constraint

Drops an existing primary key, foreign key, unique, or check constraint from the table.

DROP data-partition

Specifies a partition that is to be dropped.

Note: DROP data-partition is NOT supported for Iceberg tables.

IF EXISTS

Checks whether the specified partition exists.

PARTITION column-name=constant-value

Specifies a column name and a value that identifies the partition.

DELETE DATA

If specified, the partition directory and data files will be removed. This will only be successful if the Datalake table referenced by the ALTER was defined with the external.table.purge TBLPROPERTIES value set to true. Otherwise an error is returned.

If this clause is not specified, the information about the partition will be removed from the local meta data, but the data will remain and the MSCK REPAIR TABLE statement can be used to restore the partition.

If this clause is specified without the purge option, it will only be successful if the trash bin support has been enabled, otherwise an error is returned. For more information regarding trash bin support, see Datalake Table Trash Bin support.

PURGE

When this option is specified as part of the DELETE DATA clause, the data will be permanently removed and cannot be recovered.

DROP partition spec partition-expression

The partition-expression can be an identity expression which references an existing column in the table, or it can be an expression which references an existing column in the table. The only valid partition expressions that are supported by Iceberg Datalake tables are the following:

identity: References an existing column in the table.
truncate[W]: Truncates the column value to the specified width (W) to create the partition value.
year: Extracts a date or timestamp year from the column value to create the partition value.
month: Extracts a date or timestamp month from the column value to create the partition value.
day: Extracts a date or timestamp day from the column value to create the partition value.
hour: Extracts a timestamp hour from the column value to create the partition value.

ALTER constraint

Changes an existing foreign key or check constraint.

constraint-alteration

For details, see constraint-attributes.

ALTER COLUMN column-alteration

Alters the definition of a column. Only the specified attributes are altered; others remain unchanged.

Specifies the name of the column that is to be altered. The column-name must identify an existing column of the table (SQLSTATE 42703). The name must not be qualified. The name must not identify a column that is otherwise being added, altered, or dropped in the same ALTER TABLE statement (SQLSTATE 42711).
SET NOT NULL: Specifies that the column cannot contain null values. No value for this column in existing rows of the table should be null; otherwise, an error might be returned when the data is selected, even if the row with the null value is not included in the final result set. This clause is not allowed if the column is specified in the foreign key of a referential constraint with a DELETE rule of SET NULL, and no other nullable columns exist in the foreign key (SQLSTATE 42831).
DROP NOT NULL: Drops the NOT NULL attribute of the column, thereby enabling the column to have null values. This clause is not allowed if the column is specified in the primary key or in a unique constraint on the table (SQLSTATE 42831).

RENAME COLUMN source-column-name TO target-column-name

Renames the specified column with target-column-name.

RENAME COLUMN must not rename a column that is referenced in the definition of a row permission or a column mask. Also, it must not rename a column for which a column mask is defined (SQLSTATE 42917). If you rename a column that belongs to a table on which a mask or a permission is defined, or to a table that is referenced in the definition of a mask or a permission, that mask or permission is invalidated. Access to a table that activated column access control and a defined invalid mask on it is blocked until the invalid mask is either disabled, dropped, or re-created (SQLSTATE 560D0). Access to a table that activated row access control and defined an invalid row permission on it is blocked until the invalid permission is either disabled, dropped, or re-created (SQLSTATE 560D0).

RENAME TO table-name

Renames the table. You cannot change the schema name when you rename a table.

Note: Not supported for Iceberg tables.

SET TBLPROPERTIES ('string-constant1'='string-constant2')

Defines job properties that can configure input or output formats, and that you can use to specify more detailed information about the table. The properties are passed as is to the underlying InputFormat and OutputFormat classes. Properties that are not implemented by these classes are ignored. These properties are not validated.

SET partition spec partition-expression

identity: Directly references an existing column in the table. For example, if a table has a column named C1, you can include a partition clause as PARTITION BY C1.
truncate[W]: Truncates the column value to the specified width (W) to create the partition value.
year: Extracts a date or timestamp year from the column value to create the partition value.
month: Extracts a date or timestamp month from the column value to create the partition value.
day: Extracts a date or timestamp day from the column value to create the partition value.
hour: Extracts a timestamp hour from the column value to create the partition value.

ACTIVATE ROW ACCESS CONTROL

Activates row level access control on the table.

A default row permission is implicitly created and allows no access to any rows of the table, unless permitted by a row permission that is explicitly created by a user with SECADM authority.

When the table is referenced in a data manipulation statement, all enabled row permissions that were created for the table, including the default row permission, are applied implicitly by the database manager to control the set of rows in the table that are accessible.

The table must not be referenced in the definition of a view if an INSTEAD OF trigger that is defined with the NOT SECURED attribute exists for the view (SQLSTATE 55019).

If a materialized query table references the table, the functions that are referenced in the fullselect of materizalized-query-definition must be defined with the SECURED attribute (SQLSTATE 55019).

If a materialized query table (or a staging table) that depends on the table (directly or indirectly through a view) for which row level access control is being activated and that materialized query table (or a staging table) did not already activate row level access control, row level access control is implicitly activated for the materialized query table (or a staging table). This restricts direct access to the contents of the materialized query table (or a staging table). A query that explicitly references the table before such a row permission is defined returns a warning that no data in the table exists (SQLSTATE 02000). To provide access to the materialized query table (or a staging table), an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table (or a staging table) can be issued to remove the row level protection if that is appropriate.

ACTIVATE ROW ACCESS CONTROL is ignored if row access control is already defined as activated for the table.

ACTIVATE COLUMN ACCESS CONTROL

Activates column level access control on the table.

The access to the table is not restricted but when the table is referenced in a data manipulation statement, all enabled column masks that were created for the table are applied implicitly by the database manager to mask the values that are returned for the referenced columns in the final result table of the queries.

If a materialized query table references the table, the functions that are referenced in the fullselect of materizalized-query-definition must be defined with the SECURED attribute (SQLSTATE 55019).

The table must not be referenced in the definition of a view if an INSTEAD OF trigger that is defined with the NOT SECURED attribute exists for the view (SQLSTATE 55019). If a materialized query table that depends on the table (directly or indirectly through a view) for which column level access control is being activated and that materialized query table did not already activate row level access control, row level access control is implicitly activated for the materialized query table. This restricts direct access to the contents of the materialized query table. A query that explicitly references the table before such a row permission is defined returns a warning that no data in the table exists (SQLSTATE 02000). To provide access to the materialized query table, an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table can be issued to remove the row level protection if that is appropriate.

ACTIVATE COLUMN ACCESS CONTROL is ignored if column level access control is already defined as activated for the table.

Usage notes

When altering a Datalake table, only one alter is possible per statement. Altering multiple elements requires multiple ALTER Datalake TABLE statements.
Altering a column of a Datalake table to specify a new data type for the column does not modify the table data, and only the metadata for the table is affected. To change the data, you must update the files in object storage or create a new table and populate that table with data in the new format.
If you alter a table column to specify a new data type for the column, and the length of the new type is smaller than that of the original data type length, table data values might be returned in a truncated form.
Foreign key constraints are possible between Datalake and local database tables.
When the ALTER Datalake TABLE…DROP data-partition statement runs concurrently with a SELECT statement, the SELECT statement might return SQLCODE -5105, and you might see FileNotFoundException in the bigsql.log file on your worker hosts. For more information, see SELECT or INSERT fails when run concurrently with DROP PARTITION and Restrictions and limitations on Datalake tables.
When the ALTER Datalake TABLE…DROP data-partition statement runs concurrently with an INSERT statement, you might see FileNotFoundException in the bigsql-sched.log file on your scheduler host. For more information, see SELECT or INSERT fails when run concurrently with DROP PARTITION and Restrictions and limitations on Datalake tables.
Row access control that is activated explicitly:
- The ACTIVATE ROW ACCESS CONTROL clause is used to activate row access control for a table. When this happens, a default row permission is implicitly created and allows no access to any rows of the table, unless permitted by a row permission that is explicitly created by the security administrator. The default row permission is always enabled.
- When the table is referenced in a data manipulation statement, all enabled row permissions that have been created for the table, including the default row permission, are implicitly applied by the database manager to control which rows in the table are accessible. A row access control search condition is derived by application of the logical OR operator to the search condition in each enabled row permission. This derived search condition acts as a filter to the table before any user specified operations, such as predicates, grouping, ordering, are processed. This derived search condition permits the authorization IDs that are specified in the permission definitions to access certain rows in the table.
- When the ACTIVATE ROW ACCESS CONTROL clause is used, all the packages and dynamically cached statements that reference the table are invalidated.
- Row access control remains enforced until the DEACTIVATE ROW ACCESS CONTROL clause is used to stop enforcing it.
Implicit object that is created when row access control is activated for a table:
- When the ACTIVATE ROW ACCESS CONTROL clause is used to activate row access control for a table, the database manager implicitly creates a default row permission for the table. The default row permission prevents all access to the table. The implicitly created row permission resides in the same schema of the base table and has a name in the form of SYS_DEFAULT_ROW_PERMISSION__table-name ... up to 128 characters. Notice two underscores after "PERMISSION". If this name is not unique, the last 4 characters are reserved for a unique number 'nnnn', where 'nnnn' is a four-alphanumeric-character string that start at '0000' and is incremented by one value each time until a unique name is found.
- The owner of the default row permission is SYSIBM. The default row permission is always enabled. The default row permission is dropped when row access control is deactivated or when the table is dropped.
Activating column access control:
- The ACTIVATE COLUMN ACCESS CONTROL clause is used to activate column level access control for a table. The access to the table is not restricted but when the table is referenced in a data manipulation statement, all enabled column masks that were created for the table are applied to mask the column values referenced in the final result table.
- When column masks are used to mask the column values, they determine the values in the final result table. If a column has a column mask and the column (specifically a simple reference to a column name or a column that is embedded in an expression) appears in the outermost select list, the column mask is applied to the column to produce the values for the final result table. If the column does not appear in the outermost select list but it participates in the final result table, for example, it appears in a materialized table expression or view, the column mask is applied to the column in such a way that the masked value is included in the result table of the materialized table expression or view so that it can be used in the final result table.
- The application of column masks does not interfere with the operations of other clauses within the statement such as the WHERE, GROUP BY, HAVING, SELECT DISTINCT, and ORDER BY. The rows that are returned in the final result table remain the same, except that the values in the resulting rows might have been masked by the column masks. As such, if the masked column also appears in an ORDER BY sort-key, the order is based on the original column values and the masked values in the final result table might not reflect that order. Similarly, the masked values might not reflect the uniqueness that is enforced by SELECT DISTINCT.
- A column mask is applied in the following contexts:
  - The outermost SELECT clause or clauses of a SELECT or SELECT INTO statement, or if the column does not appear in the outermost select list but it participates in the final result table, one or more outermost SELECT clauses of the corresponding materialized table expression or view where the column appears.
  - The outermost SELECT clause or clauses of a SELECT FROM INSERT, SELECT FROM UPDATE, or SELECT FROM DELETE operation.
  - The outermost SELECT clause or clauses that are used to derive the new values for an INSERT, UPDATE, or MERGE statement, or a SET transition-variable-name assignment statement. The same masking applies to a scalar fullselect expression that appears in the outermost SELECT clause or clauses of the previously mentioned statements, the right side of a SET host-variable assignment statement, the VALUES INTO statement, or the VALUES statement.
- Column masks are not applied when the masked column appears in the following contexts:
  - WHERE clauses
  - GROUP BY clauses
  - HAVING clauses
  - SELECT DISTINCT
  - ORDER BY clauses
Stop enforcing row or column access control:
The DEACTIVATE ROW ACCESS CONTROL clause is used to stop enforcing row access control for a table. The default row permission is dropped. Thereafter, when the table is referenced in a data manipulation statement, explicitly created row permissions are not applied.

The DEACTIVATE COLUMN ACCESS CONTROL clause is used to stop enforcing column access control for a table. Thereafter, when the table is referenced in a data manipulation statement, the column masks are not applied.

The explicitly created row permissions or column masks, if any, remain but have no effect.

All the packages and dynamically cached statements that reference the table are invalidated when row or column access control is deactivated.

Secure user-defined functions for row and column access control:
If a row permission or column mask definition references a user-defined function, the function must be altered with the SECURED option because the sensitive data might be passed as arguments to the function. When a user-defined function is referenced in a data manipulation statement where a table that enforces row or column access control is referenced, and the function arguments reference the columns from such a table, if the function is not secure, this impacts the access plan selection and might yield poor performance.

The database manager considers the SECURED option an assertion that declares the user has established a change control audit procedure for all changes to the user-defined function. It is assumed that such a control audit procedure is in place and that all subsequent ALTER FUNCTION statements or changes to external packages are being reviewed by this audit process.

Restrictions

You can run the ALTER DATALAKE TABLE statement only as a top-level statement. It cannot be run from routines, compound blocks, or prepared statements.
Constraints that are defined on Datalake tables cannot be enforced (SQLSTATE 42858).
You cannot alter a partitioning column in a Datalake table (SQLSTATE 42858).
Additional restrictions apply to externally managed Datalake tables. For more information, see Restrictions and limitations on Datalake tables. For information on externally managed tables, see Terminology.

Examples

To add a column named price to an existing Datalake table named items, run the following command:

ALTER DATALAKE TABLE items ADD COLUMN price INTEGER

To rename a column named price to cost, run the following command:

ALTER DATALAKE TABLE items RENAME COLUMN price TO cost

To rename a Datalake table named items to Product, run the following command:

ALTER DATALAKE TABLE items RENAME TO Product

To add a unique constraint to a column named item_code so that no two entries are the same in a table named items, run the following command:

ALTER DATALAKE TABLE items ADD UNIQUE (item_code)

Suppose that item_code is the primary key of a table named items, to drop the primary key constraint from the item_code column, run the following command:

ALTER DATALAKE TABLE items DROP PRIMARY KEY

To add a constraint called valid_price to the column price in a table named items, run the following command:

ALTER DATALAKE TABLE items ADD CONSTRAINT valid_price CHECK (price>0)

To add a referential constraint to the order table's item_code column so that it references the items table column named item_code, run the following command:

ALTER DATALAKE TABLE order ADD FOREIGN KEY (item_code) REFERENCES items(item_code)