ALTER DATALAKE TABLE statement
The ALTER DATALAKE TABLE statement alters the definition of a data table.
Invocation
This statement can only be executed as a dynamic statement using EXECUTE IMMEDIATE. It is an executable statement that can be dynamically prepared only if DYNAMICRULES run behavior is in effect for the package (SQLSTATE 42509).
Authorization
- ALTER privilege on the table to be altered
- CONTROL privilege on the table to be altered
- ALTERIN privilege on the schema of the table
- SCHEMAADM authority on the schema of the table
- DBADM authority
To alter a table to ACTIVATE and DEACTIVATE row and column access control, the privileges that are held by the authorization ID of the statement must include the SECADM authority.
Syntax
- 1 Only applicable to Hive Datalake tables.
- 2 Only applicable to Hive Datalake tables.
- 3 Only applicable to Iceberg tables.
- 4 Not supported for Iceberg tables.
- 5 Only applicable to Iceberg tables.
- 6 Although the ENFORCED keyword is available for Db2 local tables, an error is returned if specified for a Datalake table.
- 1 Although the ENFORCED keyword is available for Db2 local tables, an error is returned if specified for a Datalake table.
- 2 If an ACTIVATE or DEACTIVATE clause is specified for row access control, no other clause except ACTIVATE or DEACTIVATE column access control can be specified in the same ALTER DATALAKE TABLE statement (SQLSTATE 42613).
- 3 If an ACTIVATE or DEACTIVATE clause is specified for column access control, no other clause except ACTIVATE or DEACTIVATE row access control can be specified in the same ALTER DATALAKE TABLE statement (SQLSTATE 42613).
Description
-
table-name
- The table-name must identify an existing Datalake table.
- ADD column-definition
- Defines the attributes of a new column. The column-name cannot be qualified, and the same name cannot be used for more than one column of the table. For a description of the valid data types, see data-types.
- ADD constraint
- Constraints are not enforced, but are used to help optimize query performance.
-
unique-constraint
- Defines a unique key. This constraint does not allow duplicate values in one or more columns of the table. A table can have multiple unique keys. The columns that are specified in a unique constraint must be defined as NOT NULL. referential-constraint
- Defines a referential constraint.
- CONSTRAINT constraint-name
- Specifies a name for the referential constraint.
- FOREIGN KEY column-name, )
- The foreign key of the referential constraint is composed of the identified columns. Each name in the list of column names must identify a column of the parent table, and the same column must not be identified more than once. The number of foreign key columns must be the same as the number of parent key columns, and the data types of the corresponding columns must be compatible (SQLSTATE 42830).
check-constraint
- Defines a check constraint.
-
constraint-name
- Specifies a name for the check constraint.
- CHECK check-condition
- Sets restrictions on data that is added to a specific table. This parameter is enforced during
INSERT operations or on data that is made available outside of the database manager (by using Hive
operations, for example).
-
search-condition
- The search-condition must be true or unknown for every row of the table, and
has the following restrictions:
- A column reference must pertain to a column of the table that is being created.
- The search condition cannot contain a TYPE predicate.
- The search condition cannot contain any of the following items (SQLSTATE 42621):
- Subqueries
- CAST specifications with a SCOPE clause
- Column functions
- Functions that are not deterministic
- Functions that are defined to have an external action
- User-defined functions that are defined with either MODIFIES SQL or READS SQL DATA
- OLAP specifications
- Global variables
- References to generated columns other than the identity column
- An error tolerant nested table expression
Check constraints with search-condition are enforced when rows in the table are inserted or updated. A check constraint that is defined on a table automatically applies to all subtables of that table.
functional-dependency
- Defines a functional dependency between columns or sets of columns.
The parent set of columns precedes the DETERMINED BY clause, and the child set of columns follows the DETERMINED BY clause. All restrictions on search-condition apply to both parent set and child set columns, and only simple column references are allowed in the column sets (SQLSTATE 42621). The same column must not be identified more than once in a functional dependency (SQLSTATE 42709). No column in the child set of columns can be a nullable column (SQLSTATE 42621).
A functional dependency is not enforced by the database manager during normal operations such as insert, update, or delete. The functional dependency might be used during query rewrite for optimization. Incorrect results might be returned if the integrity of a functional dependency is not maintained.
- The search-condition must be true or unknown for every row of the table, and
has the following restrictions:
- ADD data-partition
- Specifies a new partition definition. For Datalake tables, a partition is a
split of table data into file path "directories" based on the partitioning values in the table. The
table must already have one or more partitioning columns specified. All columns that are referenced
in the partitioning key must be specified in the new partition value. If you specify a set of
partition column values that already exists, an error is returned.
- IF NOT EXISTS
- Checks whether the specified partition already exists.
- PARTITION (column-name=constant-value)
- Specifies a column name and a value on which you define the partition. The following data types
cannot be used as partitioning columns:
- BINARY
- BOOLEAN
- DOUBLE
- FLOAT
- REAL
- VARBINARY
Important: Although partitioning by TIMESTAMP is supported, it is not recommended. Partitioning on a TIMESTAMP value will result in many small files which can cause performances issues on reads and writes to the table.Important: A partitioning key value cannot include any trailing blank spaces. This could be a problem if you plan to partition on a CHAR data type, because these values are always padded to their full length with blank spaces. Db2 Big SQL also does not support partitioning on a zero-length string.
- DROP constraint
- Drops an existing primary key, foreign key, unique, or check constraint from the table.
- DROP data-partition
- Specifies a partition that is to be dropped.Note: DROP data-partition is NOT supported for Iceberg tables.
- IF EXISTS
- Checks whether the specified partition exists.
- PARTITION column-name=constant-value
- Specifies a column name and a value that identifies the partition.
- DELETE DATA
-
If specified, the partition directory and data files will be removed. This will only be successful if the Datalake table referenced by the ALTER was defined with the external.table.purge TBLPROPERTIES value set to true. Otherwise an error is returned.
If this clause is not specified, the information about the partition will be removed from the local meta data, but the data will remain and the MSCK REPAIR TABLE statement can be used to restore the partition.
If this clause is specified without the purge option, it will only be successful if the trash bin support has been enabled, otherwise an error is returned. For more information regarding trash bin support, see Datalake Table Trash Bin support.
- PURGE
-
When this option is specified as part of the DELETE DATA clause, the data will be permanently removed and cannot be recovered.
- DROP partition spec partition-expression
- The partition-expression can be an identity expression which references an existing
column in the table, or it can be an expression which references an existing column in the table.
The only valid partition expressions that are supported by Iceberg Datalake tables are the following:
- identity
- References an existing column in the table.
- truncate[W]
- Truncates the column value to the specified width (W) to create the partition value.
- year
- Extracts a date or timestamp year from the column value to create the partition value.
- month
- Extracts a date or timestamp month from the column value to create the partition value.
- day
- Extracts a date or timestamp day from the column value to create the partition value.
- hour
- Extracts a timestamp hour from the column value to create the partition value.
- ALTER constraint
- Changes an existing foreign key or check constraint.
-
constraint-alteration
- For details, see constraint-attributes.
- ALTER COLUMN column-alteration
- Alters the definition of a column. Only the specified attributes are altered; others remain unchanged.
-
column-name
- Specifies the name of the column that is to be altered. The column-name must identify an existing column of the table (SQLSTATE 42703). The name must not be qualified. The name must not identify a column that is otherwise being added, altered, or dropped in the same ALTER TABLE statement (SQLSTATE 42711).
- SET NOT NULL
- Specifies that the column cannot contain null values. No value for this column in existing rows of the table should be null; otherwise, an error might be returned when the data is selected, even if the row with the null value is not included in the final result set. This clause is not allowed if the column is specified in the foreign key of a referential constraint with a DELETE rule of SET NULL, and no other nullable columns exist in the foreign key (SQLSTATE 42831).
- DROP NOT NULL
- Drops the NOT NULL attribute of the column, thereby enabling the column to have null values. This clause is not allowed if the column is specified in the primary key or in a unique constraint on the table (SQLSTATE 42831).
- RENAME COLUMN source-column-name TO target-column-name
- Renames the specified column with target-column-name.
RENAME COLUMN must not rename a column that is referenced in the definition of a row permission or a column mask. Also, it must not rename a column for which a column mask is defined (SQLSTATE 42917). If you rename a column that belongs to a table on which a mask or a permission is defined, or to a table that is referenced in the definition of a mask or a permission, that mask or permission is invalidated. Access to a table that activated column access control and a defined invalid mask on it is blocked until the invalid mask is either disabled, dropped, or re-created (SQLSTATE 560D0). Access to a table that activated row access control and defined an invalid row permission on it is blocked until the invalid permission is either disabled, dropped, or re-created (SQLSTATE 560D0).
- RENAME TO table-name
- Renames the table. You cannot change the schema name when you rename a table.Note: Not supported for Iceberg tables.
- SET TBLPROPERTIES ('string-constant1'='string-constant2')
- Defines job properties that can configure input or output formats, and that you can use to
specify more detailed information about the table. The properties are passed
as is
to the underlying InputFormat and OutputFormat classes. Properties that are not implemented by these classes are ignored. These properties are not validated. - SET partition spec partition-expression
- The partition-expression can be an identity expression which references an existing
column in the table, or it can be an expression which references an existing column in the table.
The only valid partition expressions that are supported by Iceberg Datalake tables are the following:
- identity
- Directly references an existing column in the table. For example, if a table has a column named
C1, you can include a partition clause as
PARTITION BY C1. - truncate[W]
- Truncates the column value to the specified width (W) to create the partition value.
- year
- Extracts a date or timestamp year from the column value to create the partition value.
- month
- Extracts a date or timestamp month from the column value to create the partition value.
- day
- Extracts a date or timestamp day from the column value to create the partition value.
- hour
- Extracts a timestamp hour from the column value to create the partition value.
- ACTIVATE ROW ACCESS CONTROL
-
Activates row level access control on the table.
A default row permission is implicitly created and allows no access to any rows of the table, unless permitted by a row permission that is explicitly created by a user with SECADM authority.
When the table is referenced in a data manipulation statement, all enabled row permissions that were created for the table, including the default row permission, are applied implicitly by the database manager to control the set of rows in the table that are accessible.
The table must not be referenced in the definition of a view if an INSTEAD OF trigger that is defined with the NOT SECURED attribute exists for the view (SQLSTATE 55019).
If a materialized query table references the table, the functions that are referenced in the fullselect of materizalized-query-definition must be defined with the SECURED attribute (SQLSTATE 55019).
If a materialized query table (or a staging table) that depends on the table (directly or indirectly through a view) for which row level access control is being activated and that materialized query table (or a staging table) did not already activate row level access control, row level access control is implicitly activated for the materialized query table (or a staging table). This restricts direct access to the contents of the materialized query table (or a staging table). A query that explicitly references the table before such a row permission is defined returns a warning that no data in the table exists (SQLSTATE 02000). To provide access to the materialized query table (or a staging table), an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table (or a staging table) can be issued to remove the row level protection if that is appropriate.
ACTIVATE ROW ACCESS CONTROL is ignored if row access control is already defined as activated for the table.
- ACTIVATE COLUMN ACCESS CONTROL
-
Activates column level access control on the table.
The access to the table is not restricted but when the table is referenced in a data manipulation statement, all enabled column masks that were created for the table are applied implicitly by the database manager to mask the values that are returned for the referenced columns in the final result table of the queries.
If a materialized query table references the table, the functions that are referenced in the fullselect of materizalized-query-definition must be defined with the SECURED attribute (SQLSTATE 55019).
The table must not be referenced in the definition of a view if an INSTEAD OF trigger that is defined with the NOT SECURED attribute exists for the view (SQLSTATE 55019). If a materialized query table that depends on the table (directly or indirectly through a view) for which column level access control is being activated and that materialized query table did not already activate row level access control, row level access control is implicitly activated for the materialized query table. This restricts direct access to the contents of the materialized query table. A query that explicitly references the table before such a row permission is defined returns a warning that no data in the table exists (SQLSTATE 02000). To provide access to the materialized query table, an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table can be issued to remove the row level protection if that is appropriate.
ACTIVATE COLUMN ACCESS CONTROL is ignored if column level access control is already defined as activated for the table.
Usage notes
-
When altering a Datalake table, only one alter is possible per statement. Altering multiple elements requires multiple ALTER Datalake TABLE statements.
- Altering a column of a Datalake table to specify a new data type for the column does not modify the table data, and only the metadata for the table is affected. To change the data, you must update the files in object storage or create a new table and populate that table with data in the new format.
- If you alter a table column to specify a new data type for the column, and the length of the new type is smaller than that of the original data type length, table data values might be returned in a truncated form.
- Foreign key constraints are possible between Datalake and local database tables.
- When the ALTER Datalake TABLE…DROP data-partition statement runs concurrently with a SELECT statement, the SELECT statement might return SQLCODE -5105, and you might see FileNotFoundException in the bigsql.log file on your worker hosts. For more information, see SELECT or INSERT fails when run concurrently with DROP PARTITION and Restrictions and limitations on Datalake tables.
- When the ALTER Datalake TABLE…DROP data-partition statement runs concurrently with an INSERT statement, you might see FileNotFoundException in the bigsql-sched.log file on your scheduler host. For more information, see SELECT or INSERT fails when run concurrently with DROP PARTITION and Restrictions and limitations on Datalake tables.
- Row access control that is activated explicitly:
- The ACTIVATE ROW ACCESS CONTROL clause is used to activate row access control for a table. When this happens, a default row permission is implicitly created and allows no access to any rows of the table, unless permitted by a row permission that is explicitly created by the security administrator. The default row permission is always enabled.
- When the table is referenced in a data manipulation statement, all enabled row permissions that have been created for the table, including the default row permission, are implicitly applied by the database manager to control which rows in the table are accessible. A row access control search condition is derived by application of the logical OR operator to the search condition in each enabled row permission. This derived search condition acts as a filter to the table before any user specified operations, such as predicates, grouping, ordering, are processed. This derived search condition permits the authorization IDs that are specified in the permission definitions to access certain rows in the table.
- When the ACTIVATE ROW ACCESS CONTROL clause is used, all the packages and dynamically cached statements that reference the table are invalidated.
- Row access control remains enforced until the DEACTIVATE ROW ACCESS CONTROL clause is used to stop enforcing it.
- Implicit object that is created when row access control is activated for a table:
- When the ACTIVATE ROW ACCESS CONTROL clause is used to activate row access control for a table, the database manager implicitly creates a default row permission for the table. The default row permission prevents all access to the table. The implicitly created row permission resides in the same schema of the base table and has a name in the form of SYS_DEFAULT_ROW_PERMISSION__table-name ... up to 128 characters. Notice two underscores after "PERMISSION". If this name is not unique, the last 4 characters are reserved for a unique number 'nnnn', where 'nnnn' is a four-alphanumeric-character string that start at '0000' and is incremented by one value each time until a unique name is found.
- The owner of the default row permission is SYSIBM. The default row permission is always enabled. The default row permission is dropped when row access control is deactivated or when the table is dropped.
- Activating column access control:
- The ACTIVATE COLUMN ACCESS CONTROL clause is used to activate column level access control for a table. The access to the table is not restricted but when the table is referenced in a data manipulation statement, all enabled column masks that were created for the table are applied to mask the column values referenced in the final result table.
- When column masks are used to mask the column values, they determine the values in the final result table. If a column has a column mask and the column (specifically a simple reference to a column name or a column that is embedded in an expression) appears in the outermost select list, the column mask is applied to the column to produce the values for the final result table. If the column does not appear in the outermost select list but it participates in the final result table, for example, it appears in a materialized table expression or view, the column mask is applied to the column in such a way that the masked value is included in the result table of the materialized table expression or view so that it can be used in the final result table.
- The application of column masks does not interfere with the operations of other clauses within the statement such as the WHERE, GROUP BY, HAVING, SELECT DISTINCT, and ORDER BY. The rows that are returned in the final result table remain the same, except that the values in the resulting rows might have been masked by the column masks. As such, if the masked column also appears in an ORDER BY sort-key, the order is based on the original column values and the masked values in the final result table might not reflect that order. Similarly, the masked values might not reflect the uniqueness that is enforced by SELECT DISTINCT.
- A column mask is applied in the following contexts:
- The outermost SELECT clause or clauses of a SELECT or SELECT INTO statement, or if the column does not appear in the outermost select list but it participates in the final result table, one or more outermost SELECT clauses of the corresponding materialized table expression or view where the column appears.
- The outermost SELECT clause or clauses of a SELECT FROM INSERT, SELECT FROM UPDATE, or SELECT FROM DELETE operation.
- The outermost SELECT clause or clauses that are used to derive the new values for an INSERT, UPDATE, or MERGE statement, or a SET transition-variable-name assignment statement. The same masking applies to a scalar fullselect expression that appears in the outermost SELECT clause or clauses of the previously mentioned statements, the right side of a SET host-variable assignment statement, the VALUES INTO statement, or the VALUES statement.
- Column masks are not applied when the masked column appears in the following contexts:
- WHERE clauses
- GROUP BY clauses
- HAVING clauses
- SELECT DISTINCT
- ORDER BY clauses
- Stop enforcing row or column access control:
The DEACTIVATE ROW ACCESS CONTROL clause is used to stop enforcing row access control for a table. The default row permission is dropped. Thereafter, when the table is referenced in a data manipulation statement, explicitly created row permissions are not applied.
The DEACTIVATE COLUMN ACCESS CONTROL clause is used to stop enforcing column access control for a table. Thereafter, when the table is referenced in a data manipulation statement, the column masks are not applied.
The explicitly created row permissions or column masks, if any, remain but have no effect.
All the packages and dynamically cached statements that reference the table are invalidated when row or column access control is deactivated.
- Secure user-defined functions for row and column access control:
If a row permission or column mask definition references a user-defined function, the function must be altered with the SECURED option because the sensitive data might be passed as arguments to the function. When a user-defined function is referenced in a data manipulation statement where a table that enforces row or column access control is referenced, and the function arguments reference the columns from such a table, if the function is not secure, this impacts the access plan selection and might yield poor performance.
The database manager considers the SECURED option an assertion that declares the user has established a change control audit procedure for all changes to the user-defined function. It is assumed that such a control audit procedure is in place and that all subsequent ALTER FUNCTION statements or changes to external packages are being reviewed by this audit process.
Restrictions
- You can run the ALTER DATALAKE TABLE statement only as a top-level statement. It cannot be run from routines, compound blocks, or prepared statements.
- Constraints that are defined on Datalake tables cannot be enforced (SQLSTATE 42858).
- You cannot alter a partitioning column in a Datalake table (SQLSTATE 42858).
- Additional restrictions apply to externally managed Datalake tables. For more information, see Restrictions and limitations on Datalake tables. For information on externally managed tables, see Terminology.
Examples
price to an existing Datalake table named
items, run the following
command:ALTER DATALAKE TABLE items ADD COLUMN price INTEGERprice to cost, run the following
command:ALTER DATALAKE TABLE items RENAME COLUMN price TO costitems to Product, run the
following
command:ALTER DATALAKE TABLE items RENAME TO Productitem_code so that no two entries
are the same in a table named items, run the following
command:ALTER DATALAKE TABLE items ADD UNIQUE (item_code)item_code is the primary key of a table named
items, to drop the primary key constraint from the item_code
column, run the following
command:ALTER DATALAKE TABLE items DROP PRIMARY KEYvalid_price to the column price in a
table named items, run the following
command:ALTER DATALAKE TABLE items ADD CONSTRAINT valid_price CHECK (price>0)order table's item_code
column so that it references the items table column named
item_code, run the following
command:ALTER DATALAKE TABLE order ADD FOREIGN KEY (item_code) REFERENCES items(item_code)