ALTER TABLE (HADOOP/HBASE) statement

The ALTER TABLE (HADOOP/HBASE) statement alters the definition of a Hadoop or HBase table.

Invocation

This statement can be embedded in an application program or issued by using dynamic SQL statements. It is an executable statement that can be dynamically prepared only if DYNAMICRULES run behavior is in effect for the package (SQLSTATE 42509).

Authorization

The privileges that are held by the authorization ID of the statement must include at least one of the following authorities:
  • ALTER privilege on the table to be altered
  • CONTROL privilege on the table to be altered
  • ALTERIN privilege on the schema of the table
  • SCHEMAADM authority on the schema of the table
  • DBADM authority
To use the LOCATION clause, the authorization ID of the statement must have one of the following privileges:
  • DATAACCESS authority
  • The following privileges on the files and the directory in the HDFS:
    • READ and WRITE privileges on the directory and all files in the directory, including files in any subdirectories that might be recursively specified in the LOCATION clause
    • EXECUTE privileges on all the directories and subdirectories in the specified LOCATION clause
If impersonation is not enabled, the bigsql user needs the same permissions that the Hive user needs for this operation. For example, with impersonation enabled in Hive, the bigsql user must have READ, WRITE, and EXECUTE privileges on the directory that is specified in the LOCATION clause, and READ and EXECUTE privileges on the parent directory.
Important: The LOCATION clause is not applicable to HBase tables.

Syntax

Read syntax diagramSkip visual syntax diagram ALTER TABLE table-name ADDCOLUMNcolumn-definitionhbase-column-mapping1unique-constraintreferential-constraintcheck-constraintIF NOT EXISTS,partition-specLOCATIONlocationDROPPRIMARY KEYFOREIGN KEYUNIQUECHECKCONSTRAINTconstraint-nameIF EXISTS,partition-specPURGEALTERFOREIGN KEYCHECKconstraint-nameconstraint-alterationCOLUMNcolumn-alterationACTIVATEDEACTIVATEROW ACCESS CONTROLACTIVATEDEACTIVATECOLUMN ACCESS CONTROLRENAMECOLUMNsource-column-nameTOtarget-column-nameTOtable-nameSETTBLPROPERTIES(,string-constant1=string-constant2)SERDEserde-classWITHserde-propertiesserde-propertiesTABLE OPTIONS(,table-opt-name=table-opt-value)COLUMN FAMILY OPTIONS FOR(,hbase-cfamilyhbase-cfamily-opts)LOCATIONlocationFILEFORMATfile-formatCHANGECOLUMNcolumn-namecolumn-definitionTOUCHpartition-specpartition-specSET LOCATIONlocationRENAME TOpartition-specpartition-specCOMPACTMAJORMINORAND WAITWITH OVERWRITE TBLPROPERTIES(,string-constant1=string-constant2)EXCHANGE PARTITION2(partition-spec)WITH TABLEsource-table-name
column-definition
Read syntax diagramSkip visual syntax diagram column-name data-type column-options
data-type
Read syntax diagramSkip visual syntax diagramprimitive-typearray-typerow-type
primitive-type
Read syntax diagramSkip visual syntax diagramTINYINTSMALLINTINT2INTEGERINTINT4BIGINTINT8DECIMALDECNUMERICNUM(10,0)( integer,0, integer)FLOATFLOAT8REALFLOAT4DOUBLEPRECISIONFLOAT8CHARACTER(integer)CHARSTRINGVARCHAR(integer)CHARACTER VARYINGCHARTIMESTAMPDATETIME(integer)DATEBINARY(integer)VARBINARY(integer)
array-type
Read syntax diagramSkip visual syntax diagram primitive-typerow-typeARRAY [integer-constant][INTINTEGERVARCHAR(int)]
row-type
Read syntax diagramSkip visual syntax diagram ROW ( field-namedata-type )
column-options
Read syntax diagramSkip visual syntax diagramCOMMENTcolumn-commentNULLNOT NULLCONSTRAINTconstraint-nameconstraint-choicesconstraint-attributes
constraint-choices
Read syntax diagramSkip visual syntax diagramPRIMARY KEYUNIQUEreferences-clause
references-clause
Read syntax diagramSkip visual syntax diagram REFERENCES table-name (,column-name)
constraint-attributes
Read syntax diagramSkip visual syntax diagram NOT ENFORCEDTRUSTEDNOT TRUSTEDENFORCED ENABLE QUERY OPTIMIZATIONDISABLE QUERY OPTIMIZATION
hbase-column-mapping
Read syntax diagramSkip visual syntax diagram UPDATE HBASE COLUMN MAPPING APPENDcfamily:cnameKEYADDcfamily:cnamehbase-cfamily-opts
hbase-cfamily-opts
Read syntax diagramSkip visual syntax diagramCOMPRESSIONBLOOM FILTERIN MEMORYNO IN MEMORYBLOCKCACHEBLOCKSIZEDATA_BLOCK_ENCODINGKEEP_DELETED_CELLSVERSIONSMIN_VERSIONSREPLICATION_SCOPETTL(hbase-cfamily-option-value)
unique-constraint
Read syntax diagramSkip visual syntax diagramCONSTRAINTconstraint-nameUNIQUEPRIMARY KEY ( ,column-name ) constraint-attributes
referential-constraint
Read syntax diagramSkip visual syntax diagramCONSTRAINTconstraint-name FOREIGN KEY ( ,column-name ) references-clauseconstraint-attributes
check-constraint
Read syntax diagramSkip visual syntax diagramCONSTRAINTconstraint-name CHECK ( check-condition )
check-condition
Read syntax diagramSkip visual syntax diagramsearch-conditionfunctional-dependency
functional-dependency
Read syntax diagramSkip visual syntax diagramcolumn-name(,column-name) DETERMINED BY column-name(,column-name)
partition-spec
Read syntax diagramSkip visual syntax diagram PARTITION ( ,column-name=constant-value )
constraint-alteration
Read syntax diagramSkip visual syntax diagramENABLE QUERY OPTIMIZATIONDISABLE QUERY OPTIMIZATIONNOT ENFORCEDTRUSTEDNOT TRUSTEDENFORCED
column-alteration
Read syntax diagramSkip visual syntax diagram column-name SETDATA TYPEdata-typeNOT NULLDROPNOT NULL
serde-properties
Read syntax diagramSkip visual syntax diagram SERDEPROPERTIES ( ,string-constant1=string-constant2 )
file-format
Read syntax diagramSkip visual syntax diagramTEXTFILESEQUENCEFILEBINARY SEQUENCEFILETEXT SEQUENCEFILERCFILEORCPARQUETFILEPARQUETcustom-file-format
custom-file-format
Read syntax diagramSkip visual syntax diagram INPUTFORMAT class-name OUTPUTFORMAT class-name INPUTDRIVERclass-nameOUTPUTDRIVERclass-name
Notes:
  • 1 When adding a column to an HBase table, hbase-column-mapping is required.
  • 2 Db2 Big SQL 7.2 service This feature requires the Db2 Big SQL 7.2 service on IBM Cloud Pak® for Data 4.0.

Description

table-name
The table-name must identify an existing Hadoop or HBase table.
ADD column-definition
Defines the attributes of a new column. The column-name cannot be qualified, and the same name cannot be used for more than one column of the table. For a description of the valid data types, see data-type.
Important: When you add columns to an HBase table, you must specify the UPDATE HBASE COLUMN MAPPING clause.
column-options
You can define column options including a comment, the nullability attribute, or constraints.
COMMENT column-comment
A comment that provides additional information about the column.
NULL | NOT NULL
NOT NULL specifies that the column will not contain null values. Because the Hadoop table is external to any local database server, Db2 Big SQL enforces nullability at query execution time (SQL5104N). The nullability attribute of a column is enforced during an INSERT or LOAD USING operation. Do not assign a null value to a column that is defined as NOT NULL (SQLSTATE 23502).
CONSTRAINT constraint-name
Specifies the name of a constraint. You can define the following constraints on a column:
PRIMARY KEY
A primary key must be unique and must have the NOT NULL attribute. A table cannot have more than one primary key.
UNIQUE
A unique key prevents duplicate values in one or more NOT NULL columns of a table. A table can have multiple unique keys.
REFERENCES table-name
Ensures that required table relationships are maintained and that data entry rules are followed. The column on which this parameter is specified becomes the foreign key to the referenced parent table table-name.
constraint-attributes
Defines attributes that are associated with primary key, unique, referential integrity, or check constraints.
NOT ENFORCED | ENFORCED
Specifies whether the constraint is enforced by the database manager during normal operations such as insert, update, or delete.
NOT ENFORCED
Specifies that the constraint is not enforced by the database manager. This is the default for Hadoop tables. A primary key or unique constraint cannot be NOT ENFORCED if there is a dependent ENFORCED referential constraint.
TRUSTED | NOT TRUSTED
Specifies whether the data can be trusted to conform to the constraint.
TRUSTED
Specifies that the data can be trusted to conform to the constraint. This is the default. Use this option only if the table data is independently known to conform to the constraint; otherwise, unpredictable results might occur.
NOT TRUSTED
Specifies that the data cannot be trusted to conform to the constraint. This option is valid only for referential integrity constraints (SQLSTATE 42613). Use this option when the data conforms to the constraint for most rows, but it is not independently known whether all current and future rows will conform. If a constraint is both not trusted and enabled for query optimization, it will not be used for any optimization that depends on the data being in complete conformity with the constraint.
ENFORCED
Although the ENFORCED keyword is available for compatibility with the syntax for local tables that are created with the CREATE TABLE statement, an error is returned when you specify this option for Hadoop tables (SQLSTATE 42858).
ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION
Specifies whether the constraint or functional dependency can be used for query optimization under appropriate circumstances.
ENABLE QUERY OPTIMIZATION
Specifies that the constraint is assumed to be true and can be used for query optimization. This is the default.
DISABLE QUERY OPTIMIZATION
Specifies that the constraint cannot be used for query optimization. This option cannot be specified for primary key and unique constraints (SQLSTATE 42613).
hbase-column-mapping
Specifies how the column that is being added to the table should be mapped to an HBase column.
UPDATE HBASE COLUMN MAPPING
A column can be appended to an existing HBase column, or mapped to a new HBase column.
APPEND
Specifies that the new column is to be appended to an existing HBase column or to the HBase row key.
cfamily:cname
The existing HBase column to which the new column is to be appended is specified by its name and by the name of the column family to which it belongs. The values are case sensitive: cf:cq is not the same as cf:CQ, for example.
KEY
Specifies that the new column is to be appended to the HBase row key.
ADD
Specifies that the new column is to be mapped to a new HBase column that is not being referenced by an existing column mapping.
cfamily:cname
The new HBase column to which the new column is to be mapped is specified by its name and by the name of the column family to which it belongs. The values are case sensitive: cf:cq is not the same as cf:CQ, for example.
hbase-cfamily-opts
Specifies the HBase configuration options that are applied to a specific column family. You specify the options in the form of option name/value pairs. In the case of a Boolean option, you can omit the option value to specify true.
COMPRESSION
Specifies a compression algorithm for the column family. Valid values are NONE (default), GZ (GZIP), SNAPPY, and LZ4.
BLOOM FILTER
Bloom filters help to identify whether a specific row and column combination exists in a block without having to load it. Bloom filters use extra space but can improve lookup times and help to reduce block cache churn by loading only the blocks that are required.
NONE
Specifies that no bloom filter is maintained. This is the default.
ROW
Specifies that the hash of the row is added to the bloom filter on each key insert. Use this option when queries have predicates on row keys.
ROWCOL
Specifies that the hash of the row plus column family plus column family qualifier is added to the bloom filter on each key insert. Use this option when queries have column projections.
IN MEMORY
Specifies whether data is to be cached for as long as possible to improve performance. Valid values are true and false (default).
NO IN MEMORY
This clause is retained for compatibility with prior releases. Use the IN MEMORY clause instead.
BLOCKCACHE
Specifies whether storage memory blocks are to be cached. Block caching can improve read performance. Valid values are true and false. If block caching is enabled and some blocks have to be evicted to free up space, the column family blocks for whom IN MEMORY is set to true are the last blocks to be evicted.
BLOCKSIZE
Specifies the memory block size (in bytes). This value is used to control how much data HBase is required to read from storage files during retrieval and what is cached in memory for subsequent access. The default is 64 KB.
DATA_BLOCK_ENCODING
Specifies in-cache key compression, which enables you to take advantage of sorted key redundancy in an HFile block. You can choose to store only the differences between consecutive keys.
KEEP_DELETED_CELLS
Specifies that deleted cells can be retrieved with a GET or SCAN operation, as long as these operations have a specified time range that ends before the deletion timestamp. This allows for point-in-time queries even in the presence of delete operations. Valid values are true and false. The default is false.
VERSIONS
An integer value that specifies the number of different versions of column values to store in the HBase table. The default is 1. Db2 Big SQL reads only the latest version of a column value.
MIN_VERSIONS
Specifies the minimum number of row versions to keep for each column family. The default is 0, which disables the option. If the TTL clause is set to a specific value, the data is deleted after the TTL value is passed. You can use the MIN_VERSIONS clause if you want to retain at least a few versions of the data.
REPLICATION_SCOPE
Specifies whether replication is to be enabled on the column family to minimize the effects of system failure. The default is 0, which means that the column family is not replicated. A value of 1 means that the column family is replicated.
TTL
Specifies a time to live (TTL) value (in seconds) for the column family. HBase automatically deletes all versions of the row (including the current version) that have exceeded the specified TTL value. The default is forever.
ADD constraint
Constraints are not enforced, but are used to help optimize query performance.
unique-constraint
Defines a unique key. This constraint does not allow duplicate values in one or more columns of the table. A table can have multiple unique keys. The columns that are specified in a unique constraint must be defined as NOT NULL.
referential-constraint
Defines a referential constraint.
CONSTRAINT constraint-name
Specifies a name for the referential constraint.
FOREIGN KEY (column-name, ...)
The foreign key of the referential constraint is composed of the identified columns. Each name in the list of column names must identify a column of the parent table, and the same column must not be identified more than once. The number of foreign key columns must be the same as the number of parent key columns, and the data types of the corresponding columns must be compatible (SQLSTATE 42830).
check-constraint
Defines a check constraint.
constraint-name
Specifies a name for the check constraint.
CHECK check-condition
Sets restrictions on data that is added to a specific table. This parameter is enforced during INSERT operations, but is not enforced during a LOAD HADOOP operation, or on data that is made available outside of the database manager (by using Hive operations, for example).
search-condition
The search-condition must be true or unknown for every row of the table, and has the following restrictions:
  • A column reference must pertain to a column of the table that is being created.
  • The search condition cannot contain a TYPE predicate.
  • The search condition cannot contain any of the following items (SQLSTATE 42621):
    • Subqueries
    • CAST specifications with a SCOPE clause
    • Column functions
    • Functions that are not deterministic
    • Functions that are defined to have an external action
    • User-defined functions that are defined with either MODIFIES SQL or READS SQL DATA
    • OLAP specifications
    • Global variables
    • References to generated columns other than the identity column
    • An error tolerant nested table expression

Check constraints with search-condition are enforced when rows in the table are inserted or updated. A check constraint that is defined on a table automatically applies to all subtables of that table.

functional-dependency
Defines a functional dependency between columns or sets of columns.

The parent set of columns precedes the DETERMINED BY clause, and the child set of columns follows the DETERMINED BY clause. All restrictions on search-condition apply to both parent set and child set columns, and only simple column references are allowed in the column sets (SQLSTATE 42621). The same column must not be identified more than once in a functional dependency (SQLSTATE 42709). No column in the child set of columns can be a nullable column (SQLSTATE 42621).

A functional dependency is not enforced by the database manager during normal operations such as insert, update, or delete. The functional dependency might be used during query rewrite for optimization. Incorrect results might be returned if the integrity of a functional dependency is not maintained.

ADD partition-spec
Specifies a new partition definition. For Hadoop tables, a partition is a split of table data into file system directories based on the partitioning values in the table. The table must already have one or more partitioning columns specified. All columns that are referenced in the partitioning key must be specified in the new partition value. If you specify a set of partition column values that already exists, an error is returned. Partitions are not valid for HBase tables.
IF NOT EXISTS
Checks whether the specified partition already exists.
PARTITION (column-name=constant-value)
Specifies a column name and a value on which you define the partition. The following data types cannot be used as partitioning columns:
  • ARRAY
  • BINARY
  • BOOLEAN
  • DOUBLE
  • FLOAT
  • MAP
  • REAL
  • ROW
  • STRUCT
  • TIMESTAMP
  • VARBINARY
Important: A partitioning key value cannot include any trailing blank spaces. This could be a problem if you plan to partition on a CHAR data type, because these values are always padded to their full length with blank spaces. Db2 Big SQL also does not support partitioning on a zero-length string.
LOCATION location
Specifies an HDFS location if the data is already in a nonstandard location. The location value must be delimited by single quotation marks. The default location for partitioned data in a partitioned table named tab1 with partition columns pcol1...pcoln and values pval1...pvaln is shown in the following example:
/apps/hive/warehouse/schema.dbname/tab1/pcol1=pval1/.../pcoln=pvaln/...
Use the LOCATION clause to specify a non-default location. If the specified directory does not already exist, a directory is created. A drop table operation removes the location directory.
DROP constraint
Drops an existing primary key, foreign key, unique, or check constraint from the table.
DROP partition-spec
Specifies a partition that is to be dropped.
IF EXISTS
Checks whether the specified partition exists.
PARTITION column-name=constant-value
Specifies a column name and a value that identifies the partition.
PURGE
Specifies that the partition is to be removed from the catalogs and that the data is not to be passed from an encrypted zone to the Hive metastore trash. Specify this option if the partition was initially created in an encrypted zone, because moving the data to a non-encrypted zone returns an error during the drop operation.
ALTER constraint
Changes an existing foreign key or check constraint.
constraint-alteration
For details, see constraint-attributes.
ALTER COLUMN column-alteration
Alters the definition of a column. Only the specified attributes are altered; others remain unchanged.
column-name
Specifies the name of the column that is to be altered. The column-name must identify an existing column of the table (SQLSTATE 42703). The name must not be qualified. The name must not identify a column that is otherwise being added, altered, or dropped in the same ALTER TABLE statement (SQLSTATE 42711).
SET DATA TYPE
Specifies a new data type for the column. The new data type must be castable from the existing data type of the column (SQLSTATE 42837). The column alteration must also be a data type change that is supported in Hive. When a column is altered in this way, only the metadata for the table is affected; existing data on disk remains unchanged. For a description of the supported data types, see data-type.
SET NOT NULL
Specifies that the column cannot contain null values. No value for this column in existing rows of the table should be null; otherwise, an error might be returned when the data is selected, even if the row with the null value is not included in the final result set. This clause is not allowed if the column is specified in the foreign key of a referential constraint with a DELETE rule of SET NULL, and no other nullable columns exist in the foreign key (SQLSTATE 42831).
DROP NOT NULL
Drops the NOT NULL attribute of the column, thereby enabling the column to have null values. This clause is not allowed if the column is specified in the primary key or in a unique constraint on the table (SQLSTATE 42831).
ACTIVATE | DEACTIVATE ROW ACCESS CONTROL
Specifies whether row-level access control is to be activated on the table.
ACTIVATE ROW ACCESS CONTROL
Activates row-level access control on the table. The table must not be a typed table, a catalog table (SQLSTATE 55019), a nickname (SQLSTATE 42809), or a view (SQLSTATE 42809).

A default row permission is implicitly created and allows no access to any rows of the table, unless permitted by a row permission explicitly created by a user with SECADM authority.

When the table is referenced in a data manipulation statement, all enabled row permissions that have been created for the table, including the default row permission, are applied implicitly by the database manager to control the set of rows in the table that are accessible.

If a materialized query table (or a staging table) that depends on the table (directly or indirectly through a view) for which row-level access control is being activated and that materialized query table (or a staging table) does not already have row-level access control activated, row-level access control is implicitly activated for the materialized query table (or a staging table). This restricts direct access to the contents of the materialized query table (or a staging table). A query that explicitly references the table before such a row permission is defined will return a warning that there is no data in the table (SQLSTATE 02000). To provide access to the materialized query table (or a staging table), an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table (or a staging table) can be issued to remove the row-level protection, if that is appropriate.

This clause is ignored if row access control is already defined as activated for the table.

DEACTIVATE ROW ACCESS CONTROL
Deactivates row-level access control on the table. When the table is referenced in a data manipulation statement, any existing enabled row permissions defined on the table are not applied by the database manager to control the set of rows in the table that are accessible.

This clause is ignored if row access control is not activated for the table.

ACTIVATE | DEACTIVATE COLUMN ACCESS CONTROL
Specifies whether column-level access control is to be activated on the table.
ACTIVATE COLUMN ACCESS CONTROL
Activates column-level access control on the table. The table must not be a typed table, a catalog table (SQLSTATE 55019), a nickname (SQLSTATE 42809) or a view (SQLSTATE 42809).

The access to the table is not restricted but when the table is referenced in a data manipulation statement, all enabled column masks that have been created for the table are applied implicitly by the database manager to mask the values returned for the columns referenced in the final result table of the queries.

If a materialized query table that depends on the table (directly or indirectly through a view) for which column-level access control is being activated and that materialized query table does not already have row-level access control activated, row-level access control is implicitly activated for the materialized query table. This restricts direct access to the contents of the materialized query table. A query that explicitly references the table before such a row permission is defined returns a warning that there is no data in the table (SQLSTATE 02000). To provide access to the materialized query table, an appropriate row permission can be created, or an ALTER TABLE DEACTIVATE ROW ACCESS CONTROL statement on the materialized query table can be issued to remove the row-level protection, if that is appropriate.

This clause is ignored if column-level access control is already defined as activated for the table.

DEACTIVATE COLUMN ACCESS CONTROL
Deactivates column-level access control on the table. When the table is referenced in a data manipulation statement, any existing enabled column masks defined on the table are not applied by the database manager to control the values returned for the columns referenced in the final result table of the queries.

This clause is ignored if column access control is not activated for the table.

RENAME COLUMN source-column-name TO target-column-name
Renames the specified column with target-column-name.
RENAME TO table-name
Renames the table. You cannot change the schema name when you rename a table.
SET TBLPROPERTIES ('string-constant1'='string-constant2')
Defines job properties that can configure input or output formats, and that you can use to specify more detailed information about the table. The properties are passed as is to the underlying InputFormat and OutputFormat classes. Properties that are not implemented by these classes are ignored. Db2 Big SQL does not validate the properties.
SET SERDE serde-class
Specifies the name of a Java™ class in the Db2 Big SQL server CLASSPATH that implements the Hive SerDe interface org.apache.hadoop.hive.serde2.SerDe.
WITH SERDEPROPERTIES ('string-constant1'='string-constant2')
Associates SerDe properties with the SerDe class. For example, if you want to eventually load HBase data into a Hive table, create the table by using this option and the hbase.columns.mapping parameter.
SET SERDEPROPERTIES ('string-constant1'='string-constant2')
Associates SerDe properties with the SerDe class. For example, if you want to eventually load HBase data into a Hive table, create the table by using this option and the hbase.columns.mapping parameter.
SET TABLE OPTIONS table-opt-name=table-opt-value
You can set one or more of the following table options.
MEMSTORE_FLUSHSIZE=n
Specifies the maximum size (in bytes) of the region server main memory storage before its contents are flushed to the file system. The memory storage for all column families is flushed together. The default is 64 MB.
READONLY
Specifies whether the table can be updated. Valid values are true or false. The default is false.
COMPACTION_ENABLED
Specifies whether the table is enabled for HBase compaction. Valid values are true or false. The default is determined in the HBase configuration.
DURABILITY = ASYNC_WAL | FSYNC_WAL | SKIP_WAL | SYNC_WAL | USE_DEFAULT
Specifies the durability of the write-ahead log (WAL). The WAL contains a record of the data changes. Choose one of the following values:
ASYNC_WAL
Writes mutations to the WAL asynchronously.
FSYNC_WAL
Writes mutations to the WAL synchronously and forces the entries to disk.
SKIP_WAL
Does not write mutations to the WAL.
SYNC_WAL
Writes mutations to the WAL synchronously.
USE_DEFAULT
Uses the HBase global default value (SYNC_WAL).
MAX_FILESIZE = n
Sets the maximum size (in bytes) to which a table region can grow. A region represents a set of table rows that sort between a specific start key and end key. If this size is exceeded, a region split is triggered. The default is 256 MB (268435456 bytes).
SET COLUMN FAMILY OPTIONS FOR (hbase-cfamily hbase-cfamily-opts)
Changes the HBase column family options for hbase-cfamily. For details, see hbase-cfamily-opts.
SET LOCATION location
Changes the specific distributed file system directory that is used to store data files.
Remember: If you alter the location of a table or a partition that is cached, the old location is uncached first, and then the new location is cached automatically.
SET FILEFORMAT file-format
Specifies the data file format. For information about the supported file formats, see File formats supported by Db2 Big SQL.
CHANGE COLUMN
Specify this clause to change the attributes of an existing column.
column-name
Specifies the name of the column whose attributes are to be changed.
column-definition
column-name
Specifies a new name for the column. If you do not want to change the column name when you change other attributes of the column, you must set the new column name to be the same as the current column name.
data-type
Specifies a new data type for the column. The new data type must be castable from the existing data type of the column (SQLSTATE 42837). The column alteration must also be a data type change that is supported in Hive. When a column is altered in this way, only the metadata for the table is affected; existing data on disk remains unchanged. For a description of the supported data types, see data-type.
column-options
You can define column options including a comment, the nullability attribute, or constraints. For details, see column-options.
TOUCH partition-spec
Specifies that the timestamp of the last partition column modification be updated in the catalog to the current time.
PARTITION column-name=constant-value
Specifies a column name and a value on which you define the partition.
SET LOCATION location
For details, see LOCATION.
RENAME TO partition-spec
Updates the value of a partition column and renames the partition. If the partition is part of an external table, Db2 Big SQL renames the file system directory that contains the partition data if the following conditions are met:
  • The data location for the partition has not been set explicitly by using the LOCATION clause.
  • The data location is not in object storage.
  • The table property external.partition.rename has not been set to false.
The table property external.partition.rename is unset by default, but you can use it to override default behavior when partitions are being renamed. If the property is set to true, and an external table has partition data in object storage, Db2 Big SQL attempts to rename the data location.
COMPACT
Triggers a compaction operation on a transactional Hive table or partition. Compaction rewrites a transactional table's underlying data files and helps to maintain query performance. For more information, see File compaction. If the table is not defined as transactional, the statement completes successfully but has no effect. If the table is partitioned, a partition-spec must be specified; otherwise, an error is returned.
MAJOR | MINOR
If a major compaction is specified, all base and delta files for a table or partition are rewritten as a single base file. If a minor compaction is specified, all delta files for a table or partition are rewritten as a single delta file. Although a minor compaction is less expensive than a major compaction, the former is less effective because the data might remain distributed across both base files and delta files.
AND WAIT
Specifies that the statement is to run until the compaction operation has completed. If this option is omitted, compaction is queued as a background process and the statement returns immediately before compaction is complete. For more information, see File compaction.
WITH OVERWRITE TBLPROPERTIES ('string-constant1'='string-constant2')
Specifies that the table properties are to be updated. You can use this clause to change compaction properties for the table or to update any other table properties for the table.
EXCHANGE PARTITION
Db2 Big SQL 7.2 service This feature requires the Db2 Big SQL 7.2 service on IBM Cloud Pak for Data 4.0. Exchanging multiple partitions is supported in Hive 1.2.2, 1.3.0, 2.0.0, and later.
Moves a partition from source-table-name to the table being altered (table-name) and updates each table's metadata in the Hive metastore. The source table's partition folder in HDFS is renamed to move it to the destination table's partition folder. When using this option, ensure that all of the following conditions are true:
  • The source and destination table have the same schema name.
  • The destination table does not contain the partition that is to be exchanged.
  • Neither the source table nor the destination table is a transactional table. (To move partitions across transactional tables, you can use the LOAD HADOOP statement or the INSERT…OVERWRITE statement.)
  • There is no index.
partition-spec
This can be either a full or partial partition specification.
WITH TABLE source-table-name
Specifies the table whose partition is to be moved.
For examples, see 13.

Usage notes

  • Altering a column in a Hive table to specify a new data type for the column does not modify the table data, and only the metadata for the table is affected. To change the data, you must update the files in HDFS or create a new table and populate that table with data in the new format.
  • If you alter a table column to specify a new data type for the column, and the domain of the new type is lower than that of the original data type, table data values might be returned in a truncated form. Ensure that the entire range of values in your table column fits within the domain of the new data type.
  • Column alterations to tables in the Avro file format require a matching update to either the 'avro.schema.literal' property or to the contents of the file that is referenced by the 'avro.schema.url' property. For more information, see File formats that are supported by Db2 Big SQL.
  • Because of a current incompatibility between Hive and Db2 Big SQL, altering a Hive table to drop a column is not supported. Create a new table that does not include the unneeded column.
  • You cannot alter an HBase column from one data type to another data type.
  • When you alter a table to add a column that is derived from an HBase data source, the new column and its data are created. In the case of external data sources (such as delimited files, for example), data for the column that is being added must already exist.
  • Foreign key constraints are possible between Hadoop and local database tables.
  • The following rules apply when you alter the data type of a ROW or ARRAY column:
    • Any ARRAY data type can be converted to any other ARRAY data type, regardless of cardinality and element type.
    • Any ROW data type can be converted to any other ROW data type.
    • Any MAP data type can be converted to any other MAP data type.
  • When the ALTER TABLE…DROP partition-spec statement runs concurrently with a SELECT statement, the SELECT statement might return SQLCODE -5105, and you might see FileNotFoundException in the bigsql.log file on your worker hosts. For more information, see General SQL limitations and restrictions.
  • When the ALTER TABLE…DROP partition-spec statement runs concurrently with an INSERT statement, you might see FileNotFoundException in the bigsql-sched.log file on your scheduler host. For more information, see General SQL limitations and restrictions.
  • When the COMPACT...AND WAIT option is specified, a lock timeout (SQLCODE -911 with reason code 68) might occur if the compaction operation takes a long time to complete.
  • The following syntax alternatives are supported for compatibility with previous versions of this product and with other database products:
    • You can use the ARRAY syntax that Hive uses to define an ordinary ARRAY data type. For example, you can specify ARRAY<VARCHAR(8)> instead of VARCHAR(8) ARRAY[ ].
    • You can use the MAP syntax that Hive uses to define an associative ARRAY data type. For example, you can specify MAP<VARCHAR(10), DOUBLE> instead of DOUBLE ARRAY [VARCHAR(10)].
    • You can use the STRUCT syntax that Hive uses to define a ROW data type. For example, you can specify STRUCT<F1:INTEGER, F2:DOUBLE> instead of ROW(F1 INTEGER, F2 DOUBLE).

Restrictions

  • You can run the ALTER TABLE (HADOOP/HBASE) statement only as a top-level statement. It cannot be run from routines, compound blocks, or prepared statements.
  • Constraints that are defined on Hadoop tables cannot be enforced (SQLSTATE 42858).
  • You cannot alter a partitioning column in a Hadoop table (SQLSTATE 42858).

Examples

  1. Create a table and then change the data type of one of its columns.
    CREATE HADOOP TABLE orders (
        orderkey INT NOT NULL,
        orderdate TIMESTAMP,
        o_comment STRING COMMENT 'comment column' NOT NULL
      );
    
    LOAD HADOOP
      USING FILE URL '/tmp/somedata/orders.txt'
      WITH SOURCE PROPERTIES (
        'ignore.extra.fields'='true',
        'field.delimiter'='|'
      )
      INTO TABLE orders
      OVERWRITE;
    
    SELECT orderkey
      FROM orders 
      WHERE orderdate < '1992-07-01'  
      FETCH FIRST 3 ROWS ONLY;
    
    ORDERKEY
    ---------
         6
        37
       128
    
    ALTER TABLE orders
      ALTER COLUMN orderkey
      SET DATA TYPE DOUBLE;
    
    SELECT orderkey
      FROM orders
      WHERE orderdate < '1992-07-01'
      FETCH FIRST 3 ROWS ONLY;
    
         ORDERKEY
    ---------------------
    +6.00000000000000E+000
    +3.70000000000000E+001  
    +1.28000000000000E+002
  2. Create a table named COMP and then alter it to add an SQL column (C3), mapping that column to an existing HBase column.
    CREATE HBASE TABLE comp (
        k0 VARCHAR(10),
        k1 INT,
        k2 VARCHAR(10),
        k3 INT,
        c0 VARCHAR(10),
        c1 INT,
        c2 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (k0, k1, k2, k3),
        cf:cq MAPPED BY (c0, c1, c2)
      );
    
    ALTER TABLE comp 
      ADD c3 INT
      UPDATE HBASE COLUMN MAPPING APPEND cf:cq;
  3. Alter the COMP table to add an SQL column (C3), mapping that column to a new HBase column in an existing HBase column family.
    ALTER TABLE comp
      ADD c3 INT
      UPDATE HBASE COLUMN MAPPING ADD cf:cq2;
  4. Alter the COMP table to add an SQL column (C3), mapping that column to a new HBase column family.
    ALTER TABLE comp
      ADD c3 INT
      UPDATE HBASE COLUMN MAPPING ADD cf2:cq;
  5. Create a table named COMP1 and then alter it to change the properties of multiple column families.
    CREATE HBASE TABLE comp (
        k0 VARCHAR(3),
        k1 INT,
        k2 VARCHAR(2),
        k3 INT,
        c0 VARCHAR(10),
        c1 INT,
        c2 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (k0, k1, k2, k3),
        f1:q1 MAPPED BY (c0, c1),
        f2:q2 MAPPED BY (c2)
      );
    
    ALTER TABLE comp
      SET COLUMN FAMILY OPTIONS FOR (
        f1 TTL(10) REPLICATION_SCOPE(1),
        f2 MIN_VERSIONS(1)
      );
  6. Alter an HBase table to change its table options.
    ALTER TABLE hb1
      SET TABLE OPTIONS (
        "MEMSTORE_FLUSHSIZE"="1000",
        "READONLY"="true"
      );
  7. Alter a table to set its table properties.
    ALTER TABLE sample.t1
      SET TBLPROPERTIES (
        'numFiles'='1',
        'rawDataSize'='4', 
        'totalSize'='200',
        'numPartitions'='0'
      );
  8. Alter a table to add or drop unique constraints.
    ALTER TABLE t1
      ADD CONSTRAINT uq1 UNIQUE (i) NOT ENFORCED;
    ALTER TABLE t1
      ADD UNIQUE (j) NOT ENFORCED;
    ALTER TABLE t1
      DROP UNIQUE uq1;
  9. Alter a table to add or drop primary key constraints.
    ALTER TABLE t1
      ADD CONSTRAINT pk1 PRIMARY KEY (i) NOT ENFORCED;
        or
    ALTER TABLE t1
      ADD PRIMARY KEY (i) NOT ENFORCED;
    
    ALTER TABLE t1
      DROP PRIMARY KEY;
  10. Alter a table to add or drop foreign key constraints.
    ALTER TABLE t1
      ADD CONSTRAINT fk1 FOREIGN KEY (i) REFERENCES t2 NOT ENFORCED;
    ALTER TABLE t1
      ADD FOREIGN KEY (j) REFERENCES t2 (k) NOT ENFORCED;
    ALTER TABLE t1
      DROP FOREIGN KEY fk1;
  11. You can dynamically change how files in a table are split for scanning by setting the scheduler.minSplitSize table property. By default, files are split at block boundaries, but you can set a size value that is larger than the block size, or you can specify that the split should follow the file boundaries.
    • You can set the minimum split size to a specific value. In this example, 256 MB splits (268435456 bytes) are specified:
      ALTER TABLE t3
        SET TBLPROPERTIES (
          'scheduler.minSplitSize'='268435456'
        );
    • You can specify that the split should follow the file boundaries. In this example, each file is assigned to one reader:
      ALTER TABLE t3
        SET TBLPROPERTIES (
          'scheduler.minSplitSize'='file'
        );
    • The following example shows you how to reset the scheduler.minSplitSize table property to its default value:
      ALTER TABLE t3
        SET TBLPROPERTIES (
          'scheduler.minSplitSize'=''
        );
  12. Trigger a major compaction operation on a transactional Hive table named T1. Specify that the statement is to run until the compaction operation has completed. Update the compactor.mapreduce.map.memory.mb compaction map job property to have a value of 4096 MB.
    ALTER TABLE t1
      COMPACT MAJOR AND WAIT WITH OVERWRITE TBLPROPERTIES(
        'compactor.mapreduce.map.memory.mb'='4096'
      );
  13. You can use the EXCHANGE PARTITION clause to move a partition from a source table to the target table and alter each table's metadata.Db2 Big SQL 7.2 service This feature requires the Db2 Big SQL 7.2 service on IBM Cloud Pak for Data 4.0.
    • Create two tables, each of which is partitioned by DS. Add a partition to the first table T1, and then move that partition to the second table T2.
      CREATE TABLE t1 (
          a VARCHAR(100),
          b VARCHAR(100)
        ) PARTITIONED BY (ds VARCHAR(100));
      
      CREATE TABLE t2 (
          a VARCHAR(100),
          b VARCHAR(100)
        ) PARTITIONED BY (ds VARCHAR(100));
      
      ALTER TABLE t1
        ADD PARTITION (ds='1');
      
      ALTER TABLE t2
        EXCHANGE PARTITION (ds='1') WITH TABLE t1;
    • Create two tables with multiple partitioning columns. Add three partitions to the first table T1, and then move all three partitions to the second table T2.
      CREATE TABLE t1 (
          a VARCHAR(100),
          b VARCHAR(100)
        ) PARTITIONED BY (ds VARCHAR(100), hr VARCHAR(100));
      
      CREATE TABLE t2 (
          a VARCHAR(100),
          b VARCHAR(100)
        ) PARTITIONED BY (ds VARCHAR(100), hr VARCHAR(100));
      
      ALTER TABLE t1
        ADD PARTITION (ds = '1', hr = '00');
      
      ALTER TABLE t1
        ADD PARTITION (ds = '1', hr = '01');
      
      ALTER TABLE t1
        ADD PARTITION (ds = '1', hr = '03');
      
      ALTER TABLE t2
        EXCHANGE PARTITION (ds='1') WITH TABLE t1;
    • Create two tables with multiple partitioning columns. Add a partition to the first table T1, and then move partitioned data (d1=1, d2=2) to the second table T2.
      CREATE TABLE t1 (
          a INT
        ) PARTITIONED BY (d1 INT, d2 INT);
      
      CREATE TABLE t2 (
          a INT
        ) PARTITIONED BY (d1 INT, d2 INT);
      
      ALTER TABLE t1
        ADD PARTITION (d1=1, d2=2);
      
      ALTER TABLE t2
        EXCHANGE PARTITION (d1 = 1, d2 = 2) WITH TABLE t1;