Configuring ANALYZE to automatically alter the length of string or VARCHAR(32672) columns
You can change the automatic alter behavior of the ANALYZE command by setting the biginsights.stats.auto.alter property to true or false. The default value of biginsights.stats.auto.alter is true.
About this task
- When syncing table definitions and table alterations to table columns that are defined as STRING, those columns are automatically converted by Db2 Big SQL to VARCHAR(bigsql.string.size). The default value for bigsql.string.size is 32672. The automatic alter operation does not truncate data stored in HDFS, nor does it change the definition of the table in Hive.
- If the underlying data does not require the maximum VARCHAR length for storage (for example, if
the column never exceeds 100 characters), then Db2 Big SQL
allocates unnecessary resources for the handling of that column. To improve performance, use a
VARCHAR(n) with a defined size instead of a STRING. Alter the table to define an
appropriate length for the column by using the following
syntax:
This alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceedsALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
<size>
, a null value is returned when that data is queried.
The HCAT_SYNC_OBJECTS procedure might make the size of the VARCHAR column less than 32672 if the table is wide. When ANALYZE runs against such converted (or alterable) columns, the columns are altered to an appropriate size, based on the data that is seen by ANALYZE. Only the Db2 Big SQL definition is altered; the original DDL and data in Hive or HBase is preserved. Such alteration helps the optimizer and improves performance, but does not affect resynchronization or updates to the table.
An analyze operation alters the length of alterable columns to three times the maximum length of data in the column. The minimum altered length is 33. If automatic alter is enabled (the default), ANALYZE will always alter the length of an applicable column, whether or not sampling is done. Only alterable columns that were specified in the ANALYZE command are altered. Therefore, all alterable columns will be altered if the FOR ALL COLUMNS clause is specified (which is the default for the auto-analyze feature).
ALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
This
alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceeds
<size>
, a null value is returned when that data is queried.Procedure
If you want to avoid this automatic alter behavior of the ANALYZE command, complete the following steps: