Configuring ANALYZE to automatically alter the length of string or VARCHAR(32672) columns

You can change the automatic alter behavior of the ANALYZE command by setting the biginsights.stats.auto.alter property to true or false. The default value of biginsights.stats.auto.alter is true.

About this task

This property causes the ANALYZE command to automatically alter the length of columns that are defined as STRING or VARCHAR(32672) to a length that is optimal for the data in those columns. This behavior occurs for performance reasons; if a row becomes too long, Db2® Big SQL might not be able to optimize certain queries.
Note:
  • When syncing table definitions and table alterations to table columns that are defined as STRING, those columns are automatically converted by Db2 Big SQL to VARCHAR(bigsql.string.size). The default value for bigsql.string.size is 32672. The automatic alter operation does not truncate data stored in HDFS, nor does it change the definition of the table in Hive.
  • If the underlying data does not require the maximum VARCHAR length for storage (for example, if the column never exceeds 100 characters), then Db2 Big SQL allocates unnecessary resources for the handling of that column. To improve performance, use a VARCHAR(n) with a defined size instead of a STRING. Alter the table to define an appropriate length for the column by using the following syntax:
    ALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
    This alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceeds <size>, a null value is returned when that data is queried.

The HCAT_SYNC_OBJECTS procedure might make the size of the VARCHAR column less than 32672 if the table is wide. When ANALYZE runs against such converted (or alterable) columns, the columns are altered to an appropriate size, based on the data that is seen by ANALYZE. Only the Db2 Big SQL definition is altered; the original DDL and data in Hive or HBase is preserved. Such alteration helps the optimizer and improves performance, but does not affect resynchronization or updates to the table.

An analyze operation alters the length of alterable columns to three times the maximum length of data in the column. The minimum altered length is 33. If automatic alter is enabled (the default), ANALYZE will always alter the length of an applicable column, whether or not sampling is done. Only alterable columns that were specified in the ANALYZE command are altered. Therefore, all alterable columns will be altered if the FOR ALL COLUMNS clause is specified (which is the default for the auto-analyze feature).

Attention: If the underlying data does not require the maximum VARCHAR length for storage (for example, if the column never exceeds 100 characters), be sure to use a VARCHAR(n) with a defined size instead of a STRING to reduce resource allocation and improve performance. Alter the table to define an appropriate length for the column by using the following syntax:
ALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
This alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceeds <size>, a null value is returned when that data is queried.

Procedure

If you want to avoid this automatic alter behavior of the ANALYZE command, complete the following steps:

  1. Open the bigsql-conf.xml configuration file at $BIGSQL_HOME/conf/bigsql-conf.xml on the head node only.
  2. Complete one of the following updates:
    • Before creating a table, set bigsql.string.size to a value that is less than 32672. Ideally, this value should be optimized for the data being stored. For example:
      
      <property>
       <name>bigsql.string.size</name>
       <value>14520</value> 
      </property>
    • Before a load operation, an HCAT_SYNC_OBJECTS call, or an analyze operation, set biginsights.stats.auto.alter to false:
      
      <property>
       <name>biginsights.stats.auto.alter</name>
       <value>false</value> 
      </property>
  3. Restart the Db2 Big SQL service.