Data types that are supported by Db2 Big SQL for Hadoop and HBase tables
Db2® Big SQL supports a number of data types for Hadoop and HBase tables.
Data type supported in CREATE TABLE (HADOOP/HBASE) | Db2 Big SQL data type | Notes |
---|---|---|
ARRAY | ARRAY | For more information, see ARRAY. |
BIGINT | 64-bit signed integer | INT8 is a synonym for BIGINT. |
BINARY | BINARY | BINARY is a synonym for VARBINARY. If BINARY is used, it is treated as though VARBINARY were specified. If a length is not specified, a length of 32,672 bytes is assumed. It is recommended that an explicit length be specified. |
BOOLEAN | BOOLEAN | For more information, see Boolean values. |
CHAR(n) | Up to 255 bytes | |
DATE | Date | For more information, see DATE. |
DECIMAL(p,s) | DECIMAL | For more information, see DECIMAL. |
DOUBLE | 64-bit IEEE float | |
FLOAT | 64-bit IEEE float | FLOAT8 is a synonym for FLOAT. To retain Db2 Big SQL 1.0 behavior, change FLOAT to REAL in the DDL. For more information, see FLOAT and REAL. |
INTEGER | 32-bit signed integer | INT4 is a synonym for INTEGER. |
REAL | 32-bit IEEE float | FLOAT4 is a synonym for REAL. For more information, see FLOAT and REAL. |
SMALLINT | 16-bit signed integer | INT2 is a synonym for SMALLINT. |
STRING | 32,672 bytes | Use of the STRING type in Db2 Big SQL is discouraged because of the resulting increase in storage requirements and the potential for reduced performance. See Data types and performance. See also STRING. |
STRUCT | STRUCT | For more information, see ROW. |
TIMESTAMP | Date and time with 9 digits of fractional time | DATETIME is a synonym for TIMESTAMP. |
TINYINT | 16-bit signed integer | For more information, see TINYINT. |
[VAR]BINARY(n) | [VAR]BINARY(n) | BINARY is a synonym for VARBINARY. If BINARY is used, it is treated as though VARBINARY were specified. If a length is not specified, a length of 32,672 bytes is assumed. It is recommended that an explicit length be specified. |
VARCHAR(n) | Up to 32,672 bytes | |
To learn about ways in which applications can use Db2 Big SQL data types, see Understanding data types.
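As an illustration, the following sketch shows a CREATE TABLE (HADOOP) statement that uses several of the types from the preceding table. The table and column names are hypothetical.

```sql
-- Hypothetical example: table and column names are illustrative only
CREATE HADOOP TABLE sales.orders (
  order_id   BIGINT,         -- 64-bit signed integer
  store_id   INTEGER,        -- 32-bit signed integer
  qty        SMALLINT,       -- 16-bit signed integer
  price      DECIMAL(10,2),  -- fixed precision and scale
  discount   REAL,           -- 32-bit IEEE float (preferred over FLOAT)
  note       VARCHAR(200),   -- explicit length; preferred over STRING
  order_date DATE,
  created_ts TIMESTAMP
);
```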
ARRAY
The ARRAY type can be an ordinary ARRAY or an associative ARRAY. You can use the ARRAY syntax that Hive uses to define an ordinary ARRAY data type, or the MAP syntax that Hive uses to define an associative ARRAY data type, but the SQL standard is recommended.
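As a sketch of the two forms, assuming hypothetical column names, an ordinary and an associative ARRAY might be declared with the SQL-standard syntax as follows:

```sql
-- Hypothetical sketch of ARRAY declarations (SQL-standard form)
CREATE HADOOP TABLE t_arrays (
  id      INT,
  tags    VARCHAR(20) ARRAY[10],    -- ordinary ARRAY, indexed by position
  ratings INT ARRAY[VARCHAR(20)]    -- associative ARRAY, keyed by a VARCHAR value
);
```

The Hive forms (ARRAY<...> for an ordinary array, MAP<...,...> for an associative array) are also accepted, but the SQL-standard form shown above is recommended.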
DATE
When a Db2 Big SQL HBase or Hadoop table is created with a column of type DATE, this column is mapped to the Hive DATE type.
- The range of the month part is 1 - 12.
- The range of the day part is 1 - x, where x is 28, 29, 30, or 31, and depends on the month.
- The range of the year part is 0001 to 9999.
If you are using the ORC file format with historical DATE values (values prior to 1582-10-15), and the ORC files were inserted with Db2 Big SQL 6.0 or Hive on HDP 3.0, use tblproperties('orc.proleptic.gregorian.default' = 'true') in the CREATE TABLE (HADOOP) statement or the ALTER TABLE (HADOOP/HBASE) statement. If the ORC files were inserted with Db2 Big SQL 5.0.4 (or earlier) or Hive on HDP 2.6.5 (or earlier), use tblproperties('orc.proleptic.gregorian.default' = 'false') or leave it unset. False is the default setting for orc.proleptic.gregorian.default.
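A minimal sketch of setting this property at table creation time (the table and column names are hypothetical; use 'true' only when the files were written with Db2 Big SQL 6.0 or Hive on HDP 3.0):

```sql
-- Hypothetical sketch: ORC table holding historical dates
CREATE HADOOP TABLE hist.events (
  event_date DATE
)
STORED AS ORC
TBLPROPERTIES ('orc.proleptic.gregorian.default' = 'true');
```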
DECIMAL
The DECIMAL data type is a numeric data type with fixed scale and precision. The precision represents the total number of digits that can be represented by the column. Therefore, +12.34 and -1.234 are both defined with a precision of 4. The scale is the number of fractional digits, and must be less than or equal to the precision.
The maximum precision value is 31. Therefore, the largest integral value is represented by DECIMAL(31,0) and contains the value 9 repeated 31 times. The most precise fractional value between 0 and 1 or between 0 and -1 is represented by DECIMAL(31,31), with 31 digits to the right of the decimal point. The closest value to 0 has 30 zeros and a 1 following the decimal point. The closest value to 1 has 31 nines following the decimal point.
Db2 Big SQL converts between DECIMAL and other numeric types. For example, a DECIMAL value with zero scale is converted to the smallest appropriate integer type.
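As a small illustration of precision and scale (the table and column names are hypothetical):

```sql
-- DECIMAL(4,3): precision 4 (total digits), scale 3 (fractional digits)
CREATE HADOOP TABLE t_dec (v DECIMAL(4,3));
-- +1.234 fits: 4 total digits, 3 of them fractional
-- 12.345 does not fit: it needs 5 total digits
```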
FLOAT and REAL
Db2 Big SQL defines the REAL data type as a 32-bit floating point value, and the FLOAT data type as a synonym for DOUBLE (a 64-bit floating point value).
In Hive, a FLOAT always refers to a 32-bit floating point value, which corresponds to the Java float data type. In Db2 Big SQL, the REAL data type is therefore the equivalent of the Hive FLOAT data type.
- For CREATE HADOOP TABLE statements, change all columns defined as FLOAT to REAL to retain the same storage characteristics.
- When you access Hive or Db2 Big SQL tables, FLOAT values are treated as DOUBLE values when read, which can produce minor rounding differences.
You might see precision errors when you perform operations on a DOUBLE column with DECIMAL literals in a table defined with ROW FORMAT DELIMITED. As a best practice, use values in E-notation format for the DOUBLE column (for example, +1.14000000000000E+000). You can use the ROUND function to work around some precision errors.
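The two recommendations above can be sketched as follows (table and column names are hypothetical):

```sql
-- To keep 32-bit storage, declare REAL rather than FLOAT:
CREATE HADOOP TABLE t_float (
  f REAL   -- 32-bit; in Db2 Big SQL, FLOAT here would mean a 64-bit DOUBLE
);
-- ROUND can mask minor rounding differences introduced
-- when 32-bit values are read as DOUBLE:
SELECT ROUND(f, 2) FROM t_float;
```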
ROW
The ROW type contains field definitions that contain a field name and a data type. You can use the STRUCT syntax that Hive uses to define a ROW data type, but the SQL standard is recommended.
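A hedged sketch of a ROW declaration, assuming hypothetical field names (verify the exact syntax against your Db2 Big SQL release):

```sql
-- Hypothetical sketch of a ROW column with named, typed fields
CREATE HADOOP TABLE t_row (
  id      INT,
  address ROW(street VARCHAR(50), city VARCHAR(30))  -- SQL-standard ROW form
);
```

The Hive STRUCT form (for example, STRUCT<street:VARCHAR(50), city:VARCHAR(30)>) is also accepted, but the SQL-standard ROW form is recommended.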
STRING
- If a value in the underlying storage exceeds the maximum VARCHAR length, that value is silently truncated when converted to the Db2 Big SQL native data type.
- If the underlying data does not require the maximum VARCHAR length for storage (for example, if the column never exceeds 100 characters), then Db2 Big SQL allocates unnecessary resources for the handling of that column. To improve performance, use a VARCHAR(n) with a defined size instead of a STRING. Alter the table to define an appropriate length for the column by using the following syntax:
ALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
This alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceeds <size>, a null value is returned when that data is queried.
You can also change the default mapping of the STRING type by setting the bigsql.string.size property. For example:
SET HADOOP PROPERTY 'bigsql.string.size'=4096;
The result of this property change is that all STRING types will be represented by the SQL data type VARCHAR(4096).
TIMESTAMP
- The range of the hour part is 0 - 23.
- The range of the minute and second parts is 0 - 59.
The number of digits in the fractional seconds part is specified by using an attribute whose value is between 0 and 9 (default 9).
Timestamps are considered to be time zone-less. For convenience, functions are provided for converting values to and from specific time zones (TO_UTC_TIMESTAMP, FROM_UTC_TIMESTAMP).
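A brief sketch of the fractional-seconds attribute and the conversion functions, using hypothetical table and column names and a sample time zone:

```sql
-- Hypothetical sketch: TIMESTAMP(6) keeps 6 fractional-second digits
-- (the default is 9)
CREATE HADOOP TABLE t_ts (ts TIMESTAMP(6));
-- Convert a zone-less timestamp to and from a named time zone:
SELECT TO_UTC_TIMESTAMP(ts, 'America/New_York'),
       FROM_UTC_TIMESTAMP(ts, 'America/New_York')
FROM t_ts;
```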
If you are using the ORC file format with historical TIMESTAMP values (values prior to 1582-10-15), and the ORC files were inserted with Db2 Big SQL 6.0 or Hive on HDP 3.0, use tblproperties('orc.proleptic.gregorian.default' = 'true') in the CREATE TABLE (HADOOP) statement or the ALTER TABLE (HADOOP/HBASE) statement. If the ORC files were inserted with Db2 Big SQL 5.0.4 (or earlier) or Hive on HDP 2.6.5 (or earlier), use tblproperties('orc.proleptic.gregorian.default' = 'false') or leave it unset. False is the default setting for orc.proleptic.gregorian.default.
TINYINT
The TINYINT type in the context of the CREATE TABLE (HADOOP) statement is supported only to maintain compatibility with Hive. It is unavailable in other contexts. TINYINT is mapped to the Db2 SMALLINT type, but the values are constrained (-128 to 127) to match the Hive definition of a TINYINT.
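A minimal sketch (table and column names are hypothetical):

```sql
-- Hypothetical sketch: TINYINT is accepted only in CREATE TABLE (HADOOP)
CREATE HADOOP TABLE t_tiny (
  flag TINYINT   -- stored as SMALLINT, values constrained to -128..127
);
-- A value outside -128..127 (for example, 128) violates the
-- TINYINT range constraint and is rejected on insert.
```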
Data types and performance
The Db2 Big SQL engine works most efficiently when the table definition enables a row to fit within 32 KB of memory. When the calculated row size exceeds 32 KB, some queries might see a performance degradation. The STRING data type should be avoided because it maps internally to VARCHAR(32672), which means that the table rows will usually exceed 32 KB in size. You can avoid this performance degradation by changing any STRING data type references to explicit VARCHAR(n) types that best match the data size.
ALTER TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
This alter operation changes the definition of the table both in Db2 Big SQL and in Hive. If data in the VARCHAR(n) column exceeds <size>, a null value is returned when that data is queried.