Data types supported by Datalake tables
Db2® supports a number of data types for Datalake tables.
Datalake table data type 1 | Db2 data type | Hive data type 2 | Notes |
---|---|---|---|
BIGINT | BIGINT | BIGINT | This is a 64-bit signed integer. INT8 is a synonym for BIGINT. |
BINARY | VARBINARY | BINARY (starting with Hive 0.8.0) | BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified. |
BOOLEAN | BOOLEAN | BOOLEAN | For more information, see Boolean values. |
CHAR(n) | CHAR(n) | CHAR | |
DATE | DATE | DATE | For more information, see DATE. |
DECIMAL(p,s) | DECIMAL(p,s) | DECIMAL (introduced in Hive 0.11.0 with a precision of 38 digits) | For more information, see DECIMAL. |
DOUBLE | DOUBLE | DOUBLE | 64-bit IEEE float. |
FLOAT(n) (FLOAT4) | REAL | FLOAT | FLOAT4 is a synonym for REAL. Columns defined as FLOAT(n) are stored as float where n <= 24. 32-bit IEEE 754 floating point. For more information, see FLOAT and REAL. |
FLOAT(n) (FLOAT8) | DOUBLE | DOUBLE | FLOAT8 is a synonym for DOUBLE. Columns defined as FLOAT(n) are stored as double where n > 24. 64-bit IEEE 754 floating point. For more information, see FLOAT and REAL. |
INTEGER | INTEGER | INT | This is a 32-bit signed integer. INT4 is a synonym for INTEGER. |
REAL | REAL | FLOAT | 32-bit IEEE float. FLOAT4 is a synonym for REAL. For more information, see FLOAT and REAL. |
SMALLINT | SMALLINT | SMALLINT | 16-bit signed integer. INT2 is a synonym for SMALLINT. |
STRING | VARCHAR(32672) | STRING | Use of the STRING type in Db2 is discouraged because of the resulting increase in storage requirements and the potential for reduced performance. For more information, see STRING. |
TIMESTAMP | TIMESTAMP | TIMESTAMP | DATETIME is a synonym for TIMESTAMP. |
TINYINT | SMALLINT | TINYINT | For more information, see TINYINT. |
[VAR]BINARY(n) | [VAR]BINARY(n) | BINARY | BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified. |
VARCHAR(n) | VARCHAR(n) | VARCHAR | The maximum length for a VARCHAR is 32672 bytes. |
|
Datalake table data type | Db2 data type | Iceberg data type | Notes |
---|---|---|---|
BIGINT | BIGINT | long | This is a 64-bit signed integer. INT8 is a synonym for BIGINT. |
BINARY | VARBINARY | binary | BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified. |
BOOLEAN | BOOLEAN | boolean | Boolean values representing true or false. For more information, see Boolean values. |
CHAR(n) | CHAR(n) | string | Maximum length in Db2 is 255 characters. Longer values will be truncated. |
DATE | DATE | date | Calendar date without timezone or time. For more information, see DATE. |
DECIMAL(p,s) | DECIMAL(p,s) | decimal(p,s) | Maximum precision is 31, default is 10. Default scale is 0. For more information, see DECIMAL. |
DOUBLE | DOUBLE | double | |
FLOAT(n) (FLOAT4) | REAL | float | FLOAT4 is a synonym for REAL. Columns defined as FLOAT(n) are stored as float where n <= 24. 32-bit IEEE 754 floating point. For more information, see FLOAT and REAL. |
FLOAT(n) (FLOAT8) | DOUBLE | double | FLOAT8 is a synonym for DOUBLE. Columns defined as FLOAT(n) are stored as double where n > 24. 64-bit IEEE 754 floating point. For more information, see FLOAT and REAL. |
INTEGER (INT, INT4) | INTEGER | int | This is a 32-bit signed integer. INT4 is a synonym for INTEGER. |
REAL | REAL | float | 32-bit IEEE 754 floating point. |
SMALLINT (INT2) | INTEGER | int | 32-bit signed integer. |
STRING | VARCHAR(32672) | string | Use of the STRING type in Db2 is discouraged because of the resulting increase in storage requirements and the potential for reduced performance. For more information, see STRING. |
TIMESTAMP(s) | TIMESTAMP(s) | timestamp | Timestamp without timezone. For mor information, see TIMESTAMP. |
TINYINT | INTEGER | int | This is a 32-bit signed integer. |
VARBINARY(n) | VARBINARY(n) | binary | Arbitrary-length byte array. Maximum length in Db2 is 32672 bytes. No practical limit length in Iceberg - longer values are truncated. |
VARCHAR(n) | VARCHAR(n) | string | Maximum length in Db2 is 32672 characters. No practical limit length in Iceberg - longer values are truncated. |
DATE
When a Db2 Datalake table is created with a column of type DATE, this column is mapped to the Hive DATE type.
- The range of the month part is 1 - 12.
- The range of the day part is 1 - x, where x is 28, 29, 30, or 31, and depends on the month.
- The range of the year part is 0001 to 9999.
If you are using the ORC file
format with historical DATE values (values prior to 1582-10-15), and the ORC files were inserted
with Db2 or Hive on HDP 3.0, use
tblproperties('orc.proleptic.gregorian.default' = 'true')
in the CREATE Datalake
TABLE statement or the ALTER Datalake TABLE statement. If the ORC files were inserted with Db2 or Hive on HDP 2.6.5 (or earlier), use
tblproperties('orc.proleptic.gregorian.default' = 'false')
or leave it unset.
False is the default setting for
orc.proleptic.gregorian.default.
DECIMAL
The DECIMAL data type is a numeric data type with fixed scale and precision. The precision represents the total number of digits that can be represented by the column. Therefore, +12.34 and -1.234 are both defined with a precision of 4. The scale is the number of fractional digits, and must be less than or equal to the precision.
The maximum precision value is 31. Therefore, the largest integral value is represented by DECIMAL(31,0) and contains the value 9 repeated 31 times. The most precise fractional value between 0 and 1 or between 0 and -1 is represented by DECIMAL(31,31), with 31 digits to the right of the decimal point. The closest value to 0 has 30 zeros and a 1 following the decimal point. The closest value to 1 has 31 nines following the decimal point.
Db2 converts between DECIMAL and other numeric types. For example, a DECIMAL value with zero scale is converted to the smallest appropriate integer type.
FLOAT and REAL
Db2 defines a REAL data type as a 32-bit floating point value. The definition of a FLOAT data type is a synonym for DOUBLE.
In Hive, a FLOAT always refers to a 32-bit floating point value, which corresponds to the JAVA FLOAT data type. In Db2, the REAL data type is a synonym for the Hive FLOAT data type.
- For CREATE Datalake TABLE statements, change all columns defined as FLOAT to REAL to retain the same storage characteristics.
- When you access Hive or Db2 Datalake tables, FLOAT values are treated as DOUBLE values when read, which can produce minor rounding differences.
You can see precision errors when you do operations with the DOUBLE data type column and DECIMAL literals in a table defined with ROW FORMAT DELIMITED. As a best practice, use values in an E-notation format for the double data type column (+1.14000000000000E+000). You can use the ROUND function to work around some precision errors.
STRING
- If a value in the underlying storage exceeds the maximum VARCHAR length, that value is silently truncated when converted to the Db2 native data type.
- If the underlying data does not require the maximum VARCHAR length for storage (for example, if
the column never exceeds 100 characters), then Db2 allocates
unnecessary resources for the handling of that column. To improve performance, use a
VARCHAR(n) with a defined size instead of a STRING. Alter the table to define an
appropriate length for the column by using the following
syntax:
This alter operation changes the definition of the table in both Db2 and in Hive. If data in the VARCHAR(n) column exceedsALTER Datalake TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
<size>
, the data will be truncated.
TIMESTAMP
- The range of the hour part is 0 - 23.
- The range of the minute and second parts is 0 - 59.
The number of digits in the fractional seconds part is specified using a precision attribute. For regular Datalake tables, the precision can be a value between 0 and 9, with 9 being the default. For Iceberg Datalake tables, the precision can be a value between 0 and 6, with 6 being the default.
Timestamps are considered to be time zone-less. For convenience, functions are provided for converting values to and from specific time zones (TO_UTC_TIMESTAMP, FROM_UTC_TIMESTAMP).
If you are using the ORC file
format with historical TIMESTAMP values (values prior to 1582-10-15), and the ORC files were
inserted with Db2 or Hive on HDP 3.0, use
tblproperties('orc.proleptic.gregorian.default' = 'true')
in the CREATE Datalake
TABLE statement or the ALTER Datalake TABLE statement. If the ORC files were inserted with Db2 or Hive on HDP 2.6.5 (or earlier), use
tblproperties('orc.proleptic.gregorian.default' = 'false')
or leave it unset.
False is the default setting for
orc.proleptic.gregorian.default.
TINYINT
The TINYINT type in the context of the CREATE TABLE (HADOOP) statement is supported only to maintain compatibility with Hive. It is unavailable in other contexts. TINYINT is mapped to the Db2 SMALLINT type, but the values are constrained (-128 to 127) to match the Hive definition of a TINYINT.
Db2 data types not supported for Datalake tables
Not all Db2 data types are supported for Datalake tables. For more information, see Restrictions and limitations on Datalake tables.