Data types supported by Datalake tables

Db2® supports a number of data types for Datalake tables.

Table 1. Data types for regular Datalake tables
Datalake table data type 1 Db2 data type Hive data type 2 Notes
BIGINT BIGINT BIGINT This is a 64-bit signed integer. INT8 is a synonym for BIGINT.
BINARY VARBINARY BINARY (starting with Hive 0.8.0) BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified.
BOOLEAN BOOLEAN BOOLEAN For more information, see Boolean values.
CHAR(n) CHAR(n) CHAR  
DATE DATE DATE For more information, see DATE.
DECIMAL(p,s) DECIMAL(p,s) DECIMAL (introduced in Hive 0.11.0 with a precision of 38 digits) For more information, see DECIMAL.
DOUBLE DOUBLE DOUBLE 64-bit IEEE float.
FLOAT(n) (FLOAT4) REAL FLOAT FLOAT4 is a synonym for REAL. Columns defined as FLOAT(n) are stored as float where n <= 24. 32-bit IEEE 754 floating point. For more information, see FLOAT and REAL.
FLOAT(n) (FLOAT8) DOUBLE DOUBLE FLOAT8 is a synonym for DOUBLE. Columns defined as FLOAT(n) are stored as double where n > 24. 64-bit IEEE 754 floating point. For more information, see FLOAT and REAL.
INTEGER INTEGER INT This is a 32-bit signed integer. INT4 is a synonym for INTEGER.
REAL REAL FLOAT 32-bit IEEE float. FLOAT4 is a synonym for REAL. For more information, see FLOAT and REAL.
SMALLINT SMALLINT SMALLINT 16-bit signed integer. INT2 is a synonym for SMALLINT.
STRING VARCHAR(32672) STRING Use of the STRING type in Db2 is discouraged because of the resulting increase in storage requirements and the potential for reduced performance. For more information, see STRING.
TIMESTAMP TIMESTAMP TIMESTAMP DATETIME is a synonym for TIMESTAMP.
TINYINT SMALLINT TINYINT For more information, see TINYINT.
[VAR]BINARY(n) [VAR]BINARY(n) BINARY BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified.
VARCHAR(n) VARCHAR(n) VARCHAR The maximum length for a VARCHAR is 32672 bytes.
  1. This column lists the data types that you can specify in the CREATE DATALAKE TABLE statement.
  2. This column lists the corresponding data types as defined in the Hive metastore. For example, a column that is defined as FLOAT in the statement is listed as a DOUBLE column in the Hive catalogs, and table values for that column must conform to Hive (or SerDe) rules for DOUBLE values.
Table 2. Data types for Datalake Iceberg tables
Datalake table data type Db2 data type Iceberg data type Notes
BIGINT BIGINT long This is a 64-bit signed integer. INT8 is a synonym for BIGINT.
BINARY VARBINARY binary BINARY is a synonym for VARBINARY. If BINARY is used, it will be treated as though VARBINARY were specified. If a length is not specified, 32,672 characters is assumed. It is recommended that an explicit length be specified.
BOOLEAN BOOLEAN boolean Boolean values representing true or false. For more information, see Boolean values.
CHAR(n) CHAR(n) string Maximum length in Db2 is 255 characters. Longer values will be truncated.
DATE DATE date Calendar date without timezone or time. For more information, see DATE.
DECIMAL(p,s) DECIMAL(p,s) decimal(p,s) Maximum precision is 31, default is 10. Default scale is 0. For more information, see DECIMAL.
DOUBLE DOUBLE double  
FLOAT(n) (FLOAT4) REAL float FLOAT4 is a synonym for REAL. Columns defined as FLOAT(n) are stored as float where n <= 24. 32-bit IEEE 754 floating point. For more information, see FLOAT and REAL.
FLOAT(n) (FLOAT8) DOUBLE double FLOAT8 is a synonym for DOUBLE. Columns defined as FLOAT(n) are stored as double where n > 24. 64-bit IEEE 754 floating point. For more information, see FLOAT and REAL.
INTEGER (INT, INT4) INTEGER int This is a 32-bit signed integer. INT4 is a synonym for INTEGER.
REAL REAL float 32-bit IEEE 754 floating point.
SMALLINT (INT2) INTEGER int 32-bit signed integer.
STRING VARCHAR(32672) string Use of the STRING type in Db2 is discouraged because of the resulting increase in storage requirements and the potential for reduced performance. For more information, see STRING.
TIMESTAMP(s) TIMESTAMP(s) timestamp Timestamp without timezone. For mor information, see TIMESTAMP.
TINYINT INTEGER int This is a 32-bit signed integer.
VARBINARY(n) VARBINARY(n) binary Arbitrary-length byte array. Maximum length in Db2 is 32672 bytes. No practical limit length in Iceberg - longer values are truncated.
VARCHAR(n) VARCHAR(n) string Maximum length in Db2 is 32672 characters. No practical limit length in Iceberg - longer values are truncated.

DATE

When a Db2 Datalake table is created with a column of type DATE, this column is mapped to the Hive DATE type.

A date is a three-part value (year, month, and day):
  • The range of the month part is 1 - 12.
  • The range of the day part is 1 - x, where x is 28, 29, 30, or 31, and depends on the month.
  • The range of the year part is 0001 to 9999.

If you are using the ORC file format with historical DATE values (values prior to 1582-10-15), and the ORC files were inserted with Db2 or Hive on HDP 3.0, use tblproperties('orc.proleptic.gregorian.default' = 'true') in the CREATE Datalake TABLE statement or the ALTER Datalake TABLE statement. If the ORC files were inserted with Db2 or Hive on HDP 2.6.5 (or earlier), use tblproperties('orc.proleptic.gregorian.default' = 'false') or leave it unset. False is the default setting for orc.proleptic.gregorian.default.

DECIMAL

The DECIMAL data type is a numeric data type with fixed scale and precision. The precision represents the total number of digits that can be represented by the column. Therefore, +12.34 and -1.234 are both defined with a precision of 4. The scale is the number of fractional digits, and must be less than or equal to the precision.

The maximum precision value is 31. Therefore, the largest integral value is represented by DECIMAL(31,0) and contains the value 9 repeated 31 times. The most precise fractional value between 0 and 1 or between 0 and -1 is represented by DECIMAL(31,31), with 31 digits to the right of the decimal point. The closest value to 0 has 30 zeros and a 1 following the decimal point. The closest value to 1 has 31 nines following the decimal point.

Important: Hive supports a DECIMAL data type with a maximum precision value of 38. Db2 tolerates a DECIMAL(38,0) data type in CREATE DATALAKE TABLE and ALTER DATALAKE TABLE statements for Datalake tables, and maps it to a DECIMAL(31,0) data type. This toleration support is limited to DECIMAL(38,0) only. Db2 does not handle data that exceeds a precision value of 31. When the definition of a Hive table is imported into Db2 , any column that is defined as DECIMAL(38,0) is changed to DECIMAL(31,0). If any data in the table exceeds a precision of 31, a null value is returned.

Db2 converts between DECIMAL and other numeric types. For example, a DECIMAL value with zero scale is converted to the smallest appropriate integer type.

FLOAT and REAL

Db2 defines a REAL data type as a 32-bit floating point value. The definition of a FLOAT data type is a synonym for DOUBLE.

In Hive, a FLOAT always refers to a 32-bit floating point value, which corresponds to the JAVA FLOAT data type. In Db2, the REAL data type is a synonym for the Hive FLOAT data type.

To migrate Hive and Db2 applications, complete the following actions:
  • For CREATE Datalake TABLE statements, change all columns defined as FLOAT to REAL to retain the same storage characteristics.
  • When you access Hive or Db2 Datalake tables, FLOAT values are treated as DOUBLE values when read, which can produce minor rounding differences.

You can see precision errors when you do operations with the DOUBLE data type column and DECIMAL literals in a table defined with ROW FORMAT DELIMITED. As a best practice, use values in an E-notation format for the double data type column (+1.14000000000000E+000). You can use the ROUND function to work around some precision errors.

STRING

The STRING data type is mapped directly to Hive STRING. The native Db2 data type is VARCHAR(max). This has several implications:
  • If a value in the underlying storage exceeds the maximum VARCHAR length, that value is silently truncated when converted to the Db2 native data type.
  • If the underlying data does not require the maximum VARCHAR length for storage (for example, if the column never exceeds 100 characters), then Db2 allocates unnecessary resources for the handling of that column. To improve performance, use a VARCHAR(n) with a defined size instead of a STRING. Alter the table to define an appropriate length for the column by using the following syntax:
    ALTER Datalake TABLE <schema>.<table> ALTER COLUMN <col> SET DATA TYPE VARCHAR(<size>)
    This alter operation changes the definition of the table in both Db2 and in Hive. If data in the VARCHAR(n) column exceeds <size>, the data will be truncated.

TIMESTAMP

The TIMESTAMP data type is a six- or seven-part value (year, month, day, hour, minute, second and, optionally, fractional seconds). See DATE data type for more information on the date portion of the timestamp (year, month, day). The TIME portion, is a three-part value (hour, minute, and second) designating a time of day under a 24-hour clock:
  • The range of the hour part is 0 - 23.
  • The range of the minute and second parts is 0 - 59.

The number of digits in the fractional seconds part is specified using a precision attribute. For regular Datalake tables, the precision can be a value between 0 and 9, with 9 being the default. For Iceberg Datalake tables, the precision can be a value between 0 and 6, with 6 being the default.

Timestamps are considered to be time zone-less. For convenience, functions are provided for converting values to and from specific time zones (TO_UTC_TIMESTAMP, FROM_UTC_TIMESTAMP).

If you are using the ORC file format with historical TIMESTAMP values (values prior to 1582-10-15), and the ORC files were inserted with Db2 or Hive on HDP 3.0, use tblproperties('orc.proleptic.gregorian.default' = 'true') in the CREATE Datalake TABLE statement or the ALTER Datalake TABLE statement. If the ORC files were inserted with Db2 or Hive on HDP 2.6.5 (or earlier), use tblproperties('orc.proleptic.gregorian.default' = 'false') or leave it unset. False is the default setting for orc.proleptic.gregorian.default.

TINYINT

The TINYINT type in the context of the CREATE TABLE (HADOOP) statement is supported only to maintain compatibility with Hive. It is unavailable in other contexts. TINYINT is mapped to the Db2 SMALLINT type, but the values are constrained (-128 to 127) to match the Hive definition of a TINYINT.

Db2 data types not supported for Datalake tables

Not all Db2 data types are supported for Datalake tables. For more information, see Restrictions and limitations on Datalake tables.