Exporting a Hive table that was not imported from Db2 Warehouse

You can export a Hive table from Hadoop to Db2® Warehouse even if the table was not previously imported from Db2 Warehouse and therefore no metadata exists for it. You can also use an SQL WHERE clause during the export to narrow down the results.

Before you begin

This type of export does not require any metadata information. The table schema is created based on the Hive table description. All existing export modes (create, overwrite, append) are supported. However, for the create and append modes, if the table already exists on Db2 Warehouse, it must have the same schema as the schema that is created for the Hive table.

The following table presents data type conversion from Hive to Db2 Warehouse:
Table 1. Data type conversion

Hive type          Db2 Warehouse type
INT                INTEGER
SMALLINT           SMALLINT
TINYINT            BYTEINT
BIGINT             BIGINT
DECIMAL (p,s)      NUMERIC (p,s)
                   Note: See the limitations that follow the table.
STRING             NATIONAL CHARACTER VARYING (max(length()))
                   Note: The STRING data type is converted to NVARCHAR on Db2 Warehouse. The length is calculated as the maximum length of all values in the column.
CHAR (x)           NATIONAL CHARACTER (x)
VARCHAR (x)        NATIONAL CHARACTER VARYING (x)
BOOLEAN            BOOLEAN
DOUBLE             DOUBLE PRECISION
FLOAT              REAL
DATE               DATE
TIMESTAMP          TIMESTAMP
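
For illustration, suppose a Hive table uses several of the types above. The names and sizes in this sketch are hypothetical, and the DDL that the tool generates may differ in detail, but based on Table 1 the export would create a Db2 Warehouse table along these lines:

-- Hypothetical Hive source table (names and types are examples only)
CREATE TABLE sales (
  id      INT,
  amount  DECIMAL(10,2),
  city    STRING,   -- assume the longest value in this column is 8 characters
  sold_on DATE
);

-- Sketch of the table that the export would create on Db2 Warehouse,
-- following the conversions in Table 1 (not actual generated output):
CREATE TABLE sales (
  id      INTEGER,
  amount  NUMERIC(10,2),
  city    NVARCHAR(8),  -- max(length(city)) is calculated during the export
  sold_on DATE
);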
Limitations
  • You can move only one table at a time. The fq.tables parameter cannot be used to specify more than one table.
  • The DECIMAL(precision, scale) data type is supported only for Hive versions 0.13 and later. Versions prior to 0.13 (for example, on BigInsights 3) use a different DECIMAL structure, and exporting a table with a DECIMAL column fails with an error.
  • This method of export works only in remote mode, so JDBC access to Hive is required. Also, you must use the fq-export-remote-conf.xml template for configuration and specify the JDBC properties in it.
  • Db2 Warehouse supports only UTF-8 encoding.
  • The following characters are not supported in the column names of the table that is to be exported:
    : (colon)
    . (full stop)
    ` (grave accent)
    , (comma)
    The behavior may vary depending on your Hadoop distribution or Hive version, but the export of such a table will most likely fail.
  • If the Hive table that you want to export contains text data, it might be more efficient, from a performance point of view, to use the method described in Exporting text files from Hadoop.

About this task

You can run the export operation from Hadoop by using the fdm.sh script. Remember to use remote mode and the fq-export-remote-conf.xml template.
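
For example, an invocation might look like the following sketch. It assumes that fdm.sh accepts the configuration file through a -conf option and that you pass your edited copy of the remote export template; adjust the path and options to match your installation:

./fdm.sh -conf /path/to/my-fq-export-remote-conf.xml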

Procedure

  1. Create a copy of the fq-export-remote-conf.xml template and edit it as required.
  2. Run the fdm.sh script.
    You must specify at least the following parameters:
    • fq.pda.db is the destination database on Db2 Warehouse.
    • fq.input.path must be listed in the command, and its value must be empty.
    • fq.hive.tablename is the name of the Hive table that you want to export.
    Optionally, you can also set the following parameters related to the Hive table export (a configuration sketch follows this list):
    • If you want to change the target table name, use fq.table to specify it. Otherwise, the table name at the destination is the same as that of the source table.
    • If you want to filter the data in the Hive table with an SQL WHERE clause before exporting it, you can use the fq.hive.where property.
    • If you want to specify an explicit distribution key for the exported table, provide one or more column names as values for fq.hive.set.distribute.on. By default, the value is empty and distribution is random.
    • You can set fq.skip.table.structure.validation=true if you want to disable the schema validation between the source and destination tables. By default, the schema on NPS is validated, and it must contain NVARCHAR, not VARCHAR. Moreover, string lengths on Hive are calculated during the export, because all string fields in the Hive table are mapped to NVARCHAR with the size set to max(length(col)). However, if you know that the strings in your Hive data can be converted to VARCHAR, you can set fq.skip.table.structure.validation=true to omit both the table structure validation and the string column length calculation.
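
The following is a minimal sketch of the properties in an edited copy of the fq-export-remote-conf.xml template, assuming the template uses the Hadoop-style <property> format. All values are placeholders for illustration; take the real values, optional properties such as fq.table, and any required JDBC properties from the comments in the template itself:

<configuration>
  <property>
    <name>fq.pda.db</name>
    <value>BLUDB</value>                 <!-- destination database on Db2 Warehouse -->
  </property>
  <property>
    <name>fq.input.path</name>
    <value></value>                      <!-- must be listed, with an empty value -->
  </property>
  <property>
    <name>fq.hive.tablename</name>
    <value>sales</value>                 <!-- Hive table to export -->
  </property>
  <property>
    <name>fq.hive.where</name>
    <value>sold_on &gt;= '2017-01-01'</value>  <!-- optional: filter rows before export -->
  </property>
  <property>
    <name>fq.hive.set.distribute.on</name>
    <value>id</value>                    <!-- optional: explicit distribution key; random if empty -->
  </property>
</configuration>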