Support for Hadoop-specific file formats

Data imported from Db2® Warehouse can be stored on Hadoop in one of the following file formats: Parquet, Avro, ORC, RCFile, or SequenceFile. You can also export data that is stored in these Hadoop-specific formats to Db2 Warehouse.

Note: This functionality has several limitations, many of which result from dependencies on Hive and on the Parquet, Avro, ORC, and RCFile libraries. Because of these dependencies, data consistency cannot be guaranteed for Hadoop-specific formats as it is for text transfer, so it is strongly recommended that you verify data consistency after importing to a Hadoop format. For a list of known limitations, see Troubleshooting import and export of Hadoop formats.

You run data movement from the Hadoop cluster, and you must use a JDBC driver to connect to a Hive server. You can use the remote import (or export) configuration XML templates to configure data movement with the new formats, as in the sketch below.
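The following fragment is a minimal sketch of what such a template might contain. The connection property names fq.hive.jdbc.url and fq.hive.jdbc.driver are hypothetical placeholders, not documented properties; use the property names that are defined in the remote import (or export) template shipped with the product.

    <configuration>
      <!-- Hypothetical connection properties, shown only to illustrate the template shape. -->
      <property>
        <name>fq.hive.jdbc.url</name>
        <value>jdbc:hive2://hiveserver.example.com:10000/default</value>
      </property>
      <property>
        <name>fq.hive.jdbc.driver</name>
        <value>org.apache.hive.jdbc.HiveDriver</value>
      </property>
    </configuration>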

You specify one of the Hadoop formats by using the fq.data.format property.

You can also select the type of compression for the format of your choice by using the fq.output.compressed property.

Mixed mode of transfer is not supported for the Hadoop-specific formats. When you use these formats, you must delete the fq.compress property, leave it empty, or set it to false.
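For example, a configuration that imports to Parquet with compression might contain the following properties. The value strings PARQUET and SNAPPY are assumptions for illustration; verify the accepted values against the configuration XML templates.

    <!-- Store imported data as Parquet; the value string is an assumed spelling. -->
    <property>
      <name>fq.data.format</name>
      <value>PARQUET</value>
    </property>
    <!-- Select the compression for the chosen format; SNAPPY is an assumed example value. -->
    <property>
      <name>fq.output.compressed</name>
      <value>SNAPPY</value>
    </property>
    <!-- Mixed mode is unsupported for Hadoop formats, so fq.compress is set to false. -->
    <property>
      <name>fq.compress</name>
      <value>false</value>
    </property>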

For detailed configuration steps for Hadoop formats, see t_hdp_move_formats.html#task_lyz_jnx_55.

If you do not want to use the Hadoop formats, leave the fq.data.format property empty.
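For instance, under the same assumed template layout, an empty value keeps the default text transfer:

    <!-- An empty value means that no Hadoop-specific format is used. -->
    <property>
      <name>fq.data.format</name>
      <value></value>
    </property>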