Preparing the configuration XML file for the Hadoop file formats

Db2® Warehouse supports conversion to the following Hadoop-specific file formats: Parquet, Avro, ORC, RCFile, and SequenceFile. You can import files from Db2 Warehouse and store them on Hadoop in the format of your choice.

Procedure

  1. Edit the fq-import-remote-conf.xml template.
  2. Set the fq.data.format property to one of the following values: PARQUET, ORC, RCFILE, AVRO, or SEQUENCEFILE. For example:
    <property>
        <name>fq.data.format</name>
        <value>PARQUET</value>
    </property>
  3. Set the fq.output.compressed property to select the compression type. If you set the property to false or leave it empty, the default compression type that is specified on Hadoop for the selected format is used.
    Depending on the format that you use, select one of the following values:
    • PARQUET: Snappy, gzip, uncompressed
    • ORC: NONE, ZLIB, SNAPPY
    • RCFILE: The value must be the exact class name of a codec that is available on the Hadoop system. For example: org.apache.hadoop.io.compress.SnappyCodec
    • AVRO: snappy, deflate
    • SEQUENCEFILE: The value must be the exact class name of a codec that is available on the Hadoop system. For example: org.apache.hadoop.io.compress.SnappyCodec
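    For example, following the same property syntax as the fq.data.format snippet above, selecting ZLIB compression for the ORC format might look like this (the ZLIB value is taken from the ORC list above):
    <property>
        <name>fq.output.compressed</name>
        <value>ZLIB</value>
    </property>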
  4. Because mixed-mode transfer is not supported with Hadoop formats, set the fq.compress property to false or leave it empty.
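    Using the same property syntax as in the earlier steps, explicitly disabling fq.compress might look like this:
    <property>
        <name>fq.compress</name>
        <value>false</value>
    </property>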
  5. Save the XML file and take note of the file path.

Results

You can now run the import task as described in Configuring and running data movement.