Exporting text files from Hadoop

You can export text files from Hadoop to Db2® Warehouse, even if those files were not previously imported from Db2 Warehouse.

About this task

If you have text files on Hadoop that fit the existing table structure on Db2 Warehouse, you can transfer these files to Db2 Warehouse. If such structure does not exist, you can create the target table on Db2 Warehouse that will match the structure of text data on Hadoop, and then transfer the files.

You can either run the export operation from Hadoop, using the fdm.sh script.

Procedure

  1. Verify the structure of your text data on Hadoop:
    • If the table exists in Hive, run the describe formatted command. The output contains details on your Hive table definition.
    • If the table does not exist in Hive, you must check HDFS location and the structure of the text files, such as fields, separators, null representation.
  2. On Db2 Warehouse, locate, or manually create a table to which you will export the data.
    Note:
    1. Table structure must be the same as the data structure in the text file on Hadoop.
    2. You must verify the value for field delimiter (fq.format.fielddelim), and the representation of NULL for text columns (fq.format.null). You can find this information in Hive table definition, or in text files structure.
  3. On Hadoop, edit the available template for export configuration XML, and specify at least the following values:
    • fq.input.path - This parameter must point to the path where the text files are located on Hadoop;
    • fq.table - This parameter must point to the target table on Db2 Warehouse.
    • fq.format.fielddelim - Look for field.delim information in the Hive table definition. In the configuration XML you must specify an ASCII code for the character. If there is no specification for field.delim on Hive, use default value, that is 1.
    • fq.format.null - Look for serialization.null.format in Hive table definition and provide the same value in the XML file. If it is not specified in the output, set the value to \N, which is default for Hadoop.
  4. Specify any other required parameters and save the XML file.
  5. On Hadoop, run the command:
    fdm.sh -conf <configuration_file>.xml
    Tip: If required, you can overwrite any parameters included in the configuration file with the -D parameter, for example:
    fdm.sh -conf fq-export-conf.xml -D fq.table=test -D fq.format.fielddelim=124

Results

Text files are transferred from Hadoop to the specified Db2 Warehouse table.