Data export configuration XML

Learn how to customize the XML file that configures exporting data from Hadoop to Db2® Warehouse.

General configuration

  • The fq.command property sets the type of data movement: import (Db2 Warehouse->Hadoop) or export (Hadoop->Db2 Warehouse).
    <property>
    	<name>fq.command</name>
    	<value>export</value>
    </property>

HDFS information

  • The fq.input.path parameter sets the directory on HDFS where the data to be exported is stored.
    <property>
    	<name>fq.input.path</name>
    	<value>/dbbackup/fqtest1</value>
    </property>
    Note: The fq.input.path and fq.hive.tablename properties are mutually exclusive; only one of them can be used to define the location of the data that you want to export. fq.input.path provides more flexibility, because fq.hive.tablename allows you to specify only one table, not multiple tables.
  • The fq.format.fielddelim parameter sets the integer value of the single-character field delimiter in the plain-text output file; for example, 124 represents the pipe character (|). A sample output row is shown after this list.
    <property>
    	<name>fq.format.fielddelim</name>
    	<value>124</value>
    </property>
  • The fq.format.null property defines how NULL values are represented. The default is NULL. The value cannot be longer than 4 characters.
    <property>
    	<name>fq.format.null</name>
    	<value>NULL</value>
    </property>
  • The fq.fs.temp parameter sets the location of temporary files (such as logs and status files) on HDFS.
    <property>
    	<name>fq.fs.temp</name>
    	<value>/tmp</value>
    </property>
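For illustration, with fq.format.fielddelim set to 124 (the pipe character) and fq.format.null set to NULL, one row of the plain-text output file might look like the following line; the column values are invented for this example:

    10001|The Art of Databases|NULL|42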

Warehouse information

  • The fq.pda.db property sets the warehouse database name. Enclose delimited database names in double quotation marks.
    <property>
    	<name>fq.pda.db</name>
    	<value>bludb</value>
    </property>
  • The fq.table parameter can be used to rename a specific table during export so that the exported table on Db2 Warehouse has a different name than the source table on Hadoop. Enclose delimited table names in double quotation marks. The format of this value is <SCHEMA>.<table>.
    <property>
    	<name>fq.table</name>
    	<value>ADMIN.tab</value>
    </property>
  • The fq.tables property provides a list of target tables for the exported data, and it also allows you to filter these tables by using the wildcard character *. The use of the * filter is described in detail in Importing or exporting multiple tables at a time. Enclose delimited table names in double quotation marks.
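    For example, the following sketch uses the * wildcard to select all tables in the ADMIN schema; the schema name is illustrative:
    <property>
    	<name>fq.tables</name>
    	<value>ADMIN.*</value>
    </property>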
  • The fq.append.mode property defines how data is transferred to the target database. The property takes the following values:
    • create: During export, a new table is created in the target database. If a table with the same name already exists, the export fails.
    • overwrite: During export, a table from the source database is transferred to the target database; if a table with the same name already exists in the target database, it is replaced (overwritten).
    • append: The data from the source table is appended to the target table. That is, if a table with the same name already exists in the target database, all data from the source table is added (appended) to the existing table.
    For more information on append modes, read Append modes for import and export.
    The default value for this property is create.
    <property>
    	<name>fq.append.mode</name>
    	<value>create</value>
    </property>
  • The fq.pda.server property sets the wall IP address or the fully qualified host name of the warehouse server.
    <property>
    	<name>fq.pda.server</name>
    	<value>hostname.ibm.com</value>
    </property>
  • The fq.pda.port property sets the port number of the warehouse database instance (NZ_DBMS_PORT).
    <property>
    	<name>fq.pda.port</name>
    	<value>50000</value>
    </property>
  • The fq.pda.user property sets the warehouse database user account name for access to the database.
    <property>
    	<name>fq.pda.user</name>
    	<value>bluadmin</value>
    </property>
  • The fq.pda.password property sets the password for the warehouse database user account.
    <property>
    	<name>fq.pda.password</name>
    	<value>password</value>
    </property>
  • The fq.exttab.columns parameter sets the external table column names in the proper order, for example CODE, TITLE, PRICE. The default value is *, which selects all columns. All possible external table options and values are listed in CREATE EXTERNAL TABLE.
    <property>
    	<name>fq.exttab.columns</name>
    	<value>*</value>
    </property>
  • The fq.exttab.schema parameter sets the external table schema definition, for example CODE CHAR(5), TITLE VARCHAR(255), PRICE INTEGER.
    <property>
    	<name>fq.exttab.schema</name>
    	<value></value>
    </property>
    The default value is the empty string, which means that the schema of the target table is used. For more information on external tables, see CREATE EXTERNAL TABLE.
  • The fq.custom.exttab.options parameter is optional and intended for advanced users. It allows you to provide user-defined options for external tables. All possible external table options and values are listed in CREATE EXTERNAL TABLE. You can provide multiple pairs of an external table option and its value, separated by spaces.
    Note: The options provided with the fq.custom.exttab.options parameter override the corresponding options that might exist in the Fast Data Movement code.
    <property>
    	<name>fq.custom.exttab.options</name>
    	<value>MaxErrors 1 SocketBufSize 8000000</value>
    </property>
  • The fq.splits property sets the number of concurrent JDBC load sessions to the warehouse host.
    <property>
    	<name>fq.splits</name>
    	<value>12</value>
    </property>

Hadoop Hive SQL connection configuration

JDBC connection properties

  • The fq.sql.server property sets the Hive server address on Hadoop.
    <property>
    	<name>fq.sql.server</name>
    	<value>rack1-master</value>
    </property>
  • The fq.sql.port property sets the Hive server port number on Hadoop.
    <property>
    	<name>fq.sql.port</name>
    	<value>10000</value>
    </property>
  • Import only: The fq.sql.type property sets the server type. Supported types are hive2 and bigsql.
    <property>
    	<name>fq.sql.type</name>
    	<value>hive2</value>
    </property>
    Note: Use this property for remote import, not export.

JDBC authentication properties

Note: You must provide values either for the user and password or for the Kerberos service principal name.
  • The fq.sql.user property sets the user name. It is required if you want to use user/password authentication.
    <property>
    	<name>fq.sql.user</name>
    	<value>biadmin</value>
    </property>
  • The fq.sql.password property sets the password. It is required if a user name is provided.
    <property>
    	<name>fq.sql.password</name>
    	<value>passw0rd</value>
    </property>

JDBC SSL properties

  • The fq.sql.ssl property defines whether SSL is required to connect to the selected Hadoop SQL server. The value can be true or false.
    <property>
    	<name>fq.sql.ssl</name>
    	<value>false</value>
    </property>
  • The fq.sql.sslTrustStore property sets the path to the SSL trustStore that is to be used.
    <property>
    	<name>fq.sql.sslTrustStore</name>
    	<value>$HIVE_HOME/src/data/files/cacerts_test.jks</value>
    </property>
  • The fq.sql.sslTrustStorePassword property sets the password to the specified SSL trustStore.
    <property>
    	<name>fq.sql.sslTrustStorePassword</name>
    	<value>passw0rd</value>
    </property>

JDBC Kerberos properties

  • The fq.kerberos.principal property sets the Kerberos principal name. It is required if Kerberos authentication is used.
    <property>
    	<name>fq.kerberos.principal</name>
    	<value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value>
    </property>
  • The fq.kerberos.keytab property sets the Kerberos keytab. It is required if Kerberos authentication is used.
    <property>
    	<name>fq.kerberos.keytab</name>
    	<value>/path/to/your/client.keytab</value>
    </property>
  • The fq.sql.spn property sets the Kerberos service principal name. It is required if you want to use Kerberos authentication.
    <property>
    	<name>fq.sql.spn</name>
    	<value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value>
    </property>

Hadoop remote execution settings

  • The fq.hadoop.remote.user parameter is used when running data movement from a system other than the Hadoop cluster. It specifies the remote Hadoop user name that runs the data movement. This user should have access to HDFS.
    <property>
    	<name>fq.hadoop.remote.user</name>
    	<value></value>
    </property>
  • The fq.hadoop.client.configs parameter is used when running data movement from a system other than the Hadoop cluster. It specifies the path to the downloaded client configuration files.
    <property>
    	<name>fq.hadoop.client.configs</name>
    	<value></value>
    </property>
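    For illustration, when data movement runs on a remote system, both parameters might be filled in as follows; the user name and path are hypothetical:
    <property>
    	<name>fq.hadoop.remote.user</name>
    	<value>hdfs</value>
    </property>
    <property>
    	<name>fq.hadoop.client.configs</name>
    	<value>/home/dbuser/hadoop-client-configs</value>
    </property>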

Additional settings

  • The fq.hive.tablename parameter provides the name of the source table on Hive when exporting a Hive table. When this value is provided, the fq.input.path and fq.tables properties must be empty. For more information, see Exporting a Hive table that was not imported from Db2 Warehouse.
    <property>
    	<name>fq.hive.tablename</name>
    	<value></value>
    </property>
  • The fq.hive.set.distribute.on parameter specifies the data distribution key by column name or names. It can be used when exporting a Hive table. The default is an empty string, which means that data is distributed randomly.
    <property>
    	<name>fq.hive.set.distribute.on</name>
    	<value></value>
    </property>
  • The fq.hive.where parameter can be used to define an SQL WHERE clause that selects the data to transfer during the export of a Hive table. This parameter works only when fq.hive.tablename is set.
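    A minimal sketch follows; the column name and date value in the WHERE clause are hypothetical:
    <property>
    	<name>fq.hive.where</name>
    	<value>sale_date >= '2020-01-01'</value>
    </property>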
  • The fq.hive.use.where.in.length.discovery parameter specifies whether string lengths in the schema are determined with the fq.hive.where filter applied when a Hive table is exported for the first time. The default is false: string lengths are calculated from all records in the Hive table during schema preparation and are not restricted by the WHERE clause. When set to true, only the records that match the WHERE clause are taken into consideration for a given table. Note that if you later change the WHERE clause to include longer records and export the data to the same target table again, the export fails.
    <property>
    	<name>fq.hive.use.where.in.length.discovery</name>
    	<value>false</value>
    </property>
  • The fq.skip.table.structure.validation parameter is used when exporting a Hive table to an existing destination table in the warehouse. When this value is set to true, validation of the structures of the source Hive table and the destination warehouse table is skipped. The default value is false.
    <property>
    	<name>fq.skip.table.structure.validation</name>
    	<value>false</value>
    </property>
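
Complete example

Putting it together, the following is a minimal sketch of a complete export configuration file. It assumes the standard Hadoop-style <configuration> root element; the host name, credentials, and table filter are placeholders based on the examples above:

<configuration>
	<property>
		<name>fq.command</name>
		<value>export</value>
	</property>
	<property>
		<name>fq.input.path</name>
		<value>/dbbackup/fqtest1</value>
	</property>
	<property>
		<name>fq.pda.db</name>
		<value>bludb</value>
	</property>
	<property>
		<name>fq.pda.server</name>
		<value>hostname.ibm.com</value>
	</property>
	<property>
		<name>fq.pda.port</name>
		<value>50000</value>
	</property>
	<property>
		<name>fq.pda.user</name>
		<value>bluadmin</value>
	</property>
	<property>
		<name>fq.pda.password</name>
		<value>password</value>
	</property>
	<property>
		<name>fq.tables</name>
		<value>ADMIN.*</value>
	</property>
</configuration>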