Data import configuration XML

Learn how to customize the XML file that configures importing data from the data warehouse to Hadoop.

General configuration

  • The fq.command property sets the type of data movement: import (warehouse → Hadoop) or export (Hadoop → warehouse).
    <property>
    	<name>fq.command</name>
    	<value>import</value>
    </property>

HDFS information

  • The fq.output.path property sets the directory on HDFS where the transferred data is stored.
    <property>
    	<name>fq.output.path</name>
    	<value>/dbbackup/backup1</value>
    </property>
  • The fq.format.fielddelim property sets the ASCII value of the single character field delimiter in the plain text output file.
    <property>
    	<name>fq.format.fielddelim</name>
    	<value>124</value>
    </property>
    Note: Setting this property to 32 (space) is not recommended.
  • The fq.format.null property defines how NULL values are represented. The default is NULL. The value cannot be longer than 4 characters.
    <property>
    	<name>fq.format.null</name>
    	<value>NULL</value>
    </property>
  • The fq.fs.temp property sets the location of temporary files (such as logs and status files) on HDFS.
    <property>
    	<name>fq.fs.temp</name>
    	<value>/tmp</value>
    </property>
  • The fq.hive.schema property sets the target schema name in Hive under which all imported tables are created.
    <property>
    	<name>fq.hive.schema</name>
    	<value></value>
    </property>
    If the schema does not exist, it is automatically created. If the property is not set, the default schema is used.
  • The fq.hive.tablename property is optional. It defines the output table name in Hive. If it is not provided, the source table name is used. Use this property especially when importing a table with special characters in its name to a Hive version earlier than 0.13.0.
    <property>
    	<name>fq.hive.tablename</name>
    	<value>Test</value>
    </property>
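The HDFS-related properties are typically set together. The following illustrative fragment (the output path and the schema name backupschema are placeholders for your environment) writes pipe-delimited files under /dbbackup/backup1 and creates the imported tables in a dedicated Hive schema:
<property>
	<name>fq.output.path</name>
	<value>/dbbackup/backup1</value>
</property>
<property>
	<name>fq.format.fielddelim</name>
	<value>124</value>
</property>
<property>
	<name>fq.hive.schema</name>
	<value>backupschema</value>
</property>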

Compression options

  • The fq.compress property must not be set, because its compression modes are currently not supported.
  • The fq.output.compressed property defines whether the transferred data is stored on Hadoop in a compressed format. It is used together with fq.data.format when a Hadoop-specific format is selected. Depending on the format that you use, select one of the following values:
    • PARQUET: Snappy, gzip, uncompressed
    • ORC: NONE, ZLIB, SNAPPY
    • RCFILE: The value must contain the exact class name of a codec that is available on the Hadoop system, for example org.apache.hadoop.io.compress.SnappyCodec
    • AVRO: snappy, deflate
    • SEQUENCEFILE: The value must contain the exact class name of a codec that is available on the Hadoop system, for example org.apache.hadoop.io.compress.SnappyCodec
    You can also set the property to false or leave it empty; the default compression type that is specified on Hadoop for the selected format is then used.
    <property>
    	<name>fq.output.compressed</name>
    	<value>false</value>
    </property>
  • The fq.data.format property defines the format in which the transferred data is stored on Hadoop. Supported values for this property are:
    • TXT - for data warehouse strict formats.
    • PARQUET, ORC, RCFILE, AVRO, SEQUENCEFILE - for Hadoop-specific formats.
    <property>
    	<name>fq.data.format</name>
    	<value>TXT</value>
    </property>
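For example, to store the imported data as Snappy-compressed Parquet files, the format and compression properties can be combined as follows (an illustrative fragment):
<property>
	<name>fq.data.format</name>
	<value>PARQUET</value>
</property>
<property>
	<name>fq.output.compressed</name>
	<value>Snappy</value>
</property>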

Warehouse information

  • The fq.pda.db property sets the data warehouse database name. Enclose delimited database names in double quotation marks.
    <property>
    	<name>fq.pda.db</name>
    	<value>bludb</value>
    </property>
  • The fq.tables property provides a comma-separated list of warehouse tables to be imported. Enclose delimited table names in double quotation marks.
    <property>
    	<name>fq.tables</name>
    	<value>ADMIN.tab</value>
    </property>
    The format of this value is <SCHEMA>.<table>. You can use an asterisk (*) in the fq.tables property as a filter that matches multiple schemas and tables, for example:
    • S1.T1 - imports table T1 in schema S1
    • S1.* - imports all tables from schema S1
    • *.T1 - imports all tables T1 from all schemas
    • * - imports all tables from the current schema for the user that is currently logged in
    Note that you cannot use * as part of a schema or table name for filtering. For example, S1.a* does not mean "all tables in schema S1 whose names start with a"; the system cannot interpret such a filter, and a* is treated as the exact table name.
  • The fq.pda.server property sets the wall IP address or the fully qualified host name of the warehouse server.
    <property>
    	<name>fq.pda.server</name>
    	<value>hostname.ibm.com</value>
    </property>
  • The fq.pda.port property sets the port number of the warehouse database instance (NZ_DBMS_PORT).
    <property>
    	<name>fq.pda.port</name>
    	<value>50000</value>
    </property>
  • The fq.pda.user property sets the warehouse database user account name for access to the database.
    <property>
    	<name>fq.pda.user</name>
    	<value>bluadmin</value>
    </property>
  • The fq.pda.password property sets the password for the warehouse database user account.
    <property>
    	<name>fq.pda.password</name>
    	<value>password</value>
    </property>
  • The fq.pda.ssl.cacertificate property sets the full path to the CA certificate file that is stored on HDFS and used to authenticate connections. It is used only when the SSL flag is set to true. If it is not provided, all connections are accepted.
    <property>
    	<name>fq.pda.ssl.cacertificate</name>
    	<value></value>
    </property>
  • The fq.pda.where property specifies the SQL WHERE clause that is used to select the data to transfer. This property applies only to import, not export.
    <property>
    	<name>fq.pda.where</name>
    	<value></value>
    </property>
  • The fq.splits property sets the number of concurrent JDBC load sessions to the warehouse host.
    <property>
    	<name>fq.splits</name>
    	<value>12</value>
    </property>
  • The fq.custom.exttab.options property is an optional parameter for advanced users. It allows you to provide user-defined options for external tables. All available external table options and values are listed in CREATE EXTERNAL TABLE. You can provide multiple option-value pairs, separated by spaces.
    Note: The options provided with the fq.custom.exttab.options parameter overwrite the corresponding ones that might exist in the Fast Data Movement code.
    <property>
    	<name>fq.custom.exttab.options</name>
    	<value>MaxErrors 1 SocketBufSize 8000000</value>
    </property>
  • The fq.break.on.error property, when set to true, stops the import when an error occurs during the import of multiple tables or a whole database. When the import finishes, information about the transferred tables is displayed.
    <property>
    	<name>fq.break.on.error</name>
    	<value>false</value>
    </property>
  • The fq.append.mode property specifies how data is transferred to the specified HDFS location. The property takes the following values:
    • create - During import, a new table is created in the target database. If a table with the same name already exists, the import fails.
    • overwrite - During import, a table from the source database is imported into the target database; if a table with the same name already exists in the target database, it is replaced (overwritten).
    • append - The data from the source table is imported into the target table and appended to the existing records. That is, if a table with the same name already exists in the target database, all data from the source table is added (appended) to the existing table.
    For more information on append modes, read Append modes for import and export.
    Default value for this property is create.
    <property>
    	<name>fq.append.mode</name>
    	<value>overwrite</value>
    </property>
  • The fq.sql.metadata property defines whether IBM® Fast Data Movement will create the table in Hive during import. By default, it is set to true.
    <property>
    	<name>fq.sql.metadata</name>
    	<value>true</value>
    </property>
    If you set the property to false, the table will not be created in Hive and IBM Fast Data Movement will only import data files and put them on HDFS.
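Putting the connection-related properties together, a minimal warehouse section that imports all tables from schema S1 might look as follows. The host name, port, database name, and credentials are placeholders for your environment:
<property>
	<name>fq.pda.server</name>
	<value>hostname.ibm.com</value>
</property>
<property>
	<name>fq.pda.port</name>
	<value>50000</value>
</property>
<property>
	<name>fq.pda.db</name>
	<value>bludb</value>
</property>
<property>
	<name>fq.pda.user</name>
	<value>bluadmin</value>
</property>
<property>
	<name>fq.pda.password</name>
	<value>password</value>
</property>
<property>
	<name>fq.tables</name>
	<value>S1.*</value>
</property>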

Hive options

  • The fq.hive.usevarchar property determines the conversion rules for VARCHAR columns. The VARCHAR type is supported by Hive version 0.12 or later. If the property is set to true, the data type remains VARCHAR after import. If it is false, the data type is converted to STRING. The default value is false because there are a number of limitations when VARCHAR is used, as described in Limitations on VARCHAR data types.
    <property>
    	<name>fq.hive.usevarchar</name>
    	<value>false</value>
    </property>
  • The fq.hive.usedate property determines the conversion rules for DATE columns. For Hive versions 0.12 and later, DATE can be mapped to the Hive DATE type instead of STRING when this property is set to true. The default value is false.
    <property>
    	<name>fq.hive.usedate</name>
    	<value>false</value>
    </property>
  • The fq.hive.clustered.by parameter lists the table columns that are used for clustering. To enable default clustering, keep this property empty and set fq.hive.clustered.buckets.
    <property>
    	<name>fq.hive.clustered.by</name>
    	<value></value>
    </property>
  • The fq.hive.clustered.buckets parameter provides an integer value that determines the number of buckets that can be created during clustering. This property must be set to enable clustering.
    <property>
    	<name>fq.hive.clustered.buckets</name>
    	<value></value>
    </property>
  • The fq.hive.partitioned.by parameter lists columns in a table that will be used for partitioning.
    <property>
    	<name>fq.hive.partitioned.by</name>
    	<value></value>
    </property>
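For example, to create the imported table partitioned by a REGION column and clustered into 8 buckets on a CUSTOMER_ID column, the three properties can be combined as follows (the column names here are hypothetical):
<property>
	<name>fq.hive.partitioned.by</name>
	<value>REGION</value>
</property>
<property>
	<name>fq.hive.clustered.by</name>
	<value>CUSTOMER_ID</value>
</property>
<property>
	<name>fq.hive.clustered.buckets</name>
	<value>8</value>
</property>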

Hadoop JDBC connection (remote connection) properties

  • The fq.sql.server property sets the Hive server address on Hadoop where the imported table is created.
    <property>
    	<name>fq.sql.server</name>
    	<value>rack1-master</value>
    </property>
  • The fq.sql.port property sets the Hive server port number on Hadoop.
    <property>
    	<name>fq.sql.port</name>
    	<value>10000</value>
    </property>
  • The fq.sql.type property sets the server type. Supported types are hive2 or bigsql.
    <property>
    	<name>fq.sql.type</name>
    	<value>hive2</value>
    </property>
    Note: Use this property for remote import, not export.

JDBC authentication properties

Note: You must provide values for either user and password or for the Kerberos service principal name.
  • The fq.sql.user property sets the user name. It is required if you want to use user/password authentication.
    <property>
    	<name>fq.sql.user</name>
    	<value>biadmin</value>
    </property>
  • The fq.sql.password property sets the password. It is required if a user name is provided.
    <property>
    	<name>fq.sql.password</name>
    	<value>passw0rd</value>
    </property>

JDBC SSL properties

  • The fq.sql.ssl property defines whether SSL is required to connect to the selected Hadoop SQL server. Value can be true or false.
    <property>
    	<name>fq.sql.ssl</name>
    	<value>false</value>
    </property>
  • The fq.sql.sslTrustStore property sets the path to the SSL trustStore that is to be used.
    <property>
    	<name>fq.sql.sslTrustStore</name>
    	<value>$HIVE_HOME/src/data/files/cacerts_test.jks</value>
    </property>
  • The fq.sql.sslTrustStorePassword property sets the password to the specified SSL trustStore.
    <property>
    	<name>fq.sql.sslTrustStorePassword</name>
    	<value>passw0rd</value>
    </property>
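For example, an SSL-secured remote connection to a Hive server combines the connection and SSL properties as follows. The server name, trustStore path, and password are placeholders:
<property>
	<name>fq.sql.server</name>
	<value>rack1-master</value>
</property>
<property>
	<name>fq.sql.port</name>
	<value>10000</value>
</property>
<property>
	<name>fq.sql.type</name>
	<value>hive2</value>
</property>
<property>
	<name>fq.sql.ssl</name>
	<value>true</value>
</property>
<property>
	<name>fq.sql.sslTrustStore</name>
	<value>/path/to/truststore.jks</value>
</property>
<property>
	<name>fq.sql.sslTrustStorePassword</name>
	<value>passw0rd</value>
</property>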

Hadoop remote execution settings

  • The fq.hadoop.remote.user parameter is used when running data movement from a system other than the Hadoop cluster. It specifies the remote Hadoop user name that runs the data movement. This user must have access to HDFS.
    <property>
    	<name>fq.hadoop.remote.user</name>
    	<value></value>
    </property>
  • The fq.hadoop.client.configs parameter is used when running data movement from a system other than the Hadoop cluster. It specifies the path to the downloaded client configuration files.
    <property>
    	<name>fq.hadoop.client.configs</name>
    	<value></value>
    </property>
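When running data movement from a system outside the Hadoop cluster, both properties are typically set together, for example (the user name and configuration path are placeholders):
<property>
	<name>fq.hadoop.remote.user</name>
	<value>hdfs</value>
</property>
<property>
	<name>fq.hadoop.client.configs</name>
	<value>/path/to/client/configs</value>
</property>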

JDBC Kerberos properties

  • The fq.kerberos.principal property sets the Kerberos principal name. It is required if Kerberos authentication is used.
    <property>
    	<name>fq.kerberos.principal</name>
    	<value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value>
    </property>
  • The fq.kerberos.keytab property sets the Kerberos keytab file. It is required if Kerberos authentication is used.
    <property>
    	<name>fq.kerberos.keytab</name>
    	<value>/path/to/your/client.keytab</value>
    </property>
  • The fq.sql.spn property sets the Kerberos service principal name. It is required if you want to use Kerberos authentication.
    <property>
    	<name>fq.sql.spn</name>
    	<value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value>
    </property>
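For example, to authenticate with Kerberos instead of a user name and password, the Kerberos properties can be set together as follows (the principal names and keytab path are placeholders for your realm):
<property>
	<name>fq.kerberos.principal</name>
	<value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value>
</property>
<property>
	<name>fq.kerberos.keytab</name>
	<value>/path/to/your/client.keytab</value>
</property>
<property>
	<name>fq.sql.spn</name>
	<value>hive/horton-master.ibm.com@REALM.IBM.COM</value>
</property>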

Additional settings

  • The fq.db2.lock.table property ensures data consistency between the source and destination tables. When it is set to true, no user can change (insert into or update) the source table during data movement. When it is set to false, the lock is disabled. fq.db2.lock.table is set to true by default.
    <property>
    	<name>fq.db2.lock.table</name>
    	<value>true</value>
    </property>
Note: When using BigSQL, you might need to add the following property to the fq-import-remote-conf.xml file:
<property>
	<name>fq.sql.bigsql.v3compatibility</name>
	<value>true</value>
</property>
The property solves the problem with mapping a NUMERIC field, as described in Limitations on decimal values in BigInsights.