Data import configuration XML
Learn how to customize the XML file that configures importing data from the data warehouse to Hadoop.
General configuration
- The fq.command property sets the type of data movement: import (warehouse → Hadoop) or export (Hadoop → warehouse).
<property> <name>fq.command</name> <value>import</value> </property>
HDFS information
- The fq.output.path property sets the directory on HDFS where the transferred data is stored.
<property> <name>fq.output.path</name> <value>/dbbackup/backup1</value> </property>
- The fq.format.fielddelim property sets the ASCII value of the single-character field delimiter in the plain text output file.
<property> <name>fq.format.fielddelim</name> <value>124</value> </property>
Note: It is not recommended to set 32 (space) as the value of this property.
- The fq.format.null property defines how the NULL value is represented. The default is NULL. The value cannot be longer than 4 characters.
<property> <name>fq.format.null</name> <value>NULL</value> </property>
- The fq.fs.temp property sets the location of temporary files (such as logs and status files) on HDFS.
<property> <name>fq.fs.temp</name> <value>/tmp</value> </property>
- The fq.hive.schema property sets the target schema name in Hive under which all imported tables are created. If the schema does not exist, it is automatically created. If the property is not set, the default schema is used.
<property> <name>fq.hive.schema</name> <value></value> </property>
- The fq.hive.tablename property is optional. It defines the output table name in Hive. If not provided, the source table name is used. Use this property especially when importing a table with special characters in its name to a Hive version lower than 0.13.0.
<property> <name>fq.hive.tablename</name> <value>Test</value> </property>
Compression options
- The fq.compress option must not be set, as these compression modes are currently not supported.
- The fq.output.compressed property defines whether the transferred data is stored on Hadoop in compressed format. This property is used with fq.data.format for Hadoop-specific formats. Depending on the format that you use, select one of the following values:
  - PARQUET: Snappy, gzip, uncompressed
  - ORC: NONE, ZLIB, SNAPPY
  - RCFILE: The value has to contain the exact class name of the codec which is available on the Hadoop system, for example: org.apache.hadoop.io.compress.SnappyCodec
  - AVRO: snappy, deflate
  - SEQUENCEFILE: The value has to contain the exact class name of the codec which is available on the Hadoop system, for example: org.apache.hadoop.io.compress.SnappyCodec
  Set the value to false or leave it empty to use the default compression type that is specified on Hadoop for the selected format.
<property> <name>fq.output.compressed</name> <value>false</value> </property>
- The fq.data.format property defines in which format the transferred data is stored on Hadoop. Supported values for this property are:
  - TXT - for data warehouse strict formats.
  - PARQUET, ORC, RCFILE, AVRO, SEQUENCEFILE - for Hadoop-specific formats.
<property> <name>fq.data.format</name> <value>TXT</value> </property>
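As an illustrative sketch, the format and compression properties combine as follows to store imported data as Snappy-compressed Parquet files (the values shown are examples, not defaults):

```xml
<!-- Illustrative sketch: store imported data as Parquet, compressed with Snappy -->
<property> <name>fq.data.format</name> <value>PARQUET</value> </property>
<property> <name>fq.output.compressed</name> <value>Snappy</value> </property>
```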
Warehouse information
- The fq.pda.db property sets the data warehouse database name. Include double quotations around delimited database names.
<property> <name>fq.pda.db</name> <value>bludb</value> </property>
- The fq.tables property provides a comma-separated list of warehouse tables that are to be imported. Include double quotations around delimited table names. The format of this value is <SCHEMA>.<table>. You can use an asterisk (*) character in the fq.tables property as a filter, for a combination of multiple schemas and tables, for example:
  - S1.T1 - imports table T1 in schema S1
  - S1.* - imports all tables from schema S1
  - *.T1 - imports all tables T1 from all schemas
  - * - imports all tables from the current schema for the user that is currently logged in
  Note: You cannot use the asterisk as a wildcard for a part of a table name, for example S1.a*, meaning "all tables in schema S1, with name starting with a". The system cannot interpret such a filter, and a* is interpreted as the exact table name.
<property> <name>fq.tables</name> <value>ADMIN.tab</value> </property>
- The fq.pda.server property sets the wall IP address or the fully qualified host name of the warehouse server.
<property> <name>fq.pda.server</name> <value>hostname.ibm.com</value> </property>
- The fq.pda.port property sets the port number for the warehouse database instance (NZ_DBMS_PORT).
<property> <name>fq.pda.port</name> <value>50000</value> </property>
- The fq.pda.user property sets the warehouse database user account name for access to the database.
<property> <name>fq.pda.user</name> <value>bluadmin</value> </property>
- The fq.pda.password property sets the password for the warehouse database user account.
<property> <name>fq.pda.password</name> <value>password</value> </property>
- The fq.pda.ssl.cacertificate property sets the full path to the CA certificate file that is stored on HDFS and used to authenticate connections. Used only when the SSL flag is true. If not provided, all connections are accepted.
<property> <name>fq.pda.ssl.cacertificate</name> <value></value> </property>
- The fq.pda.where property specifies the SQL WHERE clause that is used for selecting the data to transfer. This property only works for import, not export.
<property> <name>fq.pda.where</name> <value></value> </property>
- The fq.splits property sets the number of concurrent JDBC load sessions to the warehouse host.
<property> <name>fq.splits</name> <value>12</value> </property>
- The fq.custom.exttab.options property is an optional parameter for advanced users. It allows you to provide user-defined options to external tables. All possible external table options and values are listed in CREATE EXTERNAL TABLE. You can provide multiple pairs of external table option and value, separated by a space. Note: The options provided with the fq.custom.exttab.options parameter overwrite the corresponding ones that might exist in the Fast Data Movement code.
<property> <name>fq.custom.exttab.options</name> <value>MaxErrors 1 SocketBufSize 8000000</value> </property>
- The fq.break.on.error property, when set to true, stops import execution in case of an error during the import of more than one table or a whole database. When the import is finished, information about the transferred tables is shown.
<property> <name>fq.break.on.error</name> <value>false</value> </property>
- The fq.append.mode property specifies how data is transferred to the specified HDFS location. The property takes the following values:
  - create - During import, a new table is created in the target database. If a table with the same name already exists, the import fails.
  - overwrite - During import, a table from the source database is imported into the target database, and if any table with the same name already existed in the target database, it is replaced (overwritten).
  - append - The data from the source table is imported into the target table and appended to the already existing records. That is, if a table with the same name already exists in the target database, all data from the source table is added (appended) to the existing table.
  The default value for this property is create.
<property> <name>fq.append.mode</name> <value>overwrite</value> </property>
- The fq.sql.metadata property defines whether IBM® Fast Data Movement creates the table in Hive during import. By default, it is set to true. If you set the property to false, the table is not created in Hive and IBM Fast Data Movement only imports the data files and puts them on HDFS.
<property> <name>fq.sql.metadata</name> <value>true</value> </property>
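Taken together, the connection and selection properties in this section can be sketched as a minimal import configuration fragment (the host name and credentials below are illustrative placeholders, not values from this document):

```xml
<!-- Illustrative sketch: minimal warehouse connection and selection properties -->
<property> <name>fq.command</name> <value>import</value> </property>
<property> <name>fq.pda.db</name> <value>bludb</value> </property>
<property> <name>fq.pda.server</name> <value>hostname.example.com</value> </property>
<property> <name>fq.pda.port</name> <value>50000</value> </property>
<property> <name>fq.pda.user</name> <value>bluadmin</value> </property>
<property> <name>fq.pda.password</name> <value>password</value> </property>
<property> <name>fq.tables</name> <value>S1.*</value> </property>
<property> <name>fq.output.path</name> <value>/dbbackup/backup1</value> </property>
```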
Hive options
- The fq.hive.usevarchar property determines the rules of conversion for VARCHAR columns. The VARCHAR type is supported by Hive version 0.12 or later. If the property is set to true, the data type is VARCHAR after import. If false, the data type is converted to STRING. The default value is false because there are a number of limitations when VARCHAR is used, as described in Limitations on VARCHAR data types.
<property> <name>fq.hive.usevarchar</name> <value>false</value> </property>
- The fq.hive.usedate property determines the rules of conversion for DATE columns. For Hive versions 0.12 and above, DATE can be mapped to Hive DATE instead of STRING when the parameter is set to true. By default, the value is set to false.
<property> <name>fq.hive.usedate</name> <value>false</value> </property>
- The fq.hive.clustered.by parameter lists the table column(s) that will be used for clustering. To enable default clustering, keep this property empty and set fq.hive.clustered.buckets.
<property> <name>fq.hive.clustered.by</name> <value></value> </property>
- The fq.hive.clustered.buckets parameter provides an integer value which determines the number of buckets that can be created during clustering. This property must be set to enable clustering.
<property> <name>fq.hive.clustered.buckets</name> <value></value> </property>
- The fq.hive.partitioned.by parameter lists the columns in a table that will be used for partitioning.
<property> <name>fq.hive.partitioned.by</name> <value></value> </property>
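As a sketch of how the three Hive layout properties combine, the fragment below clusters an imported table into 16 buckets by one column and partitions it by another (the column names c1 and dt are placeholders):

```xml
<!-- Illustrative sketch: cluster imported rows into 16 buckets by column c1
     and partition the Hive table by column dt (both names are placeholders) -->
<property> <name>fq.hive.clustered.by</name> <value>c1</value> </property>
<property> <name>fq.hive.clustered.buckets</name> <value>16</value> </property>
<property> <name>fq.hive.partitioned.by</name> <value>dt</value> </property>
```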
Hadoop JDBC connection (remote connection) properties
- The fq.sql.server property sets the Hive server address on Hadoop where the imported table is created.
<property> <name>fq.sql.server</name> <value>rack1-master</value> </property>
- The fq.sql.port property sets the Hive server port number on Hadoop.
<property> <name>fq.sql.port</name> <value>10000</value> </property>
- The fq.sql.type property sets the server type. Supported types are hive2 or bigsql.
<property> <name>fq.sql.type</name> <value>hive2</value> </property>
Note: Use this property for remote import, not export.
JDBC authentication properties
Note: You must provide values for either user and password or for the Kerberos service principal name.
- The fq.sql.user property sets the user name. It is required if you want to use User/Password authentication.
<property> <name>fq.sql.user</name> <value>biadmin</value> </property>
- The fq.sql.password property sets the password. It is required if a user name was provided.
<property> <name>fq.sql.password</name> <value>passw0rd</value> </property>
JDBC SSL properties
- The fq.sql.ssl property defines whether SSL is required to connect to the selected Hadoop SQL server. The value can be true or false.
<property> <name>fq.sql.ssl</name> <value>false</value> </property>
- The fq.sql.sslTrustStore property sets the path to the SSL trustStore that is to be used.
<property> <name>fq.sql.sslTrustStore</name> <value>$HIVE_HOME/src/data/files/cacerts_test.jks</value> </property>
- The fq.sql.sslTrustStorePassword property sets the password to the specified SSL trustStore.
<property> <name>fq.sql.sslTrustStorePassword</name> <value>passw0rd</value> </property>
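A minimal sketch of the SSL-related JDBC properties, assuming an illustrative trustStore path and password:

```xml
<!-- Illustrative sketch: require SSL for the JDBC connection to the Hadoop SQL server -->
<property> <name>fq.sql.ssl</name> <value>true</value> </property>
<property> <name>fq.sql.sslTrustStore</name> <value>/path/to/truststore.jks</value> </property>
<property> <name>fq.sql.sslTrustStorePassword</name> <value>changeit</value> </property>
```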
Hadoop remote execution settings
- The fq.hadoop.remote.user parameter is used when running data movement from a system different than Hadoop. It specifies the remote Hadoop user name that will run data movement. This user should have access to HDFS.
<property> <name>fq.hadoop.remote.user</name> <value></value> </property>
- The fq.hadoop.client.configs parameter is used when running data movement from a system different than Hadoop. It specifies the path to the downloaded client configuration files.
<property> <name>fq.hadoop.client.configs</name> <value></value> </property>
JDBC Kerberos properties
- The fq.kerberos.principal property sets the Kerberos principal name. Required if Kerberos authentication should be used.
<property> <name>fq.kerberos.principal</name> <value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value> </property>
- The fq.kerberos.keytab property sets the Kerberos keytab. Required if
Kerberos authentication should be used.
<property> <name>fq.kerberos.keytab</name> <value>/path/to/your/client.keytab</value> </property>
- The fq.sql.spn property sets the Kerberos service principal name. It is required if you want to use Kerberos authentication.
<property> <name>fq.sql.spn</name> <value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value> </property>
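Taken together, the Kerberos properties above can be sketched as follows (the principal, keytab path, and service principal name are illustrative placeholders):

```xml
<!-- Illustrative sketch: Kerberos authentication for the JDBC connection -->
<property> <name>fq.kerberos.principal</name> <value>hdfs/host.example.com@EXAMPLE.COM</value> </property>
<property> <name>fq.kerberos.keytab</name> <value>/path/to/client.keytab</value> </property>
<property> <name>fq.sql.spn</name> <value>hive/host.example.com@EXAMPLE.COM</value> </property>
```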
Additional settings
- The fq.db2.lock.table property provides assurance of data consistency between the source and destination tables. When set to true, no user can change (insert or update) the source table during data movement. When set to false, the lock is disabled. fq.db2.lock.table is set to true by default.
<property> <name>fq.db2.lock.table</name> <value>true</value> </property>
Note: When using BigSQL, you might need to add the following property to the fq-import-remote-conf.xml file:
<property>
<name>fq.sql.bigsql.v3compatibility</name>
<value>true</value>
</property>
The property solves the problem with mapping a NUMERIC field, as described in Limitations on decimal values in BigInsights.
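Pulling the most common properties together, a minimal import configuration file might be sketched as follows. This is an illustrative sketch only: all values are placeholders, the Hadoop-style <configuration> root element is an assumption, and the exact set of required properties depends on your installation.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch of a minimal import configuration; all values are placeholders -->
<configuration>
  <property> <name>fq.command</name> <value>import</value> </property>
  <property> <name>fq.pda.db</name> <value>bludb</value> </property>
  <property> <name>fq.pda.server</name> <value>hostname.example.com</value> </property>
  <property> <name>fq.pda.port</name> <value>50000</value> </property>
  <property> <name>fq.pda.user</name> <value>bluadmin</value> </property>
  <property> <name>fq.pda.password</name> <value>password</value> </property>
  <property> <name>fq.tables</name> <value>ADMIN.tab</value> </property>
  <property> <name>fq.output.path</name> <value>/dbbackup/backup1</value> </property>
  <property> <name>fq.data.format</name> <value>TXT</value> </property>
</configuration>
```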