Data export configuration XML
Learn how to customize the XML file that configures exporting data from Hadoop to NPS®.
General configuration
- The nz.fq.command property sets the type of data movement: import (NPS->Hadoop) or export (Hadoop->NPS).
<property> <name>nz.fq.command</name> <value>export</value> </property>
HDFS information
- The nz.fq.input.path parameter sets the directory on HDFS where the retrieved data is stored.
<property> <name>nz.fq.input.path</name> <value>/nzbackup/fqtest1</value> </property>
Note: The nz.fq.input.path and nz.fq.hive.tablename properties are mutually exclusive, that is, only one of them can be used to define the location of the data that you want to export. nz.fq.input.path provides more flexibility, because nz.fq.hive.tablename allows you to specify only one table, not multiple tables.
- The nz.fq.format.fielddelim parameter sets the integer value of the single-character field delimiter in the plain text output file.
<property> <name>nz.fq.format.fielddelim</name> <value>124</value> </property>
- The nz.fq.format.null property defines how the NULL value is represented. The default is NULL. The value cannot be longer than 4 characters.
<property> <name>nz.fq.format.null</name> <value>NULL</value> </property>
- The nz.fq.fs.temp parameter sets the location of temporary files (such as logs and status files) on HDFS.
<property> <name>nz.fq.fs.temp</name> <value>/tmp</value> </property>
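Taken together, the general and HDFS properties above form the skeleton of an export configuration. The following is a minimal sketch, assuming the properties are wrapped in a standard Hadoop-style <configuration> root element (the root element and all values are illustrative, not mandated by this article):

```xml
<!-- Minimal export skeleton: direction, HDFS source path, format, temp area.
     The <configuration> root and the sample values are assumptions. -->
<configuration>
  <!-- Hadoop -> NPS data movement -->
  <property> <name>nz.fq.command</name> <value>export</value> </property>
  <!-- HDFS directory that holds the data to export -->
  <property> <name>nz.fq.input.path</name> <value>/nzbackup/fqtest1</value> </property>
  <!-- pipe character (ASCII 124) as the field delimiter -->
  <property> <name>nz.fq.format.fielddelim</name> <value>124</value> </property>
  <!-- HDFS location for logs and status files -->
  <property> <name>nz.fq.fs.temp</name> <value>/tmp</value> </property>
</configuration>
```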
NPS information
- The nz.fq.nps.db property sets the NPS database name. Include double
quotations around delimited database
names.
<property> <name>nz.fq.nps.db</name> <value>dev</value> </property>
- The nz.fq.table parameter can be used to rename a specific table during export so that the exported table on NPS has a different name than the source table on Hadoop. Include double quotations around delimited table names. The format of this value is <SCHEMA>.<table>.
<property> <name>nz.fq.table</name> <value>ADMIN.tab</value> </property>
Restriction: If full schema support is enabled on your NPS system, make sure to provide the schema name together with the table name in this property.
- The nz.fq.tables property provides a list of target tables for the exported data, and also allows you to filter these tables with the wildcard character *. The use of the * filter is described in detail in Importing or exporting multiple tables at a time. You must include double quotations around delimited table names.
- The nz.fq.append.mode property defines how data is transferred to the target database. The property takes the following values:
- create: A new table is created in the target database. If a table with the same name already exists, the transfer fails.
- overwrite: The source table is transferred into the target database, and if a table with the same name already existed in the target database, it is replaced (overwritten).
- append: The data from the source table is transferred into the target table and appended to the already existing records. That is, if a table with the same name already exists in the target database, all data from the source table is added (appended) to the existing table.
The default value for this property is create.
<property> <name>nz.fq.append.mode</name> <value>create</value> </property>
- The nz.fq.nps.server property sets the wall IP address or the fully qualified host name of the NPS server.
<property> <name>nz.fq.nps.server</name> <value>hostname.ibm.com</value> </property>
- The nz.fq.nps.port property sets the port number of the NPS database instance (NZ_DBMS_PORT).
<property> <name>nz.fq.nps.port</name> <value>5480</value> </property>
- The nz.fq.nps.user property sets the NPS database user account name for access to the database.
<property> <name>nz.fq.nps.user</name> <value>admin</value> </property>
- The nz.fq.nps.password property sets the password for the NPS database user account.
<property> <name>nz.fq.nps.password</name> <value>password</value> </property>
- The nz.fq.nps.ssl property sets the NPS server connection type. When set to true, only secured JDBC mode is used. The default is false.
<property> <name>nz.fq.nps.ssl</name> <value>false</value> </property>
- The nz.fq.nps.ssl.cacertificate property sets the full path to the CA certificate file that is stored on HDFS and used to authenticate connections. It is used only when the SSL flag is true. If it is not provided, all connections are accepted.
<property> <name>nz.fq.nps.ssl.cacertificate</name> <value></value> </property>
- The nz.fq.exttab.columns parameter sets the external table column names in the proper order, for example CODE,TITLE,PRICE. The default value is *. Detailed syntax is described in "Transient External Tables" in the IBM Netezza Data Loading Guide.
<property> <name>nz.fq.exttab.columns</name> <value>*</value> </property>
- The nz.fq.exttab.schema parameter sets the external table schema definition, for example CODE CHAR(5),TITLE VARCHAR(255),PRICE INTEGER. The default value is the empty string, which uses the schema of the target table. Detailed syntax for transient external tables is described in the IBM Netezza Data Loading Guide.
<property> <name>nz.fq.exttab.schema</name> <value></value> </property>
- The fq.custom.exttab.options parameter is optional and intended for advanced users. It allows you to provide user-defined options to external tables. All possible external table options and values are listed in the IBM PureData® System for Analytics Knowledge Center at https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_options.html. You can provide multiple pairs of an external table option and its value, separated by a space.
Note: The options provided with the fq.custom.exttab.options parameter overwrite the corresponding ones that might exist in IBM Fluid Query code.
<property> <name>fq.custom.exttab.options</name> <value>MaxErrors 1 SocketBufSize 8000000</value> </property>
- The nz.fq.splits property sets the number of concurrent JDBC load sessions to the NPS host.
<property> <name>nz.fq.splits</name> <value>12</value> </property>
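The NPS-side properties combine into a connection-and-target block such as the following sketch (the host name, database, table, and credentials are placeholders taken from the examples above; the <configuration> wrapper is an assumption):

```xml
<!-- NPS target and connection settings; all values are illustrative. -->
<configuration>
  <!-- target database and schema-qualified table on NPS -->
  <property> <name>nz.fq.nps.db</name> <value>dev</value> </property>
  <property> <name>nz.fq.table</name> <value>ADMIN.tab</value> </property>
  <!-- fail if the target table already exists -->
  <property> <name>nz.fq.append.mode</name> <value>create</value> </property>
  <!-- connection endpoint and credentials -->
  <property> <name>nz.fq.nps.server</name> <value>hostname.ibm.com</value> </property>
  <property> <name>nz.fq.nps.port</name> <value>5480</value> </property>
  <property> <name>nz.fq.nps.user</name> <value>admin</value> </property>
  <property> <name>nz.fq.nps.password</name> <value>password</value> </property>
  <!-- 12 concurrent JDBC load sessions -->
  <property> <name>nz.fq.splits</name> <value>12</value> </property>
</configuration>
```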
Checksum options
- The fq.checksum parameter indicates whether the checksum has to be
calculated after data movement. Available values:
- NONE - no checksum (default)
- FULL - calculate checksum using all columns in each table
- ROWCOUNT - only row count
- COLUMNS - calculate checksum using columns specified in fq.checksum.columns.
<property> <name>fq.checksum</name> <value>NONE</value> </property>
- The fq.checksum.columns parameter provides a list of columns for checksum calculation. When it is used with a multiple-table import or export, the checksum is calculated only on the listed columns within the tables. If no matching columns are found, only a row count check is performed.
<property> <name>fq.checksum.columns</name> <value></value> </property>
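For example, to verify a transfer with a checksum over two named columns, the two properties are set together. This is a sketch: the column names are illustrative, and a comma-separated list is an assumption, since this article does not state the separator:

```xml
<!-- checksum over the ID and PRICE columns only (hypothetical names);
     tables without matching columns fall back to a row count check -->
<property> <name>fq.checksum</name> <value>COLUMNS</value> </property>
<property> <name>fq.checksum.columns</name> <value>ID,PRICE</value> </property>
```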
Hadoop Hive SQL connection configuration
JDBC connection properties
- The nz.fq.sql.server property sets the Hive server address on Hadoop where
the imported table is
created.
<property> <name>nz.fq.sql.server</name> <value>rack1-master</value> </property>
- The nz.fq.sql.port property sets the Hive server port number on Hadoop.
<property> <name>nz.fq.sql.port</name> <value>10000</value> </property>
- Import only: The nz.fq.sql.type property sets the server type. Supported types are hive2 and bigsql.
<property> <name>nz.fq.sql.type</name> <value>hive2</value> </property>
Note: Use this property for remote import, not export.
JDBC authentication properties
Note: You must provide values for either user and password or for the Kerberos service principal
name.
- The nz.fq.sql.user property sets the user name. It is required if you want to use user/password authentication.
<property> <name>nz.fq.sql.user</name> <value>biadmin</value> </property>
- The nz.fq.sql.password property sets the password. It is required if a user name was provided.
<property> <name>nz.fq.sql.password</name> <value>passw0rd</value> </property>
JDBC SSL properties
- The nz.fq.sql.ssl property defines whether SSL is required to connect to
the selected Hadoop SQL server. Value can be true or
false.
<property> <name>nz.fq.sql.ssl</name> <value>false</value> </property>
- The nz.fq.sql.sslTrustStore property sets the path to the SSL trustStore that is to be used.
<property> <name>nz.fq.sql.sslTrustStore</name> <value>$HIVE_HOME/src/data/files/cacerts_test.jks</value> </property>
- The nz.fq.sql.sslTrustStorePassword property sets the password to the specified SSL trustStore.
<property> <name>nz.fq.sql.sslTrustStorePassword</name> <value>passw0rd</value> </property>
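An SSL-secured Hive connection combines the server, port, and trustStore settings, as in this sketch (the host name, trustStore path, and password are placeholders from the examples above):

```xml
<!-- require SSL on the Hive JDBC connection and point the client
     at a trustStore; all values here are illustrative -->
<property> <name>nz.fq.sql.server</name> <value>rack1-master</value> </property>
<property> <name>nz.fq.sql.port</name> <value>10000</value> </property>
<property> <name>nz.fq.sql.ssl</name> <value>true</value> </property>
<property> <name>nz.fq.sql.sslTrustStore</name> <value>$HIVE_HOME/src/data/files/cacerts_test.jks</value> </property>
<property> <name>nz.fq.sql.sslTrustStorePassword</name> <value>passw0rd</value> </property>
```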
JDBC Kerberos properties
- The nz.fq.kerberos.principal property sets the Kerberos principal name. It is required if Kerberos authentication is to be used.
<property> <name>nz.fq.kerberos.principal</name> <value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value> </property>
- The nz.fq.kerberos.keytab property sets the Kerberos keytab. It is required if Kerberos authentication is to be used.
<property> <name>nz.fq.kerberos.keytab</name> <value>/path/to/your/client.keytab</value> </property>
- The nz.fq.sql.spn property sets the Kerberos service principal name. It is required if you want to use Kerberos authentication.
<property> <name>nz.fq.sql.spn</name> <value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value> </property>
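Put together, a Kerberos-authenticated connection might look like the sketch below; the principals, realm, and keytab path are the placeholders from the examples above, not working values:

```xml
<!-- Kerberos instead of user/password: leave nz.fq.sql.user and
     nz.fq.sql.password unset and provide these three properties -->
<property> <name>nz.fq.kerberos.principal</name> <value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value> </property>
<property> <name>nz.fq.kerberos.keytab</name> <value>/path/to/your/client.keytab</value> </property>
<property> <name>nz.fq.sql.spn</name> <value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value> </property>
```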
Hadoop remote execution settings
- The nz.fq.hadoop.remote.user parameter is used when running data movement from a system other than Hadoop. It specifies the remote Hadoop user name that runs the data movement. This user must have access to HDFS.
<property> <name>nz.fq.hadoop.remote.user</name> <value></value> </property>
- The nz.fq.hadoop.client.configs parameter is used when running data movement from a system other than Hadoop. It specifies the path to the downloaded client configuration files.
<property> <name>nz.fq.hadoop.client.configs</name> <value></value> </property>
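When the export is launched from a machine other than the Hadoop cluster, the two remote-execution parameters are typically set together. A sketch, in which both the user name and the path are hypothetical values:

```xml
<!-- run the data movement as this HDFS-enabled user on the remote
     cluster; user name and path below are hypothetical examples -->
<property> <name>nz.fq.hadoop.remote.user</name> <value>hdfsuser</value> </property>
<!-- directory with the client configuration files downloaded from the cluster -->
<property> <name>nz.fq.hadoop.client.configs</name> <value>/opt/fluidquery/client-configs</value> </property>
```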
Additional settings
- The nz.fq.hive.tablename parameter provides the name of the source table on
Hive. When this value is provided, the nz.fq.input.path must be
empty.
<property> <name>nz.fq.hive.tablename</name> <value></value> </property>
- The nz.fq.hive.set.distribute.on parameter specifies the data distribution key by column name(s). It can be used when exporting a Hive table. The default is the empty string, which means that data is distributed randomly.
<property> <name>nz.fq.hive.set.distribute.on</name> <value></value> </property>
- The fq.hive.where parameter can be used to define an SQL WHERE clause to use for selecting the data to transfer during export of a Hive table. This parameter works only when nz.fq.hive.tablename is set.
<property> <name>fq.hive.where</name> <value>name like 'K%'</value> </property>
- The fq.hive.use.where.in.length.discovery parameter specifies whether the string lengths in the schema should be determined with the fq.hive.where clause applied when exporting a Hive table for the first time. The parameter is set to false by default, that is, string length is calculated over all records in the Hive table during schema preparation, not restricted by the WHERE clause. When it is set to true, only the records selected by the WHERE clause are taken into consideration for a given table. Note that if you later change the WHERE clause to include longer records and export the data to the same Netezza table again, the export fails.
<property> <name>fq.hive.use.where.in.length.discovery</name> <value>false</value> </property>
- The nz.fq.skip.table.structure.validation parameter is used when exporting a Hive table to an already existing destination table in the warehouse. When this value is set to true, the validation between the structures of the source Hive table and the destination warehouse table is omitted. The default value is false.
<property> <name>nz.fq.skip.table.structure.validation</name> <value>false</value> </property>
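As a closing illustration, the sketch below combines the Hive-specific settings with the connection properties into a filtered export of a single Hive table. All table names, hosts, and credentials are placeholders, and the <configuration> root element is an assumption:

```xml
<!-- Filtered export of one Hive table; every value is illustrative. -->
<configuration>
  <property> <name>nz.fq.command</name> <value>export</value> </property>
  <!-- export one Hive table; nz.fq.input.path must stay empty -->
  <property> <name>nz.fq.hive.tablename</name> <value>sales</value> </property>
  <!-- only rows matching the WHERE clause are transferred -->
  <property> <name>fq.hive.where</name> <value>name like 'K%'</value> </property>
  <!-- Hive JDBC endpoint and credentials -->
  <property> <name>nz.fq.sql.server</name> <value>rack1-master</value> </property>
  <property> <name>nz.fq.sql.port</name> <value>10000</value> </property>
  <property> <name>nz.fq.sql.user</name> <value>biadmin</value> </property>
  <property> <name>nz.fq.sql.password</name> <value>passw0rd</value> </property>
  <!-- NPS target: database, schema-qualified table, connection -->
  <property> <name>nz.fq.nps.db</name> <value>dev</value> </property>
  <property> <name>nz.fq.table</name> <value>ADMIN.sales</value> </property>
  <property> <name>nz.fq.nps.server</name> <value>hostname.ibm.com</value> </property>
  <property> <name>nz.fq.nps.port</name> <value>5480</value> </property>
  <property> <name>nz.fq.nps.user</name> <value>admin</value> </property>
  <property> <name>nz.fq.nps.password</name> <value>password</value> </property>
</configuration>
```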