Data export configuration XML

Learn how to customize the XML file that configures exporting data from Hadoop to NPS®.

General configuration

  • The nz.fq.command property sets the direction of data movement: import (NPS->Hadoop) or export (Hadoop->NPS).
    <property>
    	<name>nz.fq.command</name>
    	<value>export</value>
    </property>
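The individual properties that are described in the following sections are typically combined in a single file. A minimal end-to-end sketch of an export configuration, assuming a standard Hadoop-style <configuration> root element (the paths, host name, and table names are illustrative values taken from the examples in this topic, not defaults):

```xml
<configuration>
	<!-- direction of data movement: export (Hadoop->NPS) -->
	<property>
		<name>nz.fq.command</name>
		<value>export</value>
	</property>
	<!-- HDFS directory that holds the data to export -->
	<property>
		<name>nz.fq.input.path</name>
		<value>/nzbackup/fqtest1</value>
	</property>
	<!-- target NPS host and database -->
	<property>
		<name>nz.fq.nps.server</name>
		<value>hostname.ibm.com</value>
	</property>
	<property>
		<name>nz.fq.nps.db</name>
		<value>dev</value>
	</property>
	<!-- target table, in <SCHEMA>.<table> format -->
	<property>
		<name>nz.fq.table</name>
		<value>ADMIN.tab</value>
	</property>
</configuration>
```

The remaining sections describe each property group in detail.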

HDFS information

  • The nz.fq.input.path parameter sets the directory on HDFS where the retrieved data is stored.
    <property>
    	<name>nz.fq.input.path</name>
    	<value>/nzbackup/fqtest1</value>
    </property>
    Note: The properties nz.fq.input.path and nz.fq.hive.tablename are mutually exclusive; only one of them can be used to define the location of the data that you want to export. nz.fq.input.path offers more flexibility, because nz.fq.hive.tablename lets you specify only one table, not multiple tables.
  • The nz.fq.format.fielddelim parameter sets the field delimiter for the plain text output file, specified as the integer value of a single character (for example, 124 is the ASCII code for the pipe character |).
    <property>
    	<name>nz.fq.format.fielddelim</name>
    	<value>124</value>
    </property>
  • The nz.fq.format.null property defines how NULL values are represented. The default is NULL. The value cannot be longer than 4 characters.
    <property>
    	<name>nz.fq.format.null</name>
    	<value>NULL</value>
    </property>
  • The nz.fq.fs.temp parameter sets the location of temporary files (such as logs and status files) on HDFS.
    <property>
    	<name>nz.fq.fs.temp</name>
    	<value>/tmp</value>
    </property>

NPS information

  • The nz.fq.nps.db property sets the NPS database name. Enclose delimited database names in double quotation marks.
    <property>
    	<name>nz.fq.nps.db</name>
    	<value>dev</value>
    </property>
  • The nz.fq.table parameter can be used to rename a specific table during export so that the exported table on NPS has a different name than the source table on Hadoop. Enclose delimited table names in double quotation marks. The format of this value is <SCHEMA>.<table>.
    <property>
    	<name>nz.fq.table</name>
    	<value>ADMIN.tab</value>
    </property>
    Restriction: If full schema support is enabled on your NPS system, make sure to provide the schema name together with the table name in this property.
  • The nz.fq.tables property provides a list of target tables for the exported data, and also lets you filter these tables by using the * wildcard character. The use of the * filter is described in detail in Importing or exporting multiple tables at a time. Enclose delimited table names in double quotation marks.
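For consistency with the other properties, a sketch of how nz.fq.tables might be set. The value below is illustrative, not a default: under the assumption that the * wildcard works as described in Importing or exporting multiple tables at a time, this pattern would match all tables in the ADMIN schema whose names start with sal.

```xml
<property>
	<name>nz.fq.tables</name>
	<value>ADMIN.sal*</value>
</property>
```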
  • The nz.fq.append.mode property defines how data is written to the target database. The property takes the following values:
    • create - a new table is created in the target database. If a table with the same name already exists, the operation fails.
    • overwrite - the table from the source database is transferred into the target database; if a table with the same name already exists in the target database, it is replaced (overwritten).
    • append - the data from the source table is appended to the records that already exist in the target table. That is, if a table with the same name already exists in the target database, all data from the source table is added (appended) to the existing table.
    For more information on append modes, read Append modes for import and export.
    The default value for this property is create.
    <property>
    	<name>nz.fq.append.mode</name>
    	<value>create</value>
    </property>
  • The nz.fq.nps.server property sets the wall IP address or the fully qualified host name of the NPS server.
    <property>
    	<name>nz.fq.nps.server</name>
    	<value>hostname.ibm.com</value>
    </property>
  • The nz.fq.nps.port property sets the port number of the NPS database instance (NZ_DBMS_PORT).
    <property>
    	<name>nz.fq.nps.port</name>
    	<value>5480</value>
    </property>
  • The nz.fq.nps.user property sets the NPS database user account name for access to the database.
    <property>
    	<name>nz.fq.nps.user</name>
    	<value>admin</value>
    </property>
  • The nz.fq.nps.password property sets the password for the NPS database user account.
    <property>
    	<name>nz.fq.nps.password</name>
    	<value>password</value>
    </property>
  • The nz.fq.nps.ssl property sets the NPS server connection type. When set to true, only the secured (SSL) JDBC mode is used. The default is false.
    <property>
    	<name>nz.fq.nps.ssl</name>
    	<value>false</value>
    </property>
  • The nz.fq.nps.ssl.cacertificate property sets the full path to the CA certificate file that is stored on HDFS and used to authenticate connections. It is used only when nz.fq.nps.ssl is set to true. If not provided, all connections are accepted.
    <property>
    	<name>nz.fq.nps.ssl.cacertificate</name>
    	<value></value>
    </property>
  • The nz.fq.exttab.columns parameter sets the external table column names in the proper order, for example CODE, TITLE, PRICE. The default value is *. Detailed syntax is described in "Transient External Tables" in the IBM Netezza Data Loading Guide.
    <property>
    	<name>nz.fq.exttab.columns</name>
    	<value>*</value>
    </property>
  • The nz.fq.exttab.schema parameter sets the external table schema definition, for example CODE CHAR(5), TITLE VARCHAR(255), PRICE INTEGER.
    <property>
    	<name>nz.fq.exttab.schema</name>
    	<value></value>
    </property>
    The default value is the empty string, which uses the schema of the target table. Detailed syntax for transient external tables is described in the IBM Netezza Data Loading Guide.
  • The fq.custom.exttab.options is an optional parameter for advanced users. It lets you provide user-defined options to external tables. All possible external table options and values are listed in the IBM PureData® System for Analytics Knowledge Center at https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_options.html. You can provide multiple option-value pairs, separated by spaces.
    Note: The options provided with the fq.custom.exttab.options parameter override the corresponding options that might be set in IBM Fluid Query code.
    <property>
    	<name>fq.custom.exttab.options</name>
    	<value>MaxErrors 1 SocketBufSize 8000000</value>
    </property>
  • The nz.fq.splits property sets the number of concurrent JDBC load sessions to the NPS host.
    <property>
    	<name>nz.fq.splits</name>
    	<value>12</value>
    </property>

Checksum options

  • The fq.checksum parameter indicates whether a checksum is calculated after data movement. Available values:
    • NONE - no checksum (default)
    • FULL - calculate the checksum using all columns in each table
    • ROWCOUNT - only a row count
    • COLUMNS - calculate the checksum using the columns specified in fq.checksum.columns
    <property>
    	<name>fq.checksum</name>
    	<value>NONE</value>
    </property>
  • The fq.checksum.columns parameter provides a list of columns for checksum calculation. When used with a multiple-table import or export, the checksum is calculated only on the listed columns within each table. If no matching columns are found, only a row count check is performed.
    <property>
    	<name>fq.checksum.columns</name>
    	<value></value>
    </property>
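A sketch of how the two checksum properties might be combined to restrict the checksum to specific columns. The column names ID and NAME are illustrative, and the comma-separated list format is an assumption, not a documented default:

```xml
<!-- calculate the checksum only over the listed columns -->
<property>
	<name>fq.checksum</name>
	<value>COLUMNS</value>
</property>
<!-- illustrative column list; tables without these columns fall back to a row count check -->
<property>
	<name>fq.checksum.columns</name>
	<value>ID,NAME</value>
</property>
```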

Hadoop Hive SQL connection configuration

JDBC connection properties

  • The nz.fq.sql.server property sets the Hive server address on Hadoop where the imported table is created.
    <property>
    	<name>nz.fq.sql.server</name>
    	<value>rack1-master</value>
    </property>
  • The nz.fq.sql.port property sets the Hive server port number on Hadoop.
    <property>
    	<name>nz.fq.sql.port</name>
    	<value>10000</value>
    </property>
  • Import only: The nz.fq.sql.type property sets the server type. Supported types are hive2 or bigsql.
    <property>
    	<name>nz.fq.sql.type</name>
    	<value>hive2</value>
    </property>
    Note: Use this property for remote import, not export.

JDBC authentication properties

Note: You must provide values either for the user and password or for the Kerberos service principal name.
  • The nz.fq.sql.user property sets the user name. It is required if you want to use user/password authentication.
    <property>
    	<name>nz.fq.sql.user</name>
    	<value>biadmin</value>
    </property>
  • The nz.fq.sql.password property sets the password. It is required if a user name was provided.
    <property>
    	<name>nz.fq.sql.password</name>
    	<value>passw0rd</value>
    </property>

JDBC SSL properties

  • The nz.fq.sql.ssl property defines whether SSL is required to connect to the selected Hadoop SQL server. Value can be true or false.
    <property>
    	<name>nz.fq.sql.ssl</name>
    	<value>false</value>
    </property>
  • The nz.fq.sql.sslTrustStore property sets the path to the SSL trustStore that is to be used.
    <property>
    	<name>nz.fq.sql.sslTrustStore</name>
    	<value>$HIVE_HOME/src/data/files/cacerts_test.jks</value>
    </property>
  • The nz.fq.sql.sslTrustStorePassword property sets the password to the specified SSL trustStore.
    <property>
    	<name>nz.fq.sql.sslTrustStorePassword</name>
    	<value>passw0rd</value>
    </property>

JDBC Kerberos properties

  • The nz.fq.kerberos.principal property sets the Kerberos principal name. It is required if Kerberos authentication is used.
    <property>
    	<name>nz.fq.kerberos.principal</name>
    	<value>hdfs/rack1-master.hadoop.ibm.com@REALM.IBM.COM</value>
    </property>
  • The nz.fq.kerberos.keytab property sets the Kerberos keytab file. It is required if Kerberos authentication is used.
    <property>
    	<name>nz.fq.kerberos.keytab</name>
    	<value>/path/to/your/client.keytab</value>
    </property>
  • The nz.fq.sql.spn property sets the Kerberos service principal name. It is required if you want to use Kerberos authentication.
    <property>
    	<name>nz.fq.sql.spn</name>
    	<value>hive/horton-master.ibm.com@XXXXXX.IBM.COM</value>
    </property>

Hadoop remote execution settings

  • The nz.fq.hadoop.remote.user parameter is used when you run data movement from a system other than the Hadoop cluster. It specifies the remote Hadoop user name that runs the data movement. This user must have access to HDFS.
    <property>
    	<name>nz.fq.hadoop.remote.user</name>
    	<value></value>
    </property>
  • The nz.fq.hadoop.client.configs parameter is used when you run data movement from a system other than the Hadoop cluster. It specifies the path to the downloaded client configuration files.
    <property>
    	<name>nz.fq.hadoop.client.configs</name>
    	<value></value>
    </property>

Additional settings

  • The nz.fq.hive.tablename parameter provides the name of the source table on Hive. When this value is provided, nz.fq.input.path must be empty.
    <property>
    	<name>nz.fq.hive.tablename</name>
    	<value></value>
    </property>
  • The nz.fq.hive.set.distribute.on parameter specifies the data distribution key by column name or names. It can be used when exporting a Hive table. The default is an empty string, which means that data is distributed randomly.
    <property>
    	<name>nz.fq.hive.set.distribute.on</name>
    	<value></value>
    </property>
  • The fq.hive.where parameter can be used to define an SQL WHERE clause that selects the data to transfer during export of a Hive table. This parameter works only when nz.fq.hive.tablename is set.
    <property>
    	<name>fq.hive.where</name>
    	<value>name like 'K%'</value>
    </property>
  • The fq.hive.use.where.in.length.discovery parameter specifies whether the fq.hive.where clause is applied when string lengths for the schema are calculated during the first export of a Hive table. The parameter is set to false by default: string length is calculated for all records in the Hive table during schema preparation and is not restricted by the WHERE clause. When set to true, only the records that match the WHERE clause are taken into consideration for a given table. Note that if you later change the WHERE clause to include longer records and export the data to the same Netezza table again, the export fails.
    <property>
    	<name>fq.hive.use.where.in.length.discovery</name>
    	<value>false</value>
    </property>
  • The nz.fq.skip.table.structure.validation parameter is used when you export a Hive table to an already existing destination table in the warehouse. When this value is set to true, validation of the structures of the source Hive table and the destination warehouse table is skipped. The default value is false.
    <property>
    	<name>nz.fq.skip.table.structure.validation</name>
    	<value>false</value>
    </property>