Data movement known issues and limitations

Refer to this topic for information on the limitations of IBM® Fluid Query data movement.

Cannot run data movement from NPS system with Kerberos authentication

If your NPS® system uses Kerberos authentication for its database user accounts, note that you cannot run the data movement feature from the NPS host to import or export files. As a alternative, you can run the data movement feature from the Hadoop service provider.

GPFS data movement between NPS and Hadoop

Running the import and export operations from NPS or any other systems is only possible when HDFS is used. If BigInsights uses GPFS, you can only initiate import and export operations from Hadoop.

Incremental data import when an update or deletion is discovered on NPS

Incremental import to Hadoop is stopped when there was an update or deletion in the table on NPS after the last import, as it leads to data inconsistency. In such situation the import stops with the following error message:

Some records have been deleted or updated after last import. Full import is required to 
have consistent data. To run incremental import anyway please set 
nz.fq.ignore.deleted to true.

You can use the nz.fq.ignore.deleted flag to ignore such error messages and import data anyway.

Import of some object types is not supported

The following object types are not supported by IBM Fluid Query because they contain unsupported, NPS-specific data types (special columns), such as rowid, datasliceid, createxid, and deletexid.

External table
Views
Mviews

Import of these objects will result in an error similar to the following one: ERROR: Column reference "DATASLICEID" not supported for external table.

Import of some data types is not supported

The following data types are not supported in this release of IBM Fluid Query:

varbinary
st_geometry

Import of a table that contains these column types will result in the following error: Can't recognize NPS data type

When importing a clustered base table (CBT), the organizing keys are not retained

When you import a CBT from NPS to Hadoop, the organizing keys are not preserved in the metadata. As a result, when you transfer the table back to NPS, it is exported as a regular table.

You can perform the following steps as a workaround for this limitation:

Import a CBT to Hadoop.
Before you export it, go to your NPS system and create a table with organizing keys, similar to the schema of the table that you imported to Hadoop.
Export only the data from the CBT that is located on Hadoop to the existing table on the NPS side. To do this, you must provide the path to the table directory on HDFS in the export configuration file.

Export of the `Interval` data type is not supported on older NPS systems

You cannot export a table that contains Interval data types to a Netezza system that runs NPS release older than 7.1.

For these older versions of NPS, you can still import a table containing Interval data types to Hadoop but it will be impossible to export such a table back to NPS later.

IBM Fluid Query does not support complex data types that occur in Hive

The following complex data types are not supported:

ARRAY
MAP
STRUCT
UNIONTYPE
BINARY

Cannot import tables with the same table name to the same HDFS directory

This problem might occur when importing a table from NPS with the same name to Hadoop. The HDFS directories are not case-sensitive so you cannot import tables with the same table name to the same HDFS directory even if the letter case of table names is different or tables exist in different databases. Both cases will cause data overwriting on HDFS.

To avoid overwriting of data for such tables, change the HDFS directory using the nz.fq.output.path parameter in the import XML configuration file. Do this even in the case when the tables are in different databases or their letter case is different. The following example explains this workaround in detail:

You have two tables with the same name but different letter case in the same database on NPS, for example tables test and TEST.

In the XML configuration file, set the property for the HDFS directory:

<property> 
	<name>nz.fq.output.path</name> 
	<value>/nzbackup/backup1</value> 
</property>

Run the import for table test.

Now, in the XML configuration file, set the property for the HDFS directory:

<property> 
	<name>nz.fq.output.path</name> 
	<value>/nzbackup/backup2</value> 
</property>

Run the import for table TEST.
If you follow this procedure, data is not overwritten and the tables are stored in different locations on HDFS.

Compatibility of Netezza driver with Java 1.6 and Cloudera

The data movement installation package is delivered with Netezza drivers supporting Java 1.6 or Java 1.7. You can find them by unpacking fluid-query-import-export.tar package. The nzjdbc3.jar driver supports Java 1.7, and nzjdbc3_java16.jar supports Java 1.6. When installing the data movement feature, the installer automatically recognizes the version of Java and Cloudera. When the installed version of Java is 1.6, then nzjdbc3_java16.jar is installed, otherwise nzjdbc3.jar is installed.

To replace the driver manually:

On your system, locate the IBM Fluid Query installation package (minimum version 1.7.0.0), or download the latest package from IBM Fix Central.
Go to the directory with the installation package.
Unpack nz-fluidquery-.tar.gz. Execute the command:
```
gunzip nz-fluidquery-VERSION.tar.gz
```
where VERSION is the version of IBM Fluid Query. For example:
```
gunzip /myfolder/nz-fluidquery-v1.7.tar.gz
```
File nz-fluidquery-VERSION.tar is created in the directory where the installation package exists.
Unpack nz-fluidquery-VERSION.tar by executing the command:
```
tar -xvf ./nz-fluidquery-VERSION.tar
```
For example:
```
tar -xvf ./nz-fluidquery-v1.7.tar
```
File fluid-query-import-export-VERSION.tar is created in the directory where the installation package exists.
Unpack fluid-query-import-export-VERSION.tar by executing the command:
```
tar -xvf ./fluid-query-import-export-VERSION.tar
```
For example:
```
tar -xvf ./fluid-query-import-export-v1.7.0.0.tar
```
Two files are created:
- nzjdbc3.jar
- nzjdbc_java16.jar
To manually install the Netezza driver supporting Java 1.7, copy nzjdbc3.jar to the data movement installation directory. Run the command:
```
cp ./nzjdbc3.jar /fluidquerylocal
```
To manually install the Netezza driver supporting Java 1.6, copy nzjdbc3_java16.jar to the data movement installation directory. Run the command:
```
cp ./nzjdbc3_java16.jar /fluidquerylocal/nzjdbc3.jar
```

Limitations on VARCHAR data types

By default, the parameter fq.hive.usevarchar is set to false, that is, the VARCHAR data type is changed to STRING on Hive. This is caused by a number of limitations when converting to VARCHAR on Hive, for example:

On Cloudera, the escape character ('\') is not processed correctly.
Export in binary mode does not process characters with codes 128-255 in varchar column.
In BigInsights 3 there is a problem with synchronizing Hive table metadata with BigSQL: Error Column "COL_CHAR", type "varchar(1)" is not supported

Handling a field delimiter and end of line character in data movement

Hive does not support new line and end of line characters within data. Records including these characters are separated, thus data is handled incorrectly. When querying any data format other than binary (for example text, Avro, Parquet), the new line and end of line characters cannot be handled correctly.

You can use compression and store data including these characters on Hadoop in Netezza compressed (binary) mode. Then, when querying data, newline and carriage return characters are replaced with a space (' '). This is performed by the nzetc tool on the fly, during decompression from the binary format. When you export data back to NPS, the data is not affected. The solution does not work for mixed mode of transfer.

Limitation on TIMESTAMP data type for BigSQL

Date range in BigSQL starts from 1400-01-01. When importing a table to BigSQL with a TIMESTAMP column that contains dates earlier than year 1400, BigSQL shows NULL value for these dates.

Limitation on DATE and TIMESTAMP containing BC (Before Christ)

Moving data with columns DATE or TIMESTAMP that contain BC (Before Christ), for example ('1400-01-13 BC','1400-01-13 08:13:33 BC'), might fail with error, as such date format is not supported:

Hive does not support BC format as described in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-date
NPS does not support unload for BC format. For more information see https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/r_load_datestyle.html

Errors when exporting a Hive table with name or column name starting with underscore

Exporting a table with name or column name starting with _ (underscore) is not supported. As a workaround, you can create a view based on this table, but with name or column name not starting with underscore.