Data movement known issues and limitations
Refer to this topic for information about possible limitations of data movement.
Import of some data types is not supported
- binary
- varbinary
- DBCLOB
- CLOB
- BLOB
- xml
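As an illustration only, the following sketch shows a Db2 Warehouse table whose columns use several of these types; the table and column names are hypothetical, and such columns cannot be imported.
  -- Hypothetical Db2 Warehouse table; the LOB and XML columns use data types
  -- that cannot be imported by IBM Fast Data Movement.
  CREATE TABLE doc_store (
    id       INTEGER,
    payload  BLOB(1M),   -- import not supported
    summary  CLOB(64K),  -- import not supported
    metadata XML         -- import not supported
  );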
IBM Fast Data Movement does not support complex data types that occur in Hive
- ARRAY
- MAP
- STRUCT
- UNIONTYPE
- BINARY
ERROR com.ibm.nz.fq.pda.ExternalTableExecutorThread - nzexception The length, precision, or scale attribute for column, distinct type, structured type, array type, attribute of structured type, routine, cast target type, type mapping, or global variable "GRAPHIC_DT" is not valid.. SQLCODE=-604, SQLSTATE=42611, DRIVER=4.23.42
com.ibm.db2.jcc.am.SqlSyntaxErrorException: The length, precision, or scale attribute for column, distinct type, structured type, array type, attribute of structured type, routine, cast target type, type mapping, or global variable "GRAPHIC_DT" is not valid.. SQLCODE=-604, SQLSTATE=42611, DRIVER=4.23.42
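For illustration, the following sketch (all table and column names are hypothetical) defines a Hive table that uses the complex data types listed at the beginning of this section; IBM Fast Data Movement cannot move such a table.
  -- Hypothetical Hive table; every column except id uses a complex type that
  -- IBM Fast Data Movement does not support.
  CREATE TABLE orders_nested (
    id       INT,
    tags     ARRAY<STRING>,
    attrs    MAP<STRING, STRING>,
    shipping STRUCT<street:STRING, city:STRING>,
    extra    UNIONTYPE<INT, STRING>,
    checksum BINARY
  );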
Cannot import tables with the same table name to the same HDFS directory
This problem might occur when you import tables with the same name from a source database to Hadoop. HDFS directories are not case-sensitive, so you cannot import tables with the same table name to the same HDFS directory, even if the letter case of the table names differs or the tables exist in different databases. In both cases, data on HDFS is overwritten.
- For example, you have two tables with the same name but different letter case in the same database on Db2 Warehouse, such as tables test and TEST. To avoid overwriting data on HDFS, follow these steps:
- In the XML configuration file, set the property for the HDFS directory:
  <property>
    <name>fq.output.path</name>
    <value>/dbbackup/backup1</value>
  </property>
- Run the import for table test.
- Now, in the XML configuration file, set the property for the HDFS directory:
  <property>
    <name>fq.output.path</name>
    <value>/dbbackup/backup2</value>
  </property>
- Run the import for table TEST.
If you follow this procedure, data is not overwritten and the tables are stored in different locations on HDFS.
Limitations on VARCHAR data types
- On Cloudera, the escape character ('\') is not processed correctly (see the sketch after this list).
- Export in binary mode does not process characters with codes 128-255 in VARCHAR columns.
- In BigInsights 3, there is a problem with synchronizing Hive table metadata with BigSQL: Error Column "COL_CHAR", type "varchar(1)" is not supported
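The following sketch, with a hypothetical table and values, shows one way to check the escape-character behavior; it is an illustration, not output from the product.
  -- Hypothetical Db2 Warehouse table with a VARCHAR value that contains the
  -- backslash escape character.
  CREATE TABLE escape_test (id INTEGER, val VARCHAR(20));
  INSERT INTO escape_test VALUES (1, 'a\b');
  -- After importing this table to a Cloudera cluster in a text-based format,
  -- compare the value returned by Hive with the original; the backslash ('\')
  -- may not be processed correctly.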
Handling a field delimiter and end of line character in data movement
Hive does not support new line and end-of-line characters within data. Records that include these characters are split, so the data is handled incorrectly. When querying any data format other than binary (for example, text, Avro, or Parquet), new line and end-of-line characters cannot be handled correctly.
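As a hedged illustration of this behavior, the sketch below uses a hypothetical table whose VARCHAR value contains an embedded newline.
  -- Hypothetical Db2 Warehouse table; CHR(10) embeds a newline in the value.
  CREATE TABLE notes (id INTEGER, txt VARCHAR(100));
  INSERT INTO notes VALUES (1, 'first line' || CHR(10) || 'second line');
  -- When the data is moved in any format other than binary (for example text),
  -- Hive treats the embedded newline as a record separator, so this single row
  -- is split into two malformed records.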
Limitation on TIMESTAMP data type
The date range in BigSQL starts from 1400-01-01. When you import a table with a TIMESTAMP column that contains dates earlier than the year 1400, BigSQL shows a NULL value for these dates (see the sketch at the end of this section).
- A timestamp with the time part 24:00:00 might not be interpreted correctly by Hive.
- Timestamps with a fractional seconds precision greater than 9 are not interpreted correctly by Hive.
When two-stage processing (with Hive-level conversion) is used in the cases above, data might become corrupted even after it is exported back to Db2:
- Timestamps far in the past, such as 0001-01-01, might be shifted.
- Timestamps that fall within a daylight saving time (DST) shift in the Hadoop time zone might be affected by a 1-hour shift.
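The sketch below illustrates the BigSQL date-range limitation described at the start of this section; the table name and values are hypothetical.
  -- Hypothetical Db2 Warehouse table with a timestamp earlier than 1400-01-01.
  CREATE TABLE events (id INTEGER, ts TIMESTAMP);
  INSERT INTO events VALUES (1, '1399-12-31-23.59.59');
  -- After the table is imported, querying it through BigSQL returns NULL for
  -- the ts value, because the BigSQL date range starts at 1400-01-01.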
Limitation on DATE and TIMESTAMP values containing BC (Before Christ)
Moving DATE or TIMESTAMP values that contain the BC era indicator, for example '1400-01-13 BC' or '1400-01-13 08:13:33 BC', might fail with an error because this date format is not supported: Hive does not support the BC format, as described in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-date
Errors when exporting a Hive table with name or column name starting with underscore
Exporting a table whose name or column name starts with _ (underscore) is not supported. As a workaround, you can create a view that is based on this table but whose name and column names do not start with an underscore, as shown in the sketch below.
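A minimal sketch of this workaround, assuming a hypothetical Hive table _sales_raw with columns _amount and region:
  -- Hypothetical Hive view that exposes an underscore-prefixed table and
  -- column under names that do not start with an underscore.
  CREATE VIEW sales_export AS
  SELECT `_amount` AS amount, region
  FROM `_sales_raw`;
  -- Export the view sales_export instead of the original table.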
IBM Fast Data Movement does not support spaces in table names when importing a table from Db2 Warehouse
When you try to import a table that contains a space in its name, IBM Fast Data Movement imports the data but fails when creating the table in Hive.
Hive displays very low and very high values as infinity
- On Db2 Warehouse:
  Smallest|-1.79769313486232E+308|-3.402823E+38|-3.402823E+38|-3.402823E+38|-1.79769313486232E+308|-1.79769313486232E+308
  Largest negative|-2.2250738585072E-308|-1.175494E-38|-1.175494E-38|-1.175494E-38|-2.2250738585072E-308|-2.2250738585072E-308
  Smallest positive|2.2250738585072E-308|1.175494E-38|1.175494E-38|1.175494E-38|2.2250738585072E-308|2.2250738585072E-308
  Largest|1.79769313486232E+308|3.402823E+38|3.402823E+38|3.402823E+38|1.79769313486232E+308|1.79769313486232E+308
- On Hive:
  Smallest -Infinity -3.402823E38 -3.402823E38 -3.402823E38 -Infinity -Infinity
  Largest negative -2.2250738585072E-308 -1.175494E-38 -1.175494E-38 -1.175494E-38 -2.2250738585072E-308 -2.2250738585072E-308
  Smallest positive 2.2250738585072E-308 1.175494E-38 1.175494E-38 1.175494E-38 2.2250738585072E-308 2.2250738585072E-308
  Largest Infinity 3.402823E38 3.402823E38 3.402823E38 Infinity Infinity
Infinity values in FLOAT column cannot be exported from Hive
When you try to export a Hive table that contains a -Infinity or +Infinity value in a FLOAT column, an error is thrown.
Hive displays 0 in a numeric column as NULL
Some versions of Hive (observed with 1.2.1) incorrectly display 0 in numeric columns of type DECIMAL(n,n): instead of 0, NULL is displayed. This problem affects data that is imported to Hadoop, because Hive does not display it correctly. Additionally, in the case of a two-stage import to Hadoop (used for Hadoop formats like PARQUET and for partitioned or subpartitioned tables), zeroes are converted to NULLs while they are processed by Hive. As a result, data exported back to Db2 Warehouse is also affected.
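For illustration, the sketch below uses a hypothetical table with a DECIMAL(5,5) column that holds the value 0.
  -- Hypothetical Db2 Warehouse table with a DECIMAL(n,n) column holding 0.
  CREATE TABLE rates (id INTEGER, rate DECIMAL(5,5));
  INSERT INTO rates VALUES (1, 0);
  -- On affected Hive versions (for example 1.2.1), the imported rate value is
  -- displayed as NULL instead of 0; with a two-stage import, the zero is also
  -- converted to NULL during Hive processing and remains NULL when the data is
  -- exported back to Db2 Warehouse.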
Direct transfer limitations
Data transfer in direct mode with the setting fq.data.directmovement=true is not supported when both of the following conditions are met:
- IBM Fluid Query is installed.
- Hive is configured for querying NZBAK (hive.aux.jars.path and HIVE_AUX_JARS_PATH in the Hive service configuration).
Possible errors in this case:
- ERROR: com.ibm.nz.fq.common.exception.hive.HiveException: Unable to execute Hive sql. Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
- ERROR: com.ibm.nz.fq.common.exception.hive.HiveException: Unable to execute Hive sql. Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Delay in execution during export operations on Hortonworks
If export operations on Hortonworks are delayed, try the following actions:
- Ensure that the cluster can run at least two containers in parallel: verify the memory allocation for YARN containers and the container minimum and maximum size.
- Modify the following YARN Capacity Scheduler parameters:
  yarn.scheduler.capacity.default.minimum-user-limit-percent=20
  yarn.scheduler.capacity.maximum-am-resource-percent=0.8
  yarn.scheduler.capacity.root.default.user-limit-factor=2