IBM Support

DataStage Inserts into Hive partitioned table using the Hive Connector are running slow and eventually failing while writing huge number of records.



Inserts into Hive Partitioned table using the Hive Connector are running slow and eventually failing while writing huge number of records.


Job execution would be slow and a new map reduce job would be created for every record that is being inserted into the table.


Datadirect JDBC Driver for Hive does not yet support batch inserts into a partitioned table.


Hive Connector in the target context and writing into a partitioned table.

Diagnosing The Problem

Using the Hadoop / Hive admin console, typically ambari console, one can monitor the applications that are getting created for the Hive inserts and notice that every insert is being executed in a separate MapReduce job.

Resolving The Problem

One can use the staging table to insert into the Hive partitioned table.
Please follow the instructions provided below to configure the Hive Connector to write into a partitioned Hive table using the staging table.

1. Set the Enabled Partitioned Write to No in the Hive Connector

2. Provide the staging table in Table name property of the Hive Connector. This will be a temporary table, which is non-partitioned and the data will be loaded into it.

3. Use the Set Hive parameters property to set the following hive parameters

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

4. Use the AfterSQL property to run the SQL in the following format

5. In the select clause, ensure that the partitioned keys are added at the end.
For example, if the partitioning keys are C1, and C2, ensure that the select statement lists all other columns before the partitioning keys as shown below.


Refer to the screenshots below for steps to configure the Connector

FIG - 1 : Screenshot showing the usage of the Hive Connector to write into a partitioned table.

FIG - 2 : Screenshot showing the table definitions along with the partitions

FIG - 3 : Screenshot showing the sample data from the staging as well as the partitioned table

[{"Product":{"code":"SSZJPZ","label":"IBM InfoSphere Information Server"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Not Applicable","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"},{"code":"PF033","label":"Windows"}],"Version":";11.5;","Edition":"Enterprise","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018