Hive partition partitioned read method

When this method is specified, the connector determines the number of partitions in the table specified in the Usage > Table name property or in the Select statement property. The connector associates each node with a list of partitions. For each node, the connector reads the rows that belong to the partitions that are associated with that node.

The value for the Select statement property is used only if the Generate SQL property is set to No. Add [[part-value]] placeholder to the SELECT statement expression in the WHERE clause where you see fit, the placeholder will be replaced by the partitioning key assigned to the partition. Hive connector will process the SELECT statement from the Select statement property and replace the [[part-value]] placeholder with appropriate partitioning key assigned for the column to which the placeholder is compared, different on each node. Only one [[part-value]] placeholder in the SELECT statement is supported.
Note: Only one [[part-value]] placeholder in the SELECT statement is supported. Tables with more then one partitioning column are currently not supported by the Hive partition partitioning read method.

Example of using the Hive partition partitioned read method

For this example, the Hive connector is configured in the following way:
  • The Generate SQL property is set to No.
  • The Select statement property is set to SELECT * FROM TABLE1 WHERE COL2 = [[part-value]] AND COL1 > 10.
  • The connector is configured to run in parallel mode on four nodes and the table has 8 partitions.
  • The Partitioned reads method property is set to Hive partition.

The connector determines the partitioning keys of the partitions, which are „Paris”, „Warsaw”, „Berlin”, „Moscow”, „Tokyo”, „London”, „Prague” and „Rome”. The connector runs the following SELECT statements on the nodes:

The connector runs the following SELECT statement on four nodes:

Node 1
SELECT * FROM TABLE1 WHERE (COL2 = ‘Paris’ OR COL2 = ‘Warsaw’) AND (COL1 > 10)
Node 2
SELECT * FROM TABLE1 WHERE (COL2 = ‘Berlin’ OR COL2 = ‘Moscow’) AND (COL1 > 10)
Node 3
SELECT * FROM TABLE1 WHERE (COL2 = ‘Tokyo’ OR COL2 = ‘London’) AND (COL1 > 10)
Node 4
SELECT * FROM TABLE1 WHERE (COL2 = ‘Prague’ OR COL2 = ‘Rome’) AND (COL1 > 10)