Hive partition partitioned read method
When this method is specified, the connector determines the number of partitions in the table specified in the Usage > Table name property or in the Select statement property. The connector associates each node with a list of partitions. For each node, the connector reads the rows that belong to the partitions that are associated with that node.
The value for the Select statement property is used only if the
Generate SQL property is set to No. Add
[[part-value]] placeholder to the SELECT statement expression in the WHERE clause
where you see fit, the placeholder will be replaced by the partitioning key assigned to the
partition. Hive connector will process the SELECT statement from the Select
statement property and replace the [[part-value]] placeholder with
appropriate partitioning key assigned for the column to which the placeholder is compared, different
on each node. Only one [[part-value]] placeholder in the SELECT statement is
supported.
Note: Only one [[part-value]] placeholder in the SELECT statement is
supported. Tables with more then one partitioning column are currently not supported by the Hive
partition partitioning read method.
Example of using the Hive partition partitioned read method
For this example, the Hive connector is configured in the following way:- The Generate SQL property is set to No.
- The Select statement property is set to SELECT * FROM TABLE1 WHERE COL2 = [[part-value]] AND COL1 > 10.
- The connector is configured to run in parallel mode on four nodes and the table has 8 partitions.
- The Partitioned reads method property is set to Hive partition.
The connector determines the partitioning keys of the partitions, which are „Paris”, „Warsaw”, „Berlin”, „Moscow”, „Tokyo”, „London”, „Prague” and „Rome”. The connector runs the following SELECT statements on the nodes:
The connector runs the following SELECT statement on four nodes:
Node
1
SELECT * FROM TABLE1 WHERE (COL2 = ‘Paris’ OR COL2 = ‘Warsaw’) AND (COL1 > 10)
Node
2
SELECT * FROM TABLE1 WHERE (COL2 = ‘Berlin’ OR COL2 = ‘Moscow’) AND (COL1 > 10)
Node
3
SELECT * FROM TABLE1 WHERE (COL2 = ‘Tokyo’ OR COL2 = ‘London’) AND (COL1 > 10)
Node
4
SELECT * FROM TABLE1 WHERE (COL2 = ‘Prague’ OR COL2 = ‘Rome’) AND (COL1 > 10)