Minimum and maximum range partitioned read method

When this method is specified, the connector calculates the minimum and maximum values for the column that is specified in the Partitioned reads method > Column name property and then divides the calculated range into subranges. Each subrange is then assigned to a node; the number of subranges equals the number of nodes that are configured for the stage. On each node, the connector runs a SELECT statement that returns the rows where the value in the specified column is in the subrange that was assigned to that node.

To use this method, you must specify a column name from the source table in the Partitioned reads method > Column name property. The specified column must exist in the table that is specified in the Partitioned reads method > Table name property. The column that you specify must have a TINYINT, SMALLINT, INTEGER or BIGINT data type.

The value for the Select statement property is used only if the Generate SQL property is set to No. Add [[range-part-expr]] placeholder to the SELECT statement expression in the WHERE clause where you see fit. Hive connector will process the SELECT statement from the Select statement property and replace the [[range-part-expr]] placeholder with appropriate condition, different on each node.

Example of using the minimum and maximum range partitioned read method

For this example, the Hive connector is configured in the following way:
  • The Generate SQL property is set to No.
  • The Select statement property is set to SELECT * FROM TABLE1 WHERE [[range-part-expr]] AND COL1 > 10.
  • The connector is configured to run in parallel mode on four nodes.
  • The Partitioned reads method property is set to Minimum and maximum range.
  • The Partitioned reads method > Table name property is set to TABLE1.
  • The Partitioned reads method > Column name property is set to COL2, and COL2 is defined as INTEGER in TABLE1.
The connector determines the minimum and maximum value for column COL2. If the minimum value is -20 and maximum value is 135, the connector runs the following SELECT statements on the nodes:
Node 1
SELECT * FROM TABLE1 WHERE COL2 < 19 AND (COL1 > 10)
Node 2
SELECT * FROM TABLE1 WHERE (COL2 >= 19 AND COL2 < 58) AND (COL1 > 10)
Node 3
SELECT * FROM TABLE1 WHERE (COL2 >= 58 AND COL2 < 97) AND (COL1 > 10)
Node 4
SELECT * FROM TABLE1 WHERE (COL2 >= 97) AND (COL1 > 10)