Reading partitioned data

In a job that uses multiple nodes, each node that is specified for the stage reads a distinct subset of data from the source table.

Before you begin

To use the default rowid range partitioned read method, the user whose credentials are used to connect to the Oracle database must have SELECT access to the DBA_EXTENTS dictionary view.

About this task

If the connector is configured to run in parallel mode to read data, the connector runs a slightly modified SELECT statement on each node. The combined set of rows from all of the queries is the same set of rows that would be returned if the unmodified user-defined SELECT statement were run once on one node.

Procedure

  1. On the job design canvas, double-click the Oracle Connector stage, and then click the Stage tab.
  2. On the Advanced page, set Execution mode to Parallel, and then click the Output tab.
  3. Set Enable partitioned reads to Yes.
  4. Set Read mode to Select, and then define the SELECT statement that the connector uses at run time:
    • Set Generate SQL at runtime to Yes, and then enter the name of the table in the Table name field. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the currently connected user. The connector automatically generates and runs the SELECT statement.

      To read data from a particular partition of a partitioned table, set the Table scope property to Single partition, and specify the name of the partition in the Partition name property. The connector then automatically adds a PARTITION(partition_name) clause to the SELECT statement that is generated. To read data from a particular subpartition of the composite partitioned table, set the Table scope property to Single subpartition and specify the name of the subpartition in the Subpartition name property. The connector then automatically adds a SUBPARTITION(subpartition_name) clause to the generated SELECT statement.

    • Set Generate SQL at runtime to No, and then specify the SELECT statement in the Select statement property. You can enter the SQL statement or enter the fully qualified file name of the file that contains the SQL statement. If you enter a file name, you must also set Read select statement from file to Yes.
  5. Set the Partitioned reads method property to the partitioning method that you want to use. The default partitioning method is Rowid range.
  6. Specify the input values that the partitioned read method uses:
    1. In the Table name for partitioned reads property, specify the name of the table that the partitioned read method uses to define the subsets of data that each node reads from the source table.

      If you do not specify a table name, the connector uses the value of the Generate SQL at runtime property to determine the table name. If Generate SQL at runtime is set to Yes, the connector uses the table name that is specified in the Table name property. If Generate SQL at runtime is set to No, the connector looks at the SELECT statement that is specified in the Select statement property and uses the first table name that is specified in the FROM clause.

    2. If you choose the Rowid range or the Minimum and maximum range partitioned read method, in the Partition or subpartition name for partitioned reads property, specify the name of the partition or subpartition that the partitioned read methods uses.
      Note: If you do not specify a value for the Partition or subpartition name for partitioned reads property, the connector uses the entire table as input for the partitioned read method. When the connector is configured to read data from a single partition or subpartition, you typically specify the name of the partition or subpartition in the Partition or subpartition name for partitioned reads property. Then the connector analyzes only the data that belongs to that partition or subpartition. This process typically results in a more even distribution of data and a more efficient use of nodes.
    3. If you choose the Modulus or the Minimum and maximum range partitioned read method, in the Column name for partitioned reads, enter the name of the column from the source table to use for the method. The column must be an existing column in the table, must be of NUMBER(p) data type, where p is the number precision, and must have a scale of zero.
  7. Click OK, and then save the job.