In a job that uses multiple nodes, each node that is specified
for the stage reads a distinct subset of data from the source table.
Before you begin
To use the default rowid range partitioned read method,
the user whose credentials are used to connect to the Oracle database
must have SELECT access to the DBA_EXTENTS dictionary view.
About this task
If the connector is configured to run in parallel mode
to read data, the connector runs a slightly modified SELECT statement
on each node. The combined set of rows from all of the queries is
the same set of rows that would be returned if the unmodified user-defined
SELECT statement were run once on one node.
Procedure
- On the job design canvas, double-click the Oracle Connector
stage, and then click the Stage tab.
- On the Advanced page, set Execution mode to Parallel,
and then click the Output tab.
- Set Enable partitioned reads to Yes.
- Set Read mode to Select,
and then define the SELECT statement that the connector uses at run
time:
- Set Generate SQL at runtime to Yes,
and then enter the name of the table in the Table name field.
Use the syntax schema_name.table_name, where schema_name is
the owner of the table. If you do not specify schema_name,
the connector uses the schema that belongs to the currently connected
user. The connector automatically generates and runs the SELECT statement.
To
read data from a particular partition of a partitioned table, set
the Table scope property to Single
partition, and specify the name of the partition in the Partition
name property. The connector then automatically adds a
PARTITION(partition_name) clause to the SELECT
statement that is generated. To read data from a particular subpartition
of the composite partitioned table, set the Table scope property
to Single subpartition and specify the name
of the subpartition in the Subpartition name property.
The connector then automatically adds a SUBPARTITION(subpartition_name)
clause to the generated SELECT statement.
- Set Generate SQL at runtime to No,
and then specify the SELECT statement in the Select statement property.
You can enter the SQL statement or enter the fully qualified file
name of the file that contains the SQL statement. If you enter a file
name, you must also set Read select statement from file to Yes.
- Set the Partitioned reads method property
to the partitioning method that you want to use. The default partitioning
method is Rowid range.
- Specify the input values that the partitioned read method
uses:
- In the Table name for partitioned reads property,
specify the name of the table that the partitioned read method uses
to define the subsets of data that each node reads from the source
table.
If you do not specify a table name, the connector
uses the value of the Generate SQL at runtime property
to determine the table name. If Generate SQL at runtime is
set to Yes, the connector uses the table name
that is specified in the Table name property.
If Generate SQL at runtime is set to No,
the connector looks at the SELECT statement that is specified in the Select
statement property and uses the first table name that
is specified in the FROM clause.
- If you choose the Rowid range or
the Minimum and maximum range partitioned read
method, in the Partition or subpartition name for partitioned
reads property, specify the name of the partition or subpartition
that the partitioned read methods uses.
Note: If you do
not specify a value for the Partition or subpartition name
for partitioned reads property, the connector uses the
entire table as input for the partitioned read method. When the connector
is configured to read data from a single partition or subpartition,
you typically specify the name of the partition or subpartition in
the Partition or subpartition name for partitioned reads property.
Then the connector analyzes only the data that belongs to that partition
or subpartition. This process typically results in a more even distribution
of data and a more efficient use of nodes.
- If you choose the Modulus or
the Minimum and maximum range partitioned read
method, in the Column name for partitioned reads,
enter the name of the column from the source table to use for the
method. The column must be an existing column in the table, must be
of NUMBER(p) data type, where p is
the number precision, and must have a scale of zero.
- Click OK, and then save the job.