Reading partitioned data

In a job that uses multiple nodes, each node that is specified for the stage reads a distinct subset of data from the source.

Before you begin

  • Configure the stage on the output link to run in parallel.
  • Define two or more processing nodes to run the job on.

About this task

The File connector can read one or more files sequentially or in parallel. To read more than one file, specify wildcards in the value for the File name property.

When the job runs in sequential mode, the connector reads all rows from all input files on a single processing node. When the job runs in parallel mode, each input file whose name matches the specified wildcard pattern is read by multiple processing nodes, and each processing node reads a subset of the rows from every input file.
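The difference between the two modes can be pictured with a small Python sketch. It is illustrative only: the rows_for_node helper is hypothetical, and the round-robin split is an assumption, because this topic does not state how the connector decides which rows go to which node.

```python
def rows_for_node(rows, node, node_count):
    """Return the rows that one node would read.

    Assumption: rows are dealt out round-robin. The connector's real
    partitioning scheme is not described in this topic.
    """
    return rows[node::node_count]


rows = ["r0", "r1", "r2", "r3", "r4"]

# Sequential mode: a single node reads every row.
sequential = rows_for_node(rows, 0, 1)

# Parallel mode with two nodes: each node reads a distinct subset of the same file.
node0 = rows_for_node(rows, 0, 2)
node1 = rows_for_node(rows, 1, 2)
```

Whatever the actual scheme is, the subsets do not overlap and together cover every row of the file.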

If you want each node to read one or more different files, the file names must contain a unique number that corresponds to a node number. To refer to that number, include the [[node-number]] option in the value of the File name property. For example, if you define two nodes for the job and specify MyFile_[[node-number]].txt for the File name property, node 0 reads the MyFile_0.txt file and node 1 reads the MyFile_1.txt file.
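The mapping from node number to file name in this example can be sketched in Python. This is a minimal sketch for checking which file names to prepare; the file_name_for_node helper is hypothetical and is not part of the connector.

```python
PLACEHOLDER = "[[node-number]]"


def file_name_for_node(pattern, node):
    """Substitute a node number for the [[node-number]] option in a File name value."""
    return pattern.replace(PLACEHOLDER, str(node))


# Two nodes, numbered from 0 as in the example above.
names = [file_name_for_node("MyFile_[[node-number]].txt", n) for n in range(2)]
print(names)  # ['MyFile_0.txt', 'MyFile_1.txt']
```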

To combine the two methods, use both the [[node-number]] option and wildcards in the File name property. Each processing node then reads all rows from a distinct set of matching input files.
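The combined method can be sketched in Python to preview which files each node would pick up. The sketch rests on assumptions: the files_for_node helper and the Sales_ file names are hypothetical, and the wildcard is treated as a shell-style * pattern through Python's fnmatch module, which might differ from the wildcard syntax that the connector supports.

```python
from fnmatch import fnmatchcase

PLACEHOLDER = "[[node-number]]"


def files_for_node(pattern, node, available):
    """List the files that one node would read.

    The node number is substituted first, then the remaining wildcard
    is matched against the available file names (shell-style, assumed).
    """
    node_pattern = pattern.replace(PLACEHOLDER, str(node))
    return sorted(name for name in available if fnmatchcase(name, node_pattern))


# Hypothetical file names, used only for illustration.
available = [
    "Sales_0_jan.csv",
    "Sales_0_feb.csv",
    "Sales_1_jan.csv",
    "Sales_10_jan.csv",
]
pattern = "Sales_[[node-number]]_*.csv"

node0_files = files_for_node(pattern, 0, available)
node1_files = files_for_node(pattern, 1, available)
```

In this sketch, node 0 gets both Sales_0_ files and node 1 gets only Sales_1_jan.csv. Sales_10_jan.csv is not matched for node 1 because the underscore after the node number is part of the pattern; putting a delimiter after the option keeps the file sets of different nodes distinct.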

Procedure

  1. On the job design canvas, double-click the File connector stage, and then click the Stage tab.
  2. On the Advanced page, set Execution mode to Parallel.
  3. Specify a value for the File name property based on the number and names of the files to read.
  4. Click OK, and then save the job.