Designing the data flow

You can branch and merge streams in the pipeline.

Branching streams

When you connect a stage to multiple downstream stages, all data passes to every connected stage. You can configure required fields for a downstream stage to discard records before they enter that stage, but by default all records pass to all branches.

For example, in the following pipeline, all of the data from the Directory origin passes to both branches of the pipeline for different types of processing. However, you can configure required fields on the Field Splitter or Field Replacer to discard records that a branch does not need.

A single Directory origin passes to two stages, creating two branches of the pipeline

To route data based on more complex conditions, use a Stream Selector.
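To make the copy semantics concrete, here is a minimal sketch in plain Python (this is not the Data Collector API; the stage names, field names, and helper functions are illustrative): every record passes to every connected stage, and a stage's required fields discard records before they enter that stage.

```python
def branch(records, stages):
    """Pass every record to every connected downstream stage."""
    for stage in stages:
        stage.process(list(records))  # each branch receives all records

class Stage:
    """Toy stand-in for a pipeline stage with optional required fields."""

    def __init__(self, name, required_fields=()):
        self.name = name
        self.required_fields = required_fields
        self.received = []

    def process(self, records):
        for record in records:
            # Required fields discard records before they enter the stage.
            if all(f in record for f in self.required_fields):
                self.received.append(record)

records = [{"id": 1, "city": "Oslo"}, {"id": 2}]
splitter = Stage("Field Splitter", required_fields=("city",))
replacer = Stage("Field Replacer")
branch(records, [splitter, replacer])
# splitter.received -> only the record that has a city field
# replacer.received -> both records
```

By contrast, a Stream Selector routes each record down a stream whose condition evaluates to true rather than copying it to every branch, using a condition such as `${record:value('/city') == 'Oslo'}` (the field name here is illustrative).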

Some stages generate events that pass to event streams. Event streams originate from an event-generating stage, such as a destination or origin, and pass from the stage through an event stream output, as follows:

A pipeline with an event stream branching from the event-generating stage

For more information about the event framework and event streams, see Dataflow triggers overview.

Merging streams

You can merge streams of data in a pipeline by connecting two or more stages to the same downstream stage. When you merge streams, Data Collector channels the data from all streams into the same stage, but does not join records across the streams.

For example, in the following pipeline, the Stream Selector stage sends data with null values to the Field Replacer stage:

A pipeline canvas shows the Stream Selector stage sending data to both the Field Replacer stage and the Expression Evaluator stage. The Field Replacer stage also sends data to the Expression Evaluator stage, merging the streams of the pipeline.

The data from the Stream Selector default stream and all of the data from the Field Replacer pass to the Expression Evaluator for further processing, but in no particular order and with no record merging.
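As a minimal sketch of the merge semantics in plain Python (not the Data Collector API; the stream contents are illustrative), merging simply channels the records from all incoming streams into the downstream stage; records are never combined into joined records:

```python
from itertools import chain

# Records arriving from the Stream Selector default stream and from the
# Field Replacer, respectively (illustrative data).
default_stream = [{"id": 1, "status": "ok"}]
replaced_stream = [{"id": 2, "status": "fixed"}, {"id": 3, "status": "fixed"}]

# Merging concatenates the streams into one flow of records; even records
# with matching ids would remain separate records, because no join occurs.
merged = list(chain(default_stream, replaced_stream))
# merged contains all three records, unjoined
```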

Important: Pipeline validation does not prevent duplicate data. To avoid writing duplicate data to destinations, configure the pipeline logic to remove duplicate data or to prevent the generation of duplicate data.
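As one example of the kind of pipeline logic the note calls for, sketched in plain Python rather than as Data Collector stages (the key field is illustrative), duplicates can be removed by tracking a key field and keeping only the first record seen for each value:

```python
def drop_duplicates(records, key):
    """Keep only the first record seen for each value of the key field."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique

records = [{"id": 1}, {"id": 2}, {"id": 1}]
unique = drop_duplicates(records, "id")
# unique -> [{"id": 1}, {"id": 2}]
```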

Note that you cannot merge event streams with data streams. Event records must stream from the event-generating stage to destinations or executors without merging with data streams. For more information about the event framework and event streams, see Dataflow triggers overview.