Specifying a Merge Method and Keys
The Merge tab of the Merge node contains the following fields.
Merge Method Select the method to be used for merging records. Selecting either Keys or Condition activates the bottom half of the dialog box.
- Order Merges records by order such that the nth record from each input is merged to produce the nth output record. When any record runs out of a matching input record, no more output records are produced. This means that the number of records that are created is the number of records in the smallest dataset. It is important if using this option that you have sorted your data using a Sort node.
-
Keys Uses a key field, such as Transaction ID, to merge records with
the same value in the key field. This is equivalent to a database "equi-join." If a key value occurs
more than once, all possible combinations are returned. For example, if records with the same key
field value A contain differing values B, C, and D in other fields, the
merged fields produce a separate record for each combination of A with value B,
A with value C, and A with value D.
Note: Null values are not considered identical in the merge-by-key method and will not join.
- Condition Use this option to specify a condition for the merge. For more information, see Specifying Conditions for a Merge.
- Ranked condition Use this option to specify whether each row pairing in the primary and all secondary data sets are to be merged; use the ranking expression to sort any multiple matches into order from low to high. For more information, see Specifying Ranked Conditions for a Merge.
Possible keys Lists only those fields with exactly matching field names in all input data sources. Select a field from this list and use the arrow button to add it as a key field used for merging records. More than one key field can be used. You can rename non-matching input fields by using a Filter node, or the Filter tab of a source node.
Keys for merge Lists all fields that are used to merge records from all input data sources based on values of the key fields. To remove a key from the list, select one and use the arrow button to return it to the Possible Keys list. When more than one key field is selected, the option below is enabled.
Combine duplicate key fields When more than one key field is selected above, this option ensures that there is only one output field of that name. This option is enabled by default except in the case when streams have been imported from earlier versions of IBM® SPSS® Modeler. When this option is disabled, duplicate key fields must be renamed or excluded by using the Filter tab in the Merge node dialog box.
Include only matching records (inner join) Select to merge only complete records.
Include matching and non-matching records (full outer join) Select to perform a "full outer join." This means that if values for the key field are not present in all input tables, the incomplete records are still retained. The undefined value ($null$) is added to the key field and included in the output record.
Include matching and selected non-matching records (partial outer join) Select to perform a "partial outer join" of the tables you select in a subdialog box. Click Select to specify tables for which incomplete records will be retained in the merge.
Include records in the first dataset not matching any others (anti-join) Select to perform a type of "anti-join," where only nonmatching records from the first dataset are passed downstream. You can specify the order of input datasets using arrows on the Inputs tab. This type of join does not include complete records in the output dataset. For more information, see Types of Joins.