Operational Decision Manager (ODM) stage batch mode and key mode processing in DataStage

Each input link of the Operational Decision Manager (ODM) stage corresponds to either an IN or an IN_OUT parameter of the ruleset that is specified for the stage. The connector converts records that are retrieved on the input links to Java™ values and passes them to the ODM stage as the values of the corresponding IN and IN_OUT ruleset parameters. A key concept in the runtime behavior is the rule execution cycle. During each rule execution cycle, the connector reads one or more records from the input links, converts them to Java objects, and sends them to the ODM stage.

The number of records that are retrieved from the input links during each rule execution cycle is governed by batch mode and key mode. You can configure either batch mode or key mode of processing records. The strategy that the connector uses to process records from the first input link is different from the strategy it uses to process records from the remaining input links. The connector internally classifies input links into primary and secondary links. The first input link is the primary link, and the remaining input links are secondary links.

The following information applies to scenarios in which the following conditions are met:

Record processing strategy

The connector explicitly sets values for all IN and IN_OUT parameters of the ruleset before the ruleset is invoked. In some cases the value that is set for a parameter can be the NULL value. But the connector does not invoke a ruleset without previously setting values for all IN and IN_OUT parameters of the ruleset.
When no records are available on the primary link, the connector verifies that no records remain on any of the secondary links, and if none remain, the job completes successfully. However, if more records are available on a secondary link, the connector rejects them if a reject link is defined for that secondary link and the Leftover record error option is selected for the reject link. Otherwise, the connector logs an error and the job fails.
If records are not available on a secondary link, the action that the connector takes depends on whether the ruleset parameter that is associated with that link is of a Java array type. If it is, the connector creates an empty array and sets it as the ruleset parameter value. Otherwise, the connector treats the condition as an error and rejects the records that it received on all the input links (including the primary link) during the current rule execution cycle, but only if the reject links are defined for those input links and the Leftover record error option was selected for the reject links. Otherwise, the connector logs an error and the job fails.

The additional rules that apply to the record processing strategy depend on the mode in which the stage is configured to run.

Batch mode

To enable batch mode, set the Enable batch processing stage property to Yes and the Enable key processing stage property to No. You must also specify a nonnegative integer value n for the Batch size property. The default value is 1, and a value of 0 means unlimited. In batch mode, for each rule execution cycle, the connector reads all available records from each of the input links, but not more than n records per link.

When the connector retrieves the necessary number of records on all input links, it converts them to Java objects, which it then sets as the ruleset parameter values and invokes the ruleset. If the ruleset runs successfully, the connector produces records on the output links based on the OUT and IN_OUT ruleset parameter values, and repeats the process for the next ruleset execution. If the ruleset execution fails, the connector logs an error and the job fails.

Consider an example of a job where the ODM stage has two input links, with each link containing only one column of type Integer. Let the data received on the input links be specified as follows (the data on the right arrives first):

Link 1

19 17 16 15 14 12 11 10

Link 2

20 18 17 15 12 10 8

Let batch mode be enabled and let the batch size be 2. For this data, there are four rule execution cycles. The data sent in each rule execution cycle is as follows:

Rule execution cycle 1

Link 1: 11 10

Link 2: 10 8

Rule execution cycle 2

Link 1: 14 12

Link 2: 15 12

Rule execution cycle 3

Link 1: 16 15

Link 2: 18 17

Rule execution cycle 4

Link 1: 19 17

Link 2: 20
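The batching in this example can be reproduced with a short Python sketch. It is an illustration of the reading rule only, not the connector's implementation: the function name `batch_cycles` is hypothetical, records are listed in arrival order (the reverse of the right-to-left notation above), and all ruleset parameters are assumed to be of Java array types so that a short or empty batch on a secondary link is acceptable.

```python
from itertools import islice


def batch_cycles(links, batch_size):
    """Yield, per rule execution cycle, one list of records for each input link.

    links      -- lists of records in arrival order; links[0] is the primary link
    batch_size -- maximum records read per link per cycle; 0 means unlimited
    """
    limit = None if batch_size == 0 else batch_size
    iterators = [iter(link) for link in links]
    while True:
        # Read up to `limit` records from every link for this cycle.
        batches = [list(islice(it, limit)) for it in iterators]
        if not batches[0]:
            if any(batches[1:]):
                # The text says leftover secondary records are rejected or
                # fail the job; this sketch only signals the condition.
                raise RuntimeError("leftover records on a secondary link")
            return
        yield batches


link1 = [10, 11, 12, 14, 15, 16, 17, 19]
link2 = [8, 10, 12, 15, 17, 18, 20]
for number, batches in enumerate(batch_cycles([link1, link2], 2), start=1):
    print(number, batches)
```

With a batch size of 2, the loop prints four cycles, and the last cycle carries only one record for Link 2.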

Key mode

To enable key mode, set the Enable key processing stage property to Yes. Select the input link columns that serve as key columns by using the Key column[n] property, where n is a positive integer that represents the index of the key column. The indexes start at 1 for the first selected key column and increment as more columns are added. Specify the name of each key column in the Column name stage property under the corresponding Key column[n] property. At least one key column must be specified, which means that at least the Key column[1] property must be present and its Column name child property must be set. To add a Key column[2], right-click Key column[1] and select Add property value. Repeat the process to add more Key column[n] values. To set the Column name property for a particular Key column[n], you can click the Select button that is displayed to the right of Column name. The columns that are offered for selection are the columns that exist on all input links.

For each selected key column, you must also specify the direction in which the records on the input links are sorted with respect to the values of that key column. Specify this direction in the Sort direction option under the corresponding Key column[n]. The supported values are Ascending and Descending. The default value is Ascending. When the connector compares two key column values to determine their position relative to each other, it compares the Java values that are created from those column values. For key columns that contain character values (for example, VarChar columns), the comparison is case-sensitive. When the connector compares a NULL value to a non-NULL value, it considers the NULL value to be smaller than the non-NULL value. Therefore, for a nullable key column and the ascending sort order, the records with NULL values for that key column appear before the records with non-NULL values for that same key column.
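The comparison rules can be sketched as a comparator. This is a minimal illustration under stated assumptions: the names `compare_values` and `compare_keys` are hypothetical, Python's `None` stands in for NULL, Python's code-point string comparison stands in for the case-sensitive comparison of Java values, and the Descending direction is assumed to simply invert the ascending result (so NULL values would sort last).

```python
from functools import cmp_to_key


def compare_values(a, b):
    """Compare two key column values; None (NULL) is smaller than any non-NULL value."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    # Strings compare by code point, so the comparison is case-sensitive.
    return (a > b) - (a < b)


def compare_keys(key_a, key_b, directions):
    """Compare two multi-column keys, one sort direction per key column."""
    for a, b, direction in zip(key_a, key_b, directions):
        result = compare_values(a, b)
        if result != 0:
            # Assumption: Descending inverts the ascending ordering.
            return result if direction == "Ascending" else -result
    return 0


keys = [("b",), (None,), ("B",), ("a",)]
ascending = sorted(keys, key=cmp_to_key(lambda x, y: compare_keys(x, y, ["Ascending"])))
print(ascending)  # NULL first, then "B", "a", "b"
```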

When the stage runs in key mode, the connector automatically enforces at run time the sorting of records on its input links based on the key column names and sort directions that are specified in the stage. Also, if the connector stage is configured to run in key mode, in parallel, on more than one node, and if the Partition type field on the Partitioning tab for any input link is set to the default (Auto) value, the connector enforces hash partitioning of the records on that input link and uses the key columns that are specified in the stage as the hash key. This enforcement ensures that the records with matching key column values are processed by the same stage instance (running on the same node).
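The effect of hash partitioning on the key columns can be illustrated with a small sketch. The hash function that the parallel engine actually uses is not described here, so the CRC32-based `node_for_key` below is purely a stand-in, and both function names are hypothetical. The point it shows is that any deterministic hash of the key columns sends records with equal keys from every input link to the same node.

```python
import zlib


def node_for_key(key_values, node_count):
    """Map a tuple of key column values to a node index (stand-in hash)."""
    digest = zlib.crc32(repr(tuple(key_values)).encode("utf-8"))
    return digest % node_count


def partition(records, key_indexes, node_count):
    """Distribute the records of one input link across nodes by key columns."""
    nodes = [[] for _ in range(node_count)]
    for record in records:
        key = [record[i] for i in key_indexes]
        nodes[node_for_key(key, node_count)].append(record)
    return nodes


link1 = [(10, "Rick"), (12, "John"), (16, "Sam")]
link2 = [(10, 25.77), (12, 23.233), (16, 32.4)]
for node, (part1, part2) in enumerate(zip(partition(link1, [0], 3),
                                          partition(link2, [0], 3))):
    print(node, part1, part2)
```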

The batch size that is specified for the stage in the Batch size property plays an important role in the key mode. If the Enable batch processing property is set to No when Enable key processing is set to Yes, the connector runs in key mode and assumes the batch size of 0 (unlimited). When the connector reads records from the input links in this mode, in addition to the batch size it also monitors the values of the key columns present in the records. For each ruleset execution, the connector reads up to n records from the primary link where n represents the batch size that is specified for the stage. If the batch size n is different from 0, the connector reads up to n records from the link regardless of the key values in those records. However, if the batch size is 0 (unlimited), the connector reads all available records from the primary link that have matching key column values.

After it reads the records from the primary link, the connector proceeds to read records from the remaining secondary links which have key column values that match key column values of any of the records that are retrieved on the primary link for that same rule execution cycle. The number of records that the connector reads from the secondary link depends on whether the ruleset parameter for that link is based on a Java array type or not. If the ruleset parameter is based on an array type, the connector reads all available matching records and uses them to prepare the value for the ruleset parameter. If the ruleset parameter is not based on an array type, the connector reads only one matching record from the link. The connector then invokes the ruleset.

The connector then checks whether, for any of the parameters that are not based on array types, another matching record exists on the corresponding secondary input link. If such a record exists, the connector uses it for the ruleset parameter value and invokes the ruleset again. When invoking the ruleset again, the connector reuses the value from the previous ruleset execution for each ruleset parameter for which it did not read any new matching records for the current ruleset execution. The connector repeats this process until it cannot find any more new matching records for any of the ruleset parameters. It then proceeds to read new records from the primary link in preparation for the next ruleset execution.
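The repeated invocation for parameters that are not based on array types can be sketched as follows. This is one reading of the paragraph above, not a confirmed description of the connector: the function name `ruleset_invocations` and the dictionary layout are hypothetical, every secondary link is assumed to have at least one matching record, and each re-invocation is assumed to advance every link that still has an unread matching record.

```python
def ruleset_invocations(primary_records, matching_by_link):
    """Return the parameter values used for each ruleset invocation.

    primary_records  -- records read from the primary link for this cycle
    matching_by_link -- link name -> non-empty list of matching records for a
                        secondary link whose parameter is not an array type
    """
    queues = {link: list(records) for link, records in matching_by_link.items()}
    # The first invocation uses the first matching record from every link.
    current = {link: queue.pop(0) for link, queue in queues.items()}
    invocations = [{"primary": primary_records, **current}]
    while any(queues.values()):
        for link, queue in queues.items():
            if queue:
                # A new matching record replaces the previous value.
                current[link] = queue.pop(0)
            # Otherwise the value from the previous invocation is reused.
        invocations.append({"primary": primary_records, **current})
    return invocations


calls = ruleset_invocations([(5, "a")], {"link2": ["r1", "r2", "r3"], "link3": ["s1"]})
for call in calls:
    print(call)
```

In this usage, the ruleset is invoked three times, and the single Link 3 record is reused for the second and third invocations.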

Consider an example of a job where the ODM stage has two input links and each link has two columns. In Link 1, the first column is of type Integer and the second column is of type VarChar(5). In Link 2, the first column is of type Integer and the second column is of type Double. Let the data received on the input links be as follows (the data on the right arrives first):

Link 1

(16, Sam) (16, Sam) (16, Sam) (12, John) (12, John) (10, Rick) (10, Rick) (10, Rick)

Link 2

(16, 32.4) (12, 32.22) (12, 24.22) (12, 53.33) (12, 23.233) (10, 32.55) (10, 25.77)

Let key mode be enabled and let the batch size be 0. The first column of type Integer is the key column and the sort order for the key is Ascending. Let the ruleset parameters associated with the input links be of Java array types. Then, for this data there are three rule execution cycles. The data that is sent in each rule execution cycle is as follows:

Rule execution cycle 1

Link 1: (10, Rick) (10, Rick) (10, Rick)

Link 2: (10, 32.55) (10, 25.77)

Rule execution cycle 2

Link 1: (12, John) (12, John)

Link 2: (12, 32.22) (12, 24.22) (12, 53.33) (12, 23.233)

Rule execution cycle 3

Link 1: (16, Sam) (16, Sam) (16, Sam)

Link 2: (16, 32.4)
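The grouping in this example can be reproduced with a short sketch. It covers only the case described here (key mode, batch size 0, array-type parameters, input already sorted ascending on the key) and does not model key mismatch handling or rejects. The function name `key_mode_cycles` is hypothetical, and records are listed in arrival order.

```python
from itertools import groupby


def key_mode_cycles(primary, secondary, key_index=0):
    """Yield (key, primary records, secondary records) per rule execution cycle.

    Both links are assumed to be sorted on the key column already.
    """
    def key_of(record):
        return record[key_index]

    # Collect the secondary records by key value.
    secondary_groups = {key: list(group) for key, group in groupby(secondary, key=key_of)}
    # With batch size 0, one cycle reads all primary records that share a key.
    for key, group in groupby(primary, key=key_of):
        # No matching secondary records means an empty array for that parameter.
        yield key, list(group), secondary_groups.get(key, [])


link1 = [(10, "Rick")] * 3 + [(12, "John")] * 2 + [(16, "Sam")] * 3
link2 = [(10, 25.77), (10, 32.55), (12, 23.233), (12, 53.33),
         (12, 24.22), (12, 32.22), (16, 32.4)]
for key, primary_records, secondary_records in key_mode_cycles(link1, link2):
    print(key, len(primary_records), len(secondary_records))
```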

If for some of the secondary links the connector detects a record with key column values that do not match the key column values in any of the primary records, and if no records with matching key column values were previously retrieved on the same link for the same ruleset execution, the connector specifies an empty array for the ruleset parameter value for that link. The empty array is specified only if that ruleset parameter is of a Java array type and only if, based on the sorting order that is specified for the key columns, the connector determines that the key column values in the detected record can still match the key column values in a record that the connector is yet to retrieve on the primary link.

Otherwise, the connector attempts to reject the records that caused the key mismatch error. The connector compares the key column values of the detected record on the current secondary link and the key column values of the records that it already retrieved on the primary link for the current ruleset execution. When performing this comparison, the connector monitors the sort direction that was specified for the key columns. If it is possible for the next record on the current secondary link to have the key column values that match the key column values of any record that was already retrieved on the primary link, the connector rejects the record that it read on the current secondary link. Otherwise, it rejects the records that it previously retrieved on the remaining input links, including the records from the primary input link.

Consider an example of a job where the ODM stage has three input links. Let the key column on the three links be C1 of type Integer. Let the value of column C1 on the three links be as follows (data on the right arrives first):

Link 1 (primary): 5 5 5

Link 2: 5 5

Link 3: 6

Let the batch size be set to 0.

The behavior of the connector under various operating modes is as follows:

  1. The ruleset parameters for all the input links are of array type and the sorting order for the column C1 is Ascending. The connector invokes the ruleset and passes three records with C1 value of 5 for the first ruleset parameter, two records with C1 value of 5 for the second ruleset parameter, and 0 records for the third ruleset parameter. Based on the sort order of column C1, the connector determines that the next record on the primary link can still have the C1 value of 6, which would match the current record on Link 3. Therefore, the record on Link 3 is not rejected.
  2. The ruleset parameter for Link 3 is not of Java array type and the sorting order is Ascending. In this case, the connector rejects the records with the C1 value of 5 that it already retrieved on the first and second input links, because the records that potentially arrive next on the third input link after the record with C1 value of 6 can have only the C1 value of 6 or greater, so none of them are able to match the C1 value of 5.
  3. The ruleset parameter for Link 3 is not of Java array type and the sorting order is Descending. In this case, the connector rejects the record with the C1 value of 6 that it retrieved on the third input link, because there is a possibility that the next record on this link has the C1 value of 5 and therefore matches the records that are retrieved on the first two input links.
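The three cases can be condensed into a small decision function. This is a simplified sketch, assuming a single non-NULL integer key, a detected secondary key that differs from the primary key, and no previously retrieved matching records on that secondary link. The function name `on_key_mismatch` and the returned labels are hypothetical.

```python
def on_key_mismatch(primary_key, secondary_key, direction, is_array):
    """Decide how a key mismatch on a secondary link is handled (simplified).

    Returns one of:
      "empty array"              -- pass an empty array; keep the secondary record
      "reject secondary record"  -- a later secondary record may still match
      "reject retrieved records" -- nothing later on this link can match
    """
    # "Ahead" means the secondary record sorts after the primary key in the
    # configured direction, so only a later primary record could match it.
    if direction == "Ascending":
        secondary_is_ahead = secondary_key > primary_key
    else:
        secondary_is_ahead = secondary_key < primary_key

    if secondary_is_ahead and is_array:
        return "empty array"
    if secondary_is_ahead:
        return "reject retrieved records"
    return "reject secondary record"


print(on_key_mismatch(5, 6, "Ascending", True))    # case 1
print(on_key_mismatch(5, 6, "Ascending", False))   # case 2
print(on_key_mismatch(5, 6, "Descending", False))  # case 3
```

Rejection itself remains subject to the reject link configuration; the sketch only selects which records are candidates for rejection.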

After the connector rejects the records due to the key mismatch, it keeps trying to prepare ruleset parameter values for the current ruleset execution, by reading records from the first input link from which it rejected the records. This is either the primary link, in which case the connector starts preparing ruleset parameter values from scratch, or the current secondary link in which case the connector proceeds to read records on that link to try to match them with the records it already received on the primary link for the current ruleset execution.

If a record must be rejected due to key mismatch, the connector can do that only if the record to be rejected came from an input link for which the reject link is defined and only if the Key mismatch error option is selected on that reject link. Otherwise, the connector logs an error and the job fails.

In the batch mode, the connector enforces the batch size value of 1 if the ruleset parameter for any of the input links is not based on a Java array type. In the key mode, the connector enforces the batch size value of 1 if the ruleset parameter for the primary link is not based on a Java array type. If at runtime the connector needs to override the Batch size value that is specified by the user and enforce the value of 1, it writes an informational message in the job log about this change.
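The batch size override can be expressed as a short function. This is a sketch of the rule as stated above, with hypothetical names: `effective_batch_size` takes the mode, the configured Batch size value, and a list of flags (primary link first) that indicate whether each link's ruleset parameter is based on a Java array type.

```python
def effective_batch_size(mode, configured, is_array_by_link):
    """Return the batch size that is enforced at run time.

    mode             -- "batch" or "key"
    configured       -- value of the Batch size property (0 means unlimited)
    is_array_by_link -- one boolean per input link, primary link first
    """
    if mode == "batch" and not all(is_array_by_link):
        # Any non-array parameter forces one record per link per cycle.
        return 1
    if mode == "key" and not is_array_by_link[0]:
        # Only the primary link's parameter type matters in key mode.
        return 1
    return configured


print(effective_batch_size("batch", 10, [True, False]))  # 1
print(effective_batch_size("key", 10, [True, False]))    # 10
print(effective_batch_size("key", 0, [False, True]))     # 1
```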

If the records arrive on the input links of the stage in transaction waves, the connector processes records in each wave independently from the records in another wave. The records from one wave are not combined with the records from another wave so that ruleset parameter values for a single ruleset execution are produced.