Understanding data preparation jobs
One of the optimization requirements of BIL is to run the validation logic only when it is required.
Validation logic is skipped in two cases:
- when the incoming dataset for a particular subtype is empty
- when validation is turned off altogether by the user
To support this requirement, BIL uses data preparation jobs. A data preparation (prep) job for a particular RT/ST runs when the validation logic is not executed for that RT/ST.
The prep jobs ensure that, when the validation logic is not executed, the dataset that passes through still complies with the interrelated metadata requirements and dataset naming conventions expected by the next processing phase.
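The skip conditions described above can be sketched as a single predicate. This is only an illustration in Python, not the way the job sequences implement the check; the function name and its parameters (`validation_enabled`, `input_row_count`) are hypothetical.

```python
def needs_prep_job(validation_enabled: bool, input_row_count: int) -> bool:
    """Return True when validation is skipped for an RT/ST.

    Hypothetical helper: validation is skipped when the user has turned
    it off, or when the incoming dataset for the subtype is empty. In
    either case the prep job runs in place of the validation logic.
    """
    return (not validation_enabled) or input_row_count == 0
```

For example, `needs_prep_job(True, 0)` is true because the incoming dataset is empty, even though validation is turned on.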
There are prep jobs for several Party subtypes (namely Address, ContactMethod, Contact, ContactRel and Identifier) and all of the Contract subtypes (Contract, ContractComponent, ContractCompVal, ContractRole, Alert, RoleLocation, and MiscValue).
The following table lists all the data preparation jobs:
RT/ST | Data Preparation Job |
---|---|
P/A | BL_020_Prep_Address |
P/C | BL_020_Prep_ContactMethod |
P/I | BL_020_Prep_Identifier |
P/O, P/P | BL_020_Prep_Contact |
P/R | BL_020_Prep_ContactRel |
C/C | BL_021_Prep_ContractComponent |
C/H | BL_021_Prep_Contract |
C/L | BL_021_Prep_RoleLocation |
C/M | BL_021_Prep_MiscValue_Contract |
C/R | BL_021_Prep_ContractRole |
C/V | BL_021_Prep_ContractCompVal |
C/T | BL_021_Prep_Alert_Contract |
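For scripting or documentation checks, the table can be expressed as a lookup. The following is a minimal Python sketch; the job names are copied from the table, while the dictionary `PREP_JOBS` and the helper `prep_job_for` are hypothetical names used only for illustration.

```python
# RT/ST code -> data preparation job, as listed in the table above.
PREP_JOBS = {
    "P/A": "BL_020_Prep_Address",
    "P/C": "BL_020_Prep_ContactMethod",
    "P/I": "BL_020_Prep_Identifier",
    "P/O": "BL_020_Prep_Contact",  # P/O and P/P share one prep job
    "P/P": "BL_020_Prep_Contact",
    "P/R": "BL_020_Prep_ContactRel",
    "C/C": "BL_021_Prep_ContractComponent",
    "C/H": "BL_021_Prep_Contract",
    "C/L": "BL_021_Prep_RoleLocation",
    "C/M": "BL_021_Prep_MiscValue_Contract",
    "C/R": "BL_021_Prep_ContractRole",
    "C/V": "BL_021_Prep_ContractCompVal",
    "C/T": "BL_021_Prep_Alert_Contract",
}


def prep_job_for(rt_st: str) -> str:
    """Return the prep job name for an RT/ST code such as 'P/A'.

    Raises KeyError for an RT/ST that has no prep job in the table.
    """
    return PREP_JOBS[rt_st]
```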
Understanding the DSPrefix
The prefix of the output datasets produced by the prep jobs is parameterized. Each prep job listed in the preceding table, apart from BL_020_Prep_ContactRel and BL_020_Prep_Identifier, requires a String input parameter named DSPrefix to be supplied when it is invoked. This allows the prep jobs to be reused from two different sequences that need different prefix values:
- The preparation jobs are used from BL_020__VS_RI_EC_PARTY and BL_020__VS_EC_CONTRACT job sequences from a point where the next processing phase is Error Consolidation. The dataset prefix value passed from these job sequences into the preparation jobs is therefore the one expected by Error Consolidation, i.e. Valid.
- The preparation jobs are also used from the PREP job sequences BL_020__PREP_PARTY_DS and BL_021__PREP_CONTRACT_DS, where the next processing phase is Key Assignment and Load. The dataset prefix value passed from these job sequences is the one expected by the Key Assignment and Load phase, for example, ErrCon_.
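The relationship between the calling job sequence and the DSPrefix value can be summarized as a lookup. This Python sketch is illustrative only: the sequence names and prefix strings are taken from the text above (ErrCon_ is given there only as an example value), and `PREFIX_BY_SEQUENCE` and `ds_prefix_for` are hypothetical names.

```python
# Calling job sequence -> DSPrefix value passed to the prep jobs.
# Prefix strings are quoted from the text; ErrCon_ is an example value.
PREFIX_BY_SEQUENCE = {
    # Next phase: Error Consolidation
    "BL_020__VS_RI_EC_PARTY": "Valid",
    "BL_020__VS_EC_CONTRACT": "Valid",
    # Next phase: Key Assignment and Load
    "BL_020__PREP_PARTY_DS": "ErrCon_",
    "BL_021__PREP_CONTRACT_DS": "ErrCon_",
}


def ds_prefix_for(sequence: str) -> str:
    """Return the DSPrefix a given job sequence supplies to a prep job."""
    return PREFIX_BY_SEQUENCE[sequence]
```

Keeping the prefix outside the prep job itself is what lets one job serve both sequences: the job logic is the same, and only the output dataset name changes.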
Special cases
For BL_020_Prep_ContactRel and BL_020_Prep_Identifier, it is the suffix of the output dataset that must be parameterized. This is done through a different String parameter named DSSuffix.
The output datasets of BL_020_Prep_ContactRel and BL_020_Prep_Identifier contain party relation information which is used in the Error Consolidation Logic to iteratively drop parties that are related to parties in error. As an input to the Error Consolidation jobs, these output datasets’ names need to contain a suffix value of _0. However, if validation is turned off, no error consolidation is performed, in which case this suffix has to be blank.
The DSSuffix parameter in these two jobs accounts for this requirement. When validation is turned on (Error Consolidation is to be executed), the BL_020_Prep_ContactRel and BL_020_Prep_Identifier jobs are referenced from the BL_020__VS_RI_EC_PARTY job sequence, provided that the inputs to BL_020_VS_ContactRel and BL_020_VS_Identifier are empty. In this case, the DSSuffix value supplied is _0. When validation is off (no Error Consolidation), these two jobs are referenced from the BL_020__PREP_PARTY_DS job sequence, in which case the DSSuffix value provided is an empty string.
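The suffix rule for these two jobs reduces to one condition. The following Python sketch is illustrative; the function name `ds_suffix` and its parameter are hypothetical, and only the two suffix values come from the text.

```python
def ds_suffix(validation_enabled: bool) -> str:
    """Return the DSSuffix for BL_020_Prep_ContactRel and
    BL_020_Prep_Identifier.

    Hypothetical helper: with validation on, Error Consolidation runs
    and expects the suffix _0 on its input datasets. With validation
    off, no Error Consolidation is performed and the suffix is blank.
    """
    return "_0" if validation_enabled else ""
```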
Correcting metadata
All the preparation jobs also make sure that the output datasets contain the correct metadata required by the next processing phase. In many cases, additional columns must be added to the output datasets from the RI_CONTACT_SUBSET dataset (for Party RT/STs) or the RI_CONTRACT_SUBSET dataset (for Contract RT/STs).