mpxlink utility
The mpxlink utility is a cross match program that enables entity linkage.
The mpxlink utility takes comparison results from the mpxcomp utility and creates entity link and task files (.unl files) that can be loaded into the database. This utility can be run from the command line or from InfoSphere® MDM Workbench.
Option and flag names are not case-sensitive; option values are case-sensitive.
Generating task sets can be a lengthy operation.
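To make the idea of a task set concrete, the following minimal Python sketch reproduces the numbering example given in the -tskSets option description (memrecnos 1, 2, and 3 in one task, 4 and 5 in another). It is only an illustration, not the algorithm that mpxlink uses: the function name `assign_task_sets` and the input shape are hypothetical, and the rule that a member keeps the first task set it is placed in is an assumption based on the statement that a member can belong to only one task set.

```python
def assign_task_sets(tasks):
    """Map each memrecno to a task set number, numbering tasks from 1.

    `tasks` is a list of (task_type, memrecnos) pairs in processing order.
    This models only the numbering described in the documentation; it is
    not the mpxlink implementation.
    """
    tskset_by_member = {}
    for tskset, (_task_type, memrecnos) in enumerate(tasks, start=1):
        for memrecno in memrecnos:
            # Assumption: a member already placed in a task set keeps it,
            # because a member can belong to only one task set.
            tskset_by_member.setdefault(memrecno, tskset)
    return tskset_by_member


if __name__ == "__main__":
    example = [
        ("Potential Duplicate", [1, 2, 3]),
        ("Potential Linkage", [4, 5]),
    ]
    print(assign_task_sets(example))
```

Because a shared ("glue") member is counted in only one set under this assumption, per-task member counts derived from such a mapping can be lower than the real task membership.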
If you want to retain existing Entity IDs (entrecnos) while doing an incremental cross match (IXM), you must use the correct options:
-ixmmode
: Specify -ixmmode with mpxlink to ensure that entity sets are processed correctly. If -ixmmode is not specified, the mpxlink process starts with the current entity set. Setting -ixmmode causes the mpxlink utility to re-evaluate all entity sets, preserving the previous Entity ID when all previous members are present in the new entity set.
-bxmxeia
: This option is required if the existing Entity IDs are to be considered when entity sets are formed during the IXM.
-noTskSets
: Specifying -noTskSets reduces utility run time without affecting task creation.
-noTskRelatedMembers
: Specifying -noTskRelatedMembers also reduces utility run time.
-entrecno
: Set an -entrecno value to a number higher than any current recno column value in any of the mpi_entlink tables in the system. If you do not set -entrecno in this way, you risk creating overlapping Entity IDs, which can result in new members incorrectly being added to an existing entity.
-audrecno
: Set an -audrecno value to a number higher than any current audrecnos in the mpi_audhead tables. If you do not set -audrecno in this way, you risk creating inaccurate entity linkage history data.
-audhead
: Specify -audhead to create an mpi_audhead.unl file as part of the mpxlink operation.
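The IXM options listed above can be assembled programmatically. The following Python sketch builds (but does not run) an argument list that combines them with the two required options, -entType and -bxmInpDir. Several details are assumptions rather than documented behavior: that the executable is invoked as `mpxlink`, that each value follows its option as a separate argument, and that you have already obtained the highest existing entity and audit record numbers yourself (for example, by querying the mpi_entlink and mpi_audhead tables). The function name, the entity type `id`, and the directory `work/bxm` are hypothetical.

```python
import shlex


def build_ixm_mpxlink_args(ent_type, bxm_inp_dir, max_entrecno, max_audrecno):
    """Assemble an argument list for an IXM run of mpxlink (not executed).

    max_entrecno / max_audrecno are the highest values currently in use;
    the starting numbers passed to mpxlink are set one above them so that
    new Entity IDs and audit records do not overlap existing ones.
    """
    return [
        "mpxlink",                      # assumed executable name
        "-entType", ent_type,           # required
        "-bxmInpDir", bxm_inp_dir,      # required
        "-ixmMode",                     # re-evaluate all entity sets
        "-bxmXeia",                     # consider existing Entity IDs
        "-noTskSets",                   # shorter run time
        "-noTskRelatedMembers",         # shorter run time
        "-entRecno", str(max_entrecno + 1),
        "-audRecno", str(max_audrecno + 1),
        "-audHead",                     # write mpi_audhead.unl
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    args = build_ixm_mpxlink_args("id", "work/bxm", 1000, 50)
    print(shlex.join(args))
```

Building the list separately from running it makes it easy to review the starting record numbers before the utility is started.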
Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.
Option | Type | Description | Default |
---|---|---|---|
-entType | Name | Entity type name. This option identifies the type of entity that is being computed. If you are implementing multiple entity types (for example, identity and household), you must run mpxlink for each type. This option is required and there is no default setting. | NONE |
-bxmInpDir | dirName | .bin input directory. The directory where the input binary (.bin) files to link are stored. Input files can be from the mpxcomp utility output, or other processes such as an IXM. This directory is typically the work directory on the server that hosts your hub configuration. This option is required and there is no default setting. You can list multiple directories for this option; separate multiple directories with single spaces. | NONE |
-bxmOutDir | dirName | .bin output directory. Indicates where you want the BXM output files to be located. This directory is relative to the project's work directory: WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work\bxm_output_dir. Bulk cross match data is also generated in the designated BXM output directory. | NONE |
-unlOutDir | dirName | .unl output directory. The directory in which you want the mpxlink output binary files located. Binary output files are used by the relationship linkers. The binary output file is named mpx_bxmxmem.bin. This directory is typically relative to the work directory on the server that hosts the hub configuration. Generating the output in binary form is optional; specifying an output directory with this option is what causes binary output to be generated. In other words, if no directory is specified here, no binary output is generated. | NONE |
-nMemParts | N | Number of member partitions (MemParts). MemParts are used to partition the data set. Typically this partition is done for memory considerations. Because the mpxlink utility requires the entire input data set (for example, the binary files of comparison results) to be read into memory at one time, breaking the data set into smaller pieces allows them to fit into available memory. The MemParts option differs from the MxmParts option in that MemParts breaks up the memHead and memCmpd data files, whereas MxmParts breaks up link and task files (the output of the mpxcomp utility). The MemParts value set here must be the same as the MemParts value set in mpxcomp, and in the utility that created the input for mpxcomp (for example, mpxfsdvd, mpxprep, or mpxredvd). In other words, the MemParts setting in mpxcomp determines how many partitioned file segments are passed to mpxlink; the mpxlink MemParts setting must accurately reflect the number of partitioned file segments that come from mpxcomp. Partitioning the data set has a performance cost: the higher the MemParts value, the slower the mpxlink process runs. Leave this value set to 1 unless memory is an issue. The maximum value is 100. | 1 |
-nMxmParts | N | Number of maximum out partitions. Like MemParts, the MxmParts option partitions the output of the mpxcomp process. As with MemParts, this option is used when the output file is too large to be read into memory in its entirety, and needs to be broken up into smaller sections to fit into available memory. The MxmParts option differs from MemParts in that MxmParts breaks up link and task files (the output of the mpxcomp utility), whereas MemParts breaks up the memHead and memCmpd data files. The MxmParts value set here must be the same as the MxmParts value set in mpxcomp, which provides the input to mpxlink. In other words, the MxmParts setting in mpxcomp determines how many partitioned file segments are passed to mpxlink. The mpxlink utility MxmParts setting must accurately reflect the number of partitioned file segments that come from the mpxcomp utility. Leave this value set to 1 unless memory is an issue. The maximum value is 100. | 1 |
-{no}bxmDiff | | Use explicit different records from entrule. This option controls whether mpxlink uses existing entity rules when forming entities. For example, if two members in an entity are separated in InfoSphere MDM Inspector, a non-identity rule is created by the operational server. (Likewise, if two members are manually linked, an identity rule is created.) The mpxrule utility captures these rules as "same" (identity) or "diff" (non-identity) rules. If you run a cross match again on an existing database, including these rules prevents the mpxlink utility from reforming linkages (in the case of "diff" rules), or forces members to be in the same entity (in the case of "same" rules). The input data used here is created with the corresponding mpxcomp utility option. | -noBxmDiff |
-{no}bxmSame | | Use explicit same records from entrule. Like -bxmDiff, this option controls whether the mpxlink utility uses existing entity rules when forming entities. See the description of the -bxmDiff option. The input data used here is created with the corresponding mpxcomp utility option. | -noBxmSame |
-{no}bxmXeia | | Use implicit link records from entlink. This option instructs mpxlink to include the output from the mpxxeia utility. The mpxxeia utility captures existing entity data. The input data used here is created with the corresponding mpxcomp utility -bxmXeia option (Use implicit link records from entlink in InfoSphere MDM Workbench). | -noBxmXeia |
-{no}bxmPD | | Use potential duplicate task records from entxtsk. The mpxlink utility uses this data to form review identifier tasks that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility option. | -noBxmPD |
-{no}bxmPL | | Use potential linkage task records from the mpxxtask utility (entxtsk), which captures existing task information from the database. | -noBxmPL |
-{no}bxmRI | | Use review identifier task records from the mpxxtask utility (entxtsk), which captures existing task information from the database. | -noBxmRI |
-{no}bxmRule | | Use member rule records from the mpxprep, mpxredvd, or mpxfsdvd utilities. Member rules express the relationship between the survivor and obsolete members in a merge. Because the input data used here is created by default in the mpxprep utility, it is not necessary to specify a corresponding option in mpxprep. | -bxmRule |
-{no}bxmLink | | Use linkage records from the mpxcomp utility. The mpxlink utility uses this data to form entities that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility -bxmLink option (Use linkage records from mpxcomp in InfoSphere MDM Workbench). | -bxmLink |
-{no}bxmTask | | Use task records from the mpxcomp utility. The mpxlink utility uses this data to form tasks that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility -bxmTask option (Use task records from mpxcomp in InfoSphere MDM Workbench). | -bxmTask |
-{no}bxmRvid | | Use review identifier records from the mpxcomp utility. The mpxlink utility uses this data to form review identifier tasks that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility option. | -bxmRvid |
-{no}entLink | | Instruct the operational server to write new linkages and entity level tasks to a .unl file (mpi_entlink.unl). | -entLink |
-{no}entXeia | | Instruct the operational server to write historical Entity ID data to a .unl file (mpi_entxeia.unl). | -entXeia |
-{no}entXtsk | | Instruct the operational server to write information about tasks related to an entity to a .unl file (mpi_entxtsk.unl). | -entXtsk |
-{no}seqGen | | When specified, this option writes a .unl file that contains updated sequence generator numbers that can then be loaded into the database. The operational server normally updates this table properly on startup. This option is useful for an installation that, when doing multiple links, needs to update the sequence numbers without starting an operational server. | -noSeqGen |
-{no}tskSets | | Compute full task set information. Assigns a task set number to a member in a task. A task set identifies a group (two or more) of records that are explicitly identified as being in a task. For example, if memrecnos 1, 2, and 3 are in a Potential Duplicate task, they are all assigned tskset=1. If memrecnos 4 and 5 are in a Potential Linkage task, they are assigned tskset=2, and so on. Typically, you would not run mpxlink with the tskSets option on. The TskSets functionality has two purposes. The first is reporting, so that you can see how many members are involved in a task (although a member can belong to only one task set, so counts can be off for a member that is the glue member). The count is also not updated: it is valid only for the point in time when the task is first created, so later member additions or updates are not reflected in it. The second purpose is to make tasks that will not be automatically broken up by operational server scoring. For example, a member enters a witness protection program and thus has two sets of demographics. The operational server will not naturally join them in an entity or task, so this option could be used to create a task. If you manually create a task, the tskSetno gets set just as it does when you run a bulk cross match (BXM) with the -tskSets option turned on. Setting this option instructs the operational server to leave the task alone, no matter what these members score. In this case, even if a member has demographic changes that would cause the task to go away, the operational server ignores them. There might be some circumstances where you want this behavior (for example, in a data quality remediation [DQR] or some other process where you require the task population to remain static). However, the majority of implementations would not want to use this behavior. | -noTskSets |
-{no}tskRelatedMembers | | Create a count of members in a task so that when you have a trigger member, you can tell that there are n members in the task. The count is only calculated when a member is cross matched. | -tskRelatedMembers |
-{no}strict | | Forces xeia (entity linkage) information to default to existing information (rules and prior data). Setting this option to -strict makes the mpxlink utility sensitive to anomalies in the data. Disable this option to instruct mpxlink to ignore anomalies in the data, for example inconsistencies or discrepancies that arise from live updates to the table (that is, discrepancies that might occur because data is changing from updates while it is being collected by the mpx utilities that create the input for mpxlink). This option is typically used for reporting purposes. | -strict |
-ixmMode | | Indicates IXM mode. Used for IXM only. | FALSE |
-entRecno | N | Used with .unl only. The starting entity record number for the .unl. The option allows for specifying an entity record number to start with for the creation of the mpi_entlink_xx.unl file. The parameter is optional. If not set, then the mpxlink utility defaults to applying 1 as the starting entity record number. | 1 |
-tskRecno | N | Used with .unl only. Allows for specification of a starting task record number in the .unl file. This option reads the tskrecno from the mpi_seqgen table. | mpi_seqgen.tskrecno |
-audRecno | N | Used with .unl only. Common audRecno for all .unl files. This option sets the audit record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set the -audRecno option to an existing mpi_audhead record number. | 2 |
-usrRecno | N | Used with .unl only. Common usrRecno for all .unl files. This option sets the user record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_usrhead user record number. | 1 |
-ixnRecno | N | Used with .unl only. This setting is the ixnRecno for the audhead record. This option sets the transaction record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_ixnhead record number. | 71 |
-evtTypeno | N | Used with .unl only. This setting is the evtTypeno for the audhead record. Use this option to specify an event type for the audhead records. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_evttype event type number. | 0 |
-{no}audHead | | Used with .unl only. Writes the mpi_audhead.unl file, and uses the audrecno specified in the -audRecno option. (Common audit record number for all .unl option.) This option is commonly used in new implementations where no audit records exist yet. | -noAudHead |
-bktOutDir | dirName | Used with NTE only; the output directory for the BXM files. | NONE |
-entBktd | | Used with NTE only. Write entBktd information. Use this option and the -bktOutDir option together to allow the mpxlink utility to generate a binary bucket file that is used by the mpxcomp utility to rescore members that exist in the same transitive entity. This is used for non-transitive entities to get scores between members who were brought together by a "glue" member and would not have a score generated by the traditional binary bucket file generated during the mpxprep or mpxfsdvd process. This setting allows a second pass that uses the mpxcomp and mpxlink utilities to produce accurate non-transitive entities. Although non-transitive entities can be produced with a single pass through the mpxcomp and mpxlink utilities, the two-pass approach improves accuracy. | FALSE |
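
The -nMemParts and -nMxmParts descriptions in the table require the mpxlink values to match the values used for mpxcomp and to stay at or below 100. A small pre-run check along the following lines can catch mismatches before a long job is started. This is a hedged sketch: the function name and the dictionary-based input are hypothetical, and treating a missing mpxcomp value as 1 is an assumption (only the mpxlink default of 1 is stated in the table).

```python
MAX_PARTS = 100  # maximum stated for -nMemParts and -nMxmParts


def check_partition_settings(mpxcomp_parts, mpxlink_parts):
    """Return a list of problems; an empty list means the settings agree.

    Both arguments are dicts such as {"nMemParts": 4, "nMxmParts": 2}.
    """
    problems = []
    for name in ("nMemParts", "nMxmParts"):
        # mpxlink default is 1 per the table; the mpxcomp default of 1
        # is an assumption made for this sketch.
        comp = mpxcomp_parts.get(name, 1)
        link = mpxlink_parts.get(name, 1)
        if not 1 <= link <= MAX_PARTS:
            problems.append(f"-{name}={link} is outside 1..{MAX_PARTS}")
        if comp != link:
            problems.append(f"-{name} differs: mpxcomp={comp}, mpxlink={link}")
    return problems


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    for problem in check_partition_settings({"nMemParts": 4}, {"nMemParts": 2}):
        print(problem)
```

Returning a list of messages instead of raising on the first problem lets a wrapper script report every mismatch in one pass.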