mpxlink utility

The mpxlink utility is a cross match program that enables entity linkage.

The mpxlink utility takes comparison results from the mpxcomp utility and creates entity link and task files (.unl files) that can be loaded into the database. You can run this utility from the command line or from InfoSphere® MDM Workbench by clicking Master Data Management > New Job Set.

All options and flags are case-insensitive; option values are case-sensitive.

Generating task sets can be a lengthy operation.

If you want to retain existing Enterprise IDs (entrecnos) while doing an incremental cross match (IXM), you must use the correct options:

  • ixmmode: Specify -ixmmode with mpxlink to ensure that entity sets are processed correctly. If -ixmmode is not specified, the mpxlink process starts with the current entity set. Setting -ixmmode causes the mpxlink utility to re-evaluate all entity sets, preserving the previous Enterprise ID when all previous members are present in the new entity set.
  • bxmxeia: This option is required if the existing Enterprise IDs are to be considered when entity sets are formed during the IXM.
  • noTskSets: Specifying -noTskSets reduces utility run time without affecting task creation.
  • noTskRelatedMembers: Specifying -noTskRelatedMembers also reduces utility run time.
  • entrecno: Set an -entrecno value to a number higher than any current entrecno column value in any of the mpi_entlink tables in the system. If you do not set -entrecno in this way, you risk creating overlapping Enterprise IDs, which can result in new members incorrectly being added to an existing entity.
  • audrecno: Set an -audrecno value to a number higher than any current audrecnos in the mpi_audhead tables. If you do not set -audrecno in this way, you risk creating inaccurate entity linkage history data.
  • audhead: Specify -audhead to create an mpi_audhead.unl file as part of the mpxlink operation.
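
The IXM options above can be assembled into a single command line. The following Python sketch is illustrative only: the helper names (next_recno, build_ixm_link_args), the entity type name id, and the directory work/bxm are hypothetical, and it assumes that each option and its value are passed as separate command-line arguments. The existing entrecno and audrecno values are assumed to have been read from the mpi_entlink and mpi_audhead tables beforehand; how you query them is not shown.

```python
from typing import Iterable, List


def next_recno(existing: Iterable[int]) -> int:
    """Return a number one higher than any existing record number (1 if there are none)."""
    return max(existing, default=0) + 1


def build_ixm_link_args(
    ent_type: str,
    bxm_inp_dir: str,
    entrecnos: Iterable[int],
    audrecnos: Iterable[int],
) -> List[str]:
    """Assemble an mpxlink argument list for an IXM that retains Enterprise IDs.

    Hypothetical helper: it only builds the argument list and does not run mpxlink.
    """
    return [
        "mpxlink",
        "-entType", ent_type,          # required: entity type name
        "-bxmInpDir", bxm_inp_dir,     # required: .bin input directory
        "-ixmMode",                    # re-evaluate all entity sets
        "-bxmXeia",                    # consider existing Enterprise IDs
        "-noTskSets",                  # reduce run time
        "-noTskRelatedMembers",        # reduce run time
        "-entRecno", str(next_recno(entrecnos)),  # above every existing entrecno
        "-audRecno", str(next_recno(audrecnos)),  # above every existing audrecno
        "-audHead",                    # write mpi_audhead.unl
    ]


if __name__ == "__main__":
    # Example values are made up for illustration.
    args = build_ixm_link_args("id", "work/bxm", [101, 250, 17], [3000, 2999])
    print(" ".join(args))
```

Only the two options that Table 1 marks as required (-entType and -bxmInpDir) are added alongside the IXM options; add output directories and other options as your job needs.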

Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.

Table 1. mpxlink utility options
Option Type Description Default
-entType Name Entity type name. This option identifies the type of entity that is being computed. If you are implementing multiple entity types (for example, identity and household), you must run mpxlink for each type. This option is required and there is no default setting. NONE
-bxmInpDir dirName .bin input directory. The directory where the input binary (.bin) files to link are stored. Input files can be from the mpxcomp utility output, or other processes such as an IXM.

This directory is typically the work directory on the server that hosts your hub configuration. This option is required and there is no default setting.

You can list multiple directories for this option; separate multiple directories with single spaces.

NONE
-bxmOutDir dirName .bin output directory. Indicate where you want the BXM output files to be located. This directory is relative to the project's work directory:

WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work\bxm_output_dir

The utility also generates bulk cross match data in the designated BXM output directory.

NONE
-unlOutDir dirName .unl output directory. The directory in which you want the mpxlink output binary files to be located. Binary output files are used by the relationship linkers. The binary output file is named mpx_bxmxmem.bin.

This directory is typically relative to the work directory on the server that hosts the hub configuration.

Generating the output in binary form is optional; specifying an output directory with this option is what causes binary output to be generated. In other words, if no directory is specified here, no binary output is generated.

NONE
-nMemParts N Number of member partitions (MemParts). MemParts are used to partition the data set. Typically this partition is done for memory considerations. Because the mpxlink utility requires the entire input data set (for example, the binary files of comparison results) to be read into memory at one time, breaking the data set into smaller pieces allows them to fit into available memory.

The MemParts option differs from the MxmParts option in that MemParts breaks up the memHead and memCmpd data files, whereas MxmParts breaks up link and task files (the output of the mpxcomp utility).

The MemParts value set here must be the same as the MemParts value set in mpxcomp, and in the utility that created the input for mpxcomp (for example, mpxfsdvd, mpxprep, or mpxredvd). In other words, the MemParts setting in mpxcomp determines how many partitioned file segments are passed to mpxlink; the mpxlink MemParts setting must accurately reflect the number of partitioned file segments that come from mpxcomp.

There is a performance consideration to partitioning the data set: the higher the MemParts is set, the slower the mpxlink process.

Leave this value set to 1 unless memory is an issue. The maximum value is 100.

1
-nMxmParts N Number of maximum out partitions. Like MemParts, the MxmParts option partitions the output of the mpxcomp process. As with MemParts, this option is used when the output file is too large to be read into memory in its entirety, and needs to be broken up into smaller sections to fit into available memory.

The MxmParts option differs from MemParts in that MxmParts breaks up link and task files (the output of the mpxcomp utility), whereas MemParts breaks up the memHead and memCmpd data files.

The MxmParts value set here must be the same as the MxmParts value set in mpxcomp, which provides the input to mpxlink. In other words, the MxmParts setting in mpxcomp determines how many partitioned file segments are passed to mpxlink. The mpxlink utility MxmParts setting must accurately reflect the number of partitioned file segments that come from the mpxcomp utility.

Leave this value set to 1 unless memory is an issue. The maximum value is 100.

1
-{no}bxmDiff   Use explicit different records from entrule. This option controls whether mpxlink uses existing entity rules when forming entities. For example, if two members in an entity are separated in InfoSphere MDM Inspector, a non-identity rule is created by the operational server. (Likewise, if two members are manually linked, an identity rule is created.) The mpxrule utility captures these rules as "same" (identity) or "diff" (non-identity) rules. If you rerun a cross match on an existing database, including these rules prevents the mpxlink utility from re-forming linkages (in the case of a "diff" rule) or forces members to be in the same entity (in the case of a "same" rule).

The input data used here is created with the corresponding mpxcomp utility -bxmDiff option (Use explicit different records from entrule in InfoSphere MDM Workbench).

-noBxmDiff
-{no}bxmSame   Use explicit same records from entrule. Like -bxmDiff, this option controls whether the mpxlink utility uses existing entity rules when forming entities. See description for the -bxmDiff option.

The input data used here is created with the corresponding mpxcomp utility -bxmSame option (Use explicit same records from entrule in InfoSphere MDM Workbench).

-noBxmSame
-{no}bxmXeia   Use implicit link records from entlink. This option instructs mpxlink to include the output from the mpxxeia utility. The mpxxeia utility captures existing entity data. The input data used here is created with the corresponding mpxcomp utility -bxmXeia option (Use implicit link records from entlink in InfoSphere MDM Workbench). -noBxmXeia
-{no}bxmPD   Use potential duplicate task records from entxtsk. The mpxlink utility uses this data to form review identifier tasks that can be loaded into the database.

The input data used here is created with the corresponding mpxcomp utility -bxmRvid option (Use reviewid records from mpxcomp in InfoSphere MDM Workbench).

-noBxmPD
-{no}bxmPL   Use potential linkage task records from the mpxxtask utility (entxtsk), which captures existing task information from the database. -noBxmPL
-{no}bxmRI   Use review identifier task records from the mpxxtask utility (entxtsk), which captures existing task information from the database. -noBxmRI
-{no}bxmRule   Use member rule records from the mpxprep, mpxredvd, or mpxfsdvd utilities. Member rules express the relationship between the survivor and obsolete members in a merge. Because the input data used here is created by default in the mpxprep utility, it is not necessary to specify a corresponding option in mpxprep. -bxmRule
-{no}bxmLink   Use linkage records from the mpxcomp utility. The mpxlink utility uses this data to form entities that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility -bxmLink option (Use linkage records from mpxcomp in InfoSphere MDM Workbench). -bxmLink
-{no}bxmTask   Use task records from the mpxcomp utility. The mpxlink utility uses this data to form tasks that can be loaded into the database. The input data used here is created with the corresponding mpxcomp utility -bxmTask option (Use task records from mpxcomp in InfoSphere MDM Workbench). -bxmTask
-{no}bxmRvid   Use review identifier records from the mpxcomp utility. The mpxlink utility uses this data to form review identifier tasks that can be loaded into the database.

The input data used here is created with the corresponding mpxcomp utility -bxmRvid option (Use reviewid records from mpxcomp in InfoSphere MDM Workbench).

-bxmRvid
-{no}entLink   Instruct the operational server to write new linkages and entity level tasks to a .unl file (mpi_entlink.unl). -entLink
-{no}entXeia   Instruct the operational server to write historical Enterprise ID data to a .unl file (mpi_entxeia.unl). -entXeia
-{no}entXtsk   Instruct the operational server to write information about tasks related to an entity to a .unl file (mpi_entxtsk.unl). -entXtsk
-{no}seqGen   When specified, this option writes a .unl file that contains updated sequence generator numbers that can then be loaded into the database. The operational server normally updates this table properly on startup. This option is useful for an installation that, when doing multiple links, needs to update the sequence numbers without starting an operational server. -noSeqGen
-{no}tskSets   Compute full task set information. Assigns a task set number to a member in a task. A task set identifies a group (two or more) of records that are explicitly identified as being in a task.

For example, if memrecnos 1, 2, and 3 are in a Potential Duplicate task, they are all assigned tskset=1. If memrecnos 4 and 5 are in Potential Linkage task, they are assigned tskset=2, and so on.

Typically, you would not run mpxlink with the -tskSets option on. The tskSets functionality has two purposes. The first is reporting, so that you can see how many members are involved in a task (although a member can belong to only one task set, so counts can be off for a member that is the glue member). The count is also not updated after the task is created: it is valid only for the point in time when the task is first created. Therefore, later member additions or updates are not reflected in the count.

The second purpose is to make tasks that will not be automatically broken up by operational server scoring. For example, a member enters a witness protection program and thus has two sets of demographics. The operational server will not naturally join them in an entity or task, so this option could be used to create a task.

If you manually create a task, the tskSetno gets set just as it does when you run a bulk cross match (BXM) with the -tskSets option turned on. Setting this option instructs the operational server to leave the task alone, no matter what these members score. In this case, even if a member has demographic changes that would cause the task to go away, the operational server ignores them.

There might be some circumstances where you want this behavior (for example, in a data quality remediation [DQR] or some other process where you require the task population to remain static). However, the majority of implementations would not want this behavior.

-noTskSets
-{no}tskRelatedMembers   Create a count of members in a task so that when you have a trigger member, you can tell that there are n members in the task. The count is only calculated when a member is cross matched. -tskRelatedMembers
-{no}strict   Forces xeia (entity linkage) information to default to existing information (rules and prior data). Setting this option to -strict makes the mpxlink utility sensitive to anomalies in the data.

Disable this option to instruct mpxlink to ignore anomalies in the data, such as inconsistencies or discrepancies that arise from live updates to the table. (That is, discrepancies that might occur because data is changing from updates as it is being collected by the mpx utilities that create the input for mpxlink.)

This option is typically used for reporting purposes.

-strict
-ixmMode   Indicates IXM mode. Used for IXM only. FALSE
-entRecno N Used with .unl only. The starting entity record number for the .unl.

This option specifies the starting entity record number for the creation of the mpi_entlink_xx.unl file. The option is not required; if it is not set, the mpxlink utility uses 1 as the starting entity record number.

1
-tskRecno N Used with .unl only. Allows for specification of a starting task record number in the .unl file. This option reads the tskrecno from the mpi_seqgen table. mpi_seqgen.tskrecno
-audRecno N Used with .unl only. Common audRecno for all .unl files. This option sets the audit record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set the -audRecno option to an existing mpi_audhead record number. 2
-usrRecno N Used with .unl only. Common usrRecno for all .unl files. This option sets the user record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_usrhead user record number. 1
-ixnRecno N Used with .unl only. This setting is the ixnRecno for the audhead record. This option sets the transaction record number for the .unl files that are loaded into the mpi_audhead database table. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_ixnhead transaction record number. 71
-evtTypeno N Used with .unl only. This setting is the evtTypeno for the audhead record. Use this option to specify an event type for the audhead records. When the -{no}audHead option (Write mpi_audhead.unl in InfoSphere MDM Workbench) is enabled, you can set this option to an existing mpi_evttype event type number. 0
-{no}audHead   Used with .unl only. Writes the mpi_audhead.unl file and uses the audrecno specified in the -audRecno option (the common audit record number for all .unl files). This option is commonly used in new implementations where no audit records exist yet. -noAudHead
-bktOutDir dirName Used with NTE only; the output directory for the BXM files. NONE
-entBktd   Used with NTE only. Write entBktd information. Use this option and the -bktOutDir option together to allow the mpxlink utility to generate a binary bucket file that is used by the mpxcomp utility to rescore members that exist in the same transitive entity. This file is used for non-transitive entities to get scores between members who were brought together by a "glue" member and would not have a score generated by the traditional binary bucket file that is generated during the mpxprep or mpxfsdvd process. This setting allows a second pass that uses the mpxcomp and mpxlink utilities to produce accurate non-transitive entities. Although non-transitive entities can be produced with a single pass through the mpxcomp and mpxlink utilities, the two-pass approach improves accuracy. FALSE
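
The -nMemParts and -nMxmParts descriptions in Table 1 require the values given to mpxlink to match the values that were used for mpxcomp, with a stated maximum of 100. A job script could check this before it starts mpxlink. The following Python sketch shows one way to do that; the function name check_parts is hypothetical, and the lower bound of 1 is an assumption based on the default value rather than a documented minimum.

```python
MAX_PARTS = 100  # maximum stated in Table 1 for -nMemParts and -nMxmParts
MIN_PARTS = 1    # assumption: the default value is treated as the minimum


def check_parts(option: str, comp_value: int, link_value: int) -> None:
    """Raise ValueError if a partition count is out of range or differs between utilities.

    option      -- option name used in error messages, for example "-nMemParts"
    comp_value  -- value that was passed to mpxcomp
    link_value  -- value that is about to be passed to mpxlink
    """
    for utility, value in (("mpxcomp", comp_value), ("mpxlink", link_value)):
        if not MIN_PARTS <= value <= MAX_PARTS:
            raise ValueError(
                f"{option} for {utility} must be between {MIN_PARTS} and "
                f"{MAX_PARTS}, got {value}"
            )
    if comp_value != link_value:
        raise ValueError(
            f"{option} mismatch: mpxcomp used {comp_value}, mpxlink would use {link_value}"
        )


if __name__ == "__main__":
    check_parts("-nMemParts", 4, 4)      # consistent values: no error
    try:
        check_parts("-nMxmParts", 2, 3)  # mismatch: raises ValueError
    except ValueError as err:
        print(err)
```

Failing early on a mismatch is cheaper than discovering it partway through a long link run, because the number of partitioned file segments that mpxcomp produced is fixed by the time mpxlink starts.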