mpxfsdvd utility

The mpxfsdvd utility enables the creation of bulk cross match (BXM) files from .unl files.

This utility is a data derivation method that uses pre-existing member unload files to extract and create comparison strings, bucket hashes, and binaries. It is most commonly used when you have changed your algorithm but the data itself has not changed. You can run this utility from a command line or from the InfoSphere® MDM Workbench by selecting Virtual MDM > New Job Set.

Keep these items in mind when preparing to run mpxfsdvd:
  • All options and flags are case-insensitive; option values are case-sensitive.
  • Both -unlInpDir and -unlInpSegs are required.
  • At least one output target must be specified: -unlOutDir together with -unlOutSegs, -bxmOutDir, or both.
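The rules in the list above can be checked before a run is launched. The following Python sketch is only an illustration, not part of the product: the function name check_mpxfsdvd_options and the dictionary layout are hypothetical, the option values are placeholders, and the sketch assumes that -unlOutDir and -unlOutSegs count as an output target only when both are given.

```python
def check_mpxfsdvd_options(options):
    """Return a list of problems found in a proposed set of mpxfsdvd options.

    `options` maps option names (with the leading hyphen) to their values.
    This helper is hypothetical and only mirrors the rules listed above.
    """
    # Option names are case-insensitive, so compare them in lower case.
    # Option values are case-sensitive and are left untouched.
    names = {name.lower() for name in options}
    problems = []

    # Both input options are required.
    for required in ("-unlInpDir", "-unlInpSegs"):
        if required.lower() not in names:
            problems.append(f"{required} is required")

    # Assumption: the .unl output target needs both of its options.
    has_unl_output = "-unloutdir" in names and "-unloutsegs" in names
    has_bxm_output = "-bxmoutdir" in names
    if not (has_unl_output or has_bxm_output):
        problems.append(
            "specify -unlOutDir with -unlOutSegs, or -bxmOutDir, or both"
        )
    return problems


# Placeholder values; real directory names and segment lists will differ.
print(check_mpxfsdvd_options({
    "-UNLINPDIR": "UNL_INPUT_DIR",
    "-unlinpsegs": "SEGMENT_LIST",
    "-bxmOutDir": "BXM_OUTPUT_DIR",
}))
```

An empty list means that the option set satisfies the three rules; it says nothing about whether the directories or segments themselves are valid.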
Important: If you want to preserve Entity IDs (entrecnos) when running an incremental cross match (IXM), you must use the -ixmmode option when running the mpxfsdvd utility. If -ixmmode is not specified, the downstream mpxlink utility process starts with the current entity set. Setting -ixmmode causes the mpxlink utility to re-evaluate all entity sets, preserving the previous Entity ID when all previous members are present in the new entity set.

Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.

Table 1. mpxfsdvd utility options
Option Type Description Default
-unlInpDir dirName Location of .unl files. The mpxfsdvd utility reads the member attribute data from the .unl files in the directory specified here. This directory is relative to the project work directory on the hub:

WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work\UNL_INPUT_DIR

NONE
-unlInpSegs segList List of segments contained in the .unl files. NONE
-unlOutDir dirName .unl file output directory. The output of the mpxfsdvd utility is the derived data segments (comparison, bucket, and, optionally, query data), which have their own .unl files (mpi_memcmpd, mpi_membktd, and mpi_memqryd) written to the directory specified here. This directory is relative to the project work directory on the hub:

WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work\UNL_OUTPUT_DIR

Use this option together with -unlOutSegs to have mpxfsdvd generate .unl files during processing. Depending on the segments listed in -unlOutSegs, mpxfsdvd creates .unl files containing bucket data or comparison data, or generates query .unl files (which are used by the relationship linker).

NONE
-unlOutSegs segList Attribute segments to output. Use this option together with -unlOutDir to have mpxfsdvd generate .unl files during processing. Depending on the segments listed, mpxfsdvd creates .unl files containing bucket data or comparison data, or generates query .unl files (which are used by the relationship linker). NONE
-encoding   Encoding of .unl files. Valid values are LATIN1, UTF8, and UTF16. LATIN1
-bxmOutDir dirName .bin output directory. Indicates where you want the BXM output files to be located. This directory is relative to the project work directory on the hub:

WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work\BXM_OUTPUT_DIR

NONE
-{no}bxmBktd   Generate MEMBKTD output. -bxmBktd
-{no}bxmCmpd   Generate MEMCMPD output. -bxmCmpd
-{no}bxmQryd   Generate MEMQRYD output. This option is for use with the relationship linker and instructs mpxfsdvd to create BXM files containing query data. The relationship types, attributes, and rules should already be defined so that mpxfsdvd knows what data to include in the BXM file. -bxmQryd
-nMemParts N Number of member partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the hub. The utility that consumes the mpxfsdvd output (such as mpxfreq) must use a matching “memparts” value. Leave this option at the default unless memory is constrained. The more member partitions you use, the slower the mpxcomp process runs, because the hub must do more duplicate comparisons. 1
-nBktParts N Number of bucket partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the hub. Leave this setting at the default unless memory is constrained. 1
-minBktTag N Minimum bucket tag to use (0=any). The lowest bucketing role designation used in the algorithm to include in the process. 0
-maxBktTag N Maximum bucket tag to use (0=any). The highest bucketing role designation used in the algorithm to include in the process. 0
-nQryParts N Number of query partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the hub. Leave this setting at the default unless memory is constrained. This option is enabled only when the option to Generate query BXM is also enabled. 1
-minQryRole N Minimum query role to use (0=all). The lowest query role designation used in the algorithm to include in the process. This option is enabled only when the option to Generate query BXM is also enabled. 0
-buffSize N Size for each file input and output (I/O) buffer. 65536
-memType memName Member type name. If you have multiple member types in the hub database and need to derive data for only one of those member types, select the member type here; otherwise, select ALL. NONE
-entType entName Entity type name NONE
-skipRecs N Number of member records to skip before re-deriving members from the specified input files. Processing begins with the next member read from MEMHEAD after skipping this number of records.

When used with the -maxRecs option, this parameter lets you set a range of members from the specified input file to process.

0
-maxRecs N Maximum number of member records to re-derive from the specified input files. When this option is used together with -skipRecs, the number includes the skipped records.

When used with the -skipRecs option, this parameter lets you set a range of members from the specified input file to process. This option is useful when running multiple instances of mpxfsdvd against the same set of input files.

Unlimited
-maxErrs N Maximum errors before halting processing. This option sets a threshold for errors in the data. Once the threshold is reached, the mpxfsdvd utility stops. The intent of this option is to allow you to process a set of input .unl files with tolerance for data issues. For example, if the .unl file has an incorrect number of fields, the member record is rejected and re-derivation does not complete for that member. The mpxfsdvd utility writes detailed information into the log file, including the line number, input file, and reason for the rejection. 100
-{no}HeadSql flag Generates SQL output. Instructs the mpxfsdvd utility to generate an SQL file in the specified .unl output directory. If a .unl output directory is not specified, the output is written to the BXM output directory. This SQL file contains a query against the mpi_memhead table for members that were identified as missing. These members are identified when there is an attribute row that does not have a corresponding head row. -noHeadSql
-ixmmode   This true or false option sets the IXM mode. In IXM mode, a subset of members is compared rather than the entire member set. If you are running a BXM, use the default of false. If you are running an IXM, set this option to true. FALSE
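The -skipRecs and -maxRecs descriptions in Table 1 note that the two options together define a range of members, and that -maxRecs includes the skipped records. The following Python sketch shows one way to compute such ranges when several mpxfsdvd instances share the same input files. It is an illustration only: the function name record_ranges is hypothetical, the record count of 1,000,000 is made up, and the sketch assumes that -maxRecs acts as an end offset (records skipped plus records processed), which is one reading of the description.

```python
def record_ranges(total_records, instances):
    """Split total_records into contiguous (skip_recs, max_recs) pairs.

    Hypothetical helper: one pair per mpxfsdvd instance.
    """
    if total_records < 0 or instances < 1:
        raise ValueError("total_records must be >= 0 and instances >= 1")

    # Spread any remainder over the first few instances.
    base, extra = divmod(total_records, instances)
    ranges = []
    skip = 0
    for i in range(instances):
        count = base + (1 if i < extra else 0)
        # Assumption: -maxRecs includes the skipped records, so the value
        # is the skip count plus the number of records to process.
        ranges.append((skip, skip + count))
        skip += count
    return ranges


# Example with a made-up record count and four instances.
for number, (skip_recs, max_recs) in enumerate(record_ranges(1_000_000, 4), 1):
    print(f"instance {number}: -skipRecs value {skip_recs}, -maxRecs value {max_recs}")
```

Each pair covers a distinct slice of the member records, and the slices together cover the whole input. The sketch only computes the values; how each instance is launched is not shown.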