mpxdata utility

The mpxdata utility uses raw data to build member unload files (.unl), generate comparison strings, assign bucket hashes, and create binary files.

Use this utility to specify memput (member put) or memcompute (member compute) interactions based on file input. You can run mpxdata from InfoSphere® MDM Workbench by selecting Master Data Management > New Job Set.

The mpxdata utility performs several steps, from parsing data into .unl files to deriving data and organizing member records into buckets. It also creates binary files, which allow data to be compared faster than by scanning through strings. The utility parses raw data extracts into attribute-specific sets of data. For example, it can take a single record for a person and create one record for the address elements, another for the name elements, and a third for the telephone numbers. Parsing allows the hub to store multiple iterations of active and inactive data (such as a former address or phone number) and increases responsiveness when searching and comparing.
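As a rough illustration of attribute-specific parsing, the following Python sketch splits one pipe-delimited person record into separate name, address, and phone rows. The field layout, the segment names, and the `split_record` helper are hypothetical and chosen only for this example; mpxdata itself takes the field definitions from the configuration file.

```python
# Hypothetical input layout: srcCode|memIdnum|last|first|street|city|phone
FIELDS = ["srcCode", "memIdnum", "last", "first", "street", "city", "phone"]

# Hypothetical mapping of segment name -> fields that belong to that segment.
SEGMENTS = {
    "name": ["last", "first"],
    "address": ["street", "city"],
    "phone": ["phone"],
}

def split_record(line, delim="|"):
    """Split one raw record into one row per attribute segment."""
    values = dict(zip(FIELDS, line.rstrip("\n").split(delim)))
    key = (values["srcCode"], values["memIdnum"])
    rows = []
    for segment, names in SEGMENTS.items():
        rows.append((segment, key, [values[n] for n in names]))
    return rows

rows = split_record("SRC1|1001|Smith|Ann|12 Oak St|Austin|555-0100")
for row in rows:
    print(row)
```

Each output row carries the record identifier, so the segments can be stored and compared independently while still pointing back to the same member.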

The mpxdata utility logic can process multiple attributes for a single member from the input data file. The attribute rows in the data file are grouped together by the record identifier (source code and memidnum pair), which means all attribute rows for the same member are contiguous. Duplicate values are treated as a single value, and empty values are skipped. Active (nsactive) and maximum (nsexist) settings are enforced before the attribute values and derived data are written into the output file.
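The grouping rules above can be pictured with a short Python sketch. It is an illustration only, not the mpxdata implementation: the row layout `(srcCode, memIdnum, attrCode, value)` and the `collect_members` helper are assumptions, and the nsactive and nsexist limits are not modeled.

```python
from itertools import groupby

# Hypothetical attribute rows: (srcCode, memIdnum, attrCode, value).
# Rows for the same member are assumed to sit next to each other in the file.
ROWS = [
    ("SRC1", "1001", "PHONE", "555-0100"),
    ("SRC1", "1001", "PHONE", "555-0100"),  # duplicate: kept once
    ("SRC1", "1001", "PHONE", ""),          # empty: skipped
    ("SRC1", "1001", "PHONE", "555-0199"),
    ("SRC1", "1002", "PHONE", "555-0142"),
]

def collect_members(rows):
    """Yield ((srcCode, memIdnum), {attrCode: [values]}) per adjacent group."""
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        attrs = {}
        for _, _, attr, value in group:
            if not value:
                continue  # empty values are skipped
            values = attrs.setdefault(attr, [])
            if value not in values:
                values.append(value)  # duplicates collapse to one value
        yield key, attrs

members = list(collect_members(ROWS))
for key, attrs in members:
    print(key, attrs)
```

Because `groupby` only merges adjacent rows, the sketch relies on the same ordering assumption as the text: rows for one member that are separated by another member's rows would form two groups.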

All options and flags are case-insensitive; option values are case-sensitive.

Restriction: When running the mpxdata utility, you must use an ODBC connection. Set the MAD_CTXLIB variable to ODBC.

Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.
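A minimal Python sketch of preparing an mpxdata run is shown here. It sets MAD_CTXLIB to ODBC as the restriction requires and assembles a MEMCOMPUTE command line from options listed in Table 1. The file and directory names (`member.cfg`, `bxm_out`) are hypothetical, the utility is assumed to be on the PATH, and other environment variables that your operational server needs are not shown. The sketch prints the command rather than running it.

```python
import os
import shlex

# Copy the current environment and apply the ODBC restriction noted above.
env = dict(os.environ, MAD_CTXLIB="ODBC")

# Hypothetical file and directory names; the flags are those listed in Table 1.
args = [
    "mpxdata",                 # assumes the utility is on PATH
    "-ixnCode", "MEMCOMPUTE",
    "-config", "member.cfg",
    "-inpFile", "input.dat",
    "-fldDelim", "|",
    "-maxErrs", "100",
    "-bxmOutDir", "bxm_out",
]

# Print the command for review instead of executing it.
print("MAD_CTXLIB=" + env["MAD_CTXLIB"], shlex.join(args))
```

To run the command from the same script, the list and environment could be passed to `subprocess.run(args, env=env, check=True)`.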

Table 1. mpxdata utility options
Option Type Description Default
-ixnCode ixnCode Interaction code, either MEMCOMPUTE or MEMPUT. MEMPUT inserts or updates members in an existing hub database for each record in the input file. MEMCOMPUTE generates .unl files, which can then be loaded by using the madunlload utility or another load utility. When processing an extract, MEMCOMPUTE is most often used because loading a .unl file is faster than inserting each member through MEMPUT. NONE
-putType enumVal Put type (MEMPUT only). Choices are insert_update, insert_only, and update_only. This option works at a member level, not an attribute level.
  • insert_only restricts the operational server to creating a member. If a member exists for this srcCode and memIdnum combination, the interaction fails with an error code of EXISTS.
  • update_only restricts the operational server to updating existing members only. If an attempt is made to update a member that does not exist, the interaction fails with an error code of ENOREC.
  • insert_update adds a member if one does not exist. If the member does exist, an update is made.
MPI_PUTTYPE_INSERT_UPDATE
-memMode enumVal Member mode (MEMPUT only). Choices are complete, partial, attrcomp, and explicit.
  • Partial is used when a source system sends an update to a member, but you do not know whether the input is a complete picture of the member, or if you have the complete range of values for a given attribute.
  • Attrcomp stands for attribute complete. Like the partial mode, the attrcomp mode tells the operational server that it might not have a complete picture of all the attributes that make a complete member. However, for the attributes that are present, all known values for the member are included in the member put interaction.
  • Complete tells the engine that the input to the member put interaction contains all of the values for all of the attributes defined for this member type.
  • Explicit is used in situations where you want to control exactly what is stored for the member and the record status of the attributes that are being stored.
MPI_MEMMODE_COMPLETE
-entPrior N Sets the entity management priority. Use this option when you want to set the priority at member write and override any default entity priority previously set for the associated source. source priority (Default Entity Priority setting in InfoSphere MDM Workbench)
-config fileName Name of configuration file. This file is a specially formatted file that defines the fields for the data input file. NONE
-recSize N Fixed-length record size (for fixed-length input files). Add the appropriate end-of-line characters to this value. NONE
-fldDelim delimChar Field delimiter character for variable length record fields |
-inpFile fileName Input file name defined by the configuration file name and either fixed length or delimited by the field delimiter (fldDelim) character. The location of this file is in the hub instance directory project workspace: (WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work). input.dat
-rejFile filename Rejected record file. If the mpxdata utility is unable to parse data as it reads each row in the input file, it writes that data to the rejects file. The mpxdata utility continues to parse remaining data, adding any additional rejected data to the rejects file. The default file name is rejects.txt. reject.dat
-maxRecs N Maximum number of records to process before ending the mpxdata process. This option is disabled when the Process all records option is selected. Unlimited
-maxErrs N Maximum number of errors allowed before stopping the process. This option sets a threshold for errors in the data. After the threshold is reached, the mpxdata utility stops; this stop is not itself considered an mpxdata error. You can use this option to process an extract with tolerance for known data issues. For example, if the delimited extract file has too few or too many delimiters in a few records, you can set this option to an expected value. If the value is exceeded, mpxdata stops and gives you an opportunity to resolve the problem in the extract data or configuration file. The mpxdata utility writes records it cannot parse to the rejects file. The total error count (totErrs) is reported in the mpxdata log as an INFO message:

06:59:53 mpxdata INFO MPX_BxmData: totRecs=6, totErrs=3, elapsed=1 secs., recs/sec=6, minbkttag=0, maxbkttag=0, nMemParts=1, nBktParts=1, buffsize=65536

100
-skipRecs N Number of records to skip before beginning processing. If there are any rows of text in the input file before the data rows begin, indicate how many rows to ignore. The number of skipped rows does not include lines that are commented out with the hash (#) character. 0
-rptRecs N Report records processed interval. The mpxdata log reports a status every n records. You might want to decrease the frequency to reduce the log output for large data sets, or increase it to get more granularity. 100000
-buffSize N Size for each file input and output buffer 65536
-verbose   Show progress information FALSE
-noexec   Show SQL statements only; no execution is processed FALSE
-encoding   Encoding of .unl files; options are LATIN1, UTF8, or UTF16 LATIN1
-methods   Output the method data, but do not process data. NONE
-version   Output the version information NONE
-memRecno N MEMCOMPUTE ONLY

Starting memrecno. The value supplied is used as the first memrecno in the .unl files and is incremented by one for each additional record.

1
-audRecno N MEMCOMPUTE ONLY

Common audrecno. The value supplied is used as the first audrecno in the .unl files and is incremented by one for each additional record.

2
-historicalAudhead   MEMCOMPUTE ONLY

When this option is enabled, the audit record number (audrecno) is incremented by one for each history version of the record. With the -historicalAudhead option enabled, the behavior of -audRecno is overridden.

Remember: The behavior of -audRecno is that the audit record number is incremented by one for each additional record. This means that, for a record containing multiple historical versions, the audit record number for these historical versions is the same. The -historicalAudhead option changes this setting.
FALSE
-unlOutDir dirName MEMCOMPUTE ONLY

This option identifies the .unl file output directory. Used with -unlOutSegs, this option instructs mpxdata to create .unl files after reading and parsing the extract file.

NONE
-unlOutSegs segList MEMCOMPUTE ONLY

The attribute segments to include in the output. Used with -unlOutDir, this option instructs mpxdata to create .unl files after reading and parsing the extract file. Use this option to select the attributes (segments) to be included in the .unl output files.

NONE
-unlAudSegs segList Use this option to specify the audit segments you want to create during the mpxdata process and include in the .unl files. Use this option to preserve the record creation or last modified time and map it to the evtctime field, as well as mapping additional source system information to the evtType, evtInitiator, and evtLocation audit fields. The mapping of these fields is available in the .cfg file specified by the -config option. You can choose audhead or audxmem. When using the -unlAudSegs option, the -audhead option is unavailable and the -audRecno indicates the starting audrecno for the records that are being created. NONE
-bxmOutDir dirName MEMCOMPUTE ONLY

.bin output directory. Specifies the directory where the bulk cross match (BXM) files are saved.

NONE
-{no}bxmBktd   Generate member bucket (MEMBKTD) output -bxmBktd
-{no}bxmCmpd   Generate member comparison data (MEMCMPD) output -bxmCmpd
-{no}bxmQryd   Generate member query data (MEMQRYD) output. This option is used with the relationship linker and instructs mpxdata to create BXM files that contain query data. The relationship types, attributes, and rules must already be defined, so that mpxdata knows what data to include in the BXM file. -bxmQryd
-nMemParts N MEMCOMPUTE ONLY

Number of member partitions. Setting this partition depends on the size of your data set, your algorithms, and how much memory you have on the server. The utility that uses the mpxdata output (such as mpxfreq) must use a matching memparts value. Leave this option at the default unless you need more memory. The higher the number of member partitions, the slower your mpxcomp process, because the hub must do more duplicate comparisons.

1
-nBktParts N MEMCOMPUTE ONLY

Number of bucket partitions. Setting this partition depends on the size of your data set, your algorithms, and how much memory you have on the hub server. Leave this option at the default unless you need more memory.

1
-minBktTag N MEMCOMPUTE ONLY

Minimum bucket tag to use (0=any). Specifies the lowest bucketing role to be included in the operation.

0
-maxBktTag N MEMCOMPUTE ONLY

Maximum bucket tag to use (0=any). Specifies the highest bucketing role to be included in the operation.

0
-nQryParts N MEMCOMPUTE ONLY

Number of query partitions. Setting this partition depends on the size of your data set, your algorithms, and how much memory you have on the hub server. Leave this option at the default unless you need more memory. This option is enabled only when the option to Generate query BXM is also enabled.

1
-minQryRole N MEMCOMPUTE ONLY

Minimum query role to use (0=any) when the Generate query BXM option is enabled. This option specifies the lowest query role to be included in the operation.

0
-audhead   MEMCOMPUTE ONLY

Write audhead records. When enabled, audhead records are written to .unl files (to be uploaded to the database later).

FALSE
-append   MEMCOMPUTE ONLY

Append to .unl files. This option applies only to the .unl files. If you are processing multiple extract data files (for example, from different sources), mpxdata writes new (or overwrites existing) .unl files when this option is not used. If used, the new .unl data written by mpxdata is added to the end of the existing .unl file.

FALSE
-memType memName MEMCOMPUTE ONLY

Member type name. This option sets a filter on the output of mpxdata for the specified member type. Setting this field to ALL processes all member types.

NONE
-entType entName MEMCOMPUTE ONLY

Entity type name.

NONE
-strInpDir dirName Allows specification of a directory that contains string (str) tables as .unl files to update or append the contents in the dictionary. NONE
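
The interaction between -audRecno and -historicalAudhead described in Table 1 can be pictured with a small Python sketch. It is one reading of the option descriptions, not the utility's actual code: the `assign_audrecnos` helper and the sample version counts are hypothetical, and the starting value 2 is the documented -audRecno default.

```python
def assign_audrecnos(version_counts, start, historical=False):
    """Return one list of audrecno values per record.

    version_counts[i] is the number of history versions of record i.
    """
    result = []
    next_no = start
    for count in version_counts:
        if historical:
            # Every history version receives its own audrecno.
            result.append(list(range(next_no, next_no + count)))
            next_no += count
        else:
            # All history versions of a record share one audrecno.
            result.append([next_no] * count)
            next_no += 1
    return result

counts = [3, 1, 2]  # three records with 3, 1, and 2 history versions
print(assign_audrecnos(counts, start=2))
print(assign_audrecnos(counts, start=2, historical=True))
```

With the sample counts, the first call numbers the three records 2, 3, and 4, while the second call spreads the values 2 through 7 across the six history versions.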