mpxdata utility
The mpxdata utility uses raw data to build member unload files (.unl), generate comparison strings, assign bucket hashes, and create binary files.
Use this utility to specify memput (member put) or memcompute (member compute) interactions based on file input. You can run mpxdata from InfoSphere® MDM Workbench.
The mpxdata utility performs several steps in a single run, from parsing data into .unl files to deriving data and organizing member records into buckets. It also creates binary files, which allow data to be compared faster than scanning through strings. The utility parses raw data extracts into attribute-specific sets of data. For example, it can take a single record for a person and create one record for the address elements, another for the name elements, and a third for the telephone numbers. Parsing allows the hub to store multiple iterations of active and inactive data (such as a former address or phone number) and increases responsiveness when searching and comparing.
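The parsing step described above can be pictured with a minimal Python sketch. It is an illustration only, not the utility's implementation: the `FIELD_TO_SEGMENT` mapping, the field names, and the `split_record` helper are all hypothetical, and in practice the field definitions come from the mpxdata configuration file.

```python
from collections import defaultdict

# Hypothetical mapping from input field names to attribute segments.
# In mpxdata, the real field definitions come from the configuration file.
FIELD_TO_SEGMENT = {
    "first_name": "name",
    "last_name": "name",
    "street": "address",
    "city": "address",
    "home_phone": "phone",
}


def split_record(record):
    """Split one flat record into one dict per attribute segment."""
    segments = defaultdict(dict)
    for field, value in record.items():
        segment = FIELD_TO_SEGMENT.get(field)
        # Unmapped fields and empty values are left out of the output.
        if segment is not None and value:
            segments[segment][field] = value
    return dict(segments)


person = {
    "first_name": "Ana",
    "last_name": "Lopez",
    "street": "12 Elm St",
    "city": "Austin",
    "home_phone": "555-0100",
}
parts = split_record(person)
for segment, fields in parts.items():
    print(segment, fields)
```

One flat person record becomes three attribute-specific records (name, address, phone), which is the shape the paragraph above describes.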
The mpxdata utility can process multiple attributes for a single member from the input data file. The attribute rows in the data file are grouped by the record identifier (source code and memidnum pair), which means that all attribute rows for the same member must be contiguous. Duplicate values are treated as a single value, and empty values are skipped. Active (nsactive) and maximum (nsexist) settings are enforced before the attribute values and derived data are written into the output file.
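The grouping rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the utility's actual logic: the `group_member_values` helper and the sample rows are hypothetical, only the maximum (nsexist) limit is modeled, and keeping the first values in input order is an assumed truncation rule. Active versus inactive handling (nsactive) is not modeled.

```python
from itertools import groupby


def group_member_values(rows, nsexist):
    """Group (srccode, memidnum, value) rows per member.

    Rows for the same member are assumed to be contiguous, because
    itertools.groupby only merges adjacent rows with the same key.
    """
    result = {}
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        values = []
        for _, _, value in group:
            if not value:        # empty values are skipped
                continue
            if value in values:  # duplicates are treated as a single value
                continue
            values.append(value)
        # Assumption: keep the first nsexist values in input order.
        result[key] = values[:nsexist]
    return result


rows = [
    ("SRC1", "1001", "555-0100"),
    ("SRC1", "1001", ""),
    ("SRC1", "1001", "555-0100"),
    ("SRC1", "1001", "555-0199"),
    ("SRC1", "1001", "555-0142"),
    ("SRC2", "2001", "555-0111"),
]
grouped = group_member_values(rows, nsexist=2)
print(grouped)
```

Because `groupby` merges only adjacent rows, the sketch also shows why contiguity matters: rows for one member that are split across the file would form separate groups.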
All options and flags are case-insensitive; option values are case-sensitive.
Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.
Option | Type | Description | Default |
---|---|---|---|
-ixnCode | ixnCode | Interaction code, either MEMCOMPUTE or MEMPUT. MEMPUT inserts or updates members in an existing hub database for each record in the input file. MEMCOMPUTE generates .unl files, which can then be loaded by using the madunlload utility or another load utility. When processing an extract, MEMCOMPUTE is most often used because loading a .unl file is faster than inserting each member through MEMPUT. | NONE |
-putType | enumVal | Put type (MEMPUT only). Choices are insert_update, insert_only, and update_only. This option works at a member level, not an attribute level. | MPI_PUTTYPE_INSERT_UPDATE |
-memMode | enumVal | Member mode (MEMPUT only). Choices are complete, partial, attrcomp, and explicit. | MPI_MEMMODE_COMPLETE |
-entPrior | N | Sets the entity management priority. Use this option to set the priority at member write and override any default entity priority previously set for the associated source. | Source priority (Default Entity Priority setting in InfoSphere MDM Workbench) |
-config | fileName | Name of the configuration file. This specially formatted file defines the fields for the data input file. | NONE |
-recSize | N | Fixed-length record size (for fixed-length input files). Add the appropriate end-of-line characters to this value. | NONE |
-fldDelim | delimChar | Field delimiter character for variable-length record fields. | \| |
-inpFile | fileName | Name of the input file. The file layout is defined by the configuration file, and the file is either fixed length or delimited by the field delimiter (-fldDelim) character. The file is located in the hub instance directory project workspace: (WAS_PROFILE_HOME\installedApps\YOUR_CELL_NAME\MDM-native-IDENTIFIER.ear\native.war\work\project_name\work). | input.dat |
-rejFile | fileName | Rejected record file. If the mpxdata utility cannot parse a row as it reads the input file, it writes that row to the rejects file. The utility continues to parse the remaining data and adds any further rejected data to the rejects file. The default file name is rejects.txt. | reject.dat |
-maxRecs | N | Maximum number of records to process before ending the mpxdata process. This option is disabled when the Process all records option is selected. | Unlimited |
-maxErrs | N | Maximum number of errors allowed before stopping the process. This option sets a threshold for errors in the data. After the threshold is reached, the mpxdata utility stops (this is not considered an mpxdata error). You can use this option to process an extract with tolerance for known data issues. For example, if the delimited extract file has too few or too many delimiters in a few records, you can set this option to an expected value. If the value is exceeded, mpxdata stops and gives you an opportunity to resolve the problem in the extract data or configuration file. The mpxdata utility writes records that it cannot parse to the rejects file. The total error count (totErrs) is reported in the mpxdata log as an INFO message. | 100 |
-skipRecs | N | Number of records to skip before beginning processing. If the input file contains rows of text before the data rows begin, indicate how many rows to ignore. The number of skipped rows does not include lines that are commented out with the hash (#) character. | 0 |
-rptRecs | N | Report interval for processed records. The mpxdata log reports a status every N records. You might want to decrease the frequency to reduce the log output for large data sets, or increase it to get more granularity. | 100000 |
-buffSize | N | Size of each file input and output buffer. | 65536 |
-verbose | | Show progress information. | FALSE |
-noexec | | Show SQL statements only; no execution is processed. | FALSE |
-encoding | | Encoding of .unl files; options are LATIN1, UTF8, or UTF16. | LATIN1 |
-methods | | Output the method data, but do not process data. | NONE |
-version | | Output the version information. | NONE |
-memRecno | N | MEMCOMPUTE only. Starting memrecno. The value supplied is used as the first memrecno in the .unl files and is incremented by one for each additional record. | 1 |
-audRecno | N | MEMCOMPUTE only. Common audrecno. The value supplied is used as the first audrecno in the .unl files and is incremented by one for each additional record. | 2 |
-historicalAudhead | | MEMCOMPUTE only. When this is enabled, the audit record number ( Remember: The behavior of -audRecno is that the audit record number is incremented by one for each additional record. This means that, for a record containing multiple historical versions, the audit record number for these historical versions is the same. The -historicalAudhead option changes this setting. | FALSE |
-unlOutDir | dirName | MEMCOMPUTE only. This option identifies the | NONE |
-unlOutSegs | segList | MEMCOMPUTE only. The attribute segments to include in the output. Used with | NONE |
-unlAudSegs | segList | Use this option to specify the audit segments that you want to create during the mpxdata process and include in the .unl files. Use this option to preserve the record creation or last modified time and map it to the evtctime field, as well as to map additional source system information to the evtType, evtInitiator, and evtLocation audit fields. The mapping of these fields is available in the .cfg file specified by the -config option. You can choose audhead or audxmem. When you use the -unlAudSegs option, the -audhead option is unavailable and the -audRecno option indicates the starting audrecno for the records that are being created. | NONE |
-bxmOutDir | dirName | MEMCOMPUTE only. .bin output directory. Specifies the directory where the bulk cross match (BXM) files are saved. | NONE |
-{no}bxmBktd | | Generate member bucket (MEMBKTD) output. | -bxmBktd |
-{no}bxmCmpd | | Generate member comparison data (MEMCMPD) output. | -bxmCmpd |
-{no}bxmQryd | | Generate member query data (MEMQRYD) output. This option is used with the relationship linker and instructs mpxdata to create BXM files that contain query data. The relationship types, attributes, and rules must already be defined, so that mpxdata knows what data to include in the BXM file. | -bxmQryd |
-nMemParts | N | MEMCOMPUTE only. Number of member partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory you have on the server. The utility that uses the mpxdata output (such as mpxfreq) must use a matching memparts value. Leave this option at the default unless you need more memory. The higher the number of member partitions, the slower your mpxcomp process, because the hub must do more duplicate comparisons. | 1 |
-nBktParts | N | MEMCOMPUTE only. Number of bucket partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory you have on the hub server. Leave this option at the default unless you need more memory. | 1 |
-minBktTag | N | MEMCOMPUTE only. Minimum bucket tag to use (0=any). Specifies the lowest bucketing role to be included in the operation. | 0 |
-maxBktTag | N | MEMCOMPUTE only. Maximum bucket tag to use (0=any). Specifies the highest bucketing role to be included in the operation. | 0 |
-nQryParts | N | MEMCOMPUTE only. Number of query partitions. The appropriate value depends on the size of your data set, your algorithms, and how much memory you have on the hub server. Leave this option at the default unless you need more memory. This option is enabled only when the option to Generate query BXM is also enabled. | 1 |
-minQryRole | N | MEMCOMPUTE only. Minimum query role to use (0=any) when the | 0 |
-audhead | | MEMCOMPUTE only. Write audhead records. When enabled, audhead records are written to .unl files (to be uploaded to the database later). | FALSE |
-append | | MEMCOMPUTE only. Append to .unl files. This option applies only to the .unl files. If you are processing multiple extract data files (for example, from different sources) and this option is not used, mpxdata writes new .unl files or overwrites existing ones. If this option is used, the new .unl data written by mpxdata is added to the end of the existing .unl file. | FALSE |
-memType | memName | MEMCOMPUTE only. Member type name. This option filters the output of mpxdata for the specified member type. Setting this option to ALL processes all member types. | NONE |
-entType | entName | MEMCOMPUTE only. Entity type name. | NONE |
-strInpDir | dirName | Specifies a directory that contains string (str) tables as .unl files, which are used to update or append to the contents of the dictionary. | NONE |
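
The way the record-counting options in the table (-skipRecs, -maxRecs, and -maxErrs) might interact when a delimited input file is read can be sketched in Python. This is an illustration only, not the utility's implementation: the `read_records` function and its parameters are hypothetical, a row counts as an error here only when its field count is wrong, rejected rows are assumed to count toward the record limit, and processing is assumed to stop once the error count exceeds the threshold.

```python
def read_records(lines, n_fields, delim="|", skip_recs=0,
                 max_recs=None, max_errs=100):
    """Return (accepted, rejected) rows from a delimited input."""
    accepted, rejected = [], []
    skipped = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # comment lines are ignored and not counted as skipped
        if skipped < skip_recs:
            skipped += 1  # analogous to -skipRecs
            continue
        # Assumption: rejected rows count toward the record limit.
        if max_recs is not None and len(accepted) + len(rejected) >= max_recs:
            break  # analogous to -maxRecs
        fields = line.split(delim)
        if len(fields) != n_fields:
            rejected.append(line)  # would be written to the rejects file
            if len(rejected) > max_errs:
                break  # assumption: stop once the threshold is exceeded
            continue
        accepted.append(fields)
    return accepted, rejected


lines = [
    "# header comment",
    "id|name|phone",
    "1|Ana|555-0100",
    "2|Ben",
    "3|Cy|555-0101|extra",
    "4|Di|555-0102",
]
accepted, rejected = read_records(lines, n_fields=3, skip_recs=1, max_errs=1)
print(accepted)
print(rejected)
```

In this run the comment line is ignored, the column-name row is skipped, and reading stops at the second malformed row, so the last valid row is never processed.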