Derive Data from UNLs (mpxfsdvd)
The Derive Data from UNLs (mpxfsdvd) job is a data derivation method that uses pre-existing member unload (UNL) files to extract and create comparison strings, bucket hashes, and binaries. It is most commonly used when you have changed your algorithm but the data itself has not changed.
The following table describes the job options.
Workbench | Command line | Description |
---|---|---|
Member Type | -memType | If you have multiple member types in the operational server database and only need to derive data for one of those member types, select the member type here; otherwise, select ALL. Default: ALL |
Inputs and Outputs | ||
UNL input directory | -unlInpDir | Derive Data from UNLs (mpxfsdvd) reads the member attribute data (specified on the Attribute Types tab) from the UNL files in the directory specified here. This directory is relative to the project work directory on the operational server. Default: unl |
Generate UNL output | -unlOutDir and -unlOutSegs used together | Indicates whether Derive Data from UNLs (mpxfsdvd) should generate UNL files during processing. Used with -unlOutSegs, this option also instructs the job to create UNL files containing bucketing data or comparison data, or to generate query UNL files during processing (the query files are used by the relationship linker). Default: enabled |
UNL output directory | -unlOutDir | The output of Derive Data from UNLs (mpxfsdvd) is the derived data segments (comparison, bucket, and, optionally, query data). Each segment has its own UNL file (mpi_memcmpd.unl, mpi_membktd.unl, and mpi_memqryd.unl), written to the directory specified here. This directory is relative to the project work directory on the operational server. Default: unl |
Generate Bucket UNL | use -unlOutSegs with -unlOutDir | Instructs Derive Data from UNLs (mpxfsdvd) to create UNL files containing bucketing data. Default: enabled |
Generate Comparison UNL | use -unlOutSegs with -unlOutDir | Instructs Derive Data from UNLs (mpxfsdvd) to create UNL files containing comparison data. Default: enabled |
Generate Query UNL | use -unlOutSegs with -unlOutDir | Instructs Derive Data from UNLs (mpxfsdvd) to generate query UNL files during processing. These files are used by the relationship linker. Default: disabled |
Generate BXM output | -{no}bxmCmpd and | Instructs Derive Data from UNLs (mpxfsdvd) to generate output files for bulk cross matching. Default: enabled |
BXM output directory | -bxmOutDir | Indicates where you want the .bxm output files to be located. This directory is relative to the project work directory on the operational server. Default: bxm |
Generate query BXM | -{no}bxmQryd | Indicates whether Derive Data from UNLs (mpxfsdvd) should generate query BXM files during processing. These files are used by the relationship linker. Default: disabled |
Generate SQL script for possible missing memheads | -{no}HeadSql | Instructs Derive Data from UNLs (mpxfsdvd) to generate an SQL file in the specified UNL output directory. If a UNL output directory is not specified, the output is written to the BXM output directory. The SQL file contains a query against the mpi_memhead database table for members that were identified as missing, that is, members that have an attribute row but no corresponding head row. Default: disabled |
Performance Tuning | These fields act as a filter to include buckets up to the maximum (maximum bucket role), above a minimum (minimum bucket role) or within a range if both are set to a value greater than 0. |
|
Maximum number of Member partitions | -nMemParts | The appropriate value depends on the size of your data set, your algorithms, and how much memory you have access to on the operational server. The utility that consumes the Derive Data from UNLs (mpxfsdvd) output (such as Generate Frequency Stats (mpxfreq)) must use a matching “memparts” value. Leave this at the default unless you need the memory. The more member partitions you use, the slower your mpxcomp process, because the operational server must do more duplicate comparisons. Default: 1 |
Maximum number of Bucket partitions | -nBktParts | The appropriate value depends on the size of your data set, your algorithms, and how much memory you have access to on the operational server. Leave this at the default unless you need the memory. Default: 10 |
Maximum number of Query partitions | -nQryParts | The appropriate value depends on the size of your data set, your algorithms, and how much memory you have access to on the operational server. Leave this at the default unless you need the memory. This option is enabled only when the option to Generate query BXM is also enabled. Default: 1 |
Buffer size | -buffSize | Size for each file I/O buffer. Default: 65536 |
Options | ||
Encoding | -encoding | Choices are Latin1, UTF8, and UTF16. Default: latin1 |
Minimum bucket role | -minBktTag | The lowest bucketing role designation used in the algorithm to include in the process. Default: 0 |
Maximum bucket role | -maxBktTag | The highest bucketing role designation used in the algorithm to include in the process. Default: 0 |
Minimum query role | -minQryRole | The lowest query role designation used in the algorithm to include in the process. This option is enabled only when the option to Generate query BXM is also enabled. Default: 10000 |
Maximum errors before stopping | -maxErrs | Sets a threshold for errors in the data. Once the threshold is reached, Derive Data from UNLs (mpxfsdvd) stops processing. The intent of this option is to let you process a set of input UNLs with some tolerance for data issues. For example, if a UNL record has an incorrect number of fields, the member record is rejected and re-derivation does not complete for that member. The mpxfsdvd utility writes detailed information to the log file, including the line number, input file, and reason for the rejection. Default: 100 |
Number of records to skip | -skipRecs | Number of member records to skip before re-deriving members from the specified input files. Processing begins with the next member read from MEMHEAD after skipping this number of records. Default: 0 |
Maximum number of records to process | -maxRecs | Maximum number of member records to re-derive from the specified input files. When this parameter is used together with -skipRecs, this number includes the skipped records. Default: Process all records |
Enable incremental cross match | -ixmMode | Use this option to enable incremental cross matching. In IXM mode, a subset of members is compared rather than the entire member set. If running a BXM, use the default of false. If running an IXM, set this to true. Default: disabled |
Attribute Types | Here you can select the attribute types that Derive Data from UNLs (mpxfsdvd) reads from the input UNL files. |
|
Log Options | ||
Trace logging | Produces a trace of activity as interactions flow through the system. This option is very verbose and should only be used for short periods of time. | |
Debug logging | Produces low-level diagnostics used internally by IBM® to identify what was happening on the system before an error condition occurred. This option generates a large amount of output per activity and should only be used for short periods of time. Attention: Debug logging can potentially include personal member information such as member identification number, name, and so forth. | |
Timer logging | Produces timings on certain operations to help identify where significant processing time is elapsing. | |
SQL logging | Outputs the SQL that is sent by the InfoSphere® MDM database layer to the RDBMS. This helps in diagnosing database-related issues. This option can produce large amounts of output depending on the activity. | |
Audit logging | Produces activity information and non-critical warnings. Often, this option is used when a new system is first implemented to monitor activity. | |
Algorithm logging | A separate logging level for algorithm-related debug information without the risk of including protected health information (PHI). Default: disabled |
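
Two of the options in the table interact in ways that are easy to misread: -maxRecs counts the records skipped by -skipRecs, and the bucket role bounds apply only when set to a value greater than 0. The following Python sketch illustrates that arithmetic. It is not part of mpxfsdvd; the function names are hypothetical, and it assumes that records are counted from 1, that both role bounds are inclusive, and that a bound of 0 means no bound.

```python
from typing import Optional, Tuple


def record_window(total_recs: int, skip_recs: int = 0,
                  max_recs: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return the 1-based (first, last) record positions that would be re-derived.

    Assumption: max_recs is counted from the first record read, so it
    includes the skipped records, as the -maxRecs description states.
    None for max_recs stands for the default of processing all records.
    """
    last = total_recs if max_recs is None else min(total_recs, max_recs)
    if skip_recs >= last:
        return None  # every record inside the limit was skipped
    return (skip_recs + 1, last)


def bucket_role_included(role: int, min_bkt_tag: int = 0,
                         max_bkt_tag: int = 0) -> bool:
    """Return True if a bucket role passes the minimum/maximum role filter.

    Assumption: a bound of 0 is treated as unset, and both bounds are inclusive.
    """
    if min_bkt_tag > 0 and role < min_bkt_tag:
        return False
    if max_bkt_tag > 0 and role > max_bkt_tag:
        return False
    return True


if __name__ == "__main__":
    # Skipping 100 records with a limit of 250 leaves records 101 to 250.
    print(record_window(1000, skip_recs=100, max_recs=250))  # (101, 250)
    # Role 3 lies within the range 2 to 5.
    print(bucket_role_included(3, min_bkt_tag=2, max_bkt_tag=5))  # True
```

Under these assumptions, -skipRecs 100 with -maxRecs 250 re-derives 150 members, not 250.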