Derive Data from UNLs (mpxfsdvd)

The Derive Data from UNLs (mpxfsdvd) job derives data from existing member unload (UNL) files: it extracts and creates comparison strings, bucket hashes, and binaries. It is most commonly used when you have changed your algorithm but the data itself has not changed.

The following table describes the job options.

Table 1. Derive Data from UNLs (mpxfsdvd) options
Workbench Command line Description
Member Type -memType If you have multiple member types in the operational server database and only need to derive data for one of those member types, select the member type here; otherwise, select ALL.

Default: ALL

Inputs and Outputs
UNL input directory -unlInpDir Derive Data from UNLs (mpxfsdvd) reads the member attribute data (specified on the Attribute Types tab) from the UNL files in the directory specified here. This directory is relative to the project work directory on the operational server:

<MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\UNL_input_dir

Default: unl

Generate UNL output -unlOutDir and -unlOutSegs (used together) Indicates whether Derive Data from UNLs (mpxfsdvd) generates UNL files during processing. Combined with -unlOutSegs, it selects which UNL files the job creates: files containing bucketing data, files containing comparison data, or query UNL files (which are used by the relationship linker).

Default: enabled

UNL output directory -unlOutDir The output of Derive Data from UNLs (mpxfsdvd) is the derived data segments (comparison, bucket and, optionally, query data), which have their own UNL files (mpi_memcmpd.unl, mpi_membktd.unl, and mpi_memqryd.unl) written to the directory specified here. This directory is relative to the project work directory on the operational server:

<MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\UNL_output_dir

Default: unl

Generate Bucket UNL -unlOutSegs with -unlOutDir Instructs Derive Data from UNLs (mpxfsdvd) to create UNL files containing bucketing data.

Default: enabled

Generate Comparison UNL -unlOutSegs with -unlOutDir Instructs Derive Data from UNLs (mpxfsdvd) to create UNL files containing comparison data.

Default: enabled

Generate Query UNL -unlOutSegs with -unlOutDir Instructs Derive Data from UNLs (mpxfsdvd) to generate query UNL files during processing. These files are used by the relationship linker.

Default: disabled

Generate BXM output -{no}bxmCmpd and -{no}bxmBktd Instructs Derive Data from UNLs (mpxfsdvd) to generate output files for bulk cross matching.

Default: enabled

BXM output directory -bxmOutDir Indicates where you want the .bxm output files to be located. This directory is relative to the project work directory on the operational server:

<MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\BXM_output_dir

Default: bxm
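The three directory options above are all resolved relative to the project work directory. As a rough illustration only, the following Python sketch composes the documented path layout. The function name and the sample install home, instance name, and project name are hypothetical and are not part of the product.

```python
from pathlib import PureWindowsPath


def resolve_work_dir(install_home, instance_name, project_name, relative_dir):
    # Documented layout:
    # <MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\<dir>
    return (
        PureWindowsPath(install_home)
        / "inst"
        / f"mpinet_{instance_name}"
        / "work"
        / project_name
        / "work"
        / relative_dir
    )


# Hypothetical values, for illustration only.
unl_dir = resolve_work_dir(r"C:\IBM\MDM", "prod", "myproject", "unl")  # default UNL dir
bxm_dir = resolve_work_dir(r"C:\IBM\MDM", "prod", "myproject", "bxm")  # default BXM dir
print(unl_dir)
print(bxm_dir)
```

Because the UNL input and UNL output directories both default to unl, they resolve to the same location unless one of them is overridden.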

Generate query BXM -{no}bxmQryd Indicates whether Derive Data from UNLs (mpxfsdvd) should generate query BXM files during processing. These files are used by the relationship linker.

Default: disabled

Generate SQL script for possible missing memheads -{no}HeadSql Generates SQL output. Instructs Derive Data from UNLs (mpxfsdvd) to generate an SQL file in the specified UNL output directory. If no UNL output directory is specified, the file is written to the BXM output directory instead. The SQL file contains a query against the mpi_memhead database table for members that were identified as missing. A member is identified as missing when an attribute row has no corresponding head row.

Default: disabled
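Several options in this table are written with a -{no} prefix, such as -{no}bxmCmpd. Assuming this notation means that the option is switched off by prefixing its name with "no" (an interpretation of the notation, not something this page states explicitly), the expansion could be sketched in Python as follows. The helper names are hypothetical; the defaults are taken from the table above.

```python
def toggle_flag(name, enabled):
    """Expand a -{no}name option into its enabled or disabled spelling."""
    return f"-{name}" if enabled else f"-no{name}"


# Defaults as documented in the table above.
DEFAULT_TOGGLES = {
    "bxmCmpd": True,   # Generate BXM output: enabled
    "bxmBktd": True,   # Generate BXM output: enabled
    "bxmQryd": False,  # Generate query BXM: disabled
    "HeadSql": False,  # Generate SQL script for possible missing memheads: disabled
}


def toggle_args(overrides=None):
    """Return the toggle options, with any overrides applied to the defaults."""
    settings = {**DEFAULT_TOGGLES, **(overrides or {})}
    return [toggle_flag(name, on) for name, on in settings.items()]


print(toggle_args())
print(toggle_args({"bxmQryd": True}))
```

Check the exact spelling against the command-line help for your release before relying on it.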

Performance Tuning

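The Minimum bucket role and Maximum bucket role fields (described under Options) act as a filter on the buckets that are included: up to the maximum if only Maximum bucket role is set, from the minimum upward if only Minimum bucket role is set, or within a range if both are set to a value greater than 0.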
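The bucket role filter can be pictured with a small Python sketch. It assumes that a value of 0 (the default for both fields) means "no limit" and that both bounds are inclusive; neither assumption is confirmed by this page, and the function name is hypothetical.

```python
def bucket_role_included(role, min_tag=0, max_tag=0):
    """Return True if a bucket role passes the minimum/maximum filter.

    Assumptions: 0 means the bound is not set; bounds are inclusive.
    """
    if min_tag > 0 and role < min_tag:
        return False  # below the minimum bucket role
    if max_tag > 0 and role > max_tag:
        return False  # above the maximum bucket role
    return True


roles = [1, 2, 3, 4, 5]
print([r for r in roles if bucket_role_included(r)])                        # no filter
print([r for r in roles if bucket_role_included(r, max_tag=3)])             # maximum only
print([r for r in roles if bucket_role_included(r, min_tag=4)])             # minimum only
print([r for r in roles if bucket_role_included(r, min_tag=2, max_tag=4)])  # range
```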

Maximum number of Member partitions -nMemParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. The utility that consumes the Derive Data from UNLs (mpxfsdvd) output (such as Generate Frequency Stats (mpxfreq)) must use a matching “memparts” value. Leave this at the default unless you need the memory. The more member partitions you use, the slower your mpxcomp process runs, because the operational server must do more duplicate comparisons.

Default: 1

Maximum number of Bucket partitions -nBktParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless you need the memory.

Default: 10

Maximum number of Query partitions -nQryParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless you need the memory. This option is enabled only when the option to Generate query BXM is also enabled.

Default: 1

Buffer size -buffSize Size for each file I/O buffer.

Default: 65536

Options
Encoding -encoding Choices are Latin1, UTF8 and UTF16.

Default: latin1

Minimum bucket role -minBktTag The lowest bucketing role designation used in the algorithm to include in the process.

Default: 0

Maximum bucket role -maxBktTag The highest bucketing role designation used in the algorithm to include in the process.

Default: 0

Minimum query role -minQryRole The lowest query role designation used in the algorithm to include in the process. This option is enabled only when the option to Generate query BXM is also enabled.

Default: 10000

Maximum errors before stopping -maxErrs Sets a threshold for errors in the data. Once the threshold is reached, Derive Data from UNLs (mpxfsdvd) stops processing. The intent of this option is to let you process a set of input UNLs with some tolerance for data issues. For example, if a UNL record has an incorrect number of fields, the member record is rejected and re-derivation does not complete for that member. The mpxfsdvd utility writes detailed information to the log file, including the line number, the input file, and the reason for the rejection.

Default: 100
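The error-threshold behavior can be illustrated with a simplified Python sketch. This is not the utility's implementation: the pipe delimiter, the expected field count, the function name, and the message format are all assumptions made for the example. Only the behavior it mimics comes from the description above (reject records with the wrong number of fields, record the file, line number, and reason, and stop once the threshold is reached).

```python
def derive_with_error_limit(lines, expected_fields, max_errs=100,
                            source="input.unl", delimiter="|"):
    """Accept well-formed records and stop once max_errs rejections occur."""
    accepted = []    # records that would go on to derivation
    rejections = []  # one log-style message per rejected record
    stopped_early = False
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) == expected_fields:
            accepted.append(fields)
            continue
        # Record input file, line number, and reason for the rejection.
        rejections.append(
            f"{source}:{line_no}: expected {expected_fields} fields, "
            f"found {len(fields)}"
        )
        if len(rejections) >= max_errs:
            stopped_early = True  # threshold reached: stop processing
            break
    return accepted, rejections, stopped_early


sample = ["1|A|x", "2|B", "3|C|y", "4", "5|E|z"]
ok, errs, stopped = derive_with_error_limit(sample, expected_fields=3, max_errs=2)
print(len(ok), len(errs), stopped)
for message in errs:
    print(message)
```

In this sample the second rejection (line 4) reaches the threshold of 2, so line 5 is never read.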

Number of records to skip -skipRecs Number of member records to skip before re-deriving members from the specified input files. Processing begins with the next member read from MEMHEAD after skipping this number of records.

Default: 0

Maximum number of records to process -maxRecs Maximum number of member records to re-derive from the specified input files. When you use this parameter together with Number of records to skip (-skipRecs), this number includes the records that are skipped.

Default: Process all records
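The interaction between -skipRecs and -maxRecs can be sketched in Python as a window over the input records. The sketch reads "this number includes the number skipped" as meaning that -maxRecs counts from the first record in the input, so the number of records actually re-derived is -maxRecs minus -skipRecs. That reading, and the function name, are assumptions for illustration.

```python
from itertools import islice


def records_to_process(records, skip_recs=0, max_recs=None):
    """Yield the records that fall inside the skip/max window.

    max_recs counts from the first record, including skipped ones;
    None means "process all records" (the documented default).
    """
    return islice(records, skip_recs, max_recs)


members = range(1, 11)  # ten member records, numbered 1..10
print(list(records_to_process(members)))
print(list(records_to_process(members, skip_recs=2, max_recs=6)))
```

With skip_recs=2 and max_recs=6, records 1 and 2 are skipped and records 3 through 6 are processed, which is four records rather than six.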

Enable incremental cross match -ixmMode Use this option to enable incremental cross matching (IXM). In IXM mode, a subset of members is compared rather than the entire member set. If you are running a BXM, leave this at the default of false. If you are running an IXM, set this to true.

Default: disabled

Attribute Types

Here you can select the attribute types that Derive Data from UNLs (mpxfsdvd) reads from the input UNL files.

Log Options
Trace logging   Produces a trace of activity as interactions flow through the system. This option is very verbose and should only be used for short periods of time.
Debug logging   Produces low-level diagnostics used internally by IBM® to identify what was happening on the system before an error condition occurred. This option generates a large amount of output per activity and should only be used for short periods of time.
Attention: Debug logging can potentially include personal member information such as member identification number, name, and so forth.
Timer logging   Produces timings on certain operations to help identify where significant processing time is elapsing.
SQL logging   Outputs the SQL that is sent by the InfoSphere® MDM database layer to the RDBMS. This helps in diagnosing database-related issues. This option can produce large amounts of output depending on the activity.
Audit logging   Produces activity information and non-critical warnings. Often, this option is used when a new system is first implemented to monitor activity.
Algorithm logging   A separate logging level for algorithm-related debug information without the risk of including protected health information (PHI).

Default: disabled