Derive Data from Server (mpxredvd)

The Derive Data from Server (mpxredvd) job derives member data (creating the mpi_memcmpd.unl and mpi_membktd.unl files) and, optionally, creates binary data files for use in a bulk cross match (BXM).

To produce these files, Derive Data from Server (mpxredvd) reads the member data from the database (specifically, from the member tables) rather than from UNL files. It goes through the data line by line to create new buckets and comparison strings. You can select the specific elements (buckets, comparison strings, or binaries) that you want to re-derive.

Table 1. Derive Data from Server (mpxredvd) options
Workbench Command line Description
Member Type -memType If you have multiple member types in the operational server database and only need to derive data for one of those member types, select the member type here; otherwise, select ALL.

Default: ALL

Inputs and Outputs
Generate UNL output -unlOutDir and -unlOutSegs (used together) Indicates whether Derive Data from Server (mpxredvd) should generate UNL files during processing.

Default: enabled

UNL output directory -unlOutDir The output of Derive Data from Server (mpxredvd) is the derived data segments (comparison, bucket and, optionally, query data), which have their own UNL files (mpi_memcmpd.unl, mpi_membktd.unl, and mpi_memqryd.unl) written to the directory specified here. This directory is relative to the project work directory on the operational server:

<MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\UNL_output_dir

Used together with -unlOutSegs, this option determines whether Derive Data from Server (mpxredvd) generates UNL files during processing. The -unlOutSegs option selects which segments are written: bucketing data, comparison data, or query data (query UNL files are used by the relationship linker).

Default: unl
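
The directory layout described above can be sketched as a small path helper. This is only an illustration: the function name, the install location C:\IBM\MDM, the instance name prod, and the project name myproject are hypothetical values that do not come from the product, and the path pieces are assumed to join exactly as the template shows.

```python
from pathlib import PureWindowsPath


def resolve_output_dir(install_home, instance_name, project_name, output_dir="unl"):
    """Build <MDM_INSTALL_HOME>\\inst\\mpinet_<instance>\\work\\<project>\\work\\<output_dir>.

    output_dir defaults to "unl", the documented default for -unlOutDir.
    """
    return (
        PureWindowsPath(install_home)
        / "inst"
        / f"mpinet_{instance_name}"
        / "work"
        / project_name
        / "work"
        / output_dir
    )


# Hypothetical install home, instance, and project names, for illustration only.
unl_dir = resolve_output_dir(r"C:\IBM\MDM", "prod", "myproject")
print(unl_dir)
```

Because the BXM output directory is documented as relative to the same project work directory, passing "bxm" as the last argument would give the corresponding default location for -bxmOutDir.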

Generate Bucket UNL -unlOutSegs (used with -unlOutDir) Instructs Derive Data from Server (mpxredvd) to create UNL files containing bucketing data.

Default: enabled

Generate Comparison UNL -unlOutSegs (used with -unlOutDir) Instructs Derive Data from Server (mpxredvd) to create UNL files containing comparison data.

Default: enabled

Generate Query UNL -unlOutSegs (used with -unlOutDir) Instructs Derive Data from Server (mpxredvd) to generate query UNL files during processing. These files are used by the relationship linker.

Default: disabled

Generate BXM output -{no}bxmCmpd and -{no}bxmBktd Instructs Derive Data from Server (mpxredvd) to generate output files for bulk cross matching.

Default: enabled

BXM output directory -bxmOutDir Indicates where you want the .bxm output files to be located. This directory is relative to the project work directory on the operational server:

<MDM_INSTALL_HOME>\inst\mpinet_<MDM_INSTANCE_NAME>\work\projectname\work\BXM_output_dir

Default: bxm

Generate query BXM -{no}bxmQryd Indicates whether Derive Data from Server (mpxredvd) should generate query BXM files during processing. These files are used by the relationship linker.

Default: disabled

Update database -{no}dbUpdate Updates the operational server database after the rederivation is completed.
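
The -{no} notation on the switches above can be illustrated with a short sketch. It assumes that {no} stands for an optional "no" prefix that turns a switch off (for example, -bxmQryd versus -nobxmQryd); that reading, along with the helper names, is an assumption and is not confirmed by this table. No default is documented here for -{no}dbUpdate, so the sketch requires the caller to state it explicitly.

```python
def toggle(name, enabled):
    """Render an on/off switch, assuming '-{no}name' means '-name' or '-noname'."""
    return f"-{name}" if enabled else f"-no{name}"


def build_toggles(bxm_cmpd=True, bxm_bktd=True, bxm_qryd=False, *, db_update):
    """Return the switch list for the BXM and database-update options.

    Defaults mirror the table: BXM output enabled, query BXM disabled.
    db_update has no documented default here, so it must be passed explicitly.
    """
    return [
        toggle("bxmCmpd", bxm_cmpd),
        toggle("bxmBktd", bxm_bktd),
        toggle("bxmQryd", bxm_qryd),
        toggle("dbUpdate", db_update),
    ]


print(build_toggles(db_update=False))
```

Keeping each switch as a separate list element makes it straightforward to pass the result to a process launcher without shell quoting concerns.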
Performance Tuning   The Minimum bucket role and Maximum bucket role fields act as a filter: buckets are included up to the maximum, from the minimum upward, or within a range if both are set to a value greater than 0.
Maximum number of Member partitions -nMemParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. The utility that consumes the Derive Data from Server (mpxredvd) output (such as Generate Frequency Stats (mpxfreq)) must use a matching “memparts” value. Leave this at the default unless memory is a constraint. The more member partitions you use, the slower your mpxcomp process runs, because the operational server must do more duplicate comparisons.

Default: 1

Maximum number of Bucket partitions -nBktParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless memory is a constraint.

Default: 10

Maximum number of Query partitions -nQryParts The appropriate value depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless memory is a constraint. This option is enabled only when the option to Generate query BXM is also enabled.

Default: 1

Block size -blkSize Number of members in a block.

Default: 1000
Buffer size -buffSize Size for each file input/output (I/O) buffer.

Default: 65536
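
The requirement that a consuming utility use a matching "memparts" value can be checked before a run with a sketch like the following. The class and function names are hypothetical, the field defaults are the ones listed in this table, and the rule that every value must be at least 1 is an assumption made for the example rather than a documented constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningSettings:
    """Performance-tuning values, defaulted as documented in the table."""
    n_mem_parts: int = 1      # -nMemParts
    n_bkt_parts: int = 10     # -nBktParts
    n_qry_parts: int = 1      # -nQryParts
    blk_size: int = 1000      # -blkSize (members per block)
    buff_size: int = 65536    # -buffSize (per I/O buffer)

    def __post_init__(self):
        # Assumption for this sketch: every tuning value must be positive.
        for name, value in vars(self).items():
            if value < 1:
                raise ValueError(f"{name} must be at least 1, got {value}")


def check_memparts(derive_settings, consumer_memparts):
    """Raise if the consuming utility would use a different member partition count."""
    if derive_settings.n_mem_parts != consumer_memparts:
        raise ValueError(
            f"memparts mismatch: derive used {derive_settings.n_mem_parts}, "
            f"consumer uses {consumer_memparts}"
        )


settings = TuningSettings(n_mem_parts=4)
check_memparts(settings, 4)  # passes silently when the values match
```

Failing early on a mismatch is cheaper than discovering it after a long derivation run has already written its output.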

Options
Encoding -encoding Choices are Latin1, UTF8 and UTF16.

Default: latin1

Minimum bucket role -minBktTag The lowest bucketing role designation used in the algorithm to include in the process.

Default: 0 (use all)

Maximum bucket role -maxBktTag The highest bucketing role designation used in the algorithm to include in the process.

Default: 0 (use all)

Minimum query role -minQryRole The lowest query role designation used in the algorithm to include in the process. This option is enabled only when the option to Generate query BXM is also enabled.

Default: 10000

Minimum member record number -minMemRecno Specifies the lowest MEMRECNO to include in the process.

Default: 0 (use all)

Maximum member record number -maxMemRecno Specifies the highest MEMRECNO to include in the process.

Default: 0 (use all)
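
The minimum and maximum filters for bucket roles and member record numbers can be sketched as a single range check. This is an illustration under two assumptions: a value of 0 disables that bound (matching "0 (use all)"), and both bounds are inclusive, which follows from the wording "lowest ... to include" and "highest ... to include". The function name is hypothetical.

```python
def in_filter(value, minimum=0, maximum=0):
    """Return True if value passes the min/max filter.

    A bound of 0 means "no limit" on that side; bounds are treated as inclusive.
    """
    if minimum > 0 and value < minimum:
        return False
    if maximum > 0 and value > maximum:
        return False
    return True


# Keep only member record numbers between 100 and 200, as with
# -minMemRecno 100 and -maxMemRecno 200 (values chosen for illustration).
recnos = [50, 100, 150, 200, 250]
print([r for r in recnos if in_filter(r, minimum=100, maximum=200)])
```

With both bounds left at 0, every value passes, which corresponds to the default behavior of using all bucket roles and all member records.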

Log Options
Trace logging   Produces a trace of activity as interactions flow through the system. This option is very verbose and should only be used for short periods of time.
Debug logging   Produces low-level diagnostics used internally by IBM® to identify what was happening on the system before an error condition occurred. This option generates a large amount of output per activity and should only be used for short periods of time.
Attention: Debug logging can potentially include personal member information such as member identification number, name, and so forth.
Timer logging   Produces timings on certain operations to help identify where significant processing time is elapsing.
SQL logging   Outputs the SQL that is sent by the InfoSphere® MDM database layer to the RDBMS. This helps in diagnosing database-related issues. This option can produce large amounts of output depending on the activity.
Audit logging   Produces activity information and non-critical warnings. Often, this option is used when a new system is first implemented to monitor activity.
Algorithm logging   A separate logging level for algorithm-related debug information without the risk of including protected health information (PHI).

Default: disabled