Derive Data from Server (mpxredvd)
The Derive Data from Server (mpxredvd) job can be used both to derive member data (creating the mpi_memcmpd.unl and mpi_membktd.unl files) and, optionally, to create binary data files for use in a bulk cross match (BXM).
To produce these files, Derive Data from Server (mpxredvd) reads the member data from the database (specifically, from the member tables) rather than from UNL files. It processes the data record by record to create new buckets and comparison strings. You can select the specific elements (buckets, comparison strings, or binaries) that you want to re-derive.
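As a rough illustration of which UNL files a run leaves behind, the following Python sketch combines the file names and defaults given in the option table: the UNL output directory defaults to `unl` and is resolved relative to the project work directory, bucket and comparison UNL files are enabled by default, and query UNL files are disabled by default. The helper `expected_unl_outputs` and the work-directory path are hypothetical. The sketch assumes one file per segment, named as in the table, and does not model the .bxm binaries, whose file names are not given here.

```python
from pathlib import Path

# Segment file names as listed in the "UNL output directory" description.
UNL_FILES = {
    "comparison": "mpi_memcmpd.unl",
    "bucket": "mpi_membktd.unl",
    "query": "mpi_memqryd.unl",
}


def expected_unl_outputs(
    project_work_dir: str,
    unl_out_dir: str = "unl",  # documented default for the UNL output directory
    bucket: bool = True,       # Generate Bucket UNL: default enabled
    comparison: bool = True,   # Generate Comparison UNL: default enabled
    query: bool = False,       # Generate Query UNL: default disabled
) -> list[Path]:
    """Hypothetical helper: list the UNL files a run is expected to write.

    Assumes one file per selected segment; the output directory is
    resolved relative to the project work directory.
    """
    base = Path(project_work_dir) / unl_out_dir
    selected = {"comparison": comparison, "bucket": bucket, "query": query}
    return [base / UNL_FILES[seg] for seg, enabled in selected.items() if enabled]


# Example with a hypothetical project work directory and default settings.
for path in expected_unl_outputs("/opt/mdm/project"):
    print(path)
```

With the defaults, only the comparison and bucket files are listed; passing `query=True` adds mpi_memqryd.unl for the relationship linker.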
Workbench | Command line | Description |
---|---|---|
Member Type | -memType | If you have multiple member types in the operational server database and only need to derive data for one of those member types, select the member type here; otherwise, select ALL. Default: ALL |
Inputs and Outputs | ||
Generate UNL output | -unlOutDir and -unlOutSegs (used together) | Indicates whether Derive Data from Server (mpxredvd) should generate UNL files during processing. Default: enabled |
UNL output directory | -unlOutDir | The output of Derive Data from Server (mpxredvd) is the derived data segments (comparison, bucket, and, optionally, query data). Each segment has its own UNL file (mpi_memcmpd.unl, mpi_membktd.unl, and mpi_memqryd.unl), written to the directory specified here. This directory is relative to the project work directory on the operational server. Use this option together with -unlOutSegs, which determines whether UNL files are generated during processing and which segments they contain: bucketing data, comparison data, or query data (query UNL files are used by the relationship linker). Default: unl |
Generate Bucket UNL | Use -unlOutSegs with -unlOutDir | Instructs Derive Data from Server (mpxredvd) to create UNL files containing bucketing data. Default: enabled |
Generate Comparison UNL | Use -unlOutSegs with -unlOutDir | Instructs Derive Data from Server (mpxredvd) to create UNL files containing comparison data. Default: enabled |
Generate Query UNL | Use -unlOutSegs with -unlOutDir | Instructs Derive Data from Server (mpxredvd) to generate query UNL files during processing. These files are used by the relationship linker. Default: disabled |
Generate BXM output | -{no}bxmCmpd and -{no}bxmBktd | Instructs Derive Data from Server (mpxredvd) to generate output files for bulk cross matching (BXM). Default: enabled |
BXM output directory | -bxmOutDir | Specifies the directory where the .bxm output files are written. This directory is relative to the project work directory on the operational server. Default: bxm |
Generate query BXM | -{no}bxmQryd | Indicates whether Derive Data from Server (mpxredvd) should generate query BXM files during processing. These files are used by the relationship linker. Default: disabled |
Update database | -{no}dbUpdate | Updates the operational server database after the re-derivation is completed. |
Performance Tuning | | The Minimum bucket role and Maximum bucket role options (under Options) act as a filter: they include only buckets with roles up to the maximum, from the minimum upward, or within the range when both are set to a value greater than 0. |
Maximum number of Member partitions | -nMemParts | The right setting depends on the size of your data set, your algorithms, and how much memory is available on the operational server. The utility that consumes the Derive Data from Server (mpxredvd) output (such as Generate Frequency Stats (mpxfreq)) must use a matching “memparts” value. Leave this at the default unless memory is constrained. The more member partitions you use, the slower the mpxcomp process runs, because the operational server must perform more duplicate comparisons. Default: 1 |
Maximum number of Bucket partitions | -nBktParts | The right setting depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless memory is constrained. Default: 10 |
Maximum number of Query partitions | -nQryParts | The right setting depends on the size of your data set, your algorithms, and how much memory is available on the operational server. Leave this at the default unless memory is constrained. This option is enabled only when Generate query BXM is also enabled. Default: 1 |
Block size | -blkSize | Number of members in a block. Default: 1000 |
Buffer size | -buffSize | Size of each file input/output (I/O) buffer. Default: 65536 |
Options | ||
Encoding | -encoding | Choices are Latin1, UTF8, and UTF16. Default: latin1 |
Minimum bucket role | -minBktTag | The lowest bucketing role designation used in the algorithm to include in the process. Default: 0 (use all) |
Maximum bucket role | -maxBktTag | The highest bucketing role designation used in the algorithm to include in the process. Default: 0 (use all) |
Minimum query role | -minQryRole | The lowest query role designation used in the algorithm to include in the process. This option is enabled only when Generate query BXM is also enabled. Default: 10000 |
Minimum member record number | -minMemRecno | Specifies the lowest MEMRECNO to include in the process. Default: 0 (use all) |
Maximum member record number | -maxMemRecno | Specifies the highest MEMRECNO to include in the process. Default: 0 (use all) |
Log Options | ||
Trace logging | | Produces a trace of activity as interactions flow through the system. This option is very verbose and should only be used for short periods of time. |
Debug logging | | Produces low-level diagnostics used internally by IBM® to identify what was happening on the system before an error condition occurred. This option generates a large amount of output per activity and should only be used for short periods of time. Attention: Debug logging can potentially include personal member information such as member identification number, name, and so forth. |
Timer logging | | Produces timings on certain operations to help identify where significant processing time is elapsing. |
SQL logging | | Outputs the SQL that is sent by the InfoSphere® MDM database layer to the RDBMS. This helps in diagnosing database-related issues. This option can produce large amounts of output depending on the activity. |
Audit logging | | Produces activity information and non-critical warnings. This option is often used to monitor activity when a new system is first implemented. |
Algorithm logging | | A separate logging level for algorithm-related debug information without the risk of including protected health information (PHI). Default: disabled |
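The Minimum/Maximum bucket role and Minimum/Maximum member record number options in the table share one convention: a bound set to 0 is ignored ("use all"), and setting both bounds selects a range. The following Python sketch illustrates that rule, assuming both bounds are inclusive (the descriptions refer to the lowest and highest values "to include"). `within_bounds` is a hypothetical helper, not part of the product. Minimum query role applies the same lower-bound idea but defaults to 10000 rather than 0.

```python
def within_bounds(value: int, minimum: int = 0, maximum: int = 0) -> bool:
    """Return True if value passes a min/max filter.

    Hypothetical helper. A bound of 0 is treated as "no limit", matching
    the documented default of 0 (use all). Both bounds are assumed to be
    inclusive.
    """
    if minimum > 0 and value < minimum:
        return False  # below the lowest value to include
    if maximum > 0 and value > maximum:
        return False  # above the highest value to include
    return True


# Example: keep bucket roles 2 through 4, as if Minimum bucket role were
# set to 2 and Maximum bucket role to 4.
roles = [1, 2, 3, 4, 5, 6]
selected = [r for r in roles if within_bounds(r, minimum=2, maximum=4)]
print(selected)  # [2, 3, 4]
```

Setting only one bound yields the open-ended cases described under Performance Tuning: `within_bounds(r, maximum=4)` keeps everything up to 4, and `within_bounds(r, minimum=2)` keeps everything from 2 upward.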