InfoSphere MDM standardization overview

When InfoSphere® MDM receives data, it can standardize that data using one of the standardizers included with the product or a third-party standardizer.

For the IBM® InfoSphere QualityStage standardizer, you have the additional option of allowing for normalization of the data. Normalization involves taking a set of data that is provided in a single field, and parsing it out into its individual elements so that they can be stored separately. For example, if 555 Acme Rd. is received in line one of the address, then, with QualityStage configured, InfoSphere MDM can store 555 as the street number, Acme as the street name and ROAD as the street type. The unnormalized data is the address line one, and the normalized data is the separately stored street number, street name and street type.
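The split described above can be illustrated with a minimal sketch. This is not how QualityStage parses addresses; the regular expression, the `STREET_TYPES` map, and the function name `normalize_address_line` are all hypothetical and cover only the simple "number name type" shape of the example.

```python
import re

# Hypothetical abbreviation map. A real standardizer such as QualityStage
# relies on much richer rule sets than this toy lookup.
STREET_TYPES = {"RD": "ROAD", "ST": "STREET", "AVE": "AVENUE"}


def normalize_address_line(line):
    """Split a 'number name type' address line into separately stored items."""
    match = re.match(r"^\s*(\d+)\s+(.+?)\s+([A-Za-z]+)\.?\s*$", line)
    if match is None:
        return None  # the line does not fit this toy pattern
    number, name, street_type = match.groups()
    key = street_type.upper()
    return {
        "street_number": number,
        "street_name": name,
        # Expand known abbreviations; otherwise keep the upper-cased token.
        "street_type": STREET_TYPES.get(key, key),
    }


print(normalize_address_line("555 Acme Rd."))
```

Here the input string plays the role of the unnormalized data, and the returned dictionary plays the role of the normalized data.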

Normalization is enabled by setting the /IBM/Party/LocationNormalization/enabled configuration item to true (it is false by default). This configuration item turns normalization on or off for both addresses and phone numbers.

For addresses, the normalized attributes correspond to all attributes that compose the address lines in the formatter configuration item /IBM/ThirdPartyAdapters/Address/Formatter. By default, for example, since BuildingName does not appear in the definition of an address line, it is not considered a normalized item, and is dealt with accordingly.

In such a case, the following occurs:
  1. Internal to InfoSphere MDM, an external rule called CheckAddressNormalization is called first to derive unnormalized data (addressLineOne) by the formatter defined in the CONFIGELEMENT table.
  2. The unnormalized data is then sent to the standardizer to either standardize or normalize, or both.
  3. Finally, the returned normalized items from the standardizer are used to derive unnormalized data by the formatter.
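
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `format_line`, `standardize`, and `process` are hypothetical stand-ins, not InfoSphere MDM or QualityStage APIs, and the formatter is assumed to join street number, street name, and street type with spaces.

```python
# Hypothetical stand-ins: none of these names are InfoSphere MDM or
# QualityStage APIs. The toy standardizer assumes at least three tokens.
STREET_TYPES = {"RD": "ROAD", "ST": "STREET", "AVE": "AVENUE"}
FIELDS = ("street_number", "street_name", "street_type")


def format_line(normalized):
    """Join separately stored items into one address line (assumed formatter)."""
    return " ".join(normalized[f] for f in FIELDS if normalized.get(f))


def standardize(address_line_one):
    """Toy standardizer: upper-cases, expands the street type, returns items."""
    tokens = address_line_one.upper().replace(".", "").split()
    return {
        "street_number": tokens[0],
        "street_name": " ".join(tokens[1:-1]),
        "street_type": STREET_TYPES.get(tokens[-1], tokens[-1]),
    }


def process(normalized_request):
    line_one = format_line(normalized_request)       # step 1: pre-format
    standardized = standardize(line_one)             # step 2: standardize/normalize
    return standardized, format_line(standardized)   # step 3: post-format


items, line = process(
    {"street_number": "555", "street_name": "Acme", "street_type": "Rd."}
)
print(items, line)
```

The same formatter is reused for pre-formatting and post-formatting in this sketch, which mirrors the description of both steps deriving unnormalized data "by the formatter".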

Phone number normalization works in a similar way.

The following two external rules are introduced for normalization:
CheckAddressNormalizedRule
This rule is used to derive unnormalized data (AddressLineOne, AddressLineTwo of TCRMAddressBObj) if both unnormalized data and normalized data (such as normalized address items that are separately stored) are provided in the request. The rule that determines how the unnormalized address data is derived is defined in the Address Standardization and Normalization table. This rule uses CM option /IBM/Party/ContactMethod/PhoneCategoryType to determine the Contact Method Category Type for the phone number.
CheckContactMethodNormalizedRule
This rule is used to derive unnormalized data (ReferenceNumber of TCRMContactMethodBObj) if both unnormalized data, such as ReferenceNumber, and normalized data, such as normalized TCRMPhoneNumberBObj items that are separately stored, are provided in the request. The rule that determines how the unnormalized phone number data is derived is defined in the Phone Number Standardization and Normalization table.

Because standardization behavior depends on the configuration, the type of data, and the type of transaction, many different scenarios are supported. However, most customers are expected to use only a few of them. The following tables describe the detailed behavior for every possible scenario in both Address and Phone Number standardization and normalization, and indicate the scenarios that customers are expected to use most often. Generally, when normalization is on, an update is expected to send both normalized (N) and unnormalized (U) data in the request, because both types of data are returned from the previously invoked transaction, but only one type of data, either U or N, is expected to change. Add transactions are expected to use just one type of data.

Table 1. Address Standardization and Normalization
Standardization/Normalization Just Unnormalized Data <U> Just Normalized Data <N> Both Types of Data <U, N>
On, Off
  • No pre-formatting occurs
  • No post-formatting occurs

Just Unnormalized Data <U>:
  1. AddressLineOne, AddressLineTwo, AddressLineThree are standardized and saved
  2. After standardization, normalized address items are ignored
  3. No pre- or post-formatting occurs
This scenario is expected to be used more often.

Just Normalized Data <N>: Exception thrown due to missing mandatory field

Both Types of Data <U, N>: For both add and update transactions:
  • N is not saved
  • The standardization of U is saved
  • Pre- and post-formatting are not executed
  • Normalized data returned from the standardizer is ignored.

Off, On
  • Pre-formatting occurs for N
  • No post-formatting occurs

Just Unnormalized Data <U>: Saves U as it is.

Just Normalized Data <N>: Pre-formats U.

Both Types of Data <U, N>: For both add and update transactions:
  • N and U are saved as they are
  • Pre- and post-formatting are not executed
Note: This configuration may cause N and U to be inconsistent.

Off, Off
  • No pre-formatting occurs
  • No post-formatting occurs

Just Unnormalized Data <U>: Saves U as it is.
This scenario is expected to be used more often.

Just Normalized Data <N>: Exception thrown due to missing mandatory field

Both Types of Data <U, N>: For both add and update transactions:
  • N is not saved
  • U is saved as it is
  • Pre- and post-formatting are not executed
Note: This configuration may cause N and U to be inconsistent.

On, On
  • Pre-formatting occurs
  • Post-formatting occurs

Just Unnormalized Data <U>: Post-format addressLineOne, addressLineTwo, addressLineThree. The formatter is defined in the CONFIGELEMENT table. After standardization, normalized address items are persisted.

Just Normalized Data <N>: Pre- and post-formatted by InfoSphere MDM.
This scenario is expected to be used more often.

Both Types of Data <U, N>:
For add transactions:
  • U is sent to the standardizer
  • N is ignored. No pre-formatting is executed
  • Normalized data is populated with data received from the standardizer
  • Post-formatting is executed
For update transactions, the behavior depends on what has changed, which you can determine by comparing the incoming data with the before image:
  • If nothing has changed or if only U has changed, InfoSphere MDM uses U as the default data sent to the standardizer. No pre-formatting is executed.
  • If only N has changed, U (addrLine2 and addrLine3, if defined in the pre-formatter) is pre-formatted by N.
  • If both N and U have changed, InfoSphere MDM uses U as the data to send to the standardizer
  • Post-formatting is executed
This scenario is expected to be used more often.
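
The update behavior for the On, On case with both types of data can be sketched as a comparison against the before image. This is a minimal illustration, not product code: `choose_standardizer_input`, the dictionary layout with `"U"` and `"N"` keys, and the `pre_format` callable are hypothetical.

```python
def choose_standardizer_input(before, incoming, pre_format):
    """Pick the unnormalized data sent to the standardizer on an update.

    `before` and `incoming` are dicts with keys "U" (unnormalized string)
    and "N" (dict of normalized items). `pre_format` is an assumed callable
    that derives U from N. All names here are illustrative.
    """
    u_changed = incoming["U"] != before["U"]
    n_changed = incoming["N"] != before["N"]
    if n_changed and not u_changed:
        # Only N changed: U is pre-formatted from N.
        return pre_format(incoming["N"])
    # Nothing changed, only U changed, or both changed: U is used as sent.
    return incoming["U"]


def join_items(normalized):
    """Assumed pre-formatter: join the normalized items with spaces."""
    return " ".join(normalized.values())


before = {"U": "555 ACME ROAD", "N": {"number": "555", "name": "ACME", "type": "ROAD"}}
incoming = {"U": "555 ACME ROAD", "N": {"number": "556", "name": "ACME", "type": "ROAD"}}
print(choose_standardizer_input(before, incoming, join_items))
```

In every branch, post-formatting would still run on the data returned by the standardizer; that step is left out of this sketch.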

Table 2. Phone Number Standardization and Normalization
Standardization/Normalization Just Unnormalized Data <U> Just Normalized Data <N> Both Types of Data <U, N>
On, Off
  • No pre-formatting occurs
  • No post-formatting occurs

Just Unnormalized Data <U>:
  1. Ref_num is standardized and saved
  2. After standardization, normalized phone number items are ignored
This scenario is expected to be used more often.

Just Normalized Data <N>: Exception thrown due to missing mandatory field

Both Types of Data <U, N>: For both add and update transactions:
  • N is not saved
  • The standardization of U is saved
  • Pre- and post-formatting are not executed
  • Normalized data returned from the standardizer is ignored.

Off, On
  • Pre-formatting occurs
  • No post-formatting occurs

Just Unnormalized Data <U>: Saves U as it is.

Just Normalized Data <N>: Pre-formats ref_num.

Both Types of Data <U, N>: For both add and update transactions:
  • N and U are saved as they are
  • Pre- and post-formatting are not executed
Note: This configuration may cause N and U to be inconsistent.

Off, Off
  • No pre-formatting occurs
  • No post-formatting occurs

Just Unnormalized Data <U>: Saves U as it is.
This scenario is expected to be used more often.

Just Normalized Data <N>: Exception thrown due to missing mandatory field

Both Types of Data <U, N>: For both add and update transactions:
  • N is not saved
  • U is saved as it is
  • Pre- and post-formatting are not executed
Note: This configuration may cause N and U to be inconsistent.

On, On
  • Pre-formatting occurs
  • Post-formatting occurs

Just Unnormalized Data <U>: Post-format ref_num. The formatter is defined in the CONFIGELEMENT table. After standardization, normalized phone number items are persisted.

Just Normalized Data <N>: Pre- and post-formatted by InfoSphere MDM.
This scenario is expected to be used more often.

Both Types of Data <U, N>:
For add transactions:
  • U is sent to the standardizer
  • N is ignored. No pre-formatting is executed
  • Normalized data is populated with data received from the standardizer
  • Post-formatting is executed
For update transactions, the behavior depends on what has changed, which you can determine by comparing the incoming data with the before image:
  • If nothing has changed or if only U has changed, InfoSphere MDM uses U as the default data sent to the standardizer. No pre-formatting is executed.
  • If only N has changed, U is pre-formatted by N.
  • If both N and U have changed, InfoSphere MDM uses U as the data to send to the standardizer
  • Post-formatting is executed
This scenario is expected to be used more often.
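
As a compact summary of the first column of both tables, the pre- and post-formatting behavior per configuration can be written as a lookup. This is a reading aid, not product code: `Steps`, `FORMATTING_STEPS`, and the two helper functions are hypothetical names, and the Both Types of Data cells override these defaults in some modes (for example, Off, On with both U and N executes no formatting).

```python
from typing import NamedTuple


class Steps(NamedTuple):
    pre_format: bool
    post_format: bool


# Keys are (standardization_on, normalization_on), matching the
# "Standardization/Normalization" column of the tables.
FORMATTING_STEPS = {
    (True, False): Steps(pre_format=False, post_format=False),   # On, Off
    (False, True): Steps(pre_format=True, post_format=False),    # Off, On
    (False, False): Steps(pre_format=False, post_format=False),  # Off, Off
    (True, True): Steps(pre_format=True, post_format=True),      # On, On
}


def formatting_steps(standardization_on, normalization_on):
    """Return which formatting steps the tables list for a configuration."""
    return FORMATTING_STEPS[(standardization_on, normalization_on)]


def accepts_only_normalized(normalization_on):
    """Per the tables, sending just N raises an exception when normalization is off."""
    return normalization_on


print(formatting_steps(True, True))
```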