Predefined rule sets

You can use predefined rule sets that are country- or region-specific or that can be applied internationally.

Country or region rule sets

In the Designer client repository view, expand the Standardization Rules folder and select the country or region that you want. The folder for the country or region contains the provided rule sets.

You can install additional rule sets from asset interchange and then import the rule sets into your project.

Table 1. Rule sets for countries or regions
Country or region | Rule sets | Location of rule set files
Argentina
  • ARADDR
  • ARAREA
  • ARNAME
  • ARPREP
InfoSphere® Information Server file directory
Australia
  • AUADDR
  • AUAREA
  • AUNAME
  • AUPREP
Designer client repository view
Brazil
  • BRADDR
  • BRAREA
  • BRNAME
  • BRPREP
InfoSphere Information Server file directory
Canada
  • CAADDR
  • CAAREA
  • CANAME
  • CAPREP
Designer client repository view
Chile
  • CHADDR
  • CHAREA
  • CHNAME
  • CHPREP
InfoSphere Information Server file directory
France
  • FRADDR
  • FRAREA
  • FRNAME
  • FRPREP
Designer client repository view
Germany
  • DEADDR
  • DEAREA
  • DENAME
  • DEPREP
Designer client repository view
Hong Kong Special Administrative Region of the People's Republic of China - Chinese characters
  • HKCADDR
  • HKCNAME
Note: The Name domain includes data typically found in the Area domain. The Prep domain is not needed.
Designer client repository view
Hong Kong Special Administrative Region of the People's Republic of China - Latin characters
  • HKADDR
  • HKNAME
  • HKPHONE
Note: The Name domain includes data typically found in the Area domain. The Prep domain is not needed.
Designer client repository view
India
  • INAPAD
  • INAREA
  • INASAD
  • INBHAD
  • INDLAD
  • INGJAD
  • INHPAD
  • INHRAD
  • INJKAD
  • INKAAD
  • INKEAD
  • INMHAD
  • INMPAD
  • INNAME
  • INORAD
  • INPBAD
  • INRJAD
  • INTNAD
  • INUPAD
  • INWBAD
Note: The IndiaAddressSharedContainer shared container is imported with the Indian address rule sets. The shared container can be used in a job that standardizes Indian address and area data.
InfoSphere Information Server file directory
Ireland
  • IEADDR
  • IEAREA
  • IENAME
  • IEPREP
InfoSphere Information Server file directory
Italy
  • ITADDR
  • ITAREA
  • ITNAME
  • ITPREP
Designer client repository view
Japan
  • JP1PHN
  • JP2PHN
  • JPADDR
  • JPAREA
  • JPDATE
  • JPNAME
  • JPTRIM
Designer client repository view
Japan
  • JPKANA
  • JPKNAM

JPKANA converts data from Katakana to Kanji.

InfoSphere Information Server file directory
Korea
  • KOADDR
  • KOAREA
  • KONAME
  • KOPREP
InfoSphere Information Server file directory
Mexico
  • MXADDR
  • MXAREA
  • MXNAME
  • MXPREP
InfoSphere Information Server file directory
Netherlands
  • NLADDR
  • NLAREA
  • NLNAME
  • NLPREP
InfoSphere Information Server file directory
People's Republic of China
  • CNADDR
  • CNAREA
  • CNNAME
  • CNPHONE
Designer client repository view
Peru
  • PEADDR
  • PEAREA
  • PENAME
  • PEPREP
InfoSphere Information Server file directory
Russia
  • RUADDRL
  • RUNAMEL

The RUADDRL rule set standardizes address and area information.

To use these rule sets, you must set the Windows code page to 1251 and the regional settings for your operating system to Russian.

InfoSphere Information Server file directory
Spain
  • ESADDR
  • ESAREA
  • ESNAME
  • ESPREP
Designer client repository view
Thailand
  • THADDRL
  • THNAMEL

The THADDRL rule set standardizes address and area information.

To use these rule sets, you must set the Windows code page to 874 and set the regional settings for your operating system to the Thai language. In the job properties, set the NLS parameter to TIS-620.

InfoSphere Information Server file directory
United Kingdom
  • GBADDR
  • GBAREA
  • GBNAME
  • GBPREP
Designer client repository view
United States
  • USADDR
  • USAREA
  • USNAME
  • USPREP
  • USTAXID
The USTAXID rules validate United States tax ID format, including the following characteristics:
  • nine digits
  • all numeric
  • not consisting of a single repeated digit, for example 000-00-0000
Designer client repository view
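
The three USTAXID characteristics listed above can be approximated outside the product for a quick pre-check of input data. The following Python sketch is only an illustration under stated assumptions: the function name looks_like_us_tax_id is hypothetical, hyphens and spaces are assumed to be separators that can be ignored, and the code does not reproduce how the USTAXID rule set itself is implemented.

```python
import re


def looks_like_us_tax_id(value: str) -> bool:
    """Apply the three listed checks to a candidate tax ID (hypothetical helper)."""
    # Assumption: hyphens and whitespace are separators, not part of the ID.
    digits = re.sub(r"[-\s]", "", value)
    # Check 1: nine digits.
    if len(digits) != 9:
        return False
    # Check 2: all numeric (ASCII digits only).
    if not (digits.isascii() and digits.isdigit()):
        return False
    # Check 3: not one digit repeated, for example 000-00-0000.
    if len(set(digits)) == 1:
        return False
    return True
```

A value such as 123-45-6789 passes all three checks, while 000-00-0000 fails the third one.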

Rule sets in the other category

In the Designer client repository view, expand the Standardization Rules folder and select the folder named Other. The Other folder contains the provided rule sets that serve a specific purpose within one country or region, and rule sets that can be applied to more than one country.
Table 2. Rule sets in the other category
Scope | Rule sets | Location of rule set files
International COUNTRY

Determines the country or region of origin for incoming data. This rule set looks at area information, such as city, state or province, locality, and postal code. The rule set attempts to identify the country or region to which the information belongs and assign the ISO territory code.

Expected input: multinational address information with a default two-byte country code delimiter that is enclosed in the characters ZQ, for example: ZQUSZQ.

Designer client repository view
United States-specific Expanded Company Name EXPCOM
Parses company name information into up to eight match words (excluding words such as "the" and "and") and extracts distinguishing information such as:
  • TradeName
  • StateOrgNum
  • FranchiseNumber
  • Division
  • AccountInfo
  • CorpDate

Expected input: a valid company name.

Designer client repository view
Validation VDATE
Verifies date formats, for example:
  • mmddyyyy
  • ddmmyyyy
  • mmddyy
  • ddmmyy
  • yyyymmdd
Designer client repository view
Validation VEMAIL
Verifies email formats, for example:
  • first.last@domain.org
  • name@domain.com.ca
Designer client repository view
Validation VPHONE
Verifies United States, Canada, and Caribbean phone formats, for example:
  • (nnn) nnn-nnnn
  • nnn nnn nnnn
  • nnnnnnnnnn
Note: A number sequence that leads with 1 is considered invalid.
Designer client repository view
Validation VTAXID

Provides rules that validate country-specific tax ID formats. Intended only for the United States.

Designer client repository view
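
The input shapes described in Table 2 can be sketched in Python to pre-check data before it reaches a standardization job. This is a minimal illustration, not the logic of the COUNTRY, VDATE, or VPHONE rule sets: the names country_delimiter, matching_date_formats, and looks_like_vphone are hypothetical, the country code is assumed to be two uppercase ASCII letters, and only the formats that the table lists as examples are covered.

```python
import re
from datetime import datetime

# Formats listed for VDATE, mapped to (expected length, strptime pattern).
DATE_FORMATS = {
    "mmddyyyy": (8, "%m%d%Y"),
    "ddmmyyyy": (8, "%d%m%Y"),
    "mmddyy": (6, "%m%d%y"),
    "ddmmyy": (6, "%d%m%y"),
    "yyyymmdd": (8, "%Y%m%d"),
}

# The three phone layouts listed for VPHONE.
PHONE_PATTERN = re.compile(
    r"\(\d{3}\) \d{3}-\d{4}|\d{3} \d{3} \d{4}|\d{10}", flags=re.ASCII
)


def country_delimiter(code: str) -> str:
    """Wrap a two-letter country code in ZQ, for example US -> ZQUSZQ."""
    # Assumption: the code is two uppercase ASCII letters.
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError("expected a two-letter uppercase country code")
    return f"ZQ{code}ZQ"


def matching_date_formats(value: str) -> list[str]:
    """Return the listed formats under which the digit string is a real date."""
    matches = []
    if not (value.isascii() and value.isdigit()):
        return matches
    for name, (length, pattern) in DATE_FORMATS.items():
        # The length check keeps strptime from accepting unpadded fields.
        if len(value) != length:
            continue
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            continue
        matches.append(name)
    return matches


def looks_like_vphone(value: str) -> bool:
    """Check the listed layouts and reject sequences that lead with 1."""
    if not PHONE_PATTERN.fullmatch(value):
        return False
    first_digit = re.sub(r"\D", "", value)[0]
    return first_digit != "1"
```

A string such as 01022020 is ambiguous and matches both mmddyyyy and ddmmyyyy, which is why matching_date_formats returns a list rather than a single format.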

Rule sets in the sample category

In the Designer client repository view, expand the Standardization Rules folder and select the folder named Sample. The Sample folder contains sample rule sets that can be applied to different types of data.
Table 3. Rule sets in the sample category
Rule sets | Location of rule set files
GENPROD

Demonstrates how product description data can be processed through standardization.

InfoSphere Information Server file directory
PHPROD

Demonstrates how pharmaceutical data can be processed through standardization.

InfoSphere Information Server file directory

Rule set development package

You can install a rule set development package from asset interchange, in the InfoSphere Information Server file directory. The package includes the following items:
Standardization rules templates
Includes templates for the address, area, and name domains.
Standardization rules development kit
Consists of a series of jobs that produce reports to assist in custom rules development.
Standardization quality assessment kit
Consists of a series of jobs that produce reports to assist in evaluating standardization results.