Match passes in DataStage®

In a match pass, you define the columns to compare and how to compare them. You also define the criteria for creating blocks.

You use multiple passes to implement complementary or independent business rules, to help overcome the complexities of processing large data volumes, and to compensate for data errors or missing values in the blocking columns.

Strategies for using multiple passes include the following actions:

  • Make the early passes the most restrictive, and use the most reliable columns for blocks. Loosen the blocking constraints in successive passes as the number of nonmatched records decreases.
  • Use blocking constraints that view the data from a different perspective in each pass. Keep the match comparisons similar across passes so that pairs that are missed by the blocking conditions of one pass are found by the blocking conditions of another pass.
  • Use different sets of match comparisons to implement complementary match rules.
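  • Select a strategy that produces small blocks of records in the match passes. Records are compared only with other records in the same block, so the number of comparisons grows much faster than the block size, and smaller blocks are many times more efficient than larger blocks. Use multiple passes to define different blocks.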
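The effect of block size on the amount of work can be illustrated with a small sketch. It is not product code: the `comparisons` function, the sample records, and the choice of blocking keys are all assumptions made for illustration. It only counts how many record pairs would be compared when records from two sources are paired within blocks that share the same key.

```python
from collections import Counter


def comparisons(source_a, source_b, key):
    """Count candidate pairs when records are paired only within blocks
    that share the same blocking key (hypothetical helper)."""
    sizes_a = Counter(key(r) for r in source_a)
    sizes_b = Counter(key(r) for r in source_b)
    # A block with n records from A and m records from B costs n * m comparisons.
    return sum(n * sizes_b[k] for k, n in sizes_a.items())


# Hypothetical records: (family_name, birth_year)
source_a = [("Smith", 1980), ("Smith", 1975), ("Jones", 1980), ("Brown", 1990)]
source_b = [("Smith", 1980), ("Jones", 1975), ("Jones", 1980), ("Brown", 1990)]

no_blocking = comparisons(source_a, source_b, key=lambda r: None)  # one big block
by_year = comparisons(source_a, source_b, key=lambda r: r[1])      # block on year
by_name_year = comparisons(source_a, source_b, key=lambda r: r)    # block on both

print(no_blocking, by_year, by_name_year)
```

With this sample data, a single block requires 16 comparisons, blocking on birth year requires 6, and blocking on family name and birth year requires 3. The tighter key also misses more potential pairs, which is why later passes use different blocks.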

The strategy that you choose to match data depends on your data cleansing goals. After you decide on the criteria, you can design a matching strategy to meet the goals. For example, if your goal is to match households, you might use name, address, and birth date data to determine a match.

Match pass examples

In the examples, notice that a column is sometimes used whole (birth date) and sometimes divided into parts (birth year). Where a pass needs only part of a column, you create an additional column that holds that part.
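As a rough illustration of creating additional columns from parts of existing ones, the following sketch derives year, month, day, and initial columns from a record. The function name `add_derived_columns` and the column names are hypothetical and do not come from the product.

```python
from datetime import date


def add_derived_columns(record):
    """Return a copy of the record with extra columns derived from
    parts of existing columns (hypothetical column names)."""
    derived = dict(record)
    birth = record["birth_date"]
    derived["birth_year"] = birth.year
    derived["birth_month"] = birth.month
    derived["birth_day"] = birth.day
    # First character of the given name, normalized to uppercase.
    derived["given_initial"] = record["given_name"][:1].upper()
    return derived


row = add_derived_columns(
    {"given_name": "maria", "family_name": "Lopez", "birth_date": date(1975, 11, 2)}
)
print(row["birth_year"], row["birth_month"], row["given_initial"])
```

A pass can then block on `birth_year` alone, or on `birth_month` and `birth_day`, without changing the original `birth_date` column.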

The following examples assume that you have two sources containing a date, given name, family name, and gender.

Date, given name, family name, and gender columns

  • match pass 1: date and gender
  • match pass 2: Soundex of family name and first two characters of given name
  • match pass 3: Soundex of given name, year, and month (from the date column)
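The three passes listed above can be sketched in a few lines of code. This is a simplified illustration, not the product's implementation: the record layout, the `candidate_pairs` helper, and the sample data are assumptions, the `soundex` function is a plain version of standard American Soundex, and every pair that shares a block is treated as a candidate without any match scoring.

```python
from collections import defaultdict
from datetime import date

# Standard American Soundex letter groups.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


# Blocking key for each pass, following the list above.
PASSES = [
    ("pass 1", lambda r: (r["date"], r["gender"])),
    ("pass 2", lambda r: (soundex(r["family"]), r["given"][:2].upper())),
    ("pass 3", lambda r: (soundex(r["given"]), r["date"].year, r["date"].month)),
]


def candidate_pairs(source_a, source_b, key):
    """Pair every record in A with the records in B that share its block."""
    blocks = defaultdict(list)
    for rec in source_b:
        blocks[key(rec)].append(rec["id"])
    return {(rec["id"], other) for rec in source_a for other in blocks.get(key(rec), [])}


# Hypothetical sample data with a misspelling, a wrong day, and a changed name.
source_a = [
    {"id": "a1", "given": "Jon", "family": "Smith", "date": date(1980, 5, 17), "gender": "M"},
    {"id": "a2", "given": "Maria", "family": "Lopez", "date": date(1975, 11, 2), "gender": "F"},
    {"id": "a3", "given": "Anna", "family": "Novak", "date": date(1990, 3, 8), "gender": "F"},
]
source_b = [
    {"id": "b1", "given": "Jon", "family": "Smyth", "date": date(1980, 5, 17), "gender": "M"},
    {"id": "b2", "given": "Maria", "family": "Lopes", "date": date(1975, 11, 20), "gender": "F"},
    {"id": "b3", "given": "Ana", "family": "Kowalski", "date": date(1990, 3, 18), "gender": "F"},
]

# Record the first pass in which each candidate pair shares a block.
first_found = {}
for name, key in PASSES:
    for pair in sorted(candidate_pairs(source_a, source_b, key)):
        first_found.setdefault(pair, name)

for pair, name in sorted(first_found.items()):
    print(pair, name)
```

With this sample data, pass 1 pairs a1 with b1, pass 2 picks up a2 and b2 despite the differing day, and pass 3 picks up a3 and b3 despite the differing family name and day. Each pass compensates for an error in the blocking columns of the others.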

Date, family name, city, and postal code columns

  • match pass 1: date and gender
  • match pass 2: postal code
  • match pass 3: Soundex of family name and first two characters of city

National identity number, family name, given name, and birth date columns

  • match pass 1: national identity number
  • match pass 2: birth date
  • match pass 3: Soundex of family name and birth year

Family name, middle initial, given name, gender, and birth date (year, month, day) columns

  • match pass 1: family name, gender, and birth date
  • match pass 2: birth month, birth day, the first character of the given name, and middle initial