  • Latest Post - ‏2010-12-21T15:14:01Z by smithha
SystemAdmin

DATE8 in QualityStage

‏2010-12-21T01:31:42Z |
I am quite new to QS and need help with a date comparison in one of my jobs. A quick reply with a solution or suggestion would be much appreciated.

Requirement: Compare records on DATE with a date range of +7 or -7 days.

MATCH Spec details: Match Type : Unduplicate
Blocked on fields F1 and F2.
Match command : PASS1 is on DATE.
Comparison Type : DATE8
M-prob .9
U-prob .01
PARAM1 : 7
PARAM2 : 0

INPUT DATA
F1 F2 DATE UNQRCD
30152473 9 2005-04-07 93731
30152473 9 2005-04-14 81259
30152473 9 2005-04-21 81542
30152473 9 2005-04-28 61352
30152473 9 2005-04-01 30077

As you can see, each record in the input above is within +7 or -7 days of the neighboring date. My question is: how should we define the date range in the Match Spec to get the expected output below?

Output DATA
F1 F2 DATE UNQRCD QsMatchSetId
30152473 9 2005-04-07 93731 14
30152473 9 2005-04-14 81259 14
30152473 9 2005-04-21 81542 14
30152473 9 2005-04-28 61352 14
30152473 9 2005-04-01 30077 14
QsMatchSetId is 14 here, but it can be any common number that shows all of the records match one another.

Please let me know if anyone has any related questions or needs any further information.
  • smithha

    Re: DATE8 in QualityStage

    ‏2010-12-21T15:14:01Z  
    An issue that you will encounter here is that the calculations will be dependent on the first record to arrive. In your example, that is the first record for April 7.

    The next record is at the outer limit of the range you've provided, so it will match but with a low score. The subsequent records for April 21 and April 28 will not match, as both are more than 7 days from the first record, and both will get the full penalty for that match comparison. The final record should match, as April 1 is within the 7-day range, but again it will have a relatively low score.

    The dates you show cover a span of 28 days, so if you want all of them to group together using the date comparison, you'd need to set the range to 28. That ensures they all come together, since you can't guarantee the April 7 record will arrive first.
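
To make the arithmetic above concrete, here is a minimal Python sketch that compares each record only against the first record to arrive, once with a 7-day tolerance and once with a 28-day tolerance. It is plain date arithmetic, not QualityStage code: the function name `within_range_of_first` is hypothetical, the tolerance is assumed to apply symmetrically in both directions, and DATE8 scoring (the M-prob and U-prob weights) is not modeled.

```python
from datetime import date

# (UNQRCD, DATE) pairs from the input data, in the order they arrive.
records = [
    (93731, date(2005, 4, 7)),
    (81259, date(2005, 4, 14)),
    (81542, date(2005, 4, 21)),
    (61352, date(2005, 4, 28)),
    (30077, date(2005, 4, 1)),
]


def within_range_of_first(records, tolerance_days):
    """Hypothetical helper: for every record after the first, report whether
    its date is within tolerance_days (either direction) of the first record."""
    _, first_date = records[0]
    return {
        rec_id: abs((rec_date - first_date).days) <= tolerance_days
        for rec_id, rec_date in records[1:]
    }


week = within_range_of_first(records, 7)
wide = within_range_of_first(records, 28)

for rec_id, rec_date in records[1:]:
    days = (rec_date - records[0][1]).days
    print(rec_id, rec_date, "days from first:", days,
          "within 7:", week[rec_id], "within 28:", wide[rec_id])
```

With the 7-day tolerance only the April 14 and April 1 records fall in range of the April 7 record; with 28 days all of them do.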

    Some options you could use:
    • Ensure the file is sorted in date order prior to the match. Set the 1st parameter to 28 (or whatever the maximum tolerated range is) and set the 2nd parameter to 0 (with the file in date order, there won't be a need to allow for earlier dates).
    • Let the file enter in whatever order is appropriate, but ensure the span covers all dates you might potentially match to. In this case, set the 1st parameter to 28 and leave the 2nd parameter blank.
    • If there is a record that could be considered a master record then separate the master record from the potentially associated records into two files and do a reference match rather than an unduplication. This will allow you to control the record from which you compare the associated records.
    • If you are trying to chain records together (so that if record A matches record B, and record B matches record C, you end up with records A, B, and C all matching), then you need to use an Undup Independent match (which would need at least 2 passes) or an Undup Transitive match.
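
The chaining idea in the last option can be illustrated outside QualityStage with a short Python sketch. This is only an illustration of transitive grouping on dates, not how the Undup Transitive match is implemented: the function name `chain_by_date` is hypothetical, and it assumes the records already share the same blocking values (F1 and F2).

```python
from datetime import date


def chain_by_date(records, tolerance_days):
    """Hypothetical helper: sort (id, date) pairs by date and give consecutive
    records the same group number while the gap between neighboring dates
    is no more than tolerance_days. Returns {id: group_number}."""
    groups = {}
    group_id = 0
    previous_date = None
    for rec_id, rec_date in sorted(records, key=lambda r: r[1]):
        # A gap larger than the tolerance breaks the chain and starts a new group.
        if previous_date is not None and (rec_date - previous_date).days > tolerance_days:
            group_id += 1
        groups[rec_id] = group_id
        previous_date = rec_date
    return groups


# (UNQRCD, DATE) pairs from the input data.
records = [
    (93731, date(2005, 4, 7)),
    (81259, date(2005, 4, 14)),
    (81542, date(2005, 4, 21)),
    (61352, date(2005, 4, 28)),
    (30077, date(2005, 4, 1)),
]

groups = chain_by_date(records, 7)
print(groups)
```

Because every neighboring pair of dates in the sample is at most 7 days apart, the chain links all five records into one group, which corresponds to the single QsMatchSetId in the expected output.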

    Harald