Pinned topic how match weight is calculated?
I try to understand how is weight calcualted (for each column) during matching.
For example we have small file with 10 records, column CODE contains four values (in parentheses number of rows with this value and calculated disagreement weight):
X10 (5, 2.32)
Y20 (2, 2.99)
Z20 (2, 2.99)
X20 (1, 3.16)
Now when I run test for simple matching specification (in Match Command for CODE column I use CHAR comparison type) I get such matching results (disagreement weights for two rows with specified values in CODE column):
X10&Y20 = 2.9
X10&X20 = 3.16
X10&Z20 = 2.99
X20&Z20 = 2.99
X20&Y20 = 3.16
Questins:
1. What disagreement weight is choosed for CHAR comparison type when there are two different disagreement weights (for each column value, for example for case X10&Z20 with disagr.weight 2.32/2.99)?
2. How to explain different disagreement weight for cases X10&Y20 and X10&Z20 (Y20 and Z20 value have the same frequency!!!)?
3. Is there any document/white paper/redbook which describe this matching process in more details (I looked at QualityStage user guide and redbook "IBM WebSphere QualityStage Methodologies, Standardization, and Matching"  but these documents doesn't answer to my questions)?
Thanks,
Oleg

Re: how match weight is calculated?
20110126T17:44:02ZThis is the accepted answer. This is the accepted answer.You can find some discussion on calculating weights and penalties here: link: http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.qs.ug.doc/topics/logarithm.html
Also some reference starting on page 89 of the QualityStage Redbook.
However, here is a short summary on match weights and their calculation:
A match process includes some number of fields on a record that will be used in calculating the match weight/score.
 Each field agreeing between records is going to add an agreement weight to the probability that the records are a match
 Each field disagreeing is going to contribute a disagreement weight
 Missing values get a weight of zero.
 If a field is very reliable, it is more useful for classification than a field with many errors or missing values
Weights take into account two properties of statistical variables:
 Reliability of the field (measured by the mProbability)
 Probability of a random agreement of values (measured by the uProbability)
Agreement weight is log2(m/u)
Disagreement weight is log2((1m)/(1u))
m is the mprobability, u is the uprobability; logs are base 2
The agreement weight is applied to the field if it matches in the record pair being examined, else the disagreement weight is applied (i.e. a penalty).
At a record level, the weights are composites:
 If a field matches, the agreement weight is used  This is a positive number
 If a field doesn’t match, the disagreement weight is used  This is a negative number
 Weights for all field comparisons are summed to form a composite weight
 If most fields match, the result will be a large positive value
 If most fields do not match, the result will be a large negative value
mprobability
The mprobability is defined as the probability of the variable agreeing in a matched pair
It is equivalent to 1  error rate for the field
 For example, if the variable street type has a 10% error rate, then the mprobability is 0.90
The higher the mprobability, the higher the disagreement weight will be for the field not matching since errors are relatively rare events.
mprobability is an estimate
0.9 is generally good for most fields
Use 0.95 or greater if the field is fairly important as well as reliable
 note that this is what you enter for a particular field for a given match comparison http://e.g. you choose a CHAR comparison on a Country Code field, there is an mprobability value that defaults to 0.9, but can be changed
uprobability
The second property of a variable measured by weight is the probability that a variable agrees in an unmatched record (the uprobability). Consequently, the uprobability can be approximated as the probability that a field agrees at random (by chance).
uProbability for each value is calculated as:
 of occurrences of the value / total number of occurrences for all values
Relatively rare events are more useful for record classification than common events
A match frequency stage can be used to calculate the uprobability automatically for a given field
The smaller the uprobability, the higher the agreement weight will be for the field as the likelihood of a match is rare, so when a match occurs it is likely good.
Some example calculations
Field mprob uprob weight penalty
last name 0.9 0.01 6.491853096 3.307428525
first name 0.9 0.001 9.813781191 3.320484678
taxid 0.999 0.0001 13.28626896 9.965640008
date of birth 0.99 0.001 9.951284715 6.642412773
gender 0.9 0.5 0.847996907 2.321928095
If all of these were CHAR comparisons, they would either match (and get the full weight) or not match (and get the full penalty). Other comparisons will return various prorated scores between the maximum weight and the maximum penalty.
Changes to uprob impact weight; changes to mprob impact penalty
On your questions:
1. What disagreement weight is chosen for CHAR comparison type when there are two different disagreement weights (for each column value, for example for case X10&Z20 with disagr.weight 2.32/2.99)?
Each field comparison has one score  in your example X10 and Z20 are not a match, so they will get a disagreement and it will be calculated based on the mprob entered in the match comparison and the uprob will be based either on a value from a Match Frequency stage output, or what you enter in the match comparison.
2. How to explain different disagreement weight for cases X10&Y20 and X10&Z20 (Y20 and Z20 value have the same frequency!!!)?
It would be necessary to look at the details of your match specification, and any match frequency tables created for use with that.
Harald 
Re: how match weight is calculated?
20110127T06:55:44ZThis is the accepted answer. This is the accepted answer. smithha
 20110126T17:44:02Z
You can find some discussion on calculating weights and penalties here: link: http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.qs.ug.doc/topics/logarithm.html
Also some reference starting on page 89 of the QualityStage Redbook.
However, here is a short summary on match weights and their calculation:
A match process includes some number of fields on a record that will be used in calculating the match weight/score.
 Each field agreeing between records is going to add an agreement weight to the probability that the records are a match
 Each field disagreeing is going to contribute a disagreement weight
 Missing values get a weight of zero.
 If a field is very reliable, it is more useful for classification than a field with many errors or missing values
Weights take into account two properties of statistical variables:
 Reliability of the field (measured by the mProbability)
 Probability of a random agreement of values (measured by the uProbability)
Agreement weight is log2(m/u)
Disagreement weight is log2((1m)/(1u))
m is the mprobability, u is the uprobability; logs are base 2
The agreement weight is applied to the field if it matches in the record pair being examined, else the disagreement weight is applied (i.e. a penalty).
At a record level, the weights are composites:
 If a field matches, the agreement weight is used  This is a positive number
 If a field doesn’t match, the disagreement weight is used  This is a negative number
 Weights for all field comparisons are summed to form a composite weight
 If most fields match, the result will be a large positive value
 If most fields do not match, the result will be a large negative value
mprobability
The mprobability is defined as the probability of the variable agreeing in a matched pair
It is equivalent to 1  error rate for the field
 For example, if the variable street type has a 10% error rate, then the mprobability is 0.90
The higher the mprobability, the higher the disagreement weight will be for the field not matching since errors are relatively rare events.
mprobability is an estimate
0.9 is generally good for most fields
Use 0.95 or greater if the field is fairly important as well as reliable
 note that this is what you enter for a particular field for a given match comparison http://e.g. you choose a CHAR comparison on a Country Code field, there is an mprobability value that defaults to 0.9, but can be changed
uprobability
The second property of a variable measured by weight is the probability that a variable agrees in an unmatched record (the uprobability). Consequently, the uprobability can be approximated as the probability that a field agrees at random (by chance).
uProbability for each value is calculated as:
 of occurrences of the value / total number of occurrences for all values
Relatively rare events are more useful for record classification than common events
A match frequency stage can be used to calculate the uprobability automatically for a given field
The smaller the uprobability, the higher the agreement weight will be for the field as the likelihood of a match is rare, so when a match occurs it is likely good.
Some example calculations
Field mprob uprob weight penalty
last name 0.9 0.01 6.491853096 3.307428525
first name 0.9 0.001 9.813781191 3.320484678
taxid 0.999 0.0001 13.28626896 9.965640008
date of birth 0.99 0.001 9.951284715 6.642412773
gender 0.9 0.5 0.847996907 2.321928095
If all of these were CHAR comparisons, they would either match (and get the full weight) or not match (and get the full penalty). Other comparisons will return various prorated scores between the maximum weight and the maximum penalty.
Changes to uprob impact weight; changes to mprob impact penalty
On your questions:
1. What disagreement weight is chosen for CHAR comparison type when there are two different disagreement weights (for each column value, for example for case X10&Z20 with disagr.weight 2.32/2.99)?
Each field comparison has one score  in your example X10 and Z20 are not a match, so they will get a disagreement and it will be calculated based on the mprob entered in the match comparison and the uprob will be based either on a value from a Match Frequency stage output, or what you enter in the match comparison.
2. How to explain different disagreement weight for cases X10&Y20 and X10&Z20 (Y20 and Z20 value have the same frequency!!!)?
It would be necessary to look at the details of your match specification, and any match frequency tables created for use with that.
Harald
> Each field comparison has one score  in your example X10 and Z20 are not a match, so they will get a disagreement ...
As I used frequency table (generated with Match Frequency stage) I have two disagreement scores  for X10 and Z20 values ( 2.32 and 2.99). As you can see in my example Qualitystage choosed value 2.99 and I guessed that for disagreement is used minimum value from two weights. But unfortunately my guess is not rights because in other case  X20&Z20 (weights respectively 3.16 and 2.99)  Qualitystage choosed maximum value 2.99.
> It would be necessary to look at the details of your match specification, and any match frequency tables created for use with that.
Example is very simple, input data (columns BlockId, Descr, mcode):
BlockId,Descr,mcode
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Y20
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Y20
1,AAAAAAAAAAAAAAAAAAAA,X20
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Z20
1,AAAAAAAAAAAAAAAAAAAA,Z20
Descr and mcode used in matching commands (with UNCERT and CHAR comparison types respectivelly).
I also attached screen snapshot of Match Designer (file simple10.PNG)
And frequency table for mcode column:
pre
VALUE FREQUENCY MPROB UPROB AGRWGT DISWGT MISWGT
ALLVALUESINTABLE 0 0.899999976 0.01 1.84000003 2.9000001 0
X10 5 0.899999976 0.5 0.839999974 2.31999993 0
X20 1 0.899999976 0.100000001 3.16000009 3.16000009 0
Y20 2 0.899999976 0.200000003 2.16000009 2.99000001 0
Z20 2 0.899999976 0.200000003 2.16000009 2.99000001 0
[/pre]
Regards,
OlegAttachments

Re: how match weight is calculated?
20110128T22:17:46ZThis is the accepted answer. This is the accepted answer. OlegT.
 20110127T06:55:44Z
Hi Harald,
> Each field comparison has one score  in your example X10 and Z20 are not a match, so they will get a disagreement ...
As I used frequency table (generated with Match Frequency stage) I have two disagreement scores  for X10 and Z20 values ( 2.32 and 2.99). As you can see in my example Qualitystage choosed value 2.99 and I guessed that for disagreement is used minimum value from two weights. But unfortunately my guess is not rights because in other case  X20&Z20 (weights respectively 3.16 and 2.99)  Qualitystage choosed maximum value 2.99.
> It would be necessary to look at the details of your match specification, and any match frequency tables created for use with that.
Example is very simple, input data (columns BlockId, Descr, mcode):
BlockId,Descr,mcode
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Y20
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Y20
1,AAAAAAAAAAAAAAAAAAAA,X20
1,AAAAAAAAAAAAAAAAAAAA,X10
1,AAAAAAAAAAAAAAAAAAAA,Z20
1,AAAAAAAAAAAAAAAAAAAA,Z20
Descr and mcode used in matching commands (with UNCERT and CHAR comparison types respectivelly).
I also attached screen snapshot of Match Designer (file simple10.PNG)
And frequency table for mcode column:
pre
VALUE FREQUENCY MPROB UPROB AGRWGT DISWGT MISWGT
ALLVALUESINTABLE 0 0.899999976 0.01 1.84000003 2.9000001 0
X10 5 0.899999976 0.5 0.839999974 2.31999993 0
X20 1 0.899999976 0.100000001 3.16000009 3.16000009 0
Y20 2 0.899999976 0.200000003 2.16000009 2.99000001 0
Z20 2 0.899999976 0.200000003 2.16000009 2.99000001 0
[/pre]
Regards,
Oleg
Ok, I can see the pieces you are looking at with that detail.
What you are seeing is primarily a case of sequencing based on the input data set.
QualityStage will look at the data here recordbyrecord.
First record is X10, nothing else to match to, so it forms the first match group. However, that record's frequency information will not be the basis of the subsequent scoring.
Next record is Y20. It is compared to X10. You are using a CHAR comparison, so values must be exact to match. They aren't, so disagreement weight is based on the frequency of Y20 occurring: 2.99. Y20 will also become the second match group available.
Next record is X10. That is a match and you will get agreement weight based on X10 occurring: 0.84.
When you get to X20, it will compare to X10 and get a disagreement based on X20 occurring: 3.16.
When you get to Z20, it will compare first to X10 and get a disagreement based on Z20 occurring: 2.99.
It will subsequently compare to Y20, and then X20, in both cases getting same disagreement: 2.99.
If you change the sequence of the file so that Z20 comes first, and then X10, when X10 goes to match with Z20, that would get the disagreement of: 2.32.
So, think of the agreement or disagreement weight applied as the rarity of the value being compared to the group (not as the rarity of the group base or as the higher/lower of the possible weights).
Hope that helps.
Harald 
Re: how match weight is calculated?
20110201T13:56:18ZThis is the accepted answer. This is the accepted answer. smithha
 20110128T22:17:46Z
Hi Oleg,
Ok, I can see the pieces you are looking at with that detail.
What you are seeing is primarily a case of sequencing based on the input data set.
QualityStage will look at the data here recordbyrecord.
First record is X10, nothing else to match to, so it forms the first match group. However, that record's frequency information will not be the basis of the subsequent scoring.
Next record is Y20. It is compared to X10. You are using a CHAR comparison, so values must be exact to match. They aren't, so disagreement weight is based on the frequency of Y20 occurring: 2.99. Y20 will also become the second match group available.
Next record is X10. That is a match and you will get agreement weight based on X10 occurring: 0.84.
When you get to X20, it will compare to X10 and get a disagreement based on X20 occurring: 3.16.
When you get to Z20, it will compare first to X10 and get a disagreement based on Z20 occurring: 2.99.
It will subsequently compare to Y20, and then X20, in both cases getting same disagreement: 2.99.
If you change the sequence of the file so that Z20 comes first, and then X10, when X10 goes to match with Z20, that would get the disagreement of: 2.32.
So, think of the agreement or disagreement weight applied as the rarity of the value being compared to the group (not as the rarity of the group base or as the higher/lower of the possible weights).
Hope that helps.
Harald
thanks for you help and patience.
I am still confused with disagreement weight choosing during matching.
I tried to use input dataset with different sort order on MCODE coumn and result was the same (there was no any difference compared to my result which I describe in my previous post).
Input data:
BlockId Descr mcode 1 AAAAAAAAAAAAAAAAAABB Z20 1 AAAAAAAAAAAAAAAAAACC Z20 1 AAAAAAAAAABBBBBBBBBB Z20 1 AAAAAAAAAABBBBBBBBBB Z20 1 AAAAAAAAAAAAAAAAAAAA X10 1 AAAAAAAAAAAAAAAAAAAB X10 1 AAAAAAAAAAAAAAAAAAAC X10 1 AAAAAAAAAAAAAAAAAAAF X10 1 AAAAAAAAAAAAAAAAAAAE X10 1 AAAAAAAAAAAAAAAAAAAF X10 1 AAAAAAAAAAAAAAAAAAAG X10 1 AAAAAAAAAABBBBBBBBBB X10 1 AAAAAAAAAABBBBBBBBBB X20 1 AAAAAAAAAAAAAAAAAACC X20 1 AAAAAAAAAAAAAAAAAAEE Y20 1 AAAAAAAAAAAAAAAAAAFF Y20 1 AAAAAAAAAABBBBBBBBBB Y20 1 AAAAAAAAAAAAAAAAAAGG Y20
Weight info:
VALUE FREQUENCY MPROB UPROB AGRWGT DISWGT X10 8 0.90 0.44 1.01 2.47 *ALLVALUESINTABLE* 0 0.90 0.01 1.84 2.90 Y20 4 0.90 0.22 2.01 2.95 Z20 4 0.90 0.22 2.01 2.95 X20 2 0.90 0.11 3.01 3.15
Matching result:
QSMATCHSETID QSMATCHTYPE QSMATCHDATAID BlockId Descr mcode QSMATCHWEIGHT MCODE_WEIGHT 3 XA 3 1 AAAAAAAAAABBBBBBBBBB Z20 8.50 3 DA 13 1 AAAAAAAAAABBBBBBBBBB X20 3.54 2.95 3 DA 17 1 AAAAAAAAAABBBBBBBBBB Y20 3.54 2.95 3 DA 12 1 AAAAAAAAAABBBBBBBBBB X10 4.02 2.47 3 DA 4 1 AAAAAAAAAABBBBBBBBBB Z20 8.50 2.01 8 XA 8 1 AAAAAAAAAAAAAAAAAAAF X10 7.50 8 DA 14 1 AAAAAAAAAAAAAAAAAACC X20 0.76 2.47 8 DA 15 1 AAAAAAAAAAAAAAAAAAEE Y20 0.76 2.47 8 DA 18 1 AAAAAAAAAAAAAAAAAAGG Y20 0.76 2.47 8 DA 1 1 AAAAAAAAAAAAAAAAAABB Z20 0.76 2.47 8 DA 2 1 AAAAAAAAAAAAAAAAAACC Z20 0.76 2.47 8 DA 16 1 AAAAAAAAAAAAAAAAAAFF Y20 2.39 2.47 8 DA 5 1 AAAAAAAAAAAAAAAAAAAA X10 5.87 1.01 8 DA 6 1 AAAAAAAAAAAAAAAAAAAB X10 5.87 1.01 8 DA 7 1 AAAAAAAAAAAAAAAAAAAC X10 5.87 1.01 8 DA 9 1 AAAAAAAAAAAAAAAAAAAE X10 5.87 1.01 8 DA 11 1 AAAAAAAAAAAAAAAAAAAG X10 5.87 1.01 8 DA 10 1 AAAAAAAAAAAAAAAAAAAF X10 7.50 1.01
For me strange things are:
 for first matching group (SetId=3, master record is with mcode="Z20") disagreement weight usually is from matched record (for record with mcode=Y20 weight is for Y20 value, for record mcode=X10 weights is for X10 value); But why for record mcode=X20 is choosed weight 2.95 (X20 has disagreement weight 3.15 )
 for second matching group (SetId=8, master record is with mcode="X10") for all mathed rows with different value in mcode Qualitystage choosed disagreement weight 2.47 (it is value for X10)
 why for the first matching group (SetId=3) Qualitystage didn't choose master record with mcode=X20 (in this group all values for Descr column is the same, but "X20" value has highest agreement weight)
Regards,
Oleg for first matching group (SetId=3, master record is with mcode="Z20") disagreement weight usually is from matched record (for record with mcode=Y20 weight is for Y20 value, for record mcode=X10 weights is for X10 value); But why for record mcode=X20 is choosed weight 2.95 (X20 has disagreement weight 3.15 )
 for second matching group (SetId=8, master record is with mcode="X10") for all mathed rows with different value in mcode Qualitystage choosed disagreement weight 2.47 (it is value for X10)
 why for the first matching group (SetId=3) Qualitystage didn't choose master record with mcode=X20 (in this group all values for Descr column is the same, but "X20" value has highest agreement weight)
Regards,
Oleg 
Re: how match weight is calculated?
20110201T20:19:07ZThis is the accepted answer. This is the accepted answer. OlegT.
 20110201T13:56:18Z
Hi Harald,
thanks for you help and patience.
I am still confused with disagreement weight choosing during matching.
I tried to use input dataset with different sort order on MCODE coumn and result was the same (there was no any difference compared to my result which I describe in my previous post).
Input data:
<pre class="jivepre">BlockId Descr mcode 1 AAAAAAAAAAAAAAAAAABB Z20 1 AAAAAAAAAAAAAAAAAACC Z20 1 AAAAAAAAAABBBBBBBBBB Z20 1 AAAAAAAAAABBBBBBBBBB Z20 1 AAAAAAAAAAAAAAAAAAAA X10 1 AAAAAAAAAAAAAAAAAAAB X10 1 AAAAAAAAAAAAAAAAAAAC X10 1 AAAAAAAAAAAAAAAAAAAF X10 1 AAAAAAAAAAAAAAAAAAAE X10 1 AAAAAAAAAAAAAAAAAAAF X10 1 AAAAAAAAAAAAAAAAAAAG X10 1 AAAAAAAAAABBBBBBBBBB X10 1 AAAAAAAAAABBBBBBBBBB X20 1 AAAAAAAAAAAAAAAAAACC X20 1 AAAAAAAAAAAAAAAAAAEE Y20 1 AAAAAAAAAAAAAAAAAAFF Y20 1 AAAAAAAAAABBBBBBBBBB Y20 1 AAAAAAAAAAAAAAAAAAGG Y20
</pre>
Weight info:
<pre class="jivepre">VALUE FREQUENCY MPROB UPROB AGRWGT DISWGT X10 8 0.90 0.44 1.01 2.47 *ALLVALUESINTABLE* 0 0.90 0.01 1.84 2.90 Y20 4 0.90 0.22 2.01 2.95 Z20 4 0.90 0.22 2.01 2.95 X20 2 0.90 0.11 3.01 3.15
</pre>
Matching result:
<pre class="jivepre">QSMATCHSETID QSMATCHTYPE QSMATCHDATAID BlockId Descr mcode QSMATCHWEIGHT MCODE_WEIGHT 3 XA 3 1 AAAAAAAAAABBBBBBBBBB Z20 8.50 3 DA 13 1 AAAAAAAAAABBBBBBBBBB X20 3.54 2.95 3 DA 17 1 AAAAAAAAAABBBBBBBBBB Y20 3.54 2.95 3 DA 12 1 AAAAAAAAAABBBBBBBBBB X10 4.02 2.47 3 DA 4 1 AAAAAAAAAABBBBBBBBBB Z20 8.50 2.01 8 XA 8 1 AAAAAAAAAAAAAAAAAAAF X10 7.50 8 DA 14 1 AAAAAAAAAAAAAAAAAACC X20 0.76 2.47 8 DA 15 1 AAAAAAAAAAAAAAAAAAEE Y20 0.76 2.47 8 DA 18 1 AAAAAAAAAAAAAAAAAAGG Y20 0.76 2.47 8 DA 1 1 AAAAAAAAAAAAAAAAAABB Z20 0.76 2.47 8 DA 2 1 AAAAAAAAAAAAAAAAAACC Z20 0.76 2.47 8 DA 16 1 AAAAAAAAAAAAAAAAAAFF Y20 2.39 2.47 8 DA 5 1 AAAAAAAAAAAAAAAAAAAA X10 5.87 1.01 8 DA 6 1 AAAAAAAAAAAAAAAAAAAB X10 5.87 1.01 8 DA 7 1 AAAAAAAAAAAAAAAAAAAC X10 5.87 1.01 8 DA 9 1 AAAAAAAAAAAAAAAAAAAE X10 5.87 1.01 8 DA 11 1 AAAAAAAAAAAAAAAAAAAG X10 5.87 1.01 8 DA 10 1 AAAAAAAAAAAAAAAAAAAF X10 7.50 1.01
</pre>
For me strange things are:
 for first matching group (SetId=3, master record is with mcode="Z20") disagreement weight usually is from matched record (for record with mcode=Y20 weight is for Y20 value, for record mcode=X10 weights is for X10 value); But why for record mcode=X20 is choosed weight 2.95 (X20 has disagreement weight 3.15 )
 for second matching group (SetId=8, master record is with mcode="X10") for all mathed rows with different value in mcode Qualitystage choosed disagreement weight 2.47 (it is value for X10)
 why for the first matching group (SetId=3) Qualitystage didn't choose master record with mcode=X20 (in this group all values for Descr column is the same, but "X20" value has highest agreement weight)
Regards,
Oleg
My earlier response indicating selection based on the record order was incorrect. I did confirm that the selection between the pair will be based on whichever value is most frequently occurring.
What you see in this selection is correct:
 for instance, between Z20 and X20, Z20 occurs more frequently, therefore gets the disagreement weight of Z20.
 in the group with X10 as the master, X10 always occurs most frequently, therefore all pairings get the disagreement weight of X10.
Now I'm not sure why in the original post you may have seen a difference. Were you reviewing output in the Match Designer or on an output file?
Harald 
Re: how match weight is calculated?
20110202T14:42:30ZThis is the accepted answer. This is the accepted answer. smithha
 20110201T20:19:07Z
Hi Oleg,
My earlier response indicating selection based on the record order was incorrect. I did confirm that the selection between the pair will be based on whichever value is most frequently occurring.
What you see in this selection is correct:
 for instance, between Z20 and X20, Z20 occurs more frequently, therefore gets the disagreement weight of Z20.
 in the group with X10 as the master, X10 always occurs most frequently, therefore all pairings get the disagreement weight of X10.
Now I'm not sure why in the original post you may have seen a difference. Were you reviewing output in the Match Designer or on an output file?
Harald
I got these weights from Match Designer. Now I am cannot reproduce this problem,
maybe there are additional factors which produce this strange results (for example I changed input file format  record delimiter).
Now I see that during mismatch Qualitystage choose weight for most frequent value (or maximum disagreement weight from two values).
What about my third question from my last post? As I know master record is a record with a highest composite weight calculated against this record itself. And according this definition it is a record with mcode="X20" (agreement weights for Descr column is 6.49 because I set NOFREQ spec.var.processing, and weight for X20 value is 3.16 so composite weight is 9.65 > 8.5  composite weight for row with mcode="Y20"). Am I right?
Is there a way to output from Macth Stage for each row deatlisation of composite weight (weight for each attribute)?
Thanks,
Oleg 
Re: how match weight is calculated?
20170726T23:51:22ZThis is the accepted answer. This is the accepted answer.a) By default, the u probability for each comparison is calculated automatically by the matching process using the frequency information from the Frequency stage. This calculated u probability is important for columns with nonuniform distributions.
from
b) uprob will be based either on a value from a Match Frequency stage output, or what you enter in the match comparison.
(1)According to above reference webpage, I guess i can get automatically calculated U probability.
so my question is how can I configure a Frequency stage (or other stage) to get U probability
(2) when in match designer> match command>choose a certain column>properties
and it yells fails to retrieve freq data. How to fix this?
Thank you for your help.
Updated on 20170726T23:53:58Z at 20170726T23:53:58Z by yichen.kp