GauravCTS02
10 Posts

Negative score generated during mpimcomp for exactly same addresses

‏2015-11-03T23:43:32Z | mdm-migration

Hi,

I am seeing a strange issue in my project. We processed records for a new source through the mpxcomp and mpxlink command-line utilities in IXM mode. When I run a member comparison of Households, a negative score is generated for the address in many cases even though the addresses are exactly the same. So although two records look like the same household at first glance, they do not score high enough to cross the AL threshold. Below is an example from one mpimcomp job:

 

Comparing proband 42102025 with candidate 2101870, score= 8.0

HFNAME: KEITH|H|               -- KEITH|HOGG|             , wgt= +4.55, mcc=P
HLNAME: KEITH|                 -- KEITH|                  , wgt= +3.00, mcc=E
HADDRESS: 46|YOUNG|ST|BACCHUS|MARS-- 46|YOUNG|ST|BACCHUS|MARS, wgt= -0.51, mcc=P
HHSEX:                         --                         , wgt= +0.00, mcc=M
HHEMAIL:                       --                         , wgt= +0.00, mcc=M
HHID:                          --                         , wgt= +0.00, mcc=M
HPARADD:                       --                         , wgt= +0.00, mcc=M

HFNAME Detail:
  HFNAME[01]: KEITH            -- KEITH                   , wgt= +2.94, mcc=X
  HFNAME[02]: H                -- HOGG                    , wgt= +1.61, mcc=I


HLNAME Detail:
  HLNAME[01]: KEITH            -- KEITH                   , wgt= +4.00, mcc=X


HADDRESS Detail:
  HADDRESS[01]: 46             -- 46                      , wgt= +4.24, mcc=X
  HADDRESS[02]: YOUNG          -- YOUNG                   , wgt= +3.91, mcc=X
  HADDRESS[03]: ST             -- ST                      , wgt= +1.17, mcc=X
  HADDRESS[04]: BACCHUS        -- BACCHUS                 , wgt= +6.66, mcc=X
  HADDRESS[05]: MARSH          -- MARSH                   , wgt= +6.66, mcc=X
  HADDRESS[06]: VIC            -- VIC                     , wgt= +6.66, mcc=X
  HADDRESS[07]: 6340           -- 6340                    , wgt=+13.00, mcc=X

 

This is happening in a lot of cases. The mpi_wgt2dim weight file has negative weights only when there is an edit distance of more than 2 in the address fields. It looks like this:

 

1|1|A|CMPHH-HADDRESS-2DIM|0|-800|-800|-800|-800|-800|-800|-800|-800|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|1|810|820|810|810|810|810|810|810|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|2|0|520|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|3|-1|-2|-3|-4|-5|-6|-7|-8|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|4|-51|-52|-53|-54|-55|-56|-57|-58|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|5|-101|-102|-103|-104|-105|-106|-107|-108|0|0|0|0|0|0|0|0|
.......

.......

.......

In the case above, a weight of -0.51 is assigned, which according to the weight file applies when there is an edit distance of 3 and the phone number is missing. Here I do not see any difference in the address fields at all.

Am I missing something here? Is there anywhere else I should look that might be causing these negative weights? Please help with your suggestions. Thanks.
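To double-check which cell of the excerpt corresponds to -0.51, the rows can be parsed and searched. This is only a sketch: the field layout (fifth field as the row index, the remaining fields as column values) and the assumption that weights are stored in hundredths are inferred from the excerpt above, not taken from IBM documentation, and `parse_rows` and `find_cells` are hypothetical helper names.

```python
# Rows copied from the mpi_wgt2dim excerpt above (elided rows are not included).
EXCERPT = """\
1|1|A|CMPHH-HADDRESS-2DIM|0|-800|-800|-800|-800|-800|-800|-800|-800|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|1|810|820|810|810|810|810|810|810|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|2|0|520|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|3|-1|-2|-3|-4|-5|-6|-7|-8|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|4|-51|-52|-53|-54|-55|-56|-57|-58|0|0|0|0|0|0|0|0|
1|1|A|CMPHH-HADDRESS-2DIM|5|-101|-102|-103|-104|-105|-106|-107|-108|0|0|0|0|0|0|0|0|
"""


def parse_rows(text):
    """Map row index -> list of integer weights (assumed layout, see lead-in)."""
    table = {}
    for line in text.splitlines():
        fields = line.rstrip("|").split("|")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        table[int(fields[4])] = [int(value) for value in fields[5:]]
    return table


def find_cells(table, weight):
    """Return (row index, zero-based column) pairs whose stored value equals weight.

    Assumes stored values are the displayed weight multiplied by 100.
    """
    target = round(weight * 100)
    return [
        (row, col)
        for row, values in sorted(table.items())
        for col, value in enumerate(values)
        if value == target
    ]


if __name__ == "__main__":
    print(find_cells(parse_rows(EXCERPT), -0.51))
```

Within the rows shown, only one cell holds -51, so if the -0.51 really comes from this table, it should be traceable to that row and column.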

  • KaranBal
    227 Posts

    Re: Negative score generated during mpimcomp for exactly same addresses

    ‏2015-11-05T00:47:09Z  

    MDM first checks whether the values are anonymous, or whether a filter such as the false positive filter is active. If not, it looks up the weights in the corresponding weight table. So the values must come from the database unless they are being screened by a filter. You can set MAD_ALGO=1 before running mpimcomp to get more detailed console output.

    If there is still no clear cause, open a PMR.

  • GauravCTS02
    10 Posts

    Re: Negative score generated during mpimcomp for exactly same addresses

    ‏2015-11-05T05:35:04Z  

    Hi Karan,

    These values are not in the anonymous file. Also, if they were in the anonymous file, they would not be used for comparison at all, but in my case they are compared and a negative score is generated. I do have FPF2 in my algorithm, but only for Individuals; we have not configured it for Households. My concern is solely the address part: all the tokens match exactly, yet a negative score is generated.

    I pasted part of my weight file, according to which a negative weight should be assigned only if the address tokens differ by an edit distance of at least two, which is not the case here.

    I am observing this for a lot of comparisons.

  • spiggle
    7 Posts

    Re: Negative score generated during mpimcomp for exactly same addresses

    ‏2016-09-08T01:20:52Z  

    In cases like this, I would suggest looking at the PARM weights in your mpi_wgtsval table. When comparing addresses, the system first scores the two addresses independently of each other to get an idea of how verbose each address is. It then compares the two addresses to see how much they have in common, and weighs what they have in common against the entire address. So, for example, matching on 2 out of 3 tokens would not be considered as good as matching on 9 out of 10 tokens. In practice it also looks at the weight of each token, not just the number of tokens.

    But the important part here, I think, is that when the score for the address is calculated, it also takes into account the minimum score from the PARM weights. Say your address scores 1500 and the matched parts score 1500: you get a perfect match. But if your address scores 800, the matched parts score 800, and the PARM minimum weight is 1200, then it is scored as 800 out of 1200, which is not an exact match even though the addresses might be identical.

    The above is a high-level summary of what is a long description in the IBM doco. Some parts may not be 100% correct, but the gist of it is right.

    You may need to look at the min values for STREET, POSTAL and REGION.
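The minimum-weight effect described in this reply can be illustrated with a small sketch. It is not IBM's actual formula: the function name `match_ratio` and the use of `max()` as the floor are assumptions based only on the "800 out of 1200" example, and the real comparison function also works from per-token weights.

```python
def match_ratio(self_score, matched_score, parm_min):
    """Fraction of the address considered matched.

    Hypothetical model: the denominator is the address's own score,
    but never less than the PARM minimum weight.
    """
    denominator = max(self_score, parm_min)
    return matched_score / denominator


if __name__ == "__main__":
    # Verbose address: its own score exceeds the minimum, so a full match is 100%.
    print(match_ratio(1500, 1500, 1200))
    # Short address: identical on both sides, but scored against the minimum.
    print(match_ratio(800, 800, 1200))
```

Under this model, an identical but low-scoring address is rated at roughly two thirds of a full match, which could be enough to land in a weaker row of the weight table.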