  • 3 replies
  • Latest Post - ‏2016-09-06T22:54:21Z by spiggle
ayanbiswas
18 Posts

INCORRECT LINKAGES IN IBM MDM v11.3

‏2016-03-11T04:11:46Z | mdm mdm-migration virtual

Hi,

I ran the mpxcomp and mpxlink jobs on 30k records and found that around 20k of them were linked under a single EID!

When I reviewed the records manually, I saw that some of them have no similarities at all.

What should the strategy be for figuring out the issue?

I tried enabling the algorithm log for mpxcomp and rerunning it, but the log grows huge (it had already exceeded 100 GB when I killed the job) and the job takes ages to complete.

 

  • Venkat Podatharapu
    75 Posts

    Re: INCORRECT LINKAGES IN IBM MDM v11.3

    ‏2016-03-11T16:12:39Z  

    1. Check the Clerical Review and Auto Link threshold values.

    2. Check SystemOut.log and the trace log to investigate this issue further.

    3. Did the mpxcomp and mpxlink log really exceed 100 GB for this run? If the log reaches 100 GB for a run of this size, there must be some issue in the process. Also check with your WAS team about changing the log properties to log errors only.

  • KaranBal
    227 Posts

    Re: INCORRECT LINKAGES IN IBM MDM v11.3

    ‏2016-03-21T16:41:56Z  

    I don't see a reason to think this is an issue. There may be a lot of duplicates, the thresholds may not be set, or you may have transitive linking, where a record only has to match any one member of the entity to join it rather than all of them. There are plenty of potential causes, but we don't have enough details to narrow it down.

  • spiggle
    7 Posts

    Re: INCORRECT LINKAGES IN IBM MDM v11.3

    ‏2016-09-06T22:54:21Z  

    Generally, if you end up with very large entities, it will be down to transitive linking. Match scores are based on weights, and these weights are usually a combination of positive weights for good attribute matches and negative weights for bad attribute matches - all pretty basic stuff.

    This means two people may have the same name, but the differences in their dates of birth, addresses, etc. will contribute enough negative weight that no linkage or task is created.
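
To make the weighting idea concrete, here is a minimal sketch in Python. It is not the MDM comparison algorithm: the weights, the thresholds, and the rule that a missing attribute contributes nothing are all assumptions picked for illustration.

```python
# Hypothetical per-attribute weights: (agreement weight, disagreement weight).
WEIGHTS = {
    "name": (4.0, -4.0),
    "dob": (3.0, -3.0),
    "address": (2.0, -2.0),
}

# Hypothetical thresholds, chosen only for this illustration.
CLERICAL_REVIEW = 4.5
AUTO_LINK = 7.0


def score(a: dict, b: dict) -> float:
    """Sum the attribute weights; a value missing on either side adds nothing."""
    total = 0.0
    for attr, (agree, disagree) in WEIGHTS.items():
        va, vb = a.get(attr), b.get(attr)
        if va is None or vb is None:
            continue  # assumption: missing data is neither rewarded nor penalised
        total += agree if va == vb else disagree
    return total


def decision(a: dict, b: dict) -> str:
    """Map a pair score onto the three usual outcomes."""
    s = score(a, b)
    if s >= AUTO_LINK:
        return "auto-link"
    if s >= CLERICAL_REVIEW:
        return "clerical review"
    return "no match"


mary_a = {"name": "Mary Jones", "dob": "1970-01-01", "address": "1 High St"}
mary_b = {"name": "Mary Jones", "dob": "1985-06-30", "address": "9 Low Rd"}
mary_sparse = {"name": "Mary Jones"}

print(score(mary_a, mary_b), decision(mary_a, mary_b))
print(score(mary_a, mary_sparse), decision(mary_a, mary_sparse))
```

With these made-up numbers, the two fully populated Mary Jones records score 4.0 - 3.0 - 2.0 = -1.0, while the name-only record scores 4.0 against either of them because nothing pulls the score down.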

    The problem can arise where you have some customers in your DB who only have minimal data, e.g. names but not much else. This means one person called Mary Jones can end up linking with hundreds of other Mary Joneses who have a multitude of dates of birth and addresses, because you got a plus weight on name and nothing else. In turn, one of these Mary Joneses may previously have been Mary Smith, so now you might start bringing in many other Mary Smiths, and so on.

    One rule of thumb is to make your CR at least a little higher than just a good name score. But this is a rule of thumb I use, everybody's requirements are different. If all your data is well populated then you are less likely to have this type of issue.

    What you may find is a CR of say #.0 gives you huge tasks, but #.1 prevents it from occurring.
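
The chaining effect and the sensitivity to the threshold can be sketched as follows. This is an illustration only: the records, the weights, and the union-find grouping are assumptions standing in for whatever your algorithm and linker actually do.

```python
from itertools import combinations

# Hypothetical weights; a missing date of birth contributes nothing.
NAME_AGREE, NAME_DISAGREE = 4.0, -4.0
DOB_AGREE, DOB_DISAGREE = 3.0, -3.0

# "names" holds current and former names; dob=None marks a sparse record.
RECORDS = [
    {"names": {"Mary Jones"}, "dob": None},
    {"names": {"Mary Jones"}, "dob": "1970-01-01"},
    {"names": {"Mary Jones"}, "dob": "1985-06-30"},
    {"names": {"Mary Jones", "Mary Smith"}, "dob": None},
    {"names": {"Mary Smith"}, "dob": "1992-03-15"},
    {"names": {"Mary Smith"}, "dob": "1960-11-02"},
]


def score(a, b):
    """Pair score: name agreement if any name is shared, plus dob if both known."""
    total = NAME_AGREE if a["names"] & b["names"] else NAME_DISAGREE
    if a["dob"] is not None and b["dob"] is not None:
        total += DOB_AGREE if a["dob"] == b["dob"] else DOB_DISAGREE
    return total


def largest_entity(records, threshold):
    """Link every pair scoring >= threshold transitively; return the biggest group."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if score(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # joining on ANY matching member

    sizes = {}
    for i in range(len(records)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


print(largest_entity(RECORDS, 4.0))  # threshold equal to a bare name match
print(largest_entity(RECORDS, 4.1))  # threshold just above a bare name match
```

In this toy data set the two sparse records act as bridges: at a threshold of 4.0 all six records collapse into one entity, even though the fully populated records disagree with each other, and at 4.1 nothing links at all.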

    In your case you're talking about AL (auto-link), not CR (clerical review), but I'd guess the issue is along these lines.