Topic
  • 3 replies
  • Latest Post - ‏2016-09-15T21:24:47Z by spiggle
wings123
2 Posts

Pinned topic mpxlink's slow performance

‏2016-09-15T12:33:26Z | initial load mdm mdm-migration mpxlink se

Hi,

We plan to use the BXM utilities for our initial load (approx. 25 million records). I am testing with a sample of 1.3 million records. After weight generation and threshold adjustments, I am now running the mpxlink utility. mpxdata and mpxcomp completed in no time. However, mpxlink has now been running for more than 24 hours. This is surprising given that mpxcomp completed in about an hour.

1. I don't see any system resource acting as a bottleneck (I have monitored CPU and memory usage, and both look fine).

2. From the logs, the bxmtask step of mpxlink appears to be the culprit. I am not sure what it is doing. So far it reports that it has processed more than 30 million records. This step started about 22 hours ago. I do know that we are expecting a lot of tasks (this is by design).

3. Is such a long-running mpxlink task normal? If it takes this long for 1.3 million source records, would 25 million take weeks to complete (assuming the sample is representative of the entire dataset)?
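
To put a rough number on point 3, here is a back-of-the-envelope sketch. It assumes run time scales linearly with record count, which is only a guess on my part (the real scaling may well be worse), and it uses 24 hours as a lower bound because the job has not finished yet.

```python
# Naive extrapolation from the sample run to the full load.
# Assumption: run time grows linearly with the number of records.
sample_records = 1_300_000
full_records = 25_000_000
sample_hours = 24  # lower bound only: the sample run is still going

scale = full_records / sample_records   # how many times larger the full load is
est_hours = sample_hours * scale        # estimated lower bound in hours
est_days = est_hours / 24               # the same estimate in days

print(f"scale factor: {scale:.1f}x")
print(f"estimated lower bound: {est_hours:.0f} hours ({est_days:.1f} days)")
```

Under that assumption the full load would need at least about 19 days, hence the concern about weeks.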

Can someone kindly help me please?

Thanks

  • spiggle
    7 Posts

    Re: mpxlink's slow performance

    ‏2016-09-15T20:34:53Z  

    It all depends on the size of the dataset and the number of tasks, but the bottom line is that mpxlink should take something like a minute to run.

    You say you're expecting lots of tasks, and with millions of records that's fine and would be expected, but I think you're finding more than you think, or larger ones, to be more precise. If you use transitive linking, then A may have a low-scoring, dubious match with B, who has a dubious match with C, and so on. Increasing your CR threshold will fix this issue. How much to increase it by will take a little trial and error, but any time it runs for more than 5 minutes you can just kill the job and try a new CR value.

    I'd guess that setting -notskRelatedMembers on for mpxlink would also stop it from carrying out the transitive linking during the run. I've not tried this, as I'd rather have the CR set at a level that prevents this type of crazy linking from happening in the first place.
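
    To illustrate the transitive linking effect described above, here is a minimal sketch. It is not how mpxlink is implemented; it is a plain union-find over made-up pair scores, with `threshold` standing in for the CR value. The member names, scores, and the `link_transitively` function are all hypothetical.

```python
from collections import defaultdict

def link_transitively(pairs, threshold):
    """Group members by unioning every pair whose score is >= threshold."""
    parent = {}

    def find(x):
        # Return the representative of x's set, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        ra, rb = find(a), find(b)  # registers both members even if not linked
        if score >= threshold and ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for member in list(parent):
        groups[find(member)].add(member)
    # Largest sets first, then alphabetical, so the result is deterministic.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))

# Hypothetical comparison scores: A-B is a decent match, B-C and C-D are dubious.
pairs = [("A", "B", 6.1), ("B", "C", 5.2), ("C", "D", 5.0), ("E", "F", 9.4)]

low = link_transitively(pairs, 5.0)   # low threshold: the chain A-B-C-D merges
high = link_transitively(pairs, 6.0)  # slightly higher: the chain breaks apart
print(low)
print(high)
```

    With the lower threshold, A and D end up in the same set even though they were never compared directly. Raising the threshold a little breaks the chain, which is the effect I'd expect from raising the CR.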

     

  • wings123
    2 Posts

    Re: mpxlink's slow performance

    ‏2016-09-15T20:51:55Z  

    Great. Thanks for the quick response, Spiggle.

    Regarding increasing the CR, I had to specify the current CR based on the requirements (and the weights). I may not be able to tweak it beyond a point but let me give it a shot.

    I was not aware of the -notskRelatedMembers option, but it would be interesting to try out.

    I also see a suggestion in the knowledge center to specify 'noTskSets' to improve performance. However, it is not clear what the impact on the result set would be. If there is no impact on the result set, when would this option be set to false? Please let me know if you are aware.

    Thanks in advance.

  • spiggle
    7 Posts

    Re: mpxlink's slow performance

    ‏2016-09-15T21:24:47Z  

    You don't need TskSets unless you need them to run some data analytics reports. This option is usually used for a data analytics exercise, not for a go-live scenario. When you say the CR is based on requirements, you have to realise that with the CR too low some task sets can have 1000s of members, or even 10,000s or more. These have so many members that they're like a black hole sucking in ever more members. Raising the CR even by a little can sometimes stop this from occurring. Even if you get mpxlink to complete in a timely manner, does the business really want tasks that, when opened, contain 1000s of members, most of which are totally unrelated and just happen to have the same name, or DoB, or whatever?

    Also, with 25 million members there'll be so many tasks that there's a fair chance the low-scoring ones will almost never get looked at. Yes, they're nice to have, but they're not usually a big loss. But I do realise every business requirement is different.