Topic
  • 2 replies
  • Latest Post - 2012-09-05T09:26:55Z by VD7T_John_Sutton
VD7T_John_Sutton
2 Posts

Pinned topic Duplicate Documents

2012-09-03T16:27:13Z
Hi

I'd like to identify which documents in a set of search results have duplicates, so that we can find those duplicates in the source system and eventually cleanse them as required.

I can see the percentage of documents that are duplicates, and I can painstakingly click the "show similar documents" icon for each document in the document results view, but there does not appear to be a way to create a report listing only the duplicate documents.

Any help or thoughts on this would be much appreciated.

Regards
Updated on 2012-09-05T09:26:55Z by VD7T_John_Sutton
  • bfoyle
    60 Posts

    Re: Duplicate Documents

    2012-09-04T04:31:16Z
    There is an internal field, "$dup", that is used to eliminate duplicates. Turn duplicate detection on and you should be able to use the following queries. You can then use flagging to flag either the masters or their duplicates, and then perhaps export the XML to validate the results or take further action.

    <original query> \$dup:yes => returns only the duplicate documents that match the original query.
    <original query> -\$dup:yes => returns only the master documents that match the original query.
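
As a rough illustration of the two query forms above, here is a minimal Python sketch that builds both variants from an original query string. The function names are hypothetical and not part of the product. The `\$dup:yes` clause is copied verbatim from this reply, and whether the backslash is required may depend on where the query is entered.

```python
# Clause copied verbatim from the reply above (raw string keeps the backslash).
DUP_CLAUSE = r"\$dup:yes"


def duplicates_only(original_query: str) -> str:
    """Hypothetical helper: restrict a query to duplicate documents."""
    return f"{original_query} {DUP_CLAUSE}"


def masters_only(original_query: str) -> str:
    """Hypothetical helper: restrict a query to master documents."""
    return f"{original_query} -{DUP_CLAUSE}"


if __name__ == "__main__":
    # "invoice" is only a placeholder for the real original query.
    print(duplicates_only("invoice"))
    print(masters_only("invoice"))
```

Running each generated query and flagging its results would give the two sets (duplicates and masters) to export for review.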
  • VD7T_John_Sutton
    2 Posts

    Re: Duplicate Documents

    2012-09-05T09:26:55Z
    That is perfect - many thanks for your help