I'd like to be able to identify which documents in a set of search results have duplicates, so that we can locate the duplicates in the source system and eventually cleanse them as required.
I can see the percentage of documents that are duplicates, and I can painstakingly click each "Show similar documents" icon in the document results view, but there does not appear to be a way to create a report on just the duplicate documents.
Any help or thoughts on this would be much appreciated.
bfoyle
Re: Duplicate Documents, 2012-09-04T04:31:16Z (accepted answer)

There is an internal field, "$dup", that is used to eliminate duplicates. Turn duplicate detection on, and you should be able to use the following queries. You can then use flagging to flag either the masters or their duplicates, and then perhaps export the XML to take further action or to validate the results.
<original query> \$dup:yes => returns only the duplicate documents that match the original query.
<original query> -\$dup:yes => returns only the master documents that match the original query.
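If you run these searches repeatedly (for example, one pair per source system), it may help to build the two query strings programmatically. The sketch below only does string composition and assumes that duplicate detection is already enabled so that `$dup` is populated. The function names `duplicates_query` and `masters_query` are hypothetical, and the clause is copied exactly as written above, backslash included; drop the backslash if your search interface does not require `$` to be escaped.

```python
# Hypothetical helpers that build the two queries described in the answer.
# Assumption: duplicate detection is enabled, so the internal $dup field is set.

DUP_CLAUSE = r"\$dup:yes"  # clause exactly as written in the answer


def duplicates_query(original_query: str) -> str:
    """Return a query that matches only the duplicates of original_query's hits."""
    return f"{original_query.strip()} {DUP_CLAUSE}"


def masters_query(original_query: str) -> str:
    """Return a query that matches only the master documents of original_query's hits."""
    return f"{original_query.strip()} -{DUP_CLAUSE}"


if __name__ == "__main__":
    base = "invoice 2012"
    print(duplicates_query(base))  # invoice 2012 \$dup:yes
    print(masters_query(base))     # invoice 2012 -\$dup:yes
```

The original query is appended to unchanged, apart from trimming surrounding whitespace, to mirror the `<original query> ...` form shown above.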