I'd like to identify which documents in a set of search results have duplicates, so that we can locate those duplicates in the source system and eventually cleanse them as required.
I can see the percentage of documents that are duplicates, and I can painstakingly click each "Show similar documents" icon in the document results view, but there does not appear to be a way to create a report listing the details of only the duplicate documents.
Any help or thoughts on this would be much appreciated.
Duplicate Documents
2 replies. Latest post: 2012-09-05T09:26:55Z by VD7T_John_Sutton
bfoyle (accepted answer)
Re: Duplicate Documents (2012-09-04T04:31:16Z, in response to VD7T_John_Sutton)
There is an internal field, "$dup", that is used to eliminate duplicates. Turn duplicate detection on, and you should be able to use the following queries. You can then use flagging to flag either the master documents or their duplicates, and then perhaps export the XML to take further action or to validate the results.
<original query> \$dup:yes => returns only the duplicate documents that match the original query.
<original query> -\$dup:yes => returns only the master documents that match the original query.
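If you run these queries repeatedly, for example from a script that submits searches, a small helper can build both variants from any original query. This is only an illustrative sketch. The function names and the sample term "invoice" are hypothetical, the code performs no search itself and calls no IBM Content Analytics API, and it reproduces the "\$dup:yes" clause exactly as posted. The backslash may be required by the query syntax or may be an artifact of the forum's formatting, so check which form your search box accepts.

```python
# Hypothetical helpers: they only build query strings in the form shown in
# the post above; they do not call any IBM Content Analytics API.

DUP_CLAUSE = r"\$dup:yes"  # reproduced exactly as posted, including the backslash


def duplicates_query(original_query: str) -> str:
    """Return a query limited to duplicate documents matching original_query."""
    return f"{original_query.strip()} {DUP_CLAUSE}"


def masters_query(original_query: str) -> str:
    """Return a query limited to master documents matching original_query."""
    # The leading "-" excludes documents that carry $dup:yes.
    return f"{original_query.strip()} -{DUP_CLAUSE}"


if __name__ == "__main__":
    base = "invoice"  # hypothetical sample search term
    print(duplicates_query(base))  # prints: invoice \$dup:yes
    print(masters_query(base))     # prints: invoice -\$dup:yes
```

The helpers append the clause to the original query verbatim, as in the post. If your original query contains OR operators, you may need to group it before appending the clause; the post does not cover that case.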