1 reply Latest Post - ‏2013-12-12T16:04:29Z by dschoppmann
3 Posts

Deleting Web Documents from Index

‏2013-08-07T19:55:34Z |

I accidentally made the crawl space for my web crawler too large, and many URLs were crawled and parsed that I did not want. I have not found a way to remove all of these documents from the search results. Is there an easy way to remove all the documents crawled by a specific crawler from the index? Or, failing that, to remove all web sources from a collection that also contains other sources?

  • dschoppmann
    8 Posts

    Re: Deleting Web Documents from Index

    ‏2013-12-12T16:04:29Z  in response to murb

    You can remove documents from the index based on their URL. Open the ESAdmin web console and navigate to Parser/Indexer of the collection. Switch to edit mode and click "Remove URIs from the index". Be careful with the wildcard star (*): a pattern that is too broad will remove more documents than you intend. Alternatively, you can delete the crawler, which deletes all documents crawled by that crawler. In both cases the parser needs to be running.
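
Since a wildcard pattern that is too broad is the main risk here, it may help to preview what a pattern would match before entering it in the console. The following is only a rough sketch, not part of the product: it assumes the star matches any run of characters (including "/"), which may differ from how the console actually interprets patterns, and the helper names and example URLs are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    # Assumption: '*' matches any run of characters, everything else is literal.
    # The real console may interpret wildcards differently.
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))

def preview_removal(pattern, urls):
    # Return the URLs that the pattern would match in full.
    rx = wildcard_to_regex(pattern)
    return [u for u in urls if rx.fullmatch(u)]

# Hypothetical list of crawled URLs, e.g. exported from your own records.
crawled = [
    "http://example.com/docs/a.html",
    "http://example.com/docs/old/b.html",
    "http://example.com/blog/c.html",
    "http://example.org/docs/d.html",
]

narrow = preview_removal("http://example.com/docs/old/*", crawled)
broad = preview_removal("http://example.com/*", crawled)

print("narrow pattern matches:", narrow)
print("broad pattern matches:", broad)
```

With this sample list, the narrow pattern matches only the one URL under /docs/old/, while the broad pattern matches every example.com URL, which shows how quickly a short pattern can widen the set of removed documents.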