I accidentally made the crawl space for my web crawler too large, so many URLs that I did not want were crawled and parsed. I have not found a way to remove all of these documents from the search results. Is there an easy way to remove all the documents crawled by a specific crawler from the index? Or, failing that, to remove all web sources from a collection that also contains other sources?
dschoppmann
Re: Deleting Web Documents from Index, 2013-12-12T16:04:29Z. This is the accepted answer.
You can remove documents from the index based on their URL. Open the ESAdmin web console and navigate to the Parser/Indexer section of the collection. Switch to edit mode and click "Remove URIs from the index". Be careful when using the wildcard star (*)! A pattern that is too broad will also remove documents you want to keep. Alternatively, you can delete the crawler, which deletes all documents crawled by that crawler. In both cases the parser needs to be running.
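Since a wildcard pattern is easy to get wrong, it may help to preview what a pattern would match before entering it in the console. The following is only a rough sketch in Python, not part of ESAdmin: it assumes that "*" means "any run of characters" and that everything else in the pattern is literal, which may not exactly match how the console interprets patterns. The function names and the example URLs are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    # Assumption: '*' stands for any run of characters and every other
    # character is literal. The console's real matching rules may differ.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))

def preview_removal(pattern, urls):
    # Return the URLs that the pattern would match in full.
    regex = wildcard_to_regex(pattern)
    return [u for u in urls if regex.fullmatch(u)]

# Hypothetical list of crawled URLs, for illustration only.
crawled = [
    "http://www.example.com/docs/a.html",
    "http://www.example.com/blog/b.html",
    "http://intranet.example.com/docs/c.html",
]

to_remove = preview_removal("http://www.example.com/docs/*", crawled)
print(to_remove)
```

If the preview lists URLs you want to keep, narrow the pattern before using it in "Remove URIs from the index".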