Manually Cleaning Up the Index for a Collection

As a search collection's data is indexed, the index data is first held in RAM and then flushed to small files on disk. This work is done by a thread within the indexer known as the builder. The data in these temporary files does not become permanent until it is merged into the collection's actual index, a job performed by a second indexer thread known as the merger.

The indexer automatically merges indices and segments into a single index that then replaces them. However, for performance reasons, you may want to initiate a merge manually. A manual full merge not only combines all existing indices and segments for a collection into a single index, but also processes all pending deletes, which can substantially reduce the size of a collection in which many indexed URLs have been deleted. An index can accumulate many deleted items when the underlying data has shrunk substantially, in size, in number of indexable items, or both, and the collection has since been refreshed.
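To illustrate why a full merge can shrink a collection, the following toy model (an assumption-based sketch, not Velocity's actual implementation) represents an index as a list of segments plus a set of pending deletes. Deleted URLs still occupy space in their segments until a full merge rewrites everything into one segment and drops them. The classes `Segment` and `ToyIndex` and the method `FullMerge` are hypothetical names invented for this example and are not part of the Velocity API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model: NOT part of the Velocity API or its internals.
// One on-disk segment holding the URLs it indexes.
public sealed class Segment
{
    public List<string> Urls { get; } = new List<string>();
    public Segment(IEnumerable<string> urls) => Urls.AddRange(urls);
}

public sealed class ToyIndex
{
    public List<Segment> Segments { get; } = new List<Segment>();

    // URLs deleted from the collection but still physically stored in segments.
    public HashSet<string> PendingDeletes { get; } = new HashSet<string>();

    // Entries stored on disk, including logically deleted ones.
    public int StoredEntries => Segments.Sum(s => s.Urls.Count);

    // Combine every segment into a single one, dropping pending deletes.
    // Simplification: assumes each URL appears in at most one segment.
    public void FullMerge()
    {
        var live = Segments
            .SelectMany(s => s.Urls)
            .Where(u => !PendingDeletes.Contains(u))
            .ToList();
        Segments.Clear();
        Segments.Add(new Segment(live));
        PendingDeletes.Clear();
    }
}
```

For example, with two segments holding five URLs and two of those URLs deleted, `StoredEntries` is 5 before `FullMerge()` and 3 afterward, stored in a single segment.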

To initiate an index merge manually, use the search-collection-indexer-full-merge function. The following is an example of using the SOAP version of this function in C#:

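    // Assumes 'port' is an initialized instance of the SOAP proxy class generated
    // from the service's WSDL, and that COLLECTION holds the collection's name.
    SearchCollectionIndexerFullMerge merge = new SearchCollectionIndexerFullMerge();
    merge.collection = COLLECTION;
    port.SearchCollectionIndexerFullMerge(merge);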

Once the merge completes, the indexer will begin using the newly created index file and will remove the segments and indices that it merged into the new index.

Note: If the indexer terminates or is shut down while a merge is in progress, the partially merged index is discarded. When the indexer restarts, the search collection continues to use the indices and segments it was using before the merge began. No index data is lost, but you must either re-initiate the merge or wait for the indexer to start one automatically.
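The behavior described above follows a build-then-swap pattern: the merged index is built alongside the active one, and the active set of segments is replaced only after the merge finishes. The sketch below is a hypothetical toy model of that pattern, not Velocity's code. `SwappingIndex`, `TryFullMerge`, and the `interruptAfter` parameter (which simulates a shutdown partway through) are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model: NOT Velocity internals or API.
public sealed class SwappingIndex
{
    // Segments currently used to answer queries.
    public IReadOnlyList<IReadOnlyList<string>> Active { get; private set; }

    public SwappingIndex(IEnumerable<IReadOnlyList<string>> segments) =>
        Active = segments.ToList();

    // Builds the merged index separately from Active. If 'interruptAfter'
    // entries have been copied, simulate a shutdown by returning early:
    // the partial result is discarded and Active is left untouched.
    public bool TryFullMerge(int? interruptAfter = null)
    {
        var merged = new List<string>();
        foreach (var url in Active.SelectMany(s => s))
        {
            if (interruptAfter == merged.Count)
                return false; // partial merge discarded; nothing is lost
            merged.Add(url);
        }

        // Merge completed: swap in the single merged segment and
        // drop the old segments.
        Active = new List<IReadOnlyList<string>> { merged };
        return true;
    }
}
```

An interrupted call leaves the original segments in place, so a later call (the equivalent of re-initiating the merge) can still succeed.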