Monitoring crawlers

You can view general information about the status of each crawler in a collection, or select options to view detailed information about a specific crawler's activity.

Before you begin

If your administrative role limits you to monitoring collections, you can view crawler statistics, but you cannot change a crawler's behavior (for example, by starting or stopping the crawler).

Procedure

To monitor a crawler:

  1. On the Collections view, expand the collection that you want to monitor and go to the Crawl and Import pane.
  2. If you want to start or stop multiple crawlers, click the icon to select the crawlers that you want to start or stop.
  3. If a specific crawler is running or paused and you want to see detailed status information about it, click the icon to view details about the content that the crawler crawled.
    The types of statistics that you see vary with the crawler type.

    If your administrative role allows you to administer processes for a collection, you can start and stop the crawler while you view details about crawler activity. If the crawler can be scheduled, you can also enable and disable the crawling schedule.

    For non-Web crawlers, two sets of statistics are maintained. One set relates to the crawler session, which begins when the crawler is started. The other set relates to actual crawler activity. The displayed time reflects how long the crawler spent processing documents during its most recent activity, not the total time since the crawler session began.

  4. If the crawler is stopped or paused and you want to start or resume a crawler session, click the icon to start or resume the crawler.
    For Web crawlers:
    If the crawler was stopped, the crawler begins crawling again and crawls the entire crawl space. If the crawler was paused, it resumes crawling at the beginning of the target where it was paused.

    If you want to force the crawler to start a full crawl immediately, click the icon to view details about the crawler, and then click the Start a full recrawl icon. The crawler starts crawling the entire crawl space, including pages that have not changed since they were last crawled. You might want to recrawl all documents, for example, if you change the rules for parsing documents and want to apply those rules to documents that were previously indexed.

    For all other crawler types:
    If the crawler was stopped, the crawler begins crawling at its scheduled date and time. The first time that the crawler crawls a data source, the crawler does a full crawl. When a scheduled crawl repeats, the crawler crawls either all updates to the data source (document additions, deletions, and modifications), or only new and modified documents. You specify which type of crawl runs when a crawler session starts by configuring the crawler properties or the crawler schedule.

    If you select the option to crawl new and modified documents only, the crawler looks for documents with modification dates that are later than the previous crawl time. If you copy files to a resource, the modification date might not change, which means that the crawler might not detect that the files were added to the resource. For example, if you copy files to a Windows folder, Windows does not automatically change the modification date of the files. To ensure that such files are crawled, select the option to crawl all updates or the option to run a full crawl.

    If you want to force a crawler to start crawling, click the icon to view details about the crawler. Then, in the crawl space details area, click the icon for the type of crawl that you want to start: a full crawl, all updates, or new and modified documents only. You must click the icon separately for each data source (such as a server, database, or subfolder) that you want to crawl.

  5. If the crawler is running and you want to stop it, click the icon to stop the crawler.
    The crawler does not crawl data again until you restart it.
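
Example: why a crawl of new and modified documents can miss copied files

The modification-date caveat described in step 4 can be illustrated with a short Python sketch. This is not the product's implementation. The names CrawlType, select_for_crawl, and demo are hypothetical, and the way the "all updates" branch detects additions (by comparing file names against a set of previously crawled names) is an assumption made only for illustration. Deletion handling is omitted.

```python
import os
import shutil
import tempfile
import time
from enum import Enum


class CrawlType(Enum):
    """Hypothetical labels for the three crawl types named in the procedure."""
    FULL = "full"
    ALL_UPDATES = "all updates"
    NEW_AND_MODIFIED = "new and modified"


def select_for_crawl(folder, crawl_type, last_crawl_time, known_names):
    """Return the sorted file names that a crawl of the given type would select.

    known_names is the set of names seen by earlier crawls. Using it to
    detect additions is an assumption of this sketch, not documented behavior.
    """
    names = sorted(os.listdir(folder))
    if crawl_type is CrawlType.FULL:
        # A full crawl selects everything, changed or not.
        return names
    # Files whose modification date is later than the previous crawl time.
    modified = {
        name for name in names
        if os.path.getmtime(os.path.join(folder, name)) > last_crawl_time
    }
    if crawl_type is CrawlType.NEW_AND_MODIFIED:
        return sorted(modified)
    # ALL_UPDATES (assumed): also select names that were never crawled before.
    added = {name for name in names if name not in known_names}
    return sorted(modified | added)


def demo():
    """Copy an old file into a crawled folder and compare the crawl types."""
    with tempfile.TemporaryDirectory() as source, \
            tempfile.TemporaryDirectory() as crawled:
        last_crawl_time = time.time()
        original = os.path.join(source, "report.txt")
        with open(original, "w") as handle:
            handle.write("written long before the last crawl")
        # Backdate the file so that it predates the last crawl by one day.
        day_earlier = last_crawl_time - 86400
        os.utime(original, (day_earlier, day_earlier))
        # copy2 preserves the modification date, as a file-manager copy can.
        shutil.copy2(original, os.path.join(crawled, "report.txt"))
        return {
            crawl_type: select_for_crawl(crawled, crawl_type,
                                         last_crawl_time, set())
            for crawl_type in CrawlType
        }


if __name__ == "__main__":
    for crawl_type, selected in demo().items():
        print(crawl_type.value, "->", selected)
```

Under these assumptions, the new and modified selection is empty because the copied file keeps its old modification date, whereas the all updates and full selections both include report.txt.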