Viewing details about Web crawler activity

By viewing details about Web crawler activity, you can assess overall performance and adjust the Web crawler properties and crawl space definitions as necessary.

Procedure

To view details about a Web crawler's activity:

  1. On the Collections view, expand the collection that you want to monitor and go to the Crawl and Import pane.
  2. If the Web crawler that you want to monitor is running or paused, click the icon to monitor details about the content crawled by the crawler.
  3. On the details page for the Web crawler, select any of the following options to see detailed statistics about the crawler's current and past activity:
    • Click Thread details to see how many threads are actively crawling Web sites and how many are in an inactive state.
    • Click Active sites to see information about the Web sites that the crawler is actively crawling.
    • Click Recently crawled URLs to see the URLs that the crawler most recently crawled. If the items in the list do not change when you refresh the view, no crawling is occurring.
    • Click Crawler history to view reports about past crawler activity.
    • In the URL status area, type the URL that you want information about, and then choose one of the following options:
      1. Click URL details to see status information for the URL. You can request URL details only for URLs that were previously crawled.
      2. Click Site details to specify information that you want to include in a report about the Web site that the URL belongs to. You can request site details for a previously crawled Web site or for a Web site that has not yet been crawled.

      For example, use the Site details option to see whether a URL is in the crawl space, whether it has been crawled or only discovered, when it is due to be crawled again, and information about the last attempt to crawl the Web site. You can also ask to see the contents of the robots.txt file for the Web site, which might help you determine why the site is not being crawled.
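
When you review robots.txt contents returned by the Site details report, it can help to test the rules against a specific URL outside the product. The following is a minimal sketch that uses Python's standard urllib.robotparser module. It is not how the Web crawler itself evaluates robots.txt. The user agent name "ExampleCrawler", the sample rules, and the example.com URLs are assumptions for illustration only; substitute the user agent configured for your crawler and the robots.txt text shown in the report.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so the pasted text is split first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, as they might be copied from a report.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical user agent name; use the one your crawler is configured with.
AGENT = "ExampleCrawler"

print(is_allowed(SAMPLE_ROBOTS, AGENT, "http://example.com/private/report.html"))  # False
print(is_allowed(SAMPLE_ROBOTS, AGENT, "http://example.com/public/index.html"))    # True
```

If the check returns False for a URL that you expect to be crawled, a Disallow rule in the site's robots.txt file is a likely reason that the site is not being crawled.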