By viewing reports about past Web crawler activity, you
can assess overall performance and adjust the Web crawler properties
and crawl space definitions as necessary.
Before you begin
If your administrative role limits you to monitoring collections,
you can view crawler statistics and create reports about crawler activity,
but you cannot change the crawler's behavior (such as starting or
stopping the crawler).
About this task
Different types of reports provide information about Web crawler
activity. Most reports are returned as fast as the information can
be collected from the crawler's internal database. The Site report
and the HTTP status code report take longer to create. If you create
either of these reports, you can specify an email address to be
notified when the report is ready instead of waiting for results to
be returned to the administration console.
Procedure
To create Web crawler reports:
- Expand the collection that owns the Web crawler that you
want to monitor and go to the Crawl and Import pane.
- If the Web crawler that you want to create reports for
is running or paused, click the icon to monitor details about the
content crawled by the crawler.
- On the details page for the Web crawler, select an option
for the type of report that you want to create:
- In the Crawler status summary area, click Crawler
history to create reports about the crawler and all of
the sites that it discovers or crawls.
- In the URL status area, specify the URL
of the specific site that you want to create a report for, and then click Site
details.
- For both crawler history and site reports, select the check box
of each statistic that you want to see in the report, and then click
View report.
For these types of statistics, the crawler returns a report to the
administration console as fast as it can retrieve information from
its internal database.
- If you are creating a crawler history report, you can specify
options for creating a Site report, then click Run Report.
This report is created with the statistics that you choose
to include and is saved in a file that you specify (you must specify
an absolute path for the file). You can specify that you want to
receive email after the report is created.
- If you are creating a crawler history report, you can specify
options for creating an HTTP status code report, then click Run
Report.
This report shows how many of each HTTP status code were returned
by each site. The report is saved in a file that you specify (you
must specify an absolute path for the file). You can specify that
you want to receive email after the report is created.
Use this report to see which sites return a large number
of 4xx status codes (client errors, such as pages that were not found), 5xx
status codes (which indicate a server problem), 6xx status codes (which
indicate connectivity problems), and so on.
This report is most
useful when the crawler has been active for some time (for example,
a crawler that has been active for weeks). It can help you identify
vanished sites, newly arrived sites, sites with huge numbers of URLs
(which might indicate redundant crawling of a Lotus Notes® database), and sites with
a recursive file system served by the HTTP server. If the sites that
account for large numbers of these status codes are not contributing
to the index, you can improve the performance of the crawler by
removing those sites from the crawl space.
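The kind of review described above, looking for sites that mostly return error status codes, can be sketched in a few lines of Python. This is only an illustration: the layout of the saved report file is not described in this topic, so the sketch assumes the data has already been read into hypothetical `(site, status_code, count)` tuples. The sample sites, the function names, and the 50% threshold are likewise assumptions, not product behavior.

```python
from collections import defaultdict

# Hypothetical input: (site, http_status_code, count) tuples. The real report
# layout is not described here, so adapt the parsing step to the actual file.
SAMPLE_ROWS = [
    ("http://a.example.com", 200, 950),
    ("http://a.example.com", 404, 50),
    ("http://b.example.com", 200, 100),
    ("http://b.example.com", 404, 700),
    ("http://b.example.com", 500, 200),
    ("http://c.example.com", 610, 40),
]

# Status classes that the report guidance treats as signs of trouble.
ERROR_CLASSES = ("4xx", "5xx", "6xx")


def status_class(code):
    """Map a status code to its class label, for example 404 -> '4xx'."""
    return f"{code // 100}xx"


def summarize(rows):
    """Return {site: {class_label: total_count}} for the given rows."""
    summary = defaultdict(lambda: defaultdict(int))
    for site, code, count in rows:
        summary[site][status_class(code)] += count
    return {site: dict(classes) for site, classes in summary.items()}


def problem_sites(rows, threshold=0.5):
    """List (site, error_share) pairs whose error share meets the threshold,
    sorted from the highest share to the lowest."""
    flagged = []
    for site, classes in summarize(rows).items():
        total = sum(classes.values())
        errors = sum(n for label, n in classes.items() if label in ERROR_CLASSES)
        if total and errors / total >= threshold:
            flagged.append((site, errors / total))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for site, share in problem_sites(SAMPLE_ROWS):
        print(f"{site}: {share:.0%} of responses were errors")
```

Sites flagged this way are only candidates for removal from the crawl space. Whether a site still contributes useful documents to the index is a judgment that the report alone cannot make.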