When you configure a Web crawler, you can specify whether the index
retains the anchor text of links that point to documents that the
crawler is forbidden to crawl.
About this task
Directives in a robots.txt file or
in the metadata of Web documents can prevent the Web crawler from
accessing documents on a Web site. If a document that the Web crawler
is allowed to crawl includes links to forbidden documents, you can
specify how you want to handle the anchor text for those links.
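For example, a site can forbid crawler access with a directive in
its robots.txt file or with a robots meta tag in a document's HTML
header, as in the following snippets (the path and values shown are
illustrative):

   User-agent: *
   Disallow: /private/

   <meta name="robots" content="noindex, nofollow">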
You specify whether to index the anchor text of links to forbidden
documents when you configure the Web crawler. For maximum security,
specify that you do not want to index this anchor text. If you do
not index the anchor text, however, the search results might not
include all of the documents that are potentially relevant to a query.
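For example, suppose that a page the crawler is allowed to crawl
contains the following link to a forbidden document (the URL and
wording are a hypothetical illustration):

   <a href="http://example.com/private/report.html">Quarterly earnings report</a>

If anchor text is indexed, a search for "quarterly earnings report"
can return a result that points to the forbidden page, even though
the crawler never read the content of that page.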
Procedure
To enable or disable the indexing of anchor text in links
to forbidden documents:
- In the Crawl and Import pane of the Collections view,
locate the Web crawler that you want to configure and click Edit
crawler properties.
- Expand the advanced options and click Edit advanced
Web crawler properties.
- To index the anchor text in all of the documents that this
crawler crawls, select the Index the anchor text in links
to forbidden documents check box.
Users can then
learn about pages that the Web crawler is not allowed to crawl by
searching for text that occurs in the anchor text of links that
point to those pages.
To exclude the anchor text of links to forbidden
documents from the index, clear this check box. Users will not be
able to learn about pages that the Web crawler is not allowed to crawl:
both the forbidden documents and the anchor text of the links that
point to them are excluded from the index.
- Click OK and then, on the Web
Crawler Properties page, click OK again.
- For the changes to become effective, stop and restart the
crawler.
What to do next
To apply the changes to documents that were previously indexed,
the documents must be crawled and indexed again.
If a previous crawl added information about forbidden documents to
the index, that information is removed from the index when the
documents are indexed again.