Anchor text analysis

If you enable collection security, the global analysis processes apply special rules for indexing the anchor text in documents crawled by Web crawlers. If you do not enable collection security, you can specify whether you want to index the anchor text in links to forbidden documents when you configure individual Web crawlers.

Anchor text is the information within a hypertext link that describes the page that the link connects to. For example, in the following link, "Query Syntax" is the anchor text, and the link connects to the syntax.htm page:

<a href="../doc/syntax.htm">Query Syntax</a>
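To make the definition concrete, the following sketch pulls the link target and anchor text out of a link like the one above by using Python's standard html.parser module. It is only an illustration. The class and function names (AnchorTextExtractor, extract_anchor_text) are hypothetical and do not describe how the Web crawler actually parses pages.

```python
from html.parser import HTMLParser


class AnchorTextExtractor(HTMLParser):
    """Hypothetical helper: collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, anchor_text) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_anchor_text(html):
    """Return the (href, anchor text) pairs found in an HTML fragment."""
    parser = AnchorTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_anchor_text('<a href="../doc/syntax.htm">Query Syntax</a>'))
# [('../doc/syntax.htm', 'Query Syntax')]
```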

Typically, the Web crawler follows the links in documents to crawl additional documents and includes these linked pages in the index. During global analysis, the index processes associate the anchor text not only with the document in which it is embedded (the source document) but also with the document that the link points to (the target document). In the example above, the anchor text "Query Syntax" is associated with the target page, syntax.htm, and with the source page that contains the link. As a result, the target document can be retrieved by queries that match the anchor text, even though that text appears only in the source document. This association presents a security risk, however, if users are allowed to view the target document but not the source document: a match on the target document can reveal text from a source document that the user is not allowed to see.
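The association, and the exposure it creates, can be sketched with a toy inverted index in which each anchor-text term is posted to both the source and the target document. This is a simplified model under stated assumptions, not the product's index format: the function names, document IDs, and the "Falcon" term are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """docs maps doc_id -> {"body": str, "links": [(target_id, anchor_text), ...]}.

    Returns an inverted index term -> set of doc_ids. Anchor-text terms are
    posted to BOTH the source document and the link's target document.
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["body"]):
            index[term].add(doc_id)
        for target_id, anchor_text in doc["links"]:
            for term in tokenize(anchor_text):
                index[term].add(doc_id)     # source document
                index[term].add(target_id)  # target document
    return index


def search(index, query, can_view):
    """Return doc_ids that contain every query term and that the user may view."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return {d for d in hits if can_view(d)}


# Hypothetical data: "falcon" appears only in the restricted source page's link.
docs = {
    "source.htm": {"body": "internal planning notes",
                   "links": [("syntax.htm", "Falcon query syntax")]},
    "syntax.htm": {"body": "operators and wildcards", "links": []},
}
index = build_index(docs)


def can_view(doc_id):
    # Hypothetical user who may view syntax.htm but not source.htm.
    return doc_id == "syntax.htm"


print(search(index, "falcon", can_view))  # {'syntax.htm'}
```

The visible page matches "falcon" only because of text in a page that the user cannot open, which is the exposure described above.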

If you enable collection security when you create a collection, anchor text processing is disabled. Anchor text is no longer indexed with a target document unless the text also appears in that document or in its metadata. This security control ensures that users are not exposed to information in documents that they are not allowed to access: a document is returned in the search results only if its own content or metadata matches the query.

Enabling collection security can enhance the security of Web documents by restricting users to searching only the documents whose security tokens match their credentials. However, because anchor text is not processed, the search results might not include all of the documents that are potentially relevant to a query.
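The secured behavior and its trade-off can be sketched in the same toy style. Each document is indexed only by its own body and metadata, and a result is kept only if the document's security tokens match the user's credentials. The function names, token strings, and the "any token in common" matching rule are assumptions made for illustration; the actual token-matching rules may differ.

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def build_secure_index(docs):
    """Index each document by its own body and metadata only.

    Link anchor text is NOT posted to the link target, modeling a
    collection that has collection security enabled.
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["body"]) + tokenize(doc.get("metadata", "")):
            index[term].add(doc_id)
        # doc["links"] is deliberately ignored here.
    return index


def secure_search(index, docs, query, user_credentials):
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    # Assumption: a document is visible if any of its security tokens
    # matches one of the user's credentials.
    return {d for d in hits if docs[d]["tokens"] & user_credentials}


# Hypothetical data: the source page's own text includes its anchor text.
docs = {
    "source.htm": {"body": "internal planning notes Falcon query syntax",
                   "links": [("syntax.htm", "Falcon query syntax")],
                   "tokens": {"group:planners"}},
    "syntax.htm": {"body": "operators and wildcards",
                   "metadata": "query syntax reference",
                   "links": [], "tokens": {"group:all"}},
}
index = build_secure_index(docs)
user = {"group:all"}

print(secure_search(index, docs, "falcon", user))        # set()
print(secure_search(index, docs, "query syntax", user))  # {'syntax.htm'}
```

The first query no longer leaks anything, but it also no longer finds syntax.htm, even though that page might be relevant. This is the recall cost of not processing anchor text.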

If you do not enable collection security, you can specify whether you want to index the anchor text in links to forbidden documents when you configure advanced Web crawler properties.