The Converting section discusses relaxing the one-to-one relationship between documents and URLs. This section explains how the vse-key attribute of Documents and the add-to attribute of Contents are handled. Of particular importance is the way that default values are filled in based on the anchor, the URL, and the container XML nodes.
At a basic level, every document served by the index has a unique vse-key associated with it. If you want to add content to a document, you create a Content element that has an add-to attribute that identifies the vse-key of the destination document.
As documents are converted, they develop an anchor. For example, if you expand a tar file, each of the files that it contains will be given an anchor that starts with the URL (or anchor) of the original tar file, to which #{filename-i} is appended for each of the files in the archive.
Also, as documents are converted, new anchors can be created by specifying a fork action of fork-with-new-name, which causes a copy of the current data to be created and both branches of processing to acquire a new anchor name (by appending #n, where n is a number that increases).
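The two ways of developing anchors described above can be sketched roughly as follows. This is only an illustration in Python, not the converter's actual code: the function names, the example URL, the literal form of the appended file name, and the assumption that fork numbering starts at 1 are all hypothetical.

```python
from itertools import count
from typing import Iterator, Tuple


def archive_member_anchor(parent_anchor: str, member_name: str) -> str:
    """Anchor for one file extracted from an archive.

    Assumption: the suffix is simply '#' followed by the member's file name.
    """
    return f"{parent_anchor}#{member_name}"


def fork_with_new_name(current_anchor: str, counter: Iterator[int]) -> Tuple[str, str]:
    """Both branches of the fork acquire a new anchor by appending '#n'.

    Assumption: n comes from a shared, increasing counter.
    """
    first = f"{current_anchor}#{next(counter)}"
    second = f"{current_anchor}#{next(counter)}"
    return first, second


counter = count(1)  # assumption: numbering starts at 1
tar_anchor = "http://example.com/files.tar"  # hypothetical fetched URL
member_anchor = archive_member_anchor(tar_anchor, "readme.txt")
branch_a, branch_b = fork_with_new_name(member_anchor, counter)

print(member_anchor)
print(branch_a)
print(branch_b)
```

The point of the sketch is that anchors only ever grow by appending suffixes, so every derived anchor still begins with the URL that was originally fetched.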
By default, all vse-key attributes are URL-normalized relative to the anchor developed during conversion (which, by default, is the original fetched URL). That is, if you specify a key of fred flintstone and crawled http://vivisimo.com/news.html, the resulting key would be http://vivisimo.com:80/fred_flintstone/.
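The example above can be reproduced with a short Python sketch. The exact normalization rules are not spelled out here, so the steps below (spaces become underscores, a trailing slash is appended, the scheme's default port is made explicit) are assumptions inferred from that single example, and normalize_key is a hypothetical name rather than a product function.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: only these default ports are made explicit.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_key(key: str, anchor: str) -> str:
    """Resolve a possibly relative key against the anchor and normalize it."""
    # Assumption: spaces are replaced by underscores, as in the example.
    relative = key.strip().replace(" ", "_")
    # Assumption: a trailing slash is appended, as in the example.
    if not relative.endswith("/"):
        relative += "/"
    # Standard relative-URL resolution against the anchor.
    absolute = urljoin(anchor, relative)
    parts = urlsplit(absolute)
    netloc = parts.netloc
    # Make the default port explicit when none was given.
    if parts.port is None and parts.scheme in DEFAULT_PORTS:
        netloc = f"{parts.hostname}:{DEFAULT_PORTS[parts.scheme]}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_key("fred flintstone", "http://vivisimo.com/news.html"))
```

Because the key is resolved like a relative link, the last path segment of the anchor (news.html) is dropped and the key takes its place.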
Internally, the indexed information passed to the search engine is XML in which a crawl-url element contains one or more crawl-data elements, which in turn contain the actual XML to be indexed. When documents are processed from a crawl-data node, they first inherit new attributes (attributes that they don't already have) from their containing crawl-data element and then inherit new attributes from the containing crawl-url element. Thus, a vse-key on a crawl-data element becomes the vse-key of the document if the document doesn't already have one.
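The inheritance order can be illustrated with a small Python sketch using the standard xml.etree.ElementTree module. The sample XML is hypothetical: the lowercase document element name and the url and source attributes are assumptions made for the illustration, and the function is not the indexer's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only crawl-url, crawl-data and vse-key come from the text.
SAMPLE = """
<crawl-url url="http://example.com/a.html" source="from-crawl-url">
  <crawl-data vse-key="key-from-crawl-data" source="from-crawl-data">
    <document title="no key of its own"/>
    <document vse-key="own-key"/>
  </crawl-data>
</crawl-url>
"""


def inherit_attributes(crawl_url):
    """Copy missing attributes onto each document, nearest container first."""
    documents = []
    for crawl_data in crawl_url.iter("crawl-data"):
        for doc in crawl_data.iter("document"):
            # crawl-data is consulted before crawl-url, so it wins on conflicts.
            for container in (crawl_data, crawl_url):
                for name, value in container.attrib.items():
                    # setdefault never overwrites an attribute that already exists.
                    doc.attrib.setdefault(name, value)
            documents.append(doc)
    return documents


docs = inherit_attributes(ET.fromstring(SAMPLE.strip()))
for doc in docs:
    print(doc.get("vse-key"), doc.get("source"), doc.get("url"))
```

The first document picks up its vse-key from crawl-data, while the second keeps the key it already had; both receive source from crawl-data rather than crawl-url because the nearer container is applied first.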
The actual vse-key that is used is based on the following values, in order of priority:
When a content element appears within a document, the only potential key is its add-to attribute. If that attribute is absent, the content is added to the containing document node.
Finally, when a content appears outside of a document, the following priorities apply for determining the add-to key:
Each of these keys is then treated as a potentially relative URL and normalized against the anchor of the data. If you want to use your own keys unchanged, you should also add the attribute vse-key-normalized, forced-vse-key-normalized or vse-add-to-normalized (whichever is applicable) to disable the normalization step.
Contents are indexed even if no document has the key they refer to. A content whose key matches no document will never be returned as a search result, but it still takes time and space in the index. To avoid this, set the attribute vse-add-to-check="crawlable" on the content element: the content is then added to the index only if its (normalized) key corresponds to a URL that could potentially be crawled by the current configuration, whether or not it has actually been crawled.