Converter Process State
At any given time, the state of the data being converted involves three pieces of information:
- a type, which is the current type-in value associated with the data
- an anchor, which is the current base URL for the data
- the data itself at this point in the conversion process
The conversion process operates by applying rules, each of which generates a new state. As mentioned previously, whether a given rule is applied depends on conditions that can include the input document type (the type-in for the data), logical conditions such as URL pattern matches, the content of the data, and so on.
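The state and the rule conditions described above can be pictured with a small model. This is only an illustrative sketch, not the converter framework's actual implementation: the names `ConversionState`, `Rule`, and `step` are hypothetical, only two kinds of condition (type-in and URL pattern) are modeled, and choosing the first matching rule is an assumption made to keep the example short.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ConversionState:
    """Hypothetical model of the three pieces of state."""
    type_in: str   # current type-in value associated with the data
    anchor: str    # current base URL for the data
    data: bytes    # the data at this point in the conversion


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: conditions plus a transformation."""
    type_in: str                          # type-in the rule accepts
    type_out: str                         # type-in of the state it produces
    transform: Callable[[bytes], bytes]   # how the data is rewritten
    url_pattern: Optional[str] = None     # optional regex tested against the anchor

    def applies_to(self, state: ConversionState) -> bool:
        # The type-in must match before any other condition is checked.
        if state.type_in != self.type_in:
            return False
        if self.url_pattern is not None:
            return re.search(self.url_pattern, state.anchor) is not None
        return True

    def apply(self, state: ConversionState) -> ConversionState:
        # Applying a rule yields a new state; the anchor is kept unchanged here.
        return replace(state, type_in=self.type_out,
                       data=self.transform(state.data))


def step(state: ConversionState,
         rules: Sequence[Rule]) -> Optional[ConversionState]:
    """Apply the first rule whose conditions hold (an assumption of this
    sketch); return None when no rule applies."""
    for rule in rules:
        if rule.applies_to(state):
            return rule.apply(state)
    return None
```

In this model, a rule that accepts `text/plain` only for URLs ending in `.txt` would leave a `text/plain` document at any other URL untouched, because both conditions must hold before the rule produces a new state.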
The initial state of the conversion process has a type-in value that will be one of the following, in order of precedence:
- The forced content-type of the crawled URL. (This can be set in the Retrieval and Encodings section of a search collection's Configuration > Crawling tab.)
- If a forced content-type is not specified, the content-type returned by the server.
- If the content-type from the server is not known, the default content-type of the crawled URL.
- If a default content-type is not provided, the content-type is unknown. Data of unknown type is processed by the guess-content converter to determine the actual content-type.
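The precedence order above amounts to taking the first value that is available. A minimal sketch follows, assuming each source is represented as an optional string. The function name `initial_type_in` and the `UNKNOWN` placeholder value are hypothetical and are not taken from the product.

```python
from typing import Optional

# Placeholder for the "unknown" type-in; the actual value used by the
# product may differ (assumption of this sketch).
UNKNOWN = "unknown"


def initial_type_in(forced: Optional[str] = None,
                    server: Optional[str] = None,
                    default: Optional[str] = None) -> str:
    """Return the initial type-in using the documented precedence:
    forced content-type, then server content-type, then default."""
    for candidate in (forced, server, default):
        if candidate:  # skip None and empty strings
            return candidate
    # Nothing was supplied: the data would go to the guess-content converter.
    return UNKNOWN
```

For example, `initial_type_in(server="text/html", default="text/plain")` returns `"text/html"`, because the server value outranks the default when no forced content-type is set.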