Converter Process State

At any given time, the state of the data being converted consists of three pieces of information:

  • a type, which is the current type-in value associated with the data
  • an anchor, which is the current base URL for the data
  • the data itself at this point in the conversion process
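To make the three parts of the state concrete, here is a minimal Python sketch that models one conversion state as an immutable record. The class name ConverterState and its field names are hypothetical illustrations, not part of the product; storing the data as raw bytes is also an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConverterState:
    """Hypothetical model of the conversion state at one point in time."""
    type_in: str  # current type-in value, e.g. "text/html"
    anchor: str   # current base URL for the data
    data: bytes   # the data itself at this point in the conversion

state = ConverterState("text/html", "http://example.com/docs/", b"<html>...</html>")
print(state.type_in)  # text/html
```

Making the record frozen mirrors the idea that a rule produces a new state rather than modifying the existing one.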

The conversion process operates by applying rules to generate new states. As mentioned previously, whether a given rule is applied depends on conditions that can include the input document type (the type-in for the data), logical conditions such as URL pattern matches, the data content, and so on.
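The rule-driven loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's converter engine: the Rule class, the run function, and the html_to_text conversion are hypothetical. Only two kinds of conditions are modeled (a type-in match and an optional URL regex on the anchor); conditions on data content could be added the same way. The max_steps limit is an assumed safeguard against rules that never stop matching.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

@dataclass(frozen=True)
class ConverterState:
    type_in: str  # current type-in value
    anchor: str   # current base URL
    data: bytes   # current data

@dataclass
class Rule:
    """Hypothetical rule: conditions plus a conversion producing a new state."""
    type_in: str                                         # type-in this rule accepts
    url_pattern: Optional[str]                           # optional regex tested against the anchor
    convert: Callable[[ConverterState], ConverterState]  # produces the next state

    def applies_to(self, state: ConverterState) -> bool:
        if state.type_in != self.type_in:
            return False
        if self.url_pattern and not re.search(self.url_pattern, state.anchor):
            return False
        return True

def run(state: ConverterState, rules: List[Rule], max_steps: int = 10) -> ConverterState:
    """Apply the first matching rule repeatedly until no rule applies."""
    for _ in range(max_steps):
        rule = next((r for r in rules if r.applies_to(state)), None)
        if rule is None:
            break
        state = rule.convert(state)
    return state

def html_to_text(state: ConverterState) -> ConverterState:
    """Example conversion: strip tags and set the type-in to text/plain."""
    return replace(state, type_in="text/plain", data=re.sub(rb"<[^>]+>", b"", state.data))

rules = [Rule("text/html", r"^https?://", html_to_text)]
final = run(ConverterState("text/html", "http://example.com/", b"<p>Hi</p>"), rules)
print(final.type_in, final.data)  # text/plain b'Hi'
```

Because the new state's type-in is text/plain, no rule matches on the next pass and the loop stops. This shows how the type-in set by one rule determines which rule, if any, can run next.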

The initial state of the conversion process has a type-in value that will be one of the following, in order of precedence:

  • The forced content-type of the crawled URL. (This can be set in the Retrieval and Encodings section of a search collection's Configuration > Crawling tab.)
  • If no forced content-type is specified, the content-type returned by the server.
  • If the content-type from the server is not known, the default content-type of the crawled URL.
  • If no default content-type is provided, the type-in value will be unknown, and the data will be processed by the guess-content converter to determine its actual content type.
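The order of precedence above amounts to a simple fallback chain. The following Python sketch is illustrative only: the function name initial_type_in, the use of None for "not specified" or "not known", and the literal string "unknown" as the placeholder type-in are all assumptions, not the product's actual representation.

```python
from typing import Optional

UNKNOWN = "unknown"  # hypothetical placeholder; the guess-content converter would handle it

def initial_type_in(forced: Optional[str],
                    server: Optional[str],
                    default: Optional[str]) -> str:
    """Return the initial type-in: forced, then server, then default, else unknown."""
    for candidate in (forced, server, default):
        if candidate:  # None or empty means "not specified / not known"
            return candidate
    return UNKNOWN

print(initial_type_in(None, "text/html", "text/plain"))  # text/html
print(initial_type_in(None, None, None))                 # unknown
```

A forced content-type always wins, even when the server reports a different type. This is why forcing the type is useful for servers that report incorrect content-types.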