Examining and Acting on Converter Errors

As discussed in Custom Converters, custom converters enable you to specify a fallback content-type that is used if a converter fails or a sub-command generates an error code. In these cases, the original data is preserved unmodified and is passed on with the specified content-type for subsequent processing.

Watson™ Explorer Engine also provides a default fallback content-type. When any converter that does not define its own fallback content-type fails, the content type vivisimo/fallback is used. Per-converter fallback content-type definitions therefore have priority over the generic fallback content-type that is provided by Watson Explorer Engine.
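The priority rule can be modeled in a few lines. This Python sketch is a conceptual model, not Watson Explorer Engine code. The function names and the idea of passing the converter's fallback as an optional string are hypothetical. It shows that a per-converter fallback wins, that vivisimo/fallback is used otherwise, and that the original data is passed on unmodified.

```python
from typing import Optional, Tuple

# Global fallback content type named in the documentation.
GLOBAL_FALLBACK = "vivisimo/fallback"


def resolve_fallback(converter_fallback: Optional[str]) -> str:
    """Return the content type assigned after a converter fails.

    converter_fallback is the fallback content-type configured on the
    failing converter, or None if it defines none (hypothetical model).
    """
    # A per-converter fallback has priority over the global fallback.
    if converter_fallback:
        return converter_fallback
    return GLOBAL_FALLBACK


def apply_fallback(data: bytes, converter_fallback: Optional[str]) -> Tuple[str, bytes]:
    """Pass the original data on, unmodified, with the fallback content type."""
    return resolve_fallback(converter_fallback), data
```

For example, `apply_fallback(b"raw", "text/plain")` keeps the converter's own fallback, while `apply_fallback(b"raw", None)` assigns vivisimo/fallback.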

The Watson Explorer Engine default configuration includes a rule that converts from vivisimo/fallback to application/vxml-unnormalized, discards the input data, and produces an empty <document/> node instead. This rule guarantees that every URL that enters the converter framework generates at least one document.
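As an illustration only, the default rule behaves like the following Python function. The function name and the byte-string representation of the data are assumptions made for this sketch; the real rule is part of the engine's converter configuration.

```python
from typing import Tuple

FALLBACK_TYPE = "vivisimo/fallback"
OUTPUT_TYPE = "application/vxml-unnormalized"


def default_fallback_rule(content_type: str, data: bytes) -> Tuple[str, bytes]:
    """Model of the default rule for vivisimo/fallback (hypothetical helper)."""
    if content_type != FALLBACK_TYPE:
        # The rule applies only to the global fallback content type.
        return content_type, data
    # Discard the input and emit an empty document node, so the URL
    # still produces a document.
    return OUTPUT_TYPE, b"<document/>"
```

Whatever the failed input contained, the rule's output is always an empty document, which is what guarantees that the URL is represented in the index.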

Tip: A related content-type, vivisimo/crawler-error, is produced when the crawler encounters an error while crawling a URL. When this occurs, the crawler generates an empty crawl-data node and immediately falls back to this content type. Watson Explorer Engine does not act on this content type directly. It is provided so that you can identify crawl errors: because every input URL generates at least one document in the index, a URL that failed still appears there and can be found.

To make fallbacks more useful, you can also inspect and act on the error that caused the fallback content-type to be assigned. You do this by specifying an XPath condition on a converter rule. For example, the following XPath condition applies a rule only if an error triggered the fallback content-type and that error contains the string "permission denied":

      '*permission denied*', 'wc')
Note: When testing for errors, make sure that you examine the correct attribute. Errors that cause the fallback content-type to be used are reported in the crawl-data element's fallback-error attribute, not in its error attribute. The fallback-error attribute accumulates all errors that are caught by a fallback, not just the last one. Also, when the global fallback content-type is triggered, the content type that triggered it is recorded in an attribute named fallback, which you can also examine in an XPath expression.
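To see why the attribute choice matters, this Python sketch approximates the condition above on a crawl-data element using the standard library. It assumes that the 'wc' mode means a case-insensitive wildcard match; that interpretation, the helper name, and the sample attribute values are illustrative and not taken from the engine.

```python
import fnmatch
import xml.etree.ElementTree as ET


def matches_fallback_error(crawl_data: ET.Element, pattern: str) -> bool:
    """Approximate the rule condition on a crawl-data element.

    Assumes 'wc' means a case-insensitive wildcard match; this is an
    illustration, not the engine's XPath implementation.
    """
    # Errors that triggered a fallback are in fallback-error, not error.
    errors = crawl_data.get("fallback-error")
    if errors is None:
        return False
    # Case-insensitive wildcard match against the accumulated error text.
    return fnmatch.fnmatchcase(errors.lower(), pattern.lower())


doc = ET.fromstring(
    '<crawl-data fallback="application/pdf" '
    'fallback-error="converter failed: Permission denied"/>'
)
print(matches_fallback_error(doc, "*permission denied*"))  # True
print(doc.get("fallback"))  # application/pdf
```

An element that carries the message only in its error attribute does not match, which mirrors the mistake the note warns against.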