HTML Body Text Tagger

About this task

The HTML body text tagger analyzes the text and structure of HTML documents and tags the HTML elements that are most likely to contain body text, as opposed to navigation, layout elements, headers, or footers.
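This topic does not describe how the tagger decides which elements are body text. To illustrate the general idea, here is a minimal sketch of one common heuristic, text length combined with link density. It is an assumption for illustration only, not the tagger's actual algorithm. The names `BlockCollector` and `tag_body_blocks` and the thresholds `min_words` and `max_link_density` are hypothetical. The sketch treats a block as likely body text when it has enough words and few of those words sit inside links.

```python
from html.parser import HTMLParser

# Hypothetical set of block-level elements the sketch scores.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "header", "footer", "nav"}


class BlockCollector(HTMLParser):
    """Collects the words (and the words inside <a> links) of each block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # every block seen, in document order
        self._stack = []    # currently open blocks; text goes to the innermost one
        self._in_link = 0   # depth of nested <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            record = {"tag": tag, "text": [], "link_text": []}
            self._stack.append(record)
            self.blocks.append(record)
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            # Close the most recent matching block and anything left open inside it
            # (tolerates unclosed elements such as <p>).
            for i in range(len(self._stack) - 1, -1, -1):
                if self._stack[i]["tag"] == tag:
                    del self._stack[i:]
                    break
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        if not self._stack:
            return
        words = data.split()
        record = self._stack[-1]
        record["text"].extend(words)
        if self._in_link:
            record["link_text"].extend(words)


def tag_body_blocks(html, min_words=20, max_link_density=0.3):
    """Return (tag, word_count, link_density, is_body) for each non-empty block."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    results = []
    for block in parser.blocks:
        n_words = len(block["text"])
        if n_words == 0:
            continue
        density = len(block["link_text"]) / n_words
        is_body = (
            n_words >= min_words
            and density <= max_link_density
            and block["tag"] not in {"nav", "header", "footer"}
        )
        results.append((block["tag"], n_words, round(density, 2), is_body))
    return results


page = (
    '<nav><a href="/">Home</a> <a href="/about">About</a></nav>\n'
    "<div><p>" + "This paragraph holds the main article text. " * 5 + "</p></div>\n"
    '<footer>Copyright <a href="/legal">Legal</a></footer>\n'
)

for row in tag_body_blocks(page):
    print(row)
```

In this sample page only the paragraph is flagged as body text. The navigation links and the footer are rejected because they are short and made up mostly of link text.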

This converter is not enabled by default. To enable it in a search collection, do the following:

Procedure

  1. Go to the collection's Configuration > Converting tab.
  2. Click Add a new converter, select HTML body text tagger from the list in the dialog that appears, and click Add.
  3. Leave Type-In and Type-Out at their default value, text/html, and leave Output forking set to (unset).
  4. Click OK to save.

Results

For proper operation, the Component: HTML to XML converter, which is enabled in all search collections by default, must appear below the Component: HTML body text tagger in the list of converters.
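The ordering requirement can be expressed as a simple check on the converter list, read from top to bottom. This is a minimal sketch, assuming the list is modeled as a plain Python list of the labels shown in the Converting tab. The helper names `converter_order_ok` and `place_tagger_above` are hypothetical and are not part of the product.

```python
# Labels as they appear in the collection's list of converters (top to bottom).
TAGGER = "Component: HTML body text tagger"
HTML_TO_XML = "Component: HTML to XML converter"


def converter_order_ok(converters):
    """Return True if the tagger is listed above (before) the HTML to XML converter."""
    if TAGGER not in converters or HTML_TO_XML not in converters:
        return False
    return converters.index(TAGGER) < converters.index(HTML_TO_XML)


def place_tagger_above(converters):
    """Return a copy with the tagger moved directly above the HTML to XML converter.

    Raises ValueError if the HTML to XML converter is not in the list.
    """
    result = [c for c in converters if c != TAGGER]
    result.insert(result.index(HTML_TO_XML), TAGGER)
    return result


print(converter_order_ok([TAGGER, HTML_TO_XML]))
print(converter_order_ok([HTML_TO_XML, TAGGER]))
print(place_tagger_above([HTML_TO_XML, TAGGER]))
```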