PDF to HTML (Hybrid Mode)
This mode converts a PDF to both HTML and text, which gives more accurate text extraction for indexing but at a higher performance cost. It uses an intermediary format, CFHTML, which contains the first two HTML pages (used for title extraction) and all of the text content. After title extraction, the first two HTML pages are removed. Conversion is performed using the Perceptive Software converters with the Java converter framework. The following configuration option is available:
- Show Hidden Text - This option controls the indexing and display of content that typically isn't considered part of the printable document. This includes comments, notes, annotations, and tracked changes.
  - Off - Hidden text will not be indexed.
  - Hidden - Hidden text will be indexed but not viewable in document cache previews.
  - Visible - Hidden text will be indexed and viewable in document cache previews.
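
The three Show Hidden Text values can be read as two independent yes/no questions: is hidden text indexed, and is it viewable in document cache previews? The Python sketch below only models that mapping as described in the list above. The names `ShowHiddenText`, `is_indexed`, and `is_viewable_in_preview` are hypothetical and chosen for illustration; they are not part of the product's configuration interface or the converter framework.

```python
from enum import Enum


class ShowHiddenText(Enum):
    """Hypothetical model of the three Show Hidden Text settings."""
    OFF = "Off"
    HIDDEN = "Hidden"
    VISIBLE = "Visible"


def is_indexed(setting: ShowHiddenText) -> bool:
    # Hidden text is indexed for every setting except Off.
    return setting is not ShowHiddenText.OFF


def is_viewable_in_preview(setting: ShowHiddenText) -> bool:
    # Hidden text appears in document cache previews only when set to Visible.
    return setting is ShowHiddenText.VISIBLE


if __name__ == "__main__":
    # Print the effect of each setting as a small table.
    for setting in ShowHiddenText:
        print(setting.value, is_indexed(setting), is_viewable_in_preview(setting))
```

Read this way, Hidden is the only setting where a document can match a search on hidden text that the cached preview does not show.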