strip-html

This program reduces the size of the HTML that will be stored in the index and, in the process, converts the HTML to IBM XML.

usage: strip-html [--title-weight t] [--snippet-weight w] [--anchors] [--bad-encoding]
       [--pdf] [--strip-br] [--no-title] [--strip-all]

The options are:

In addition to the operations selected by its options, this program always removes all attributes from tags, removes extra spaces, and removes comments, JavaScript sections, and CSS sections.
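
The always-on cleanup described above can be illustrated with a minimal Python sketch. This is not the actual strip-html code and it does not perform the IBM XML conversion; the names `Stripper` and `strip_markup` are hypothetical, and treating "extra spaces" as runs of whitespace collapsed to a single space is an assumption.

```python
import re
from html import escape
from html.parser import HTMLParser


class Stripper(HTMLParser):
    """Hypothetical illustration only; not the real strip-html implementation."""

    SKIPPED = {"script", "style"}  # JavaScript and CSS sections are dropped

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(f"<{tag}>")  # attrs are ignored, so attributes vanish

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        if self.skip_depth == 0 and tag not in self.SKIPPED:
            self.out.append(f"<{tag}/>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # The parser hands over unescaped text, so escape it again.
            self.out.append(escape(data, quote=False))

    # handle_comment is not overridden: the base class ignores comments,
    # which is how they disappear from the output.


def strip_markup(html_text):
    parser = Stripper()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    # Assumption: "extra spaces" means runs of whitespace, collapsed to one space.
    return re.sub(r"\s+", " ", "".join(parser.out)).strip()


if __name__ == "__main__":
    sample = (
        '<p class="x">Hello   <!-- note -->  world</p>'
        "<script>var a = 1;</script><style>p { color: red }</style>"
    )
    print(strip_markup(sample))
```

On the sample input, the attribute, the comment, the script, and the style block all disappear and the run of spaces collapses, leaving only the bare paragraph element and its text.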