Summarization (Snippet Processing)

About this task

During the clustering process, Watson™ Explorer Engine creates a short summary for each input document. This summary, or "snippet", is a piece of text formed automatically from the content elements in the document that are marked as summarizable. Watson Explorer Engine stores the summary in a new content element with name="snippet" and type="html", which is inserted as a child of the document.
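As a rough illustration of the resulting structure, the following Python sketch appends a snippet element to a document using the standard xml.etree.ElementTree module. The element names "document" and "content", the url attribute, and the helper add_snippet are assumptions made for this example; only the name="snippet" and type="html" attributes come from the description above.

```python
import xml.etree.ElementTree as ET


def add_snippet(document: ET.Element, snippet_html: str) -> ET.Element:
    """Append a content child with name="snippet" and type="html".

    Hypothetical helper for illustration; not part of Watson Explorer Engine.
    """
    snippet = ET.SubElement(document, "content", {"name": "snippet", "type": "html"})
    snippet.text = snippet_html
    return snippet


# A made-up input document with two content children.
doc = ET.fromstring(
    '<document url="http://example.com/a">'
    '<content name="title">Example page</content>'
    '<content name="body">Some long body text about clustering.</content>'
    "</document>"
)

add_snippet(doc, "Some long body text about <b>clustering</b>.")

# ElementTree escapes the HTML markup inside the text when serializing.
print(ET.tostring(doc, encoding="unicode"))
```

The snippet is added as the last child of the document, next to the original content elements.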

If the input content elements are not needed in the output, enable the compact option to remove all content elements except those with name="title" or name="snippet".
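The effect of the compact option can be pictured with a small Python sketch. This is not how Watson Explorer Engine implements the option; the element names "document" and "content" and the function compact are assumptions for this example, which simply keeps the title and snippet elements and drops the rest.

```python
import xml.etree.ElementTree as ET

# Content names that survive compaction, as described above.
KEPT_NAMES = {"title", "snippet"}


def compact(document: ET.Element) -> ET.Element:
    """Remove every content child whose name is not "title" or "snippet".

    Hypothetical helper for illustration only.
    """
    # Iterate over a copy of the list because children are removed in place.
    for content in list(document.findall("content")):
        if content.get("name") not in KEPT_NAMES:
            document.remove(content)
    return document


example = ET.fromstring(
    "<document>"
    '<content name="title">Example page</content>'
    '<content name="body">Some long body text.</content>'
    '<content name="snippet" type="html">Some long body text.</content>'
    "</document>"
)

compact(example)
print([c.get("name") for c in example.findall("content")])
```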

Watson Explorer Engine summarizes an input document in the following steps:

Procedure

  1. Text passages that best match the labels of the cluster in which Watson Explorer Engine has placed the document are selected from the contents marked as summarizable. The size of each passage is specified by the context_words option.
  2. The text passages are combined to form a new content element. The length of the text in this element is specified by the context_total_words option. This value is not a strict limit; it serves as a target for the total size.
  3. Some text from the start of the document may be included in the summary, depending on the value of the context_initial_words option.
  4. Query terms and cluster labels occurring in the summary text are displayed in bold text.
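
The steps above can be approximated with a short Python sketch. It is a simplified model under stated assumptions, not the algorithm that Watson Explorer Engine uses: labels and query terms are single words, a passage is a window of context_words words on each side of a label match, passages are joined with "...", and a negative context_words keeps the whole text. The function summarize and its default values are hypothetical; only the option names come from this topic.

```python
import re


def _norm(word: str) -> str:
    """Lowercase a token and strip punctuation so 'Clusters,' matches 'clusters'."""
    return re.sub(r"\W+", "", word).lower()


def summarize(text, labels, query_terms=(), context_words=3,
              context_total_words=12, context_initial_words=0):
    """Build a snippet from text (illustrative model only)."""
    words = text.split()
    label_set = {_norm(t) for t in labels}
    bold_set = label_set | {_norm(t) for t in query_terms}

    if context_words < 0:
        # A negative value keeps the complete text.
        keep = set(range(len(words)))
    else:
        # Step 3: optionally keep some words from the start of the document.
        keep = set(range(min(context_initial_words, len(words))))
        # Steps 1 and 2: add a window around each label match and stop once
        # the soft target size has been reached (it may be overshot slightly).
        for i, word in enumerate(words):
            if len(keep) >= context_total_words:
                break
            if _norm(word) in label_set:
                lo = max(0, i - context_words)
                hi = min(len(words), i + context_words + 1)
                keep.update(range(lo, hi))

    # Step 4: bold matching terms and mark gaps between passages.
    parts, previous = [], None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            parts.append("...")
        token = words[i]
        parts.append(f"<b>{token}</b>" if _norm(token) in bold_set else token)
        previous = i
    return " ".join(parts)


text = ("Watson groups documents into clusters and then writes a short "
        "summary for every document in the result list")
print(summarize(text, labels=["clusters"], query_terms=["documents"],
                context_words=2, context_total_words=10,
                context_initial_words=1))
```

Treating context_total_words as a stopping threshold rather than a hard cap mirrors the description of that option as a target, not a strict limit.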

Results

The default values are appropriate in most cases, but can be changed to produce more verbose or more terse output. To include the complete document with only the query terms displayed in bold, use these options:

  -context_words -1 -bolding 1

To include the complete document with no words displayed in bold, use these options:

  -context_words -1 -bolding 0