Tuning results on the Text Analytics Workbench

The Text Analytics Workbench displays the extraction results and the category model from the text analysis package. It is an interactive workbench where you can explore and fine-tune the extracted results, build and refine categories, and build category model nuggets.

Concepts data

During the extraction process, the text data is analyzed to identify interesting or relevant single words such as airport or location, and word phrases such as airport pick-up. These words and phrases are collectively referred to as terms. Using the linguistic resources, the relevant terms are extracted, and similar terms are grouped under a lead term that is called a concept.

In this way, a concept might represent multiple underlying terms, depending on how the terms are used in your text and which linguistic resources you are using.
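The grouping of terms under a lead term can be pictured with a small sketch. This is only an illustration of the idea, not how the product is implemented: the `SYNONYMS` table and the `group_terms` function are hypothetical names that stand in for the linguistic resources and the extraction engine.

```python
from collections import defaultdict

# Hypothetical synonym table: maps a variant term to its lead term (the concept).
# In the product, this role is played by the linguistic resources.
SYNONYMS = {
    "airport pick-up": "airport pickup",
    "airport pickup": "airport pickup",
    "pickup at airport": "airport pickup",
    "location": "location",
    "locations": "location",
}

def group_terms(terms):
    """Group extracted terms under their lead term (concept)."""
    concepts = defaultdict(list)
    for term in terms:
        normalized = term.lower().strip()
        # A term with no entry in the table becomes its own concept.
        lead = SYNONYMS.get(normalized, normalized)
        if normalized not in concepts[lead]:
            concepts[lead].append(normalized)
    return dict(concepts)

extracted = ["Airport pick-up", "airport pickup", "locations", "location", "shuttle"]
concepts = group_terms(extracted)
```

Here the concept "airport pickup" represents two underlying terms, while "shuttle" has no entry in the table and stands as its own concept. Changing the table changes the grouping, which mirrors how the result depends on the linguistic resources in use.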

You can also use a Filter to select a subset of concepts:
Figure 1. Text Analytics Workbench - filtering concepts
You can select from different filter options:
Figure 2. Text Analytics Workbench - filter options

If you want to remove the Filter and display all concepts, select Clear Filter.

Text link analysis

Text link analysis (TLA) is a pattern-matching technology that compares TLA rules to extracted concepts and relationships that are found in your text. On the Text links tab, you can build and explore the TLA patterns that are found in your text data.

You can select a Type Pattern (for example, <Services> + <Positive>) to see a preview of the text in the document. Because the text in the Document Preview cell might be truncated, you can click a cell to display the entire text.
Figure 3. Text Analytics Workbench - patterns
Text Analytics Workbench - Text links tab. Shows the type patterns on the Text links tab. On the side is the Preview pane, which contains a table with three columns: Entry, Document Preview, and Category path.
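As a rough illustration of what matching a type pattern such as <Services> + <Positive> means, the following sketch pairs up concepts by type within a sentence. It is a simplified assumption for explanation only: actual TLA rules are more expressive, and the data layout and the `match_type_pattern` function are hypothetical.

```python
from itertools import combinations

# Hypothetical extraction output: for each sentence, the concepts found in
# reading order, each tagged with its type.
sentences = [
    [("shuttle service", "Services"), ("excellent", "Positive")],
    [("airport", "Location"), ("check-in", "Services"), ("slow", "Negative")],
]

def match_type_pattern(sentence, pattern):
    """Return concept pairs whose types match a two-slot type pattern, in order."""
    first_type, second_type = pattern
    return [
        (a[0], b[0])
        for a, b in combinations(sentence, 2)  # pairs keep reading order
        if a[1] == first_type and b[1] == second_type
    ]

pattern = ("Services", "Positive")
matches = [m for s in sentences for m in match_type_pattern(s, pattern)]
```

With this sample data, only the first sentence produces a match for the pattern. The second sentence would match a ("Services", "Negative") pattern instead.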

Category data

On the Categories tab, you can build and manage your categories. After the concepts and types are extracted from your text data, you can build categories automatically by using techniques such as concept inclusion or semantic networks (English only), or you can build them manually.

Since this example flow uses a text analysis package (TAP), the category model is already populated.

Each time a category is created or updated, you can click Score to check whether any text in the documents or records matches a descriptor in a given category. If a match is found, the document or record is assigned to that category. As a result, most, if not all, of the documents or records are assigned to categories based on the descriptors in those categories.
Figure 4. Text Analytics Workbench - category data
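The scoring step described above can be sketched as a descriptor lookup. This is a minimal illustration under stated assumptions, not the workbench's scoring logic: descriptors are treated as plain concept names, and the `categories` data and the `score` function are hypothetical.

```python
# Hypothetical category model: each category has a set of descriptors.
categories = {
    "Airport services": {"airport pickup", "shuttle"},
    "Pricing": {"price", "fee"},
}

def score(record_concepts, categories):
    """Return the sorted names of categories with at least one matching descriptor."""
    found = set(record_concepts)
    return sorted(
        name for name, descriptors in categories.items() if descriptors & found
    )

# Hypothetical records, each represented by the concepts extracted from it.
records = {
    "r1": ["airport pickup", "price"],
    "r2": ["shuttle"],
    "r3": ["breakfast"],
}
assignments = {rid: score(found, categories) for rid, found in records.items()}
```

A record can be assigned to more than one category, as with r1 here, and a record with no matching descriptor, such as r3, remains uncategorized.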
You can expand each category or subcategory, and then select a category, subcategory, or descriptor to see the source data:
Figure 5. Text Analytics Workbench - category source data