Configuring sentiment analysis for content analytics collections

You can configure the parser to assign positive sentiment, negative sentiment, or no sentiment to expressions extracted from text.

Before you begin

To configure the parser to recognize sentiment, you must enable sentiment analysis. If you did not enable sentiment analysis when you created the collection, you can enable it by editing the collection settings. When you enable sentiment analysis, the system applies analysis that is provided with the product. If you want to customize or extend the built-in analysis, follow the steps in this procedure to add or block additional expressions.

About this task

When you customize sentiment analysis, you specify words and phrases that help the parser determine whether a sentence conveys sentiment and, if so, which sentiment. Add expressions that are meaningful in your enterprise data but are not recognized by the built-in sentiment analysis feature. Support for sentiment expressions differs between languages and between domains that use different jargon. For example, the expressions that you use to analyze sentiment about food differ from the expressions that you use to analyze sentiment about cars or customer service.

In the content analytics miner, analysts can assess sentiment as they explore facets and documents in the collection. For example, analysts can explore correlations, see positive and negative expressions in context, and see how positive and negative sentiment changes over time. If a sentence does not contain enough expressions to convey positive or negative sentiment, the sentence is classified as ambivalent.

To help analysts identify sentiment expressions in context, positive expressions in document text are underlined in green, negative expressions are underlined in red, and ambivalent expressions are underlined in gray. When analysts explore facet values in the Sentiment view, expressions with higher correlation are indicated by darker shades of green and red. For example, negative sentiment expressions that have higher correlation are highlighted in darker shades of red.

Analysts can narrow results by adding selected facet values to the query and selecting a button to search only documents with positive sentiment or only documents with negative sentiment. This type of search does not mean that only expressions of the specified sentiment type are returned. If a document includes positive and negative expressions, that document is included in the search and both types of expressions in that document can be returned in the results.
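The document-level behavior of this search can be illustrated with a minimal Python sketch. The `Expression` and `Document` classes and the `filter_by_sentiment` function are hypothetical names used only for illustration; they are not part of the product. The sketch assumes that a document qualifies when it contains at least one expression of the requested sentiment type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Expression:
    text: str
    polarity: str  # "positive" or "negative" (hypothetical labels)


@dataclass
class Document:
    doc_id: str
    expressions: List[Expression]


def filter_by_sentiment(docs: List[Document], polarity: str) -> List[Document]:
    """Keep every document that has at least one expression of the given polarity.

    The filter works on whole documents, so expressions of the other
    polarity in a matching document are kept as well.
    """
    return [d for d in docs if any(e.polarity == polarity for e in d.expressions)]


docs = [
    Document("d1", [Expression("love", "positive"), Expression("too slow", "negative")]),
    Document("d2", [Expression("broken", "negative")]),
]

positive_docs = filter_by_sentiment(docs, "positive")
for doc in positive_docs:
    print(doc.doc_id, [(e.text, e.polarity) for e in doc.expressions])
```

In this sketch, document `d1` is returned for a positive-sentiment search even though it also contains the negative expression "too slow", and that negative expression remains visible in the results.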

The level of parsing that is done for sentiment analysis depends on the source language.

Deep parsing: English and Japanese

With deep parsing, the Sentiment facet generates three subfacets: Phrase, Expression, and Target. The Target subfacet lets analysts analyze the targets of sentiment expressions. For example, if a negative expression conveys "not like", you can analyze the objects that are evaluated by the expression, such as not liking a particular book or movie.

Consider this example sentence: "I don't like this car but I'm not sure how he feels". Deep parsing captures the following elements:

- The phrase, "I don't like this car" (the text involved in analyzing sentiment)
- The expression, "NOT like" (the expression that generates sentiment)
- The target, "car" (the object evaluated by the expressed sentiment)
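One way to picture the three elements is as a single record for each sentiment-bearing phrase. The following Python sketch is only an illustration: the `SentimentAnnotation` class and its field names are assumptions and do not reflect how the product stores parse results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SentimentAnnotation:
    """Hypothetical record for one sentiment-bearing phrase."""
    phrase: str            # the text involved in analyzing sentiment
    expression: str        # the expression that generates sentiment
    target: Optional[str]  # the object evaluated; None if no target was identified
    polarity: str          # "positive" or "negative"


annotation = SentimentAnnotation(
    phrase="I don't like this car",
    expression="NOT like",
    target="car",
    polarity="negative",
)

# Map the record to the three subfacets that deep parsing generates.
subfacets = {
    "Phrase": annotation.phrase,
    "Expression": annotation.expression,
    "Target": annotation.target,
}
print(subfacets)
```

The rest of the example sentence ("but I'm not sure how he feels") contributes no record in this sketch because it contains no sentiment expression.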

Unlike shallow parsing, where sentiment analysis is based on document count and correlation, deep parsing can store and analyze direct relationships within each document. This means that the expression "don't like" and the target "car" are not just in the same document or highly correlated, but that they occur in the same sentence and have a syntactic relationship. Deep parsing provides precise results, but coverage might be lower because both the expression and the target must be identified in order to analyze them as a pair.

Because deep parsing cannot always identify both the expression and the target (for example, people sometimes omit words from sentences), the facet values that are visible when exploring targets might be fewer than the facet values that are displayed for the Sentiment facet. Deep parsing analyzes paired expressions and targets, whereas the Sentiment facet shows all expressions and targets independently. Buttons in the Sentiment view allow analysts to navigate the direct relationships between expressions and targets, and switch their focus between exploring sentiment facets, exploring sentiment expressions, and exploring the targets of the expressions.
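The difference between independent counts and paired counts can be shown with a small Python sketch. The sample data is invented for illustration, and `None` stands for a target that the parser could not identify.

```python
from collections import Counter

# Hypothetical (expression, target) results from deep parsing.
# A target of None means that no target was identified for the expression.
annotations = [
    ("NOT like", "car"),
    ("NOT like", None),
    ("great", "service"),
    ("great", None),
    ("great", None),
]

# Independent view: every expression is counted, with or without a target.
expression_counts = Counter(expr for expr, _ in annotations)

# Paired view: only expressions with an identified target are counted.
pair_counts = Counter((expr, tgt) for expr, tgt in annotations if tgt is not None)

print(expression_counts)
print(pair_counts)
```

In this sample, five expressions are counted independently but only two expression and target pairs are available, which is why exploring targets can show fewer values than the Sentiment facet.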

Shallow parsing: Chinese, Czech, Dutch, Hebrew, Russian, Spanish, and Turkish

With shallow parsing, only the Phrase and Expression subfacets are generated for the Sentiment facet. Analysts can explore correlations, see positive and negative expressions in context, and see how sentiment changes over time, but expressions and their targets are not analyzed as a pair.
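The relationship between source language, parsing level, and generated subfacets can be summarized as a lookup. This Python sketch uses the language lists from this topic; the function name `sentiment_subfacets` is hypothetical, and raising an error for unlisted languages is an assumption, because this topic does not describe other languages.

```python
DEEP_PARSING_LANGUAGES = {"English", "Japanese"}
SHALLOW_PARSING_LANGUAGES = {
    "Chinese", "Czech", "Dutch", "Hebrew", "Russian", "Spanish", "Turkish",
}


def sentiment_subfacets(language: str) -> list:
    """Return the Sentiment subfacets that are generated for a source language."""
    if language in DEEP_PARSING_LANGUAGES:
        return ["Phrase", "Expression", "Target"]
    if language in SHALLOW_PARSING_LANGUAGES:
        return ["Phrase", "Expression"]
    # Assumption: languages that are not listed in this topic are treated as unsupported.
    raise ValueError(f"No sentiment parsing level is listed for {language!r}")


print(sentiment_subfacets("Japanese"))
print(sentiment_subfacets("Dutch"))
```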

Procedure

To customize how sentiment analysis is applied to a content analytics collection:

1. On the Collections view, expand the collection that you want to configure. In the Parse and Index pane, click Configure > Sentiment analysis.
2. On the Sentiment Analysis Configuration page, click the edit icon for the language to be applied when parsing content for sentiment.
3. Specify at least one expression. If you specify more than one expression of a given type, enter each expression on a separate line:
   - Specify words and phrases that are to be recognized as positive expressions when content is parsed.
   - Specify words and phrases that are to be recognized as negative expressions when content is parsed.
   - Specify words and phrases that are to be blocked and not categorized as sentiment when content is parsed. For example, you might block the phrase "good night".
4. Optional: Repeat these steps to specify expressions for a different language. The Sentiment Analysis Configuration page shows the number of positive expressions, negative expressions, and blocked expressions that are defined for each language.
5. To apply changes, redeploy the analytic resources and rebuild the index. If a document cache is not enabled for the collection, deploy the analytic resources and then re-crawl or re-import documents.
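The effect of the three expression lists in this procedure can be approximated with a short Python sketch. It is a simplified illustration, not the product's parser: the function names are hypothetical, matching is naive case-insensitive substring matching, and the assumption that a blocked phrase hides the words inside it is inferred from the "good night" example.

```python
def parse_expression_list(raw: str) -> list:
    """Split a multi-line field value into expressions, one per non-empty line."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def find_sentiment(text: str, positive: list, negative: list, blocked: list) -> list:
    """Return (expression, polarity) pairs that are found in the text."""
    lowered = text.lower()
    # Assumption: mask blocked phrases first so that words inside them do not match.
    for phrase in blocked:
        lowered = lowered.replace(phrase.lower(), " " * len(phrase))
    hits = []
    for polarity, expressions in (("positive", positive), ("negative", negative)):
        for expr in expressions:
            if expr.lower() in lowered:
                hits.append((expr, polarity))
    return hits


# Each list is entered with one expression per line.
positive = parse_expression_list("good\nworks great\n")
negative = parse_expression_list("too slow\n\nbroken")
blocked = parse_expression_list("good night")

print(find_sentiment("Good night, everyone", positive, negative, blocked))
print(find_sentiment("The food was good but the service was too slow",
                     positive, negative, blocked))
```

In this sketch, the first sentence yields no sentiment because "good night" is blocked, while the second sentence yields one positive and one negative expression.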