IBM Streams 4.2.1

Operator SentimentExtractor

This operator analyzes a portion of text and determines what kind of sentiment the text expresses. For each input tuple, the operator produces a tuple containing attributes of type SentimentScore. The score indicates whether any sentiment was detected and, if so, whether the sentiment expressed was positive, negative, or neutral.

This operator uses dictionaries to determine which words express sentiment. The dictionaries are lists of negative and positive verbs and nouns that are used to determine the type of sentiment expressed in the text. The dictionaryPath parameter specifies the location of these files. The operator's ability to detect sentiment depends in part on the content of the dictionaries, so to improve detection, add words to the dictionaries. The initial set of dictionaries is found in STREAMS_INSTALL/toolkits/com.ibm.streams.text/impl/lib/dictionaries/sentiment.

The following snippet demonstrates how to use this operator:
stream<SentimentScore, InputType> ExtractedSentimentStream = SentimentExtractor(Data) {
  param
    inputAttribute : "line"; // the name of the attribute in the input stream that has the text to analyze
    dictionaryPath : "etc/dicts"; // the default set of dictionaries has been copied to the etc/dicts folder of this application
}

Behavior in a consistent region

  • The operator can participate in a consistent region if the optional resources port is not used. If the resources port is used, changes to external resources received through this port are not persisted upon reset or restart.
  • The operator cannot be at the start of a consistent region.
  • The operator is stateless and does not preserve any state during checkpoint and reset.
  • If the streams processing application fails, the operator re-reads the input files. If the files have changed between the initial start and the restart, the new files are used when the application restarts.
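
As a sketch of the first two points above, the following fragment places SentimentExtractor inside a consistent region that starts at an upstream source rather than at the operator itself. It is an illustration under stated assumptions, not a reference configuration: the composite name, the input file reviews.txt, and the 30-second periodic trigger are hypothetical, and the SentimentScore type is assumed to be visible through the com.ibm.streams.text.analytics namespace. The resources port is left unconnected so that the operator can participate in the region.

```spl
use spl.control::JobControlPlane;
use com.ibm.streams.text.analytics::*; // assumption: SentimentScore is visible from this namespace

composite SentimentInRegion { // hypothetical composite name
  type
    InputType = rstring line;
  graph
    // Applications that use consistent regions need a JobControlPlane operator
    () as JCP = JobControlPlane() {}

    // The region starts at the source, because SentimentExtractor cannot start a region
    @consistent(trigger = periodic, period = 30.0) // illustrative period
    stream<InputType> Data = FileSource() {
      param
        file : "reviews.txt"; // hypothetical input file
        format : line;
    }

    // Only the data port is connected; the optional resources port is not used
    stream<SentimentScore, InputType> ExtractedSentimentStream = SentimentExtractor(Data) {
      param
        inputAttribute : "line";
        dictionaryPath : "etc/dicts";
    }
}
```

Because the operator keeps no state of its own, a reset in this layout only causes the source to replay tuples; the dictionaries are read again from etc/dicts in whatever form they have at that time.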

NOTE: The SentimentExtractor operator is not supported on IBM Power Systems.

Summary

Ports
This operator has 1 or more input ports and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 2 parameters.

Optional: dictionaryPath, inputAttribute

Metrics
This operator does not report any metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

The input stream must contain an attribute of type rstring that holds the text on which sentiment analysis is to be performed.

Properties
Ports (1...)

This operator has an optional input port called the resources port that is identical in behaviour to the resources port of the TextExtract operator. See the TextExtract operator documentation for more information.

Properties

Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

For each incoming tuple, this port will produce an output of type SentimentScore. This tuple describes the sentiment detected in the input, if any sentiment was detected.

Properties

Parameters

This operator supports 2 parameters.

Optional: dictionaryPath, inputAttribute

dictionaryPath
This is the path to a folder containing the dictionaries used to configure the operator. If the dictionaryPath parameter is omitted, the operator expects to find the files in STREAMS_INSTALL/toolkits/com.ibm.streams.text/impl/lib/dictionaries/sentiment. To customize the dictionaries, copy those files, use their contents as a starting point, and then set the dictionaryPath parameter to the path of the folder containing your dictionaries.
The following is a list of the required dictionaries and their purpose:
  • SentimentCustom_Speaker.dict: Terms that indicate who is expressing the sentiment, such as the pronouns 'I' and 'we'.
  • SentimentCustom_SpeakerDoesTargetNegative_V.dict: Verbs in base form that express the speaker's negative feeling about an object, such as 'hate' or 'dislike'. Phrasal verbs are not supported.
  • SentimentCustom_SpeakerDoesTargetPositive_V.dict: Verbs in base form that express the speaker's positive feeling about an object, such as 'love', 'like' or 'adore'. Phrasal verbs, e.g., 'care for', are not supported.
  • SentimentCustom_TargetDoesNegative_V.dict: Verbs in base form that describe the negative action of the sentence subject, such as 'insult' or 'fail'.
  • SentimentCustom_TargetDoesPositive_V.dict: Verbs in base form that describe the positive action of the sentence subject, such as 'complete' or 'compliment'.
  • SentimentCustom_TargetIsObjectNegative_O.dict: Adjectives that negatively describe the person, place, or thing, such as 'rude' or 'clumsy'.
  • SentimentCustom_TargetIsObjectPositive_O.dict: Adjectives that positively describe the person, place, or thing, such as 'considerate' or 'tidy'.

This operator requires that the folder pointed to by the dictionaryPath parameter contain files with the above names. The path is expected to be either absolute or relative to the application directory.
Properties
inputAttribute

This is a string that contains the name of the attribute in the input stream whose text is to be analyzed. If there is only one attribute on the input tuple, this parameter is not required. The specified attribute must have a data type of ustring or rstring.
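
When the input tuple has more than one attribute, this parameter selects the one to analyze. The fragment below is a minimal sketch under stated assumptions: the Review type, its attribute names, the composite name, and the file reviews.csv are hypothetical, the SentimentScore type is assumed to be visible through the com.ibm.streams.text.analytics namespace, and its attribute names are assumed not to collide with reviewId or reviewText.

```spl
use com.ibm.streams.text.analytics::*; // assumption: SentimentScore is visible from this namespace

composite ReviewSentiment { // hypothetical composite name
  type
    Review = rstring reviewId, rstring reviewText; // hypothetical input schema
  graph
    stream<Review> Reviews = FileSource() {
      param
        file : "reviews.csv"; // hypothetical input file
        format : csv;
    }

    // Two attributes are present, so inputAttribute names the one that holds the text
    stream<SentimentScore, Review> ScoredReviews = SentimentExtractor(Reviews) {
      param
        inputAttribute : "reviewText";
    }
}
```

Because dictionaryPath is not set here, the operator would look for its dictionaries in the default location under STREAMS_INSTALL, as described for that parameter.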

Properties

Libraries

Java operator code
Library Path: ../../impl/java/bin, ../../impl/lib/TextAnalyticsForStreams.jar
Operator class library
Library Path: ../../impl/java/bin, ../../lib/TextAnalytics/lib/text-analytics/*, ../../lib/TextAnalytics/lib/ant-1.7.1/*, ../../lib/TextAnalytics/lib/commons-codec-1.4/*, ../../lib/TextAnalytics/lib/htmlparser-2.1/*, ../../lib/TextAnalytics/lib/opencsv-2.3/*, ../../lib/TextAnalytics/lib/uima-2.3.0/*, ../../lib/TextAnalytics/lib/*, ../../lib/TextAnalytics/lib/multilingual/*, ../../lib/TextAnalytics/lib/wrappers/*, ../../lib/TextAnalytics/lib/commons-pool2-2.1/*, ../../lib/TextAnalytics/action-api/*