IBM SPSS Modeler Text Analytics nodes

Along with the many standard nodes delivered with IBM® SPSS® Modeler, you can also work with text mining nodes to incorporate the power of text analysis into your streams. IBM SPSS Modeler Text Analytics offers you several text mining nodes to do just that. These nodes are stored in the IBM SPSS Modeler Text Analytics tab of the node palette.

The following nodes are included:

  • The File List source node generates a list of document names as input to the text mining process. This is useful when the text resides in external documents rather than in a database or other structured file. The node outputs a single field with one record for each document or folder listed, which can be selected as input in a subsequent Text Mining node. See the topic File List node for more information.
  • The Web Feed source node makes it possible to read in text from Web feeds, such as blogs or news feeds in RSS or HTML formats, and use this data in the text mining process. The node outputs one or more fields for each record found in the feeds, which can be selected as input in a subsequent Text Mining node. See the topic Web Feed node for more information.
  • The Language Identifier node is a process node that scans source text to determine which human language it is written in and then records the result in a new field. Designed primarily for large amounts of data, this node is particularly useful when your data sources contain more than one language and you want to process just one of them. See the topic Language Identifier node for more information.
  • The Text Mining node uses linguistic methods to extract key concepts from the text, allows you to create categories with these concepts and other data, and offers the ability to identify relationships and associations between concepts based on known patterns (called text link analysis). The node can be used to explore the text data contents or to produce either a concept model or category model. The concepts and categories can be combined with existing structured data, such as demographics, and applied to modeling. See the topic Text Mining modeling node for more information.
  • The Text Link Analysis node extracts concepts and also identifies relationships between concepts based on known patterns within the text. Pattern extraction can be used to discover relationships between your concepts, as well as any opinions or qualifiers attached to these concepts. The Text Link Analysis node offers a more direct way to identify and extract patterns from your text and then add the pattern results to the dataset in the stream. You can also perform text link analysis (TLA) in an interactive workbench session in the Text Mining modeling node. See the topic Text Link Analysis node for more information.
  • When mining text from external documents, the File Viewer node can be used to generate an HTML page that contains links to the documents from which concepts were extracted. See the topic File Viewer node for more information.
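
To make the File List node's output more concrete, the following is a minimal Python sketch of the idea described above: walk a folder and produce one record per document, with a single field holding the document's location. It is an illustration only and does not use the IBM SPSS Modeler scripting API. The function name list_documents, the field name "Path", and the default list of file extensions are assumptions made for this example.

```python
import os


def list_documents(root, extensions=(".txt", ".html", ".pdf")):
    """Return one record per matching document found under root.

    Hypothetical helper: each record is a dict with a single field,
    here called "Path" (an assumed name), mirroring the idea of a
    single-field, one-record-per-document output.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple of suffixes
            if name.lower().endswith(extensions):
                records.append({"Path": os.path.join(dirpath, name)})
    # Sort so the result does not depend on directory traversal order
    records.sort(key=lambda record: record["Path"])
    return records


if __name__ == "__main__":
    # List the documents under the current working directory
    for record in list_documents("."):
        print(record["Path"])
```

A downstream step would then read each listed document and pass its text on for mining, which is the role the Text Mining node plays in a stream.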