Text Mining Nugget: Category Model

A Text Mining category model nugget is created whenever you generate a category model from within the interactive workbench. This modeling nugget contains a set of categories, each defined by concepts, types, text link analysis (TLA) patterns, and/or category rules. The nugget is used to categorize survey responses, blog entries, other Web feeds, and any other text data.

If you launch an interactive workbench session in the modeling node, you can explore the extraction results, refine the resources, and fine-tune your categories before you generate category models. When you execute a stream containing a Text Mining model nugget, new fields are added to the data according to the build mode selected on the Model tab of the Text Mining modeling node prior to building the model. See the topic Category Model Nugget: Model Tab for more information.
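As a rough illustration of what "new fields are added to the data" can mean, the sketch below scores one record against a toy category model and appends one true/false field per category. It is plain Python, not the SPSS Modeler scripting API. The category names, the concept lists, the `Category_` field prefix, and the `score_record` helper are all hypothetical, and the actual output layout depends on the build mode chosen on the Model tab.

```python
# Illustrative only: plain Python, not the SPSS Modeler scripting API.
# A hypothetical category model: each category is defined by a set of concepts.
CATEGORY_MODEL = {
    "pricing": {"price", "cost", "expensive"},
    "service": {"support", "staff", "helpful"},
}


def score_record(record, text_field="response", model=CATEGORY_MODEL):
    """Return a copy of the record with one True/False field per category."""
    # Naive whitespace tokenization stands in for real concept extraction.
    words = set(record[text_field].lower().split())
    scored = dict(record)  # leave the input record unchanged
    for category, concepts in model.items():
        # The field name prefix is an assumption made for this sketch.
        scored["Category_" + category] = bool(words & concepts)
    return scored


row = {"id": 1, "response": "The staff were helpful but the price was high"}
scored_row = score_record(row)
```

Here `scored_row` keeps the original `id` and `response` fields and gains one additional field for each category in the toy model.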

If the model nugget was generated using translated documents, scoring is performed in the translated language. Similarly, if the model nugget was generated with English as the language, you can specify a translation language in the model nugget so that the documents are translated into English before they are scored.

Text Mining model nuggets are placed in the model nugget palette (located on the Models tab in the upper right side of the IBM® SPSS® Modeler window) when they are generated.

Viewing Results

To see information about the model nugget, right-click the node in the model nuggets palette and choose Browse from the context menu (or Edit for nodes in a stream).

Adding Models to Streams

To add the model nugget to your stream, click the icon in the model nuggets palette and then click the stream canvas where you want to place the node. Alternatively, right-click the icon and choose Add to Stream from the context menu. Then connect your stream to the node, and you are ready to pass data through it to generate predictions.

Caution: If you want to use a scoring nugget to regenerate a modeling node that contains both the category model and the template used, we recommend that you create a text analysis package (TAP) and use it in an interactive session, in place of the modeling node, before generating the scoring nugget.