Generate Directly

In the Model tab of the text mining modeling node, you can choose a build mode for your model nuggets. If you choose Generate directly, you can set the options in the node and then execute your stream. The output is a concept model nugget, which is placed directly in the Models palette. Unlike the interactive workbench, this mode requires no additional manipulation from you at execution time; the only input is the frequency settings defined for this option in the node.

Maximum number of concepts to include in model. This option, which applies only when you build a model automatically (non-interactively), indicates that you want to create a concept model that contains no more than the specified number of concepts.

  • Check concepts based on highest frequency. Top number of concepts. Starting with the concept with the highest frequency, this is the number of concepts that will be checked. Here, frequency refers to the number of times a concept (and all its underlying terms) appears in the entire set of documents/records. This number can be higher than the record count, since a concept can appear multiple times in a record.
  • Uncheck concepts that occur in too many records. Percentage of records. Unchecks concepts with a record count percentage higher than the number you specify. This option is useful for excluding concepts that occur frequently in your text, or even in every record, but have no significance in your analysis.
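
The way the two frequency settings interact can be illustrated with a minimal sketch. This is not the product's implementation: the function name select_concepts, its parameters, and the sample data are hypothetical. It also assumes that the top concepts are checked first and that the record-percentage filter is applied afterward, an order the text above does not confirm.

```python
from collections import Counter


def select_concepts(records, top_n, max_record_pct):
    """Illustrative only: return the concepts that stay checked.

    records        -- list of records, each a list of concept names
                      (a concept may repeat within one record)
    top_n          -- hypothetical stand-in for "Top number of concepts"
    max_record_pct -- hypothetical stand-in for "Percentage of records"
    """
    total_records = len(records)
    if total_records == 0:
        return []

    global_freq = Counter()   # occurrences across all records
    record_count = Counter()  # number of records containing the concept
    for concepts in records:
        global_freq.update(concepts)
        record_count.update(set(concepts))

    # Check the top_n concepts, highest global frequency first
    # (ties broken alphabetically so the result is deterministic).
    ranked = sorted(global_freq, key=lambda c: (-global_freq[c], c))
    checked = ranked[:top_n]

    # Uncheck concepts that occur in too many records.
    return [
        c for c in checked
        if 100.0 * record_count[c] / total_records <= max_record_pct
    ]


# Made-up sample data: "price" occurs 5 times in 4 records, so its
# global frequency is higher than the record count.
sample_records = [
    ["price", "price", "service"],
    ["price", "battery"],
    ["price", "service", "service", "service"],
    ["price", "screen"],
]

print(select_concepts(sample_records, top_n=2, max_record_pct=80))
```

In this made-up data set, "price" ranks first by frequency but appears in 100% of the records, so an 80% threshold unchecks it and only "service" remains. Under the ordering assumed here, the result can contain fewer concepts than the maximum requested.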

Optimize for speed of scoring. Selected by default, this option ensures that the model created is compact and scores at high speed. Deselecting this option creates a much larger model that scores more slowly. However, the larger model ensures that the scores displayed initially in the generated concept model are the same as those obtained when scoring the same text with the model nugget.