Text classification algorithms

To train your text classification model, you must select an appropriate text classification algorithm. Each algorithm works differently, so different algorithms can produce different results when applied to the same data.

See Planning for classifying text to learn about the concepts and requirements that are related to IBM RPA's text classification model.

The Bag-of-Words algorithm

Use the Bag-of-Words algorithm on the Machine Learning Model Builder to generate this model.

The Bag-of-Words algorithm is based on word frequency: each word is associated with its number of occurrences in the training set. To classify a text, the algorithm counts each word in the target text and compares these counts with its model to determine which frequency array is the most relevant. It uses a vocabulary of known words that is provided by IBM RPA.
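The following is a minimal sketch of the word-frequency idea, not IBM RPA's implementation. It builds one word-count array per label from the training texts (instead of using the vocabulary that IBM RPA provides), compares a new text with each array by cosine similarity, and normalizes the scores so that they sum to 1, which gives a result shaped like the Classify Text output. The function names, the cosine scoring, and the training data are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of word characters as words.
    return re.findall(r"\w+", text.lower())


def train(samples: list[tuple[str, str]]) -> dict[str, Counter]:
    # Build one word-frequency array (a Counter) per label.
    model: dict[str, Counter] = {}
    for text, label in samples:
        model.setdefault(label, Counter()).update(tokenize(text))
    return model


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(model: dict[str, Counter], text: str) -> tuple[str, dict[str, float]]:
    # Tally the words of the target text and compare them with each label's array.
    counts = Counter(tokenize(text))
    raw = {label: cosine(counts, freq) for label, freq in model.items()}
    total = sum(raw.values())
    # Normalize so the scores sum to 1; split evenly when no word matches.
    scores = {label: (s / total if total else 1 / len(raw)) for label, s in raw.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical training data, for illustration only.
training = [
    ("Claim your free lottery prize now", "spam"),
    ("Win a free prize today", "spam"),
    ("Meeting moved to Friday", "not spam"),
    ("Please review the attached report", "not spam"),
]
model = train(training)
best, scores = classify(model, "Win 50000 by participating in the lottery.")
print("Best choice:", best)
for label, score in scores.items():
    print(label, round(score, 4))
```

With this toy data the words "win" and "lottery" pull the message toward spam. A real model's result depends entirely on its training set, which is why the same message can be labeled differently by a model trained on other data.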

You can use this model when you need to classify text based on how often certain words occur in it. For example, if you classify the text message "Win 50000 by participating in the lottery." with the Classify Text command, you receive output data such as the best choice and the score for each label. See the following sample result:

Results for text classification using Bag-of-Words model:

Best choice: not spam
Best choice score: 0.75820382759259
Label and score: not spam, 0.75820382759259, , not spam
                 spam, 0.24179617240741, , spam

The N-Gram algorithm

Use the N-Gram algorithm on the Machine Learning Model Builder to generate this model.

The N-Gram algorithm works like the Bag-of-Words algorithm, but it uses sequences of two characters instead of whole words. These sequences are built from the texts in the training set. To classify a text, the algorithm counts each character sequence in the target text and compares these counts with its model to determine which frequency array is the most relevant.
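Compared with the word-based approach, only the unit of counting changes. The sketch below, an illustration rather than IBM RPA's implementation, tallies overlapping two-character sequences (character bigrams) per label and scores a new text by cosine similarity. Lowercasing, keeping spaces inside sequences, the cosine scoring, and the function names are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    # Slide an n-character window over the lowercased text. Spaces are kept,
    # so sequences that span word boundaries (for example "e " or " p") count too.
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples: list[tuple[str, str]], n: int = 2) -> dict[str, Counter]:
    # One character-sequence frequency array per label.
    model: dict[str, Counter] = {}
    for text, label in samples:
        model.setdefault(label, Counter()).update(char_ngrams(text, n))
    return model


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(model: dict[str, Counter], text: str, n: int = 2) -> tuple[str, dict[str, float]]:
    # Tally the target text's sequences and normalize the similarities to sum to 1.
    counts = Counter(char_ngrams(text, n))
    raw = {label: cosine(counts, freq) for label, freq in model.items()}
    total = sum(raw.values())
    scores = {label: (s / total if total else 1 / len(raw)) for label, s in raw.items()}
    return max(scores, key=scores.get), scores


model = train([("free prize", "spam"), ("team meeting", "not spam")])
print(char_ngrams("spam"))  # ['sp', 'pa', 'am']
print(classify(model, "free prizes"))
```

Because it works on character sequences, this approach still finds overlap when a word is inflected or misspelled ("prizes" shares most of its sequences with "prize"), which whole-word counting would miss.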

You can use this model when you need to classify text based on the character sequences that appear in it. With the Classify Text command, you receive output similar to that described for the Bag-of-Words algorithm.

The Text Classifier algorithm

Use the Text Classifier algorithm on the Machine Learning Model Builder to generate this model.

The Text Classifier model represents a set of text documents that are arranged and categorized in tagged directories, so each text document is associated with one tag. The Text Classifier algorithm combines different algorithms to train the model. It is a proprietary IBM RPA algorithm.

You can use this model to classify a text value into one of a set of categories related to a specific subject.
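To make the idea of tagged directories concrete, the following sketch assumes a hypothetical layout in which each subdirectory of a training folder is named after a tag and holds plain-text `.txt` documents. The `load_tagged_corpus` function returns (text, tag) pairs. The layout, the file extension, and the function name are illustrative assumptions, not the format required by the Machine Learning Model Builder, and the sketch does not reproduce the proprietary training algorithm.

```python
import tempfile
from pathlib import Path


def load_tagged_corpus(root: str) -> list[tuple[str, str]]:
    # Each subdirectory of root is a tag; every .txt file inside it is one
    # document associated with exactly that tag.
    samples: list[tuple[str, str]] = []
    for tag_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for doc in sorted(tag_dir.glob("*.txt")):
            samples.append((doc.read_text(encoding="utf-8"), tag_dir.name))
    return samples


# Usage: build a throwaway corpus with two hypothetical tags.
with tempfile.TemporaryDirectory() as tmp:
    for tag, name, text in [
        ("billing", "1.txt", "Invoice 42 is overdue."),
        ("support", "1.txt", "The app crashes on login."),
    ]:
        folder = Path(tmp) / tag
        folder.mkdir(exist_ok=True)
        (folder / name).write_text(text, encoding="utf-8")
    corpus = load_tagged_corpus(tmp)

print(corpus)
```

Sorting the directories and files keeps the output order deterministic, which makes training runs easier to compare.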

With the Classify Text command, you receive output similar to that described for the Bag-of-Words algorithm.

The functional algorithms

IBM RPA provides a Functional algorithm for each machine learning algorithm. Although the functional algorithms work with all languages, they apply semantic processing specifically for Brazilian Portuguese, for example by removing stopwords from the text.
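Stopword removal, the example of semantic processing mentioned above, can be sketched as a filter applied before words are counted. The stopword set below is a small hypothetical subset of common Brazilian Portuguese function words chosen for illustration; it is not the list that IBM RPA uses, and IBM RPA's semantic processing may include other steps.

```python
import re

# Hypothetical, deliberately tiny subset of common Brazilian Portuguese
# stopwords; it is NOT the list that IBM RPA uses.
STOPWORDS_PT_BR = frozenset({
    "a", "o", "as", "os", "de", "da", "do", "das", "dos", "e",
    "em", "no", "na", "um", "uma", "para", "por", "com", "que",
})


def remove_stopwords(text: str, stopwords: frozenset[str] = STOPWORDS_PT_BR) -> list[str]:
    # Tokenize on Unicode word characters so accented words such as
    # "reunião" stay intact, then drop tokens found in the stopword set.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


print(remove_stopwords("Ganhe 50000 participando da loteria"))
# ['ganhe', '50000', 'participando', 'loteria']
```

Dropping these very frequent words keeps them from dominating the frequency arrays, so the remaining counts reflect the words that actually distinguish one category from another.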