Creating a text analysis experiment
Use AutoAI's text analysis feature to analyze text data in your experiments. For example, perform basic sentiment analysis to predict an outcome based on text comments.
Text analysis overview
When you create an experiment that uses the text analysis feature, the AutoAI process uses the word2vec algorithm to transform the text into vectors, then compares the vectors to establish the impact on the prediction column.
The word2vec algorithm takes a corpus of text as input and outputs a set of vectors. By turning text into a numerical representation, it can detect and compare similar words. When trained with enough data, word2vec can make accurate predictions about a word's meaning or relationship to other words. The predictions can be used to analyze text and guess at the meaning in sentiment analysis applications.
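To make the idea of comparing words as vectors concrete, here is a minimal sketch that uses cosine similarity. The three-dimensional vectors are invented for illustration and are not output from word2vec or AutoAI; real word vectors are learned from a corpus and have many more dimensions.

```python
import math

# Toy vectors, invented for illustration only (assumption, not word2vec output).
toy_vectors = {
    "great": [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "absurd": [-0.7, 0.6, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Words with similar meaning should score closer to 1 than unrelated words.
sim_close = cosine_similarity(toy_vectors["great"], toy_vectors["excellent"])
sim_far = cosine_similarity(toy_vectors["great"], toy_vectors["absurd"])
print(f"great vs excellent: {sim_close:.2f}")
print(f"great vs absurd:    {sim_far:.2f}")
```

In this toy setup, the two positive words point in nearly the same direction, while the negative word points away from them.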
During the feature engineering phase of experiment training, 20 features are generated for the text column by using the word2vec algorithm. Auto-detection of text features is based on analyzing the number of unique values in a column and the number of tokens in a record (minimum number = 3). If the number of unique values is less than the total number of values divided by 5, the column is not treated as text.
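The auto-detection rule can be sketched as follows. This is not AutoAI's implementation: the function name `looks_like_text_column` is hypothetical, tokens are assumed to be whitespace-separated, and the token check is assumed to apply to the average token count per record, because the documentation does not say how the per-record counts are combined.

```python
def looks_like_text_column(values, min_tokens=3, unique_divisor=5):
    """Hypothetical sketch of the two checks described above."""
    if not values:
        return False
    # Check 1: fewer unique values than (total values / 5) means
    # the column is not treated as text.
    if len(set(values)) < len(values) / unique_divisor:
        return False
    # Check 2 (assumption): the average whitespace-token count per record
    # must reach the minimum of 3.
    avg_tokens = sum(len(str(v).split()) for v in values) / len(values)
    return avg_tokens >= min_tokens


comments = [
    "The car was clean and ready",
    "Long wait at the counter",
    "Friendly staff and quick pickup",
    "It was absurd",
    "Great service overall today",
]
categories = ["yes", "no"] * 10  # 2 unique values out of 20

print(looks_like_text_column(comments))
print(looks_like_text_column(categories))
```

Under these assumptions, the free-form comments pass both checks, and the repeated yes/no column fails the unique-value check.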
When the experiment completes, you can review the feature engineering results from the pipeline details page. You can also save a pipeline as a notebook, where you can review the transformations and see a visualization of the transformations.
Example: Analyzing customer comments
In this example, the comments for a fictional car rental company are used to train a model that predicts a satisfaction rating when a new comment is entered.
Watch this short video to see this example and then read further details about the text feature below the video.
Given a data set that contains a column of review comments for the rental experience (Customer_service), and a column that contains a binary satisfaction rating (Satisfaction) where 0 represents a negative comment and 1 represents a positive comment, the experiment is trained to predict a satisfaction rating when new feedback is entered.
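As a rough picture of the expected layout, the following sketch reads a two-row sample with the same column names. The first comment is the one quoted later in this topic; the second row is invented for illustration, and the actual data set has many more rows.

```python
import csv
import io

# Two sample rows only; the second comment is a made-up example (assumption).
sample_csv = io.StringIO(
    "Customer_service,Satisfaction\n"
    '"It took us almost three hours to get a car. It was absurd",0\n'
    '"The agent was friendly and the pickup was quick",1\n'
)

rows = list(csv.DictReader(sample_csv))
labels = [int(row["Satisfaction"]) for row in rows]

for row in rows:
    print(row["Satisfaction"], "|", row["Customer_service"])
```

The Customer_service column holds the free-form text, and Satisfaction is the binary prediction column.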
Training a text transformation experiment
After you load the data set and specify the prediction column (Satisfaction), the Use text feature engineering option is selected in the Experiment settings.
Note some of the details for tuning your text analysis experiment:
- You can accept the default selection of automatically selecting the text columns or you can exercise more control by manually specifying the columns for text feature engineering.
- As the experiment runs, a default of 20 features is generated for the text column by using the word2vec algorithm. You can edit that value to increase or decrease the number of features. The more vectors that you generate, the more accurate your model is, but the longer training takes.
- The remaining options apply to all types of experiments so you can fine-tune how to handle the final training data.
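To illustrate what the number of features controls, the sketch below turns one comment into a fixed-length numeric vector by averaging per-word vectors. Several things here are assumptions: the per-word vectors are deterministic pseudo-random stand-ins rather than trained word2vec vectors, averaging is one common way to combine word vectors and is not confirmed as AutoAI's method, and the function names are hypothetical.

```python
import random


def stand_in_word_vector(word, n_features):
    # Placeholder for a trained word2vec lookup (assumption): a deterministic
    # pseudo-random vector seeded by the word itself.
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_features)]


def record_features(text, n_features=20):
    """Average the word vectors of a record into n_features numeric columns."""
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * n_features
    vectors = [stand_in_word_vector(token, n_features) for token in tokens]
    # zip(*vectors) groups the values by feature position across all words.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


comment = "It took us almost three hours to get a car"
features_20 = record_features(comment)
features_50 = record_features(comment, n_features=50)
print(len(features_20), len(features_50))
```

Whatever the length of the comment, each record yields exactly as many numeric columns as the configured number of features, which is why a larger value gives the model more to learn from at the cost of longer training.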
Run the experiment to view the transformations in progress.
Select the name of a pipeline, then click Feature summary to review the text transformations.
You can also save the experiment pipeline as a notebook and review the transformations as a visualization.
Deploying and scoring a text transformation model
When you score this model, enter new comments to get a prediction with a confidence score for whether the comment results in a positive or negative satisfaction rating.
For example, entering the comment "It took us almost three hours to get a car. It was absurd" predicts a satisfaction rating of 0 with a confidence score of 95%.
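A scoring request for this model might be assembled as in the following sketch. The payload shape is assumed to follow the `input_data` convention with `fields` and `values` that is used for online deployments; check the API reference for your deployment before relying on it. The helper names are hypothetical, and the response is a hand-written mock that mirrors the example above, not output from a real deployment.

```python
import json


def build_scoring_payload(comments, text_column="Customer_service"):
    """Build a request body with one row per comment (assumed payload shape)."""
    return {
        "input_data": [
            {"fields": [text_column], "values": [[comment] for comment in comments]}
        ]
    }


def summarize_predictions(response):
    """Return (label, confidence) pairs from a response in the assumed shape."""
    block = response["predictions"][0]
    label_index = block["fields"].index("prediction")
    probability_index = block["fields"].index("probability")
    return [
        (row[label_index], max(row[probability_index])) for row in block["values"]
    ]


payload = build_scoring_payload(
    ["It took us almost three hours to get a car. It was absurd"]
)
print(json.dumps(payload, indent=2))

# Hand-written mock response (assumption), mirroring the example in the text.
mock_response = {
    "predictions": [
        {"fields": ["prediction", "probability"], "values": [[0, [0.95, 0.05]]]}
    ]
}
print(summarize_predictions(mock_response))
```

The confidence score is taken here as the largest class probability in the row, which matches how a 95% confidence for a rating of 0 would be read.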
Next steps
Building a time series forecast experiment