Training a classifier

You can train a classifier by providing it with training data that it uses to determine how documents should be classified.

About this task

After you create and save a classifier, the Overview tab of the classifier training page is displayed. This tab shows the status of the latest model, if any models have been created previously.

Procedure

  1. Note: Applies to version 12.0.2.2 and later versions unless specifically overridden. When a classifier is created from an existing collection, a request to start training is rejected if that collection is still being indexed.
    Click New Model and proceed through the wizard. You can use the suggested default values.

    While the training is in progress, the controls on the page are inactive. When the training is complete, statistics about the model are displayed.

    • Model name: A name for the new model
    • Model description: A description of the new model
    • How to create training resources
      • Divide dataset by ratios: When selected, Training set, Validation set, and Test set fields are shown for entering the ratios
      • Update training set: Keep test and validation documents from the base model, and update training documents with new data
    • Override model training configuration
      • Used only with federated models. With federated models, you can specify a model training configuration for each individual model in the Federated model tab. Select this checkbox to override those settings with a single configuration
    • Train a federated model
      • Select this option to train a federated model. When selected, a Field used to split training data dropdown is shown to select how to split training data into models. This field is only editable when Divide dataset by ratios is selected
  2. Click Deploy to deploy the classifier.
    1. Target classifier instance: Select which labeler instance is updated with this model, or select New classifier instance to create a new labeler instance
    2. Name: Name of the new classifier instance
    3. Description: Description of the new classifier instance
    4. Override probability threshold (federated models only): For federated models, you can specify a probability threshold value for each individual model in the Federated model tab. This option allows you to ignore the individual values and apply a single value to all the individual models.
    5. Value: The probability threshold value.
      Note: When you apply a model to a collection document, the classifier first calculates the probability of each label, and then adds a label to the document only if its probability is higher than the threshold.
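
Conceptually, the thresholding described in the note above works as in the following sketch. The function and label names here are hypothetical illustrations, not part of the product API:

```python
# Hypothetical sketch of how a probability threshold gates label assignment:
# the classifier scores each candidate label, and only labels whose
# probability exceeds the threshold are added to the document.

def apply_labels(label_probabilities, threshold):
    """Return the labels whose predicted probability exceeds the threshold."""
    return [label for label, p in label_probabilities.items() if p > threshold]

scores = {"invoice": 0.91, "contract": 0.42, "receipt": 0.77}
print(apply_labels(scores, threshold=0.5))  # → ['invoice', 'receipt']
```

Raising the threshold makes labeling more conservative (fewer, higher-confidence labels); lowering it makes labeling more permissive.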

Results

The deployed classifier instance can now be used.
Models list - Your new model is added to the Models table on the Overview tab. You can browse model details from the list.
  • You can Deploy/Undeploy, Cancel training, or Delete a model from here
  • For a normal model, the Model details dialog opens to show the model details
  • For federated models, a tab for the federated model is opened
Model details dialog - Once model training is completed (step 1 in the procedure), detailed model information is available in the Model details dialog.
  • Details tab
    • Model evaluation scores: F1, precision and recall of the model
    • Number of labels: number of labels that the model is expected to output (except for federated models)
  • Labels tab
    • Label evaluation scores: F1, precision, and recall scores for each label, and a confusion matrix showing how many of the validation documents the trained classifier classifies as:
      • True positive - the data has the label and the classifier correctly predicts it
      • False positive - the data doesn't have the label but the classifier wrongly predicts it
      • False negative - the data has the label but the classifier fails to predict it
      Note: You can switch to a cards-style view by using the icon at the top right of the table.
  • Training tab
    • Training resource configuration: configuration used to train this particular model (except for federated models)
    • Training loss: A graph showing how the loss value changed during training
  • Runtime tab
    • Deployed instance: Information about "labeler" enrichment when this model is deployed (except for individual models)
    • Parameters (except for federated models)
      • Probability threshold (read-only)
        Note: Changing this value affects model evaluation scores, but the scores available in the UI show the initial scores (the values calculated with the initial threshold value).
Federated model tab - The Federated model tab displays a model information card with overview information about your federated model. Select the Show model details menu to open the Model information dialog for the federated model.
  • Individual models table - Provides a list of individual models in the federated model
    • Name, F1 score, Recall, and Precision columns: Show the individual model information for those metrics
    • Probability threshold column: Shows the current value for the model. When the value has a (Pending) suffix, the value is used to update the model the next time you deploy the federated model. You can set the (Pending) value by selecting the row and choosing the Edit probability threshold button above the table.
    • Model training configuration column: Shows how the model will be retrained the next time you train a federated model, based on the current federated model. The options are:
        • Update training set (default): Keeps the current test and validation sets, and updates the training set with newly added documents
        • Reinitialize training resource: Splits training data into training, validation, and test sets by specified ratios
        • Keep the current model: Keeps using the current model

        You can update the strategy by selecting the row and choosing the Edit model training configuration button above the table.
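
The Reinitialize training resource strategy (like the Divide dataset by ratios option in the wizard) splits the documents into training, validation, and test sets by the specified ratios. A hypothetical sketch of such a ratio split, not the product's actual algorithm:

```python
import random

def split_by_ratios(documents, train=0.8, validation=0.1, test=0.1, seed=42):
    """Shuffle documents, then split them into train/validation/test sets by ratio."""
    assert abs(train + validation + test - 1.0) < 1e-9, "ratios must sum to 1"
    docs = documents[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(docs)        # deterministic shuffle for the sketch
    n_train = int(len(docs) * train)
    n_val = int(len(docs) * validation)
    return docs[:n_train], docs[n_train:n_train + n_val], docs[n_train + n_val:]

train_set, val_set, test_set = split_by_ratios([f"doc{i}" for i in range(100)])
print(len(train_set), len(val_set), len(test_set))  # → 80 10 10
```

In contrast, the Update training set strategy keeps the existing validation and test sets fixed so that scores remain comparable between the base model and the retrained model.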

What to do next

You can repeat this process to train a new model with new training data.