Training and testing the classification model

After you determine your document types, you train the machine learning model to recognize them.

About this task

Use the initial sample documents to train the model. The Designer applies multiple models to your document types and determines which one fits them best for your business need.

For best results, make sure that you use the same number of files to train or test each of your document types, including the pre-trained document types.
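
If you keep a list of your samples outside the Designer, you can check the balance before you upload. The following is a minimal sketch, not part of the product: the function name `missing_samples` and the input format (one document type label per sample file) are assumptions made for illustration.

```python
from collections import Counter


def missing_samples(sample_types):
    """Report how many samples each document type lacks.

    sample_types: one document type label per sample file (hypothetical
    input format). Returns {doc_type: shortfall} for every type that has
    fewer samples than the best-represented type.
    """
    counts = Counter(sample_types)
    if not counts:
        return {}
    target = max(counts.values())
    return {
        doc_type: target - count
        for doc_type, count in counts.items()
        if count < target
    }


labels = ["Invoice"] * 10 + ["Receipt"] * 7 + ["Contract"] * 10
shortfall = missing_samples(labels)  # only "Receipt" is short, by 3 samples
```

An empty result means that every document type has the same number of samples.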

Procedure

To train your classification model:

  1. From the main project page in the Document Processing Designer, click Classification model.
  2. In the Confirm inputs tab, double-check your document types and the samples that are categorized under each type.
    You can also upload additional samples to help train the model.
  3. Click Next.
  4. In the Review samples tab, you see your samples divided into a set for training and a set for testing. You can manually sort your samples into each set, or click Auto generate 70/30 split for a random categorization.
  5. When you are satisfied with your document types and samples, click Train.
    This step can take some time to complete.
  6. In the Review training results tab, review how well the model has classified your document types.
    For each sample, you see the following results:
    • Classified as - What document type the model determined this sample to be
    • Classification result - Whether the result corresponds to the expected document type
    • Confidence - The level of confidence that the model correctly assigned this sample

    You can use the Filter option to see results according to their classification result or confidence level.

  7. Click Next.
  8. Optional: Test your trained model.
    1. Drag your new samples to the pane, or click Upload to browse for and add your new sample documents.
      The model is tested automatically when you add documents.
    2. In the results table, examine how the model classified the test documents that you uploaded.
    3. For any document type that requires more testing, upload additional samples.
  9. When your model is sufficiently tested and reaches an acceptable confidence level, click Done.
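
The random 70/30 split and the filtering of results by classification result or confidence can be sketched as follows. This only illustrates the concepts; how the Designer performs either step internally is not described here. The function names, the record keys (`expected`, `classified_as`, `confidence`), and the 0 to 1 confidence scale are all assumptions.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=None):
    """Randomly divide samples into a training set and a testing set.

    With the default train_fraction, about 70% of the samples go to
    training and the rest go to testing.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def needs_review(results, min_confidence=0.8):
    """Select results that are misclassified or below min_confidence.

    Each result is a dict with hypothetical keys "expected",
    "classified_as", and "confidence" (assumed to range from 0 to 1).
    """
    return [
        r for r in results
        if r["classified_as"] != r["expected"]
        or r["confidence"] < min_confidence
    ]


train_set, test_set = split_samples(range(10), seed=1)

results = [
    {"expected": "Invoice", "classified_as": "Invoice", "confidence": 0.95},
    {"expected": "Invoice", "classified_as": "Receipt", "confidence": 0.90},
    {"expected": "Receipt", "classified_as": "Receipt", "confidence": 0.55},
]
flagged = needs_review(results)
```

In this example, the second result is flagged because it is misclassified, and the third because its confidence falls below the chosen minimum. Document types that appear often among flagged results are candidates for additional training or testing samples.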