After you determine your document types, train the machine learning model to
recognize them.
About this task
Use the initial sample documents. The designer applies multiple models to your document
types to determine which model best matches them for your business need. For best results,
use the same number of files to train or test each of your document types, including the
pre-trained document types.
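Before you upload samples, it can help to check that each document type has the same number of files. The following Python snippet is a minimal sketch of such a check and is not part of the Document Processing Designer. The function names, file names, and document type names are hypothetical, and the sketch assumes that you keep your own list of which sample belongs to which document type.

```python
from collections import Counter


def sample_counts(samples):
    """Count samples per document type.

    `samples` is a list of (file_name, document_type) pairs (assumed layout).
    """
    return Counter(doc_type for _, doc_type in samples)


def unbalanced_types(samples):
    """Return the document types that have fewer samples than the largest type."""
    counts = sample_counts(samples)
    if not counts:
        return {}
    target = max(counts.values())
    return {doc_type: n for doc_type, n in counts.items() if n != target}


# Hypothetical sample inventory, for illustration only.
samples = [
    ("invoice_01.pdf", "Invoice"),
    ("invoice_02.pdf", "Invoice"),
    ("invoice_03.pdf", "Invoice"),
    ("bill_01.pdf", "Utility bill"),
    ("bill_02.pdf", "Utility bill"),
]

print(unbalanced_types(samples))  # {'Utility bill': 2}
```

An empty result means that every document type has the same number of samples.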
Procedure
To train your classification model:
- From the main project page in the Document Processing Designer, click
Classification model.
- In the Confirm inputs tab, verify your document types and
the samples that are categorized under each type.
You can also upload additional samples to help train the model.
- Click Next.
- In the Review samples tab, your samples are divided into a set
for training and a set for testing. You can manually sort your samples into each set, or click
Auto generate 70/30 split to divide them randomly.
- When you are satisfied with your document types and samples, click
Train.
This step can take some time to complete.
- In the Review training results tab, review how well the model has
classified your document types.
For each sample, you see the following results:
- Classified as - What document type the model determined this sample to
be
- Classification result - Whether the result corresponds to the expected
document type
- Confidence - The level of confidence that the model correctly assigned
this sample
You can use the Filter option to see results according to their
classification result or confidence level.
- Click Next.
- Optional: Test your trained model.
- Drag your new samples to the pane, or click Upload to browse
for and add your new sample documents.
The model is tested automatically when you add
documents.
- In the results table, review how the model classified the test documents that
you uploaded.
- For any document type that requires more testing, upload additional
samples.
- When your model is sufficiently tested and reaches an acceptable confidence level,
click Done.
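The 70/30 split and the result filtering in the procedure can be pictured with a short Python sketch. It only illustrates the idea and does not describe how the designer implements these features. The function names, the record fields (`expected`, `classified_as`, `confidence`), and the sample values are all hypothetical.

```python
import random


def split_70_30(files, seed=None):
    """Randomly split files into (training, testing) lists, about 70/30.

    The training share is rounded down when the count is not a multiple of 10.
    """
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def filter_results(results, only_incorrect=False, max_confidence=None):
    """Filter result records by classification result and confidence.

    A record counts as correct when `classified_as` equals `expected`.
    """
    selected = []
    for record in results:
        correct = record["classified_as"] == record["expected"]
        if only_incorrect and correct:
            continue
        if max_confidence is not None and record["confidence"] > max_confidence:
            continue
        selected.append(record)
    return selected


# Hypothetical file names, for illustration only.
files = [f"invoice_{i:02d}.pdf" for i in range(10)]
training, testing = split_70_30(files, seed=1)
print(len(training), len(testing))  # 7 3

# Made-up result records that mirror the columns in the results view.
results = [
    {"sample": "a.pdf", "expected": "Invoice", "classified_as": "Invoice", "confidence": 0.95},
    {"sample": "b.pdf", "expected": "Invoice", "classified_as": "Utility bill", "confidence": 0.55},
    {"sample": "c.pdf", "expected": "Utility bill", "classified_as": "Utility bill", "confidence": 0.60},
]
needs_review = filter_results(results, max_confidence=0.7)
print([r["sample"] for r in needs_review])  # ['b.pdf', 'c.pdf']
```

Samples that are misclassified or that have low confidence are the ones that point to document types that might need additional samples.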