Review your document samples and assign samples to the training and test
sets.
Procedure
- From the main page in the Designer, on Extraction model, click Start.
In the Review samples tab you can view all of the samples for each
document type, which are divided into a set for training and a set for testing.
- Review your training and test sets.
You can click a document to view it. For each document type, it is good practice to assign 70%
of your samples to the training set and 30% to the test set. The training set is used to train the
machine learning model that extracts the fields from your documents. The test set is used to
generate the model training results, which show how accurate the machine learning model is.
You can select one or multiple samples in each column, and use the right and left arrows to move
these documents between the training set and the test set. Alternatively, click
Auto-generate 70/30 split to automatically sort your documents into each set
according to a 70/30 ratio. In most cases the resulting split is not exactly 70/30, but it is as
close to that ratio as possible.
- When you have finished reviewing your samples, click Next.
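To illustrate why an automatic split is rarely exactly 70/30, the following is a minimal Python sketch of one way to divide samples by ratio. It is not the algorithm that the Designer uses; the function name split_70_30, the seed parameter, and the sample file names are hypothetical and serve only as an example.

```python
import random


def split_70_30(samples, train_ratio=0.7, seed=0):
    """Split samples into (training, test) lists, as close to train_ratio as possible.

    Hypothetical helper for illustration only; not the Designer's implementation.
    """
    shuffled = list(samples)
    # A fixed seed keeps the shuffle repeatable for this example.
    random.Random(seed).shuffle(shuffled)

    # The training set size must be a whole number, so the ratio is rounded.
    n_train = round(len(shuffled) * train_ratio)

    # With two or more samples, keep at least one sample in each set.
    if len(shuffled) >= 2:
        n_train = min(max(n_train, 1), len(shuffled) - 1)

    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample names for one document type.
samples = [f"invoice_{i}.pdf" for i in range(1, 8)]
training, test = split_70_30(samples)
print(len(training), len(test))  # prints "5 2"
```

With 7 samples, an exact 70% share would be 4.9 documents, so the sketch assigns 5 to the training set and 2 to the test set, which is about 71/29. Only sample counts that are multiples of 10 can be divided at exactly 70/30.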
What to do next
In the next step, you can define the fields that you want to extract from your document types in
the Add fields tab.