Reviewing document samples

Review your document samples and assign samples to the training and test sets.

Procedure

  1. From the main page in the Designer, on Extraction model, click Start.

    In the Review samples tab, you can view all of the samples for each document type. The samples are divided into a training set and a test set.

  2. Review your training and test sets.

    You can click a document to view it. For each document type, it is good practice to assign 70% of your samples to the training set and 30% to the test set. The training set is used to train the machine learning model that extracts the fields from your documents. The test set is used to generate the model training results, which show how accurate the machine learning model is.

    You can select one or more samples in each column and use the right and left arrows to move them between the training set and the test set. Alternatively, click Auto-generate 70/30 split to automatically sort your documents into the two sets according to a 70/30 ratio. In most cases the resulting split is not exactly 70/30, but it is as close to that ratio as possible.

  3. When you have finished reviewing your samples, click Next.
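
The 70/30 split described in step 2 can be illustrated with a short Python sketch. This is not how the Designer implements Auto-generate 70/30 split; the function name `split_samples`, the sample file names, and the rounding and shuffling choices are all assumptions made for illustration. It only shows why a small number of samples rarely divides into exactly 70% and 30%.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle samples and split them into (training, test) lists.

    Hypothetical helper for illustration only; not the Designer's algorithm.
    """
    shuffled = list(samples)
    # A fixed seed keeps the example reproducible.
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) < 2:
        # Too few samples to populate both sets.
        return shuffled, []
    # Round to the nearest whole sample, keeping at least one in each set.
    n_train = round(len(shuffled) * train_ratio)
    n_train = max(1, min(len(shuffled) - 1, n_train))
    return shuffled[:n_train], shuffled[n_train:]


# Seven made-up sample files for a single document type.
samples = [f"invoice_{i:02d}.pdf" for i in range(1, 8)]
training, test = split_samples(samples)
print(len(training), len(test))  # prints "5 2"
```

With seven samples, the closest achievable split is five training samples and two test samples, or roughly 71/29 instead of exactly 70/30.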

What to do next

In the next step, you can define the fields that you want to extract from your document types in the Add fields tab.