The Extract PDF Text tool
The Extract PDF Text tool uses optical character recognition (OCR) to extract text from PDF files, such as scanned documents, and to convert that text into machine-readable data.

No. | Topic | Description |
---|---|---|
1 | File | Click Open to select a PDF file. This action adds the Open PDF file (pdfOpen) command to the current script. |
2 | Paging | Controls to move through the pages of the PDF file. |
3 | Anchor | Options to configure the anchor. |
4 | Target | Options to configure the target text. |
5 | Actions | Actions to run on the file based on the previous configurations. |
6 | PDF page viewer | This pane displays the PDF file contents. |
Paging
Use the controls in the Paging group to move through the PDF pages. The following list describes the controls available:
- Icons
  Click the ◀ icon to return to the previous page, and click the ▶ icon to advance to the next page.
- Page selection
  Select a page from the PDF file in the page field.
Anchor
Use the controls in the Anchor group to configure the anchor. The anchor text serves as a reference location for finding the target text. The following list describes the controls available:
- OCR provider
  Select the OCR provider used to recognize the anchor text.
  Note: ABBYY® behaves differently from Google Cloud Vision™ and Google Tesseract™, which might lead to different results in the tool.
- Language
  Select the language used by the OCR provider to recognize the anchor text.
- Operator
  Select a comparison option to search for the anchor.
- Anchor type
  Configure whether the OCR provider looks for a word or a phrase.
- Text box
  Enter the anchor text.
- High contrast icon
  Click this icon to enable or disable high contrast in the PDF file.
- View text icon
  Click this icon to preview the recognized anchor text.
- Fetch anchor icon
  Click this icon to fetch and highlight the recognized anchor text.
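To make the idea of an anchor and its comparison operator more concrete, the following Python sketch searches a list of OCR-recognized words for an anchor. It is only an illustration of the concept, not how the tool or any OCR provider works internally. The `OcrWord` type, the `find_anchor` function, the operator names (`equals`, `contains`, `starts_with`), and the case-insensitive comparison are all assumptions made for this example, and the sketch handles single words only, not phrases.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrWord:
    """Hypothetical record for one word returned by an OCR pass."""
    text: str
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int
    height: int


# Hypothetical comparison operators; the names are illustrative only.
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "equals": lambda found, anchor: found == anchor,
    "contains": lambda found, anchor: anchor in found,
    "starts_with": lambda found, anchor: found.startswith(anchor),
}


def find_anchor(words: list[OcrWord], anchor_text: str,
                operator: str = "equals") -> Optional[OcrWord]:
    """Return the first word that matches the anchor text, or None.

    The comparison ignores case, which is an assumption of this sketch.
    """
    matches = OPERATORS[operator]
    for word in words:
        if matches(word.text.lower(), anchor_text.lower()):
            return word
    return None


# Example page: three recognized words with their positions.
page = [
    OcrWord("Invoice", 40, 30, 90, 18),
    OcrWord("Total:", 40, 200, 60, 18),
    OcrWord("1,250.00", 120, 200, 80, 18),
]
anchor = find_anchor(page, "total", operator="starts_with")
```

In this example, `equals` would not match because the recognized word includes a trailing colon, which is the kind of difference a looser operator can absorb.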
Target
Use the controls in the Target group to configure the target text. The following list describes the controls available:
- OCR provider
  Select the OCR provider used to recognize the target text.
- High contrast icon
  Click this icon to enable or disable high contrast in the PDF file.
- View text icon
  Click this icon to preview the recognized target text.
Actions
Use the controls in the Actions group to act on the file based on the previous configurations. The following list describes the controls available:
- Clear
  Clear the settings fields.
- Create command
  This tool adds the Get PDF text by OCR (extractPdfText) command to the script, based on the anchor, target text, and position that you entered in this tool.
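As a rough illustration of how an anchor, a position, and a target region can work together, the Python sketch below collects the words that fall inside a rectangle offset from an anchor word. It does not reproduce the extractPdfText command or its parameters. The `Box` type, the `text_in_region` function, and the offset and size arguments are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical OCR word with the position of its top-left corner."""
    text: str
    x: int
    y: int
    width: int
    height: int


def text_in_region(words: list[Box], anchor: Box,
                   dx: int, dy: int, width: int, height: int) -> str:
    """Join the words whose top-left corner lies inside a rectangle
    placed at an offset (dx, dy) from the anchor's top-left corner."""
    left = anchor.x + dx
    top = anchor.y + dy
    inside = [
        w for w in words
        if left <= w.x < left + width and top <= w.y < top + height
    ]
    # Read top to bottom, then left to right.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


words = [
    Box("Total:", 40, 200, 60, 18),
    Box("1,250.00", 120, 200, 80, 18),
    Box("USD", 210, 200, 40, 18),
    Box("Notes", 40, 260, 50, 18),
]
anchor = words[0]
target = text_in_region(words, anchor, dx=70, dy=-5, width=200, height=30)
print(target)  # prints "1,250.00 USD"
```

Locating the target relative to the anchor, rather than at fixed page coordinates, means the extraction can still succeed when the content shifts position between documents, as long as the anchor text is recognized.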