Extract Text

This task extracts the text of email attachments and prepares it for full-text indexing.

If text extraction fails, the Extract Text task writes an error notification to the text-search indexing document. Refer to the related topic for a list of possible error strings.

Task summary

Table 1. Extract Text task summary
Characteristic | Value
Task name | Extract Text
Main purpose | Extracts the text of files or email attachments and prepares it for full-text indexing
Usable with which source connectors? | Email Connector, SMTP Connector
Usable with which target connectors? | IBM® FileNet® P8 Connector
When needed? | Required in email archiving task routes when processing attachments that must be full-text indexed
Placement in task route | Can appear only after the EC Extract Attachments task in email archiving task routes
Produces which metadata? | Task Status, Text Extraction
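
The placement rule in the table can be illustrated with a small check. This is a minimal sketch, assuming a task route can be modeled as an ordered list of task names. The list model and the function name extract_text_placement_ok are hypothetical and are not the IBM Content Collector configuration format. The sketch also assumes that "only after" means somewhere later in the route, not necessarily immediately after.

```python
# Hypothetical sketch: a task route modeled as an ordered list of task names.
# Only the two task names come from the documentation; the model is an assumption.
EXTRACT_ATTACHMENTS = "EC Extract Attachments"
EXTRACT_TEXT = "Extract Text"


def extract_text_placement_ok(route: list[str]) -> bool:
    """Return True if every Extract Text task appears after an EC Extract Attachments task.

    Assumption: "only after" is read as "somewhere later in the route",
    not "immediately after".
    """
    seen_extract_attachments = False
    for task in route:
        if task == EXTRACT_ATTACHMENTS:
            seen_extract_attachments = True
        elif task == EXTRACT_TEXT and not seen_extract_attachments:
            # Extract Text reached before any attachment extraction: invalid placement.
            return False
    return True


if __name__ == "__main__":
    print(extract_text_placement_ok([EXTRACT_ATTACHMENTS, EXTRACT_TEXT]))
    print(extract_text_placement_ok([EXTRACT_TEXT, EXTRACT_ATTACHMENTS]))
```

A route that lists Extract Text before EC Extract Attachments fails the check, because at that point there are no extracted attachments whose text could be indexed.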
Configuration options

File Extension Filter

Define a filter that selects files by their extensions. When you define an exclude filter, the Extract Text task skips files that have one of the listed extensions and does not extract their text for indexing. If the exclude list is empty, the task extracts text from all files that are passed in. When you define an include filter, the Extract Text task extracts text only from files that have one of the listed extensions. If the include list is empty, the task extracts text from none of the files that are passed in.
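The include and exclude semantics, including the opposite behavior of an empty list, can be sketched as a single decision function. This is a hypothetical illustration, not the product's implementation. The names FilterMode and should_extract are invented here, and case-insensitive matching on the last extension of the file name is an assumption.

```python
import os
from collections.abc import Iterable
from enum import Enum


class FilterMode(Enum):
    # Hypothetical model of the two filter types described above.
    INCLUDE = "include"
    EXCLUDE = "exclude"


def should_extract(filename: str, mode: FilterMode, extensions: Iterable[str]) -> bool:
    """Return True if text should be extracted from this file.

    Assumptions: extensions are compared case-insensitively, with or without
    a leading dot, and only the last extension counts (e.g. "a.tar.gz" -> "gz").
    """
    listed_extensions = {e.lower().lstrip(".") for e in extensions}
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    is_listed = ext in listed_extensions
    if mode is FilterMode.EXCLUDE:
        # Exclude filter: skip listed files. An empty list excludes nothing,
        # so every file is processed.
        return not is_listed
    # Include filter: process only listed files. An empty list includes nothing,
    # so no file is processed.
    return is_listed
```

For example, should_extract("report.pdf", FilterMode.EXCLUDE, []) is true while should_extract("report.pdf", FilterMode.INCLUDE, []) is false, which mirrors the two empty-list cases described above.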

For all files that are skipped, the task writes the string "IcmFceWarning:IcmConfigFilteringFile" to the icc_attachment and icc_attachment_text fields of the text-search indexing document, so that you can search for all documents that contain attachments that were not indexed.
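The marker for skipped files, and a search for documents that carry it, might look like the following. This is a minimal sketch, assuming the text-search indexing document can be modeled as a dictionary that maps field names to lists of string values, and that one marker is appended per skipped file. The functions mark_skipped and has_unindexed_attachments are hypothetical. Only the marker string and the field names come from the documentation.

```python
# Marker string and field names are taken from the documentation above.
SKIPPED_MARKER = "IcmFceWarning:IcmConfigFilteringFile"
MARKER_FIELDS = ("icc_attachment", "icc_attachment_text")


def mark_skipped(index_doc: dict[str, list[str]]) -> None:
    """Record one skipped file in a (hypothetical) dict-based indexing document."""
    for field in MARKER_FIELDS:
        # Assumption: each field holds a list of values; one entry per skipped file.
        index_doc.setdefault(field, []).append(SKIPPED_MARKER)


def has_unindexed_attachments(index_doc: dict[str, list[str]]) -> bool:
    """Return True if the document contains at least one attachment that was not indexed."""
    return SKIPPED_MARKER in index_doc.get("icc_attachment_text", [])


if __name__ == "__main__":
    docs = [{}, {}]
    mark_skipped(docs[1])
    # Select only the documents that contain skipped attachments.
    print([i for i, d in enumerate(docs) if has_unindexed_attachments(d)])
```

In a real deployment the equivalent query would run in the full-text search engine against the icc_attachment or icc_attachment_text field, rather than over in-memory dictionaries.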

To restore the default list of extensions, click Load Default Extension List. To empty the list, click Clear Extension List.