Configuring text extraction for document classes

Enable persistent text extraction for any document class that you want to process with the Content Assistant service.

Configuring text extraction for document classes for CPE V5.5.12 and earlier

You can enable text extraction for any document class by creating a subscription for the text extraction event.

About this task

By default, a document class is not enabled for persistent text extraction. To enable it, create a document class subscription for persistent text extraction. After you create the subscription, text is automatically extracted from any new documents of the subscribed document class that are added to the object store. However, to extract text from documents that were added to the object store before the subscription was created, you need to run a custom sweep job.
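
The split between new and existing documents can be pictured with a small sketch. It is a plain Python model, not the Content Platform Engine API: the Doc record, the needs_sweep helper, the class names, and the dates are all hypothetical and exist only for illustration. The model also ignores subclasses and later updates to a document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document in the object store."""
    name: str
    doc_class: str
    date_added: datetime


def needs_sweep(doc: Doc, subscribed_class: str, subscription_created: datetime) -> bool:
    """Return True if the document belongs to the subscribed class
    but was added before the subscription existed."""
    return doc.doc_class == subscribed_class and doc.date_added < subscription_created


# Assumed example data: the subscription on "Invoice" was created on 2024-06-01.
subscription_created = datetime(2024, 6, 1)
docs = [
    Doc("old-invoice", "Invoice", datetime(2024, 5, 1)),
    Doc("new-invoice", "Invoice", datetime(2024, 7, 1)),
    Doc("old-memo", "Memo", datetime(2024, 5, 1)),
]

# Only the invoice that predates the subscription is left for the sweep job.
backlog = [d.name for d in docs if needs_sweep(d, "Invoice", subscription_created)]
print(backlog)
```

In this model, only old-invoice ends up in the backlog: new-invoice is covered by the subscription, and old-memo belongs to a class that has no subscription.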

Procedure

  1. In the Administration Console for Content Platform Engine (ACCE), go to the object store where you want to enable text extraction.
  2. Go to Data Design > Classes > Document and select a document class.
    Select the document class that you want to subscribe to the text extraction event action.
  3. For the selected class, click Actions > New subscription.
  4. Enter a subscription name and click Next.
    For example, you can enter Document-Text Extraction as the subscription name.
  5. For the subscription behavior, set the Scope value to Applies to all objects of this class and click Next.
    Leave the other properties at their default values.
  6. Select Checkin Event and Update Event as events that trigger the subscription and click Next.
  7. Select Text Extraction Event Action as the event action for the subscription and click Next.
  8. Specify the values for additional options.
    1. Set the initial state as Enable the subscription.
    2. Select Include subclasses if you want to enable text extraction for all subclasses of the parent class.
      Tip: If you want to enable persistent text extraction for the Document class along with its subclasses, create the subscription for the Document base class and select Include subclasses.
    3. Set Run synchronously as the subscription run mode.
    4. Leave other fields blank and click Next.
  9. Click Finish to create the subscription for text extraction.
  10. Repeat the process for all document classes where you want to enable persistent text extraction.
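
The choices made in the wizard can be summarized as a checklist. The following sketch is only a way to record and compare those values in plain Python; SubscriptionSettings and deviations are hypothetical names and are not part of any Content Platform Engine API. The Include subclasses choice is optional in the procedure, so the sketch records it but does not check it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriptionSettings:
    """Hypothetical record of the wizard choices; not a Content Platform Engine type."""
    name: str
    scope: str
    events: frozenset
    event_action: str
    enabled: bool
    include_subclasses: bool
    synchronous: bool


def deviations(s: SubscriptionSettings) -> list:
    """List the settings that differ from the values given in the procedure."""
    problems = []
    if s.scope != "Applies to all objects of this class":
        problems.append("scope")
    if s.events != frozenset({"Checkin Event", "Update Event"}):
        problems.append("events")
    if s.event_action != "Text Extraction Event Action":
        problems.append("event_action")
    if not s.enabled:
        problems.append("enabled")
    if not s.synchronous:
        problems.append("synchronous")
    return problems


# Values taken from the procedure; the subscription name is the example from step 4.
settings = SubscriptionSettings(
    name="Document-Text Extraction",
    scope="Applies to all objects of this class",
    events=frozenset({"Checkin Event", "Update Event"}),
    event_action="Text Extraction Event Action",
    enabled=True,
    include_subclasses=True,
    synchronous=True,
)
print(deviations(settings))
```

An empty list means that the recorded values match the procedure. A record that, for example, omits Update Event would be reported with "events" in the list.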

What to do next

To extract text from documents that were added to the object store before you created the subscription, see the topic Running a custom text extraction sweep for existing documents.

Configuring text extraction for document classes for CPE V5.6 and later

Enable persistent text extraction for any document class that you want to process with the Content Assistant service.

About this task

By default, a document class is not enabled for persistent text extraction, but you can enable the feature for any document class. After you enable the feature, text is automatically extracted from any new documents of the selected document class that are added to the object store. However, to extract text from documents that were added to the object store before the feature was enabled, you need to run a custom sweep job.
Tip: When you enable persistent text extraction for a class, the feature is applied to the document class and all subclasses.
  • If you want to extract text from all incoming documents, enable the persistent text extraction feature for the base Document class.
  • You can also extract text for all document classes and subclasses within a given subtree of classes. For example, enabling text extraction for the 'Claims' document class also enables it for the three document subclasses 'Auto Claims', 'Property Claims', and 'Life Claims'.
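
The subtree behavior described in the tip can be sketched as a walk up the class tree. This is an illustrative Python model under stated assumptions, not how Content Platform Engine implements the feature: the PARENT mapping, the 'Invoice' class, and the extraction_enabled helper are hypothetical.

```python
# Hypothetical class tree, stored as child class -> parent class.
# 'Document' is the base class and therefore has no entry.
PARENT = {
    "Claims": "Document",
    "Auto Claims": "Claims",
    "Property Claims": "Claims",
    "Life Claims": "Claims",
    "Invoice": "Document",
}


def extraction_enabled(doc_class, enabled_classes):
    """Walk up the class tree: the feature applies if the class itself
    or any of its ancestors has the option enabled."""
    current = doc_class
    while current is not None:
        if current in enabled_classes:
            return True
        current = PARENT.get(current)  # None once the base class is passed
    return False


# Enabling the option on 'Claims' covers its subtree only.
print(extraction_enabled("Auto Claims", {"Claims"}))   # True
print(extraction_enabled("Invoice", {"Claims"}))       # False

# Enabling the option on the base 'Document' class covers every class.
print(extraction_enabled("Invoice", {"Document"}))     # True
```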

Procedure

  1. In the Administration Console for Content Platform Engine (ACCE), go to the object store where you want to enable text extraction.
  2. Go to Data Design > Classes and select Document or any document subclass for which you want to enable persistent text extraction.
  3. In the right pane, go to the General tab and select the Persistent Text Extracts enabled option.
    Note: If the Persistent Text Extracts enabled option is already selected and appears greyed out, review the option settings for the parent class.
  4. Click Save to persist the changes to the document class.

What to do next

To extract text from documents that were added to the object store before you enabled the feature, see the topic Running a custom text extraction sweep for existing documents.