(Optional) Running a custom text extraction job sweep for existing documents

You can create a document class subscription for persistent text extraction. After you create the subscription, text is automatically extracted from any new document of the subscribed document class that is added to the object store. For documents that were added to the object store before the subscription was created, you must run a custom text extraction job sweep.

Before you begin

You must have the following permissions on every existing document from which you want to extract text:
  • READ
  • WRITE
  • LINK
  • VIEWCONTENT

About this task

Configure and run a custom job sweep to extract text from the content of existing documents.

Procedure

  1. Start the New Custom Job wizard in the Administration Console for Content Platform Engine (ACCE):
    1. In the domain navigation pane, select the object store.
    2. In the object store navigation pane, select the Sweep Management > Job Sweeps > Custom Jobs folder.
    3. Click New.
      You can configure the new custom job sweep to extract text for existing documents.
  2. Enter the name and description for the custom job sweep for text extraction.
    For example, you can name the sweep Custom Text Extraction Job.
  3. Specify the Custom Sweep subclass as Custom Sweep Job.
  4. Specify the Sweep Mode as Normal.
  5. Select the Enabled property to enable the custom sweep to run. Click Next.
  6. Specify the Target Class as Document.
  7. Specify the Filter Expression to filter objects of the selected target class type.
    For example, you can use the filter VersionStatus=1 to select only documents that are in the Released state.
  8. Optional: Select Enabled if you want to include subclasses and if you want to record failures.
  9. Specify the Sweep Action as Text Extraction Sweep Action.
  10. Optional: Specify the start and end dates for the custom job sweep.
  11. Complete the wizard.
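
The filter expression in step 7 can be composed before you paste it into the wizard. The following Python sketch is illustrative only: the helper build_filter and the VersionStatus enumeration are hypothetical names, not part of any product API. The numeric values other than 1 (Released) are assumptions that you should verify against the product documentation, and joining clauses with OR assumes that the Filter Expression field accepts SQL-style WHERE syntax.

```python
from enum import IntEnum


class VersionStatus(IntEnum):
    """Numeric VersionStatus values (assumed; only 1 = Released is
    stated in the procedure above)."""
    RELEASED = 1
    IN_PROCESS = 2
    RESERVATION = 3
    SUPERSEDED = 4


def build_filter(*statuses: VersionStatus) -> str:
    """Return a filter expression that matches any of the given statuses.

    Hypothetical helper: it only formats a string for the Filter
    Expression field and does not contact the server.
    """
    if not statuses:
        raise ValueError("at least one status is required")
    clauses = [f"VersionStatus={int(status)}" for status in statuses]
    # Assumption: multiple conditions can be combined with OR.
    return " OR ".join(clauses)


# Expression for released documents only, as in the example for step 7.
print(build_filter(VersionStatus.RELEASED))
```

Passing more than one status, for example build_filter(VersionStatus.RELEASED, VersionStatus.IN_PROCESS), produces a combined expression. Test such an expression on a small set of documents before you run the sweep against the whole object store.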