Bulk processing

You can perform bulk operations on search and sweep results.

A sweep is an instance of a background service that you configure to process objects in a database table or some other set of items. The purpose of a sweep is to do something with the items that match the filter conditions for the sweep. For example, you might create a sweep to delete selected objects or to create thumbnails for selected objects. The passes that a sweep makes through a set of items are called iterations.
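The relationship between a sweep's filter conditions, its action, and its iterations can be illustrated with a minimal sketch. This is not the product API: `Item`, `run_sweep_iteration`, and `mark_deleted` are hypothetical names, and an in-memory list stands in for the database table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for an object in a database table."""
    item_id: int
    kind: str
    deleted: bool = False


def run_sweep_iteration(
    items: Iterable[Item],
    matches: Callable[[Item], bool],
    action: Callable[[Item], None],
) -> int:
    """Make one pass (iteration) over the items and act on those that match."""
    processed = 0
    for item in items:
        if matches(item):
            action(item)
            processed += 1
    return processed


def mark_deleted(item: Item) -> None:
    """Illustrative sweep action: flag the item as deleted."""
    item.deleted = True


def is_live_draft(item: Item) -> bool:
    """Illustrative filter condition: drafts that are not yet deleted."""
    return item.kind == "draft" and not item.deleted


items: List[Item] = [Item(1, "draft"), Item(2, "final"), Item(3, "draft")]

# Two iterations over the same set: the second finds nothing left to do.
first_pass = run_sweep_iteration(items, is_live_draft, mark_deleted)
second_pass = run_sweep_iteration(items, is_live_draft, mark_deleted)
```

In this sketch the first iteration processes the two matching drafts, and the second iteration processes none because the filter no longer matches them.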

In addition, you can perform bulk operations on search results. Bulk operations can be scripted or selected from a set of predefined operations such as delete, cancel checkout, file, unfile, and change security.

Bulk processing also supports enhanced extraction for processing large sets of documents in the background. Enhanced extraction extends standard text extraction with specialized handlers that can extract text and metadata from complex document types, such as PDF forms, images, and structured data. When enhanced extraction is configured for a document class, the sweep framework processes the queued documents, applies the appropriate extraction handler to each one, and stores the extracted content. The stored content can then be used for indexing, content-based retrieval, summarization, vector indexing, and generative AI inference.
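The queue, handler selection, and storage steps described above can be sketched as follows. Everything here is an assumption made for illustration: the names `QueuedDocument`, `Extracted`, `HANDLERS`, and `process_queue`, the content type labels, and the `name=value;...` payload format are hypothetical and do not reflect the product's actual handler interface.

```python
from collections import deque
from typing import Callable, Deque, Dict, NamedTuple


class QueuedDocument(NamedTuple):
    doc_id: str
    content_type: str  # hypothetical label, for example "pdf_form"
    payload: str       # simplified stand-in for the document content


class Extracted(NamedTuple):
    text: str
    metadata: Dict[str, str]


def standard_extraction(doc: QueuedDocument) -> Extracted:
    """Fallback: treat the payload as plain text with no metadata."""
    return Extracted(text=doc.payload, metadata={})


def extract_form_fields(doc: QueuedDocument) -> Extracted:
    """Specialized handler: parse 'name=value;...' pairs into metadata."""
    fields = dict(
        pair.split("=", 1) for pair in doc.payload.split(";") if "=" in pair
    )
    return Extracted(text=" ".join(fields.values()), metadata=fields)


# Hypothetical registry that maps a content type to its specialized handler.
HANDLERS: Dict[str, Callable[[QueuedDocument], Extracted]] = {
    "pdf_form": extract_form_fields,
}


def process_queue(
    queue: Deque[QueuedDocument], store: Dict[str, Extracted]
) -> None:
    """Drain the queue, apply the matching handler, and store the result."""
    while queue:
        doc = queue.popleft()
        handler = HANDLERS.get(doc.content_type, standard_extraction)
        store[doc.doc_id] = handler(doc)


queue: Deque[QueuedDocument] = deque([
    QueuedDocument("d1", "pdf_form", "name=Ada;role=Engineer"),
    QueuedDocument("d2", "text", "plain note"),
])
store: Dict[str, Extracted] = {}
process_queue(queue, store)
```

Falling back to standard extraction when no specialized handler is registered is a design choice of this sketch; it keeps unrecognized document types from stalling the queue.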

For vector indexing scenarios, the extracted text and annotations can also be used to prepare content for downstream processing by vector search technologies such as Elasticsearch or OpenSearch. This preparation makes large document sets available for semantic retrieval and for other AI-driven features that depend on vector representations of content.
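One common way to prepare extracted text for a vector index is to split it into overlapping chunks and attach the document's annotations to each chunk. The sketch below assumes that approach; the source does not say how the product prepares content. The names `chunk_words` and `build_index_records`, the record fields, and the chunk sizes are all hypothetical, and the embedding and indexing calls are deliberately left out.

```python
from typing import Dict, List


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word windows of `size` that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_index_records(
    doc_id: str,
    text: str,
    annotations: Dict[str, str],
    size: int = 200,
    overlap: int = 20,
) -> List[Dict[str, object]]:
    """Build one record per chunk, ready for a later embedding step."""
    return [
        {
            "doc_id": doc_id,
            "chunk_no": number,
            "text": chunk,
            "annotations": dict(annotations),
        }
        for number, chunk in enumerate(chunk_words(text, size, overlap))
    ]


records = build_index_records(
    "d1", "a b c d e f g h", {"language": "en"}, size=4, overlap=1
)
```

With these small illustrative settings, the eight-word text yields three chunks: "a b c d", "d e f g", and "g h". A downstream step would compute an embedding for each record's text before sending it to the vector search service.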

This capability combines the scalability of sweeps with the extensibility of enhanced extraction. As a result, you can use bulk processing not only for administrative actions on large result sets, but also for high-volume content enrichment scenarios that require specialized extraction and annotation generation.