Indexing object text with Elasticsearch

Elasticsearch or OpenSearch provides full-text search capabilities for Content Platform Engine object stores. Configuration involves adding a cluster, creating index areas, enabling classes for indexing, and selecting language analyzers.

Overview

Elasticsearch indexing lets users search for documents by their text content and metadata. Configuring it establishes the infrastructure and policies that govern how content is indexed, how text is processed, and how search queries run.

Configuration is performed at the domain and object store levels by using the Administration Console for Content Platform Engine. After initial configuration, and once the Elasticsearch Indexing Queue Sweep is enabled, new and modified documents in CBR-enabled classes are indexed automatically.

Key configuration areas

Configuring Elasticsearch indexing involves these main areas:

Elasticsearch cluster: You must add an Elasticsearch cluster to the Content Platform Engine domain before you can create index areas. The cluster connection defines how Content Platform Engine communicates with Elasticsearch, including connection endpoints, authentication, and SSL settings. A domain supports only one Elasticsearch cluster.
Index areas: An index area connects an object store to the Elasticsearch cluster and holds the settings for shards, replicas, and the maximum results window. Each object store supports only one Elasticsearch index area, so create one index area for each object store that requires indexing.
CBR-enabled classes: Content-based retrieval (CBR) must be enabled for each document class that you want to index. You can enable CBR for individual classes or for all classes in the object store. Only documents in CBR-enabled classes are added to the full-text index.
Language analyzers: Select one or more language analyzers that match the languages of your documents. The analyzers determine how text is broken into terms and normalized for indexing, so choosing ones that fit your content improves search accuracy and relevance. Elasticsearch supports a wide range of language analyzers and custom analysis configurations.
Indexing queue sweep: The Elasticsearch Indexing Queue Sweep processes documents and adds them to the Elasticsearch index. The sweep runs on a schedule and processes documents in batches. You can configure the sweep schedule, batch size, and other parameters to optimize indexing performance for your workload.
Index maintenance: After initial configuration, you maintain indexes by monitoring indexing status, managing index settings, performing reindexing operations when needed, and tuning performance parameters. Regular monitoring helps ensure optimal search performance.
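The index area settings and analyzer choices above correspond to standard Elasticsearch index settings and mappings. The following is a minimal sketch, not the body that Content Platform Engine actually sends: it assumes the shard, replica, and maximum results window values map to `index.number_of_shards`, `index.number_of_replicas`, and `index.max_result_window`, and it uses a hypothetical `content` field for extracted text. The function names `build_index_body` and `page_is_allowed` are also hypothetical. It builds the JSON only and does not contact a cluster.

```python
import json

# Elasticsearch's default for index.max_result_window.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def build_index_body(shards: int, replicas: int, max_result_window: int,
                     analyzers: list[str]) -> dict:
    """Build an illustrative create-index body (settings plus mappings).

    Assumption: index area values map directly onto Elasticsearch index
    settings; "content" is a hypothetical field name for document text.
    """
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    if not analyzers:
        raise ValueError("select at least one language analyzer")

    primary, *others = analyzers
    content_field = {"type": "text", "analyzer": primary}
    if others:
        # One Elasticsearch pattern for several languages: a multi-field,
        # with one subfield analyzed by each additional analyzer.
        content_field["fields"] = {
            name: {"type": "text", "analyzer": name} for name in others
        }

    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "max_result_window": max_result_window,
            }
        },
        "mappings": {"properties": {"content": content_field}},
    }


def page_is_allowed(from_: int, size: int, max_result_window: int) -> bool:
    """Elasticsearch rejects from/size paging when from + size exceeds the window."""
    return from_ + size <= max_result_window


if __name__ == "__main__":
    body = build_index_body(3, 1, DEFAULT_MAX_RESULT_WINDOW, ["english", "french"])
    print(json.dumps(body, indent=2))
    print(page_is_allowed(9_990, 10, DEFAULT_MAX_RESULT_WINDOW))
    print(page_is_allowed(9_995, 10, DEFAULT_MAX_RESULT_WINDOW))
```

The paging check shows why the maximum results window matters: with the default of 10,000, a request for results 9,990 through 9,999 is accepted, but one that starts at 9,995 and asks for 10 results is rejected. Whether Content Platform Engine handles multiple analyzers with multi-fields, as sketched here, is not stated in this topic.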

Configuration workflow

Follow this general workflow when you configure Elasticsearch indexing:

Add an Elasticsearch cluster to the Content Platform Engine domain
Configure Content Platform Engine to access the cluster (connection settings, authentication, SSL)
Create an Elasticsearch index area for your object store
Select one or more language analyzers for text processing
Enable CBR for the document classes that you want to index
Configure the Elasticsearch Indexing Queue Sweep schedule and parameters
Enable the sweep to begin processing documents
Monitor indexing status and tune performance as needed
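The sweep steps in this workflow can be pictured as a queue that is drained in fixed-size batches on a schedule. The following is a simplified simulation, not the Content Platform Engine implementation: `Document`, `enqueue_changes`, and `sweep_once` are hypothetical names, the class names `Invoice` and `Scratch` are made up, a Python dict stands in for the Elasticsearch index, and it assumes each scheduled run indexes at most one batch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    class_name: str


def enqueue_changes(changed_docs, cbr_enabled_classes, queue):
    """Queue new or modified documents, but only from CBR-enabled classes."""
    for doc in changed_docs:
        if doc.class_name in cbr_enabled_classes:
            queue.append(doc)


def sweep_once(queue, batch_size, index):
    """Simulate one scheduled sweep run: index at most batch_size queued documents."""
    processed = 0
    while queue and processed < batch_size:
        doc = queue.popleft()
        # Stand-in for sending the document text to Elasticsearch.
        index[doc.doc_id] = doc.class_name
        processed += 1
    return processed


if __name__ == "__main__":
    queue = deque()
    index = {}
    changed = [Document(f"doc-{i}", "Invoice") for i in range(5)]
    changed.append(Document("doc-x", "Scratch"))  # class without CBR enabled

    enqueue_changes(changed, {"Invoice"}, queue)

    runs = 0
    while queue:  # each loop iteration models one scheduled run
        sweep_once(queue, batch_size=2, index=index)
        runs += 1
    print(runs, sorted(index))
```

With a batch size of 2, the five queued `Invoice` documents take three runs (2 + 2 + 1), and the `Scratch` document is never queued because its class is not CBR-enabled. In this model, a larger batch size means fewer runs at the cost of more work per run, which is the kind of trade-off the sweep parameters let you tune.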

The topics in this section provide detailed procedures for each configuration area.

Migration considerations

If you are migrating from Content Search Services to Elasticsearch, you can configure dual mode indexing to index content to both engines simultaneously during the migration period. Dual mode indexing lets you validate Elasticsearch functionality without disrupting Content Search Services search capabilities.