Configuring HPSTRA

About this task

With the High-Performance Task Search Reference Architecture (HPSTRA) module in Task Engine, you can store and search task business data on an Elasticsearch server. Task Engine uses a built-in REST client to communicate with Elasticsearch.

You enable and configure the HPSTRA module in My webMethods. You can use the predefined module settings that Task Engine provides, or apply custom settings to match an existing Elasticsearch configuration.

To configure the HPSTRA module

Procedure

In My webMethods, navigate to Applications > Administration > Business > Task > HPSTRA Configuration.
By default, the HPSTRA module is disabled. Select Enable Module to enable it.

Note: After you enable the module, you can add a preset configuration with default values for all configurable fields by clicking Create Default. All default values for the Task Engine HPSTRA module are described in the following tables. For the default values of configurations specific to Elasticsearch, see the Elasticsearch documentation.

On the Basic tab, click Add to add an Elasticsearch server, and specify the following settings in the Add new Elasticsearch server dialog:

Field	Description
Host	The host name of the server where Elasticsearch is installed. The default value is `localhost`.
Port	Optional. The port number to connect to the Elasticsearch server. The default value is `9200`.
Use SSL	Select to enable secure communication with the Elasticsearch server. Disabled by default.

Under Authentication, select Enable Authentication and type the name and password of the user to connect to the Elasticsearch store.
The Authentication settings are required when the Elasticsearch server is configured to use basic authentication. For more information, see the Elasticsearch documentation.
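As an illustration of what the Basic tab and Authentication settings amount to on the wire, the following minimal Python sketch composes a base URL and an HTTP Basic authorization header from the same fields. The function names are hypothetical and are not part of Task Engine. The sketch only assumes that the Elasticsearch server accepts standard HTTP Basic authentication, as described above.

```python
import base64


def build_base_url(host="localhost", port=9200, use_ssl=False):
    """Compose the Elasticsearch base URL from the Host, Port, and Use SSL fields."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


def basic_auth_header(user, password):
    """Build a standard HTTP Basic Authorization header from a user name and password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# With the documented defaults, the client would target http://localhost:9200.
print(build_base_url())
print(build_base_url("es1.example.com", 9200, use_ssl=True))
```

Because Basic authentication only encodes the credentials and does not encrypt them, it is usually sensible to select Use SSL whenever authentication is enabled.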

To configure advanced HPSTRA settings, go to the Advanced tab. Click Edit to open the configuration dialog for each section.

In the Connection Pool section, specify the following settings for connecting to the Elasticsearch server:

Field	Description
Connection Timeout	The time in seconds to wait for establishing a connection to the Elasticsearch server. The default value is `60`.
Request Timeout	The time in seconds to wait to get a response from the Elasticsearch server, before failing the request. This setting applies to all requests that Task Engine sends to the Elasticsearch server, for example requests to persist or search task data. The default value is `60`.
Maximum Connections	The maximum number of connections to the Elasticsearch server in the connection pool. The default value is `50`.

In the Auto Discovery section, specify the following Elasticsearch clustering settings:

Field	Description
Enabled	Select to enable the automatic discovery of available Elasticsearch nodes for round-robin distribution of requests across the nodes in the cluster. This option is disabled by default.
Polling Interval	Time interval in seconds to poll and update the list of Elasticsearch servers when the automatic discovery of cluster nodes is enabled. The default value is `10`.
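The following Python sketch illustrates the general idea of round-robin distribution over a node list that is refreshed periodically. It is not the Task Engine implementation. The class and method names are hypothetical, and the node discovery itself (which would run every Polling Interval seconds) is left out.

```python
class RoundRobinNodes:
    """Hand out cluster nodes in rotation; the list can be replaced after each poll."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._index = 0

    def update(self, nodes):
        """Replace the node list, for example after a discovery poll."""
        self._nodes = list(nodes)
        self._index = 0

    def next_node(self):
        """Return the next node in rotation."""
        if not self._nodes:
            raise RuntimeError("no Elasticsearch nodes available")
        node = self._nodes[self._index % len(self._nodes)]
        self._index += 1
        return node


nodes = RoundRobinNodes(["es1:9200", "es2:9200"])
print([nodes.next_node() for _ in range(3)])
```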

In the Persistence section, specify the following data storing settings:

Field	Description
Consistency	The number of available shards or replicas that Elasticsearch requires when indexing or deleting task data. The options are: `quorum` (default): Elasticsearch requires that the majority of shards or replicas are available before indexing or deleting task data. The majority consists of half the shards and replicas in the cluster (including the primary shard), plus one more shard; `one`: Elasticsearch requires that only the primary shard is available before persisting the task data; `all`: Elasticsearch requires that the primary shard and all replicas are available before persisting the task data. For more information about consistency settings, see the Elasticsearch documentation.
Timeout	The time in seconds to wait for the required number of shards or replicas to become available. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about the default setting, see the Elasticsearch documentation.
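To make the three Consistency options concrete, the following Python sketch computes how many shard copies must be available for a given number of replicas. It follows only the description in the table above (half of all copies, rounded down, plus one). The function name is hypothetical, and Elasticsearch itself may apply additional rules that are covered in its own documentation.

```python
def required_copies(consistency, replicas):
    """Return how many shard copies (primary plus replicas) must be available.

    `replicas` is the number of replica copies per primary shard.
    """
    total = 1 + replicas  # the primary shard plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total
    if consistency == "quorum":
        return total // 2 + 1  # half of all copies, plus one more
    raise ValueError(f"unknown consistency level: {consistency}")


for level in ("one", "quorum", "all"):
    print(level, required_copies(level, replicas=2))
```

For example, with two replicas there are three copies in total, so `quorum` requires two of them and `all` requires three.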

In the Create Index section, specify the following index creation settings:

Field	Description
Append Cluster ID	Enabled by default. The HPSTRA module creates an Elasticsearch index for every HPSTRA-enabled task type. When this option is enabled, the HPSTRA module appends the ID of the IBM My webMethods Server cluster node to the name of the Elasticsearch index and creates an index name in the following format: `sag_mws_te_taskdef_taskTypeID_clusterID`.
Number of Shards	The number of shards to include when creating the Elasticsearch index. Use this setting to improve the scalability of the Elasticsearch cluster. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about Elasticsearch shards and default settings, see the Elasticsearch documentation.
Number of Replicas	The number of replicas to include when creating the Elasticsearch index. Use this setting to improve the availability of the Elasticsearch cluster. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about Elasticsearch replicas and default values, see the Elasticsearch documentation.

Note: The shard and replica settings apply only when creating a new Elasticsearch index using the HPSTRA configuration page. You cannot modify an existing Elasticsearch index through the HPSTRA configuration page.
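The index naming format can be sketched in Python as follows. The function name and the example IDs are hypothetical. The name that is produced when Append Cluster ID is disabled is an assumption here (the same name without the suffix), because only the enabled format is documented.

```python
def index_name(task_type_id, cluster_id=None, append_cluster_id=True):
    """Build an index name in the format sag_mws_te_taskdef_taskTypeID_clusterID."""
    name = f"sag_mws_te_taskdef_{task_type_id}"
    if append_cluster_id:
        if cluster_id is None:
            raise ValueError("cluster_id is required when append_cluster_id is True")
        name = f"{name}_{cluster_id}"
    # Assumption: with the option disabled, the cluster ID suffix is simply omitted.
    return name


print(index_name("tasktype1", "cluster1"))
```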

In the Synchronization section, specify the following guaranteed delivery settings:

Field	Description
Enabled	Select to enable the synchronization between the nodes in an IBM My webMethods Server cluster for guaranteed delivery of tasks and task types to the Elasticsearch server. When synchronization is enabled and a cluster node fails to persist a task entry to Elasticsearch, the node stores the entry in the IBM My webMethods Server database. All IBM My webMethods Server nodes poll the database, retrieve failed task entries in batches, and retry persisting the entries in Elasticsearch until all entries are successfully persisted. If a node tries to store a task, but a newer version of the task is already persisted, the node discards the old version. You can configure the time interval for polling, and the number of entries in each batch.
Polling Interval	The interval in seconds at which IBM My webMethods Server nodes poll the server database for failed task entries, and retry sending the entries to Elasticsearch. The default value is `20`.
Batch Size	The number of failed task entries that IBM My webMethods Server nodes retrieve from the server database in a single read operation. A particular entry can be included only in one batch at a time. The default value is `100`.
Lock Timeout	The interval of time in minutes for which an IBM My webMethods Server node can lock a task entry for processing. After a lock expires, other IBM My webMethods Server nodes can pick up the entry for processing. The default value is `10`.
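The guaranteed delivery behavior described in this table can be pictured with the following in-memory Python sketch. It is a simplified model under stated assumptions, not the Task Engine implementation: the failed-entry queue and the Elasticsearch store are plain Python structures, all names are hypothetical, and releasing the lock after a failed retry is an assumption that the documentation does not confirm.

```python
BATCH_SIZE = 100                  # Batch Size default
LOCK_TIMEOUT_SECONDS = 10 * 60    # Lock Timeout default (10 minutes)


def claim_batch(failed_entries, now, batch_size=BATCH_SIZE,
                lock_timeout=LOCK_TIMEOUT_SECONDS):
    """Lock and return up to batch_size entries whose lock is free or expired."""
    batch = []
    for entry in failed_entries:
        if len(batch) == batch_size:
            break
        if entry["locked_until"] <= now:
            entry["locked_until"] = now + lock_timeout
            batch.append(entry)
    return batch


def retry_batch(batch, failed_entries, store, persist):
    """Retry each claimed entry and remove it from the queue once it is handled."""
    for entry in batch:
        current = store.get(entry["task_id"])
        if current is not None and current["version"] > entry["version"]:
            failed_entries.remove(entry)   # a newer version is already persisted
        elif persist(store, entry):
            failed_entries.remove(entry)   # the retry succeeded
        else:
            entry["locked_until"] = 0.0    # assumption: release the lock for a later poll


def persist(store, entry):
    """Stand-in for the call that writes a task entry to Elasticsearch."""
    store[entry["task_id"]] = {"version": entry["version"]}
    return True


store = {"t1": {"version": 5}}
queue = [
    {"task_id": "t1", "version": 3, "locked_until": 0.0},   # stale, will be discarded
    {"task_id": "t2", "version": 1, "locked_until": 0.0},   # will be persisted
]
retry_batch(claim_batch(queue, now=100.0), queue, store, persist)
print(store, queue)
```

In a real cluster, each node would repeat the claim and retry steps once per Polling Interval, and the lock would prevent two nodes from processing the same entry at the same time.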

In the Search section, specify the following distributed search settings:

Field	Description
Search Type	The type of search to execute across the shards of an Elasticsearch index. Use these settings to control how Elasticsearch calculates the relevancy of the documents in the index to a specified search query. The options are: Query then Fetch (default): Elasticsearch calculates the term/document frequency for a search request locally for each shard and returns aggregated search results from relevant shards; Dfs, Query then Fetch: Elasticsearch calculates the term/document frequency for a search request across all shards in the index. This option increases the relevancy of search results, but includes a preliminary search phase which decreases the search speed. For more information about search types and search term relevancy options, see the Elasticsearch documentation.
Timeout	The time in seconds to wait before failing the search request. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about the default setting, see the Elasticsearch documentation.
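The two Search Type options correspond to the `query_then_fetch` and `dfs_query_then_fetch` values of the Elasticsearch `search_type` request parameter. The following Python sketch shows how such settings could appear in a search request path. The function name is hypothetical, and how the HPSTRA module actually passes the search type and timeout to Elasticsearch is not documented here, so treat the parameter placement as an assumption.

```python
from urllib.parse import urlencode

# UI option label -> value of the Elasticsearch `search_type` parameter
SEARCH_TYPES = {
    "Query then Fetch": "query_then_fetch",
    "Dfs, Query then Fetch": "dfs_query_then_fetch",
}


def search_path(index, search_type="Query then Fetch", timeout_seconds=None):
    """Build a search request path with the search type and an optional timeout."""
    params = {"search_type": SEARCH_TYPES[search_type]}
    if timeout_seconds is not None:
        params["timeout"] = f"{timeout_seconds}s"  # omitted: the server default applies
    return f"/{index}/_search?{urlencode(params)}"


print(search_path("sag_mws_te_taskdef_tasktype1_cluster1"))
print(search_path("sag_mws_te_taskdef_tasktype1_cluster1", "Dfs, Query then Fetch", 30))
```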

Click Save Configuration.