Configuring HPSTRA
About this task
With the High-Performance Task Search Reference Architecture (HPSTRA) module in Task Engine, you can store and search task business data on an Elasticsearch server. Task Engine uses a built-in REST client to communicate with Elasticsearch.
You enable and configure the HPSTRA module in My webMethods. You can use the predefined module settings that Task Engine provides, or apply custom settings to match an existing Elasticsearch configuration.
To configure the HPSTRA module
Procedure
- In My webMethods, navigate to Applications > Administration > Business > Task > HPSTRA Configuration.
- By default, the HPSTRA module is disabled. Select Enable Module to enable it.
  Note: After you enable the module, you can add a preset configuration with default values for all configurable fields by clicking Create Default. All default values for the Task Engine HPSTRA module are described in the following tables. For the default values of configurations specific to Elasticsearch, see the Elasticsearch documentation.
- On the Basic tab, click Add to add an Elasticsearch server, and specify the following settings in the Add new Elasticsearch server dialog:
  Field | Description
  Host | The host name of the server where Elasticsearch is installed. The default value is localhost.
  Port | Optional. The port number to connect to the Elasticsearch server. The default value is 9200.
  Use SSL | Select to enable secure communication with the Elasticsearch server. Disabled by default.
- Under Authentication, select Enable Authentication and type the name and password of the user to connect to the Elasticsearch store.
  The Authentication settings are required when the Elasticsearch server is configured to use basic authentication. For more information, see the Elasticsearch documentation.
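Before entering the Basic and Authentication values, it can help to confirm outside My webMethods that they point at a reachable Elasticsearch server. The following is a minimal sketch, not part of Task Engine: the function names are hypothetical, and the defaults simply mirror the Host, Port, Use SSL, and Connection Timeout defaults described in this topic. It relies only on the fact that an Elasticsearch server answers a GET request on its root path with basic server information.

```python
import base64
import json
import urllib.request


def build_base_url(host="localhost", port=9200, use_ssl=False):
    """Compose the server URL from the Host, Port, and Use SSL fields."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def probe(base_url, headers=None, timeout=60):
    """Send GET / to the server and return the parsed JSON response.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    request = urllib.request.Request(base_url + "/", headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `probe(build_base_url("localhost", 9200), basic_auth_header("user", "pass"))` returns the server's JSON description when the host, port, and credentials are valid; omit the header when basic authentication is not configured.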
- To configure advanced HPSTRA settings, go to the Advanced tab. Click Edit to open the configuration dialog for each section.
- In the Connection Pool section, specify the following settings for connecting to the Elasticsearch server:
  Field | Description
  Connection Timeout | The time in seconds to wait for establishing a connection to the Elasticsearch server. The default value is 60.
  Request Timeout | The time in seconds to wait to get a response from the Elasticsearch server, before failing the request. This setting applies to all requests that Task Engine sends to the Elasticsearch server, for example requests to persist or search task data. The default value is 60.
  Maximum Connections | The maximum number of connections to the Elasticsearch server in the connection pool. The default value is 50.
- In the Auto Discovery section, specify the following Elasticsearch clustering settings:
  Field | Description
  Enabled | Select to enable the automatic discovery of available Elasticsearch nodes for round-robin distribution of requests across the nodes in the cluster. This option is disabled by default.
  Polling Interval | Time interval in seconds to poll and update the list of Elasticsearch servers when the automatic discovery of cluster nodes is enabled. The default value is 10.
- In the Persistence section, specify the following data storing settings:
  Field | Description
  Consistency | The number of available shards or replicas that Elasticsearch requires when indexing or deleting task data. The options are:
    - quorum - Default. Elasticsearch requires that the majority of shards or replicas are available before indexing or deleting task data. The majority consists of half the shards and replicas in the cluster (including the primary shard), plus one more shard.
    - one - Elasticsearch requires that only the primary shard is available before persisting the task data.
    - all - Elasticsearch requires that the primary shard and all replicas are available before persisting the task data.
    For more information about consistency settings, see the Elasticsearch documentation.
  Timeout | The time in seconds to wait for the required number of shards or replicas to become available. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about the default setting, see the Elasticsearch documentation.
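The arithmetic behind the three Consistency options can be illustrated with a short sketch. It assumes that the count is taken per shard (one primary plus its configured replicas) and that "half" is rounded down; both are assumptions made for illustration, and the function names are hypothetical. See the Elasticsearch documentation for the authoritative rules.

```python
def quorum(number_of_replicas):
    """Copies required by the quorum option: half of all copies, plus one."""
    total_copies = 1 + number_of_replicas  # the primary plus its replicas
    return total_copies // 2 + 1           # assumption: half is rounded down


def required_copies(consistency, number_of_replicas):
    """Return how many shard copies must be available for a write."""
    if consistency == "one":
        return 1                       # only the primary shard
    if consistency == "all":
        return 1 + number_of_replicas  # the primary and every replica
    if consistency == "quorum":
        return quorum(number_of_replicas)
    raise ValueError(f"unknown consistency option: {consistency}")
```

With two replicas per shard, for example, `required_copies("quorum", 2)` evaluates to 2 and `required_copies("all", 2)` evaluates to 3.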
- In the Create Index section, specify the following index creation settings:
  Field | Description
  Append Cluster ID | Enabled by default. The HPSTRA module creates an Elasticsearch index for every HPSTRA-enabled task type. When this option is enabled, the HPSTRA module appends the ID of the IBM My webMethods Server cluster node to the name of the Elasticsearch index and creates an index name in the following format: sag_mws_te_taskdef_taskTypeID_clusterID.
  Number of Shards | The number of shards to include when creating the Elasticsearch index. Use this setting to improve the scalability of the Elasticsearch cluster. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about Elasticsearch shards and default settings, see the Elasticsearch documentation.
  Number of Replicas | The number of replicas to include when creating the Elasticsearch index. Use this setting to improve the availability of the Elasticsearch cluster. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about Elasticsearch replicas and default values, see the Elasticsearch documentation.
  Note: The shard and replica settings apply only when creating a new Elasticsearch index using the HPSTRA configuration page. You cannot modify an existing Elasticsearch index through the HPSTRA configuration page.
- In the Synchronization section, specify the following guaranteed delivery settings:
  Field | Description
  Enabled | Select to enable the synchronization between the nodes in an IBM My webMethods Server cluster for guaranteed delivery of tasks and task types to the Elasticsearch server. When synchronization is enabled and a cluster node fails to persist a task entry to Elasticsearch, the node stores the entry in the IBM My webMethods Server database. All IBM My webMethods Server nodes poll the database, retrieve failed task entries in batches, and retry persisting the entries in Elasticsearch until all entries are successfully persisted. If a node tries to store a task, but a newer version of the task is already persisted, the node discards the old version. You can configure the time interval for polling, and the number of events in the batches.
  Polling Interval | The interval in seconds at which IBM My webMethods Server nodes poll the server database for failed task entries, and retry sending the entries to Elasticsearch. The default value is 20.
  Batch Size | The number of failed task entries that IBM My webMethods Server nodes retrieve from the server database in a single read operation. A particular entry can be included only in one batch at a time. The default value is 100.
  Lock Timeout | The interval of time in minutes for which an IBM My webMethods Server node can lock a task entry for processing. After a lock expires, other IBM My webMethods Server nodes can pick up the entry for processing. The default value is 10.
- In the Search section, specify the following distributed search settings:
  Field | Description
  Search Type | The type of search to execute across the shards of an Elasticsearch index. Use these settings to control how Elasticsearch calculates the relevancy of the documents in the index to a specified search query. The options are:
    - Query then Fetch - Default. Elasticsearch calculates the term/document frequency for a search request locally for each shard and returns aggregated search results from relevant shards.
    - Dfs, Query then Fetch - Elasticsearch calculates the term/document frequency for a search request across all shards in the index. This option increases the relevancy of search results, but includes a preliminary search phase which decreases the search speed.
    For more information about search types and search term relevancy options, see the Elasticsearch documentation.
  Timeout | The interval of time in seconds for which to wait before failing the search request. Select Use Elasticsearch Defaults to use the default Elasticsearch setting. For more information about the default setting, see the Elasticsearch documentation.
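The two Search Type options correspond to the `search_type` values `query_then_fetch` and `dfs_query_then_fetch` of the Elasticsearch search API, which also accepts a `timeout` parameter. The sketch below shows how such a request URL could be composed when you want to compare the two options by hand. It is illustrative only: the function name and the example index name are hypothetical, and it is an assumption, not a documented fact, that Task Engine issues requests in exactly this form.

```python
from urllib.parse import urlencode

# Labels on the HPSTRA Search section mapped to Elasticsearch search_type values.
SEARCH_TYPES = {
    "Query then Fetch": "query_then_fetch",
    "Dfs, Query then Fetch": "dfs_query_then_fetch",
}


def search_url(base_url, index, search_type="Query then Fetch", timeout_seconds=None):
    """Compose a _search URL for the given index and search settings."""
    params = {"search_type": SEARCH_TYPES[search_type]}
    if timeout_seconds is not None:
        # Elasticsearch expects a time value with a unit, for example "5s".
        params["timeout"] = f"{timeout_seconds}s"
    return f"{base_url}/{index}/_search?{urlencode(params)}"
```

Leaving `timeout_seconds` unset omits the parameter, which is the sketch's equivalent of selecting Use Elasticsearch Defaults.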
- Click Save Configuration.
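The interaction of the Synchronization settings (Batch Size, Lock Timeout, and the rule for discarding old task versions) can be easier to follow as code. The following is a simplified in-memory model of the behavior described in the Synchronization section, not the Task Engine implementation: every name in it is hypothetical, the list `queue` stands in for the server database, and the choice to leave a failed entry locked until its lock expires is an assumption. Each call to `retry_batch` models what one node does on one polling cycle.

```python
import time
from dataclasses import dataclass


@dataclass(eq=False)
class FailedEntry:
    task_id: str
    version: int
    locked_until: float = 0.0  # epoch seconds; 0 means not locked


def claim_batch(queue, now, batch_size=100, lock_timeout_minutes=10):
    """Lock and return up to batch_size entries that no other node holds."""
    batch = []
    for entry in queue:
        if len(batch) == batch_size:
            break
        if entry.locked_until <= now:  # unlocked, or the previous lock expired
            entry.locked_until = now + lock_timeout_minutes * 60
            batch.append(entry)
    return batch


def retry_batch(queue, persisted_versions, persist, now=None, **settings):
    """Retry one batch; return the number of entries removed from the queue."""
    now = time.time() if now is None else now
    removed = 0
    for entry in claim_batch(queue, now, **settings):
        current = persisted_versions.get(entry.task_id)
        if current is not None and current > entry.version:
            queue.remove(entry)  # a newer version is already persisted: discard
            removed += 1
        elif persist(entry):     # persist() returns True on success
            persisted_versions[entry.task_id] = entry.version
            queue.remove(entry)
            removed += 1
        # On failure the entry stays queued and becomes claimable again
        # once its lock expires (assumption of this sketch).
    return removed
```

Because a locked entry is skipped by `claim_batch`, an entry belongs to at most one batch at a time, which matches the Batch Size description above.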