Operational data store
IBM® Automation foundation includes an operational data store based on Apache 2.0 OSS Elasticsearch with the addition of a custom security plug-in to enable Basic Authentication and a proxy sidecar for TLS capability.
Warning: The disks are not monitored for usage. You are responsible for planning the appropriate disk capacity and for monitoring the disks so that they do not fill up. A full disk is likely to result in data loss.
Jump to
- Running in production
- Node groups
- Security
- Storage
- Additional Allowed APIs
- Backup and restore
- Monitoring
- Audit logging
- JVM options
- Elasticsearch on multi-zone clusters
- Support for the latest Elasticsearch version (7.15.1)
Running in production
When you run Elasticsearch in production, consider these settings. For more information, see Important System Configuration.
- Ensure that sufficient virtual memory exists on the nodes. To configure virtual memory, use the Node Tuning Operator in OpenShift. Set the value of `vm.max_map_count` to at least 262144. For more information, see Using the Node Tuning Operator.
- Ensure that the Elasticsearch container JVM is configured with sufficient heap. For more information, see JVM options.
- For production environments, persistent storage is required to ensure resilient operation. Cluster scaling is supported only when persistence is enabled with compatible block storage. A minimum of three Elasticsearch nodes is recommended for resilience and availability. Reserve single-node deployments for proof-of-concept use only.
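As a quick check of the virtual memory setting, the following shell sketch compares a `vm.max_map_count` value against the 262144 minimum stated above. The function name `meets_max_map_count` is illustrative. The script assumes that it runs where `sysctl` can read the node's kernel settings, for example in a debug shell on a node. It only reads the value; changing it is still done through the Node Tuning Operator.

```shell
#!/bin/sh
# Minimum vm.max_map_count value stated in this section.
REQUIRED_MAX_MAP_COUNT=262144

# meets_max_map_count VALUE
# Succeeds (exit status 0) when VALUE is at least the required minimum.
# (Hypothetical helper name, used only for this sketch.)
meets_max_map_count() {
  [ "$1" -ge "$REQUIRED_MAX_MAP_COUNT" ]
}

# Read the live value only when sysctl is available on this host.
if command -v sysctl >/dev/null 2>&1; then
  current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)
  if meets_max_map_count "$current" 2>/dev/null; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is below $REQUIRED_MAX_MAP_COUNT"
  fi
fi
```

The comparison is kept in its own function so that it can be reused against values collected from several nodes.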
Node groups
Elasticsearch is configured into node groups inside the Elasticsearch CR:

    spec:
      elasticsearch:
        nodegroupspecs:
        - name: master
          replicas: 3
          storage: {}
          config:
          - key: node.master
            value: "true"
          - key: node.data
            value: "false"
          - key: node.ingest
            value: "false"
        - name: data
          replicas: 3
          storage: {}
          config:
          - key: node.master
            value: "false"
          - key: node.data
            value: "true"
          - key: node.ingest
            value: "false"

These node groups let you define the configuration for different sets of nodes within the Elasticsearch cluster.
The names of these node groups are significant because they define the names of the Kubernetes resources that are created when the Elasticsearch cluster is installed. When a node group is removed, all of its resources are removed, except for the PersistentVolumeClaims that are used for persistent storage under those nodes. Renaming a group therefore removes the previous group and creates a new one, so consider the node groups carefully before you install or adjust the configuration.
Security
The Elasticsearch cluster included in Automation foundation uses a custom security plug-in that secures all API interactions with Basic Authentication. The status of a `CartridgeRequirements` custom resource (CR) is updated with details of the secret that contains the username and generated password that are assigned to that `CartridgeRequirements` request.
A superuser credential is created for administration of the Elasticsearch instance that can access the security APIs.
- A warning message in the status section of the Elasticsearch CR prompts the administrator who creates the Elasticsearch instance to update the generated password.
- To update the password, the administrator needs to update the corresponding secret's password value and remove the annotation.
All communication with the client port of Elasticsearch is encrypted with TLS. The default TLS configuration can be changed by using one of the following methods:
- Configure TLS to use self-signed generated certificates with `spec.elasticsearch.tls: {}`.
- Specify a cert-manager `Issuer` with a reference to the public certificate of the certificate authority (CA) that is used to verify the provided issuer, as shown in the following example:

      spec:
        elasticsearch:
          tls:
            issuer:
              name: my-issuer
            caSecret:
              secretName: my-ca-secret
              key: ca.crt

You can customize the Elasticsearch user credentials. Currently, the behavior is as follows:
- If you want a particular user name and password to be used for the Elasticsearch instance, create a secret named `<cloudpakName>-es-auth` with the following key-value pairs: `{"username": "(your-username)", "password": "(your-password)"}`
  - When you then apply a CartridgeRequirements CR, the reconcile picks up the credentials that are provided in the secret to create the Elasticsearch user account.
- If you do not provide such a secret, the Elasticsearch user credentials are generated automatically: the username is the Cloud Pak name and the password is a generated alphanumeric string.
- To update the created credentials, modify the password in the `<cloudpakName>-es-auth` secret. It takes around 30 seconds for the updated password to take effect in the Elasticsearch instance.

Note: After the credentials are created, the username must not be changed. Only the password can be modified.
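The custom-credentials secret described above can be sketched as a small shell helper that renders the manifest. This is a minimal sketch, not the product's own tooling: the function name `render_es_auth_secret` and the sample values are hypothetical, and it assumes that the `username` and `password` keys are stored as ordinary base64-encoded entries under `data`.

```shell
#!/bin/sh
# render_es_auth_secret CLOUDPAK_NAME USERNAME PASSWORD
# Prints a Secret manifest named <cloudpakName>-es-auth with the two keys
# named in this section. (Hypothetical helper, for illustration only.)
render_es_auth_secret() {
  secret_name="$1-es-auth"
  # printf avoids the trailing newline that echo would add to the value;
  # tr removes any line wrapping that base64 may insert.
  user_b64=$(printf '%s' "$2" | base64 | tr -d '\n')
  pass_b64=$(printf '%s' "$3" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $secret_name
data:
  username: $user_b64
  password: $pass_b64
EOF
}

# Example with placeholder values; replace them with your own.
render_es_auth_secret "my-cloudpak" "es-user" "change-me"
```

The rendered manifest can be piped to `oc apply -f -` in the namespace where the CartridgeRequirements CR is applied. The choice of namespace is an assumption here, because the text above does not name it.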
Updating and preconfiguring the superuser password
Follow this process to change the generated password for the `elasticsearch-admin` superuser:
- Ensure that you are logged in to the OpenShift cluster by using `oc login`.
- Update and export the following environment variables in your command line.

      export ELASTIC_INSTANCE_NAME="elasticsearch-sample" # <-- Change this variable to your CR name.
      export NAMESPACE="acme-abp" # <-- Change this variable to the namespace that you want.
      export NEW_PASSWORD="your-new-password" # <-- Change this variable to the new password that you want.

- Run the following code block in your command line to update the superuser password:

      export SECRET_NAME=$(oc get elasticsearch "$ELASTIC_INSTANCE_NAME" -n "$NAMESPACE" -o jsonpath='{.status.adminAuthSecretName}')
      oc patch secret "$SECRET_NAME" -n "$NAMESPACE" -p '{"data": {"password": "'$(echo -n "$NEW_PASSWORD" | base64)'"}}'
      oc annotate secret "$SECRET_NAME" -n "$NAMESPACE" elastic.automation.ibm.com/generated-default-credentials-
      unset NEW_PASSWORD

Alternatively, to preconfigure the superuser password, a secret can be created with the appropriate naming and labeling conventions in advance. Creating a secret in advance can be useful in a disaster recovery scenario where dependent services have an existing set of credentials.
- Ensure that you are logged in to the OpenShift cluster by using `oc login`.
- Update and export the following environment variables in your command line.

      export ELASTIC_INSTANCE_NAME="elasticsearch-sample" # <-- Change this variable to your CR name.
      export NAMESPACE="acme-abp" # <-- Change this variable to the namespace that you want.
      export NEW_PASSWORD="your-new-password" # <-- Change this variable to the new password that you want.

- Run the following code block in your command line to preconfigure the superuser password:

      cat <<EOF | oc apply -f -
      kind: Secret
      apiVersion: v1
      metadata:
        name: ${ELASTIC_INSTANCE_NAME}-elasticsearch-es-default-user
        namespace: $NAMESPACE
        labels:
          app.kubernetes.io/component: es
          app.kubernetes.io/instance: $ELASTIC_INSTANCE_NAME
          app.kubernetes.io/name: elasticsearch
          elastic.automation.ibm.com/cr-name: $ELASTIC_INSTANCE_NAME
      data:
        password: $(echo -n "$NEW_PASSWORD" | base64 -w0)
        username: ZWxhc3RpY3NlYXJjaC1hZG1pbg==
      type: kubernetes.io/basic-auth
      EOF
      unset NEW_PASSWORD

Note: This precreated credentials secret is not owned by the Elasticsearch instance and as such is not tethered to the Elasticsearch instance lifecycle. Users are responsible for managing the lifecycle of this secret.
Storage
By default, the Elasticsearch cluster that Automation foundation provides does not have persistence configured. Cluster administrators must provide either a `StorageClass` that supports dynamic provisioning or pre-created `PersistentVolumes` before they configure Elasticsearch. Multiple PersistentVolume storage classes are available, depending on your cluster setup. For more information, see Understanding Persistent Storage.
Each node group requires independent storage configuration. This approach enables different tiers of storage to be provided to each node group depending on its requirements, for example, fast storage for the data nodes and slower storage for the controller nodes.
The `controller` and `data` nodes require storage that can be used with a `ReadWriteOnce` (`RWO`) access mode. This mode specifies that the volume can be mounted as read/write by a single node and is available only to that node.
To use the Elasticsearch cluster data storage, you must declare its use in the `elasticsearch` section of the `AutomationBase` custom resource by defining a `storage` element for each entry in `nodegroupspecs`.
For example:

    spec:
      elasticsearch:
        nodegroupspecs:
        - name: data
          replicas: 3
          storage:
            size: 50Gi
            class: rook-ceph-block

This example creates a 50 GiB `PersistentVolumeClaim` for each of the three replicas, where the physical storage for each data node is provisioned by using the `rook-ceph-block` `StorageClass`.
The following table shows the child elements that can be specified as part of the `storage` object. All elements are optional. If an element is not provided and a default value is set for the cluster, that default value is used. Otherwise, the element is not included.
Element | Default | Description |
---|---|---|
size | data: 50Gi, controller: 10Gi | Size of the storage with scale suffix. |
class | Default cluster StorageClass | Storage class name. |
selector | | A label selector to allow finer-grained PersistentVolume selection. See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors for syntax. |
volumeClaimTemplate | | A PersistentVolumeClaim that allows a more detailed specification of the volume. Only used for Snapshot storage. See https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims for syntax. |
fsGroup | | The group ID for the file system. Might need to be set for some storage providers such as NFS. |
supplementalGroups | | An array of group IDs to be added on the security context for the container. |
If `spec.elasticsearch.nodegroupspecs[].storage.class` is not specified and a default StorageClass is set for the cluster, the default StorageClass is used.
If you don't want a StorageClass to be used because your PersistentVolumes do not define a StorageClass, as can be the case with NFS, specify `spec.elasticsearch.nodegroupspecs[].storage.class` with a value of `""`, as in `class: ""`.
- If StorageClasses support dynamic provisioning
  - Some StorageClasses support dynamic provisioning. If the StorageClass in use supports dynamic provisioning, a PersistentVolume is created dynamically when the PersistentVolumeClaim is created. Otherwise, the PersistentVolume must be created before the Elasticsearch storage is defined.
  - On OpenShift, all dynamically provisioned volumes are created with the reclaim policy set to `Delete` by default. Thus, the volume lasts only while the claim still exists in the system. If you delete the claim, the volume is also deleted, and all data on the volume is lost.
- If StorageClasses do not support dynamic provisioning
  - If the storage class doesn't support dynamic provisioning, for example, NFS, or if the PersistentVolumes are already created for use with Elasticsearch, the PersistentVolumes must be referenced from the PersistentVolumeClaim that the Elasticsearch operator creates. Use the `storage.selector` or `storage.volumeClaimTemplate` child elements of the `spec.elasticsearch.nodegroupspecs[]` element in the `AutomationBase` custom resource.

Note: The `storage.volumeClaimTemplate` child element can be used only when you configure the Elasticsearch storage snapshot.
Create PersistentVolumes by using a label so that they can be correctly selected by the PersistentVolumeClaim, as shown in the following example:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: elasticsearch-data1
      labels:
        es-storage-type: data
    spec:
      capacity:
        storage: 50Gi
      accessModes:
      - ReadWriteOnce
      nfs:
        path: /data/es-data1
        server: 192.168.1.10
      persistentVolumeReclaimPolicy: Recycle

See the following example of the `spec.elasticsearch.nodegroupspecs[].storage` object that uses a `selector` that matches the `es-storage-type: data` label:

    spec:
      elasticsearch:
        license:
          accept: true
        version: "v1.0"
        nodegroupspecs:
        - name: data
          replicas: 3
          storage:
            class: ""
            selector:
              matchLabels:
                es-storage-type: data
            size: 50Gi

Sufficient PersistentVolumes must be available to meet the needs of the replicas that are defined for each node group. In the previous example, three replicas are defined for the `data` node group. Thus, at least three `PersistentVolumes` of sufficient capacity must be available to meet the claim needs.
Note: If separate `data` and `master` nodes are used with persistent storage, a `spec.elasticsearch.nodegroupspecs[].storage` object is required for both node groups. It is not possible to define persistent storage for the `data` nodes only. However, different tiers of storage can be assigned to each node group.
If no storage is defined, the Elasticsearch cluster uses only the storage that is local to the container. All indices and configurations are lost when an Elasticsearch container restarts.
When an Elasticsearch cluster is deleted, the PersistentVolumeClaims and bound PersistentVolumes remain intact to preserve the data.
Note: If a new Elasticsearch cluster that uses persisted storage is created with the same name as a previous cluster, any previous PersistentVolumeClaims, and hence PersistentVolumes, are reused. If the `storage` object that is defined on the new cluster differs from the previous one, for example because the `StorageClass` is different, you must manually delete the PersistentVolumeClaims and PersistentVolumes before you create the new `Elasticsearch` cluster of the same name so that new PersistentVolumeClaims are created.
Access modes
The Elasticsearch cluster requires storage that uses the following access modes:
- The `master` and `data` nodes require storage that uses a `ReadWriteOnce` (`RWO`) access mode, which specifies that the volume can be mounted as read/write by a single node.
- Snapshot storage requires a `ReadWriteMany` (`RWX`) access mode, which specifies that the volume can be mounted as read/write by many nodes.

Only PersistentVolume mechanisms that support these access modes can be used for the different storage uses.
For more information, see Access modes.
Storage permissions
To access provided storage, workloads require permissions. These permissions are controlled with the `securityContext` for the pod, in this case through `fsGroup` and `supplementalGroups`. When the `securityContext` specifies an `fsGroup`, all processes of the containers within the pod are part of the supplementary group ID that is specified by the `fsGroup`. The owner of the persisted volume and any files that are created in that volume are also in the group ID that is specified by the `fsGroup`.
The default settings are valid for most scenarios. However, if your storage configuration requires `fsGroup` or `supplementalGroups`, use the `storage` object to configure each node group.
- If you configure the `storage.fsGroup` element on the `nodegroupspec`, this sets the `spec.securityContext.fsGroup` element on the pods.
- If you configure the `storage.supplementalGroups` element on the `nodegroupspec`, this sets the `spec.securityContext.supplementalGroups` element on the pods.

Note: The `spec.securityContext.runAsGroup` cannot be set as part of the `storage` object, so the default primary group ID for all containers is `0` (root).
The following example demonstrates setting an `fsGroup` and `supplementalGroups` in the `elasticsearch` field of the `AutomationBase` custom resource for a node group:

    spec:
      elasticsearch:
        nodegroupspecs:
        - name: data
          replicas: 3
          storage:
            class: rook-ceph-block
            size: 50Gi
            fsGroup: 2000
            supplementalGroups: [2001,2002]

The three pods that result from the previous node group definition each have a `securityContext` as shown:

    spec:
      securityContext:
        runAsNonRoot: true
        fsGroup: 2000
        supplementalGroups: [2001,2002]

If you run the `id` command within the pod's container, you can confirm whether the configuration took effect:

    $ id
    uid=1000660000(1000660000) gid=0(root) groups=0(root),2000,2001,2002

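The same confirmation can be scripted. The following is a minimal sketch that checks a space-separated list of numeric group IDs, in the form that `id -G` prints, for the `fsGroup` and `supplementalGroups` values from the example above. The helper name `has_group` is hypothetical, and the group list is hard-coded here so that the sketch runs outside a pod.

```shell
#!/bin/sh
# has_group GID "GROUP_IDS"
# Succeeds when GID appears as a whole word in the space-separated list.
# (Hypothetical helper, for illustration only.)
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Inside the container you would use: pod_groups=$(id -G)
# The value below mirrors the example id output in this section.
pod_groups="0 2000 2001 2002"

for gid in 2000 2001 2002; do
  if has_group "$gid" "$pod_groups"; then
    echo "group $gid present"
  else
    echo "group $gid MISSING"
  fi
done
```

Padding both the list and the ID with spaces keeps a partial match, such as 200 against 2000, from being reported as present.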
Note: If `fsGroup` or `supplementalGroups` are required, you might need to provide a `SecurityContextConstraint` that is configured to support the specified values.
See the following example of the settings that are required in a `SecurityContextConstraint` to accommodate these configurations:

    fsGroup:
      ranges:
      - max: 3000
        min: 2000
      type: MustRunAs
    supplementalGroups:
      type: RunAsAny

Additional Allowed APIs
Use the additional allowed APIs (`additionalAllowedAPIs`) setting to allow a predefined set of APIs and, optionally, a broader set of user-specified APIs. Configure this list with the `spec.elasticsearch.additionalAllowedAPIs` field in the AutomationBase CR.
Note: Allowing more APIs is at your own risk.
Follow this format: `GET:[api, api_two],POST:[api_three]...`
- Add the method in capital letters followed by a colon (`:`).
- Then, surrounded by brackets (`[]`), add the list of API names.
- Separate the list items with commas.

For an example, see the default allowlist:
"GET:[main_action, cat_health_action, nodes_stats_action, get_snapshots_action, snapshot_status_action, recovery_action, cat_recovery_action, get_indices_action, search_action],HEAD:[main_action, get_indices_action],POST:[rest_handler_security_user_add, put_repository_action, restore_snapshot_action, create_snapshot_action, document_create_action_auto_id, document_index_action, document_create_action, search_action, bulk_action],DELETE:[rest_handler_security_user_delete, delete_repository_action, delete_index_action],PUT:[rest_handler_security_user_update, cluster_update_settings_action, put_repository_action, create_index_action, document_index_action, document_create_action]"
To open all APIs to connect OSS Kibana or other third-party applications, use the following wildcard approach:
"GET:[*],PUT:[*],POST:[*],HEAD:[*],DELETE:[*]"
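To avoid hand-editing the bracketed format, an allowlist value can be assembled from per-method lists. The following shell sketch builds a string in the `METHOD:[api, api_two],METHOD:[api_three]` shape described above. The helper names `allow_entry` and `join_entries` are hypothetical, and the API names in the example are only a sample selection.

```shell
#!/bin/sh
# allow_entry METHOD API...
# Prints one "METHOD:[api, api_two]" group with the method in capital letters.
# (Hypothetical helper, for illustration only.)
allow_entry() {
  method=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  shift
  list=""
  for api in "$@"; do
    # Separate list items with a comma and a space.
    list="${list:+$list, }$api"
  done
  printf '%s:[%s]' "$method" "$list"
}

# join_entries ENTRY...
# Joins the per-method groups with commas into a single value.
join_entries() {
  out=""
  for entry in "$@"; do
    out="${out:+$out,}$entry"
  done
  printf '%s\n' "$out"
}

# Example: allow two extra GET APIs and one extra POST API.
ALLOWLIST=$(join_entries \
  "$(allow_entry GET cat_indices_action cluster_state_action)" \
  "$(allow_entry POST reindex_action)")
echo "$ALLOWLIST"
```

The resulting string is intended as the value of `spec.elasticsearch.additionalAllowedAPIs`. Check it against the format rules in this section before you apply it.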
See the following table for the default allowed APIs and their corresponding Elasticsearch documentation names:
Method | Allowed API syntax | API |
---|---|---|
GET | main_action | info |
GET | cat_health_action | cat.health |
GET | nodes_stats_action | nodes.stats |
GET | get_snapshots_action | snapshot.get |
GET | snapshot_status_action | snapshot.status |
GET | recovery_action | indices.recovery |
GET | cat_recovery_action | cat.recovery |
GET | get_indices_action | indices.get |
GET | search_action | search |
GET | cluster_health_action | cluster.health |
HEAD | main_action | ping |
HEAD | get_indices_action | indices.exists |
POST | rest_handler_security_user_add | security.users.create |
POST | put_repository_action | snapshot.create_repository |
POST | restore_snapshot_action | snapshot.restore |
POST | create_snapshot_action | snapshot.create |
POST | document_create_action_auto_id | index |
POST | document_index_action | index |
POST | document_create_action | create |
POST | search_action | search |
POST | bulk_action | bulk |
DELETE | rest_handler_security_user_delete | security.users.delete |
DELETE | delete_repository_action | snapshot.delete_repository |
DELETE | delete_index_action | indices.delete |
PUT | rest_handler_security_user_update | rest_handler_security_user_update |
PUT | cluster_update_settings_action | cluster.put_settings |
PUT | put_repository_action | snapshot.create_repository |
PUT | create_index_action | indices.create |
PUT | document_index_action | index |
PUT | document_create_action | create |
Note: APIs that are deprecated are blocked by the allowlist and cannot be enabled.
Reference table for available APIs to add to the allowlist
Method | Allowed API syntax | API | Paths |
---|---|---|---|
DELETE | clear_scroll_action | clear_scroll | [/_search/scroll, /_search/scroll/{scroll_id}] |
DELETE | clear_voting_config_exclusions_action | cluster.delete_voting_config_exclusions | [/_cluster/voting_config_exclusions] |
DELETE | delete_component_template_action | cluster.delete_component_template | [/_component_template/{name}] |
DELETE | delete_composable_index_template_action | indices.delete_index_template | [/_index_template/{name}] |
DELETE | delete_index_action | indices.delete | [/, /{index}] |
DELETE | delete_index_template_action | indices.delete_template | [/_template/{name}] |
DELETE | delete_repository_action | snapshot.delete_repository | [/_snapshot/{repository}] |
DELETE | delete_snapshot_action | snapshot.delete | [/_snapshot/{repository}/{snapshot}] |
DELETE | delete_stored_script_action | delete_script | [/_scripts/{id}] |
DELETE | document_delete_action | delete | [/{index}/_doc/{id}, /{index}/{type}/{id}] |
DELETE | index_delete_aliases_action | indices.delete_alias | [/{index}/_alias/{name}] |
DELETE | ingest_delete_pipeline_action | ingest.delete_pipeline | [/_ingest/pipeline/{id}] |
DELETE | rest_handler_security_user_delete | security.users.delete | [/_security/users/{username}] |
GET | _scripts_painless_execute | scripts_painless_execute | [/_scripts/painless/_execute] |
GET | analyze_action | indices.analyze | [/_analyze, /{index}/_analyze] |
GET | cat_action | cat.help | [/_cat] |
GET | cat_alias_action | cat.aliases | [/_cat/aliases, /_cat/aliases/{alias}] |
GET | cat_allocation_action | cat.allocation | [/_cat/allocation, /_cat/allocation/{nodes}] |
GET | cat_count_action | cat.count | [/_cat/count, /_cat/count/{index}] |
GET | cat_fielddata_action | cat.fielddata | [/_cat/fielddata, /_cat/fielddata/{fields}] |
GET | cat_health_action | cat.health | [/_cat/health] |
GET | cat_indices_action | cat.indices | [/_cat/indices, /_cat/indices/{index}] |
GET | cat_master_action | cat.master | [/_cat/master] |
GET | cat_node_attrs_action | cat.nodeattrs | [/_cat/nodeattrs] |
GET | cat_nodes_action | cat.nodes | [/_cat/nodes] |
GET | cat_pending_cluster_tasks_action | cat.pending_tasks | [/_cat/pending_tasks] |
GET | cat_plugins_action | cat.plugins | [/_cat/plugins] |
GET | cat_recovery_action | cat.recovery | [/_cat/recovery, /_cat/recovery/{index}] |
GET | cat_repositories_action | cat.repositories | [/_cat/repositories] |
GET | cat_segments_action | cat.segments | [/_cat/segments, /_cat/segments/{index}] |
GET | cat_shards_action | cat.shards | [/_cat/shards, /_cat/shards/{index}] |
GET | cat_snapshot_action | cat.snapshots | [/_cat/snapshots, /_cat/snapshots/{repository}] |
GET | cat_tasks_action | cat.tasks | [/_cat/tasks] |
GET | cat_templates_action | cat.templates | [/_cat/templates, /_cat/templates/{name}] |
GET | cat_threadpool_action | cat.thread_pool | [/_cat/thread_pool, /_cat/thread_pool/{thread_pool_patterns}] |
GET | cluster_allocation_explain_action | cluster.allocation_explain | [/_cluster/allocation/explain] |
GET | cluster_get_settings_action | cluster.get_settings | [/_cluster/settings] |
GET | cluster_health_action | cluster.health | [/_cluster/health, /_cluster/health/{index}] |
GET | cluster_search_shards_action | search_shards | [/_search_shards, /{index}/_search_shards] |
GET | cluster_state_action | cluster.state | [/_cluster/state, /_cluster/state/{metric}, /_cluster/state/{metric}/{indices}] |
GET | cluster_stats_action | cluster.stats | [/_cluster/stats, /_cluster/stats/nodes/{nodeId}] |
GET | count_action | count | [/_count, /{index}/_count, /{index}/{type}/_count] |
GET | document_get_action | get | [/{index}/_doc/{id}, /{index}/{type}/{id}] |
GET | document_get_source_action | get_source | [/{index}/_source/{id}, /{index}/{type}/{id}/_source] |
GET | document_mget_action | mget | [/_mget, /{index}/_mget, /{index}/{type}/_mget] |
GET | document_multi_term_vectors_action | mtermvectors | [/_mtermvectors, /{index}/_mtermvectors, /{index}/{type}/_mtermvectors] |
GET | document_term_vectors_action | termvectors | [/{index}/_termvectors, /{index}/_termvectors/{id}, /{index}/{type}/_termvectors, /{index}/{type}/{id}/_termvectors] |
GET | explain_action | explain | [/{index}/_explain/{id}, /{index}/{type}/{id}/_explain] |
GET | field_capabilities_action | field_caps | [/_field_caps, /{index}/_field_caps] |
GET | flush_action | indices.flush | [/_flush, /{index}/_flush] |
GET | get_aliases_action | indices.get_alias | [/_alias, /{index}/_alias] |
GET | get_component_template_action | cluster.get_component_template | [/_component_template, /_component_template/{name}] |
GET | get_composable_index_template_action | indices.get_index_template | [/_index_template, /_index_template/{name}] |
GET | get_field_mapping_action | indices.get_field_mapping | [/_mapping/field/{fields}, /_mapping/{type}/field/{fields}, /{index}/_mapping/field/{fields}, /{index}/{type}/_mapping/field/{fields}, /{index}/_mapping/{type}/field/{fields}] |
GET | get_index_template_action | indices.get_template | [/_template, /_template/{name}] |
GET | get_indices_action | indices.get | [/{index}] |
GET | get_mapping_action | indices.get_mapping | [/_mapping, /{index}/{type}/_mapping, /{index}/_mapping, /{index}/_mappings, /{index}/_mappings/{type}, /{index}/_mapping/{type}, /{index}/_mapping/{type}, /_mapping/{type}] |
GET | get_repositories_action | snapshot.get_repository | [/_snapshot, /_snapshot/{repository}] |
GET | get_settings_action | indices.get_settings | [/_settings, /_settings/{name}, /{index}/_settings, /{index}/_settings/{name}] |
GET | get_snapshots_action | snapshot.get | [/_snapshot/{repository}/{snapshot}] |
GET | get_stored_scripts_action | get_script | [/_scripts/{id}] |
GET | get_task_action | tasks.get | [/_tasks/{task_id}] |
GET | indices_segments_action | indices.segments | [/_segments, /{index}/_segments] |
GET | indices_shard_stores_action | indices.shard_stores | [/_shard_stores, /{index}/_shard_stores] |
GET | indices_stats_action | indices.stats | [/_stats, /_stats/{metric}, /{index}/_stats, /{index}/_stats/{metric}] |
GET | ingest_get_pipeline_action | ingest.get_pipeline | [/_ingest/pipeline, /_ingest/pipeline/{id}] |
GET | ingest_processor_grok_get | ingest.processor_grok | [/_ingest/processor/grok] |
GET | ingest_simulate_pipeline_action | ingest.simulate | [/_ingest/pipeline/{id}/_simulate, /_ingest/pipeline/{id}/_simulate, /_ingest/pipeline/_simulate] |
GET | list_tasks_action | tasks.list | [/_tasks] |
GET | main_action | info | [/] |
GET | msearch_action | msearch | [/_msearch, /{index}/_msearch, /{index}/{type}/_msearch] |
GET | multi_search_template_action | msearch_template | [/_msearch/template, /{index}/_msearch/template, /{index}/{type}/_msearch/template] |
GET | nodes_hot_threads_action | nodes.hot_threads | [/_nodes/hot_threads, /_nodes/{nodeId}/hot_threads] |
GET | nodes_info_action | nodes.info | [/_nodes, /_nodes/{nodeId}, /_nodes/{nodeId}/{metrics}, /_nodes/{nodeId}/info/{metrics}] |
GET | nodes_stats_action | nodes.stats | [/_nodes/stats, /_nodes/{nodeId}/stats, /_nodes/stats/{metric}, /_nodes/{nodeId}/stats/{metric}, /_nodes/stats/{metric}/{index_metric}, /_nodes/{nodeId}/stats/{metric}/{index_metric}] |
GET | nodes_usage_action | nodes.usage | [/_nodes/usage, /_nodes/{nodeId}/usage, /_nodes/usage/{metric}, /_nodes/{nodeId}/usage/{metric}] |
GET | pending_cluster_tasks_action | cluster.pending_tasks | [/_cluster/pending_tasks] |
GET | rank_eval_action | rank_eval | [/_rank_eval, /{index}/_rank_eval] |
GET | recovery_action | indices.recovery | [/_recovery, /{index}/_recovery] |
GET | refresh_action | indices.refresh | [/_refresh, /{index}/_refresh] |
GET | remote_cluster_info_action | cluster.remote_info | [/_remote/info] |
GET | render_search_template_action | render_search_template | [/_render/template, /_render/template/{id}] |
GET | script_context_action | get_script_context | [/_script_context] |
GET | script_language_action | get_script_languages | [/_script_language] |
GET | search_action | search | [/_search, /{index}/_search, /{index}/{type}/_search] |
GET | search_scroll_action | scroll | [/_search/scroll, /_search/scroll/{scroll_id}] |
GET | search_template_action | search_template | [/_search/template, /{index}/_search/template, /{index}/{type}/_search/template] |
GET | snapshot_status_action | snapshot.status | [/_snapshot/{repository}/{snapshot}/_status, /_snapshot/{repository}/_status, /_snapshot/_status] |
GET | synced_flush_action | indices.flush_synced | [/_flush/synced, /{index}/_flush/synced] |
GET | upgrade_status_action | indices.get_upgrade | [/_upgrade, /{index}/_upgrade] |
GET | validate_query_action | indices.validate_query | [/_validate/query, /{index}/_validate/query, /{index}/{type}/_validate/query] |
HEAD | document_get_action | exists | [/{index}/_doc/{id}, /{index}/{type}/{id}] |
HEAD | document_get_source_action | exists_source | [/{index}/_source/{id}, /{index}/{type}/{id}/_source] |
HEAD | get_aliases_action | indices.exists_alias | [/_alias, /{index}/_alias] |
HEAD | get_component_template_action | cluster.exists_component_template | [/_component_template, /_component_template/{name}] |
HEAD | get_composable_index_template_action | indices.exists_index_template | [/_index_template, /_index_template/{name}] |
HEAD | get_index_template_action | indices.exists_template | [/_template, /_template/{name}] |
HEAD | get_indices_action | indices.exists | [/{index}] |
HEAD | get_mapping_action | indices.exists_type | [/_mapping, /{index}/{type}/_mapping, /{index}/_mapping, /{index}/_mappings, /{index}/_mappings/{type}, /{index}/_mapping/{type}, /{index}/_mapping/{type}, /_mapping/{type}] |
HEAD | main_action | ping | [/] |
POST | _scripts_painless_execute | scripts_painless_execute | [/_scripts/painless/_execute] |
POST | add_voting_config_exclusions_action | cluster.post_voting_config_exclusions | [/_cluster/voting_config_exclusions/{node_name}, /_cluster/voting_config_exclusions] |
POST | analyze_action | indices.analyze | [/_analyze, /{index}/_analyze] |
POST | bulk_action | bulk | [/_bulk, /{index}/_bulk, /{index}/{type}/_bulk] |
POST | cancel_tasks_action | tasks.cancel | [/_tasks/_cancel, /_tasks/{task_id}/_cancel] |
POST | cleanup_repository_action | snapshot.cleanup_repository | [/_snapshot/{repository}/_cleanup] |
POST | clear_indices_cache_action | indices.clear_cache | [/_cache/clear, /{index}/_cache/clear] |
POST | clone_index_action | indices.clone | [/{index}/_clone/{target}] |
POST | close_index_action | indices.close | [/_close, /{index}/_close] |
POST | cluster_allocation_explain_action | cluster.allocation_explain | [/_cluster/allocation/explain] |
POST | cluster_reroute_action | cluster.reroute | [/_cluster/reroute] |
POST | cluster_search_shards_action | search_shards | [/_search_shards, /{index}/_search_shards] |
POST | count_action | count | [/_count, /{index}/_count, /{index}/{type}/_count] |
POST | create_snapshot_action | snapshot.create | [/_snapshot/{repository}/{snapshot}] |
POST | delete_by_query_action | delete_by_query | [/{index}/_delete_by_query, /{index}/{type}/_delete_by_query] |
POST | document_create_action_auto_id | index | [/{index}/_doc, /{index}/{type}] |
POST | document_create_action | create | [/{index}/_create/{id}, /{index}/{type}/{id}/_create] |
POST | document_index_action | index | [/{index}/_doc/{id}, /{index}/{type}/{id}] |
POST | document_mget_action | mget | [/_mget, /{index}/_mget, /{index}/{type}/_mget] |
POST | document_multi_term_vectors_action | mtermvectors | [/_mtermvectors, /{index}/_mtermvectors, /{index}/{type}/_mtermvectors] |
POST | document_term_vectors_action | termvectors | [/{index}/_termvectors, /{index}/_termvectors/{id}, /{index}/{type}/_termvectors, /{index}/{type}/{id}/_termvectors] |
POST | document_update_action | update | [/{index}/_update/{id}, /{index}/{type}/{id}/_update] |
POST | explain_action | explain | [/{index}/_explain/{id}, /{index}/{type}/{id}/_explain] |
POST | field_capabilities_action | field_caps | [/_field_caps, /{index}/_field_caps] |
POST | flush_action | indices.flush | [/_flush, /{index}/_flush] |
POST | force_merge_action | indices.forcemerge | [/_forcemerge, /{index}/_forcemerge] |
POST | index_put_alias_action | indices.put_alias | [/{index}/_alias/{name}, /_alias/{name}, /_aliases/{name}, /{index}/_alias, /{index}/_aliases, /_alias] |
POST | indices_aliases_action | indices.update_aliases | [/_aliases] |
POST | ingest_simulate_pipeline_action | ingest.simulate | [/_ingest/pipeline/{id}/_simulate, /_ingest/pipeline/_simulate] |
POST | msearch_action | msearch | [/_msearch, /{index}/_msearch, /{index}/{type}/_msearch] |
POST | multi_search_template_action | msearch_template | [/_msearch/template, /{index}/_msearch/template, /{index}/{type}/_msearch/template] |
POST | nodes_reload_action | nodes.reload_secure_settings | [/_nodes/reload_secure_settings, /_nodes/{nodeId}/reload_secure_settings] |
POST | open_index_action | indices.open | [/_open, /{index}/_open] |
POST | put_component_template_action | cluster.put_component_template | [/_component_template/{name}] |
POST | put_composable_index_template_action | indices.put_index_template | [/_index_template/{name}] |
POST | put_index_template_action | indices.put_template | [/_template/{name}] |
POST | put_mapping_action | indices.put_mapping | [/{index}/_mapping/, /{index}/{type}/_mapping, /{index}/_mapping/{type}, /_mapping/{type}, /{index}/_mappings/, /{index}/{type}/_mappings, /{index}/_mappings/{type}, /_mappings/{type}] |
POST | put_repository_action | snapshot.create_repository | [/_snapshot/{repository}] |
POST | put_stored_script_action | put_script | [/_scripts/{id}, /_scripts/{id}/{context}] |
POST | rank_eval_action | rank_eval | [/_rank_eval, /{index}/_rank_eval] |
POST | refresh_action | indices.refresh | [/_refresh, /{index}/_refresh] |
POST | reindex_action | reindex | [/_reindex] |
POST | render_search_template_action | render_search_template | [/_render/template, /_render/template/{id}] |
POST | rest_handler_security_user_add | security.users.create | [/_security/users] |
POST | restore_snapshot_action | snapshot.restore | [/_snapshot/{repository}/{snapshot}/_restore] |
POST | rethrottle_action | delete_by_query_rethrottle | [/_update_by_query/{taskId}/_rethrottle, /_delete_by_query/{taskId}/_rethrottle, /_reindex/{taskId}/_rethrottle] |
POST | rethrottle_action | reindex_rethrottle | [/_update_by_query/{taskId}/_rethrottle, /_delete_by_query/{taskId}/_rethrottle, /_reindex/{taskId}/_rethrottle] |
POST | rethrottle_action | update_by_query_rethrottle | [/_update_by_query/{taskId}/_rethrottle, /_delete_by_query/{taskId}/_rethrottle, /_reindex/{taskId}/_rethrottle] |
POST | rollover_index_action | indices.rollover | [/{index}/_rollover, /{index}/_rollover/{new_index}] |
POST | search_action | search | [/_search, /{index}/_search, /{index}/{type}/_search] |
POST | search_scroll_action | scroll | [/_search/scroll, /_search/scroll/{scroll_id}] |
POST | search_template_action | search_template | [/_search/template, /{index}/_search/template, /{index}/{type}/_search/template] |
POST | shrink_index_action | indices.shrink | [/{index}/_shrink/{target}] |
POST | simulate_index_template_action | indices.simulate_index_template | [/_index_template/_simulate_index/{name}] |
POST | split_index_action | indices.split | [/{index}/_split/{target}] |
POST | synced_flush_action | indices.flush_synced | [/_flush/synced, /{index}/_flush/synced] |
POST | update_by_query_action | update_by_query | [/{index}/_update_by_query, /{index}/{type}/_update_by_query] |
POST | upgrade_action | indices.upgrade | [/_upgrade, /{index}/_upgrade] |
POST | validate_query_action | indices.validate_query | [/_validate/query, /{index}/_validate/query, /{index}/{type}/_validate/query] |
POST | verify_repository_action | snapshot.verify_repository | [/_snapshot/{repository}/_verify] |
PUT | bulk_action | bulk | [/_bulk, /{index}/_bulk, /{index}/{type}/_bulk] |
PUT | clone_index_action | indices.clone | [/{index}/_clone/{target}] |
PUT | cluster_update_settings_action | cluster.put_settings | [/_cluster/settings] |
PUT | create_index_action | indices.create | [/{index}] |
PUT | create_snapshot_action | snapshot.create | [/_snapshot/{repository}/{snapshot}] |
PUT | document_create_action | create | [/{index}/_create/{id}, /{index}/{type}/{id}/_create] |
PUT | document_index_action | index | [/{index}/_doc/{id}, /{index}/{type}/{id}] |
PUT | index_put_alias_action | indices.put_alias | [/{index}/_alias/{name}, /_alias/{name}, /_aliases/{name}, /{index}/_alias, /{index}/_aliases, /_alias] |
PUT | ingest_put_pipeline_action | ingest.put_pipeline | [/_ingest/pipeline/{id}] |
PUT | put_component_template_action | cluster.put_component_template | [/_component_template/{name}] |
PUT | put_composable_index_template_action | indices.put_index_template | [/_index_template/{name}] |
PUT | put_index_template_action | indices.put_template | [/_template/{name}] |
PUT | put_mapping_action | indices.put_mapping | [/{index}/_mapping/, /{index}/{type}/_mapping, /{index}/_mapping/{type}, /_mapping/{type}, /{index}/_mappings/, /{index}/{type}/_mappings, /{index}/_mappings/{type}, /_mappings/{type}] |
PUT | put_repository_action | snapshot.create_repository | [/_snapshot/{repository}] |
PUT | put_stored_script_action | put_script | [/_scripts/{id}, /_scripts/{id}/{context}] |
PUT | rest_handler_security_user_update | security.users.modify | [/_security/users/{username}] |
PUT | shrink_index_action | indices.shrink | [/{index}/_shrink/{target}] |
PUT | split_index_action | indices.split | [/{index}/_split/{target}] |
PUT | update_settings_action | indices.put_settings | [/{index}/_settings, /_settings] |
Backup and restore
Take a snapshot of a running Elasticsearch cluster to back it up.
Snapshot storage requires a ReadWriteMany (RWX) access mode that specifies that the volume can be mounted as read/write by many nodes.
To use the Elasticsearch cluster snapshot storage, declare its use in the elasticsearch section of the AutomationBase custom resource by defining a snapshotStores element as shown in the following example.
This example creates a 10 GB volume that is mounted at /usr/share/elasticsearch/snapshots/main by using the csi-cephfs StorageClass:
spec:
  elasticsearch:
    snapshotStores:
    - name: main
      storage:
        class: "csi-cephfs"
        size: "10Gi"
See the following example of the spec.elasticsearch.snapshotStores.storage that uses a volumeClaimTemplate:
spec:
elasticsearch:
license:
accept: true
version: "v1.0"
snapshotStores:
- name: main
storage:
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteMany
volumeMode: Filesystem
resources:
requests:
storage: 50Gi
storageClassName: "myStorageClass"
selector:
matchLabels:
es-storage-type: data
Note: Only the spec section of the PersistentVolumeClaim that is provided for the volumeClaimTemplate field is used when creating the PersistentVolumeClaim resource. Any metadata entries provided under the volumeClaimTemplate are not used.
- The definition of a snapshot store allocates storage for a snapshot repository that is mounted at a path by using the provided name in the following form: /usr/share/elasticsearch/snapshots/<name>.
- Define one or more snapshot stores within the snapshotStores array. This definition does not create the Elasticsearch snapshot repositories but only the volumes for them.
When you use a shared file system to store snapshots, add the file system path or parent directory to the path.repo setting in the elasticsearch.yml file for each main and data node. Define these values in the Elasticsearch custom resource by defining a config element as shown in the following example:
spec:
  elasticsearch:
    snapshotStores:
    - name: main
      storage:
        class: "csi-cephfs"
        size: "10Gi"
    nodegroupspecs:
    - name: data
      replicas: 3
      config:
      - key: path.repo
        value: "/usr/share/elasticsearch/snapshots/main"
If you configure more than one snapshot store, enter the file system paths with a single path.repo config element as shown in the following example:
spec:
  elasticsearch:
    nodegroupspecs:
    - name: data
      replicas: 3
      config:
      - key: path.repo
        value: '["/usr/share/elasticsearch/snapshots/main", "/usr/share/elasticsearch/snapshots/temporary"]'
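The path.repo value follows directly from the snapshot store names, so it can be generated rather than typed by hand. The following Python sketch is illustrative only: the helper name path_repo_value is hypothetical, and it assumes that every store is mounted under /usr/share/elasticsearch/snapshots/<name> as described above.

```python
import json

# Mount root taken from the path form described above.
SNAPSHOT_ROOT = "/usr/share/elasticsearch/snapshots"


def path_repo_value(store_names):
    """Return the path.repo value for the given snapshot store names (hypothetical helper)."""
    if not store_names:
        raise ValueError("at least one snapshot store name is required")
    paths = [f"{SNAPSHOT_ROOT}/{name}" for name in store_names]
    # One store is written as a plain path; several stores as a JSON array string.
    return paths[0] if len(paths) == 1 else json.dumps(paths)


print(path_repo_value(["main"]))
print(path_repo_value(["main", "temporary"]))
```

The second call produces the same array string that is used in the multi-store example.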
Then, create the snapshot repositories with the Elasticsearch REST API. The REST API references the mount path of the storage.
For more information, see Snapshot and restore.
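As a minimal sketch of that step, the following Python code builds the two REST requests that are typically involved: registering a shared file system (fs) repository with PUT /_snapshot/{repository}, and starting a snapshot with PUT /_snapshot/{repository}/{snapshot}. Both paths appear in the allowed API list. The host name, repository name, and snapshot name are hypothetical placeholders, and the requests are only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical values for illustration; replace them with your own.
ES_URL = "https://elasticsearch.example.com"
REPOSITORY = "main-repo"
MOUNT_PATH = "/usr/share/elasticsearch/snapshots/main"


def build_repository_request(base_url, repository, location):
    """Build a PUT /_snapshot/{repository} request for an fs repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_snapshot_request(base_url, repository, snapshot):
    """Build a PUT /_snapshot/{repository}/{snapshot} request that starts a snapshot."""
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}/{snapshot}",
        method="PUT",
    )


request = build_repository_request(ES_URL, REPOSITORY, MOUNT_PATH)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left out on purpose: urllib.request.urlopen(request) would also need
# Basic Authentication credentials and trust in the cluster's TLS certificate.
```

The location must match a path that is listed in path.repo, otherwise Elasticsearch rejects the repository registration.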
Specify the following child elements as part of the snapshotStores array:
Element | Description |
---|---|
name | Name of the snapshot store, which is also used in the mount path |
storage | A storage object |
The child elements that you can specify as part of the storage object are the same elements that you can define in the Storage section.
Monitoring
By default, monitoring capabilities for Elasticsearch are disabled. To enable them, deploy your Elasticsearch custom resource as shown in the following example:
spec:
  elasticsearch:
    monitoring: {}
This feature deploys resources necessary to export the Elasticsearch metrics from the running instance to a Prometheus-friendly format and exposes an endpoint with these metrics. It also deploys a resource to gather these metrics into OpenShift Monitoring.
To enable the necessary monitoring in OpenShift as of version 4.6, follow the instructions in Enabling monitoring for user-defined projects to set up the correct ConfigMap configurations.
After monitoring is enabled, go to Monitoring > Metrics to view the extracted metrics.
The following metrics for Elasticsearch can be queried.
Name | Type | Cardinality | Help |
---|---|---|---|
elasticsearch_breakers_estimated_size_bytes | gauge | 4 | Estimated size in bytes of breaker |
elasticsearch_breakers_limit_size_bytes | gauge | 4 | Limit size in bytes for breaker |
elasticsearch_breakers_tripped | counter | 4 | tripped for breaker |
elasticsearch_cluster_health_active_primary_shards | gauge | 1 | The number of primary shards in your cluster. This is an aggregate total across all indices. |
elasticsearch_cluster_health_active_shards | gauge | 1 | Aggregate total of all shards across all indices, which includes replica shards. |
elasticsearch_cluster_health_delayed_unassigned_shards | gauge | 1 | Shards delayed to reduce reallocation overhead |
elasticsearch_cluster_health_initializing_shards | gauge | 1 | Count of shards that are being freshly created. |
elasticsearch_cluster_health_number_of_data_nodes | gauge | 1 | Number of data nodes in the cluster. |
elasticsearch_cluster_health_number_of_in_flight_fetch | gauge | 1 | The number of ongoing shard info requests. |
elasticsearch_cluster_health_number_of_nodes | gauge | 1 | Number of nodes in the cluster. |
elasticsearch_cluster_health_number_of_pending_tasks | gauge | 1 | Cluster level changes, which have not yet been run |
elasticsearch_cluster_health_task_max_waiting_in_queue_millis | gauge | 1 | Max time in millis that a task is waiting in queue. |
elasticsearch_cluster_health_relocating_shards | gauge | 1 | The number of shards that are currently moving from one node to another node. |
elasticsearch_cluster_health_status | gauge | 3 | Whether all primary and replica shards are allocated. |
elasticsearch_cluster_health_timed_out | gauge | 1 | Number of cluster health checks timed out |
elasticsearch_cluster_health_unassigned_shards | gauge | 1 | The number of shards that exist in the cluster state, but cannot be found in the cluster itself. |
elasticsearch_filesystem_data_available_bytes | gauge | 1 | Available space on block device in bytes |
elasticsearch_filesystem_data_free_bytes | gauge | 1 | Free space on block device in bytes |
elasticsearch_filesystem_data_size_bytes | gauge | 1 | Size of block device in bytes |
elasticsearch_filesystem_io_stats_device_operations_count | gauge | 1 | Count of disk operations |
elasticsearch_filesystem_io_stats_device_read_operations_count | gauge | 1 | Count of disk read operations |
elasticsearch_filesystem_io_stats_device_write_operations_count | gauge | 1 | Count of disk write operations |
elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | gauge | 1 | Total kilobytes read from disk |
elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | gauge | 1 | Total kilobytes written to disk |
elasticsearch_indices_docs | gauge | 1 | Count of documents on this node |
elasticsearch_indices_docs_deleted | gauge | 1 | Count of deleted documents on this node |
elasticsearch_indices_docs_primary | gauge | | Count of documents with only primary shards on all nodes |
elasticsearch_indices_fielddata_evictions | counter | 1 | Evictions from field data |
elasticsearch_indices_fielddata_memory_size_bytes | gauge | 1 | Field data cache memory usage in bytes |
elasticsearch_indices_filter_cache_evictions | counter | 1 | Evictions from filter cache |
elasticsearch_indices_filter_cache_memory_size_bytes | gauge | 1 | Filter cache memory usage in bytes |
elasticsearch_indices_flush_time_seconds | counter | 1 | Cumulative flush time in seconds |
elasticsearch_indices_flush_total | counter | 1 | Total flushes |
elasticsearch_indices_get_exists_time_seconds | counter | 1 | Total time get exists in seconds |
elasticsearch_indices_get_exists_total | counter | 1 | Total get exists operations |
elasticsearch_indices_get_missing_time_seconds | counter | 1 | Total time of get missing in seconds |
elasticsearch_indices_get_missing_total | counter | 1 | Total get missing |
elasticsearch_indices_get_time_seconds | counter | 1 | Total get time in seconds |
elasticsearch_indices_get_total | counter | 1 | Total get |
elasticsearch_indices_indexing_delete_time_seconds_total | counter | 1 | Total time indexing delete in seconds |
elasticsearch_indices_indexing_delete_total | counter | 1 | Total indexing deletes |
elasticsearch_indices_indexing_index_time_seconds_total | counter | 1 | Cumulative index time in seconds |
elasticsearch_indices_indexing_index_total | counter | 1 | Total index calls |
elasticsearch_indices_merges_docs_total | counter | 1 | Cumulative docs merged |
elasticsearch_indices_merges_total | counter | 1 | Total merges |
elasticsearch_indices_merges_total_size_bytes_total | counter | 1 | Total merge size in bytes |
elasticsearch_indices_merges_total_time_seconds_total | counter | 1 | Total time spent merging in seconds |
elasticsearch_indices_query_cache_cache_total | counter | 1 | Count of query cache |
elasticsearch_indices_query_cache_cache_size | gauge | 1 | Size of query cache |
elasticsearch_indices_query_cache_count | counter | 2 | Count of query cache hit/miss |
elasticsearch_indices_query_cache_evictions | counter | 1 | Evictions from query cache |
elasticsearch_indices_query_cache_memory_size_bytes | gauge | 1 | Query cache memory usage in bytes |
elasticsearch_indices_query_cache_total | counter | 1 | Size of query cache total |
elasticsearch_indices_refresh_time_seconds_total | counter | 1 | Total time spent refreshing in seconds |
elasticsearch_indices_refresh_total | counter | 1 | Total refreshes |
elasticsearch_indices_request_cache_count | counter | 2 | Count of request cache hit/miss |
elasticsearch_indices_request_cache_evictions | counter | 1 | Evictions from request cache |
elasticsearch_indices_request_cache_memory_size_bytes | gauge | 1 | Request cache memory usage in bytes |
elasticsearch_indices_search_fetch_time_seconds | counter | 1 | Total search fetch time in seconds |
elasticsearch_indices_search_fetch_total | counter | 1 | Total number of fetches |
elasticsearch_indices_search_query_time_seconds | counter | 1 | Total search query time in seconds |
elasticsearch_indices_search_query_total | counter | 1 | Total number of queries |
elasticsearch_indices_segments_count | gauge | 1 | Count of index segments on this node |
elasticsearch_indices_segments_memory_bytes | gauge | 1 | Current memory size of segments in bytes |
elasticsearch_indices_settings_stats_read_only_indices | gauge | 1 | Count of indices that have read_only_allow_delete=true |
elasticsearch_indices_shards_docs | gauge | 3 | Count of documents on this shard |
elasticsearch_indices_shards_docs_deleted | gauge | 3 | Count of deleted documents on each shard |
elasticsearch_indices_store_size_bytes | gauge | 1 | Current size of stored index data in bytes |
elasticsearch_indices_store_size_bytes_primary | gauge | | Current size of stored index data in bytes with only primary shards on all nodes |
elasticsearch_indices_store_size_bytes_total | gauge | | Current size of stored index data in bytes with all shards on all nodes |
elasticsearch_indices_store_throttle_time_seconds_total | counter | 1 | Throttle time for index store in seconds |
elasticsearch_indices_translog_operations | counter | 1 | Total translog operations |
elasticsearch_indices_translog_size_in_bytes | counter | 1 | Total translog size in bytes |
elasticsearch_indices_warmer_time_seconds_total | counter | 1 | Total warmer time in seconds |
elasticsearch_indices_warmer_total | counter | 1 | Total warmer count |
elasticsearch_jvm_gc_collection_seconds_count | counter | 2 | Count of JVM GC runs |
elasticsearch_jvm_gc_collection_seconds_sum | counter | 2 | GC run time in seconds |
elasticsearch_jvm_memory_committed_bytes | gauge | 2 | JVM memory currently committed by area |
elasticsearch_jvm_memory_max_bytes | gauge | 1 | JVM memory max |
elasticsearch_jvm_memory_used_bytes | gauge | 2 | JVM memory currently used by area |
elasticsearch_jvm_memory_pool_used_bytes | gauge | 3 | JVM memory currently used by pool |
elasticsearch_jvm_memory_pool_max_bytes | counter | 3 | JVM memory max by pool |
elasticsearch_jvm_memory_pool_peak_used_bytes | counter | 3 | JVM memory peak used by pool |
elasticsearch_jvm_memory_pool_peak_max_bytes | counter | 3 | JVM memory peak max by pool |
elasticsearch_os_cpu_percent | gauge | 1 | Percent CPU used by the OS |
elasticsearch_os_load1 | gauge | 1 | Short term load average |
elasticsearch_os_load5 | gauge | 1 | Midterm load average |
elasticsearch_os_load15 | gauge | 1 | Long term load average |
elasticsearch_process_cpu_percent | gauge | 1 | Percent CPU used by process |
elasticsearch_process_cpu_time_seconds_sum | counter | 3 | Process CPU time in seconds |
elasticsearch_process_mem_resident_size_bytes | gauge | 1 | Resident memory in use by process in bytes |
elasticsearch_process_mem_share_size_bytes | gauge | 1 | Shared memory in use by process in bytes |
elasticsearch_process_mem_virtual_size_bytes | gauge | 1 | Total virtual memory used in bytes |
elasticsearch_process_open_files_count | gauge | 1 | Open file descriptors |
elasticsearch_snapshot_stats_number_of_snapshots | gauge | 1 | Total number of snapshots |
elasticsearch_snapshot_stats_oldest_snapshot_timestamp | gauge | 1 | Oldest snapshot timestamp |
elasticsearch_snapshot_stats_snapshot_start_time_timestamp | gauge | 1 | Last snapshot start timestamp |
elasticsearch_snapshot_stats_snapshot_end_time_timestamp | gauge | 1 | Last snapshot end timestamp |
elasticsearch_snapshot_stats_snapshot_number_of_failures | gauge | 1 | Last snapshot number of failures |
elasticsearch_snapshot_stats_snapshot_number_of_indices | gauge | 1 | Last snapshot number of indices |
elasticsearch_snapshot_stats_snapshot_failed_shards | gauge | 1 | Last snapshot failed shards |
elasticsearch_snapshot_stats_snapshot_successful_shards | gauge | 1 | Last snapshot successful shards |
elasticsearch_snapshot_stats_snapshot_total_shards | gauge | 1 | Last snapshot total shards |
elasticsearch_thread_pool_active_count | gauge | 14 | Thread Pool threads active |
elasticsearch_thread_pool_completed_count | counter | 14 | Thread Pool operations completed |
elasticsearch_thread_pool_largest_count | gauge | 14 | Thread Pool largest threads count |
elasticsearch_thread_pool_queue_count | gauge | 14 | Thread Pool operations queued |
elasticsearch_thread_pool_rejected_count | counter | 14 | Thread Pool operations rejected |
elasticsearch_thread_pool_threads_count | gauge | 14 | Thread Pool current threads count |
elasticsearch_transport_rx_packets_total | counter | 1 | Count of packets received |
elasticsearch_transport_rx_size_bytes_total | counter | 1 | Total number of bytes received |
elasticsearch_transport_tx_packets_total | counter | 1 | Count of packets sent |
elasticsearch_transport_tx_size_bytes_total | counter | 1 | Total number of bytes sent |
elasticsearch_clusterinfo_last_retrieval_success_ts | gauge | 1 | Timestamp of the last successful cluster info retrieval |
elasticsearch_clusterinfo_up | gauge | 1 | Up metric for the cluster info collector |
elasticsearch_clusterinfo_version_info | gauge | 6 | Constant metric with ES version information as labels |
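Because disk usage is not monitored for you, the two file system gauges in the table are a natural starting point for a usage check. The following Python sketch computes the used fraction per node from metrics in the Prometheus text exposition format. The sample values, the name label, and the 75% warning threshold are assumptions made for illustration; the labels on your metrics might differ.

```python
import re

AVAILABLE = "elasticsearch_filesystem_data_available_bytes"
SIZE = "elasticsearch_filesystem_data_size_bytes"

# Made-up sample data; the label name and the values are illustrative only.
SAMPLE = """\
elasticsearch_filesystem_data_available_bytes{name="es-data-0"} 2500000000
elasticsearch_filesystem_data_size_bytes{name="es-data-0"} 10000000000
elasticsearch_filesystem_data_available_bytes{name="es-data-1"} 9000000000
elasticsearch_filesystem_data_size_bytes{name="es-data-1"} 10000000000
"""

LINE = re.compile(r'^(\w+)\{name="([^"]+)"\}\s+(\S+)$')


def disk_used_fraction(text):
    """Return {node name: fraction of the data file system that is in use}."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            metric, node, value = match.groups()
            values.setdefault(node, {})[metric] = float(value)
    fractions = {}
    for node, metrics in values.items():
        available = metrics.get(AVAILABLE)
        size = metrics.get(SIZE)
        # Skip nodes where either gauge is missing or the size is zero.
        if available is not None and size:
            fractions[node] = 1 - available / size
    return fractions


WARN_AT = 0.75  # assumed threshold, not a product default
for node, used in disk_used_fraction(SAMPLE).items():
    flag = "WARNING" if used >= WARN_AT else "ok"
    print(f"{node}: {used:.0%} used ({flag})")
```

In the OpenShift metrics view, a comparable query would be an expression such as 1 - elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes.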
Audit logging
Enable the custom security plug-in that is provided with the Elasticsearch cluster in IBM Automation foundation to log authorization audit records in Cloud Auditing Data Format (CADF).
The authorization audit logging is disabled by default. To enable it, send a REST request to the Elasticsearch system.
curl -X PUT -u "<USERNAME>:<PASSWORD>" "http://<your_elasticsearch_host>/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d'{
    "transient": {
      "logger.com.ibm.elasticsearch.audit": "AUDIT"
    }
  }'
To turn off the logging for authorization, send a REST request to the Elasticsearch system.
curl -X PUT -u "<USERNAME>:<PASSWORD>" "http://<your_elasticsearch_host>/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d'{
    "transient": {
      "logger.com.ibm.elasticsearch.audit": "OFF"
    }
  }'
The audit logs for each container are written to the base logs directory in the <cluster_name>_audit.json file. For example, if the Elasticsearch instance is called elasticsearch-sample, the log file is /usr/share/elasticsearch/storage/logs/elasticsearch-sample-elasticsearch-cluster_audit.json.
See the following example of a CADF authorization audit log message:
{
  "outcome": "success",
  "typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
  "eventType": "activity",
  "eventTime": "2021-01-12T12:33:21.409062Z",
  "action": "authenticate",
  "requestPath": "/customer/_doc/1",
  "id": "elasticsearch:ccb7c058-0c32-443c-a6ee-acab813ad978",
  "severity": "normal",
  "initiator": {
    "id": "elasticsearch:d367b08e-af48-456f-9f3b-4e325c85a098",
    "name": "elasticsearch-admin",
    "typeURI": "service/security/account/user",
    "host": {
      "agent": "PostmanRuntime/7.26.8",
      "address": "/127.0.0.1"
    },
    "credential": {
      "type": "user"
    }
  },
  "target": {
    "id": "elasticsearch:elasticsearch-sample-elasticsearch-es-master-data-0",
    "name": "elasticsearch-sample-elasticsearch-es-master-data-0",
    "typeURI": "service/security/account/user"
  },
  "observer": {
    "name": "ElasticSearchSecurityPlugin",
    "id": "userActivity",
    "typeURI": "service/security/elasticsearch"
  },
  "reason": {
    "reasonCode": 200,
    "reasonType": "OK"
  }
}
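Records in this shape are straightforward to summarize. The following Python sketch counts audit records per initiator and outcome. It assumes that the audit file holds one JSON record per line, which this document does not state, and the two shortened sample records (including the failure outcome and the unknown-user name) are made up for illustration.

```python
import json
from collections import Counter

# Shortened, made-up records that reuse field names from the example above.
SAMPLE_RECORDS = [
    {"outcome": "success", "action": "authenticate", "requestPath": "/customer/_doc/1",
     "initiator": {"name": "elasticsearch-admin"}, "reason": {"reasonCode": 200}},
    {"outcome": "failure", "action": "authenticate", "requestPath": "/customer/_doc/2",
     "initiator": {"name": "unknown-user"}, "reason": {"reasonCode": 401}},
]
# Assumption: the audit file contains one JSON record per line.
SAMPLE_LINES = [json.dumps(record) for record in SAMPLE_RECORDS]


def summarize(lines):
    """Count audit records per (initiator name, outcome) pair."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        counts[(record["initiator"]["name"], record["outcome"])] += 1
    return counts


for (user, outcome), count in sorted(summarize(SAMPLE_LINES).items()):
    print(f"{user}: {outcome} x{count}")
```

To run the same summary against a real log, pass an open file object for the audit file instead of SAMPLE_LINES.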
JVM options
The JVM that is used within the Elasticsearch container is set up with the default Elasticsearch JVM settings. By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. When you move to production, it is important to configure the heap size to ensure that Elasticsearch has enough heap available. Specify the JVM options by providing an ES_JAVA_OPTS environment variable for each of the containers.
The following example demonstrates the use of the AutomationBase custom resource to set the ES_JAVA_OPTS environment variable, which specifies the Java minimum and maximum heap size for the master-data containers.
spec:
  elasticsearch:
    nodegroupspecs:
    - name: master-data
      replicas: 3
      storage: {}
      template:
        pod:
          spec:
            containers:
            - env:
              - name: ES_JAVA_OPTS
                value: '-Xms2g -Xmx2g'
              name: elasticsearch
              resources:
                limits:
                  cpu: 1100m
                  memory: 5Gi
                requests:
                  cpu: 900m
                  memory: 3Gi
Note: When you set the Xms (minimum heap size) and Xmx (maximum heap size) settings, you must set these to be equal to each other. Also, you must set Xmx and Xms to no more than 50% of your physical RAM.
For more information, see the Setting the heap size documentation.
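The two rules in the note can be checked before a value is written into the custom resource. The following Python sketch is a hypothetical helper, not part of the product. It treats the container memory limit as the memory that is available to the JVM, which is an assumption; the note itself refers to physical RAM.

```python
def es_java_opts(memory_limit_gib, heap_gib):
    """Return an ES_JAVA_OPTS value with equal minimum and maximum heap sizes.

    memory_limit_gib: memory available to the container, in GiB (assumed to
    stand in for physical RAM).
    heap_gib: requested heap size in whole GiB.
    """
    if heap_gib <= 0:
        raise ValueError("heap size must be positive")
    if heap_gib * 2 > memory_limit_gib:
        raise ValueError("heap must be no more than 50% of the available memory")
    # Xms and Xmx are set to the same value, as the note requires.
    return f"-Xms{heap_gib}g -Xmx{heap_gib}g"


# Values from the example above: a 5Gi memory limit and a 2 GB heap.
print(es_java_opts(memory_limit_gib=5, heap_gib=2))
```

With a 5 GiB limit, a request for a 3 GB heap raises a ValueError because it exceeds the 50% rule.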
Elasticsearch on multi-zone clusters
The Elasticsearch that is included in IBM Automation foundation works on multi-zone clusters and provides resilience against whole-zone failures, but it might not maintain optimal performance. Performance depends on a low-latency, high-bandwidth connection between the data centres that host the nodes.
For more details, refer to the linked documentation.
The cross-cluster replication feature, which is required to implement a performance-optimised multi-zone deployment, is not available in the current version of Elasticsearch.
Support for the latest Elasticsearch version
IBM Automation foundation v1.3.0, with Elasticsearch Operator version 1.3.0, includes a new operand version, 2.0.0, for ElasticSearch CRs.
The 2.0.0 operand version (and the corresponding v2 operand channel) uses the latest Elasticsearch (ELv2 v7.15.1). Set the v2 operand on the elasticsearch element in the AutomationBase CR. For example:
apiVersion: base.automation.ibm.com/v1beta1
kind: AutomationBase
metadata:
name: iaf-automationbase-instance
namespace: acme-iaf
spec:
elasticsearch:
license:
accept: true
version: v1.0
monitoring: {}
nodegroupspecs:
- name: master-data
replicas: 3
tls: {}
version: v2
kafka: {}
license:
accept: true
tls: {}
version: v1
If the v1 channel is used, Elasticsearch v7.8.0 continues to be used.
Note: After the v2 operand is used, it is not possible to easily move back to v1. This is because the resources that are created by Elasticsearch 7.15.1 are not valid on 7.8.0. When you upgrade from the v1 to the v2 operator channel, it is highly recommended to back up the Elasticsearch database before you modify the operand to use v2.