Operator ElasticsearchIndex

The ElasticsearchIndex operator receives incoming tuples and stores the name-value pairs of the tuple attributes as JSON documents in a specified index of an Elasticsearch database. It uses the Elasticsearch REST API to connect to the server.

The operator connects to an Elasticsearch cluster through either a single host name and port or a list of hostname:port pairs. By default, the server is 'localhost' and the port is 9200. Use the 'nodeList' parameter to change this configuration.

An index to write the documents to must be specified with either the 'indexName' or the 'indexNameAttribute' parameter. The document id can be specified with the 'idName' or the 'idNameAttribute' parameter. If the id is not specified, the Elasticsearch server generates it automatically. A 'timestampName' can optionally be specified to add timestamps to the indexed documents, which helps with time-based document queries. Once the data is written to Elasticsearch, the user can query the database and build custom graphs from the data with tools such as Grafana and Kibana.

Details on index creation
Behavior in a consistent region
Example for guaranteed processing with exactly-once semantics

Summary

Ports
This operator has 1 input port and 0 output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 27 parameters.

Optional: appConfigName, bulkSize, connectionTimeout, documentAttribute, hostName, hostPort, idName, idNameAttribute, indexName, indexNameAttribute, maxConnectionIdleTime, nodeList, password, readTimeout, reconnectionPolicyCount, sslDebug, sslEnabled, sslTrustAllCertificates, sslTrustStore, sslTrustStorePassword, sslVerifyHostname, storeTimestamps, timestampName, timestampValueAttribute, typeName, typeNameAttribute, userName

Metrics
This operator reports 4 metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

Port that ingests tuples, which are converted to JSON format and then stored in Elasticsearch as documents. Each attribute in the input schema becomes a document attribute: the name of the JSON attribute is the name of the Streams tuple attribute, and the value is taken from the attribute's value. Some attributes may get special treatment to serve as document id, index name, type name, or timestamp provider. See the parameters 'indexNameAttribute', 'idNameAttribute', 'typeNameAttribute', and 'timestampValueAttribute' for details. These attributes do not become document attributes in the indexed Elasticsearch document. The operator supports the following SPL attribute types: boolean, rstring, ustring, bstring, all float types, and all int and uint types. If the input port schema uses other types, the operator does not start and emits an error message for the attribute with the unsupported type.
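
The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the operator's Java implementation; the function name `tuple_to_document`, the `special_attrs` argument, and the sample attribute names are hypothetical.

```python
import json


def tuple_to_document(tuple_attrs, special_attrs=()):
    """Build the JSON document for one tuple (illustrative sketch only).

    tuple_attrs:   dict of attribute name -> value, standing in for a tuple
    special_attrs: names of attributes that act as id, index name, type
                   name or timestamp provider and are left out of the document
    """
    document = {
        name: value
        for name, value in tuple_attrs.items()
        if name not in special_attrs
    }
    return json.dumps(document)


# Hypothetical tuple: 'targetIndex' plays the role of an index name attribute.
incoming = {"sensorId": "s-17", "temperature": 21.5, "targetIndex": "readings"}
print(tuple_to_document(incoming, special_attrs={"targetIndex"}))
```

The attribute that carries the index name is consumed for routing and therefore does not show up in the printed document.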

Parameters

appConfigName

Specifies the name of the application configuration that contains Elasticsearch connection related configuration parameters. The keys in the application configuration have the same name as the operator parameters. The following keys are supported: userName, password, hostName, hostPort, nodeList, reconnectionPolicyCount, sslEnabled, sslDebug, sslTrustAllCertificates, sslVerifyHostname, sslTrustStore, sslTrustStorePassword. If a value is specified both in the application configuration and as an operator parameter, the application configuration value takes precedence.
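
The precedence rule can be pictured as a dictionary merge in which the application configuration is applied last. The following Python sketch only illustrates that rule; `effective_settings` and the sample values are hypothetical, and the operator's actual lookup logic may differ.

```python
def effective_settings(operator_params, app_config):
    """Merge settings so that application configuration values win.

    Both arguments are plain dicts keyed by parameter name. Keys that
    appear only in one of them are carried over unchanged.
    """
    merged = dict(operator_params)
    merged.update(app_config)  # application configuration takes precedence
    return merged


# Hypothetical values for illustration.
params = {"nodeList": "localhost:9200", "userName": "dev"}
config = {"nodeList": "es1:9200,es2:9200"}
print(effective_settings(params, config))
```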

bulkSize

Specifies the size of the bulk to submit to Elasticsearch. The default value is 1. When the operator is part of a consistent region, this parameter is ignored.
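
For readers unfamiliar with bulk submission, the sketch below groups documents into batches and renders each batch in the newline-delimited JSON layout that the Elasticsearch `_bulk` endpoint accepts (one action line followed by one source line per document). It assumes that the bulk size counts documents; how the operator builds and sends its requests internally is not specified here, and the helper names `batches` and `build_bulk_body` are hypothetical.

```python
import json


def batches(documents, bulk_size):
    """Yield consecutive slices of at most bulk_size documents."""
    for start in range(0, len(documents), bulk_size):
        yield documents[start:start + bulk_size]


def build_bulk_body(index_name, documents):
    """Render documents as a bulk request body in NDJSON form."""
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action
        lines.append(json.dumps(document))                           # source
    return "\n".join(lines) + "\n"  # the body must end with a newline


docs = [{"n": i} for i in range(5)]
for batch in batches(docs, bulk_size=2):
    print(build_bulk_body("readings", batch), end="")
```

With a bulk size of 1, every document travels in its own request, which matches the default described above.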

connectionTimeout

The timeout for waiting on establishment of the TCP connection to the server node. Specified in milliseconds. The default value is 20000 (20 seconds). This parameter can be overwritten by the application configuration.

documentAttribute

Specifies the name of an attribute in the input tuple that contains the document to insert, in JSON format. The 'storeTimestamps' parameter must not be set together with the 'documentAttribute' parameter.

hostName

Specifies the hostname of the Elasticsearch server. The default is 'localhost'. If you specify a protocol prefix like 'http://' or 'https://' it is ignored, because the protocol is determined by the parameter 'sslEnabled'. If that is set to true, HTTPS is used, otherwise HTTP is used. This parameter can be overwritten by the application configuration. NOTE: this parameter is deprecated, use the 'nodeList' parameter instead.
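
The prefix handling can be sketched as follows. This is a hypothetical helper that mirrors the description above, not the operator's code: any 'http://' or 'https://' prefix is dropped, and the scheme is chosen from the SSL flag alone.

```python
def base_url(host_name="localhost", host_port=9200, ssl_enabled=False):
    """Build the REST base URL from host, port and the SSL flag."""
    for prefix in ("http://", "https://"):
        if host_name.startswith(prefix):
            host_name = host_name[len(prefix):]  # the prefix is ignored
            break
    scheme = "https" if ssl_enabled else "http"
    return f"{scheme}://{host_name}:{host_port}"


# The 'https://' prefix does not switch on TLS; only ssl_enabled does.
print(base_url("https://es.example.com", 9200, ssl_enabled=False))
```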

hostPort

Specifies the REST port of the Elasticsearch server. The default port is 9200. This parameter can be overwritten by the application configuration. NOTE: this parameter is deprecated, use the 'nodeList' parameter instead.

idName

Specifies the name for the _id field of the document. If not specified, the _id field is auto-generated by the Elasticsearch server. This parameter is ignored if the 'idNameAttribute' parameter is specified.

idNameAttribute

Specifies the name of an attribute in the input tuple, containing the _id of the document to index. If neither this parameter nor the 'idName' parameter is set, the document _id field is auto-generated by the Elasticsearch server.

indexName

Specifies the name of the Elasticsearch index into which the documents are inserted. If the index does not exist in the Elasticsearch server, the server creates it. However, you should create and configure indices yourself before using them, to avoid automatic creation with properties that do not match the use case, for example an unsuitable mapping or number of shards or replicas. This parameter is ignored if the 'indexNameAttribute' parameter is set.

indexNameAttribute

Specifies the name of an attribute in the input tuple that contains the name of the index to insert the document into. Using this parameter is not recommended: all documents created by an instance of the operator have the same structure, so inserting them into different indices is rarely useful. If you need to insert documents with the same structure into different indices, you can always use multiple 'ElasticsearchIndex' operator instances with different 'indexName' parameters and a Split operator in front of them to route the documents. This parameter might be removed in the future.

maxConnectionIdleTime

If the TCP connection to a server node is idle for this length of time, it is closed. Specified in milliseconds. The default value is 1500 (1.5 seconds). This parameter can be overwritten by the application configuration.

nodeList

Specifies a list of Elasticsearch nodes to use for operations. The nodes must be part of the same cluster. The format is a comma-separated list of hostname:port entries, for example 'host1:9200,host2:9200'. This parameter is ignored if either of the parameters 'hostName' or 'hostPort' is also specified. If none of these parameters is specified, the default node list is 'localhost:9200'. This parameter can be overwritten by the application configuration.
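
The node list format and its interaction with 'hostName' and 'hostPort' can be sketched like this. The function `resolve_nodes` is hypothetical and assumes that every entry carries an explicit port, as in the example above; the operator's own parsing may be more lenient or stricter.

```python
def resolve_nodes(node_list=None, host_name=None, host_port=None):
    """Return the (host, port) pairs to connect to.

    If hostName or hostPort is given, nodeList is ignored. If nothing
    is given, the default node 'localhost:9200' is used.
    """
    if host_name is not None or host_port is not None:
        return [(host_name or "localhost", host_port or 9200)]
    if not node_list:
        return [("localhost", 9200)]
    nodes = []
    for entry in node_list.split(","):
        # Assumption: each entry has the form hostname:port.
        host, _, port = entry.strip().rpartition(":")
        nodes.append((host, int(port)))
    return nodes


print(resolve_nodes("host1:9200,host2:9200"))
print(resolve_nodes("host1:9200", host_name="legacy-host"))
```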

password

The password used for HTTP basic authentication. If parameter 'sslEnabled' is false, the password is transmitted in cleartext. This parameter can be overwritten by the application configuration.

readTimeout

The timeout for waiting for a REST response from the server node. Specified in milliseconds. The default value is 5000 (5 seconds). This parameter can be overwritten by the application configuration.

reconnectionPolicyCount

Specifies the number of reconnection attempts to the Elasticsearch server upon disconnection. If more than one node is specified in the 'nodeList' parameter, all remaining nodes are tried immediately, before the reconnection count starts. If no node responds, the operator waits for one second and tries to reconnect to a node. During reconnection, nodes are tried in a round-robin fashion. This parameter can be overwritten by the application configuration.
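
One possible reading of this policy is sketched below in Python. It is an interpretation under stated assumptions, not the operator's implementation: each counted attempt is assumed to wait one second and then try a single node, and the names `reconnect`, `try_connect`, and `failed_index` are hypothetical.

```python
import time


def reconnect(nodes, try_connect, reconnection_policy_count,
              failed_index=0, sleep=time.sleep):
    """Return the index of the node that accepted a connection, or None.

    nodes:        list of node identifiers
    try_connect:  callable(node) -> bool, supplied by the caller
    failed_index: index of the node whose connection was lost
    """
    count = len(nodes)
    # Phase 1: try all remaining nodes immediately, without counting.
    for offset in range(1, count):
        index = (failed_index + offset) % count
        if try_connect(nodes[index]):
            return index
    # Phase 2: counted attempts, one second apart, in round-robin order.
    index = failed_index
    for _ in range(reconnection_policy_count):
        sleep(1)
        index = (index + 1) % count
        if try_connect(nodes[index]):
            return index
    return None


# Usage with a stubbed connection check: only 'host2' is reachable.
print(reconnect(["host1", "host2", "host3"], lambda node: node == "host2", 3))
```

Passing `sleep` as an argument keeps the sketch testable without real delays.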

sslDebug

If SSL/TLS protocol debugging is enabled, all protocol data and information is logged to the console. Use this to debug TLS connection problems. The default is 'false'. This parameter can be overwritten by the application configuration.

sslEnabled

Indicates whether SSL/TLS is used to connect to the nodes. The default is 'false'. This parameter can be overwritten by the application configuration.

sslTrustAllCertificates

If set to true, the SSL/TLS layer will not verify the server certificate chain. WARNING: this is insecure and should only be used for debugging purposes. The default is 'false'. This parameter can be overwritten by the application configuration.

sslTrustStore

Specifies the name of a file containing trusted certificates. The format is the common Java truststore format, and you can use the Java keytool command to create and manage truststore files. Use this parameter if the Elasticsearch server certificate is signed by a CA that your current Java version does not trust by default, or if the server uses a self-signed certificate. This parameter can be overwritten by the application configuration.

sslTrustStorePassword

Specifies the password used to access the truststore file specified in the 'sslTrustStore' parameter. This parameter can be overwritten by the application configuration.

sslVerifyHostname

If set to false, the SSL/TLS layer will not verify the hostname in the server certificate against the actual name of the server host. WARNING: this is insecure and should only be used for debugging purposes. The default is 'true'. This parameter can be overwritten by the application configuration.

storeTimestamps

Enables storing timestamps. If enabled, either the current time or the timestamp contained in an attribute of the input tuple is added to the document. The default value is 'false'.

timestampName

If parameter 'storeTimestamps' is true, this parameter specifies the name of the document attribute that will contain the timestamp. The timestamp is generated in the format 'yyyy-MM-ddTHH:mm:ss.SSSZZ' in Java SimpleDateFormat notation.

timestampValueAttribute

If parameter 'storeTimestamps' is true, this parameter specifies an attribute of type int64 in the input tuple containing the timestamp value in Unix format with milliseconds. If the parameter is not specified, the current time is used as timestamp value.
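
The choice between the tuple attribute and the current time, and the conversion from Unix milliseconds to a textual timestamp, can be sketched as follows. The helper names are hypothetical, the sketch always renders UTC, and Python writes the offset as '+00:00'; how the operator renders the 'ZZ' part of its pattern is not confirmed here.

```python
import time
from datetime import datetime, timezone


def pick_timestamp_millis(tuple_attrs, timestamp_value_attribute=None):
    """Use the tuple attribute if one is configured, else the current time."""
    if timestamp_value_attribute is not None:
        return tuple_attrs[timestamp_value_attribute]
    return int(time.time() * 1000)


def format_timestamp(epoch_millis):
    """Render Unix milliseconds as an ISO-8601 style string in UTC."""
    seconds, millis = divmod(epoch_millis, 1000)  # integer math, no rounding
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    moment = moment.replace(microsecond=millis * 1000)
    return moment.isoformat(timespec="milliseconds")


# Hypothetical tuple with an int64 attribute holding Unix milliseconds.
incoming = {"eventTime": 1577836800123, "value": 42}
print(format_timestamp(pick_timestamp_millis(incoming, "eventTime")))
# prints 2020-01-01T00:00:00.123+00:00
```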

typeName

Specifies the name of the mapping type for the document within the Elasticsearch index. If no type is specified, the default type '_doc' is used. This parameter is ignored if the 'typeNameAttribute' parameter is set. Because different mapping types for a single index are no longer allowed in Elasticsearch version 6, and mapping types will be completely removed in Elasticsearch 7, using this parameter is not recommended. It might be removed in future versions of the toolkit.

typeNameAttribute

Specifies the name of an attribute in the input tuple that contains the mapping type of the document to index. Because different types per index are no longer allowed in Elasticsearch 6, and types will be removed completely in Elasticsearch 7, using this parameter is not recommended. It will be removed in future versions of the toolkit. See also the 'typeName' parameter.

userName

The username used for HTTP basic authentication. If parameter 'sslEnabled' is false, the username is transmitted in cleartext. This parameter can be overwritten by the application configuration.

Metrics

isConnected - Gauge

Indicates whether the operator is currently connected to the Elasticsearch server. The value is set to 0 after all cluster nodes have become unreachable and the maximum number of reconnection attempts has failed. Otherwise the value is 1.

numInserts - Counter

The number of times a record has been written to the Elasticsearch server.

reconnectionCount - Counter

The number of times the operator has tried reconnecting to the cluster since the last successful connection. If there are multiple nodes in the cluster, reconnecting to a different node in the cluster is not counted here.

totalFailedRequests - Counter

The number of failed inserts/gets over the lifetime of the operator.

Libraries

Operator class library
Library Path: ../../impl/lib/com.ibm.streamsx.elasticsearch.jar, ../../opt/downloaded/*