Creating an index mapping

Index mapping is the process of defining how a document and its fields are mapped in the search engine, for example, the data type and index characteristics of each field. A correct mapping is necessary to index fields and return results as expected.

Mapping allows you to define fields of type text, numeric, date, or boolean, an object type that groups multiple fields, or an array of any of these types, using the data types supported by the search engine. Along with the data type, mapping allows you to set field characteristics such as whether the field is searchable, returnable, or sortable, how it is stored, its date format, whether strings must be treated as full-text fields, and so on.

Order Service uses Elasticsearch as the search engine. In Elasticsearch, a document in the index contains fields of different data types. You can define explicit mapping at an index level.

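If you do not define an explicit mapping, Elasticsearch detects the data type of each field automatically from the data it receives and applies default indexing properties. This auto-detection of data types and defaulting of indexing properties can result in indexing failures, inaccurate data, or unexpected search results, as illustrated in the following scenarios: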

  • Suppose you do not specify an explicit mapping for a field, and the value in the first document to be indexed is 2016-01-26, which resembles a date but is not intended to be one. Elasticsearch interprets the value as a date and sets the field type to date. If the value of this field in the next document is 2016 01 27 (spaces instead of dashes) or 20160127 (no dashes), Elasticsearch returns an exception indicating that the input text does not conform to the date format, and indexing fails. If, however, the field had been mapped as a string type in a mapping file, Elasticsearch would not have interpreted 2016-01-26 as a date and the indexing failure would not have occurred.
  • Suppose no explicit mapping is assigned for a string field. Elasticsearch interprets it as a text field and treats it as analyzed. An analyzed field is parsed by the built-in standard analyzer, which splits the data into tokens at word boundaries such as spaces, converts each token to lowercase, and stores the result. This may not be how you want to parse your data, and it can produce unexpected or inaccurate search results, or no results at all. The ‘text’ data type is intended to support case-insensitive search and search by a part of the actual data. The Sterling™ Order Management System Software APIs do not support search by a part of the field value. So, to get the same behavior as the Sterling Order Management System Software APIs, you must define the field data type as ‘keyword’.
  • Elasticsearch results have two parts: hits and aggregations. Hits return the original documents that were indexed. Term aggregations, however, are derived from searchable terms. For a keyword field, the searchable term is the same as the original value, whereas for a text field it is any one of the parts that the analyzer derives from the original value. Each part becomes a candidate for aggregation, which leads to incorrect figures. Therefore, any string field on which you plan to aggregate in the future, when Order Search supports aggregations, must have the keyword type.

For these reasons, Sterling Order Management System Software recommends that you define explicit mapping. To index and search the order documents, a pre-defined schema consisting of mappings and settings must be created. This documentation helps you generate the schema according to your business requirements.
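
As a minimal sketch of such an explicit mapping, the following fragment declares two string fields as keyword so that Elasticsearch neither guesses a date type nor analyzes the values. The field names ExampleReferenceCode and ExampleStatusCode are hypothetical and used only for illustration; the type, index, and store attributes follow the mapping syntax used in the samples in this topic.

```json
{
   "ExampleReferenceCode":{
      "type":"keyword",
      "index":true,
      "store":true
   },
   "ExampleStatusCode":{
      "type":"keyword",
      "index":true,
      "store":true
   }
}
```

With a mapping like this, a value such as 2016-01-26 in ExampleReferenceCode is indexed as a plain string, so a later value such as 20160127 does not cause a date-format exception.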

Field types

You can configure the data type of a field such as string or boolean and its intent by using the type attribute. For example, "type": "boolean". The common data types supported by Order Search are as follows:

  • String - You can define a string field as type text or keyword. The text type is used to index full-text values, such as the description of a product. These fields are passed through an analyzer, which converts the string into a list of individual terms before indexing. Text fields are best suited for unstructured but human-readable content. The keyword type is used for structured content such as IDs, email addresses, hostnames, status codes, or zip codes; the entire content of the field is indexed and searched as one unit.
  • Numeric – You can use the numeric field types to define fields that hold numeric data. The supported numeric field types include long, integer, short, byte, double, and float.
  • Date – You can define a field that holds dates by using the date type. This field can hold formatted date strings.
  • Boolean – This field accepts the JSON values true and false. It also accepts strings that are interpreted as either true or false.
  • Object – You can use this field type for fields consisting of JSON objects, which can contain subfields.
  • Arrays - You can use the nested field type for arrays of objects so that each object is indexed in a way that allows it to be queried independently of the others.
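
The following fragment sketches one field for several of these types. All field names are hypothetical. The format attribute on the date field is a standard Elasticsearch mapping parameter; that Order Search passes it through unchanged is an assumption to verify in your environment.

```json
{
   "ExampleDescription":{
      "type":"text",
      "index":true
   },
   "ExampleCustomerCode":{
      "type":"keyword",
      "index":true
   },
   "ExampleLineCount":{
      "type":"integer",
      "index":true
   },
   "ExampleShipDate":{
      "type":"date",
      "format":"yyyy-MM-dd",
      "index":true
   },
   "ExampleIsGift":{
      "type":"boolean",
      "index":true
   }
}
```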

Characteristics

You can group the field types under one or more of the following characteristics:

  • Searchable - A searchable field is one which is indexed and the document containing the field can be searched and retrieved by the value of the field. The behavior of a searchable field varies based on whether the field is defined as analyzed or non-analyzed.
  • Returnable - A returnable field is one which is stored and the field value can be returned as part of the search response.
  • Sortable - A sortable field is one by which the search results can be sorted in either ascending (asc) or descending (desc) order. The search results can be ordered by one or more sortable fields.

Default behavior

The following table describes the default behavior of the various field types:

Field Type   Searchable   Analyzed   Returnable   Sortable
Text         Yes          Yes        No           No
Keyword      Yes          No         No           Yes
Numeric      Yes          No         No           Yes
Boolean      Yes          No         No           Yes
Date         Yes          No         No           Yes
Note: The individual fields of an object or nested field can be searchable, analyzed, returnable, or sortable based on the type and mapping parameters of these fields. To make a sub-field of a nested field returnable or sortable, set the parameter include_in_parent=true.
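
A sketch of the include_in_parent parameter from the note above, shown on a nested field. In Elasticsearch, include_in_parent is set on the nested field itself rather than on each sub-field; its placement in the Order Search schema is assumed to be the same. The OrderLine and ItemId names are taken from the sample mapping in this topic.

```json
{
   "OrderLine":{
      "type":"nested",
      "include_in_parent":true,
      "properties":{
         "ItemId":{
            "type":"keyword",
            "index":true,
            "store":true
         }
      }
   }
}
```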

Mapping

The process of defining mappings is made flexible to suit your business needs. Mappings consist of a set of searchable and returnable fields such as shopper's name, phone number, address, email, item information, and other order-related information to feed an e-commerce application.

In Order Search, you must specify the mapping type for each of the fields in the index document, create a JSON document comprising all the fields, and pass it to the createSearchIndex API to create an index in the Elasticsearch-based order-search repository. You can define a particular field as searchable and returnable in the schema as shown in the following example.

"BuyerUserId":{
   "type" : "keyword",
   "index" : true,
   "store" : true
}

Here, the BuyerUserId field is defined as type keyword. It is made searchable by specifying index:true and returnable by specifying store:true. You can search by the BuyerUserId field and retrieve it as part of the response.

The following sample explains the mapping:

{
    "index":{
       "id":"order",
       "mappings":{
          "BillToID":{
             "type":"keyword",
             "index":true,
             "store":true
          },
          "OrderDate":{
             "type":"date",
             "index":true,
             "store":true
          },
          "OrderName":{
             "type":"text",
             "index":true,
             "store":true,
             "analyzer":"whitespace"
          },
          "OriginalTotalAmount":{
             "type":"double",
             "index":true,
             "store":true
          },
          "PersonInfoBillTo":{
             "type":"object",
             "properties":{
                "AddressLine1":{
                   "type":"text",
                   "index":true
                },
                "City":{
                   "type":"keyword",
                   "index":true,
                   "store":true,
                   "fields":{
                      "asText":{
                         "type":"text",
                         "index":true
                      }
                   }
                },
                "EMailID":{
                   "type":"keyword",
                   "index":true,
                   "store":true
                }
             }
          },
          "OrderLine":{
             "type":"nested",
             "properties":{
                "ShipNode":{
                   "type":"keyword",
                   "index":true,
                   "store":true
                },
                "OrderedQty":{
                   "type":"double",
                   "index":false,
                   "store":true
                },
                "ItemId":{
                   "type":"keyword",
                   "index":true,
                   "store":true
                }
             }
          }
       }
    }
 }
Note: In the mapping, several fields are specified with index:true and store:true. This indicates that you can search by such fields and also retrieve them in the search result. The field City is defined with a keyword type but duplicated to City.asText, which is defined as text type. This allows either an exact match of the city name or a case-insensitive match of any part of a city name. The OrderLine.OrderedQty field is mapped as index:false and store:true. This means that orders cannot be searched by the order line quantity, but the quantity can still be returned in the search response.

By default, the following fields are added to the mapping:
  • orderId : { orderNo, documentType, enterpriseCode, id }

    All the sub fields of orderId are searchable and returnable. These fields help you in identifying an order document uniquely.

  • isHistory – Is a returnable field.

To map a field as non-searchable, set index:false in the mapping specification. Similarly, you can make a field returnable by setting store:true.

By default, fields of the text type are not sortable. Overriding this default impacts performance and is therefore not recommended. Instead, use the keyword data type for string fields that must be sortable.

Fields of the other data types are sortable by default. You can make them non-sortable by setting doc_values:false.
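
These parameters can be combined. As a sketch, the following hypothetical field ExampleInternalNote is returnable (store:true) but not searchable (index:false) and, because doc values are disabled, not sortable. doc_values is a standard Elasticsearch mapping parameter; whether Order Search accepts it for every field type is an assumption to confirm.

```json
{
   "ExampleInternalNote":{
      "type":"keyword",
      "index":false,
      "store":true,
      "doc_values":false
   }
}
```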

For more information about the various field data types that are available in Elasticsearch, see the Elasticsearch documentation.

Settings

To customize the index behavior, Order Search supports several configurable settings that are provided by Elasticsearch. The settings are classified as dynamic and static based on whether they can be modified after the index is created. The static settings can be set only at index creation time and cannot be modified after that. The dynamic settings can be modified on a live index as well.

Note: You can configure static settings only while creating an index. After the index is created, you cannot add new static settings or change the existing ones.
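
To illustrate the difference between the two kinds of settings, the following fragment uses Elasticsearch's own index settings syntax. number_of_shards is a static setting, whereas number_of_replicas and refresh_interval are dynamic. The values are arbitrary examples, and the way Order Search expects settings to be wrapped in its API input is not shown in this topic, so treat the outer structure as an assumption.

```json
{
   "settings":{
      "index":{
         "number_of_shards":3,
         "number_of_replicas":1,
         "refresh_interval":"30s"
      }
   }
}
```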

For the list of settings provided by Elasticsearch, see the Elasticsearch documentation.

Best practices

The field mappings and the static settings of an index cannot be modified after the index is created. Therefore, before you create the index, assess your search requirements, identify the data types of the order outline data available from Sterling Order Management System Software, and design the mappings and settings accordingly.

The most common mistakes involve string and object handling. String data has two variants, as mentioned: text and keyword. You must select the right mapping based on your search requirements. The order-related data from the subordinate entities in Sterling Order Management System Software appears in the order outline document as JSON objects. Some of these entities have a one-to-one relationship with the order, whereas others have a one-to-many relationship. Elasticsearch supports two variants for JSON objects: object and nested. Use the object type for entities that have a one-to-one relationship, such as PersonInfoBillTo in the sample mapping. For one-to-many entities such as OrderLine, the type must be ‘nested’.
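
The following illustration shows why a one-to-many entity needs the nested type. The two top-level keys are labels for this illustration only and are not part of any schema, and the item and node values are made up. When an array of objects is mapped as object, Elasticsearch flattens it into one list of values per sub-field, so the association between the ItemId and ShipNode of the same order line is lost.

```json
{
   "exampleOrderDocument":{
      "OrderLine":[
         { "ItemId":"ITEM-A", "ShipNode":"NODE-1" },
         { "ItemId":"ITEM-B", "ShipNode":"NODE-2" }
      ]
   },
   "howObjectTypeIndexesIt":{
      "OrderLine.ItemId":[ "ITEM-A", "ITEM-B" ],
      "OrderLine.ShipNode":[ "NODE-1", "NODE-2" ]
   }
}
```

With the flattened form, a search for ItemId ITEM-A together with ShipNode NODE-2 matches this order even though no single order line has that combination. The nested type indexes each order line separately and avoids this false match.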

The mapping of a field cannot be changed after it is applied, so an incorrect mapping cannot be corrected in place. You must contact your Elasticsearch administrator to delete the existing index, repeat the mapping and index creation process in Order Search, and then migrate your order data again from Sterling Order Management System Software. For history orders in Sterling Order Management System Software, however, it might not be possible to run the migration process again.

Adding fields to an existing index mapping is not a problem because Elasticsearch allows it. You can plan the additions required to meet your search or business requirements and make the changes to the index by using the updateSearchIndex API. All new orders, and existing orders in the index that are subsequently modified, are indexed with the new fields. However, if you also need the new fields for older orders, run the migration process from Sterling Order Management System Software again.