Generating JSON mapping files

Index mapping is the process of defining how a document should be mapped in the search engine; for example, defining the data type of each field and its searchable characteristics, such as which fields are searchable, which are returnable, and whether and how they are tokenized. A correct mapping is necessary to index documents correctly and to return the expected results.

IBM® Sterling Order Management System Software supports Elasticsearch as the search engine.

In Elasticsearch, a document in the index contains fields of different types (for example, string, number, and date), and an explicit mapping can be defined on an index level.

In Sterling Order Management System Software, an index is created on Elasticsearch when a document is submitted for indexing for the first time. You need to define mappings and generate mapping files for the indexes that are defined. If a mapping is missing or incorrect, searches return unexpected or inaccurate results, or no results at all. This is because, if you do not define an explicit mapping, Elasticsearch uses the default mapping properties. That is, Elasticsearch first parses the input text to determine its type; for example, whether it is an integer number, decimal number, date, boolean, or string. After Elasticsearch determines the type, it sets the default indexing properties for that type; for example, for a string type, the default indexing properties are store=true and index=analyzed.

This kind of auto-detection of data type and defaulting of indexing properties can result in indexing failures or inaccurate or unexpected index search results, as illustrated in the following examples:

  • Suppose no explicit mapping is specified for a field and the input text in the first document to index is 2016-01-26 (which resembles a date, but is not intended to be a date). Elasticsearch interprets the value as a date and sets the field type to date. If the value of this field in the next document is 2016 01 27 (spaces instead of dashes) or 20160127 (no dashes), Elasticsearch returns an exception indicating that the input text does not conform to the date format, and indexing fails. If, however, the field was specified as a string type in a mapping file, Elasticsearch would not interpret 2016-01-26 as a date and the indexing failure would not occur.
  • Suppose no explicit mapping is assigned for the field ColonyId. Elasticsearch interprets it as a string and assigns the property index=analyzed. The analyzed specification parses the input data using the analyzer specified. If no custom analyzer is given, Elasticsearch uses the built-in standard analyzer, which splits the data into tokens at word boundaries such as spaces, converts each token to lowercase, and stores the tokens. For example, the ColonyId "Colony1" is stored as "colony1", and "Store1 Colony" is stored as "store1" and "colony" (two tokens). This might not be how you want your data parsed, and it can produce unexpected or inaccurate search results, or no results at all. The keyword analyzed is intended to support case-insensitive search and search by a part of the actual data. The Sterling Order Management System Software APIs do not support search by a part of the field value. The colony identification search also does not support partial data search.
  • Elasticsearch results have two parts: hits and aggregations. Hits return the original documents that were indexed. In the previous scenario, these are "Colony1" and "Store1 Colony". In the case of term aggregations, however, results are derived from searchable terms. For a not_analyzed field, the searchable term is the same as the original value, whereas for an analyzed field it is any one of the tokens derived from the original value by the analyzer; that is, "colony1", "store1", or "colony". Sterling Order Management System Software depends on the aggregations alone for shard identification search and, therefore, index=analyzed must not be used for the field ColonyId.
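The effects described in the examples above can be approximated without a running Elasticsearch instance. The following Python sketch is only a simulation under stated assumptions: analyze() is a simplified stand-in for the standard analyzer (whitespace split plus lowercasing), term_buckets() imitates how a terms aggregation groups searchable terms, and is_strict_date() imitates a date field that accepts only the yyyy-MM-dd shape seen in the first document. None of these function names belong to Elasticsearch or to Sterling Order Management System Software.

```python
from collections import Counter
from datetime import datetime


def analyze(value):
    """Simplified stand-in for an analyzed string field: split on whitespace, lowercase."""
    return [token.lower() for token in value.split()]


def term_buckets(values, analyzed):
    """Group searchable terms roughly the way a terms aggregation buckets them."""
    counts = Counter()
    for value in values:
        # A not_analyzed field keeps the whole original value as its single term.
        terms = analyze(value) if analyzed else [value]
        counts.update(terms)
    return dict(counts)


def is_strict_date(value):
    """True if the value matches yyyy-MM-dd, the format assumed for the detected date field."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


colony_ids = ["Colony1", "Store1 Colony"]
analyzed_buckets = term_buckets(colony_ids, analyzed=True)
exact_buckets = term_buckets(colony_ids, analyzed=False)
date_checks = {v: is_strict_date(v) for v in ["2016-01-26", "2016 01 27", "20160127"]}

print(analyzed_buckets)  # buckets keyed by lowercase tokens
print(exact_buckets)     # buckets keyed by the original ColonyId values
print(date_checks)       # only the dashed value passes the date check
```

With the analyzed setting, the aggregation keys no longer match any real ColonyId, which is why an aggregation-based lookup cannot rely on them.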

Important: For these reasons, Sterling Order Management System Software recommends that you define explicit mapping.
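As an illustration of what an explicit mapping expresses, the following Python sketch builds a mapping body in the legacy Elasticsearch syntax that uses the string type with index set to not_analyzed. The type name example_type and the field ExampleReference are hypothetical, and the files that generateIndexFieldMapping actually produces may be structured differently; treat this only as a sketch of the idea.

```python
import json

# Hypothetical type name; the real name depends on your index definition.
TYPE_NAME = "example_type"

explicit_mapping = {
    TYPE_NAME: {
        "properties": {
            # Exact-match string: kept as a single term, not tokenized or lowercased.
            "ColonyId": {"type": "string", "index": "not_analyzed"},
            # Hypothetical field whose values look like dates but must stay strings.
            "ExampleReference": {"type": "string", "index": "not_analyzed"},
        }
    }
}

mapping_json = json.dumps(explicit_mapping, indent=2)
print(mapping_json)
```

Because both fields are declared as strings, a value such as 2016-01-26 is never auto-detected as a date, and ColonyId values remain usable as aggregation keys.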

Sterling Order Management System Software provides a tool, generateIndexFieldMapping, to generate a mapping file for the defined indexes. You can specify the mapping specifications and their default values in the property file, elasticsearch.properties, and these settings are applied to each of the indexable fields when the mapping XML file is generated.

The indexable fields are listed in the indexConfigProperties.xml file. The generateIndexFieldMapping tool, when run in generateXML mode, updates this XML file with the latest index definitions and the mapping specifications given in the elasticsearch.properties file. Whenever you modify the index definition, you need to run entitydeployer, and then run this tool to update the XML file with the latest index definition.

The generateIndexFieldMapping tool supports the following modes:

  1. generateXML - This mode reads the entity definition and the mapping type specifications in elasticsearch.properties and generates the indexConfigProperties.xml file. The generated file is located at INSTALL_DIR\repository\xapi\template\merged\resource\indexconfig\elasticsearch\indexConfigProperties.xml. You must manually modify this XML file to specify the data type, because the default value defined for the type specification in the property file is blank. Note that, for each field, a specification given in elasticsearch.properties is applied only when that specification is not present or its value is empty in the XML file. Subsequent runs that generate indexConfigProperties.xml therefore never reset values that are already present in the XML file.
  2. validate - This mode validates whether a mapping specification is given for all the fields in the indexConfigProperties.xml file. It provides a list of indexable fields that have missing mapping values.
  3. generateMappingFile - This mode validates the indexConfigProperties.xml file and generates JSON mapping files for each index for use by Elasticsearch. The generated mapping files are located at INSTALL_DIR\repository\indexconfig\elasticsearch\mappings\<index_name>\<mapping_file>.
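The "defaults apply only where a value is missing" rule of generateXML and the missing-value report of validate can be sketched in a few lines of Python. This is not the tool's implementation: the functions apply_defaults() and find_missing(), the specification keys (type, index, store), and the sample fields are all hypothetical, and plain dictionaries stand in for elasticsearch.properties and indexConfigProperties.xml.

```python
def apply_defaults(field_specs, defaults):
    """Fill each field's missing or blank specifications from the defaults.

    Values that are already present are never overwritten, mirroring the rule
    that rerunning generateXML does not reset existing entries.
    """
    merged = {}
    for field, spec in field_specs.items():
        result = dict(spec)
        for key, default in defaults.items():
            if not result.get(key):
                result[key] = default
        merged[field] = result
    return merged


def find_missing(field_specs, required_keys):
    """Return {field: [missing keys]} for fields that lack a required value."""
    missing = {}
    for field, spec in field_specs.items():
        gaps = [key for key in required_keys if not spec.get(key)]
        if gaps:
            missing[field] = gaps
    return missing


# Hypothetical data standing in for the property file and the XML file.
defaults = {"type": "", "index": "not_analyzed", "store": "true"}
fields = {
    "ColonyId": {"type": "string", "index": ""},
    "OrderDate": {"index": "analyzed"},
}

merged = apply_defaults(fields, defaults)
report = find_missing(merged, ["type", "index", "store"])
print(merged)
print(report)  # fields that still need a manually supplied value
```

In this sketch OrderDate keeps its existing index value, and it is reported as missing a type because the default type is blank, which corresponds to the manual edit that the generateXML description calls for.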

You must copy the contents of INSTALL_DIR\repository\indexconfig\elasticsearch\mappings\ to ELASTICSEARCH_HOME\config\mappings, where ELASTICSEARCH_HOME is the directory in which the Elasticsearch server instance has been copied.

Important: The mapping JSON files are read by the Elasticsearch Server only and, therefore, it is very important that these files are placed under the Elasticsearch deployment directory.
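The copy step can be scripted. The following Python sketch assumes that the script runs on a machine that can reach both directories and that INSTALL_DIR and ELASTICSEARCH_HOME are available as environment variables; the function name copy_mappings() and the use of environment variables are assumptions made for illustration, not part of the product.

```python
import os
import shutil
from pathlib import Path


def copy_mappings(install_dir, elasticsearch_home):
    """Copy the generated mapping files into the Elasticsearch config directory."""
    source = Path(install_dir) / "repository" / "indexconfig" / "elasticsearch" / "mappings"
    target = Path(elasticsearch_home) / "config" / "mappings"
    if not source.is_dir():
        raise FileNotFoundError(f"No generated mappings found at {source}")
    # dirs_exist_ok (Python 3.8+) lets a rerun overwrite files from an earlier copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Assumed environment variables; alternatively pass the paths directly.
    install_dir = os.environ.get("INSTALL_DIR")
    es_home = os.environ.get("ELASTICSEARCH_HOME")
    if install_dir and es_home:
        print(copy_mappings(install_dir, es_home))
    else:
        print("Set INSTALL_DIR and ELASTICSEARCH_HOME to run the copy.")
```

Rerun the copy whenever the mapping files are regenerated so that the files under the Elasticsearch deployment directory stay current.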