Introducing the search index server

You can review the situations and scenarios in which a search index server might meet your business needs.

Overview - indexing and searching of distributed transactional data

IBM® Sterling™ Order Management System Software supports deployments with multiple database shards for storing transactional data. The transactional data (for example, order or shipment) that is owned by an enterprise can be present in multiple database shards. Such shards are identified by ColonyIds. Consider the following scenarios.

Suppose that, for an enterprise, the orders that are placed online and through different stores are stored in different database shards. A customer who purchased an item online may want to return it to a store, or may want a list of all of their orders regardless of where those orders were placed. The application running at the store therefore needs visibility into the orders that are present on both the online shard and the other store shards. This visibility is required during returns when, for example, a customer is at a store and wants to return items that were purchased either online or at another store. These kinds of operations are common in a cross-channel selling environment, where visibility of orders across channels is essential. Because the transactional data for an enterprise is present in more than one transaction shard, when an EnterpriseCode is specified in the input, the API must be able to identify all of the colonies where the data for that enterprise is stored – in this case, the online shard as well as all of the store shards. Similarly, when a customer wants to see all of their orders, a search that uses attributes such as the customer’s name, billToID, phoneNo, or itemID must be able to identify the colonies where the matching orders are stored.

In all such cases, for any given search criteria, the search must be run against every shard that can possibly hold the requested data. For example, if the search criteria includes the EnterpriseCode, the search must cover all shards for that enterprise. If the search criteria identifies a store, the search runs against that store’s shard. If only the customer’s ’First Name’ or ’Phone No’ is given, the search must run on all shards.
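The shard-selection rules above can be sketched as a small routing function. This is a minimal illustration, not the product’s actual implementation: the colony names, the enterprise-to-colony mapping, and the criteria keys (StoreId, EnterpriseCode, PhoneNo) are all hypothetical.

```python
# Hypothetical mappings from enterprise and store to colony (database shard).
ENTERPRISE_COLONIES = {
    "ACME": ["ONLINE_COLONY", "STORE_COLONY_EAST", "STORE_COLONY_WEST"],
}
STORE_COLONY = {
    "Store_001": "STORE_COLONY_EAST",
    "Store_042": "STORE_COLONY_WEST",
}
ALL_COLONIES = ["ONLINE_COLONY", "STORE_COLONY_EAST", "STORE_COLONY_WEST"]

def colonies_for_criteria(criteria):
    """Return the colonies that must be searched for the given criteria."""
    if "StoreId" in criteria:
        # A store's orders live in that store's shard only.
        return [STORE_COLONY[criteria["StoreId"]]]
    if "EnterpriseCode" in criteria:
        # Search every shard that holds data for the enterprise.
        return ENTERPRISE_COLONIES[criteria["EnterpriseCode"]]
    # Only customer attributes (name, phone number) given: search everywhere.
    return ALL_COLONIES
```

Note how the last case, a search by customer attributes alone, fans out to every shard; that is exactly the performance problem that the central index addresses.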

In certain scenarios, such a search can have a considerable performance impact. For example, suppose that an enterprise has 20 stores spread across four regions, sharding is done by region so that all stores in a region share the same shard, and a given customer shops in one region. That customer’s orders are most likely in that region’s shard. However, if the search criteria is only the customer’s ’Phone No’, the search must be run on all four shards, because there is no way of knowing which shard holds the customer’s orders. This problem is particularly accentuated in multitenant scenarios, such as hosted deployments: if the EnterpriseCode to search within is not known, the search must be performed across enterprises, which can spread it over many shards.

One way to optimize the solution to this multi-shard problem is to store the commonly used searchable attributes of an order and its related entities, such as information about line items, addresses, and payment methods, in a central location along with the colonyId, which identifies the physical database shard where the order is present. By querying the central location with the searchable attributes to get the colonyIds of the orders that match the criteria, the system can then search only the relevant shards for the complete information. In the previous example, the search is first run on this central location with the customer’s ’Phone No’. This query returns the colonyId of the region’s shard where the customer’s orders reside, so the matching orders are retrieved from that shard only, skipping the other three irrelevant shards.
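The central-location lookup can be sketched with an in-memory stand-in for the index. Each entry carries an order’s searchable attributes plus its colonyId; the entries, attribute names, and colony names here are hypothetical, and a real deployment would issue this query against the indexing server instead.

```python
# In-memory stand-in for the central index: each document holds the
# searchable attributes of an order plus the colonyId of its shard.
central_index = [
    {"OrderNo": "O-1001", "PhoneNo": "555-0100", "ColonyId": "REGION_NORTH"},
    {"OrderNo": "O-1002", "PhoneNo": "555-0100", "ColonyId": "REGION_NORTH"},
    {"OrderNo": "O-2001", "PhoneNo": "555-0199", "ColonyId": "REGION_SOUTH"},
]

def colonies_matching(criteria):
    """Query the central index and return the distinct colonyIds whose
    shards contain orders matching all of the given criteria."""
    matches = [
        doc for doc in central_index
        if all(doc.get(key) == value for key, value in criteria.items())
    ]
    # Deduplicate while preserving order; the full order data is then
    # fetched only from these shards.
    colonies = []
    for doc in matches:
        if doc["ColonyId"] not in colonies:
            colonies.append(doc["ColonyId"])
    return colonies
```

With this data, a search by ’Phone No’ 555-0100 narrows the candidate shards to REGION_NORTH alone, so the other shards are never queried.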

Sterling Order Management System Software uses Elasticsearch version 1.7.1 as the indexing and search solution, serving as the ’central location’ described above. Elasticsearch is an Apache Lucene-based indexing solution that indexes transactional data spread across multiple database shards, offers near real-time availability and high scalability, and allows for fast retrieval of data. The applicable shards are determined by querying the indexing solution, and the transactional data is retrieved by invoking the API on all the shards that the indexing solution returns.

The concept here is that when an order is created or updated in the system, it is also indexed in Elasticsearch with searchable attributes from tables such as Order Header, Order Line, and Payment, along with the entity’s colonyId. When a user searches for an order with criteria that contain these attributes, the application queries Elasticsearch, which returns all of the applicable colonies that match the criteria. The API is then invoked on each of the applicable colonies with the same search criteria that the user provided. The results from all of the transactional shards are then aggregated, sorted, and paginated to produce the user’s search results.
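The final step, combining the per-colony results into one page for the user, can be sketched as a simple merge. The field names (OrderNo, OrderDate) and the sort and page parameters are illustrative only.

```python
def aggregate_results(per_colony_results, sort_key, page, page_size):
    """Merge the result lists returned from each colony, sort them by the
    requested attribute, and return the requested page."""
    merged = [order for results in per_colony_results for order in results]
    merged.sort(key=lambda order: order[sort_key])
    start = (page - 1) * page_size
    return merged[start:start + page_size]

# Hypothetical results returned by two colonies for the same criteria.
colony_a = [{"OrderNo": "O-3", "OrderDate": "2023-03-01"}]
colony_b = [{"OrderNo": "O-1", "OrderDate": "2023-01-01"},
            {"OrderNo": "O-2", "OrderDate": "2023-02-01"}]
first_page = aggregate_results([colony_a, colony_b], "OrderDate",
                               page=1, page_size=2)
```

Because sorting happens only after all colonies respond, the first page shown to the user is globally ordered even though each shard returned its results independently.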

The indexing solution can be deployed as a server, or a cluster of servers, and performs the following functions:

  • Builds the index - Takes a set of attributes of an entity, along with the entity’s colonyId, and adds them to the index.
  • Searches the index - Takes a set of attributes of an entity, or from an API’s input, and returns the colonies that contain the matching entities.

Indexing solution - Elasticsearch

Elasticsearch, which is the indexing solution supported by Sterling Order Management System Software, is an open-source (Apache 2), distributed, RESTful search engine built on top of Apache Lucene.

The advantages of using Elasticsearch include the following:

  • The setup of Elasticsearch is fast and easy.
  • You can index data by simply using JSON over HTTP.
  • It is always available. You can start with one machine, and then scale to hundreds of machines as your business grows.
  • It delivers virtually real-time search results.
  • All of the capabilities of Lucene are easily exposed through simple configurations and plug-ins.
  • Each index is fully sharded with a configurable number of shards. Note that the ’shard’ being referred to here is that of the index, not of the database.
  • Each shard can have zero or more replicas, and replicas can be added easily.
  • An instance can support more than one index and more than one type of index.
  • Configurations can be set at the index level.
  • A search can be performed across indices in one operation.
  • There is no need for any upfront shard definition, although it can be defined per type for indexing performance and customizations.
  • All operations can be executed using a Java client object, and these operations can be accumulated and executed in bulk.
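As noted above, documents can be indexed as JSON over HTTP. The sketch below builds the payload and the target URL for one order document; the index name, type, document fields, and host are hypothetical, and no request is actually sent. In Elasticsearch 1.x the REST endpoint for indexing a document by ID has the form PUT /{index}/{type}/{id}.

```python
import json

# Searchable attributes of an order, plus the colonyId that points back
# to the database shard where the full order record lives.
doc = {
    "EnterpriseCode": "ACME",
    "BillToID": "CUST-77",
    "PhoneNo": "555-0100",
    "ColonyId": "REGION_NORTH",
}

# Hypothetical endpoint: index "orders", type "order", document ID "O-1001".
url = "http://localhost:9200/orders/order/O-1001"
payload = json.dumps(doc)
# An HTTP client (for example, urllib.request with method="PUT") would
# send `payload` to `url`; this sketch stops short of making the request.
```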

Elasticsearch can be started on more than one server. Each of these servers, called nodes, is configured with a unique node name and is associated with a common logical name called the cluster name. All of the nodes that belong to a cluster must be located on the same network.

These nodes interact with each other for rebalancing and routing, as required. When indexing occurs, data is stored on one of the nodes. If replicas are also configured, the data is replicated to the applicable number of nodes as well. When a search request arrives, the Elasticsearch server fetches data from these nodes and returns the aggregated results.

For more detailed information about Elasticsearch, refer to the Elasticsearch documentation.