Sterling search index - exception handling

The Sterling Search Index has some built-in exception handling capabilities.

Because Elasticsearch is a third-party system, a number of different problems can arise over time. For example, the Elasticsearch servers may be down or unreachable. There may also be Elasticsearch-specific problems: some of the index shards may be unavailable, or a master node may not have been elected yet. Any of these can cause search or index operations to fail. This raises the following considerations:

  • Because the search server is used primarily for optimization, its unavailability or any index/search failure should not have a functional impact. For this reason, the Sterling Search Index is designed to work around these problems and provide seamless business functionality.
  • Failed/lost index updates need to be tracked and re-attempted.
  • Failed index updates should be reported to you, so that you can take corrective actions, if needed.

Handling failed/lost index updates

As briefly explained in the previous section, YFS_Awaiting_Index stores a reference to each and every indexing attempt. The following scenarios arise:

  1. JVM crash: Since the index-update operation is done asynchronously through the PeriodicBatchIndexer service, if the JVM crashes or is forcibly killed, some of the updates will be lost.
  2. Intermittent indexing failures: Indexing operations fail intermittently because of problems with the search server, the network, and so forth.
  3. Consistent indexing failures: Indexing operations fail continuously and reproducibly because of a persistent issue.

In cases 2 and 3 above, you are notified of the exception. Depending on the problem, you may need to take corrective action. In all three cases, however, the records in YFS_Awaiting_Index track which entity records need to be re-indexed. This re-indexing is performed periodically by an automatically triggered SSI_DELAYED_SYNC agent.
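The track-and-retry behavior described above can be pictured with a small conceptual model. This is only a sketch and not the product's implementation: the class AwaitingIndexTable, the function delayed_sync, and the use of ConnectionError to represent an indexing failure are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AwaitingIndexTable:
    """In-memory stand-in for YFS_Awaiting_Index (hypothetical model)."""

    # Maps an entity key (for example, an OrderHeaderKey) to its retry count.
    pending: Dict[str, int] = field(default_factory=dict)

    def record_attempt(self, entity_key: str) -> None:
        # A reference is stored for every indexing attempt.
        self.pending.setdefault(entity_key, 0)

    def mark_indexed(self, entity_key: str) -> None:
        # A successfully indexed record no longer needs tracking.
        self.pending.pop(entity_key, None)


def delayed_sync(table: AwaitingIndexTable, index_fn: Callable[[str], None]) -> int:
    """Run one retry pass; return the number of records re-indexed."""
    reindexed = 0
    for key in list(table.pending):
        table.pending[key] += 1
        try:
            index_fn(key)
        except ConnectionError:
            # The record stays pending and is retried on the next pass.
            continue
        table.mark_indexed(key)
        reindexed += 1
    return reindexed
```

In this model, a record lost to a JVM crash and a record that failed to index look the same to the retry pass: both remain in the pending set until an indexing call succeeds.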

Exception notification

Any exception that occurs during an index/search operation must be reported to you. Beyond the exception scenarios discussed above, the Sterling Search Index also has preemptive mechanisms. For example, upon continuous index/search failures, it disables the corresponding operation altogether until corrective actions are taken. Further, if any change has been made to the index configuration, the Sterling Search Index automatically disables searching from the index, because the index may be in an inconsistent state. Such decisions, taken automatically by the Sterling Search Index, may require you to take corrective action. In the first example, you need to fix the underlying problem and re-enable the search/index operation. In the second example, you need to evaluate whether the SSI_MASS_SYNC agent needs to be run, run it if needed, and mark the index as being in a Synchronized state. Therefore, you must be notified of these messages as well.
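The "disable after continuous failures" behavior resembles a simple circuit breaker. The sketch below illustrates the idea only. The class name OperationGuard, the threshold of three failures, and the use of "Y" as the enabled value are assumptions, because the source does not state how many failures trigger the switch or how the flag is stored.

```python
class OperationGuard:
    """Tracks consecutive failures for one operation (search or index).

    Hypothetical model: the threshold and method names are assumptions.
    """

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        # Mirrors the N value of SearchWorking / IndexWorking; "Y" is assumed.
        self.working = "Y"

    def record_success(self) -> None:
        # Any success breaks the run of consecutive failures.
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # The operation stays disabled until someone re-enables it.
            self.working = "N"

    def re_enable(self) -> None:
        # Represents the manual corrective step after the root cause is fixed.
        self.consecutive_failures = 0
        self.working = "Y"
```

Note that a success does not switch the flag back on in this model; only the explicit re_enable step does, which matches the requirement that you re-enable the operation after fixing the underlying problem.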

Such exceptions/warnings can be broadly categorized as follows:

  • Index status messages
  • Index operational warnings
  • Unexpected errors
  • Connectivity issues

Note: Some of the exceptions (mainly the exceptions that occur during the index build process) are logged against the name of either IndexManager or PeriodicBatchIndexer. This is because the indexing operations are not performed by the user who created or modified the order or shipment in the system, but are triggered from Java classes with the above names in a separate thread in the JVM.

The exceptions/warnings from the Sterling Search Index are published in two different ways:

  • By raising an alert
  • By raising an event

Raising alerts

The alerts raised from the Sterling Search Index are similar to any other alerts raised for operational exceptions. All exceptions are logged with ExceptionType='IndexException'. Warnings are logged with ExceptionType='Warning_Message'. They are created in the YFS_Inbox table in the DEFAULT colony, or in another colony, depending on the transactional context.

These alerts are not directed to a user or to an alert queue, and they are not consolidated.

You can list the alerts in the Alert Console by specifying the Alert Type as IndexException or Warning_Message. These alerts may not be enterprise-specific, so it is best to avoid using EnterpriseCode as a search criterion.

Raising events

The operational exceptions and status transitions from the Sterling Search Index can be published by enabling the ON_FAILURE event defined for the transaction SSI_INDEX_NOTIFICATION. You can find this transaction under the General Process Type Repository.

This event, if enabled, can publish the information shown in the following template.

<Index IndexName="" EnterpriseCode="" SearchWorking="" IndexWorking="" ErrorCode="" 
   ErrorDescription="" Reason="" Comments="" ReferenceName="" ReferenceValue="" SearchCriteria="">
   <StackTrace/>
</Index>

The published output is configurable. The event template SSI_INDEX_NOTIFICATION.ON_FAILURE.xml is located in the INSTALL_DIR/repository/xapi/template/merged/event directory.
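A consumer of this event might read the published XML as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name parse_index_event and every attribute value in SAMPLE_EVENT are made up for illustration, and the "Y" value is assumed; only the element and attribute names come from the template above.

```python
import xml.etree.ElementTree as ET

# Illustrative payload only: the attribute values are invented for this sketch.
SAMPLE_EVENT = """<Index IndexName="Order" EnterpriseCode="" SearchWorking="N" IndexWorking="Y"
   ErrorCode="" ErrorDescription="" Reason="sample reason" Comments=""
   ReferenceName="YFS_ORDER_HEADER" ReferenceValue="sample-key" SearchCriteria="">
</Index>"""


def parse_index_event(xml_text: str) -> dict:
    """Return the attributes of an <Index> event as a plain dictionary."""
    root = ET.fromstring(xml_text)
    if root.tag != "Index":
        raise ValueError(f"unexpected root element: {root.tag}")
    event = dict(root.attrib)
    # StackTrace is a child element and may be absent from the output.
    stack = root.find("StackTrace")
    event["StackTrace"] = (stack.text or "").strip() if stack is not None else None
    return event
```

Treating StackTrace as optional keeps the parser usable whether or not the template publishes that element.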

Sterling™ Order Management System Software does not provide any event handlers (conditions, actions, or services) to trigger a process when this event is raised. However, it provides the following condition builder attributes that help to create a condition for the data published by the event.

Condition builder attribute   Use this attribute to create a condition
EnterpriseCode                For a specific EnterpriseCode
IndexName                     For a specific IndexName
IsSearchWorking               To track the transition of the SearchWorking attribute
IsIndexingWorking             To track the transition of the IndexWorking attribute

Use the {Enter Your Own Attribute} facility to customize condition builder attributes for the other attributes published by the event.

You can define actions to publish the event output to the database, create an alert, send an email and so forth, and define event handlers by providing conditions that determine the types of actions that are performed when this event is raised.
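The relationship between conditions and actions can be sketched as a list of (condition, action) pairs evaluated against the published attributes. This is a conceptual illustration in Python, not the Applications Manager configuration itself: the function names and the action labels returned below are hypothetical, and the event is assumed to have already been read into a dictionary keyed by attribute name.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Condition = Callable[[Event], bool]
Action = Callable[[Event], str]


def search_disabled(event: Event) -> bool:
    # True when the system has flipped SearchWorking to N.
    return event.get("SearchWorking") == "N"


def indexing_disabled(event: Event) -> bool:
    # True when the system has flipped IndexWorking to N.
    return event.get("IndexWorking") == "N"


def for_index(index_name: str) -> Condition:
    # Builds a condition that matches one specific IndexName.
    return lambda event: event.get("IndexName") == index_name


def dispatch(event: Event, handlers: List[Tuple[Condition, Action]]) -> List[str]:
    """Run every action whose condition matches; return the action results."""
    return [action(event) for condition, action in handlers if condition(event)]
```

For example, pairing search_disabled with an alert-raising action and for_index("Shipment") with an email action would send each event only to the handlers whose conditions it satisfies.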

The following table describes the attributes published for the exceptions from the Sterling Search Index.

Attribute         Description
IndexName         The name of the index (for example, Order or Shipment).
EnterpriseCode    The EnterpriseCode, published if it is available when the exception occurs.
SearchWorking     A value of N indicates that the system has flipped SearchWorking to N and that no further searching is done on the index.
IndexWorking      A value of N indicates that the system has flipped IndexWorking to N and that no further indexing is done on the index.
ErrorCode         The error code for the error.
                  You can view the description and cause of the errors raised in Sterling Order Management System Software, as well as the actions to troubleshoot them. To view the Sterling Order Management System Software system error descriptions:
                    1. From the menu bar in the Applications Manager, choose Help > Troubleshooting. The Error Search window is displayed.
                    2. Enter the applicable criteria and search for error codes. A list of error codes and their descriptions is displayed.
                    3. Choose to view the cause of the error and the action to troubleshoot it.
ErrorDescription  The description of the error code.
Reason            If the error is thrown from the Search Index Server, this field provides the innermost cause of the failure.
                  For a notification, this field provides the notification message.
Comments          Provides additional information about the scenario, whether it is an exception scenario or a notification scenario.
ReferenceName     Provides additional information. Usually, it contains the name of the reference entity (for example, YFS_ORDER_HEADER, YFS_SHIPMENT, or YFS_AWAITING_INDEX).
ReferenceValue    Provides additional information. Usually, it contains the value of the reference entity key (for example, the value of OrderHeaderKey, ShipmentKey, or AwaitingIndexKey). It indicates that an exception occurred when processing the corresponding key.
SearchCriteria    The search criteria passed to the Search Index Server. This is a JSON document for Elasticsearch.
StackTrace        The entire stack trace of the exception. This is not present in the out-of-the-box template for the ON_FAILURE event. You can extend the template to also publish the stack trace in the output of the event.
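Because SearchCriteria carries a JSON document and ReferenceName/ReferenceValue identify the record being processed, a handler might decode them as follows. This is a sketch under assumptions: the event is taken to be available as a dictionary of attribute names to string values, and the function names decode_search_criteria and describe_reference are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def decode_search_criteria(event: Dict[str, str]) -> Optional[Any]:
    """Decode the SearchCriteria attribute, if one was published."""
    raw = event.get("SearchCriteria", "")
    if not raw.strip():
        return None  # nothing published, for example for an indexing failure
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Keep the original text rather than lose diagnostic information.
        return raw


def describe_reference(event: Dict[str, str]) -> str:
    """Summarize which entity record the exception relates to."""
    name = event.get("ReferenceName") or "unknown entity"
    value = event.get("ReferenceValue") or "unknown key"
    return f"{name}={value}"
```

Falling back to the raw string when decoding fails is a deliberate choice in this sketch: a malformed value is still useful when you troubleshoot the failure.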