Sterling Search Index - exception handling
The Sterling Search Index has some built-in exception handling capabilities.
Because Elasticsearch is a third-party system, a number of different problems can arise over time. For example, the Elasticsearch servers may be down or unreachable. There may also be Elasticsearch-specific problems, such as some index shards being unavailable or a master node not yet having been elected. These problems can cause search and index operations to fail, which leads to the following considerations:
- Because the search server is used primarily for optimization, its unavailability or any index or search failure should not have a functional impact. For this reason, the Sterling Search Index is designed to work around these problems and keep business functionality working seamlessly.
- Failed/lost index updates need to be tracked and re-attempted.
- Failed index updates should be reported to you, so that you can take corrective actions, if needed.
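The first consideration amounts to a "degrade, don't fail" rule for searches. The following is a minimal sketch of that idea, assuming the workaround is an equivalent (slower) database query. The names `find_orders`, `SearchUnavailable`, and the callables passed in are hypothetical and are not part of the product API.

```python
from typing import Callable, List


class SearchUnavailable(Exception):
    """Hypothetical error raised when the search server cannot serve a query."""


def find_orders(criteria: dict,
                search_index: Callable[[dict], List[str]],
                database: Callable[[dict], List[str]],
                on_failure: Callable[[Exception], None]) -> List[str]:
    """Prefer the search index; fall back to the database if it fails.

    search_index, database, and on_failure are hypothetical stand-ins.
    """
    try:
        return search_index(criteria)   # fast path: Elasticsearch
    except SearchUnavailable as exc:
        on_failure(exc)                 # report it so corrective action can be taken
        return database(criteria)       # same functional result, only slower
```

The caller gets the same result either way, and the failure is still reported, which covers both the "no functional impact" and "report failures" considerations.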
Handling failed/lost index updates
As briefly explained in the previous section, YFS_Awaiting_Index stores a reference to every indexing attempt. The following scenarios can arise:
- JVM crash: Because the index update is performed asynchronously by the PeriodicBatchIndexer service, some pending updates are lost if the JVM crashes or is forcibly killed.
- Intermittent indexing failures: The indexing operation fails intermittently because of problems with the search server, the network, and so forth.
- Consistent indexing failures: Because of some underlying issue, the indexing operation fails continuously and reproducibly.
In the intermittent and consistent failure scenarios, you are notified of the exception. Depending on the problem, you may need to take corrective action. In all three scenarios, however, records in YFS_Awaiting_Index track which entity records need to be re-indexed. The re-indexing is performed periodically by the SSI_DELAYED_SYNC agent, which is triggered automatically.
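The tracking-and-retry pattern above can be modeled in memory as follows. This is a minimal sketch, assuming that a tracking row is written before each asynchronous index attempt and removed only once the attempt succeeds. The class and function names are hypothetical; the real table and agent are product internals.

```python
import itertools
from typing import Callable, Dict, Tuple


class AwaitingIndex:
    """Hypothetical in-memory stand-in for YFS_Awaiting_Index."""

    def __init__(self) -> None:
        self._rows: Dict[int, Tuple[str, str]] = {}  # key -> (entity, entity key)
        self._ids = itertools.count(1)

    def record(self, entity: str, entity_key: str) -> int:
        # Written before the asynchronous update, so a JVM crash leaves it behind.
        awaiting_key = next(self._ids)
        self._rows[awaiting_key] = (entity, entity_key)
        return awaiting_key

    def complete(self, awaiting_key: int) -> None:
        # Assumed: removed only after the index update succeeds.
        self._rows.pop(awaiting_key, None)

    def pending(self) -> Dict[int, Tuple[str, str]]:
        return dict(self._rows)  # a copy, so callers may modify the tracker safely


def delayed_sync(tracker: AwaitingIndex,
                 reindex: Callable[[str, str], None]) -> int:
    """One periodic pass in the spirit of SSI_DELAYED_SYNC; returns rows cleared."""
    cleared = 0
    for awaiting_key, (entity, entity_key) in tracker.pending().items():
        try:
            reindex(entity, entity_key)
        except Exception:
            continue  # still failing: keep the row for the next pass
        tracker.complete(awaiting_key)
        cleared += 1
    return cleared
```

Because every row survives until it succeeds, lost updates after a crash and intermittent failures are both handled by the same periodic pass. Consistent failures simply keep their rows until the underlying problem is fixed.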
Exception notification
Any exception that occurs during an index or search operation must be reported to you. In addition to the exception scenarios discussed above, the Sterling Search Index has other preemptive mechanisms. For example, upon continuous index or search failures, it disables the corresponding operation altogether until corrective action is taken. Further, if any change is made to the index configuration, the Sterling Search Index automatically disables searching from the index, because the index may be in an inconsistent state. Such decisions, taken automatically by the Sterling Search Index, may require corrective action from you. In the first case, you need to fix the underlying problem and re-enable the search or index operation. In the second case, you need to evaluate whether the SSI_MASS_SYNC agent needs to be run, run it if needed, and then mark the index as being in a Synchronized state. Therefore, you must be notified of these decisions as well.
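The two preemptive mechanisms above behave like a small state machine. The following is a minimal sketch of that behavior. The failure threshold, the field names, and the assumption that marking the index Synchronized is what re-enables search are all hypothetical, not documented product values.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 5  # hypothetical; the actual limit is not stated here


@dataclass
class IndexState:
    name: str
    search_working: bool = True
    index_working: bool = True
    synchronized: bool = True
    consecutive_index_failures: int = 0


def on_index_result(state: IndexState, succeeded: bool) -> None:
    if succeeded:
        state.consecutive_index_failures = 0
        return
    state.consecutive_index_failures += 1
    if state.consecutive_index_failures >= FAILURE_THRESHOLD:
        state.index_working = False  # stop indexing until re-enabled


def on_config_change(state: IndexState) -> None:
    # The index may now be inconsistent, so searching is disabled.
    state.synchronized = False
    state.search_working = False


def re_enable_indexing(state: IndexState) -> None:
    # Called after the underlying problem has been fixed.
    state.consecutive_index_failures = 0
    state.index_working = True


def mark_synchronized(state: IndexState) -> None:
    # Called after deciding whether SSI_MASS_SYNC was needed (and running it).
    state.synchronized = True
    state.search_working = True
```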
Such exceptions/warnings can be broadly categorized as follows:
- Index status messages
- Index operational warnings
- Unexpected errors
- Connectivity issues
The exceptions/warnings from the Sterling Search Index are published in two different ways:
- By raising an alert
- By raising an event
Raising alerts
The alerts raised from the Sterling Search Index are similar to any other alerts raised for operational exceptions. All exceptions are logged with ExceptionType='IndexException', and warnings are logged with ExceptionType='Warning_Message'. The alerts are created in the YFS_Inbox table in the DEFAULT colony, or in another colony, depending on the transactional context.
These alerts are not directed to any user or alert queue, and they are not consolidated.
You can list the alerts in the Alert Console by specifying IndexException or Warning_Message as the Alert Type. Because these alerts may not be enterprise specific, avoid using EnterpriseCode as a search criterion.
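The classification and filtering rules above can be sketched as follows. Only the two ExceptionType values come from the text. The dictionary shape, the `queue` and `user` keys, and the function names are hypothetical stand-ins for YFS_Inbox rows.

```python
from typing import Dict, Iterable, List, Optional

Alert = Dict[str, Optional[str]]

# ExceptionType values taken from the text above.
INDEX_ALERT_TYPES = ("IndexException", "Warning_Message")


def make_alert(message: str, is_warning: bool,
               enterprise_code: Optional[str] = None) -> Alert:
    """Build a hypothetical stand-in for an alert row; keys are illustrative."""
    return {
        "ExceptionType": "Warning_Message" if is_warning else "IndexException",
        "Description": message,
        "EnterpriseCode": enterprise_code,  # may be None: not enterprise specific
        "queue": None,                      # hypothetical key: no alert queue
        "user": None,                       # hypothetical key: no assigned user
    }


def index_alerts(alerts: Iterable[Alert]) -> List[Alert]:
    """Filter on ExceptionType only, as in the Alert Console query.

    Adding an EnterpriseCode filter would silently drop alerts whose
    EnterpriseCode is None.
    """
    return [a for a in alerts if a.get("ExceptionType") in INDEX_ALERT_TYPES]
```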
Raising events
The operational exceptions and status transitions from the Sterling Search Index can be published by enabling the ON_FAILURE event defined for the transaction SSI_INDEX_NOTIFICATION. You can find this transaction under the General Process Type Repository.
This event, if enabled, can publish the information shown in the following template.
<Index IndexName="" EnterpriseCode="" SearchWorking="" IndexWorking="" ErrorCode=""
ErrorDescription="" Reason="" Comments="" ReferenceName="" ReferenceValue="" SearchCriteria="">
<StackTrace/>
</Index>

This template is configurable. The event template SSI_INDEX_NOTIFICATION.ON_FAILURE.xml is located in the INSTALL_DIR/repository/xapi/template/merged/event directory.
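A custom service that receives this event output first needs to read the published attributes. The following minimal sketch uses the standard library `xml.etree.ElementTree`. The `SAMPLE` payload values and the function name are made up for illustration. The code tolerates a missing `<StackTrace>` element, because that element is optional in the out-of-the-box template.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Made-up payload that follows the template's element and attribute names.
SAMPLE = """<Index IndexName="Order" EnterpriseCode="DEFAULT" SearchWorking="Y"
 IndexWorking="N" ErrorCode="" ErrorDescription="" Reason="" Comments=""
 ReferenceName="YFS_ORDER_HEADER" ReferenceValue="" SearchCriteria="">
 <StackTrace/>
</Index>"""


def parse_index_event(xml_text: str) -> Dict[str, str]:
    """Return the event attributes, plus StackTrace text ('' if absent or empty)."""
    root = ET.fromstring(xml_text)
    if root.tag != "Index":
        raise ValueError(f"unexpected root element: {root.tag}")
    event = dict(root.attrib)  # every attribute published on <Index>
    trace = root.find("StackTrace")
    event["StackTrace"] = (trace.text or "").strip() if trace is not None else ""
    return event
```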
Sterling™ Order Management System Software does not provide any event handlers (conditions, actions, or services) to trigger a process when this event is raised. However, it provides the following condition builder attributes to help you create conditions on the data published by the event.
| Condition builder attributes | Use this attribute to create a condition |
|---|---|
| EnterpriseCode | For a specific EnterpriseCode |
| IndexName | For a specific IndexName |
| IsSearchWorking | To track the transition of the SearchWorking attribute |
| IsIndexingWorking | To track the transition of the IndexWorking attribute |
Use the {Enter Your Own Attribute} facility to customize condition builder attributes for the other attributes published by the event.
You can define event handlers whose conditions determine which actions are performed when this event is raised. For example, actions can publish the event output to the database, create an alert, or send an email.
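The condition-and-action model above can be illustrated with a plain-Python rule table. This is a minimal sketch. The rule wiring and the function names are hypothetical. The condition approximates IsSearchWorking by checking whether the published SearchWorking value is N, rather than tracking the transition itself.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Condition = Callable[[Event], bool]
Action = Callable[[Event], None]


def search_disabled_for(index_name: str, enterprise_code: str) -> Condition:
    """Condition: this index, this enterprise, and SearchWorking flipped to N."""
    def condition(event: Event) -> bool:
        return (event.get("IndexName") == index_name
                and event.get("EnterpriseCode") == enterprise_code
                and event.get("SearchWorking") == "N")
    return condition


def handle(event: Event, rules: List[Tuple[Condition, List[Action]]]) -> int:
    """Run every action whose condition matches; return how many actions ran."""
    ran = 0
    for condition, actions in rules:
        if condition(event):
            for action in actions:
                action(event)
                ran += 1
    return ran
```

For example, an "email" action could be registered as `[(search_disabled_for("Order", "DEFAULT"), [send_email])]`, where `send_email` is whatever hypothetical notifier you supply.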
The following table describes the attributes published for the exceptions from the Sterling Search Index.
| Attributes | Description |
|---|---|
| IndexName | The name of the Index (for example, Order or Shipment). |
| EnterpriseCode | The EnterpriseCode, published if it is available when the exception occurs. |
| SearchWorking | A value of N indicates that the system has set SearchWorking to N, and no further searches are performed on the index. |
| IndexWorking | A value of N indicates that the system has set IndexWorking to N, and no further indexing is performed on the index. |
| ErrorCode | The error code for the error. You can view the description and cause of the errors raised in Sterling Order Management System Software, as well as the actions to troubleshoot them, in the Sterling Order Management System Software system error descriptions. |
| ErrorDescription | The error description of the error code. |
| Reason | If the error is thrown by the Search Index Server, this field contains the innermost cause of the failure. For a notification, this field contains the notification message. |
| Comments | Provides additional information about the scenario, whether it is an exception scenario or a notification scenario. |
| ReferenceName | This field also provides additional information. Usually, it contains the name of the reference entity (for example, YFS_ORDER_HEADER, YFS_SHIPMENT, or YFS_AWAITING_INDEX). |
| ReferenceValue | This field also provides additional information. Usually, it contains the key value of the reference entity (for example, the value of OrderHeaderKey, ShipmentKey, or AwaitingIndexKey). It indicates that the exception occurred while processing the corresponding key. |
| SearchCriteria | The search criteria passed to the Search Index Server. This is a JSON document for Elasticsearch. |
| StackTrace | The entire stack trace of the exception. This is not present in the out-of-the-box template for the ON_FAILURE event. You can extend the template to also publish the stack trace in the event output. |
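The attributes in the table are most useful when they are combined into a short note for whoever must take corrective action, for example in an email action. The following minimal sketch shows one way to do this. The function name and the message wording are hypothetical. The code parses SearchCriteria as JSON only because the table states that it is a JSON document for Elasticsearch.

```python
import json
from typing import Dict, List


def triage(event: Dict[str, str]) -> List[str]:
    """Summarize published attributes into human-readable notes (illustrative)."""
    notes: List[str] = []
    name = event.get("IndexName", "?")
    if event.get("SearchWorking") == "N":
        notes.append(f"Searching on {name} is disabled.")
    if event.get("IndexWorking") == "N":
        notes.append(f"Indexing on {name} is disabled.")
    if event.get("ReferenceName") and event.get("ReferenceValue"):
        notes.append(f"Failed record: {event['ReferenceName']}={event['ReferenceValue']}")
    if event.get("ErrorCode"):
        notes.append(f"Error {event['ErrorCode']}: {event.get('ErrorDescription', '')}")
    criteria = event.get("SearchCriteria")
    if criteria:
        try:
            notes.append("Query: " + json.dumps(json.loads(criteria), sort_keys=True))
        except json.JSONDecodeError:
            notes.append("Query (unparsed): " + criteria)
    return notes
```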