Handling exceptions

Archive Service has built-in exception handling capabilities that help reduce the chance of continuous failures during the order archival process.

Because Archive Service is deployed outside Sterling Order Management System Software, you might encounter problems over time. For example, Order Service might be down or unreachable. Archive Service validation errors or Cassandra-specific problems can also occur, such as database implementation errors (for example, an issue in query syntax) or database runtime errors (for example, an unavailable connection pool). Any of these problems can cause archival operations to fail, which leads to the following considerations:

  • Orders for which archival has failed need to be tracked and re-attempted.
  • Orders for which archival has failed must be reported so that you can take corrective actions, if needed.

Handling exceptions due to configuration issues

As explained in the previous section, Order Service provides a few configurations that you can use to further slice the order's direct subordinate entities so that they are archived as separate parts. For each part, you can define the data size to be stored in Archive Service as needed for your business.

Any issue with the configuration might lead to continuous failures in the archival process. To avoid such failures, the order archival process performs the following validations before it attempts to archive the records staged in the OSI_AWAITING_ARCHIVE table:
  • Properties validation: Archive Service validates the configured properties before it uses them for archival processing. All properties that are non-modifiable at runtime are validated before any records are fetched for archival. If a configuration is incorrect, you are notified so that you can take corrective action. Properties that are modifiable at runtime are validated during Archive Service processing, and archival processing fails if they are configured incorrectly. You are notified of any issues caused by incorrect configuration so that you can take the appropriate action:
    • For properties that are non-modifiable at runtime, correct the property value and restart the agent server.
    • For properties that are modifiable at runtime, correct the property value. The agent picks up the updated value on the next trigger, so an agent restart is not required.
  • Connection validation: The archival process validates the connection with Order Service before it attempts to archive the records staged in the OSI_AWAITING_ARCHIVE table. Any failure, such as an invalid Order Service URL or Order Service being down or unreachable, is reported so that you can take corrective action.
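
The order of these checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the property names (`example.service.url`, `example.part.max.size`), the `validate_properties` and `preflight` helpers, and the caller-supplied `probe` callable are all hypothetical. Only the two failure labels, Configurational and Connection, come from this topic.

```python
from urllib.parse import urlparse

# Hypothetical property names, used only for illustration; the real
# property names are defined by the product configuration.
STARTUP_PROPS = {"example.service.url"}      # non-modifiable at runtime
RUNTIME_PROPS = {"example.part.max.size"}    # modifiable at runtime


def validate_properties(props, names):
    """Return a list of human-readable problems for the given property names."""
    problems = []
    for name in sorted(names):
        value = props.get(name, "").strip()
        if not value:
            problems.append(f"{name}: missing or empty")
        elif name.endswith(".url"):
            parsed = urlparse(value)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{name}: not a valid http(s) URL")
        elif name.endswith(".size"):
            if not value.isdigit() or int(value) <= 0:
                problems.append(f"{name}: must be a positive integer")
    return problems


def preflight(props, probe):
    """Run property and connection checks before any staged record is fetched.

    `probe` is a caller-supplied callable that takes the service URL and
    returns True when the service is reachable.
    """
    problems = validate_properties(props, STARTUP_PROPS)
    if problems:
        return ("Configurational", problems)
    if not probe(props["example.service.url"]):
        return ("Connection", ["service is down or unreachable"])
    return (None, [])
```

In this sketch, runtime-modifiable properties would be checked separately with `validate_properties(props, RUNTIME_PROPS)` on each trigger, which mirrors why correcting them does not require a restart.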

Handling runtime exceptions during Archive Service

Any runtime exception that occurs during Archive Service processing for a history order immediately stops further processing for that order and rolls back the transaction in Sterling Order Management System Software. Archival of the failed orders is reattempted on the next trigger of archival processing. This ensures that no data is lost in Sterling Order Management System Software if archival of any part of the history order fails. The error details are logged and reported for the specific exception.

  • For intermittent failures received from Archive Service, Archive Service stops processing orders. Such orders are picked up on the next trigger of the archival process.
  • Some failures received from Archive Service are not expected to occur for every order and might be related to a specific type of order, which might require either a product fix or a correction to the order data. For these failures, the Archive Service agent updates the LAST_FAILED_DATE column in the OSI_AWAITING_ARCHIVE table to the current date for the failed orders. This ensures that such records are not picked up again for processing for the next 30 days, which helps reduce the chance of continuous failures in the archival process.
  • If order archival fails because the order is not present in Order Search, Archive Service processing inserts a record for that order in the YFS_AWAITING_INDEX table. The SSI_DELAYED_SYNC agent picks up this record to index the order in Order Search. Archive Service does not remove the record for that order from the OSI_AWAITING_ARCHIVE table, so the order is picked up for archival on the next trigger of the agent.
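
The three outcomes above can be modeled with a short sketch. It is illustrative only: staged rows are plain dictionaries, a list stands in for the YFS_AWAITING_INDEX table, and the `kind` labels, the `ORDER_HEADER_KEY` key, and both function names are hypothetical. Treating a record as eligible again on exactly the thirtieth day is also an assumption; the text states only that the record is held back for the next 30 days.

```python
from datetime import date, timedelta

RETRY_HOLD = timedelta(days=30)  # hold period stated in the text above


def is_eligible(record, today):
    """A staged record is skipped while it is inside the 30-day hold."""
    last_failed = record.get("LAST_FAILED_DATE")
    return last_failed is None or today - last_failed >= RETRY_HOLD


def handle_failure(kind, record, awaiting_index, today):
    """Apply one of the three outcomes to a failed order.

    `kind` is a hypothetical classification: "intermittent",
    "order_specific", or "not_indexed". Returns True when the current
    run should stop processing further orders.
    """
    if kind == "intermittent":
        return True  # stop now; orders are retried on the next trigger
    if kind == "order_specific":
        record["LAST_FAILED_DATE"] = today  # hold this order back
        return False
    if kind == "not_indexed":
        # Queue the order for indexing; the staged record stays in place.
        awaiting_index.append(record["ORDER_HEADER_KEY"])
        return False
    raise ValueError(f"unknown failure kind: {kind}")
```

Keeping the hold check in `is_eligible` separates the decision to skip a record from the failure handling itself, so the same check can be applied whenever staged records are selected for a run.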

Exception notification

You are notified of any exception that occurs due to validation failures or failures during archival operations.

The exceptions from the Archive Service agent are published by raising an alert or raising an event.

Raising alerts

Alerts are raised for any failure in order archival processing. The alerts raised by the Archive Service processing agent are like any other alerts raised for operational exceptions. All exceptions are logged with ExceptionType='AGENTEXCEPTION' and are created in the YFS_Inbox table.

Alerts raised due to property or connection validation failures, which are checked before the jobs are fetched for archival, are consolidated.

Alerts raised during archival failure for a history order are not consolidated and contain order-specific details.

Raising events

The Archive Service agent raises the ON_FAILURE event for any failure that requires manual intervention. If enabled, this event can publish the information illustrated in the following template:

<OrderArchive FailureType="" FailureStatus="" FailureCode="" ErrorCode="" ErrorDescription="" ErrorRelatedMoreInfo="" OccuredOn="" Comments="">      
   <Order DocumentType="" EnterpriseCode="" OrderHeaderKey="" OrderNo="" Id="">            
      <Part Name="" />        
   </Order>     
   <ErrorReferences>
      <ErrorReference Name="" Value="" />         
   </ErrorReferences>
   <StackTrace/>
</OrderArchive>

The event output is configurable. The event template OSI_ORDER_ARCHIVE.ON_FAILURE.xml is located in the <INSTALL_DIR>/repository/xapi/template/merged/event directory.

Sterling Order Management System Software does not provide any default event handlers such as conditions, actions, or services to trigger a process when this event is raised.

You can define actions to publish the event output to the database, create an alert, send an email, and so forth, and define event handlers by providing conditions that determine the types of actions that are performed when this event is raised.
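
As one illustration of such a condition, the following sketch picks an action based on the FailureType attribute of the published event. It assumes the event output is available as an XML string and that the published FailureType values match the names used in this topic. The action names, the routing table, and the fallback choice are hypothetical and are not part of the product.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: which action to take for each FailureType.
ACTIONS = {
    "Configurational": "email_admin",
    "Connection": "create_alert",
    "Transactional": "log_to_database",
}


def choose_action(event_xml):
    """Return (action, summary) for one ON_FAILURE event document."""
    root = ET.fromstring(event_xml)
    failure_type = root.get("FailureType", "")
    # Falling back to an alert for unknown types is an assumption.
    action = ACTIONS.get(failure_type, "create_alert")
    code = root.get("ErrorCode", "")
    description = root.get("ErrorDescription", "")
    summary = f"{failure_type}: {code} {description}".strip()
    return action, summary
```

A caller would pass the event output to `choose_action` and dispatch on the returned action name.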

The following table describes the attributes published for the exceptions from the Archive Service agent.

Attribute: Description
FailureType: The type of failure that occurred in the Archive Service agent. There are three types:
  • Configurational: Failures due to incorrect property configuration.
  • Connection: Initial connection failures because Archive Service is down, the URL is incorrect, or another connectivity error occurred.
  • Transactional: Failures that occur during archival of a part of a history order.
FailureStatus: The HTTP REST status received in the failure response. This is not published for configurational failures.
FailureCode: A standard failure code received from Archive Service. This is published only if the agent is able to connect to Archive Service and receives a failure response.
ErrorCode: The error code for the error. You can view the description and cause of the errors, as well as the actions to troubleshoot them, in Sterling Order Management System Software.
ErrorDescription: The description of the error code.
ErrorRelatedMoreInfo: The innermost cause of the failure.
OccuredOn: The date on which the failure occurred.
Comments: Additional information about the scenario.
Order: Contains the following details of the history order for which the archival attempt failed:
  • OrderNo
  • DocumentType
  • EnterpriseCode
  • OrderHeaderKey
  • Id: The unique order identifier in Archive Service.
  • Part/@Name: The name of the order part for which the archival attempt failed.
ErrorReference: Contains a name-value pair that provides additional information about the error, including any context-specific information, if available.
  • Name: The name of the reference attribute.
  • Value: The value of the reference attribute.
StackTrace: The entire stack trace of the exception. This is not present in the application-provided template for the ON_FAILURE event. You can extend the template to publish the stack trace in the output of the event.
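
To show how these attributes fit together, the following sketch collects one event document into a flat record, for example before storing it or attaching it to a notification. The `flatten_event` helper and the shape of the returned dictionary are hypothetical. Treating the Order and StackTrace elements as optional is also an assumption.

```python
import xml.etree.ElementTree as ET


def flatten_event(event_xml):
    """Collect the documented attributes of an ON_FAILURE event into a dict."""
    root = ET.fromstring(event_xml)
    record = dict(root.attrib)  # FailureType, ErrorCode, OccuredOn, ...
    order = root.find("Order")
    if order is not None:  # assumed absent when no order is involved
        record["Order"] = dict(order.attrib)
        record["Parts"] = [part.get("Name") for part in order.findall("Part")]
    # Each ErrorReference is a name-value pair with extra context.
    record["References"] = {
        ref.get("Name"): ref.get("Value")
        for ref in root.findall("ErrorReferences/ErrorReference")
    }
    stack = root.find("StackTrace")
    record["StackTrace"] = stack.text if stack is not None else None
    return record
```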