High availability

The high availability mailboxing capability enables you to deploy a B2B platform that minimizes downtime and provides disaster recovery capabilities.

Availability refers to the proportion of time during which a particular service or system in Global Mailbox is operating at functional capacity. For instance, the payload replication service is considered available only when users can successfully upload new payloads to Global Mailbox. The metadata replication service, on the other hand, is available only when a sufficient number of Cassandra instances are online to service read and write requests.
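As a rough illustration of these two definitions, the following sketch computes availability as a proportion of time and checks whether enough replicas are online to serve requests. The function names are hypothetical and are not part of Global Mailbox. The majority formula is the standard Cassandra QUORUM calculation; whether a given deployment uses QUORUM as its threshold is an assumption.

```python
def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Proportion of an observation window during which a service was functional."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return uptime_seconds / total_seconds


def quorum_size(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1 (Cassandra QUORUM)."""
    return replication_factor // 2 + 1


def metadata_available(online_replicas: int, replication_factor: int) -> bool:
    """Assumption: the metadata service needs a quorum of replicas online."""
    return online_replicas >= quorum_size(replication_factor)
```

For example, with a replication factor of 3, `metadata_available(2, 3)` is true and `metadata_available(1, 3)` is false.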

Active/Active availability is achieved by having multiple systems running parallel operations simultaneously. When one system becomes unresponsive, the other systems take over its requests and users do not experience an outage.

Given multiple Global Mailbox nodes and properly configured load balancers between them, the system maximizes the availability of file send and receive operations.
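The routing behavior described above can be sketched as a loop that tries each node in turn and moves on when one is unresponsive. This is only an illustration of the idea: in a real deployment the load balancer performs this role, and the names `send_with_failover`, `AllNodesUnavailable`, and the `send` callback are hypothetical rather than Global Mailbox APIs.

```python
from typing import Callable, List, Sequence


class AllNodesUnavailable(Exception):
    """Raised when no node could serve the request."""


def send_with_failover(nodes: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each node in order and return the first successful response."""
    errors: List[str] = []
    for node in nodes:
        try:
            return send(node)
        except ConnectionError as exc:
            # This node is unresponsive; record the failure and try the next one.
            errors.append(f"{node}: {exc}")
    raise AllNodesUnavailable("; ".join(errors))
```

As long as at least one node in the list responds, the caller receives a result and does not see the failure of the others.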

Operational characteristics related to data center outages and disaster recovery paths:
  • The system re-synchronizes payload and metadata as soon as possible after an outage to re-establish data consistency
  • If the system cannot achieve immediate (synchronous) replication, administrators can choose to run in delayed (asynchronous) replication mode while they correct the problems
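The two characteristics above can be pictured with a small sketch: replication is attempted immediately, falls back to a backlog only if an administrator has allowed delayed mode, and the backlog is drained once the remote data center is reachable again. The `Replicator` class, its `allow_delayed` flag, and the `send_remote` callback are hypothetical names used for illustration; they do not describe the actual Global Mailbox implementation.

```python
from collections import deque
from typing import Callable, Deque


class Replicator:
    def __init__(self, send_remote: Callable[[bytes], None], allow_delayed: bool = False):
        self._send_remote = send_remote
        self.allow_delayed = allow_delayed  # assumed to be set by an administrator
        self.backlog: Deque[bytes] = deque()

    def replicate(self, item: bytes) -> str:
        """Replicate immediately if possible, otherwise queue the item if permitted."""
        if self.backlog:
            # Keep ordering: items queue behind anything still waiting.
            self.backlog.append(item)
            return "delayed"
        try:
            self._send_remote(item)
            return "immediate"
        except ConnectionError:
            if not self.allow_delayed:
                raise
            self.backlog.append(item)
            return "delayed"

    def resynchronize(self) -> int:
        """Drain the backlog in order; stops with an error if the remote is still down."""
        sent = 0
        while self.backlog:
            self._send_remote(self.backlog[0])
            self.backlog.popleft()
            sent += 1
        return sent
```

An item is removed from the backlog only after it has been sent successfully, so a failed re-synchronization attempt can be retried without losing data.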

Global Mailbox mitigates the following failure scenarios and provides recovery paths for them:

  • Data center outage
    • Data center outage during mid-transfer to or from trading partner
    • Data center outage before acknowledgment received
  • Network connectivity issues
    • System cannot connect to another data center
    • System cannot connect to metadata database
    • System cannot connect to payload storage
  • Read/Write issues
    • System cannot read requested metadata (the mailbox or metadata does not exist)
    • System cannot retrieve payload (the metadata points to a non-existent payload)
    • System cannot write metadata
    • System cannot write payload
    • System does not receive write acknowledgment from payload store
    • System does not receive write acknowledgment from metadata database
      • Due to inability to achieve requested write consistency
      • Due to timeout or other network issues
    • Payload data is corrupted during write
  • Event framework issues
    • Application does not receive event notification from Global Mailbox
    • System cannot connect to message queue component
  • Sterling B2B Integrator issues
    • Sterling B2B Integrator node is down or unavailable
    • Sterling B2B Integrator cluster is down or unavailable
    • The data center of the Sterling B2B Integrator node that starts processing a message goes down before the processing is complete:
      • Global Mailbox supports replaying any event that happened in the past X hours.
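The replay capability in the last scenario can be pictured as selecting the events that fall inside a time window and reprocessing them in order. The sketch below is an assumption about how such a selection might look, not the product's event framework: the `Event` type, the `events_to_replay` function, and the `window_hours` parameter (which stands in for the unspecified X) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    event_id: str
    occurred_at: datetime


def events_to_replay(events: Iterable[Event], window_hours: float, now: datetime) -> List[Event]:
    """Return events from the last `window_hours` hours, oldest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [e for e in events if cutoff <= e.occurred_at <= now]
    return sorted(recent, key=lambda e: e.occurred_at)
```

Returning the events oldest first lets a surviving node reprocess them in the order in which they originally occurred.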