Disaster recovery
The Pre-Production environment is sized to house both the regular Pre-Production Sterling Order Management instance (for performance and production support tasks) and the dormant disaster recovery (DR) instance (as a contingency). Both instances share virtual machines for web, application, and database resources. During a declared disaster, IBM quarantines your Pre-Production environment and devotes it entirely to restoring business continuity. The disaster recovery architecture includes all the servers, network, scripts, and databases that are involved in backing up data and in switching between the Pre-Production and Production environments, as needed. The Pre-Production environment is housed in a different IBM SoftLayer® data center than your Production environment, usually in a different city or geographical location.
As part of the disaster recovery plan, operational and transactional data within your Production environment, such as orders, is periodically replicated throughout the day and backed up to the disaster recovery instance. Your Production environment web and application data are backed up hourly to the disaster recovery instance. The application data includes file system artifacts, such as CSS, images, static content, and SaaS extension artifacts. IBM also backs up key environment and site data daily, such as infrastructure and configuration data, extensions, and files. Backups of your Production environment databases are also completed daily. Local backups, which can be used for small-scale recovery events, are also completed and moved to a remote storage location. Transaction logs are maintained in both your live and disaster recovery data centers.
Production environment data, including web and application data, is replicated and backed up through a private IBM SoftLayer network between your Production and Pre-Production environments. Your disaster recovery databases, which are always maintained at a near-ready state, use this network to replicate data in near-synchronous mode by using the high availability disaster recovery (HADR) option.
Service level objectives (SLO)
- The service level objective (SLO) for business continuation that is offered for IBM Sterling Order Management is 4 hours for Recovery Point Objective (RPO) and 8 hours for Recovery Time Objective (RTO).
- If you purchase the SLO improvement options, the expected RPO is 2 hours and the expected RTO is within 4 hours.
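To make the objectives above concrete, the following minimal Python sketch checks a recovery event against the two sets of RPO and RTO values. The function name `meets_slo`, the tier labels `standard` and `improved`, and the choice to measure both intervals from the moment the disaster is declared are assumptions made for illustration only. They are not part of the IBM service definition.

```python
from datetime import datetime, timedelta

# Objectives quoted in this section. The tier labels are hypothetical.
SLO_TIERS = {
    "standard": {"rpo": timedelta(hours=4), "rto": timedelta(hours=8)},
    "improved": {"rpo": timedelta(hours=2), "rto": timedelta(hours=4)},
}


def meets_slo(tier, last_replicated, disaster_declared, service_restored):
    """Return (rpo_met, rto_met) for one recovery event.

    Assumption: the data-loss window runs from the last replicated
    transaction to the disaster declaration, and downtime runs from the
    declaration to the moment service is restored.
    """
    objectives = SLO_TIERS[tier]
    data_loss_window = disaster_declared - last_replicated
    downtime = service_restored - disaster_declared
    return (
        data_loss_window <= objectives["rpo"],
        downtime <= objectives["rto"],
    )


if __name__ == "__main__":
    declared = datetime(2024, 1, 1, 12, 0)
    last_replicated = declared - timedelta(hours=3)
    restored = declared + timedelta(hours=6)
    print(meets_slo("standard", last_replicated, declared, restored))
    print(meets_slo("improved", last_replicated, declared, restored))
```

In this example, 3 hours of unreplicated data and 6 hours of downtime satisfy the standard objectives but miss both of the improved objectives.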
Process
During an identified disaster, the following steps are completed as part of the disaster recovery process:
- In the unlikely event that your Production environment or primary data center experiences a severe problem, which, after investigation, is deemed irreversible, IBM declares that a disaster occurred. IBM then begins implementing the disaster recovery process.
- IBM issues an alert to you and to any other relevant parties, such as your business partners, if you are using a business partner to support your services.
- IBM activates the disaster recovery process to switch your Pre-Production environment into a temporary Production environment. When your Pre-Production environment is being used as a temporary Production environment, the Pre-Production environment is not available. When your normal Production environment is restored, your Pre-Production environment becomes available again.
As part of this activation, IBM activates the disaster recovery application servers on your backed-up production code base. IBM also validates that the network file systems for your site are mounted and available.
To make your site available for users, IBM deactivates the Production environment web servers within the global load balancers. IBM then activates the disaster recovery web servers within the global load balancers. When this switch is completed, IBM notifies you that your site is available.
- You and your business partners can conduct disaster recovery simulation exercises and tests. To determine how to best test your disaster recovery process, work with IBM to create your test plan and complete your testing. Verify that your site functions and settings work on your active disaster recovery instance site. Complete the following tasks:
- (IBM Sterling Order Management) Access the application UI including any channel applications.
- Test the data integrity of your disaster recovery database. Request disaster recovery database queries to help validate the data integrity.
- Test network connectivity, for example by using Telnet, to confirm the network path.
- Complete an order transaction process through your store to your on-premises backend systems that return data or confirmation of the process to your store. Plan this transaction process thoroughly, because it can insert unwanted order data into your customer backend system.
- Your Production environment is restored. If your Production environment cannot be restored, your disaster recovery Production environment becomes your permanent Production environment and a new Pre-Production environment is created.
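The network path check in the testing tasks above can also be scripted instead of being run manually with Telnet. The following Python sketch attempts a plain TCP connection, which is the same reachability test that a Telnet connection attempt performs. The function names `can_connect` and `check_endpoints` are hypothetical, and the hosts and ports that you check are assumptions that depend on your own integrations and your agreed test plan.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to its reachability result."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

For example, `check_endpoints([("backend.example.com", 443)])` returns a dictionary that maps each endpoint to `True` or `False`. The host name here is a placeholder. A successful connection confirms only that the network path is open. It does not validate the application behind the port.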
Limitations
You cannot use the Pre-Production environment while your Production environment is in disaster recovery mode. Ensure that you disable any integrations that were connected to the Pre-Production environment while the disaster recovery environment is active.