Disaster recovery
In the unlikely event that your Sterling Intelligent Promising production environment becomes unavailable or unusable during ongoing operation, and the problem or behavior that caused this is irreversible, the disaster recovery (DR) process is triggered.
The preproduction environment is sized to house both the regular preproduction instances (for performance and production support tasks) and the dormant DR instance (as a contingency). Both instances share virtual machines for web, application, and database resources. For this reason, during a declared disaster, IBM quarantines your entire preproduction environment. This ensures that nothing interrupts the DR activities and that the environment is fully devoted to restoring availability. The disaster recovery architecture includes all of the servers, network, scripts, and databases that are involved in backing up data and in switching between the preproduction and production environments when needed. Your preproduction environment is housed in a different IBM® SoftLayer® data center than your production environment.
As part of the disaster recovery process, operational and transactional data in your production environment, such as orders, is replicated throughout the day and backed up to the disaster recovery instance. This data replication includes your PII and other regulated data. Your production environment web and application data is backed up hourly to the disaster recovery instance. The application data includes file system artifacts, such as CSS, images, static content, and SaaS extension artifacts. IBM also backs up key environment and site data daily, such as infrastructure and configuration data, extensions, and files. Your production environment databases are also backed up daily. Local backups, which can be used for small-scale recovery events, are also completed and moved to a remote storage location. Transaction logs are maintained in both your production and disaster recovery data centers.
Production environment data, including web and application data, is replicated and backed up over a private IBM SoftLayer network between your production and preproduction environments. Your disaster recovery databases are always maintained in a near-ready state. They use this network to replicate data in near-synchronous mode by using high availability disaster recovery (HADR) functions.
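The backup cadences described above set an upper limit on how old the most recent backup of each data category can be when a failure occurs. The following Python sketch illustrates this under a hypothetical assumption: hourly backups complete exactly on the hour and daily backups complete at 00:00 UTC. The actual backup schedule is not documented here. The category labels and the `latest_backup` and `exposure_window` names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical assumption: hourly backups finish on the hour and daily backups
# finish at 00:00 UTC. The real IBM schedule is not specified in this section.
BACKUP_INTERVALS = {
    "web and application data": timedelta(hours=1),
    "environment and site data": timedelta(days=1),
    "database backups": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_backup(failure_time: datetime, interval: timedelta) -> datetime:
    """Return the most recent scheduled backup at or before failure_time."""
    elapsed = failure_time - EPOCH
    # Floor the elapsed time to a whole number of backup intervals.
    return EPOCH + (elapsed // interval) * interval


def exposure_window(failure_time: datetime) -> dict:
    """Map each data category to the age of its latest backup at failure_time."""
    return {
        name: failure_time - latest_backup(failure_time, interval)
        for name, interval in BACKUP_INTERVALS.items()
    }


if __name__ == "__main__":
    failure = datetime(2024, 3, 5, 14, 37, tzinfo=timezone.utc)
    for name, age in exposure_window(failure).items():
        print(f"{name}: latest backup is {age} old")
```

For a failure at 14:37 UTC, the sketch reports that the latest hourly backup is 37 minutes old and that the latest daily backups are 14 hours and 37 minutes old. The sketch excludes operational and transactional data, because that data is replicated in near-synchronous mode through HADR and is not limited by these backup windows.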
Objectives
- If you have the Inventory Visibility Essentials edition, the Service Level Objective (SLO) for the recovery time objective (RTO) is within 7 days, with a recovery point objective (RPO) of 48 hours.
- If you have the Inventory Visibility Standard edition, the SLO for the RTO is within 48 hours, with an RPO of 24 hours.
- If you also purchase options for SLO improvement for the IBM Sterling Inventory Visibility Standard edition, the expected RTO is within 4 hours, with an RPO of 2 hours.
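The objectives above can be expressed as data, so that a recovery event can be checked against them. The following Python sketch is illustrative only. The edition keys, `Objective`, and `meets_objectives` are hypothetical names. The sketch assumes that recovery time is measured from the moment of the disaster until service is restored. It also assumes that data loss is measured from the last recovered data point until the moment of the disaster. How IBM measures these values for your contract may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objective:
    rto: timedelta  # maximum time allowed to restore service
    rpo: timedelta  # maximum acceptable window of lost data


# RTO and RPO values from the Objectives list; the keys are illustrative labels.
OBJECTIVES = {
    "essentials": Objective(rto=timedelta(days=7), rpo=timedelta(hours=48)),
    "standard": Objective(rto=timedelta(hours=48), rpo=timedelta(hours=24)),
    "standard_slo_improvement": Objective(rto=timedelta(hours=4), rpo=timedelta(hours=2)),
}


def meets_objectives(
    edition: str,
    disaster_time: datetime,
    restored_time: datetime,
    last_recovered_data_time: datetime,
) -> tuple:
    """Return (rto_met, rpo_met) for one recovery event.

    Assumption: recovery time runs from disaster_time to restored_time, and
    data loss runs from last_recovered_data_time to disaster_time.
    """
    objective = OBJECTIVES[edition]
    recovery_time = restored_time - disaster_time
    data_loss = disaster_time - last_recovered_data_time
    return recovery_time <= objective.rto, data_loss <= objective.rpo


if __name__ == "__main__":
    result = meets_objectives(
        "standard_slo_improvement",
        disaster_time=datetime(2024, 3, 5, 10, 0),
        restored_time=datetime(2024, 3, 5, 13, 30),
        last_recovered_data_time=datetime(2024, 3, 5, 8, 45),
    )
    print(result)  # (True, True): 3.5 h recovery, 1.25 h of data loss
```

Keeping the objectives in a single table makes the three editions easy to compare. It also means that a change to a contracted value touches only one line.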
Process
When a disaster occurs, the following steps are completed during the disaster recovery process:
- In the unlikely event that your production environment or primary data center experiences a severe problem, which after investigation is deemed irreversible, IBM declares that a disaster occurred. IBM then begins implementing the disaster recovery process.
- IBM issues an alert to you and any other relevant parties, such as your Business Partners, if you are using a Business Partner to support your services.
- IBM activates the disaster recovery process to switch your preproduction environment into a temporary production environment. While your preproduction environment serves as the temporary production environment, it is not available for preproduction work. When your normal production environment is restored, your preproduction environment becomes available again.
  As part of this activation, IBM activates the disaster recovery application servers on your backed-up production code base. IBM also validates that the network file systems for your site are mounted and available.
  To make your site available to users, IBM deactivates the production environment web servers in the global load balancer and then activates the disaster recovery web servers in the global load balancer. When this switch is complete, IBM notifies you that your site is available.
- IBM allows you and your Business Partners to conduct disaster recovery simulation exercises and tests. To determine the best way to test your disaster recovery process, work with IBM to create your test plan and complete your testing. Complete the following tasks to verify that your site functions and settings work on the active disaster recovery instance:
  - Access the application UI, including any channel applications.
  - Test the data integrity of your disaster recovery database. Request disaster recovery database queries to help validate the data integrity.
  - Use Telnet to confirm the network path.
  - Complete an order transaction through your store to your on-premises backend systems, which return data or a confirmation of the process to your store. Plan this transaction carefully, because it can insert unwanted order data into your backend systems.
- Your production environment is restored. If your production environment cannot be restored, the disaster recovery environment becomes your permanent production environment, and a new preproduction environment is created.
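The Telnet check in the testing tasks above only confirms that a TCP connection to a given host and port can be opened. The same check can be scripted. The following Python sketch uses the standard library function `socket.create_connection`. The `check_network_path` name is hypothetical. The host and port to test are assumptions that you confirm with IBM as part of your test plan.

```python
import socket


def check_network_path(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Like `telnet host port`, this only confirms that the TCP handshake
    completes. It does not check that the application behind the port
    responds correctly.
    """
    try:
        # The connection is closed immediately when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `check_network_path("dr.example.com", 443)` returns `True` only if a TCP handshake with that hypothetical host completes within five seconds. A successful result confirms the network path, not the health of the application on the disaster recovery instance.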
Limitations
You cannot use the preproduction environment while your production environment is in disaster recovery mode. While the disaster recovery environment is active, disable any integrations that are connected to the preproduction environment.