The world is increasingly unpredictable.

For a digitally dependent business, a small disruptive event can have a ripple effect across your entire company. You may have a backup and recovery plan, but how confident are you that it will deliver during your moment of truth?

Let’s look at the disruptive events that unfolded at MyShoppe. What could have happened to their data center power system? Was it a mechanical fault or a leak at the diesel generator, or both? Did MyShoppe’s business continuity plan fail to account for equipment redundancy, given their 24x7 operation? Let’s discuss the typical challenges, roadblocks and risks that modern IT operations face.

The Outage

The power goes out. But who or what is to blame?

A perfect storm scenario that could have been predicted.

MyShoppe* is a global online retailer with two private cloud data centers, DC1 and DC2, housed close to their headquarters. The data centers run their core systems, including OLTP applications, CRM and inventory management. DC1 is the production site and DC2 is the recovery site, barely two miles apart.

A turbulent storm causes a power outage at MyShoppe’s DC1. The UPS kicks in and keeps the IT infrastructure (compute, storage and network) on. But due to an unexpected mechanical fault, the diesel generator fails to switch on and a domino effect ensues. The HVAC cooling system shuts down. With no power to the cooling system, temperatures rise past a critical threshold, resulting in hard server shutdowns and random, unpredictable reboots. The servers start malfunctioning and crashing.

MyShoppe didn’t have a proper business continuity strategy or plan in place, one that could have tested such outage scenarios before they ever happened in the real world.

*The story and the characters portrayed in the video are fictional. Viewers and readers are encouraged to discuss specifics with an IBM specialist.

Technical Briefs

Learn how IBM can help you build and implement an integrated DC and resiliency strategy.

The Recovery

When disaster strikes, time is money.

Recovery that took days could have been executed in minutes.

MyShoppe had their backup and recovery systems at DC2, about two miles away, and there was no other site to rely on. The uncontrolled server shutdowns and reboots at DC1 corrupted their critical applications and data, which were synchronously replicated to the backup site (DC2). This triggered a chain of failures and left the IT Operations team unable to fail over to DC2. The backup site was populated with enough application errors and bad data to crash all systems.

If MyShoppe had cloud-based backup and recovery, they would have been able to back up and store critical data and applications off-site, protected from local weather or other unforeseen disruptions.

Technical Briefs

Learn more on how you can leverage IBM’s cloud-based managed backup, data protection and disaster recovery solutions with cloud landing zones.

The Fallout

An outdated plan, a setup for failure from the start.

A failed and manual disaster recovery plan that could have been automated.

MyShoppe’s business continuity and disaster recovery plan was outdated, manually operated and not tested enough, and it failed them during their moment of truth. If only they had implemented an orchestrated data protection and recovery platform that could have automatically verified the integrity of files replicated from DC1 to DC2. Such capabilities could have helped detect anomalies and alert the IT team and application managers, preventing the bad data from being replicated to the backup site.
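To make the idea of integrity verification before replication concrete, here is a minimal sketch in Python. It is an illustration only and does not represent any IBM product or API. It assumes a manifest of known-good SHA-256 checksums was recorded when each file was last written cleanly, and the names used here (sha256_of, replicate_verified, the manifest dictionary, the sample file names) are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_verified(source_dir: Path, target_dir: Path, manifest: dict) -> list:
    """Copy only files whose checksum matches the manifest.

    `manifest` maps a file name to its expected SHA-256 digest (an assumed
    format for this sketch). Returns the names of files that were rejected,
    which is where a real system would raise an alert.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    rejected = []
    for name, expected in sorted(manifest.items()):
        source = source_dir / name
        if not source.is_file() or sha256_of(source) != expected:
            rejected.append(name)  # missing or corrupted: do not replicate
            continue
        shutil.copy2(source, target_dir / name)
    return rejected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dc1, dc2 = Path(tmp) / "dc1", Path(tmp) / "dc2"
        dc1.mkdir()
        (dc1 / "orders.db").write_bytes(b"good orders")
        (dc1 / "inventory.db").write_bytes(b"good inventory")
        manifest = {p.name: sha256_of(p) for p in dc1.iterdir()}

        # Simulate corruption caused by a hard server shutdown.
        (dc1 / "inventory.db").write_bytes(b"garbage")

        print("rejected:", replicate_verified(dc1, dc2, manifest))
```

In this sketch the corrupted file never reaches the recovery site, so the copy already held there stays usable. A production platform would also need to handle databases that change continuously, which a simple file checksum does not cover.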

The IT and business operations teams would never have imagined that an event as seemingly small as a diesel generator fuel leak could take down their cooling system, production servers, backup site and eventually their entire business operation for days, during their busiest and most profitable time of the year.

What MyShoppe needed was automated and orchestrated recovery with AI and real-time intelligence to help eliminate human error and improve recoverability and, more importantly, to proactively avoid disruptions that lead to lost revenue, brand damage and dissatisfied customers.

Read the technical brief

Learn how you can efficiently automate and easily monitor and manage your DR operations.

The Solution

When outages and disruptions strike, be confident that your backup and recovery plan will deliver in your moment of truth.

Together, these IBM services address all aspects of recognized IT risks—conflicting process and governance structures, cyber risk such as threat and vulnerability management, and anomaly detection.

Using IBM Cloud Resiliency Cloud Orchestration, clients can automate DR tests, drills, switchover and switchback, and manage and monitor these activities at the push of a button.