Posted by Carl Zetie
There's a quote famously attributed to the late Ronald Reagan: "Trust, but verify." I was reminded of it by a sad but sobering story that passed by during the recent holidays: blogging provider JournalSpace is out of business, its users' data wiped out in a metaphorical instant, the most probable cause being a disgruntled employee who deleted the database.
Strictly speaking, though, that was only the proximate cause. The underlying failure was that, by all accounts, JournalSpace had confused "redundancy" with "backups". Their only "backup" was a real-time mirror of their database server. That's great for business continuity -- if the primary drive failed they could switch to the mirror and hardly miss a beat. But it's hopeless for business recovery -- in this case, it appears that somebody or some process erased the database files, and that erasure was, of course, immediately mirrored on the other drive. A true backup would have allowed them to recover to an earlier point in time; sadly, they appear not to have had any such backup.
(In fairness, a seriously disgruntled employee might have found a way to sabotage the backups, too. A malicious insider is perhaps one of the hardest threats to guard against, especially in a small company where one person may have extremely broad and unsupervised administrative privileges.)
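To make the mirror-versus-backup distinction concrete, here is a toy sketch in Python. It is not a description of JournalSpace's actual setup: the class names, the in-memory dictionaries standing in for drives, and the snapshot list are all assumptions made purely for illustration.

```python
import copy


class MirroredStore:
    """Toy primary/mirror pair: every change is applied to both copies at once."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, key, value):
        self.primary[key] = value
        self.mirror[key] = value

    def erase_all(self):
        # The mirror faithfully replicates the erasure, too.
        self.primary.clear()
        self.mirror.clear()


class BackedUpStore(MirroredStore):
    """Same mirrored pair, plus point-in-time snapshots kept apart from live data."""

    def __init__(self):
        super().__init__()
        self.snapshots = []  # hypothetical stand-in for offline backups

    def take_snapshot(self):
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore_latest(self):
        self.primary = copy.deepcopy(self.snapshots[-1])
        self.mirror = copy.deepcopy(self.snapshots[-1])


# Mirror only: the erasure reaches both copies, so nothing is left to recover.
mirrored = MirroredStore()
mirrored.write("post-1", "hello")
mirrored.erase_all()
print(mirrored.primary, mirrored.mirror)

# Mirror plus snapshot: the earlier point in time survives the erasure.
backed = BackedUpStore()
backed.write("post-1", "hello")
backed.take_snapshot()
backed.erase_all()
backed.restore_latest()
print(backed.primary)
```

The mirror protects against one copy failing; only the snapshot, which is not updated in lockstep with the live data, protects against the data itself being destroyed.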
Apart from being a sad story for the founders, employees and users of the service, there's a very important lesson here for anybody using any hosted or cloud service. How much do you really know about the dependability of your provider? If you had asked JournalSpace whether they had "backups" they would probably have said "yes", because apparently they didn't know the difference. If your cloud or hosted service provider is promising you certain uptime levels, failover performance, or recovery times, do you take them at their word or do you have some solid evidence? Do they regularly test their ability to perform simple restoration of backups, simulate system failure and recovery, or even, in the most business-critical cases, simulate failover to a secondary site?
And most importantly, do you have the in-house expertise to audit and verify their claims that they do?
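For a sense of what even the simplest restoration test might look like, here is a minimal sketch in Python. The file names, the copy-based "backup", and the helper functions are all hypothetical; a real drill would restore with the provider's own tooling and would compare against a checksum recorded at backup time rather than against the live original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_original(original, backup, scratch_dir):
    """Restore the backup into a scratch directory and compare checksums."""
    restored = scratch_dir / "restored.db"
    shutil.copy2(backup, restored)  # stand-in for a real restore procedure
    return sha256_of(restored) == sha256_of(original)


def run_drill():
    """Run one drill against a good backup and one against a damaged backup."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        original = tmp_path / "data.db"
        original.write_bytes(b"user posts")

        backup = tmp_path / "data.db.bak"
        shutil.copy2(original, backup)

        scratch = tmp_path / "scratch"
        scratch.mkdir()

        good = restore_matches_original(original, backup, scratch)

        # Simulate a backup that exists but is silently empty.
        backup.write_bytes(b"")
        bad = restore_matches_original(original, backup, scratch)
        return good, bad


print(run_drill())  # (True, False)
```

The second result is the one that matters: a backup file can exist and still be useless, and only an actual restore reveals that.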