Oh, no! The file/data/disk is gone!
How many times have you found yourself saying that? It’s a comfort to feel you’re not alone, which is why so many people like to seize on statistics like these from the Boston Computing Network, usually quoting some subset of the following statements:
- 6% of all PCs will suffer an episode of data loss in any given year. (The Cost Of Lost Data, David M. Smith)
- 30% of all businesses that have a major fire go out of business within a year. 70% fail within five years. (Home Office Computing Magazine)
- 31% of PC users have lost all of their files due to events beyond their control.
- 34% of companies fail to test their tape backups, and of those that do, 77% have found tape back-up failures.
- 60% of companies that lose their data will shut down within 6 months of the disaster.
- 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster. 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately. (National Archives & Records Administration in Washington)
- Companies that aren’t able to resume operations within ten days (of a disaster hit) are not likely to survive. (Strategic Research Institute)
- Every week 140,000 hard drives crash in the United States. (Mozy Online Backup)
- Simple drive recovery can cost upwards of $7,500 and success is not guaranteed.
One of the best known, though, is the one that states, “80% of businesses affected by a major incident close within 18 months.” There are dozens of variations of this on the Internet. It must be true.
Except it’s not. Not really.
“I have read many explanations of where this 80 percent myth originates from, but have never managed to find the original source,” wrote Mel Gosling for Continuity Central in what appears to be 2007. “It has, though, been repeated again and again over the years to frighten executives into developing business continuity plans, and just when I thought that the business continuity profession had decided to stop dragging out such a dubious statistic it has reappeared in all its glory.”
Not that frightening executives into developing business continuity plans is a bad thing, of course. Hey, whatever works.
Discussion of the quotation’s potential source traced it back at least as far as Amdahl in 1983. Others said they had heard it 30 years earlier, which, counting back from 2007, would date it to 1977.
In 2009, Gosling went on to follow up on 29 of these and similar statistics, looking for their sources, and determined that in the vast majority of cases, they either couldn’t be sourced or were wildly out of date.
And yet the same data loss statistics still get quoted. Less than a year ago, business backup company Code 42 trotted them out again, attributing “60% of companies that lose their data will go out of business within 6 months of the disaster” to Computer Troubleshooters the previous year. If you go to Computer Troubleshooters, it in turn lists a whole series of statistics, attributed to VaultLogix, but with no source and no date. Moreover, the 60% VaultLogix statistic is quoted by other sites as well.
But while the provenance of the statistic is in doubt, it has what people call “truthiness” (which, incidentally, was the Merriam-Webster Word of the Year in 2006). It feels right. “Truthiness” was defined by the American Dialect Society as “the quality of preferring concepts or facts one wishes to be true, rather than concepts or facts known to be true.” As with urban legends, we’d all like to believe there’s a little boy whose dying wish is to get a lot of postcards or that Bill Gates will give money to people who share a link on Facebook. It fits our preconceived notions, so we jump on it without looking too carefully at where the data might have come from.
Now, does this mean that all such studies and quoted facts are suspect? No, not at all. A 2013 blog post from Backupify presented another list of data loss statistics. While it does include the lovely tautology “Data Loss is the #2 reason for data loss (up from #5 in 2010)” (one wonders what the #1 reason for data loss is, if not Data Loss), on the whole, the statistics it lists have more validity.
What makes these numbers more reliable? First, there’s the fact that they actually have dates attached to them, which gives them some context. Second, it’s possible to track down the actual sources of the statistics. They’re not single out-of-context facts (or, worse, a whole list of them) passed down through the generations as gospel truth.
You really want to calculate some lost business? Figure out just how many person-hours have been wasted trying to find the source of this mythical statistic.