Posted by: ITKE
disaster recovery, LinuxWorld
This gem didn’t have a home on any of our sites, but I didn’t want the reporting to go to waste. So it’s going underground on the Enterprise Linux Log. On that note, enjoy some session coverage from LinuxWorld 2007!
SAN FRANCISCO – Everyone in IT uses storage in their data center, so everyone will one day have to deal with that storage failing. It could happen at any time, even in the moments before your LinuxWorld presentation on demystifying data recovery.
That’s what happened to Chris Bross, anyway, roughly five minutes before attendees started filing into his session on “Demystifying Data Recovery” here at the LinuxWorld Conference and Expo.
Bross is an enterprise recovery engineer with Novato, Calif.-based DriveSavers Data Recovery Inc., and the good news for his presentation was that he had brought along a backup USB thumb drive with a copy of his presentation. All too often, however, Bross said, IT managers and decision makers are not taking the steps necessary to secure and recover their data.
All storage fails eventually
“All storage is going to fail eventually. All hardware breaks. Are you prepared for the inevitable?” Bross said before conducting an informal poll about who had ever lost data.
A smattering of attendees, Bross included, raised their hands. (In addition to losing a USB thumb drive, Bross would later admit that one of his two Ubuntu laptops failed during a shipping snafu.)
But the informal poll belied a much bigger problem in data backup and recovery in today’s enterprise, one that Bross set out to diagnose and recover much as he and his staff have done hundreds of times back in Novato with hard disks damaged by fire, water and mechanical defects (the latter demonstrated with a variety of drive-head-on-platter audio clips from real-life recovery efforts at the DriveSavers clean room).
Disaster recovery: the numbers
“For all of the effort [systems administrators] put into assigning employees backup tasks, 60% of all corporate data today resides on unprotected PC desktops and laptops,” Bross said, citing industry research from Rochester, N.Y.-based Harris Research.
And when natural disasters strike – and they will, despite the disagreement over disaster recovery between business executives and IT staffs – the track record of today’s data centers is poor.
According to a study from the University of Texas, when U.S. small and medium-sized businesses lose data in a natural disaster, 50% never reopen and 90% are gone within two years. Bross said the cost to “recreate” these battered data centers can run anywhere from $50,000 per hour to $2 million per job at large eCommerce sites.
The reality of reliability
Bross said common knowledge in data centers is that the mean time between failures (MTBF) – sometimes given as “mean time to failure” – for a typical hard drive is between 500,000 and 1.5 million hours. In an ideal environment, the annual rate of failure of any given drive is 0.88%.
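The 0.88% figure is consistent with a 1-million-hour MTBF if one assumes a drive powered on around the clock and a constant failure rate. That is an assumption about how the number was derived, not something Bross stated. A minimal sketch of the arithmetic, with hypothetical function names chosen for illustration:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes the drive runs 24x7


def annual_failure_rate(mtbf_hours: float) -> float:
    """Approximate annual failure rate as a fraction.

    Uses the simple linear approximation AFR ~= hours per year / MTBF,
    which assumes a constant failure rate over the drive's life.
    """
    return HOURS_PER_YEAR / mtbf_hours


def implied_mtbf_hours(annual_rate: float) -> float:
    """Invert the same approximation: the MTBF implied by an annual rate."""
    return HOURS_PER_YEAR / annual_rate


# The MTBF range quoted in the session, plus the midpoint-style 1M-hour case.
for mtbf in (500_000, 1_000_000, 1_500_000):
    print(f"MTBF {mtbf:>9,} h -> AFR {annual_failure_rate(mtbf):.2%}")
```

Run in reverse, the same approximation suggests that an observed annual replacement rate of a few percent corresponds to an effective MTBF well below the quoted range; for example, implied_mtbf_hours(0.03) works out to 292,000 hours.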
But two studies from Google Inc. and Carnegie Mellon disagree. In both studies, real-world testing of drive reliability found the annual replacement rate was actually 3-8%. On top of that revelation was word that failure rates double after the first year of service. For drives older than one year, Bross gave simple advice: “If you experience a drive error of any kind, pull the drive. It’s better to be safe than sorry,” he said.
Those were just mechanical failures, though, which sit alongside causes like natural disasters and virus corruption. Truth be told, studies have shown that user error is by far the biggest contributor to data loss. Fully 60% of all hardware failure is the result of the user, Bross said, which includes malicious or accidental deletion of code, incorrect RAID configuration, accidental reformatting and poor maintenance.
Data protection and the inevitable
Bross concluded his session with a number of tips and best practices for systems administrators to use in specific situations.
Hurricanes and floods – “Remember, if you want to preserve data, you’ll want to make sure that the drive is kept wet,” Bross said. “Storage needs to remain wet. If it dries out, there’s lots of calcification and mineral deposits that can form and cause havoc.” Bross instructs all of his customers to keep wet drives submerged and cool.
Data has been damaged, now what? – Rule number one: Don’t panic. Evaluate the failure and check the status of your backup, Bross said. “Don’t run repair utilities on it. Don’t reformat the volume. Don’t restore backup to the drive in question. Don’t remove drives from a RAID system or rebuild it.” Instead, cool heads will prevail and you should evaluate and check the backup drive first, he said.
RAID is not equal to backup! – RAID, by definition, is a redundant array of disks. “The reality is that a RAID device is only part of the backup solution,” Bross said. “RAID is good for one thing and it’s not as the primary backup application – it’s fault tolerance.”
DIY data recovery – Doing things on your own is good for corruption, deletion or logical corruption of volumes. However, Bross warned that this approach is bad for hardware damage or complex configurations.
Local service providers – This is a good option for transfers, but not for data recovery (this category also encompasses a local expert).
Professional data recovery services – For mission-critical data where risk is not an option. These services employ clean room facilities as required by drive manufacturers. Bross said that even in these pristine professional conditions “not every patient makes it through this ‘ER for hard drives’.”
Remote access services – Cannot be used with physical device failures. There is a potential risk to customer data in that hardware or logical volumes could degrade during diagnosis. Potential benefits are a quick resolution and recovery of data. And there’s no need to ship hardware to a lab, either.
Bross said users can avoid needing these data recovery strategies in the first place by making complete backups regularly. “Have a process and a schedule. Assign responsibility and create a chain of command,” he said. “All storage eventually fails. Run home and back up your data now. Be happy you did and sleep well tonight.”