Data backup horror stories

Tags: Backup and Recovery | Backup hardware | Data backup
I recently asked you what advice you would give to someone greener than you in data center security, and Mrdenny and Labnuke99 made some great points about frequently doing checks to make sure your backups are doing what they're supposed to do. What I want to know is, do you have any data backup stories you wish you didn't have to tell? Tapes go awry, backups go missing... we all have these stories, I'm sure. What's yours?

Answer Wiki


We once had an issue with AS/400 data backup. We do a daily tape backup on the AS/400, with a tape for each day of the week. Unfortunately, the tapes' day labels got mixed up, and we suddenly realized that Friday's data had gone "missing." Little did we know it was sitting on Monday's tape!

We also had a six-tape changer. It was starting to give us problems, so my boss immediately ordered a replacement unit. First, the replacement's manufacturer had some error that delayed the shipment. Second, the shipping company (I believe it was FedEx) must have had a problem, as the tracking claimed the unit arrived two days earlier than it actually did. On top of this, literally 15 or 20 minutes before I was about to remove all of the tapes in anticipation of the new device, the old tape changer died, with all of the tapes inside. I don't know if you've ever taken apart a tape changer, but to get a tape out you literally have to remove about 95% of the screws. It was a complete mess. In my frenzy to get the tapes out, I did not document the process of disassembling the unit, so I ended up at a desk with a pile of parts, a bowl of screws, and a huge headache! Luckily, my memory served me well enough to get it all back together (phew!). We then installed the new tape changer, but it didn't want to work. A brand-new tape loader, and it wouldn't work! It turns out the thing is pretty picky and only works when all eight tape bays are full, and we only had six tapes. Needless to say, our backup that day didn't happen!

Another woe of data storage originates from our DVRs. We currently have a few of them. One day, within about 20 minutes of each other, 90% of their hard drives failed. Naturally, we didn't have that many heavy-duty, high-capacity drives on hand, so we were fish out of water. To make matters worse, we looked up the drives on the manufacturer's website, and it stated they carried only a one-year warranty. After finding this out, we decided there was nothing to lose, so why not take the drives apart just for fun and for educational purposes? After we had disassembled most of the drives, my boss jumped up and said, "Ah, I just remembered something…" He scrambled to his desk to find some documents and came back dragging his feet: "We actually did get these drives with a three-year warranty; I have it all right here." My coworker contacted the manufacturer, and this was, in fact, the case. Unfortunately, we had just voided all of the warranties and made a month's worth of camera data impossible to recover.



Discuss This Question: 1  Reply

  • SbElectric
    One comes to mind regarding data backup incidents. We set up an elaborate backup job with multiple steps (with condition-code checking for each step) for various types of data. We dutifully tested each step, checked for a step completion code of 0 (zero), and displayed a proper alert message for the operators if a step did not execute properly. The operators were instructed to check for a condition code of "0" for each step. Everything went fine; we even tested the restore process using the backup tapes.

    As in any large data center, there were always procedure, process, or environmental changes. Luckily, most of these changes were caught (or observed) by our large backup job: when some step did not finish with condition code "0", the operators dutifully caught the anomaly and informed us. We made the necessary modifications and life was back to normal.

    One day, due to some security concerns, certain databases were moved into a DMZ to isolate them from the main configuration. These databases contained sensitive and critical information. The operators reported that everything went fine with the backup job (all condition codes were "0"), but mentioned as a passing comment that the job ran a bit quicker. Being technically savvy (?), I figured this must be OK since there was less contention in the DMZ! This continued for a few more days. It was summertime … the living was easy … and the fish were jumping.

    My boss has an uncanny "trust but verify" mentality. So one day he asked us to restore the library onto a different server outside the DMZ. We dutifully loaded the backup tape to restore, and it yielded "0" records restored!! On examining the backup job, we noticed the message "not sufficient authority to access the database": the job step had been skipped, condition code "0". The job did not have the proper access privileges for the DMZ. Needless to say, we sheepishly learned our lesson of "trust but verify"!!
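The DMZ incident above captures the core gotcha: a step can be skipped (here, for lack of access rights) yet still report condition code 0, so checking exit codes alone proves nothing about the data. A minimal sketch of the "trust but verify" idea, using hypothetical step names and record counts rather than any real AS/400 interface, might look like this:

```python
def run_backup_steps(steps):
    """Run each backup step and collect (name, condition_code, records) results.
    A skipped step (e.g. insufficient authority) can still report code 0."""
    results = []
    for name, step in steps:
        code, records = step()
        results.append((name, code, records))
    return results

def verify(results):
    """'Trust but verify': require both a zero condition code AND evidence
    that data actually moved (a nonzero record count). Return the names of
    any steps that fail either check."""
    return [name for name, code, records in results
            if code != 0 or records == 0]

# Hypothetical steps: the second one is silently skipped (code 0, 0 records),
# mirroring the DMZ incident described above.
steps = [
    ("payroll_db", lambda: (0, 12_408)),
    ("dmz_db",     lambda: (0, 0)),   # skipped for lack of access, code still 0
]
print(verify(run_backup_steps(steps)))  # → ['dmz_db']
```

The same principle applies regardless of platform: the only real verification is periodically restoring from the actual backup media and confirming that the expected records came back.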
