According to a notice posted on Facebook’s official blog, a group of disk drives (a RAID group?) on what sounds like a clustered storage system failed en masse over the weekend, leaving 10 to 15% of the photos users had uploaded to Facebook unavailable.
You may have noticed in the past day that some photos aren’t appearing or are displaying a “question mark” graphic when you go to view them. We have experienced some problems with our photo storage that affected between 10 to 15 percent of already uploaded photos. Don’t worry: Your photos are safe, and we are working to make them available again as soon as possible. We’ve already repaired about one-third of affected photos and expect to complete repairs on another third tonight.
Here’s what happened, and what we’re doing to fix the problem: During an otherwise routine software upgrade on Friday night, we ran into some problems with our photo storage and a few of the hard drives where we store photos apparently failed all at once. We’re trying to fully understand what happened, since simultaneous hardware failures like this are rare.
As high-profile storage outages go, this one doesn’t seem as severe as it could have been, at least not compared with other Web 2.0 service disasters like ma.gnolia, which wasn’t able to recover users’ bookmarks when its backups failed in January. According to Facebook’s post, users will not lose their pictures while the company diagnoses and repairs the problem, but they may not be able to view all of them until sometime next week:
We still have all your photos because we store them in a way that maintains multiple copies of the data in case of hardware failures like this. However, even though your photos are safe, we can’t serve photos off the affected storage volumes until they’re repaired. We’re working on them right now, but it will take some time because there’s so much data on them and the repair process largely involves copying huge amounts of data to new drives. This is why some photos aren’t showing up right now.
We’re restoring photos as we repair the hard drives, so some should be working again today and we should be back to normal by early next week. New photo uploads will continue to work properly during the repairs, because we write them to different storage volumes. Thanks for bearing with us while we return things to normal.
Storage Twitterers are skeptical about the cause of the problem. Tim Masters, co-founder of StorageMonkeys.com, wrote, “Recovery will take until ‘early next week’ after a ‘hard drive failure’? Wish I had that kind of SLA internally… most of us don’t get the luxury of a week to recover a LUN or a disk shelf…”
Bloggers who aren’t hard-bitten storage guys, meanwhile, had some praise for Facebook’s handling of the issue. “It’s good to know that Facebook maintains backups of all your data for situations like this…” wrote Adam Ostrow at Mashable.
Meanwhile, this isn’t the only tale of consumer-facing storage horror to surface on the Internet today. Gizmodo also reported the saga of Nicole, who was allegedly done wrong on the backup front by Best Buy’s Geek Squad.
“Best Buy charged Nicole $99 to backup her data but then replaced her hard drive without backing up a single byte,” Gizmodo’s Carey writes. “Nicole’s service contract clearly stated that Best Buy would perform the backup before any other service. Now Best Buy is claiming that her old hard drive is their property and that she has no right to the data that they failed to backup or restore.”
To me, Best Buy reserving some kind of property rights on the disk drive sounds like code for “it’s gone to our after-market resale disk drive repository in the sky, and we don’t know where it is.” I don’t think they’re withholding the information deliberately or maliciously (why voluntarily create a PR problem like this one?), but I also don’t think Nicole’s getting her data back.
With more and more digital data protection issues like this one falling into the laps of consumers, we will probably see them eventually adopt, after a long, slow process of learning by painful experience, an approach more like that of enterprise storage and backup experts. I can’t imagine any of those experts uploading a photo to Facebook or bringing a computer hard drive in for service anywhere without making their own backups first.