According to a notice posted on Facebook’s official blog, a group of disk drives (a RAID group?) in what sounds like a clustered storage system failed en masse over the weekend, making 10 to 15% of the photos users have uploaded to Facebook unavailable.
You may have noticed in the past day that some photos aren’t appearing or are displaying a “question mark” graphic when you go to view them. We have experienced some problems with our photo storage that affected between 10 to 15 percent of already uploaded photos. Don’t worry: Your photos are safe, and we are working to make them available again as soon as possible. We’ve already repaired about one-third of affected photos and expect to complete repairs on another third tonight.
Here’s what happened, and what we’re doing to fix the problem: During an otherwise routine software upgrade on Friday night, we ran into some problems with our photo storage and a few of the hard drives where we store photos apparently failed all at once. We’re trying to fully understand what happened, since simultaneous hardware failures like this are rare.
As high-profile storage outages go, this one doesn’t seem to be as severe as it could have been, at least not compared to other Web 2.0 service disasters like Ma.gnolia, which couldn’t recover users’ bookmarks when its backups failed in January. According to Facebook’s post, users will not lose their pictures while the company diagnoses and repairs the problem, but they won’t be able to view the affected photos until sometime next week:
We still have all your photos because we store them in a way that maintains multiple copies of the data in case of hardware failures like this. However, even though your photos are safe, we can’t serve photos off the affected storage volumes until they’re repaired. We’re working on them right now, but it will take some time because there’s so much data on them and the repair process largely involves copying huge amounts of data to new drives. This is why some photos aren’t showing up right now.
We’re restoring photos as we repair the hard drives, so some should be working again today and we should be back to normal by early next week. New photo uploads will continue to work properly during the repairs, because we write them to different storage volumes. Thanks for bearing with us while we return things to normal.
Storage Twitterers are skeptical of Facebook’s explanation. Tim Masters, co-founder of StorageMonkeys.com, wrote: “Recovery will take until ‘early next week’ after a ‘hard drive failure’? Wish I had that kind of SLA internally... most of us don’t get the luxury of a week to recover a LUN or a disk shelf…”
Bloggers who aren’t hard-bitten storage guys, meanwhile, had some praise for Facebook’s handling of the issue. “It’s good to know that Facebook maintains backups of all your data for situations like this…” wrote Adam Ostrow at Mashable.
Nor is Facebook’s the only tale of consumer-facing storage horror to surface on the Internet today. Gizmodo also reported the saga of Nicole, who was allegedly done wrong on the backup front by Best Buy’s Geek Squad.
“Best Buy charged Nicole $99 to backup her data but then replaced her hard drive without backing up a single byte,” Gizmodo’s Carey writes. “Nicole’s service contract clearly stated that Best Buy would perform the backup before any other service. Now Best Buy is claiming that her old hard drive is their property and that she has no right to the data that they failed to backup or restore.”
To me, Best Buy claiming property rights to the disk drive sounds like code for “it’s gone to our after-market resale disk drive repository in the sky, and we don’t know where it is.” I don’t think they’re withholding the information deliberately or maliciously (why voluntarily create a PR problem like this one?), but I also don’t think Nicole’s getting her data back.
With more and more digital data protection issues like this one falling into the laps of consumers, we will probably, after a long, slow process of learning by painful experience, see consumers approach this stuff more the way enterprise storage and backup experts do. I can’t imagine any of those experts uploading a photo to Facebook or bringing a computer hard drive in for service anywhere without making their own backups first.
Here are some stories you may have missed this week:
Capacity planning: Users delay top-tier purchases
Venture capitalist talks storage, economy and the cloud
More musical chairs: EqualLogic exec changes roles within Dell
As always, you can find the latest storage news, trends and analysis at http://searchstorage.com/news.
IDC’s storage tracker numbers, released today, show a 0.5% year-over-year decline in worldwide external storage sales in the fourth quarter of 2008.
This was the first time external disk sales declined in more than five years, yet it’s hardly a surprise. Everybody knows spending dropped due to the economy late last year, and IDC already told us that PC and server revenues fell 1.9% in the fourth quarter. But the dynamics behind the storage decline are interesting.
NAS and iSCSI revenues continued to grow. NAS increased 8.6% over the previous year, and iSCSI was up 62%. However, Fibre Channel revenue declined 3.2%.
With smaller installed bases, NAS and iSCSI have been growing faster than FC, but FC had been growing as well. The decline of FC will likely continue in the short term, as current market conditions favor NAS and iSCSI over more expensive FC SANs.
“End users are looking for more economical ways to meet their growing storage needs as many IT budgets shrink due to the current economic conditions,” IDC research analyst for disk storage Liz Connor said in IDC’s press release. “FC SAN systems fulfill many high-end storage needs, but usually at a higher average price. However, iSCSI and NAS storage solution alternatives offer increased enterprise-level features at lower costs, and compel vendors to consider these technologies. Continued end user education, growing confidence in IP-based storage, increasing product sophistication, as well as a typically lower price point, result in increased adoption of iSCSI and NAS by many budget conscious end users.”
The trends Connor talks about are almost certain to continue for the rest of this year because spending isn’t likely to pick up before late 2009 at the earliest. But there’s no guarantee that FC sales will rise when storage spending does improve. A year from now, enhanced Ethernet will be here not only to power Fibre Channel over Ethernet (FCoE) but also to improve NAS and iSCSI. Emerging storage markets such as Web 2.0, film/broadcast, video surveillance, and health care are dominated by organizations that deal with more files than Oracle databases, and they can do just fine with NAS or IP SANs. Meanwhile, the largest FC vertical, financial services, has been crippled by the economy.
So even when storage revenues start going up again, FC may not follow.
NetApp chief technical architect Val Bercovici let slip on his blog that NetApp is planning a new interface for Windows users of NetApp filers, complete with screenshots and feature details.
The preview is in the second half of Bercovici’s post about an award NetApp won at VMWorld Europe. Apparently the judges deducted points for the management interface, so Bercovici responded with the big reveal of what’s coming soon.
…our newest customers or partners evaluating and deploying their FAS arrays one or two at a time also deserve a modern interface to help them come upto speed. In the 21st century, that interface is most commonly provided by an administrative workstation running Microsoft’s Windows GUI… NSM [NetApp System Manager] using the familiar Microsoft Management Console (MMC) interface with a clean and modern Windows 2008 Server look & feel.
He also includes a screencap of the interface, pointing out how systems are listed in the navigation tree, with active-active pairs grouped together along with a list of their software services. NSM also integrates with the Windows System Tray to pop up ‘bubble alerts’ about health issues with the arrays, and it offers snapshot management and auto discovery of authentication systems like Active Directory, among other features.
Existing enterprise NetApp users say it probably won’t have much impact on their environments. “In general, a GUI is a sexy tool, that all vendors like to demonstrate but a ‘real/old fashioned’ system engineer will use it not very often,” wrote Reinoud Reynders, IT manager for the University Hospitals of Leuven in Belgium, in an email to Storage Soup. “For vendors, it’s very important that they have one (for pre-sales activities), but after that, the use is less important.”
One of the public faces of EqualLogic has a new role within the company that acquired it, Dell officials confirmed today.
John Joseph, formerly VP of marketing for the iSCSI SAN vendor, has been shifted to a new position as Vice President of Enterprise Solutions Marketing. A Dell spokesperson described the shift in an email to SearchStorage:
John has taken on a new role within Dell focused on integrating solutions. This move comes on the heels of Dell’s recent announcement that it will organize itself around three major customer segments – large enterprise, public sector, and small and medium businesses. John’s move further demonstrates the success of the EqualLogic acquisition and its integration into Dell and the importance of storage in Dell’s enterprise business solutions.
Asked to clarify what exactly “integration” means, the spokesperson added:
The role focuses on bringing our different products together (storage, servers, etc.) and making sure we addressing our customer’s data center needs. So yes, he will still have contact with all storage products as well as servers, services, and software. Customers want a ready tested and certified IT solution from Dell and we’re responding.
So Joseph will still have contact with the EqualLogic products, but that contact seems to have become more remote, or at least more mixed in with other duties. Those duties, we might note, seem a bit removed from his previous role marketing an iSCSI SAN platform that Dell positions for SMBs and the midrange. Presumably, enterprise solutions involve high-end Fibre Channel products as well.
More importantly, Joseph was, as mentioned above, a public symbol for EqualLogic and is closely associated with that company and its products. His continued presence at the wheel following the acquisition was one of the more encouraging signs I saw for the Dell/EqualLogic integration.
Meanwhile, Fusion-io (the topic of plenty of coverage yesterday) has more quietly replaced its CEO, Don Basile, with its former senior vice president David Bradford.
Fusion-io didn’t officially announce its CEO change, but it quoted Bradford and identified him as CEO in a Tuesday news release about its OEM deal with Hewlett-Packard. He’s also listed as CEO on the company’s executive bios web page.
Bradford’s bio on the site credits him with persuading Apple co-founder Steve Wozniak to join the startup. This contradicts a published report from Fortune last month which credited Basile with the publicity-generating hire.
I’ve seen this kind of CEO shift happen at other startups as they enter different phases, in this case from developing a product to trying to grow revenue. Something similar even happened at VMware last year, during a transition in its growth, when Paul Maritz replaced founder Diane Greene. Fusion-io is not a publicly traded company and is not required to announce a change in CEO, but the storage community on Twitter took notice of the move with some chatter this morning.
Storage end user and blogger Martin Glassborow wrote on Twitter, “The sudden and rather stealth change of CEO is interesting. You wonder if there has been some direction issues.”
StorageIO Group founder and analyst Greg Schulz Tweeted back, “Concur, some normal shuffling of people moving around, however also some attrition, RIFs, and strategy changes as you point out.”
I’ve reached out to several people at Fusion-io today to try to get to the bottom of the apparent discrepancy over The Woz, but have yet to hear back.
We interviewed Fusion-io Inc. CTO David Flynn for one of our news stories today. Here’s some nitty-gritty bonus footage on how the company’s product goes about protecting data, and how that compares to spinning-disk systems.
Beth: So one ioDrive is 320 GB. Is data striped across all the chips or do you have separate data sets?
Flynn: Each one of the Flash modules looks like a volume, and you can either stripe them or mirror them to make them look like one volume. Or if you have multiple cards you can aggregate all of those volumes with RAID 10. We have RAID-5-like redundancy on the chips, then RAID between the memory modules. What we’ve come to realize after we introduced FlashBack is that it actually lets you get more capacity.
Most SSDs are 64 GB at most: 32 GB, 64 GB. With this technology we put five to 10 times as many chips within our card. That would increase the failure rate because the individual chips’ failure rates add up. With our ability to compensate, we can get to higher capacities, and with that we can increase endurance, because you can spread the data out.
Internally it’s more like RAID 50 because I have eight die in my redundancy chip. There’s one parity die for each package. It’s 24+1 and then that quantity times eight, because there’s eight of those sets. If you were to line it up like disk drives, it would look exactly like that, 24 disk drives and then an extra one, 8 rows. So when we talk about this as a SAN in the palm of your hand we really mean it, because we’ve taken die within the various NAND packages and arrayed them together just like a disk array. It’s also self-healing in that if you have a fault the system reconstructs the data that otherwise might’ve gone missing and moves it to a different spot and turns off the use of the spot that failed. You don’t have to service it. It automatically just maps it out. Like Xiotech’s ISE product—that’s bleeding edge stuff for disk arrays, and it’s built into the silicon here.
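Flynn’s “24+1, times eight” description maps onto a familiar disk-array pattern: RAID 5-style parity groups that are themselves striped together, RAID 50-style. The Python sketch below is only an illustration under loud assumptions: it uses plain XOR parity (Fusion-io didn’t say what code it actually uses), a hypothetical 16-byte chunk per die, and treats each die as a simple byte store. It shows why losing any single die in a set is recoverable from the other 24.

```python
import random
from functools import reduce

DATA_DIES = 24  # data dies per parity set (the "24" in Flynn's 24+1)
SETS = 8        # number of 24+1 sets ("that quantity times eight")
CHUNK = 16      # bytes per die per stripe; hypothetical, for illustration only


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_set(rng: random.Random) -> list:
    """Return 24 random data chunks followed by one XOR parity chunk."""
    data = [bytes(rng.randrange(256) for _ in range(CHUNK)) for _ in range(DATA_DIES)]
    return data + [reduce(xor_bytes, data)]


def rebuild(parity_set: list, failed: int) -> bytes:
    """Recover the chunk on die `failed` by XOR-ing the 24 surviving chunks."""
    survivors = [c for i, c in enumerate(parity_set) if i != failed]
    return reduce(xor_bytes, survivors)


rng = random.Random(42)
array = [make_set(rng) for _ in range(SETS)]  # 8 x (24 + 1) = 200 dies

# Fail one die in every set at once; each set heals from its own parity,
# so single failures in different sets don't interfere (the RAID 50 property).
for parity_set in array:
    failed = rng.randrange(DATA_DIES + 1)  # may be a data die or the parity die
    assert rebuild(parity_set, failed) == parity_set[failed]

print("dies:", SETS * (DATA_DIES + 1), "- every single-die failure rebuilt")
```

Because each of the eight sets carries its own parity, a failure in one set can be repaired without touching the other seven, the same property that lets a RAID 50 array survive one failed drive per RAID 5 group.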
What about double parity protection? That’s all the rage in the disk drive world these days. What if more than one die fails at once?
For us to rebuild and heal takes a split second. Having a second failure during that time is not going to happen. It takes so long to rebuild a disk drive (it can take more than a day now) that the probability of a double failure goes up. The other thing is that disk drive failures are often highly correlated: the drives come from the same batch. They tend to fail randomly but close to each other in time. Our portfolio does cover N+M redundancy as well as N+1 because we anticipate a day when we’re putting not hundreds of these die on the boards but thousands and going into the tens and hundreds of thousands.
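Flynn’s argument is that the risk of a second failure depends on how long the rebuild window stays open. A minimal Python sketch of that reasoning, assuming independent failures at a constant rate: the 2% annualized failure rate and the 24 surviving members are hypothetical, and the same rate is used for both cases only to isolate the effect of rebuild time.

```python
import math

HOURS_PER_YEAR = 8760


def p_second_failure(survivors: int, afr: float, rebuild_hours: float) -> float:
    """Probability that at least one surviving member fails during the rebuild
    window, assuming independent failures at a constant (exponential) rate."""
    # Convert an annualized failure rate into a per-hour hazard rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-survivors * rate_per_hour * rebuild_hours)


SURVIVORS = 24   # e.g. a 24+1 group after one member has failed (assumption)
AFR = 0.02       # hypothetical 2% annualized failure rate, used for both cases

p_disk = p_second_failure(SURVIVORS, AFR, rebuild_hours=24.0)       # day-long disk rebuild
p_die = p_second_failure(SURVIVORS, AFR, rebuild_hours=1.0 / 3600)  # roughly one-second die heal

print(f"disk-style rebuild: {p_disk:.2e}")
print(f"die-style rebuild:  {p_die:.2e}")
```

Under this model the risk scales almost linearly with the rebuild window, so shrinking it from a day to a second cuts it by a factor of roughly 86,400. The correlated, same-batch disk failures Flynn mentions break the independence assumption, which makes the disk-side number optimistic.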
At the same time the Flash memory has finite write endurance, so they are all going to wear out at some point. So how do you compensate for that?
We account for how many write cycles it’s been through so we can give somebody a running…like an odometer, for tread wear on a tire. You can go five years or 50,000 miles. We warranty it, and you can swap out the modules without needing a new carrier card. Because we have such high capacity we naturally get a longer lifespan. It’ll last for 5 years even if you’re doing nothing but writing constantly. Wear-out has been overrated I think because most of the failures people are seeing have nothing to do with wear-out, they have to do with internal events that cause chips to lose data.
Here are the four factors. This is the dirty little secret of the NAND world. First, it’s the newest fab process, which means it has its kinks. Second, it’s the tightest feature size: they’re going to 32 nm. Third, the density of the array of cells is achieved by sharing control lines. And then, fourth, and the real killer, to move the electrons into the floating gate cell it takes 20 volts internally. Most core voltages are well under a volt nowadays.
These four factors mean having a short-out event on one of these tiny little control lines—if you have just one chip it’s no big deal, it’s 40 out of a million. Which for a thumb drive, nobody would notice—it’s more likely to get shorted out in your pocket. But when you put hundreds of them together, now you have hundreds of those 40 out of a million chances to have something go bad, and that actually adds up to be something like one or two percent of these things fielded would have a data loss event. For a normal SSD the way they compensate is to put fewer chips on it or try to sweep the problem under the carpet—what they say if you talk to them is, ‘Well, we screen it very well, we run it in advance to make sure it’s not going to happen.’ You can screen it up front but there’s still probabilities of failure.
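The “40 out of a million per chip adds up to one or two percent” arithmetic follows from the chance that at least one of n independent chips has a fault, P = 1 - (1 - p)^n, which is close to n times p when p is small. A quick Python check, assuming faults are independent; the chip counts are illustrative values for Flynn’s “hundreds.”

```python
def p_any_fault(per_chip: float, chips: int) -> float:
    """Chance that at least one of `chips` independent chips has a fault,
    given a per-chip fault probability `per_chip`."""
    return 1.0 - (1.0 - per_chip) ** chips


PER_CHIP = 40 / 1_000_000  # "40 out of a million", per Flynn

for chips in (1, 200, 300, 500):
    print(f"{chips:>4} chips: {100 * p_any_fault(PER_CHIP, chips):.3f}%")
```

With a few hundred chips the result lands at roughly 1 to 2 percent, consistent with Flynn’s estimate for a fielded card without internal redundancy.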
Here’s the thing: disk drives wear out, too. The trouble is, it’s unpredictable. One of the strongest motivators to going to solid state technology is the predictability of when you’re going to need to service it. And after a couple of years, you’re going to be able to replace it for a fraction of what it cost initially.
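Flynn’s tire-tread “odometer” is what makes flash wear predictable. Here is a minimal Python sketch of the idea, assuming perfect wear leveling and no write amplification; the class name, the 10,000-cycle endurance rating, and the write rates are hypothetical, and only the 320 GB capacity comes from the interview.

```python
from dataclasses import dataclass


@dataclass
class WearOdometer:
    """Hypothetical write-endurance 'odometer' for a flash card.

    Assumes perfect wear leveling and no write amplification, so the card can
    absorb roughly capacity x rated program/erase cycles of writes in total.
    """
    capacity_gb: float
    rated_pe_cycles: int      # assumed endurance rating per cell
    written_gb: float = 0.0   # running total of host writes

    @property
    def lifetime_writes_gb(self) -> float:
        return self.capacity_gb * self.rated_pe_cycles

    def record_write(self, gb: float) -> None:
        self.written_gb += gb

    def percent_used(self) -> float:
        return 100.0 * self.written_gb / self.lifetime_writes_gb

    def years_remaining(self, gb_per_day: float) -> float:
        """Projected life left if writes continue at `gb_per_day`."""
        remaining_gb = self.lifetime_writes_gb - self.written_gb
        return remaining_gb / gb_per_day / 365.0


odo = WearOdometer(capacity_gb=320, rated_pe_cycles=10_000)  # 10,000 cycles is hypothetical
odo.record_write(160_000)
print(f"{odo.percent_used():.1f}% of rated endurance used")
print(f"{odo.years_remaining(gb_per_day=1_000):.1f} years left at 1 TB/day")
```

Because total writable data is proportional to capacity in this model, doubling capacity doubles the projected life at a given write rate, which is the effect Flynn describes when he says high capacity “naturally” buys a longer lifespan.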
Not a month after an Israeli news source reported that EMC Corp. had been under investigation concerning government contracts in Israel, EMC revealed in its annual report filed with the SEC that it’s under investigation by the Civil Division of the Department of Justice (DOJ). The DOJ investigation involves “allegations concerning (i) EMC’s fee arrangements with systems integrators and other partners in federal government transactions, and (ii) EMC’s compliance with the terms and conditions of certain agreements pursuant to which we sold products and services to the federal government, including potential violations of the False Claims Act.”
There’s no relation to the Israeli investigation, according to an EMC spokesperson. And in contrast to that case, in which EMC declined to comment, this time the company is flatly denying that the DOJ will find any wrongdoing. “EMC did not make improper payments to business partners and did not violate the False Claims Act,” wrote the spokesperson in an email to SearchStorage.com. “The matters at issue in this case are historical in nature; some of the allegations relate to events nearly ten years old. We will vigorously defend this case and the many years EMC has spent serving the U.S. Government…”
The SEC filing reads:
The subject matter of this investigation also overlaps with that of a previous audit by the U.S. General Services Administration (“GSA”) concerning our recordkeeping and pricing practices under a schedule agreement we entered into with GSA in November 1999 which, following several extensions, expired in June 2007. We have cooperated with both the audit and the DoJ investigation, voluntarily providing documents and information, and have engaged in discussions aimed at resolving this matter without any admission or finding of liability on the part of EMC.
Storage vendors are announcing new deals in an effort to make their enterprise products more tempting amid slashed storage budgets. Today, HP confirmed it is extending to storage the 0% financing deal it had previously offered on its servers.
According to an HP spokesperson, the HP storage products included in this program are:
The move comes after HP reported double-digit revenue declines across most of its lines of business for its first fiscal quarter. The Enterprise Storage and Servers (ESS) group was no exception, with revenue of $3.9 billion, down 18%. Within that, storage revenue fell 7%, and the group’s overall profit fell 14%.
HP joined NetApp in reporting earnings declines in a fiscal quarter that included January. (Interesting aside: Dell reported that its storage business, especially its low-end PowerVaults and EqualLogic midrange iSCSI SANs, did relatively well for its first fiscal 2009 quarter, with business up 7% though overall earnings slipped).
But in a recession this deep, some federal interest rates have been cut to zero in the hope of getting business moving again, and housing prices are so depressed that, in theory, homes should be affordable to a whole new class of buyers. Yet neither of those things (nor, so far, all the King’s horses and all the King’s men) has done much for the markets, if only because everyone who still has a job is so afraid of losing it by the end of this year that they aren’t spending, no matter how good the deal is.
Many enterprise storage users seem to be in a similar boat–these financing deals, like low home prices, would be irresistible in better times. Ironically, in bad times, they may not be enough.
Data Domain is bumping up its deduplication speed with an operating system upgrade.
Moving from OS 4.5 to 4.6 will improve the speed of Data Domain systems by 50% to 100%, depending on the protocol and network interface, according to the vendor’s VP of product management Brian Biles. The greater speed comes from code tweaks in the OS that let multi-core CPUs support more parallel streams.
The improvement with OS 4.6 is greatest for systems running 10-Gigabit Ethernet and Symantec’s NetBackup OpenStorage (OST) interface. For instance, maximum performance for the DD690, Data Domain’s largest system, goes from 1.4 TB per hour to 2.7 TB per hour with the new OS, an increase of more than 90%, according to Data Domain’s estimates. That’s with 10-GigE and OST.
Is speed all that important for dedupe? Throughput often gets lost in the debate over dedupe ratios and the inline versus post-processing argument, but analysts and customers say speed is a major selling point for dedupe systems. Speed plays a big role in Data Domain’s inline deduping, which risks slowing backups because it dedupes data while backups are taking place.
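To make the DD690 numbers concrete, here is a small Python calculation of the percentage gain and what it would mean for a backup window. It assumes the box sustains its peak rate for the whole job, which real backups rarely do, and the nightly backup sizes are hypothetical.

```python
def percent_increase(old: float, new: float) -> float:
    return 100.0 * (new - old) / old


def backup_hours(data_tb: float, tb_per_hour: float) -> float:
    """Hours to ingest `data_tb` at a sustained `tb_per_hour`."""
    return data_tb / tb_per_hour


OS_45, OS_46 = 1.4, 2.7  # DD690 max TB/hour with 10-GigE and OST, per Data Domain

print(f"speedup: {percent_increase(OS_45, OS_46):.1f}%")
for tb in (5, 10, 20):  # hypothetical nightly backup sizes
    print(f"{tb:>2} TB: {backup_hours(tb, OS_45):.1f} h -> {backup_hours(tb, OS_46):.1f} h")
```

The gain works out to about 93%, and at these rates a hypothetical 10 TB job drops from roughly seven hours to under four.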
“Faster equals more data processed, which equals more data reduction,” Enterprise Strategy Group analyst Brian Babineau says. “Performance improvement is a means to other benefits, including storing more data in smaller footprint.”
Rich VanLare, network administrator for shopping center developer Regency Centers, has been using a Data Domain DD690 with NetBackup and OST since last October and was blown away by its speed even with OS 4.5. VanLare says his goal was to get backups under nine hours, and since replacing tape with the DD690 they are down to five hours. He says he’ll upgrade to 4.6, but he’s happy with his current system.
“The box is incredibly fast to begin with,” VanLare said. “Personally I don’t need it [improved speed] because it’s already exceeded my expectations.”
VanLare, who claims to get more than 90 percent compression, said the OST option was the main reason he chose Data Domain over VTLs from NetApp and Overland Storage.
“I have a lot of administrators getting into the interface, and I just wanted things to be simple,” he said. “OST tells an administrator exactly what happens if something fails.”
VanLare’s biggest concern with Data Domain is that he won’t be able to add a second box at a DR site until the economy improves. “I wanted to do that this year,” he said. “With budgets as they are, I’m not sure that’s going to be approved.”