Riverbed held a roundtable discussion about the company (and some other topics) with journalists last night at Boston’s Oceanaire restaurant in Government Center. I sat next to Eric Wolford, Riverbed’s SVP of marketing and business development, and he opened a conversation by asking me “What’s hot?”
“Well,” I told him, “these days I sometimes feel like I’m writing for SearchSSD.com rather than SearchStorage.com.” I didn’t expect Riverbed to be getting into the solid-state disk game, but Wolford said there’s probably a place for SSDs in at least some of its WAN optimization products, too.
Wolford said some large Riverbed customers use Steelhead devices on both sides of the wire for replication. At extremely high bandwidth (OC12 and above), Wolford said SSDs could help keep up when large volumes of data hit the devices’ disks simultaneously.
“With large data center-to-data center replication, they sometimes need so many spindles there’s an opportunity for solid-state storage,” he said.
But he doesn’t see SSDs replacing spinning disk systems, for Riverbed or the industry at large. “It’ll give us a new high end,” he said. And, he added, “an enormous amount of our business is at the T1 level and there’s really no opportunity for it there.”
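For a rough sense of the gap Wolford is describing, here is a minimal sketch that converts the two link types he mentions into data volumes. The line rates are the standard nominal figures (they do not come from Riverbed), and the helper names are made up for illustration. Protocol overhead and any WAN optimization gains are ignored.

```python
# Nominal line rates in megabits per second. These are the standard
# published figures for each circuit type, not numbers from the article.
LINE_RATES_MBPS = {"T1": 1.544, "OC-12": 622.08}


def megabytes_per_second(mbps: float) -> float:
    """Convert a line rate in Mbit/s to MB/s (8 bits per byte, decimal units)."""
    return mbps / 8.0


def gigabytes_per_hour(mbps: float) -> float:
    """Data volume in GB that a fully saturated link delivers in one hour."""
    return megabytes_per_second(mbps) * 3600 / 1000


for name, rate in LINE_RATES_MBPS.items():
    print(f"{name}: {megabytes_per_second(rate):.3f} MB/s, "
          f"{gigabytes_per_hour(rate):.1f} GB/hour")
```

On those nominal rates a saturated OC-12 carries roughly 400 times the data of a T1, which is consistent with Wolford seeing a case for SSDs only at the top of the range.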
Wolford also gave me an update on the Atlas primary storage dedupe product Riverbed was originally going to ship this year but recently pushed out until 2010.
“We got critical feedback from alpha customers where they want to deploy [Atlas], but don’t want dependency on the Steelhead appliance,” Wolford said. So Riverbed is working on bundling the Steelhead functionality into the Atlas product itself.
Atlas will sit out of band, he said, “to the side” of the array and perform post-process dedupe. Wolford says customers are hot for primary storage data reduction, but most vendors still can’t deliver it at speeds fast enough for primary storage. “If the device is out of the path of hot data, the performance burden isn’t as extensive,” he said.
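To illustrate what post-process dedupe means in general, here is a minimal sketch that scans data after it has already been written and collapses identical blocks. This is not Riverbed's design, which has not been disclosed. The fixed 4 KB block size, the SHA-256 fingerprints and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


def post_process_dedupe(volume: bytes, block_size: int = BLOCK_SIZE):
    """Scan data that has already been written and collapse identical blocks.

    Returns (store, block_map):
      store     maps fingerprint -> the single stored copy of that block
      block_map lists the fingerprint of each logical block, in order
    """
    store = {}
    block_map = []
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # keep only the first copy
        block_map.append(fingerprint)
    return store, block_map


def read_back(store, block_map) -> bytes:
    """Rebuild the original data from the unique blocks and the block map."""
    return b"".join(store[fp] for fp in block_map)


# Four logical blocks, but only two distinct block contents.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, block_map = post_process_dedupe(data)
print(len(block_map), "logical blocks,", len(store), "unique blocks stored")
```

Because the scan runs after the write completes, the hashing cost stays off the write path, which is the point Wolford makes about keeping the device away from hot data.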
A recent report by Andrew Reichman at Forrester Research showed that among 124 surveyed IT decision-makers, incumbent vendors and Fibre Channel still dominate the in-use storage systems supporting VMware deployments. But, according to Reichman and another analyst specializing in VMware, the Burton Group’s Chris Wolf, that doesn’t mean it’s necessarily how things should be.
According to the Forrester Report, users should “pick a vendor that offers thin provisioning, has deduplication on the road map, and has documented best practices in virtual environments.”
Some of these things, Reichman says, are harder to come by than you might think. “Especially clear best practices–it seems like vendors tiptoe around it, saying, ‘we can do whatever you need!’ and customers need more clarity,” he said.
Reichman also said he wanted more storage vendors to offer management console integration with VMware’s vCenter, the way Xiotech does with its Virtual View.
On the management console front, the Burton Group’s Wolf added that vendors who sell backup should be looking to consolidate VMware data protection features into the same software framework as array-based or network-based backup mechanisms.
“That integration is going to be important for backup products supporting a virtual environment,” he said.
Reichman’s survey found that most shops are sticking with an incumbent vendor for server virtualization deployments, and that that vendor is most often EMC. But Reichman also said that the survey is probably capturing the first wave of production deployments of VMware. As those deployments grow and become more complex, and as more storage vendors add new features to their products specifically to support virtual servers, users might be compelled to take a fresh look.
When they do, Reichman’s report urges users to consider Ethernet first rather than Fibre Channel, though he said Ethernet adoption may be driven primarily by Microsoft’s Hyper-V virtualization software rather than VMware, “which tends to either be protocol-agnostic or Fibre Channel-centric.”
Wolf said there are a couple more virtual server-focused features he’d like to see vendors add as they try to market storage devices for virtual server support. One of them is array-level primary storage compression and dedupe, currently only offered in a handful of places like NetApp’s FAS systems, EMC’s newest Celerra products, and for nearline/archival file storage by startups StorWize and Ocarina. The other is more efficient deployment consolidation for things like patches on multiple virtual machine images.
“So you could deploy a patch one time and any dependent images automatically update as well–that’s the level of intelligence I’d like to see in the arrays,” he said.
According to a notice posted on Facebook’s official blog, a group of disk drives (a RAID group?) on what sounds like a clustered storage system failed en masse over the weekend, leaving 10% to 15% of the photos users had uploaded to Facebook unavailable.
You may have noticed in the past day that some photos aren’t appearing or are displaying a “question mark” graphic when you go to view them. We have experienced some problems with our photo storage that affected between 10 to 15 percent of already uploaded photos. Don’t worry: Your photos are safe, and we are working to make them available again as soon as possible. We’ve already repaired about one-third of affected photos and expect to complete repairs on another third tonight.
Here’s what happened, and what we’re doing to fix the problem: During an otherwise routine software upgrade on Friday night, we ran into some problems with our photo storage and a few of the hard drives where we store photos apparently failed all at once. We’re trying to fully understand what happened, since simultaneous hardware failures like this are rare.
As high-profile storage outages go, this one doesn’t seem to be as severe as it could have been, at least not compared to other Web 2.0 services disasters like ma.gnolia, which wasn’t able to recover users’ bookmarks when its backups failed in January. According to Facebook’s post, users will not lose their pictures while the company gets the problem diagnosed and repaired, but won’t be able to view them until sometime next week:
We still have all your photos because we store them in a way that maintains multiple copies of the data in case of hardware failures like this. However, even though your photos are safe, we can’t serve photos off the affected storage volumes until they’re repaired. We’re working on them right now, but it will take some time because there’s so much data on them and the repair process largely involves copying huge amounts of data to new drives. This is why some photos aren’t showing up right now.
We’re restoring photos as we repair the hard drives, so some should be working again today and we should be back to normal by early next week. New photo uploads will continue to work properly during the repairs, because we write them to different storage volumes. Thanks for bearing with us while we return things to normal.
Storage Twitterers are skeptical about the cause of the problem. Tim Masters, Co-Founder of StorageMonkeys.com, wrote “Recovery will take until “early next week” after a “hard drive failure”? Wish I had that kind of SLA internally….most of us don’t get the luxury of a week to recover a LUN or a disk shelf…”
Bloggers who aren’t hard-bitten storage guys, meanwhile, had some praise for Facebook’s handling of the issue. “It’s good to know that Facebook maintains backups of all your data for situations like this…” wrote Adam Ostrow at Mashable.
Meanwhile, this isn’t the only tale of consumer-facing storage horror to surface on the Internet today. Gizmodo also reported the saga of Nicole, who was allegedly done wrong on the backup front by Best Buy’s Geek Squad.
“Best Buy charged Nicole $99 to backup her data but then replaced her hard drive without backing up a single byte,” Gizmodo’s Carey writes. “Nicole’s service contract clearly stated that Best Buy would perform the backup before any other service. Now Best Buy is claiming that her old hard drive is their property and that she has no right to the data that they failed to backup or restore.”
To me, Best Buy reserving some kind of property rights on the disk drive sounds like code for “it’s gone to our after-market resale disk drive repository in the sky, and we don’t know where it is.” I don’t think they’re withholding the information deliberately or maliciously (why voluntarily create a PR problem like this one?), but I also don’t think Nicole’s getting her data back.
With more and more digital data protection issues like this one falling into the laps of consumers, we are probably going to eventually–after a long, slow process of learning by painful experience–see an approach to this stuff more like that of enterprise storage and backup experts, none of whom I can imagine uploading a photo to Facebook or bringing a computer hard drive in for service anywhere without making their own backups first.
Here are some stories you may have missed this week:
Capacity planning: Users delay top-tier purchases
Venture capitalist talks storage, economy and the cloud
More musical chairs: EqualLogic exec changes roles within Dell
As always, you can find the latest storage news, trends and analysis at http://searchstorage.com/news.
IDC’s storage tracker numbers released today show a 0.5% year-over-year decline in worldwide external storage sales in the fourth quarter of 2008.
This was the first time external disk sales declined in more than five years, yet it’s hardly a surprise. Everybody knows spending dropped due to the economy late last year, and IDC already told us that PC and server revenues fell 1.9% in the fourth quarter. But the dynamics behind the storage decline are interesting.
NAS and iSCSI revenues continued to grow. NAS increased 8.6% over the previous year, and iSCSI was up 62%. However, Fibre Channel revenue declined 3.2%.
With smaller installed bases, NAS and iSCSI have been growing faster than FC, but FC had been growing as well. The decline of FC will likely continue in the short term, as current market conditions favor NAS and iSCSI over more expensive FC SANs.
“End users are looking for more economical ways to meet their growing storage needs as many IT budgets shrink due to the current economic conditions,” IDC research analyst for disk storage Liz Connor said in IDC’s press release. “FC SAN systems fulfill many high-end storage needs, but usually at a higher average price. However, iSCSI and NAS storage solution alternatives offer increased enterprise-level features at lower costs, and compel vendors to consider these technologies. Continued end user education, growing confidence in IP-based storage, increasing product sophistication, as well as a typically lower price point, result in increased adoption of iSCSI and NAS by many budget conscious end users.”
The trends Connor talks about are almost certain to continue for the rest of this year because spending isn’t likely to pick up before late 2009 at the earliest. But there’s no guarantee that FC sales will rise when storage spending does improve. A year from now, enhanced Ethernet will be here not only to power Fibre Channel over Ethernet (FCoE) but also to improve NAS and iSCSI. Emerging storage markets such as Web 2.0, film/broadcast, video surveillance, and health care are dominated by organizations that deal with more files than Oracle databases. They can do just fine with NAS or IP SANs. The largest FC vertical, financial services, has been crippled by the economy.
So even when storage revenues start going up again, FC may not follow.
NetApp chief technical architect Val Bercovici let slip on his blog that NetApp is planning a new interface for Windows users of NetApp filers, complete with screenshots and feature details.
The preview is in the second half of Bercovici’s post about an award NetApp won at VMworld Europe. Apparently the judges deducted points for the management interface, so Bercovici responded with the big reveal of what’s coming soon.
…our newest customers or partners evaluating and deploying their FAS arrays one or two at a time also deserve a modern interface to help them come upto speed. In the 21st century, that interface is most commonly provided by an administrative workstation running Microsoft’s Windows GUI… NSM [NetApp System Manager] using the familiar Microsoft Management Console (MMC) interface with a clean and modern Windows 2008 Server look & feel.
He also includes a screencap of what the interface will look like, pointing out how systems are listed in the navigation tree with active-active pairs grouped together along with a list of their software services. NSM also integrates with the Windows System Tray to pop up ‘bubble alerts’ for health issues with the arrays, and offers snapshot management and auto discovery for authentication systems like Active Directory, among other features.
Existing enterprise NetApp users say it probably won’t have much impact on their environments. “In general, a GUI is a sexy tool, that all vendors like to demonstrate but a “real/old fashioned” system engineer will use it not very often,” wrote Reinoud Reynders, IT manager for the University Hospitals of Leuven in Belgium, in an email to Storage Soup. “For vendors, it’s very important that they have one (for pre-sales activities), but after that, the use is less important.”
One of the public faces of EqualLogic has a new role within the company that acquired it, Dell officials confirmed today.
John Joseph, formerly VP of marketing for the iSCSI SAN vendor, has been shifted to a new position as Vice President of Enterprise Solutions Marketing. A Dell spokesperson described the shift in an email to SearchStorage:
John has taken on a new role within Dell focused on integrating solutions. This move comes on the heels of Dell’s recent announcement that it will organize itself around three major customer segments – large enterprise, public sector, and small and medium businesses. John’s move further demonstrates the success of the EqualLogic acquisition and its integration into Dell and the importance of storage in Dell’s enterprise business solutions.
Asked for clarification on what exactly “integration” means, the spokesperson offered further,
The role focuses on bringing our different products together (storage, servers, etc.) and making sure we addressing our customer’s data center needs. So yes, he will still have contact with all storage products as well as servers, services, and software. Customers want a ready tested and certified IT solution from Dell and we’re responding.
So Joseph will still have contact with the EqualLogic products, but it seems to have gotten more remote–or at least more mixed in with other duties. Duties, we might note, that seem a bit removed from his previous role in marketing for an iSCSI SAN platform that Dell positions for SMBs and the midrange. Presumably, enterprise solutions require some high-scale, Fibre Channel activities as well.
More importantly, Joseph was, as mentioned above, a public symbol for EqualLogic and is closely associated with that company and its products. His continued presence at the wheel following the acquisition was one of the more encouraging signs I saw for the Dell/EqualLogic integration.
Meanwhile, Fusion-io (the topic of plenty of coverage yesterday) has more quietly replaced its CEO, Don Basile, with its former senior vice president David Bradford.
Fusion-io didn’t officially announce its CEO change, but quoted Bradford and identified him as its CEO in a Tuesday news release about its OEM deal with Hewlett-Packard. He’s also listed as CEO on their executive bios web page.
Bradford’s bio on the site credits him with persuading Apple co-founder Steve Wozniak to join the startup. This contradicts a published report from Fortune last month which credited Basile with the publicity-generating hire.
I’ve seen this kind of CEO shift happen at other startups as they enter different phases, in this case from developing product to trying to grow revenue. A similar thing even happened during a transition in VMware’s growth patterns last year when Paul Maritz replaced founder Diane Greene. But while Fusion-io is not a publicly traded company and is not required to announce a change in CEO, the storage community on Twitter took notice of the move with some chatter this morning.
Storage end user and blogger Martin Glassborow wrote on Twitter, “The sudden and rather stealth change of CEO is interesting. You wonder if there has been some direction issues.”
StorageIO Group founder and analyst Greg Schulz Tweeted back, “Concur, some normal shuffling of people moving around, however also some attrition, RIFs, and strategy changes as you point out.”
I’ve been in touch with several people at Fusion-io today to try to get to the bottom of the apparent discrepancy over The Woz, but have yet to hear back.
We interviewed Fusion-io Inc. CTO David Flynn for one of our news stories today–here’s some nitty-gritty bonus footage on how the company’s product goes about protecting data, and how that compares to spinning-disk systems.
Beth: So one ioDrive is 320 GB. Is data striped across all the chips or do you have separate data sets?
Flynn: Each one of the Flash modules looks like a volume and you can either stripe them or mirror them to make them look like one volume. Or if you have multiple cards you can aggregate all of those volumes with RAID 10. We have RAID-5 like redundancy on the chips, then RAID between the memory modules. What we’ve come to realize after we introduced FlashBack is that it actually lets you get more capacity.
Most SSDs are 64 GB at most—32 GB, 64 GB. With this technology we put five to 10 times as many chips within our card. That would increase the failure rate because the individual chip’s failure rates add up. With our ability to compensate, we can get to higher capacities, and with that we can increase endurance, because you can spread the data out.
Internally it’s more like RAID 50 because I have eight die in my redundancy chip. There’s one parity die for each package. It’s 24+1 and then that quantity times eight, because there’s eight of those sets. If you were to line it up like disk drives, it would look exactly like that, 24 disk drives and then an extra one, 8 rows. So when we talk about this as a SAN in the palm of your hand we really mean it, because we’ve taken die within the various NAND packages and arrayed them together just like a disk array. It’s also self-healing in that if you have a fault the system reconstructs the data that otherwise might’ve gone missing and moves it to a different spot and turns off the use of the spot that failed. You don’t have to service it. It automatically just maps it out. Like Xiotech’s ISE product—that’s bleeding edge stuff for disk arrays, and it’s built into the silicon here.
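The 24+1-times-eight layout Flynn describes can be sketched in a few lines. This is only an illustration under stated assumptions: it treats the "RAID-5 like" protection as plain byte-wise XOR parity across each row, which Flynn does not confirm, and the names and the tiny 4-byte "die contents" are invented for the example.

```python
from functools import reduce

# Layout figures taken from the interview: 24 data die plus 1 parity die
# per row, and 8 such rows.
DATA_DIE_PER_ROW = 24
PARITY_DIE_PER_ROW = 1
ROWS = 8

total_die = (DATA_DIE_PER_ROW + PARITY_DIE_PER_ROW) * ROWS
usable_fraction = DATA_DIE_PER_ROW / (DATA_DIE_PER_ROW + PARITY_DIE_PER_ROW)
print(f"{total_die} die in total, {usable_fraction:.0%} of them holding data")


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One row of 24 data die, each holding 4 bytes of made-up content.
row = [bytes([i] * 4) for i in range(1, DATA_DIE_PER_ROW + 1)]
parity = xor_blocks(row)  # what the 25th (parity) die would hold

# Simulate losing one die, then rebuild it from the survivors plus parity.
failed = 5
survivors = [blk for i, blk in enumerate(row) if i != failed]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == row[failed]
```

With single parity per row, each row can rebuild one lost die on its own, which is why the arrangement resembles RAID 50: parity within each row, striping across the eight rows.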
Beth: What about double parity protection? That’s all the rage in the disk drive world these days. What if more than one die fails at once?
For us to rebuild and heal takes a split second. Having a second failure during that time is not going to happen. It takes so long to rebuild a disk drive—it can take more than a day now—that the probability of a double failure goes up. The other thing is that disk drive failures are often highly correlated—the drives come from the same batch. They tend to fail randomly but close to each other in time. Our portfolio does cover n+m redundancy as well as N+1 because we anticipate a day when we’re putting not hundreds of these die on the boards but thousands and going into the tens and hundreds of thousands.
Beth: At the same time the Flash memory has finite write endurance, so they are all going to wear out at some point. So how do you compensate for that?
We account for how many write cycles it’s been through so we can give somebody a running…like an odometer, for tread wear on a tire. You can go five years or 50,000 miles. We warranty it, and you can swap out the modules without needing a new carrier card. Because we have such high capacity we naturally get a longer lifespan. It’ll last for 5 years even if you’re doing nothing but writing constantly. Wear-out has been overrated I think because most of the failures people are seeing have nothing to do with wear-out, they have to do with internal events that cause chips to lose data.
Here’s the four factors. This is the dirty little secret of the NAND world—it’s the newest fab process, which means it has its kinks. It’s the tightest feature size—they’re going to 32 nm. The density of the array of cells is achieved by sharing control lines. And then, fourth, and the real killer, to move the electrons into the floating gate cell it takes 20 volts internally. Most core voltages are well under a volt nowadays.
These four factors mean having a short-out event on one of these tiny little control lines—if you have just one chip it’s no big deal, it’s 40 out of a million. Which for a thumb drive, nobody would notice—it’s more likely to get shorted out in your pocket. But when you put hundreds of them together, now you have hundreds of those 40 out of a million chances to have something go bad, and that actually adds up to be something like one or two percent of these things fielded would have a data loss event. For a normal SSD the way they compensate is to put fewer chips on it or try to sweep the problem under the carpet—what they say if you talk to them is, ‘Well, we screen it very well, we run it in advance to make sure it’s not going to happen.’ You can screen it up front but there’s still probabilities of failure.
Here’s the thing: disk drives wear out, too. The trouble is, it’s unpredictable. One of the strongest motivators to going to solid state technology is the predictability of when you’re going to need to service it. And after a couple of years, you’re going to be able to replace it for a fraction of what it cost initially.
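Flynn's arithmetic on chip counts can be checked with a short calculation. This is a minimal sketch that assumes each chip fails independently with the 40-in-a-million chance he cites. He does not state independence, and the chip counts below are arbitrary examples rather than Fusion-io specifications.

```python
# Per-chip chance of a data loss event, from Flynn's "40 out of a million".
P_CHIP = 40 / 1_000_000


def p_any_failure(n_chips: int, p_chip: float = P_CHIP) -> float:
    """Chance that at least one of n_chips has a data loss event.

    Assumes chips fail independently, which is a simplification.
    """
    return 1.0 - (1.0 - p_chip) ** n_chips


# Arbitrary example chip counts, from a thumb drive up to a dense card.
for n in (1, 100, 250, 500):
    print(f"{n:>4} chips: {p_any_failure(n):.3%}")
```

Under that assumption the chance of at least one event comes out at about 1% for 250 chips and about 2% for 500, in line with the "one or two percent" Flynn quotes for devices carrying hundreds of chips.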
Not a month after an Israeli news source reported that EMC Corp. had been under investigation concerning government contracts in Israel, EMC revealed in its annual report filed with the SEC that it’s under investigation by the Civil Division of the Department of Justice (DOJ). The DOJ investigation involves “allegations concerning (i) EMC’s fee arrangements with systems integrators and other partners in federal government transactions, and (ii) EMC’s compliance with the terms and conditions of certain agreements pursuant to which we sold products and services to the federal government, including potential violations of the False Claims Act.”
There’s no relation to the Israeli investigation, according to an EMC spokesperson. In another contrast with that case, in which EMC flatly declined to comment, this time the company is flatly denying that the DOJ will find any wrongdoing. “EMC did not make improper payments to business partners and did not violate the False Claims Act,” wrote the spokesperson in an email to SearchStorage.com. “The matters at issue in this case are historical in nature; some of the allegations relate to events nearly ten years old. We will vigorously defend this case and the many years EMC has spent serving the U.S. Government…”
The SEC filing reads,
The subject matter of this investigation also overlaps with that of a previous audit by the U.S. General Services Administration (“GSA”) concerning our recordkeeping and pricing practices under a schedule agreement we entered into with GSA in November 1999 which, following several extensions, expired in June 2007. We have cooperated with both the audit and the DoJ investigation, voluntarily providing documents and information, and have engaged in discussions aimed at resolving this matter without any admission or finding of liability on the part of EMC.