Disk vs. tape is not a new argument, but it takes on new permutations over time, especially as disk-based backup in its various forms gains popularity and new technologies such as data deduplication bring some of the economics of disk closer to those of tape.
One theme I’ve heard cropping up in this discussion among high-end vendors lately is the idea of large enterprises deploying vast amounts of disk for backup, then realizing the cost inefficiencies and the space and power requirements of disk, and finally running back to tape, either alongside disk or as a replacement for it.
This back-and-forth popped up again in a post by IBM’s Tony Pearson responding to a post by Hitachi Data Systems’ Hu Yoshida. Yoshida’s post referred to a conversation with a storage admin at SNW who said his robotic tape libraries were actually drawing more power than his enterprise VTL.
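To make the deduplication point concrete, here is a minimal Python sketch of the general idea: split a backup stream into chunks, fingerprint each chunk, and store only the chunks not seen before. It assumes fixed-size 4 KB chunks and SHA-256 fingerprints purely for illustration; commercial products differ in chunking method, chunk size and index design, and nothing here reflects any particular vendor's implementation. The simulated workload (seven nightly full backups of a 1 MB volume where one chunk changes per night) is hypothetical.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration only


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Store each distinct fixed-size chunk once; return (store, recipe)."""
    store: dict[str, bytes] = {}  # SHA-256 digest -> chunk contents
    recipe: list[str] = []        # ordered digests needed to rebuild the stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk costs only a reference
        recipe.append(digest)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)


def nightly_fulls(nights: int = 7, chunks: int = 256, seed: int = 0) -> bytes:
    """Simulate repeated full backups of a volume where one chunk changes per night."""
    rng = random.Random(seed)
    volume = bytearray(rng.randbytes(chunks * CHUNK_SIZE))
    stream = bytearray()
    for night in range(nights):
        if night > 0:
            start = night * CHUNK_SIZE  # overwrite a different chunk each night
            volume[start:start + CHUNK_SIZE] = rng.randbytes(CHUNK_SIZE)
        stream += volume  # each night is a complete copy of the volume
    return bytes(stream)


if __name__ == "__main__":
    stream = nightly_fulls()
    store, recipe = dedupe(stream)
    stored = sum(len(c) for c in store.values())
    assert restore(store, recipe) == stream
    print(f"logical {len(stream)} bytes, stored {stored} bytes, "
          f"ratio {len(stream) / stored:.1f}:1")
```

Because six of the seven full copies are almost entirely repeats, only 262 distinct chunks are kept out of 1,792 written, roughly a 6.8:1 reduction. Real-world ratios depend heavily on change rates and retention periods.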
This idea makes Pearson sputter:
I am not disputing [the] approach. It is possible that [the user] is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library. In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the meanwhile the robot is idle waiting for another request.
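Pearson's argument rests on how much data each approach pushes to tape. The Python sketch below compares nightly full backups with a progressive incremental scheme (one full, then only changed data). The numbers are hypothetical: 10 TB of primary data, a 2% daily change rate, a 30-day window and 800 GB cartridges. Counting cartridges filled is only a rough proxy for robot activity; it ignores restores, tape reclamation and per-job mounts.

```python
def daily_full_gb(primary_gb: int, days: int) -> int:
    """Every night writes a complete copy of the primary data."""
    return primary_gb * days


def progressive_incremental_gb(primary_gb: int, daily_change_pct: int, days: int) -> int:
    """One initial full backup, then only each day's changed data."""
    changed_per_day = primary_gb * daily_change_pct // 100
    return primary_gb + changed_per_day * (days - 1)


def cartridges_filled(total_gb: int, cartridge_gb: int) -> int:
    """Integer ceiling division: how many cartridges the data fills."""
    return -(-total_gb // cartridge_gb)


if __name__ == "__main__":
    # Hypothetical shop: 10 TB primary, 2% daily change, 30 days, 800 GB cartridges.
    full = daily_full_gb(10_000, 30)
    prog = progressive_incremental_gb(10_000, 2, 30)
    print(f"daily fulls: {full} GB on {cartridges_filled(full, 800)} cartridges")
    print(f"progressive: {prog} GB on {cartridges_filled(prog, 800)} cartridges")
```

With these inputs, nightly fulls write 300,000 GB across 375 cartridges, while the progressive scheme writes 15,800 GB across 20.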
The weird thing is, I’ve heard plenty of vendors debating this of their own accord, usually taking sides along product lines, with tape-centric vendors taking the position Pearson did and vendors who sell disk for secondary storage taking the opposite view.
But I’m curious. I’m sure there’s some middle ground where the advantages and disadvantages just depend on personal preferences. But might there really be a trend here? Are users finding problems with disk-based systems and re-integrating tape? How many organizations really left tape behind entirely to begin with? And how do new data reduction and power reduction technologies change the equation? One thing not addressed in either Pearson’s or Yoshida’s post is where MAID (massive array of idle disks) might come into this argument, as well as the potential combination of MAID and dedupe.
Dell’s been so acquisitive in storage lately that every new announcement from them, especially about partnerships, has me paying attention. I don’t believe that their buying spree is necessarily over.
This week, Dell certified ExaGrid’s diskless iSCSI deduplication gateway with EqualLogic’s iSCSI SAN for secondary dedupe storage. ExaGrid claims this is the first iSCSI-based deduplication gateway. Data Domain also sells a gateway, but it’s for FC. NetApp’s deduplication works on its V-series gateways, but isn’t separable from the Data ONTAP OS.
Still, given the concerns about performance even for FC-based dedupe systems, I wonder what the appeal of a gateway-based iSCSI dedupe system is. It seems Dell is still sussing this out, too. Senior manager for Dell/EqualLogic product marketing Kevin Wittmer said Dell will not resell or support the combined product. It instead will be sold entirely through ExaGrid channels (the two have mutual channel partners).
Wittmer said this was a project begun on the ExaGrid side before Dell acquired EqualLogic, and added “you’re going to see Dell paying attention to this market space.”
Does that mean Dell will try to make its own foray into deduplication? In other words, is this ExaGrid partnership a test to see if the technology is worth acquiring?
“We will continue to look at the market space,” Wittmer responded. “I don’t want to go into detail right now on Dell’s product strategy.”
Then Wittmer said another thing that you could take one of several ways – “… it has much bigger implications that could impact all of Dell.”
Oracle is getting into the archiving game with the Oracle Universal Online Archive, which will archive email as well as unstructured files. The product will use Oracle’s own database as the underlying infrastructure, with Oracle Fusion Middleware on top for data ingestion and user interface.
Despite the name, the product is on-site software. There will also be an email-only option, Oracle E-Mail Archive Service, which supports Exchange, Notes and SMTP mail. The products are expected to be available sometime this year. The Universal Archive goes for $20 per named user or $75,000 per CPU, while the Email Archive is priced at $50 per named user or $40,000 per CPU.
Not only am I not surprised to see Oracle get into the data archiving space; to be honest, I’m wondering what took them so long. And while writing the previous paragraphs, I said “Ouch” a few times: when I noted that Oracle can archive multiple content types in one repository, which most third-party archivers can’t do yet; when I noted that Oracle supports Notes and SMTP mail in addition to Exchange, which most third-party archivers can’t do yet; and again when I saw the steep pricing.
Be that as it may, it’s been well known that SQL databases are the basis for most third-party archiving software today. It’s also well known that customers are catching on to archiving database data. Finally, it’s bleedin’ obvious that Exchange is the dominant email platform and the dominant focus in email archiving. And I’ve wondered for a long time why companies like Oracle and Microsoft didn’t get in on this, since they have what seems like a slam dunk: ownership of the application and core technology, and mighty brand power that could conceivably crush the third-party market.
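The per-user versus per-CPU choice comes down to a simple break-even: divide the per-CPU price by the per-user price to get the number of named users per CPU at which the two models cost the same. The sketch below uses the list prices quoted above and a hypothetical shop of 5,000 users on 2 CPUs; it ignores discounts, support fees and how Oracle actually counts CPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPrice:
    name: str
    per_named_user: int  # USD list price, as announced
    per_cpu: int         # USD list price, as announced


UNIVERSAL_ARCHIVE = ListPrice("Universal Online Archive", 20, 75_000)
EMAIL_ARCHIVE = ListPrice("E-Mail Archive Service", 50, 40_000)


def break_even_users_per_cpu(price: ListPrice) -> float:
    """Named users per CPU at which both licensing models cost the same."""
    return price.per_cpu / price.per_named_user


def cheaper_model(price: ListPrice, users: int, cpus: int) -> str:
    """Pick the cheaper licensing model for a given user count and CPU count."""
    user_cost = users * price.per_named_user
    cpu_cost = cpus * price.per_cpu
    return "per named user" if user_cost <= cpu_cost else "per CPU"


if __name__ == "__main__":
    for product in (UNIVERSAL_ARCHIVE, EMAIL_ARCHIVE):
        print(product.name,
              break_even_users_per_cpu(product),
              cheaper_model(product, users=5_000, cpus=2))
```

At list price the Universal Archive breaks even at 3,750 named users per CPU and the E-Mail Archive at 800, so the hypothetical 5,000-user, 2-CPU shop would pick per-user licensing for the former ($100,000 vs. $150,000) and per-CPU licensing for the latter ($80,000 vs. $250,000).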
Easy, there, killer, was the response from ESG analyst Brian Babineau, who studies the archiving space. He pointed out that database archiving systems have to understand both the underlying database structure and the overlaying application, something Oracle isn’t doing. They may have an 800-lb. gorilla brand, he said, “but they have a tougher fight because there are native database archiving and native enterprise application vendors.”
To me this still leaves open the question of why Microsoft doesn’t just add archiving to Exchange, but Babineau pointed out the folks from Redmond already dipped a toe into the archiving market with FrontBridge and didn’t get too far. But I still have trouble believing that the Exchange archiving market would last long if Microsoft were to make a stronger move, say by acquiring a company like Mimosa and making stubbing and archiving a part of the Exchange interface.
My previous post about the value-add of online backup got me thinking about another series of conversations I’ve had recently about data storage SaaS in general (more on the compliance and archiving side than in backup, per se).
One value prop I hadn’t really thought about was suggested to me today by Jim Till, CMO of a company called Xythos. Xythos began as a SaaS-architected content management product during the tech bubble, watched that bubble and the market for storage service providers burst, re-architected for on-premise deployment at midsized to large enterprises, and is just now coming full circle with a SaaS offering again. Till said that customers of Xythos’s online product tend to be small organizations or remote and branch offices of larger organizations.
But Till said the reason organizations cite for going to a service for storage has little to do with bandwidth or expertise. He says the uptake has been among organizations that are relatively small in headcount but work in “knowledge manager” industries such as tech consulting, law, or medicine. “They tend to be organizations where the biggest challenge is that standard methods of content storage aren’t accessible to distributed groups of people, and they need to uniformly apply policy against distributed content,” he said.
Any organization with widely distributed data is unlikely to have a lot of data in any one place. But it’s the distribution of that data, not its size or the experience of the data management staff, that makes SaaS a good fit, at least from Till’s point of view.
At least one recent case study I did on email archiving SaaS is consistent with this picture, too. For one of Fortiva’s email archiving SaaS customers, the Leukemia and Lymphoma Society, the problem wasn’t a 1.5 TB Exchange store, but 25,00 full and part-time employees receiving 12 million inbound messages a year at 103 different locations.
If this becomes a trend, the landscape of SaaS vendors might extend beyond traditional on-premise backup vendors to those who sell storage consolidation and accessibility over a wide area, such as Riverbed and Silver Peak.
Now, wouldn’t that be fun?
I was very happy to see one of my regular blog-stops, Anil Gupta’s Network Storage, pick up on a recent post I wrote–the one about HP’s new online storage services.
In his response post, Gupta picks up on this graf in particular:
Like most online storage offerings to date, this offering is small in scale and limited in its features when compared with on-premise products. Most analysts and vendors say online storage will be limited by bandwidth constraints and security concerns to the low end of the market, with most services on the market looking a lot like HP Upline.
there is nothing unique in most Online Backup Services that couldn’t be in traditional backup for laptop/desktop. At least traditional backup also come with peace of mind that all backups are stored on company’s own infrastructure. In last few years, I tried over a dozen online backup services in addition to putting up with traditional backup clients for laptop/desktop and I don’t see much difference among the two.
IMO, most online backup services are just taking existing on-premise backup strategy for laptops/desktops and repackaging it to run backups to somebody else’s infrastructure instead of your own.
I see what he’s saying, but in my opinion Gupta probably has “too much” experience with backup clients to necessarily see things from the SMB customer’s point of view. For him, installing a backup client isn’t a big deal–for some, it might be enough of a reason to let somebody else deal with it. Or at least, backup SaaS vendors are hoping so.
In case you’re like me and can’t get enough of the technical nitty-gritty on the new self-healing storage systems from Atrato and Xiotech, here are some tidbits from the cutting-room floor, so to speak, that didn’t make it into the article I did this week comparing the two systems.
This in particular was a paragraph that could have been fleshed out into a whole separate piece: “Both vendors use various error correction codes to identify potential drive failures, and both said they can work around a bad drive head by storing data on the remaining good sectors of the drive.”
This is where I’m running into each vendor’s unwillingness to expose its IP, which is understandable, and so trying to get to the bottom of this may be a fruitless endeavor. But that’s never stopped me before, so here are a few more steps down the rabbit hole for those who are interested.
Xiotech’s whitepapers and literature talk a lot about the ANSI T10 DIF (Data Integrity Field), which is part of how its system checks that virtual blocks are written to the right physical disk, and that physical blocks match up with virtual blocks. The standard, which Seagate, Oracle, LSI and Emulex also use in their data integrity initiative, adds 8 bytes of data integrity information to each 512-byte block. I asked Xiotech CTO and ISE mastermind Steve Sicola what kind of overhead that adds to the system, but the only answer I got was that it’s spread out over so many disk drives working in parallel that it’s not noticeable.
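For readers who want to see what those 8 bytes contain: the T10 DIF tuple appended to each 512-byte sector holds a 2-byte guard tag (a CRC-16 of the sector data, polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference tag (for Type 1 protection, the low 32 bits of the target LBA). The guard catches corrupted data and the reference tag catches a block written to the wrong place. The Python sketch below builds and checks such a tuple in software; it illustrates the format only and says nothing about how Xiotech's ISE computes or checks it in hardware.

```python
import struct

SECTOR = 512
T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the DIF guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append an 8-byte tuple: guard CRC, application tag, reference tag."""
    if len(sector) != SECTOR:
        raise ValueError("expected a 512-byte sector")
    dif = struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)
    return sector + dif


def verify(protected: bytes, expected_lba: int) -> bool:
    """Check the guard (data intact) and reference tag (block in the right place)."""
    sector, dif = protected[:SECTOR], protected[SECTOR:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", dif)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_lba & 0xFFFFFFFF)


if __name__ == "__main__":
    block = protect(bytes(range(256)) * 2, lba=1234)
    print(len(block), verify(block, 1234), verify(block, 1235))
```

The raw overhead is 8 extra bytes per 512 bytes of data, or about 1.6%, before counting whatever controller or CPU cost goes into generating and checking the CRCs.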
Then along comes Atrato, claiming to base its self-healing technology on a concept from satellite engineering called FDIR, for Fault Detection, Isolation and Recovery. The term was first coined, according to Wikipedia, in relation to the Extended Duration Orbiter in the 1990s.
An Atrato whitepaper names three standard mechanisms used for the first step in that process, fault or failure detection. They are S.M.A.R.T., which, again according to Wikipedia, “tests all data and all sectors of a drive by using off-line data collection to confirm the drive’s health during periods of inactivity”; SCSI Enclosure Services (SES), which monitors non-data characteristics including power and temperature; and the SCSI Request Sense command, which retrieves the sense data a drive reports to describe an error or exception condition.
The thing about all of these methods is that they existed long before either the ISE or Atrato’s Velocity array. There are, of course, key differences in the way the systems are packaged, including the fact that Xiotech puts the controller right next to groups of between 20 and 40 disk drives, while Atrato manages 160 drives at once. But when it comes down to the actual self-healing aspects, the vendors are not disclosing what new codes are being used to supplement those standards.
As Sicola put it to me, “What we’re doing is like S.M.A.R.T., but it goes way beyond that.” Just how far beyond is proprietary. Which is kind of too bad, because that makes it hard to tell how high the hurdle would be for more entrants in this market.
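The Request Sense step is the easiest of the three to illustrate. A drive reports an error as a sense key plus an additional sense code (ASC) and qualifier (ASCQ), in either fixed or descriptor format. The Python sketch below parses both formats and feeds the result to a toy triage function. The mapping from codes to actions is a hypothetical illustration of FDIR-style fault detection, not Atrato's or Xiotech's logic, and real arrays would combine this with S.M.A.R.T. and SES data.

```python
from typing import NamedTuple

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_FAILURE_PREDICTION = 0x5D  # FAILURE PREDICTION THRESHOLD EXCEEDED


class Sense(NamedTuple):
    key: int
    asc: int
    ascq: int


def parse_sense(buf: bytes) -> Sense:
    """Extract sense key, ASC and ASCQ from fixed or descriptor format sense data."""
    response_code = buf[0] & 0x7F  # mask off the VALID bit
    if response_code in (0x70, 0x71):  # fixed format (current / deferred)
        if len(buf) < 14:
            raise ValueError("fixed-format sense data too short")
        return Sense(buf[2] & 0x0F, buf[12], buf[13])
    if response_code in (0x72, 0x73):  # descriptor format (current / deferred)
        return Sense(buf[1] & 0x0F, buf[2], buf[3])
    raise ValueError(f"unknown sense response code 0x{response_code:02X}")


def describe(sense: Sense) -> str:
    name = SENSE_KEYS.get(sense.key, "OTHER")
    return f"{name} asc=0x{sense.asc:02X} ascq=0x{sense.ascq:02X}"


def triage(sense: Sense) -> str:
    """Hypothetical policy mapping a detected fault to an isolation/recovery action."""
    if sense.asc == ASC_FAILURE_PREDICTION:
        return "predicted failure: migrate data off the drive"
    if sense.key == 0x3:
        return "medium error: remap or rebuild the affected blocks"
    if sense.key == 0x4:
        return "hardware error: isolate the drive"
    return "no action"
```

ASC 0x5D is how SCSI drives report a S.M.A.R.T.-style predicted failure, which is why the sketch checks it before looking at the sense key.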
An analyst I was talking to about these new systems said some are talking about them as a desperation move for Xiotech, which has not exactly been burning down the market in recent years (it reinvented itself once already as an e-Discovery and compliance company after the acquisition of Daticon, which I haven’t heard much about lately).
Then again, others point out, Xiotech has Seagate’s backing (and can start from scratch with clear code on each disk drive, as well as use Seagate’s own drive testing software within the machine). Meanwhile, questions have also been raised about Atrato’s ability to market this technology adequately.
But while it’s obviously going to take quite some time to assess the real viability of these particular products, it’s exciting for me as an industry observer to see vendors at least trying to do something fundamentally different with the way storage is managed. I think both of them share the same idea, that the individual disk drive is too small a unit to manage at the capacities today’s storage admins are dealing with.
Even if the products don’t perfectly live up to the claims of zero service events in a full three or five years, as an ISE beta tester I was speaking with put it, “anything that will make the SAN more reliable has benefits.” It’s pretty easy to get caught up in all the marketecture noise and miss the forest for the trees.
Even further reading: IBM’s Tony Pearson is less than enthused (but has links to lots of other blogs / writeups on this subject)
The inimitable Robin Harris summarizes his thoughts on ISE, and gets an interesting comment from John Spiers of LeftHand Networks (another storage competitor heard from!).
This post has been about three months in the making.
First, a bit of background. Several posts ago, I predicted the death of SATA in favor of SAS, which is only marginally more expensive (not talking the dirt-cheap integrated SATA controllers, but higher-end cache-carrying SATA RAID controllers) for an admittedly smaller capacity but much higher speed.
After using SAS on some of the servers and blades at work, I came home to my SATA-based desktop computer and wept silently whenever I did anything disk-intensive, because it was soooooo much slower. I have SCSI for the OS in all my server equipment, but even those machines weren’t as peppy as the SAS stuff at work. Taking these two things into account, plus the fact that the games I like to play are all disk I/O intensive, then throwing in a bit of friendly rivalry for good measure, I decided to upgrade my desktop machine to use SAS storage.
It’s fairly routine for EMC to certify a multitude of different products as interoperable with its own, based on customer requests. But a recent press release about official compatibility between EMC and a Linux-based mail server positioned as an alternative to Microsoft Exchange made me pay more attention than I usually do to such proclamations.
One thing especially sticks out from this arrangement: several EMC customers, with plenty of Microsoft integration available from EMC’s product line, have instead chosen to go with this alternative mail server. From a startup called PostPath, no less.
Moreover, Barry Ader, EMC’s senior director of product marketing, acknowledged that there are several customers who have asked for the integration. “There are a handful I’m aware of, but there may be more,” was as specific as he would get, but he added, “They tend to be important customers to drive this kind of application work for us.”
EMC’s “important” customers tend to be large. In my book, if more than one important EMC customer is catching on to a product, it might be worth paying attention to.
In and of itself, PostPath’s application is a little bit outside our realm in storage, but it’s the way that the mail server handles storage that chiefly sets it apart from Exchange. According to CEO Duncan Greatwood, PostPath uses a file system (NFS or XFS depending on how servers are attached to storage) rather than the JET database, which allows for more efficient indexing schemes and a more organized layout of data on disk. The JET database, which was never designed for the kinds of workloads enterprise Exchange servers are seeing today, has a deadly sequential-reads-with-random-writes issue slowing its storage I/O. PostPath also does a single write when a message is received, as opposed to Exchange, which writes blocks to multiple areas of storage based on different database fields with each message.
What all of this means is that attached to the right storage (ahem), PostPath allows email admins to offer virtually “bottomless” mailboxes to users.
Still, Greatwood acknowledges that he has an uphill battle on his hands. “Most of the Linux-based mail server alternatives to Exchange have not gone very far,” he said. But he maintains that a key difference with PostPath is that the product speaks the same language as familiar Microsoft peripherals such as Outlook and Active Directory, so end users don’t have to stop using the tools they’re comfortable with. He also says that with all of Microsoft’s recent antitrust woes, especially in Europe, the company isn’t keen on crushing upstart competitors these days.
I know that storage managers (to say nothing of admins who have managed Exchange) have been looking for a better mousetrap for quite some time. And cozying up to EMC customers can’t be hurting PostPath’s cause.
HP has taken the wraps off a new online storage service for consumers and small offices, called HP Upline. The service has three levels: Home and Home Office, Family Account and Professional Account. Home accounts include one license, unlimited storage, online backup and basic support for $59 per year; a family account adds 3 licenses and a management dashboard for $149 per year; and a professional account gets 3 licenses, expandable to 100, as well as priority support.
The product is limited to PCs and doesn’t include some of the more advanced features being offered by online storage services, such as file versioning. However, it does let users tag content for later searching and sharing, and publish files online through a feature called the Upline Library.
Like most online storage offerings to date, this offering is small in scale and limited in its features when compared with on-premise products. Most analysts and vendors say online storage will be limited by bandwidth constraints and security concerns to the low end of the market, with most services on the market looking a lot like HP Upline. Symantec has focused its backup software as a service (SaaS) within its Windows-centric Backup Exec product, traditionally sold into smaller shops; EMC’s Mozy Enterprise service, despite the name, is at this point recommended only for workstation-level backup. However, a “hybrid” approach for larger shops is now being proposed by EMC.
Wading into bickering between vendors is always fun. My most recent go-round with this has been the AutoCAD compatibility debate between Silver Peak and Riverbed. It began with the difficulties Riverbed users were seeing with optimizing AutoCAD 2007 and 2008 files, and progressed into a weeklong followup process culminating in a conference call between me, Riverbed VP of marketing Alan Saldich, Riverbed chief scientist Mark Day, Silver Peak director of product marketing Jeff Aaron, and Silver Peak CTO David Hughes, which led to this story.
Don’t think this drama’s over yet, either. While the two companies seemed to reach a consensus on that rather unusual conference call that further testing is necessary on both products, neither has since stopped sending little hints my way that the other guy’s full of it. Meanwhile, another contact I spoke with for the followup story wrote me late last week to suggest they’re both perhaps piling it higher and deeper.
“After reading the back and forth between Silver Peak and Riverbed, and finding neither firms’ claims especially credible, we’ve put forth a public offer to test in a controlled environment,” wrote James Wedding, an Autodesk consultant who blogs at Civil3D.com. “Shockingly, neither company has responded or replied. We have visitors logged from both firms, so they are reading, but no takers. Color me shocked that neither firm wants independent testing on this problem that will continue for a minimum of another year as Autodesk decides to make a change to accommodate the WAN accelerator market.” The Taneja Group has offered to carry out testing as well, again with no discernible response from the vendors.
We’re ready when you are, guys.