In the course of a conversation today with a new SRM vendor, ArxScan, CEO Mark Fitzsimmons mentioned a use case for the startup’s product that had me raising my eyebrows: basically, keeping data deduplication systems honest.
According to Fitzsimmons, a large pharma company wanted the Arxscan product to migrate data identified as redundant by the data deduplication system to another repository and present it for review through a centralized GUI, so that the customer could sign off on what data was to be deleted.
“So you’re replacing an automated process in the data center with a manual one?” was the confused reaction from one of my editors on the conference call.
“Well, we’re working on automating it,” was the answer. “But the customer found dedupe applications weren’t working so well, and wanted a chance to look at the data before it’s deleted.”
I’ve heard of some paranoia at the high end of the market about data deduplication systems, particularly when it comes to virtual tape libraries or large companies in sensitive industries like, well, pharmaceuticals. One question I’ve heard brought up more than once by high-end users is about backing up the deduplication index on tape, the better to be able to recover data from disk drives should the deduplicating array fail. But breaking apart the process for better supervision? That’s a new one for me.
Anyone else heard of anything like this? Or is the customer going overboard?
Until now, IBM’s VTL partner was FalconStor while Diligent supplies Hitachi Data Systems and Overland Storage. IBM and HDS say they’re still committed to HDS selling Diligent’s ProtecTier software even though it’s now owned by IBM.
“We don’t see any change with Diligent, the agreements we’ve had with them will continue,” HDS CTO Hu Yoshida said today. He compared the situation to EMC buying VMware. “VMware works well for us, we drive a lot of business from VMware,” he said. “This is a new world, we’re in an era of coopetition.”
Still, you don’t have to squint hard when reading the statement HDS issued today to see plenty of wiggle room:
The ProtecTIER software from Diligent Technologies offers very tailored data de-duplication technology that addresses only a fraction of the overall business continuity and disaster recovery capabilities that our customers require. This product comprises a single component of the broader portfolio of market-leading back-up and data protection solutions that Hitachi Data Systems offers its customers.
In other words, HDS is saying it could get by fine without ProtecTier. Where else can HDS go if it wants to switch? FalconStor is certainly available now that IBM has Diligent and EMC is partnered with Quantum for dedupe. Sun and Copan sell FalconStor data dedupe software, but Sun and Copan don’t exactly equate to EMC and IBM for disk backup market share. There’s also Sepaton, which has a VTL OEM deal with Hewlett-Packard, although HP has yet to offer Sepaton’s dedupe software.
One dedupe vendor not looking for an OEM partner is Data Domain, which rode deduplication backup products from stealth to IPO in four years and is generally considered the dedupe market leader. Data Domain CEO Frank Slootman says his channel partners wouldn’t appreciate competition from OEMs.
“That’s a decision you have to make early on as a company,” Slootman said. “If you start off as a channel company like we are, it’s difficult to run an OEM model right alongside it because they are incompatible. OEM usually means death to your channel.”
Two press releases caught my eye this week that aren’t exactly earth-shattering, but got me thinking about the way the storage market is changing and widening.
First, SanDisk revealed that its flash cards are recording footage of an excursion to Everest by a three-member climbing team sponsored by Dell, Windows Vista, MSN and MSNBC. Here’s a media gallery of the chilly-looking expedition so far.
Then there was also an announcement from RAID, Inc. of its compact Razor RAID array using 2.5-inch SAS drives, billed as “ideal for small spaces such as cockpits, tanks, submarines and other civilian applications with specific space constraints.” The ‘cockpits’ idea got my imagination going.
Between flash memory, with no moving parts and lower power requirements, and small-form-factor hard disks, not to mention the continued increase in content we store digitally, enterprise-level data storage is worming its way into unheard-of environments. As such, many in the industry have been predicting an increasing focus on edge devices, mobile computing environments and the mobile workforce for the storage market. Hopefully enterprise storage managers are paying attention to these new frontiers while architecting storage at headquarters.
Also, since it’s Friday, and who couldn’t use a laugh? Check out this priceless Gizmodo post on an internal Microsoft sales video that recently made its awkward YouTube debut. Key line: “You’ve gotta wonder how, in a company the size of Microsoft, there’s not a single person who [can] step up and say “Hey, you know what? This Vista music video we’re making for the sales department, complete with a cheesy Bruce Springsteen impersonator and horrible music, damages the dignity of not only everyone involved in its production, but everyone who watches it.”
Disk vs. tape is not a new argument, but over time it takes on different permutations, especially as disk-based backup in its various forms gains popularity and new technologies like data deduplication bring some of the economics of disk closer to those of tape.
One theme I’ve heard cropping up in this discussion among high-end vendors lately is the idea of people in large enterprises deploying vast amounts of disk for backup, then realizing the cost inefficiencies, and space and power requirements of disk, and finally running back to tape either alongside or as a replacement for disk.
This back-and-forth popped up again in a post written by IBM’s Tony Pearson in response to a post written by Hitachi Data Systems’ Hu Yoshida. Yoshida’s post referred to a conversation with a storage admin at SNW who said his robotic tape libraries were actually drawing more power than his enterprise VTL.
This idea makes Pearson sputter:
I am not disputing [the] approach. It is possible that [the user] is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library. In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the meanwhile the robot is idle waiting for another request.
The weird thing is, I’ve heard plenty of vendors debating this of their own accord, usually taking sides along product lines with tape-centric vendors taking the position Pearson did, and vendors who sell disk for secondary storage taking the opposite view.
But I’m curious. I’m sure there’s some middle ground where the advantages and disadvantages just depend on personal preferences. But might there really be a trend here? Are users finding problems with disk-based systems and re-integrating tape? How many organizations really even left tape totally behind to begin with? And how do new data reduction/power reduction technologies change the equation? One thing not addressed by either Pearson’s or Yoshida’s post is where MAID might come into this argument, as well as the potential combination of MAID and dedupe.
Dell’s been so acquisitive in storage lately that every new announcement from them, especially about partnerships, has me paying attention. I don’t believe that their buying spree is necessarily over.
This week, Dell certified ExaGrid’s diskless iSCSI deduplication gateway with EqualLogic’s iSCSI SAN for secondary dedupe storage. ExaGrid claims this is the first iSCSI-based deduplication gateway. Data Domain also sells a gateway, but it’s for FC. NetApp’s deduplication works on its V-series gateways, but isn’t separable from the OnTap OS.
Still, given the concern about performance for even FC-based dedupe systems, I wonder what the appeal is of an iSCSI dedupe system based on a gateway. It seems Dell is still sussing this out, too. Senior manager for Dell/EqualLogic product marketing Kevin Wittmer said Dell will not resell or support the combined product. It instead will be sold entirely through ExaGrid channels (the two have mutual channel partners).
Wittmer said this was a project begun on the ExaGrid side before Dell acquired EqualLogic, and added “you’re going to see Dell paying attention to this market space.”
Does that mean Dell will try to make its own foray into deduplication? In other words, is this ExaGrid partnership a test to see if the technology is worth acquiring?
“We will continue to look at the market space,” Wittmer responded. “I don’t want to go into detail right now on Dell’s product strategy.”
Then Wittmer said another thing that you could take one of several ways – “… it has much bigger implications that could impact all of Dell.”
Oracle is getting into the archiving game with the Oracle Universal Online Archive, which will archive email as well as unstructured files. The product will use Oracle’s own database as the underlying infrastructure, with Oracle Fusion Middleware on top for data ingestion and user interface.
Despite the name, the product is on-site software. There will also be an email-only option, Oracle E-Mail Archive Service, which supports Exchange, Notes and SMTP mail. The products are expected to be available sometime this year. The Universal Archive goes for $20 per named user or $75,000 per CPU, while the Email Archive is priced at $50 per named user or $40,000 per CPU.
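For a rough sense of where the two licensing models cross over, here is a back-of-the-envelope sketch. It takes the prices quoted above at face value and assumes, for illustration only, that per-user and per-CPU licensing are straight alternatives for the same deployment; the function names are made up for this example, and support and hardware costs are ignored.

```python
def break_even_users(per_user_price: float, per_cpu_price: float, cpus: int = 1) -> float:
    """Number of named users at which per-CPU licensing costs the same as per-user."""
    return (per_cpu_price * cpus) / per_user_price


def cheaper_model(users: int, cpus: int, per_user_price: float, per_cpu_price: float) -> str:
    """Return which licensing model is cheaper for a given user and CPU count."""
    per_user_total = users * per_user_price
    per_cpu_total = cpus * per_cpu_price
    if per_user_total < per_cpu_total:
        return "per-user"
    if per_cpu_total < per_user_total:
        return "per-CPU"
    return "equal"


# Prices as quoted in the post (illustrative inputs, not a price list).
universal_break_even = break_even_users(20, 75_000)   # Universal Archive, one CPU
email_break_even = break_even_users(50, 40_000)       # Email Archive, one CPU

print(f"Universal Archive break-even: {universal_break_even:.0f} users per CPU")
print(f"Email Archive break-even: {email_break_even:.0f} users per CPU")
```

On those assumptions, a shop below the break-even head count per CPU would come out ahead on named-user pricing, and a larger one on per-CPU pricing.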
Not only am I not surprised to see Oracle get into the data archiving space, to be honest, I’m wondering what took them so long. And while writing the previous paragraph, I said “Ouch” a few times–when it was noted that Oracle can archive multiple content types in one repository, which most third-party archivers can’t do yet; when it was noted that Oracle can support not only Notes but SMTP on top of Exchange, which most third-party archivers can’t do yet; and again when I saw the steep pricing.
Be that as it may, it’s been well known that databases like SQL are the basis for most third-party archiving software today. It’s also been well known that customers are catching on to archiving for database data as well. Finally, it’s bleedin’ obvious that Exchange is the dominant email platform and the dominant focus in email archiving. And I’ve wondered for a long time why companies like Oracle and Microsoft didn’t get in on this, since they have what seems like a slam dunk: ownership of the application and core technology, and mighty brand power that could conceivably crush the third-party market.
Easy, there, killer, was the response from ESG analyst Brian Babineau, who studies the archiving space. He pointed out that database archiving systems have to understand both the underlying database structure and the overlaying application, something Oracle isn’t doing. They may have an 800-lb. gorilla brand, he said, “but they have a tougher fight because there are native database archiving and native enterprise application vendors.”
To me this still leaves open the question of why Microsoft doesn’t just add archiving to Exchange, but Babineau pointed out the folks from Redmond already dipped a toe into the archiving market with FrontBridge and didn’t get too far. But I still have trouble believing that the Exchange archiving market would last long if Microsoft were to make a stronger move, say by acquiring a company like Mimosa and making stubbing and archiving a part of the Exchange interface.
My previous post about the value-add of online backup got me thinking about another series of conversations I’ve had recently about data storage SaaS in general (more on the compliance and archiving side than in backup, per se).
One value prop I hadn’t really thought about was suggested to me today by Jim Till, CMO of a company called Xythos. Xythos began as a SaaS-architected content management product during the tech bubble, watched that bubble and the market for storage service providers burst, re-architected for on-premise deployment at midsized to large enterprises, and is just now coming full circle with a SaaS offering again. Till said that customers of Xythos’s online product tend to be small organizations or remote and branch offices of larger organizations.
But in addition to the bandwidth issue, Till said, the reason organizations cite for going to a service for storage has little to do with bandwidth or expertise. He says the uptake has been among organizations relatively small in manpower but in “knowledge manager” industries such as tech consulting, law, or medicine. “They tend to be organizations where the biggest challenge is that standard methods of content storage aren’t accessible to distributed groups of people, and they need to uniformly apply policy against distributed content,” he said.
Any organization with data that’s widely distributed is unlikely to have a lot of data in one place. But it’s the distribution of that data, not its size or the experience of data management staff, that makes SaaS make sense, at least from Till’s point of view.
At least one recent case study I did on email archiving SaaS is consistent with this picture, too. For one of Fortiva’s email archiving SaaS customers, the Leukemia and Lymphoma Society, the problem wasn’t a 1.5 TB Exchange store, but 25,00 full and part-time employees receiving 12 million inbound messages a year at 103 different locations.
If this becomes a trend, the landscape of SaaS vendors might extend beyond traditional on-premise backup vendors to those who sell storage consolidation and accessibility over a wide area, such as Riverbed and Silver Peak.
Now, wouldn’t that be fun?
I was very happy to see one of my regular blog-stops, Anil Gupta’s Network Storage, pick up on a recent post I wrote–the one about HP’s new online storage services.
In his response post, Gupta picks up on this graf in particular:
Like most online storage offerings to date, this offering is small in scale and limited in its features when compared with on-premise products. Most analysts and vendors say online storage will be limited by bandwidth constraints and security concerns to the low end of the market, with most services on the market looking a lot like HP Upline.
there is nothing unique in most Online Backup Services that couldn’t be in traditional backup for laptop/desktop. At least traditional backup also come with peace of mind that all backups are stored on company’s own infrastructure. In last few years, I tried over a dozen online backup services in addition to putting up with traditional backup clients for laptop/desktop and I don’t see much difference among the two.
IMO, most online backup services are just taking existing on-premise backup strategy for laptops/desktops and repackaging it to run backups to somebody else’s infrastructure instead of your own.
I see what he’s saying, but in my opinion Gupta probably has “too much” experience with backup clients to necessarily see things from the SMB customer’s point of view. For him, installing a backup client isn’t a big deal–for some, it might be enough of a reason to let somebody else deal with it. Or at least, backup SaaS vendors are hoping so.
In case you’re like me and can’t get enough of the technical nitty-gritty on the new self-healing storage systems from Atrato and Xiotech, here are some tidbits from the cutting room floor so to speak, that didn’t make it into the article I did this week comparing the two systems.
This in particular was a paragraph that could have been fleshed out into a whole separate piece: “Both vendors use various error correction codes to identify potential drive failures, and both said they can work around a bad drive head by storing data on the remaining good sectors of the drive.”
This is where I’m running into each vendor’s unwillingness to expose their IP, which is understandable, and so trying to get to the bottom of this may be a fruitless endeavor. But that’s never stopped me before, so here’s a few more steps down the rabbit hole for those who are interested.
Xiotech’s whitepapers and literature talk a lot about the ANSI T10 DIF (Data Integrity Field), which is part of how its system checks that virtual blocks are written to the right physical disk, and that physical blocks match up with virtual blocks. The standard, which is also used by Seagate, Oracle, LSI and Emulex in their data integrity initiative, adds 8 bytes of data integrity information to each 512-byte block. I asked Xiotech CTO and ISE mastermind Steve Sicola about what kind of overhead that adds to the system, but the only answer I got was that it’s spread out over so many different disk drives working in parallel that it’s not noticeable.
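To make the DIF idea a little more concrete, here is a minimal sketch of the general scheme as the T10 standard lays it out: each 512-byte block gets 8 extra bytes, made up of a 2-byte guard tag (a CRC-16 of the data), a 2-byte application tag and a 4-byte reference tag that is typically tied to the block address. This is emphatically not Xiotech’s code; the function names are mine, and how ISE actually uses the fields is not something the vendor has disclosed.

```python
import struct

SECTOR_SIZE = 512  # bytes of user data per block
PI_SIZE = 8        # bytes of protection information appended per block


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 using polynomial 0x8BB7, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Append guard, application and reference tags to one 512-byte sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("sector must be exactly 512 bytes")
    guard = crc16_t10dif(sector)
    # Big-endian layout: 2-byte guard, 2-byte application tag, 4-byte reference tag.
    return sector + struct.pack(">HHI", guard, app_tag & 0xFFFF, ref_tag & 0xFFFFFFFF)


def verify(block: bytes, expected_ref_tag: int) -> bool:
    """Check that the data matches its guard CRC and landed at the expected address."""
    if len(block) != SECTOR_SIZE + PI_SIZE:
        return False
    sector, pi = block[:SECTOR_SIZE], block[SECTOR_SIZE:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_ref_tag & 0xFFFFFFFF)


# Capacity overhead of the extra bytes, independent of any vendor's implementation.
capacity_overhead = PI_SIZE / SECTOR_SIZE

if __name__ == "__main__":
    block = protect(bytes(SECTOR_SIZE), app_tag=0, ref_tag=1234)
    print(len(block), verify(block, 1234), verify(block, 1235))
    print(f"capacity overhead: {capacity_overhead:.4%}")
```

The guard tag catches corrupted data, while the reference tag catches a block that was written to, or read from, the wrong address, which is the "right physical disk" check described above. The capacity cost of 8 bytes per 512 is small; the processing cost of computing and checking the tags is the part Sicola’s answer speaks to.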
Then along comes Atrato, claiming to base its self-healing technology on a concept from satellite engineering called FDIR, for Fault Detection, Isolation and Recovery. The term was first coined, according to Wikipedia, in relation to the Extended Duration Orbiter in the 90’s.
An Atrato whitepaper reveals three standard codes used for the first step in that process–fault or failure detection. Among them are S.M.A.R.T., which, again according to Wikipedia, “tests all data and all sectors of a drive by using off-line data collection to confirm the drive’s health during periods of inactivity”; SCSI Enclosure Services (SES), which tests non-data characteristics including power and temperature; and the SCSI Request Sense Command, which determines whether drives are SCSI-compliant.
The thing about all of these methods is that they existed long before either the ISE or Atrato’s Velocity array. There are, of course, key differences between the way the systems are packaged, including the fact that Xiotech puts the controller right next to groups of between 20 and 40 disk drives, and Atrato manages 160 drives at once, but when it comes down to the actual self-healing aspects, the vendors are not disclosing anything about what new codes are being used to supplement those standards.
As Sicola put it to me, “What we’re doing is like S.M.A.R.T., but it goes way beyond that.” How far ‘way beyond that’ actually is, is proprietary. Which is kind of too bad, because it’s hard to tell how much of a hurdle there would be to more entrants in this market.
An analyst I was talking to about these new systems said some are talking about them as a desperation move for Xiotech, which has not exactly been burning down the market in recent years (it reinvented itself once already as an e-Discovery and compliance company after the acquisition of Daticon, which I haven’t heard much about lately).
Then again, others point out, Xiotech has Seagate’s backing (and can start from scratch with clear code on each disk drive, as well as use Seagate’s own drive testing software within the machine). Meanwhile, the ability to adequately market this technology has also been called into question with regard to Atrato.
But while it’s obviously going to take quite some time to assess the real viability of these particular products, it’s exciting for me as an industry observer to see vendors at least trying to do something fundamentally different with the way storage is managed. I think both of them share the same idea, that the individual disk drive is too small a unit to manage at the capacities today’s storage admins are dealing with.
Even if the products don’t perfectly live up to the claims of zero service events in a full three or five years, as an ISE beta tester I was speaking with put it, “anything that will make the SAN more reliable has benefits.” It’s pretty easy to get caught up in all the marketechture noise and miss the forest for the trees.
Even further reading: IBM’s Tony Pearson is less than enthused (but has links to lots of other blogs / writeups on this subject)
The inimitable Robin Harris summarizes his thoughts on ISE, and gets an interesting comment from John Spiers of LeftHand Networks (another storage competitor heard from!).
This blog is about three months in the making.
First, a bit of background. Several posts ago, I predicted the death of SATA in favor of SAS, which is only marginally more expensive (not talking the dirt-cheap integrated SATA controllers, but higher-end cache-carrying SATA RAID controllers) for an admittedly smaller capacity but much higher speed.
After using SAS on some of the servers and blades at work, I came home to my SATA-based desktop computer and wept silently whenever I did anything disk-intensive, because it was soooooo much slower. I have SCSI for the OS in all my server equipment, but even those machines weren’t as peppy as the SAS stuff at work. Taking these two things into account, plus the fact that the games I like to play are all disk I/O intensive, then throwing in a bit of friendly rivalry for good measure, I decided to upgrade my desktop machine to use SAS storage.