Data Deduplication archives - Storage Soup


Oct 27 2009   1:50PM GMT

Industry bloggers debate dedupe to tape



Posted by: Beth Pariseau
data deduplication, tape data storage

It just wouldn’t be the storage industry if there weren’t technical debates popping up on a daily basis.

One that caught my eye today is an ongoing conversation between some storage bloggers about data deduplication to tape, and whether or not it’s a crazy idea. Or, more accurately, whether it’s “good crazy” or “bad crazy.”

Backup expert W. Curtis Preston got things started with a blog post written after he visited CommVault’s headquarters in Oceanport, N.J., where he discussed the data deduplication to tape feature CommVault added in Simpana 8. “Dedupe to tape is definitely crazy. But is it crazy good or crazy bad?” Preston wrote.

Everyone (including the CommVault folks) agrees that no one would want to do any significant portion of their restores from deduped tape. But I also agree that if I typically do all my restores from within the last 30 days, and someone asks me for a 31-day-old file, it’s generally going to be the type of restore where the fact that it might take several minutes to complete is not going to be a huge deal. (In the case that you did need to do a large restore from a deduped tape set, you could actually bring it back onto disk in its entirety before you initiate the restore.)

Now here’s the business case. Anyone who has done consulting in this business for a while has met the customer where everyone knows that 99% of the restores come from the last 30-60 days, and yet they keep their backups for 1-7 years. What a waste of resources. CommVault is saying, “Hey. If you’re going to do that, at least dedupe the tapes.” They showed me business cases from two customers for whom doing this was saving over $500K per year in their Iron Mountain bill.

Curtis made some declarative statements in that blog post, and when that happens you can expect someone in the storage blogosphere to write a post in opposition. EMC NetWorker data backup consultant Preston de Guise did the honors this time, with a response titled “Dedupe to tape is ‘crazy bad’ if the architecture is crazy.”

Yes, it’s undoubtedly the case that the CommVault approach will reduce the amount of data stored on tape, which will result in some cost savings. However, penny pinching in backup environments has a tendency to result in recovery impacts – often significant recovery impacts. For example, NetBackup gives “media savings” by not enforcing dependencies. Yes, this can result in saving money here and there on media, but can result in being unable to do complete filesystem recoveries approaching the end of a total retention period, which is plain dumb.

The CommVault approach, while saving some money on tape, will significantly expand recovery times (or require large cache areas and still take a lot of recovery time). Saving money is good. Wasting a little time during longer-term recoveries is likely to be perceived as being OK – until there’s a pressing need. Wasting a lot of time during longer-term recoveries is rarely going to be perceived as being OK.

An IT admin/blogger writing at Standalone Sysadmin picked up on de Guise’s post and had this to say:

My problem with this is tape failure. If one of the 50 individual backup tapes fails, it’s no problem. Sure, you lose that particular arrangement of the data, but it’s not that big of an issue. Unfortunate, sure, but not tragic. If you lose the 1 tape that contains the deduplicated data, though, then you immediately have a Bad Day(tm).

Essentially, you are betting on one tape not failing over the course of (in the argument of Mr Preston) 7+ years. And if something does happen in that 7 years, whether it’s degaussing, loss, theft, fire, water, or aliens, you don’t lose one backup set. You lose every backup that referenced that set of data.

So I would, if I could afford one, buy a deduplicated storage array in a heartbeat for my backup needs. But I would not trust a deduplicated archival system at all. The odds of loss are too great, and it’s not worth the savings. I’d rather cut the frequency of my backups than save money by making my archives co-dependent.
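The Standalone Sysadmin objection is about shared references. The toy model below counts which weekly full backups become unrecoverable when a single tape is lost, with and without deduplication across tapes. It is a sketch under stated assumptions: the tape names, chunk names and layout are hypothetical and are not CommVault’s actual on-tape format.

```python
from typing import Dict, List


def unrecoverable(backups: Dict[str, List[str]],
                  chunk_to_tape: Dict[str, str],
                  failed_tape: str) -> List[str]:
    """Return the backups that reference at least one chunk stored on the failed tape."""
    lost = {chunk for chunk, tape in chunk_to_tape.items() if tape == failed_tape}
    return sorted(name for name, chunks in backups.items() if lost & set(chunks))


weeks = ["week1", "week2", "week3"]
base = ["A", "B", "C"]  # hypothetical data that never changes between weekly fulls

# Without dedupe: every weekly full carries its own copy of every chunk on its own tape.
plain_backups = {w: [f"{w}/{c}" for c in base + [f"new_{w}"]] for w in weeks}
plain_tapes = {chunk: f"tape_{w}" for w, chunks in plain_backups.items() for chunk in chunks}

# With dedupe: shared chunks are written once, to the first tape;
# later tapes hold only each week's new chunk.
dedup_backups = {w: base + [f"new_{w}"] for w in weeks}
dedup_tapes = {c: "tape_week1" for c in base}
dedup_tapes.update({f"new_{w}": f"tape_{w}" for w in weeks})

print(unrecoverable(plain_backups, plain_tapes, "tape_week1"))  # ['week1']
print(unrecoverable(dedup_backups, dedup_tapes, "tape_week1"))  # ['week1', 'week2', 'week3']
```

Losing the first tape costs one backup in the plain layout but all three in the deduped one, because every later full still points at chunks written there. Losing a later tape in the deduped layout only costs that week’s backup.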

Of course, another user we talked to around the launch of Simpana 8 felt differently:

The global deduplication with Simpana 8 also extends to tape, making it the first product of its kind to allow for writes to physical tape libraries without requiring reinflation of deduplicated data. “That’s very appealing,” said Paul Spotts, system engineer for Geisinger Health, a network of hospitals and clinics in central Pennsylvania. “We added a VTL [virtual tape library] because we were running out of capacity in our physical tape libraries, but we lease the VTL, so we’re only allowed to grow so much per quarter.”

What say the rest of you?

Oct 16 2009   3:54PM GMT

CommVault fires back at EMC’s Slootman



Posted by: Beth Pariseau
storage vendors, data backup, data deduplication

Former Data Domain CEO Frank Slootman, now president of EMC’s data backup and recovery division, sat down for a Q&A with SearchDataBackup.com that’s been getting some attention from the industry, particularly other deduplication competitors.

Among those competitors, one with a contentious relationship with EMC/Data Domain is former partner CommVault, with whom Data Domain had a messy breakup after CommVault introduced its own deduplication with Simpana 8.

Here’s what Slootman had to say about them:

SearchDataBackup: Will you continue to work closely with Symantec Corp.’s OpenStorage (OST) API now that you’re EMC?

Slootman: Yes. I’m not throwing my partners under the bus. We’ll compete, but we’re all competitors and partners these days. We won’t screw them. We’ll screw other companies, like CommVault. We [Data Domain] treated them as a good partner and they came after us.

In an email to Storage Soup this week, CommVault vice president of marketing and business development Dave West had this response:

As I said back in June, I applaud Frank and Data Domain’s ability to create momentum for deduplication and a tremendous return for its shareholders. In the Dave Raffo piece, Frank calls out CommVault simply because we’re giving them a run for their money. Simpana, with built-in dedupe, works really well, and we are winning business. Now, I find it ludicrous to suggest a product vision that forces a customer to deploy 3 or more disparate products to achieve basic data protection. (Pile on more products for replication, encryption, archive and SRM).  At the end of the day, customers want less complexity, improved operational efficiency and ultimately, to spend less money. That means fewer, not more solutions. Less hardware and smarter software. EMC’s product portfolio is both complicated and costly for customers, so buyer beware. Also, in our opinion, this interview should raise some serious flags among the thousands of already nervous NetWorker customers out there looking for reassurance in the wake of the Data Domain acquisition.

I asked West to elaborate on the “red flags” about NetWorker, and he pointed to this statement by Slootman in another part of the interview:

SearchDataBackup: If Avamar is the future of data backup software, where does that leave NetWorker?

Slootman: Well, Avamar is augmenting NetWorker in a lot of places. People are moving a good part of their workload to Avamar, but not all. They’re still running applications like big, fat databases on traditional backup software. NetWorker can support conventional backup on tape and mixed media and people can integrate it with Data Domain.

“Former EMC customers are telling us that there is no real investment or innovation going into the NetWorker product and they’re tired of it,” West added.

This dedupe feud will get really interesting if CommVault partner Dell Inc. starts selling Data Domain, which is a likely scenario because Dell sells many of EMC’s storage products. CommVault’s Simpana is currently a big piece of Dell’s deduplication strategy.


Sep 16 2009   1:42PM GMT

Former Avamar chief Walsh to run Storwize



Posted by: Dave Raffo
data optimization, data deduplication

The appointment of Ed Walsh as Storwize CEO this week has people in the storage industry wondering how long it will be until the primary data reduction vendor gets acquired.

EMC bought data deduplication specialist Avamar 17 months after Walsh became its CEO, and it took Walsh 19 months to sell virtualization startup Virtual Iron to Oracle this year. But Walsh says there’s plenty of room for Storwize to grow on its own.

“I think this company has a lot of legs,” he said. “The opportunity is quite large.”

Walsh certainly knows the data reduction space. Outside of Data Domain – also part of EMC now – no company did as much to market data deduplication in its early days as Avamar.

“Data deduplication was not a term until Avamar used it,” Walsh said. “Data Domain called it capacity optimized storage. At Avamar, we had to teach the market that data deduplication was something that you wanted. Now the market is rife. The technology has proven itself.”

While the industry is now filled with backup vendors doing dedupe à la Avamar, only Storwize and startup Ocarina Networks are dedicated solely to reducing primary data. NetApp also has dedupe for primary data on its storage systems, EMC this year added single instance storage to its NAS filers, and Riverbed is working on a primary dedupe device, although it’s taking longer than originally thought.

“The difference for primary data is, there’s no tolerance for any performance degradation,” Walsh said. “Storwize really cracked the code on that. We get 6x or 9x improvement and no performance degradation. That gives us a long lead time on the competition.”

Storwize and Ocarina originally referred to their technologies as compression instead of dedupe. They work differently from dedupe, and dedupe was considered a secondary storage technology. Ocarina has relented and now refers to its product as dedupe, because that’s the term potential customers want to use.

Storwize has emphasized its STN appliances do compression – not dedupe – but its release announcing the new CEO had the headline, “Deduplication pioneer Ed Walsh takes the reins at Storwize.” Walsh says it really doesn’t matter if it’s called compression or dedupe, as long as it works.

“Everyone does it slightly different,” he says. “In the end, it’s still data reduction.”
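Compression and dedupe catch different kinds of redundancy, which is why the naming question comes up at all. The sketch below is a generic illustration, not Storwize’s or Ocarina’s technology: zlib stands in for compression, fixed 4 KiB chunks fingerprinted with SHA-256 stand in for dedupe, and the sample files are made up. It assumes Python 3.9+ for `randbytes`.

```python
import hashlib
import random
import zlib


def compressed_size(files):
    # Compression: each file is squeezed on its own, exploiting repetition *within* that file.
    return sum(len(zlib.compress(f)) for f in files)


def deduped_size(files, chunk_size=4096):
    # Dedupe: fixed-size chunks are fingerprinted, and identical chunks
    # *across* files (or within one) are stored only once.
    unique = {}
    for f in files:
        for i in range(0, len(f), chunk_size):
            chunk = f[i:i + chunk_size]
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return sum(unique.values())


log = b"INFO request ok\n" * 4096            # 64 KiB of highly repetitive text
blob = random.Random(0).randbytes(64 * 1024)  # 64 KiB of incompressible data, e.g. an attachment

# One repetitive file: compression wins, dedupe still keeps one whole 4 KiB chunk.
print(compressed_size([log]), deduped_size([log]))
# The same blob saved twice: dedupe keeps it once, compression stores it twice.
print(compressed_size([blob, blob]), deduped_size([blob, blob]))
```

In both cases the data gets smaller, which is Walsh’s point; what differs is which redundancy each technique can see.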


Jun 30 2009   5:30PM GMT

Industry watchers place bets on EMC-NetApp-Data Domain love triangle



Posted by: Beth Pariseau
data deduplication, Strategic storage vendors

Other than the extension of EMC’s bid for Data Domain last Friday, the NetApp / Data Domain / EMC drama has begun to simmer along at a more muted pitch than we saw during the initial bid and counter-bid process. For now, the storage industry is in a holding pattern, waiting to see who wins - and looking to place bets.

The prevailing wisdom so far is that, for all the seeming enmity between Data Domain’s management and EMC Corp., the ultimate decision lies with the shareholders, and it’s unlikely shareholders will choose NetApp’s mixed stock-and-cash deal over EMC’s all-cash bid. Some shareholders have already filed suit against the Data Domain board, saying the board failed in its responsibility to shareholders by agreeing to be acquired by NetApp.

Talk has also turned to the antitrust due diligence currently being carried out on the proposed deal by government regulators, including the FTC. According to a Reuters report last week,

The U.S. government could hinder EMC Corp’s (EMC.N) $1.8 billion bid for Data Domain Inc (DDUP.O) as antitrust regulators are expected to scrutinize it more closely than a competing offer by NetApp Inc (NTAP.O).

While by far the bigger company, EMC is in a more precarious antitrust position than its smaller rival because EMC is the largest player in the market for so-called data reduction technology in which Data Domain specializes.

Both bids are being reviewed by the U.S. Federal Trade Commission, but antitrust experts and industry analysts say EMC’s offer could get delayed for weeks or months, while they expect NetApp’s to win quick approval.

However, storage industry analysts say it would be a stretch for antitrust laws to block an EMC acquisition. “It’s tough to unravel,” said Forrester Research analyst Stephanie Balaouras. “Given [that] dedupe will exist everywhere, [in both] hardware and software, I think there are plenty of options.”

In the meantime, the Motley Fool published an interesting post yesterday entitled “EMC’s Just Not That Into Data Domain Anymore”:

EMC’s (NYSE: EMC) tender offer for storage efficiency expert Data Domain (Nasdaq: DDUP) was set to expire today, so the company filed an extension until July 10. Data Domain will hold its annual shareholders’ meeting in the meantime. And none of it matters.

As of last Friday, with an already-extended deadline looming large, only 0.28% of Data Domain’s shares had been tendered to EMC’s offer. That’s tantamount to a vote of “no confidence” in the deal. … It looks like Data Domain’s owners prefer to see the competing NetApp (Nasdaq: NTAP) offer coming to fruition. … EMC would have to cough up more cash to win this battle. Even then, EMC might have to resort to downright hostilities if it really wants Data Domain. … That’s just not a healthy way to get hitched, unless you want to start planning the divorce party already.

Acrimony is nothing new between NetApp and EMC, of course, but the lack of interest from Data Domain shareholders pointed out here is quite interesting. After all this, might the original news we reported a month ago still wind up being the story, give or take a few hundred million dollars?

Curiouser and curiouser.


May 21 2009   5:25PM GMT

NetApp looks to get one right



Posted by: Dave Raffo
storage vendors, data deduplication

Like all large acquisitions, NetApp’s $1.5 billion purchase of Data Domain leaves a few lingering questions in its wake.

The first is, will this be another acquisition that blows up in NetApp’s face? Let’s face it, NetApp hasn’t hit any home runs in past pickups. A quick look at its track record shows NetApp:

• bought Spinnaker for $300 million five and a half years ago, and still hasn’t fully integrated the code into its Data ONTAP operating system.

• paid $272 million for Decru in 2005, only to be frustrated when the appliance-based encryption market never developed.

• acquired Topio for $160 million in 2006, and discontinued selling its heterogeneous replication software at the end of last year.

NetApp president Tom Georgens is quick to point out the 2008 acquisition of Onaro for $100 million has worked out. Georgens says the SANScreen SRM software NetApp got from Onaro has sold well above expectations in the first year since the deal.

But even counting Onaro as a hit still leaves NetApp with a poor average with acquisitions.

NetApp CEO Dan Warmenhoven found himself on the defensive on the NetApp earnings call Wednesday night when asked about previous acquisitions. “Spinnaker was completely [about] integration; we tried to fuse together two separate technologies,” he said. “That was a much harder problem than we anticipated going in. Decru had a little bit different outcome. While I agree with you it was not to the fulfillment of our expectations, I think it was because we saw that market shift much faster than we thought.”

Neither of those problems should arise with Data Domain, though. The dedupe market is clearly established and growing, and Data Domain has led the charge. No integration is necessary up front. There may be some integration down the road, but NetApp can sell the Data Domain dedupe boxes while it develops future products. Unlike Spinnaker, Decru and Topio, Data Domain is a public company, with a strong organization and an accomplished sales force. And as Warmenhoven points out, NetApp already knows how to sell software wrapped in commodity hardware. The odds look good for NetApp in this case.

Another question in the wake of the deal is, will the ripple effects result in more acquisitions? It is sure to renew speculation that EMC will buy out its dedupe partner Quantum, but EMC already has the only thing from Quantum that it wants – its dedupe code. Why should it buy the entire company, unless another suitor forces it into a defensive deal?

The more likely deal would be Hewlett-Packard and Sepaton. HP already sells Sepaton’s dedupe and VTL software, and has a track record of buying companies following successful OEM relationships.

Regardless of what happens next, NetApp’s deal has made a hot tech area even more interesting.


May 20 2009   8:38PM GMT

NetApp drops $1.5 billion for Data Domain



Posted by: Dave Raffo
disk-based backup, data deduplication

Well, NetApp found a way to make money off data deduplication without charging for its primary deduplication licenses.

NetApp agreed today to acquire Data Domain for $1.5 billion, a deal that will give it the top revenue-producing dedupe platform once it closes in about two to four months.

Earlier this week, NetApp issued a release saying 7,200 customers were using its dedupe for more than 37,000 systems. But those customers aren’t paying for dedupe because NetApp doesn’t charge for dedupe licenses for its primary storage. It does sell virtual tape libraries (VTLs) with separate dedupe licenses, but that platform will likely be phased out now that NetApp has Data Domain’s product line.

NetApp paid $25 per share in cash and stock for Data Domain, well above the $18.08 price Data Domain opened at today.

NetApp and Data Domain both count EMC as their largest competitor, and this will intensify the NetApp-EMC competition. EMC licenses Quantum’s deduplication software for its Disk Library family, and also offers host-based deduplication with its Avamar software. EMC recently moved to challenge NetApp in primary dedupe by adding single instance capability to its Celerra NAS platform.

See our story on SearchStorage for more details.


May 18 2009   9:00AM GMT

Hifn adds speed and software to data reduction cards



Posted by: Beth Pariseau
data reduction, data deduplication

Hifn (now part of Exar Corp.) is taking another crack at getting major OEMs to ship products integrated with its DR line of compression, encryption and deduplication hashing acceleration cards, which could potentially spur the development of primary storage data deduplication offerings.

Prior to its acquisition by Exar, Hifn began sampling Express DR 250 and 255 cards to OEMs, but they hadn’t made their way into any announced third-party products. At this spring’s SNW, Hifn launched its own product based on the DR 255.

It was unclear why the chip boards, which perform processor-intensive data reduction and encryption in silicon, hadn’t caught on with OEMs. Hifn’s announcement today of its new DR 1600 series, which adds features such as high availability and boosted performance, may tacitly answer that question.

The DR 1600 line consists of six new models offering different levels of performance and combinations of compression, encryption, and dedupe. The Express DR 1600, 1610 and 1620 perform LZS compression and encryption only, at speeds of up to 300 MBps, 900 MBps, and 1800 MBps, respectively. The Express DR 1605, 1615, and 1625 run at the same three levels of throughput, but offer compression, encryption and hardware-based hashing for data deduplication (hash comparisons must still be performed by an OEM in software).
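The split described above (hashing on the card, hash comparison in OEM software) can be sketched as follows. This is a hedged illustration only: SHA-1 is an assumed stand-in for whatever hash the card computes, and `hash_chunks` and `DedupeIndex` are hypothetical names, not part of Hifn’s API.

```python
import hashlib
from typing import Dict, List


def hash_chunks(chunks: List[bytes]) -> List[bytes]:
    # Stand-in for the offloaded step: on a DR 1605/1615/1625 this hashing
    # would run on the card instead of the host CPU (SHA-1 is an assumption).
    return [hashlib.sha1(c).digest() for c in chunks]


class DedupeIndex:
    """The software side the OEM still writes: compare fingerprints, keep only new chunks."""

    def __init__(self) -> None:
        self.store: Dict[bytes, bytes] = {}  # fingerprint -> chunk data

    def write(self, chunks: List[bytes]) -> List[bytes]:
        refs = []
        for digest, chunk in zip(hash_chunks(chunks), chunks):
            if digest not in self.store:  # the hash comparison, done in software
                self.store[digest] = chunk
            refs.append(digest)           # every chunk, duplicate or not, becomes a reference
        return refs

    def read(self, refs: List[bytes]) -> bytes:
        return b"".join(self.store[r] for r in refs)


index = DedupeIndex()
refs = index.write([b"header", b"payload", b"header"])
print(len(refs), len(index.store))  # 3 2
```

Even with hashing offloaded, the index lookups and the storage of unique chunks remain host-side work.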

Hifn has also developed new software to go with the cards for this release, which includes a new API to standardize and ease integration of the cards into storage products to make it quicker for OEMs to take them to market. The 1600 series includes new high availability software for failover between cards, or to “pass through” traffic. That means if one card fails, the other can still perform compression, encryption, and dedupe in software.

According to Zack Mihalis, director of product marketing for Hifn, the new cards are sampling to OEMs and will become generally available at the end of July. Mihalis claimed that several large OEMs are considering the cards, potentially for primary storage dedupe. EMC, NetApp and Quantum are traditionally among Hifn’s OEMs, but Mihalis declined to disclose if any of them are sampling the DR 1600 cards.

Still, some industry analysts see this as the first step toward primary storage data reduction products becoming as ubiquitous as those for backup workloads. “Hifn has some very major OEMs as clients,” said IDC analyst Benjamin Woo. “This release is very timely - in this downturn we need to be more efficient with how we deal with data.”

However, Taneja Group analyst Jeff Boles pointed out that there’s still plenty of engineering work to be done to produce primary storage dedupe products, even with some of it already completed by Hifn. “Keep in mind that Hifn is hashing at 1,800 megabytes per second, but that’s not the speed of writing out to disk,” he said. “It’s still up to someone to make maximum use of this on disk, with caching, etc. Can you use this to service a random workload? That may be an engineering feat in itself.”


May 14 2009   1:03PM GMT

CommVault sales slip, looks to cloud for sunnier days



Posted by: Dave Raffo
data protection, data deduplication, Cloud storage

Even with the sales expectation bar lowered due to the economy, CommVault still failed to clear it by a long way last quarter. Now CommVault CEO Bob Hammer is looking for data deduplication and management of storage clouds to pull his company out of its slump.

CommVault’s revenue of $56.1 million last quarter was down 1% from last year and down 7% from the disappointing previous quarter, and well below its previous forecast of $63 million to $67 million. CommVault’s net income of $200,000 for the quarter was down from $6.2 million in the same quarter last year.

Hammer blamed the poor results mainly on the economy, compounded by pricing discounts from his larger competitors Symantec and – to a lesser extent — EMC with its Avamar products.

“The numbers weren’t good,” Hammer told StorageSoup. “We got hit pretty hard clearly, but most of it was the economy. We found customers freezing budgets, reducing budgets, reducing capex. We also saw more competitive pricing pressures, but the big issue was the market locked up.”

The good news, Hammer says, is CommVault has already seen a thaw in spending budgets and strong interest in sales of Simpana 8 driven by deduplication. CommVault released Simpana 8 in late January, and its large OEM partners Dell and Hitachi Data Systems will begin selling it this quarter.

CommVault’s internal goals call for revenues to increase in double-digit percentages this quarter, but the company lacked the confidence to give any forecast. Hammer did say many customers’ budget restrictions have lifted.

“It’s too early to call this a big thaw, but it looks like the fundamentals are in place,” Hammer said. “The whole psychology is a lot more positive. Budgets are there and customers are initiating projects. There’s still budget scrutiny, but it seems to be a lot easier to work with customers to close the deal.”

Hammer said CommVault shuffled its workforce to try to increase revenue by placing more people in sales and reducing other areas. The vendor will also offer “more flexible” pricing and payment models to counter what Hammer calls Symantec’s “kill CommVault in the cradle” discount programs. CommVault’s average selling price dropped to around $200,000 last quarter from $250,000 the previous quarter.

Hammer said Simpana 8 gained several hundred customers in the quarter, including more than 100 for its block-level dedupe. He says the software dedupe product had a high win rate against dedupe appliances from Data Domain, Quantum and others.

“The release was extremely successful, which sounds interesting given that we missed our number,” Hammer said.

CommVault is already looking to Simpana 9, which will likely be in beta late this year and in general release in mid-2010. The focus will be on helping service providers manage storage in the cloud. Hammer says managed service providers are already a fast-growing segment of CommVault’s customer base.

“Storage clouds represent a natural target for Simpana,” he said. “There is no universal automated platform to manage internal and external clouds in a large global enterprise. We’ve been working on several innovative concepts to enable Simpana to be the first fully automated platform to deal with key aspects of cloud computing.”


Jul 30 2008   3:51PM GMT

Sepaton promises 40:1 dedupe, or your next disk’s on them



Posted by: Dave Raffo
data deduplication, data backup

Just about all data deduplication vendors make claims about the dedupe ratios their systems provide, with the caveat that the ratios vary by data type and backup frequency.

Sepaton today says it’s willing to guarantee its ratio for Exchange. The VTL vendor said if customers don’t get a 40:1 ratio with its DeltaStor dedupe software in 30 days, it will throw in a free disk shelf with at least 7.5 TB of capacity - a $50,000-plus value.

There are some conditions. First, the customer must use Symantec NetBackup for now, because that’s the only backup software DeltaStor supports. And the customer must do daily full backups, which result in better reduction than incremental backups. The guarantee is part of what Sepaton calls a FastStart Deduplication Package for Symantec NetBackup, consisting of an S2100-ES2 library with 20 TB and DeltaStor.

Analysts who closely follow the dedupe market say Sepaton deserves credit for making the guarantee, but isn’t exactly sticking its neck out. Because Exchange includes a lot of messages sent to multiple recipients with attachments, it tends to have a great deal of duplicated data that can be reduced.

“To guarantee anything takes guts,” Arun Taneja of the Taneja Group said. “It’s a good marketing strategy for them to set the trend and draw a line in the sand. But for full backups for email for 30 days, 40:1 is very achievable. So I would say it’s not a very large risk.”

Glasshouse backup guru Curtis Preston agrees. “I think it’s a great idea and I doubt they would have done it if they hadn’t already done a lot of testing to verify they actually can get more than 40:1 in most Exchange environments,” he said. “There is a lot of duplicate data in Exchange.”

Sepaton director of product management Jim Shocrylas said the 20 TB system would give a customer with 4 TB of daily full backups a retention period of about half a year. He said the guarantee applies to full backups because Microsoft’s best practice recommendation for backing up Exchange is daily fulls.
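Shocrylas’s half-year figure checks out under two simplifying assumptions: all 20 TB is usable capacity, and the 40:1 ratio holds as an average over the whole retention window (in practice the first full dedupes far less than later ones). A quick sketch of the arithmetic:

```python
def retention_days(usable_tb: float, daily_full_tb: float, dedupe_ratio: float) -> float:
    """Days of daily fulls that fit, treating the dedupe ratio as a whole-period average."""
    logical_capacity_tb = usable_tb * dedupe_ratio  # how much backup data the disk can represent
    return logical_capacity_tb / daily_full_tb


print(retention_days(20, 4, 40))  # 200.0 days at the guaranteed 40:1
print(retention_days(20, 4, 20))  # 100.0 days if only 20:1 is achieved
```

Two hundred days is roughly six and a half months. Retention scales linearly with the ratio, which is why a missed guarantee is worth a free disk shelf to the customer.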

“This is the first of a number of guarantees we’ll be coming up with for specific data,” Shocrylas said. “Others will follow.”


May 12 2008   3:33PM GMT

VendorFights: Data Deduplication Edition



Posted by: Beth Pariseau
data deduplication

With data deduplication in the news today, I recommend checking out the responses to Jon Toigo’s questionnaire for data deduplication vendors. I found his questions about backing up deduped data to tape and the potential legal ramifications of changing data through dedupe especially interesting. The responses from the vendors so far about hardware-based hashing are also interesting, in that they seem to break down according to whether or not their companies offer a hardware- or software-based product.

It would be pretty disappointing if Hifn’s announcement of hardware-based hashing led to a religious war around software- vs. hardware-based dedupe systems. It’s clear (and has been generally accepted, or so I thought) that hardware performs better than software, meaning it’s in users’ best interest to improve the throughput of data deduplication systems by moving processor-intensive calculations to hardware. And the dedupe market is full of enough FUD as it is.

Speaking of which, Data Domain and EMC are getting all slapper-fight about dedupe thanks to today’s product announcement from Data Domain (and attendant comparisons to EMC/Avamar), and the fact that EMC is planning to finally roll out deduping tape libraries at EMC World (based on Quantum’s dedupe).

EMC blogger Storagezilla calls the statement by DD in a press release that its new product is 17 times faster than Avamar’s RAIN grid “nose gold” (props for the phraseology, at least), and then points out that Avamar’s back end doesn’t actually do any deduping, which is something I still don’t quite get.

So Data Domain’s box is faster at de-dup than the Avamar back end which doesn’t do any de-dup.

Since the de-dup is host based and only globally unique data leaves the NIC do I get to count the aggregate de-dup performance of all the hosts being backed up?

Yes, I do!

How does Avamar decide what data is ‘globally unique’? If this is determined before data leaves the host, then that processing must be done at the host. ‘Zilla even says he can count the aggregate performance of all the hosts being backed up in the dedupe performance equation, which brings me back to the first point again: Avamar’s back end doesn’t do de-dupe, but it’s faster at dedupe than Data Domain anyway?

Chris Mellor explored this further:

According to EMC, Avamar moves data at 10 GB/hr per node (moving unique sub-file data only). Avamar reduces typical file system data by 99.7 percent or more, so only 0.3 percent is moved daily in comparison to the amount that Data Domain has to move in conjunction with traditional backup software. This equals a 333x reduction compared to a traditional full backup (Avamar has customer data indicating as much as 500X, but 333X is a good average).
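The 333x figure is simply the reciprocal of the fraction left over after reduction. A quick check, using only the percentages quoted above:

```python
def reduction_factor(percent_eliminated: float) -> float:
    """Turn 'reduces data by X percent' into an 'N times smaller' factor."""
    return 100 / (100 - percent_eliminated)


print(round(reduction_factor(99.7)))  # 333: only 0.3% of the data moves
print(round(reduction_factor(99.8)))  # 500: the upper figure EMC cites
```

The factor is very sensitive near 100%: a 0.1-point change in the reduction percentage moves it from 333x to 500x, so small differences in claimed reduction produce large differences in headline multiples.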

‘An EMC spokesperson’ (should we assume it was, or wasn’t, Storagezilla himself?) further stated to Mellor:

“Remember that Data Domain has to move all of the data to the box, so naturally they’re focusing on getting massive amounts of data in quickly. EMC Avamar never has to move all of that data, so instead we focus on de-dupe efficiency, high-availability and ease of restore. Attributes that are more meaningful to the customer concerned with effective backup operations.”

Again I ask, where does the determination that data is ‘globally unique’ take place? It’s got to be taking up processor cycles somewhere. The rate at which it makes those determinations, and where it makes those determinations, would be the apples-to-apples comparison with DD, which is making those calculations as data is fed into its single-box system.
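One way to picture where those cycles go in a host-based design is the generic source-side exchange sketched below. It is not Avamar’s actual protocol: `BackupServer`, `client_backup` and the fixed 4 KiB chunking are hypothetical, and the returned byte count ignores the fingerprint list itself. Chunking and hashing run on the host being backed up, while the back end only answers whether it already holds each fingerprint. Assumes Python 3.9+ for `randbytes`.

```python
import hashlib
import random
from typing import Dict, List, Set


class BackupServer:
    """Back end: keeps the global fingerprint index; per chunk it only does a lookup."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def missing(self, digests: List[str]) -> Set[str]:
        return {d for d in digests if d not in self.chunks}

    def store(self, new_chunks: Dict[str, bytes]) -> None:
        self.chunks.update(new_chunks)


def client_backup(data: bytes, server: BackupServer, chunk_size: int = 4096) -> int:
    """Chunk and hash on the host, ship only chunks the server lacks; return payload bytes sent."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]  # host CPU cycles spent here
    need = server.missing(digests)  # server's index decides which fingerprints are new
    payload = {d: c for d, c in zip(digests, chunks) if d in need}
    server.store(payload)
    return sum(len(c) for c in payload.values())


rng = random.Random(1)
day1 = rng.randbytes(10 * 4096)
day2 = day1[: 9 * 4096] + rng.randbytes(4096)  # one chunk changed overnight

server = BackupServer()
print(client_backup(day1, server))  # 40960: first backup sends everything
print(client_backup(day2, server))  # 4096: only the changed chunk crosses the network
```

In this model the per-byte hashing cost sits on every protected host, which is why an apples-to-apples comparison would have to count it somewhere.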

All of that is overlooking that the real meat and potatoes when it comes to dedupe is single-stream performance, anyway — total aggregate throughput over groups of nodes (which is really what both vendors are talking about) doesn’t mean as much. For one thing, Data Domain’s aggregate isn’t really aggregate, because it doesn’t have a global namespace yet. For another, I fail to see how EMC can even quote an aggregate TB/hr figure when talking about a group of networked nodes. Doesn’t network speed factor in pretty heavily to that equation?

Personally, I don’t think either vendor is really putting it on the line in this discussion (c’mon guys, get MAD out there ;)!). And if Avamar really performs better than Data Domain, why isn’t its dedupe IP being used in EMC’s forthcoming VTLs? (EMC continues to deny this officially, or at least refuses to confirm, but there’s internal documentation floating around at this point that indicates Quantum is the partner.)

Meanwhile, according to EMC via Mellor:

EMC says Data Domain continues to compare apples and oranges because it wants to avoid the discussion that there are a number of different backup solutions that fit a variety of unique customer use cases.

I have to admit this made me chuckle. Most of the discussions I’ve had about EMC over the last year or so have involved their numerous backup and replication products and what the heck they’re going to do with them all long-term. Finally, it seems we have an answer: Turn it into a marketing talking point!

I don’t think Data Domain even really wants to avoid that subject, either. They’re well aware that there are a number of different products out there that fit different use cases, given their positioning specifically for SMBs who want to eliminate tape.

At the same time, it’s interesting to watch the EMC marketing machine fire itself up in anticipation of a new major announcement; the scale and coordination are something to behold. This market has already been a contentious one. It’ll be interesting to see what happens now that EMC’s throwing more of its chips on the table.