Storage Channel Pipeline

Oct 4 2010   10:33AM GMT

Primary deduplication’s effect on data integrity, performance

Profile: Eric Slack

Dedupe has been with us for the better part of 10 years. Because of the high percentage of duplicated data in the backup space, it was deployed there first. But risk also played a part in its appearing first in backup. Let’s face it: If your dedupe box craters, it’s still just a backup that’s lost. As a technology matures, it gets more stable, and users start looking for new places to apply it. Generally, their expectations of how much impact it will have (in this case, how much space it will save) also decrease. It’s kind of a risk-reward scenario. That explains why primary deduplication is getting attention these days.


The thought of primary deduplication certainly came up early on in the adoption cycle of the technology, but there were plenty of “high-value targets” for the dedupe vendors to go after in backup. When it was first introduced to backup customers, they were promised effective data reduction in the double digits — in the high double digits for some data sets — and by and large they got it. While dedupe has certainly not been adopted by everyone (current estimates hover around one-third for market penetration), dedupe vendors seem to be ready to move on.


Now with non-backup data, data reduction numbers are firmly in the single digits. While this wouldn’t have sold many dedupe appliances for Data Domain 10 years ago, it seems to be enough for this second wave of vendors in the primary storage space. But some have questioned the ability of dedupe to perform as a real-time data optimization technology, as it would be in this application. In order for dedupe to become a common technology in primary storage, users would have to be confident it would have no effect on data integrity and no appreciable impact on performance.


It turns out this isn’t as much of a stretch as many think, once they look at how dedupe works under the covers. Essentially, dedupe uses technologies similar to those developed for file systems and snapshots (or clones). It leverages extents, metadata management and references to existing data blocks that primary storage users have relied on for years.
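To make that concrete, here’s a minimal sketch of the idea in Python. It’s a toy, not any vendor’s implementation: it assumes fixed-size blocks (real products often use variable-length extents) and uses a SHA-256 fingerprint as the block identity. Identical blocks are stored once; files are just metadata lists of references, much like the snapshot and clone techniques mentioned above.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed-size extent for illustration

class DedupStore:
    """Toy block store: identical blocks are kept once and shared by reference."""
    def __init__(self):
        self.blocks = {}   # fingerprint -> block data (each unique block stored once)
        self.files = {}    # filename -> ordered list of fingerprints (metadata)

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # Store the block only if its fingerprint is unseen;
            # otherwise just record another reference to the existing copy.
            self.blocks.setdefault(fp, block)
            refs.append(fp)
        self.files[name] = refs

    def read(self, name):
        # Reassemble the file by following its block references.
        return b"".join(self.blocks[fp] for fp in self.files[name])

store = DedupStore()
store.write("a.bin", b"x" * 8192)
store.write("b.bin", b"x" * 8192)  # duplicate data: no new blocks stored
print(len(store.blocks))           # 1 unique block backs both 8 KB files
```

The point of the sketch is that the dedupe path is metadata management plus a hash lookup per block — the same kind of bookkeeping primary storage already does for snapshots — which is why inline dedupe on primary data is less of a stretch than it first appears.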


As a VAR, you’ve probably seen primary deduplication included in some of your current vendors’ products, or at least you’ve been exposed to it in the news. Dell recently purchased Ocarina Networks, which developed a primary deduplication technology. BlueArc and Xiotech have announced partnerships with Permabit to incorporate dedupe into their offerings. Based on the number of companies putting dedupe into their primary storage solutions, one would conclude that it’s not just for backups anymore.


Follow me on Twitter: EricSSwiss
