Storage Soup

Apr 21 2008   4:21PM GMT

Data deduplication: no lifeguard on duty?

By Beth Pariseau

In the course of a conversation today with new SRM vendor ArxScan, CEO Mark Fitzsimmons mentioned a use case for the startup's product that had me raising my eyebrows: basically, keeping data deduplication systems honest.

According to Fitzsimmons, a large pharma company wanted the ArxScan product to migrate data identified as redundant by the data deduplication system to another repository and present it for review through a centralized GUI, so that the customer could sign off on what data was to be deleted.

“So you’re replacing an automated process in the data center with a manual one?” was the confused reaction from one of my editors on the conference call.

“Well, we’re working on automating it,” was the answer. “But the customer found dedupe applications weren’t working so well, and wanted a chance to look at the data before it’s deleted.”

I’ve heard of some paranoia at the high end of the market about data deduplication systems, particularly when it comes to virtual tape libraries or large companies in sensitive industries like, well, pharmaceuticals. One question I’ve heard brought up more than once by high-end users is about backing up the deduplication index on tape, the better to be able to recover data from disk drives should the deduplicating array fail. But breaking apart the process for better supervision? That’s a new one for me.
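For readers less familiar with what's going on under the hood, the worry about the index is easiest to see in a toy model. The sketch below is hypothetical and not any vendor's actual implementation, but it shows the basic shape of block-level dedupe: each block is stored once, keyed by its hash, and a file is reduced to an ordered list of hashes in an index. Lose the index, and the pool of stored blocks is unrecoverable.

```python
import hashlib

# Toy sketch of block-level deduplication (hypothetical; not any
# vendor's actual implementation). Each fixed-size block is stored
# once, keyed by its hash; a file becomes an ordered list of hashes.

BLOCK_SIZE = 4  # unrealistically small, for illustration

store = {}  # block hash -> block bytes (the deduplicated pool)
index = {}  # file name  -> ordered list of block hashes

def write(name, data):
    hashes = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        store[h] = block  # a repeated block lands on the same key
        hashes.append(h)
    index[name] = hashes

def read(name):
    # The pool alone is just anonymous blocks; without the index
    # there is no way to reassemble files -- which is why users ask
    # about backing the index up separately (e.g. to tape).
    return b"".join(store[h] for h in index[name])

write("a.txt", b"ABCDABCD")
write("b.txt", b"ABCDXYZ!")
assert read("a.txt") == b"ABCDABCD"
assert len(store) == 2  # "ABCD" occurs three times but is stored once
```

In this model the disk savings come entirely from the `store` dictionary collapsing duplicate blocks, and the `index` is the only record of how files map onto those blocks, which is why high-end users treat it as a single point of failure.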

Anyone else heard of anything like this? Or is the customer going overboard?

1 Comment on this Post

  • Tory Skyers
    In my opinion the customer is not going overboard; I've had similar concerns. I've posed questions to de-dupe resellers about the validity of data that has been de-duped and whether it would stand up in court. I haven't gotten an answer that I understand, or better stated, the answers I've gotten haven't been "Yes" or "No." In industries where regulatory agencies can simply shut your doors, and doing so costs you billions in R&D, if they decide someone has tampered with test results or data collection methods, I can completely understand why Pharma would be paranoid. I don't know that breaking up the process will help me sleep better at night, but in the scheme of things, is the extra savings in disk worth the risks that block-level data manipulation poses? I may sound like a naysayer, but I actually like de-dupe. I think it's a great idea whose time in the spotlight has come, and I'm a proponent, but not the way it seems to be pitched as a panacea for storage growth. Not every technology is meant for every application, and sometimes saving money isn't the primary item on the project request form.
