Posted by: Beth Pariseau
data deduplication, tape data storage
It just wouldn’t be the storage industry if there weren’t technical debates popping up on a daily basis.
One that caught my eye today is an ongoing conversation between some storage bloggers about data deduplication to tape, and whether or not it’s a crazy idea. Or, more accurately, whether it’s “good crazy” or “bad crazy.”
Backup expert W. Curtis Preston got things started with a blog written after he visited CommVault’s headquarters in Oceanport, N.J., and discussed the concept of CommVault’s data deduplication to tape feature added in Simpana 8. “Dedupe to tape is definitely crazy. But is it crazy good or crazy bad?” Preston wrote.
Everyone (including the CommVault folks) agrees that no one would want to do any significant portion of their restores from deduped tape. But I also agree that if I typically do all my restores from within the last 30 days, and someone asks me for a 31 day-old file, it’s generally going to be the type of restore where the fact that it might take several minutes to complete is not going to be a huge deal. (In the case that you did need to do a large restore from a deduped tape set, you could actually bring it back in to disk in its entirety before you initiate the restore.)
Now here’s the business case. Anyone who has done consulting in this business for a while has met the customer where everyone knows that 99% of the restores come from the last 30-60 days — and yet they keep their backups for 1-7 years. What a waste of resources. CommVault is saying, “Hey. If you’re going to do that, at least dedupe the tapes.” They showed me two business cases from two customers that doing this was saving them over $500K per year in their Iron Mountain bill.
Curtis made some declarative statements in that blog post, and when that happens you can expect someone in the storage blogosphere to write a post in opposition. EMC Networker data backup consultant Preston de Guise did the honors this time, with a reponse titled “Dedupe to tape is “crazy bad” if the architecture is crazy.”
Yes, it’s undoubtedly the case that the CommVault approach will reduce the amount of data stored on tape, which will result in some cost savings. However, penny pinching in backup environments has a tendency to result in recovery impacts – often significant recovery impacts. For example, NetBackup gives “media savings” by not enforcing dependencies. Yes, this can result in in saving money here and there on media, but can result in being unable to do complete filesystem recoveries approaching the end of a total retention period, which is plain dumb.
The CommVault approach while saving some money on tape will significantly expand recovery times (or require large cache areas and still take a lot of recovery time). Saving money is good. Wasting a little time during longer-term recoveries is likely to be perceived as being OK – until there’s a pressing need. Wasting a lot of time during longer-term recoveries is rarely going to be perceived as being OK.
An IT admin/blogger writing at Standalone Sysadmin picked up on de Guise’s post and had this to say:
My problem with this is tape failure. If one of the 50 individual backup tapes fails, it’s no problem. Sure, you lose that particular arrangement of the data, but it’s not that big of an issue. Unfortunate, sure, but not tragic. If you lose the 1 tape that contains the deduplicated data, though, then you immediately have a Bad Day(tm).
Essentially, you are betting on one tape not failing over the course of (in the argument of Mr Preston) 7+ years. And if something does happen in that 7 years, whether it’s degaussing, loss, theft, fire, water, or aliens, you don’t lose one backup set. You lose every backup that referenced that set of data.
So I would, if I could afford one, buy a deduplicated storage array in a heartbeat for my backup needs. But I would not trust a deduplcated archival system at all. The odds of loss are too great, and it’s not worth the savings. I’d rather cut the frequency of my backups than save money by making my archives co-dependent.
Of course, another user we talked to around the launch of Simpana 8 felt differently:
The global deduplication with Simpana 8 also extends to tape, making it the first product of its kind to allow for writes to physical tape libraries without requiring reinflation of deduplicated data. “That’s very appealing,” said Paul Spotts, system engineer for Geisinger Health, a network of hospitals and clinics in central Pennsylvania. “We added a VTL [virtual tape library] because we were running out of capacity in our physical tape libraries, but we lease the VTL, so we’re only allowed to grow so much per quarter.”
What say the rest of you?