Tape Data Storage archives - Storage Soup


Oct 27 2009   1:50PM GMT

Industry bloggers debate dedupe to tape



Posted by: Beth Pariseau
data deduplication, tape data storage

It just wouldn’t be the storage industry if there weren’t technical debates popping up on a daily basis.

One that caught my eye today is an ongoing conversation between some storage bloggers about data deduplication to tape, and whether or not it’s a crazy idea. Or, more accurately, whether it’s “good crazy” or “bad crazy.”

Backup expert W. Curtis Preston got things started with a blog post written after he visited CommVault’s headquarters in Oceanport, N.J., and discussed CommVault’s data deduplication to tape feature, which was added in Simpana 8. “Dedupe to tape is definitely crazy. But is it crazy good or crazy bad?” Preston wrote.

Everyone (including the CommVault folks) agrees that no one would want to do any significant portion of their restores from deduped tape.  But I also agree that if I typically do all my restores from within the last 30 days, and someone asks me for a 31 day-old file, it’s generally going to be the type of restore where the fact that it might take several minutes to complete is not going to be a huge deal.  (In the case that you did need to do a large restore from a deduped tape set, you could actually bring it back in to disk in its entirety before you initiate the restore.)

Now here’s the business case. Anyone who has done consulting in this business for a while has met the customer where everyone knows that 99% of the restores come from the last 30-60 days — and yet they keep their backups for 1-7 years.  What a waste of resources.  CommVault is saying, “Hey.  If you’re going to do that, at least dedupe the tapes.”  They showed me two business cases from two customers that doing this was saving them over $500K per year in their Iron Mountain bill.

Curtis made some declarative statements in that blog post, and when that happens you can expect someone in the storage blogosphere to write a post in opposition. EMC NetWorker data backup consultant Preston de Guise did the honors this time, with a response titled “Dedupe to tape is ‘crazy bad’ if the architecture is crazy.”

Yes, it’s undoubtedly the case that the CommVault approach will reduce the amount of data stored on tape, which will result in some cost savings. However, penny pinching in backup environments has a tendency to result in recovery impacts – often significant recovery impacts. For example, NetBackup gives “media savings” by not enforcing dependencies. Yes, this can result in saving money here and there on media, but can result in being unable to do complete filesystem recoveries approaching the end of a total retention period, which is plain dumb.

The CommVault approach while saving some money on tape will significantly expand recovery times (or require large cache areas and still take a lot of recovery time). Saving money is good. Wasting a little time during longer-term recoveries is likely to be perceived as being OK – until there’s a pressing need. Wasting a lot of time during longer-term recoveries is rarely going to be perceived as being OK.

An IT admin/blogger writing at Standalone Sysadmin picked up on de Guise’s post and had this to say:

My problem with this is tape failure. If one of the 50 individual backup tapes fails, it’s no problem. Sure, you lose that particular arrangement of the data, but it’s not that big of an issue. Unfortunate, sure, but not tragic. If you lose the 1 tape that contains the deduplicated data, though, then you immediately have a Bad Day(tm).

Essentially, you are betting on one tape not failing over the course of (in the argument of Mr Preston) 7+ years. And if something does happen in that 7 years, whether it’s degaussing, loss, theft, fire, water, or aliens, you don’t lose one backup set. You lose every backup that referenced that set of data.

So I would, if I could afford one, buy a deduplicated storage array in a heartbeat for my backup needs. But I would not trust a deduplicated archival system at all. The odds of loss are too great, and it’s not worth the savings. I’d rather cut the frequency of my backups than save money by making my archives co-dependent.
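The dependency argument in that excerpt can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: the backup names, chunk IDs, tape labels and both helper functions are hypothetical, and it does not describe how Simpana 8 or any other product actually lays data out on tape. It simply compares which backups become unrecoverable when one tape is lost, first when each backup sits on its own tape and then when shared chunks are stored once.

```python
# Toy model (hypothetical names and layout, not any vendor's real format).

def lost_without_dedupe(backup_to_tape, lost_tape):
    """Backups lost when each backup is written in full to its own tape."""
    return sorted(b for b, tape in backup_to_tape.items() if tape == lost_tape)


def lost_with_dedupe(backup_chunks, chunk_to_tape, lost_tape):
    """Backups lost when chunks are stored once and shared across backups.

    A backup is unrecoverable if ANY chunk it references was on the lost tape.
    """
    return sorted(
        name
        for name, chunks in backup_chunks.items()
        if any(chunk_to_tape[chunk] == lost_tape for chunk in chunks)
    )


# Three weekly backups that mostly share the same data (assumed example).
backup_chunks = {
    "week1": {"a", "b"},
    "week2": {"a", "b", "c"},
    "week3": {"a", "c", "d"},
}

# Non-deduplicated: one tape per backup.
backup_to_tape = {"week1": "T1", "week2": "T2", "week3": "T3"}

# Deduplicated: each unique chunk is written once.
chunk_to_tape = {"a": "T1", "b": "T1", "c": "T2", "d": "T2"}

print(lost_without_dedupe(backup_to_tape, "T1"))
print(lost_with_dedupe(backup_chunks, chunk_to_tape, "T1"))
```

In this toy layout, losing tape T1 costs one backup in the non-deduplicated case but every backup in the deduplicated case, because all three reference chunk "a". Whether that risk outweighs the media savings is exactly what the bloggers disagree about.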

Of course, another user we talked to around the launch of Simpana 8 felt differently:

The global deduplication with Simpana 8 also extends to tape, making it the first product of its kind to allow for writes to physical tape libraries without requiring reinflation of deduplicated data. “That’s very appealing,” said Paul Spotts, system engineer for Geisinger Health, a network of hospitals and clinics in central Pennsylvania. “We added a VTL [virtual tape library] because we were running out of capacity in our physical tape libraries, but we lease the VTL, so we’re only allowed to grow so much per quarter.”

What say the rest of you?

May 28 2008   11:47AM GMT

Storage experts pan report on tape archiving TCO



Posted by: Beth Pariseau
tape data storage

The disk vs. tape debate that has been going on for years is heating up again, given technologies like data deduplication that are bringing disk costs into line with tape.

Or, at least, so some people believe.

The Clipper Group released a report today, sponsored by the LTO Program, that compared five-year total cost of ownership (TCO) for data in tiered disk-to-disk-to-tape versus disk-to-disk-to-disk configurations. The conclusion?

“After factoring in acquisition costs of equipment and media, as well as electricity and data center floor space, Clipper found that the total cost of SATA disk archiving solutions were up to 23 times more expensive than tape solutions for archiving. When calculating energy costs for the competing approaches, the costs for disk were up to 290 times that of tape.”

Let’s see. . .sponsored by the LTO trade group. . .conclusion is that tape is superior to disk. In Boston, we would say, “SHOCKA.”

This didn’t get by “Mr. Backup,” Curtis Preston, either, who gave the whitepaper a thorough fisking on his blog today. His point-by-point criticism should be read in its entirety, but he seems primarily outraged by the omission of data deduplication and compression from the equation on the disk side.

How can you release a white paper today that talks about the relative TCO of disk and tape, and not talk about deduplication?  Here’s the really hilarious part: one of the assumptions that the paper makes is both disk and tape solutions will have the first 13 weeks on disk, and the TCO analysis only looks at the additional disk and/or tape needed for long term backup storage.  If you do that AND you include deduplication, dedupe has a major advantage, as the additional storage needed to store the quarterly fulls will be barely incremental.  The only additional storage each quarterly full backup will require is the amount needed to store the unique new blocks in that backup.  So, instead of needing enough disk for 20 full backups, we’ll probably need about 2-20% of that, depending on how much new data is in each full.

TCO also can’t be done so generally, as pricing is all over the board.  I’d say there’s a 1000% difference from the least to the most expensive systems I look at.  That’s why you have to compare the cost of system A to system B to system C, not use numbers like “disk cost $10/GB.” 
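Preston’s back-of-the-envelope point about the quarterly fulls can be worked through in a few lines. This is a rough sketch, not his calculation: the function name and the 10 TB full-backup size are assumptions made up for illustration, while the 20 full backups and the 2% to 20% range of new data come from the excerpt above. It also assumes, as the excerpt does, that a baseline copy already sits on disk from the first 13 weeks.

```python
# Rough arithmetic sketch; full_size_tb is an assumed figure.

def extra_disk_for_fulls(full_size_tb, num_fulls, new_data_fraction):
    """Return (without_dedupe_tb, with_dedupe_tb) for the additional fulls.

    Without dedupe, every full backup is stored in its entirety.
    With dedupe, each full adds only its unique new blocks, modeled here
    as a fixed fraction of the full backup's size.
    """
    without_dedupe = full_size_tb * num_fulls
    with_dedupe = full_size_tb * num_fulls * new_data_fraction
    return without_dedupe, with_dedupe


full_size_tb = 10   # assumption: size of one full backup
num_fulls = 20      # from the excerpt: 20 quarterly fulls over five years

for fraction in (0.02, 0.20):  # from the excerpt: 2% to 20% new data per full
    plain, deduped = extra_disk_for_fulls(full_size_tb, num_fulls, fraction)
    print(f"new data {fraction:.0%}: {plain:.0f} TB without dedupe, "
          f"about {deduped:.0f} TB with dedupe")
```

The fixed fraction is a simplification; in practice the share of new blocks varies from one full backup to the next, which is why Preston gives a range rather than a single number.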

Jon Toigo isn’t exactly impressed, either:

Perhaps the LTO guys thought we needed some handy stats to reference.  I guess the tape industry will be all over this one and referencing the report to bolster their white papers and other leave behinds just as the replace-disk-with-tape have been leveraging the counter white papers from Gartner and Forrester that give stats on tape failures that are bought and paid for by their sponsors.

Neither Preston nor Toigo disagrees with the conclusion that tape has a lower TCO than disk. But for Preston, it’s a matter of how much. “Tape is still winning — by a much smaller margin than it used to — but it’s not 23x or 250x cheaper,” he writes.

For Toigo, the study overlooks what he sees as a bigger issue when it comes to tape adoption:

The problem with tape is that it has become the whipping boy in many IT shops.  Mostly, that’s because it is used incorrectly - LTO should not be applied when 24 X 7 duty cycles are required, for example…Sanity is needed in this discussion… 

Even when analysts agree in general, they argue.


Apr 30 2008   10:23AM GMT

RenewData takes a single swipe at tapes



Posted by: Beth Pariseau
data backup, tape data storage

Even as we continue to debate whether or not tape is dead, indicating at least that its salad days are probably behind it, some of the most interesting innovations in tape technology I’ve seen are happening right now.

For example, there’s Index Engines’ tape indexing and search software. If backup administrators had been able to run a keyword search across dozens of backup tapes to identify which tapes should be restored, and to extract single relevant files from those tapes, we might never have heard of a VTL.

I’d put the latest development from ediscovery services provider RenewData into that category as well. Renew says its tape-processing systems now only need to take a single pass through a given piece of linear media. Renew previously needed two or three passes, requiring its admins to mount tapes in proper order and reassemble data as it was ingested. The single-pass process will reduce the time it takes to find relevant information stored on its clients’ tapes.

The single-pass process is made possible by software that allows the data to be reassembled on the back end. Renew is not selling that software, except as part of the back end of its hosted services. Renew’s VP of marketing Bob Little says the company doesn’t have any plans to offer it as an on-premises product.

But I have to wonder if someone else won’t find a way to develop something similar. I also wonder, if the tape space keeps finding new ways to access data randomly on linear media, whether this disk vs. tape debate could get much more interesting.


Mar 24 2008   2:47PM GMT

Tape is dead, long live tape



Posted by: Beth Pariseau
Data storage management, tape data storage

Ever since I started covering storage, I’ve been hearing the disk vs. tape debate, usually including proclamations that tape is dead or dying.

There are good reasons to make that assertion. Disk-based backup is catching on, particularly among SMBs, and data deduplication is evening out the cost-per-GB numbers between disk and tape for many midrange applications. Disk is preferable to tape in many ways, especially because it allows faster restore times for backup and archival data. Once again, people are starting to ask, what’s the point of using tape? Dell/EqualLogic’s Marc Farley posted a funny video on his blog to illustrate the question on Friday.

I’m not so sure we’ll ever really see the end of tape. When it comes to the high end, there’s simply too much data to keep on spinning disk. The cost of disk is often still higher per GB, depending on the type of disk and the type of application accessing it. And that doesn’t include power and cooling costs.

I’ve also heard lots of good reasons to give up tape. And maybe in certain markets, like SMBs, tape will die — if it hasn’t already. But whenever tape is on the ropes, another trend comes along to boost it back into relevance.  When disk took over backup, the data archiving trend kicked in, and tape’s savings in power and cooling and its shelf life for long-term data preservation came to the fore. Now, as data dedupe has disk systems vendors pitching their products for archive, too, along comes “green IT” to buoy tape.

Now, I’d like to ask the same questions Farley did, because I’m just as curious to know, and because he and I may have different audiences with different opinions. Do you think tape is dead? If not, what do you use it for? Let us know the amount of data you’re managing in your shop as well.