Yottabytes: Storage and Disaster Recovery

Jan 26 2016   12:04PM GMT

User Finds Amazon Glacier an Expensive Roach Motel for Data

Sharon Fisher


People used to joke about the notion of “write-only memory,” where data could be written to it but not retrieved again. To at least one user, that’s what’s happening with Amazon’s Glacier service.

As you may recall, Glacier was announced in August 2012 as low-cost storage for long-term archiving, in return for customers being willing to wait several hours to retrieve their data. That resulted in a storage cost of about $10 per terabyte of data per month.

But a fellow who used the service found out that retrieving his data wasn’t nearly as easy as putting it in there. Marko Karppinen writes on Medium that he used the service to back up some 150 CDs, or 63 gigabytes, soon after the service became available. Recently, he decided to migrate the data.

“The culprit was the same neat freak tendency that had me toss all those CDs in 2012,” Karppinen writes. “I simply no longer wanted to have that 51¢ AWS bill appear, each and every month, in my email inbox and on my AmEx statement. Here in present-day 2016 I’m paying for a one-terabyte Dropbox account and, as a part of my Office 365 subscription, a 1TB OneDrive. Why would I keep a convoluted 51¢-a-month archival setup when I already have all the cloud storage I could need, on two diverse–yet–incredibly–convenient providers?”

But Karppinen found out it wasn’t as easy – or as cheap – as he might have thought. First of all, it was technically complicated to do, with limited tools to support it. Moreover, it is – as advertised – glacial, he writes. “Before you try it, it’s hard to appreciate how difficult it is to work with an API that typically takes four hours to complete a task.”

(Kind of like working with punch cards in the old days, grasshopper.)

Karppinen writes that he ended up spending most of the weekend trying various tactics – with a requisite four-hour wait after each new attempt.

Second, it was expensive. “Here I was, working on a full retrieval of the archive, something that Glacier was explicitly not designed for,” Karppinen writes. “Glacier’s disdain for full retrievals is clearly reflected in its pricing. The service allows you to restore just 5% of your files for free each month. If you want to restore more, you have to pay a data retrieval fee.”

When Karppinen had originally researched the fee, he noted that the description said it “started at” $0.011 per gigabyte, and assumed that that was what he would be charged, for a total of 86 cents. But as it turns out, it ended up costing him more than $150.

“Glacier data retrievals are priced based on the peak hourly retrieval capacity used within a calendar month,” Karppinen explains. “You implicitly and retroactively ‘provision’ this capacity for the entire month by submitting retrieval requests. My single 60GB restore determined my data retrieval capacity, and hence price, for the month of January, with the following logic:

  • 60.8GB retrieved over 4 hours = a peak retrieval rate of 15.2GB per hour
  • 15.2GB/hour at $0.011/GB over the 744 hours in January = $124.40
  • Add 24% VAT for the total of $154.25.
  • Actual data transfer bandwidth is extra.

Had I initiated the retrieval of a 3TB backup this way, the bill would have been $6,138.00 plus tax and AWS data transfer fees.” [All emphasis his.]
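The arithmetic in Karppinen's example can be reproduced with a short sketch. This is only an illustration of the logic as he describes it, not Amazon's actual billing formula: the function name and parameters are made up for this example, the free monthly retrieval allowance and data transfer fees are ignored, and 3TB is assumed to mean 3,000GB.

```python
HOURS_IN_JANUARY = 31 * 24  # 744 hours, the figure used in the quoted example


def peak_rate_retrieval_fee(gb_retrieved, retrieval_hours=4.0,
                            price_per_gb=0.011,
                            hours_in_month=HOURS_IN_JANUARY):
    """Estimate the retrieval fee under the peak-hourly-rate model as quoted.

    Hypothetical helper for illustration: it ignores the free retrieval
    allowance and any data transfer charges.
    """
    # The peak hourly rate is the amount retrieved spread over the job's duration.
    peak_gb_per_hour = gb_retrieved / retrieval_hours
    # That peak rate is then billed as if it applied to every hour of the month.
    return peak_gb_per_hour * price_per_gb * hours_in_month


fee = peak_rate_retrieval_fee(60.8)
print(f"60.8GB restore: ${fee:.2f} before VAT, ${fee * 1.24:.2f} with 24% VAT")
print(f"3TB restore: ${peak_rate_retrieval_fee(3000):,.2f} before tax")
```

Under these assumptions the sketch gives $124.40 before VAT and $154.25 with VAT for the 60.8GB restore, and $6,138.00 for a 3TB restore, matching the figures Karppinen quotes. The flat "$0.011 per gigabyte" reading he started with would have come to well under a dollar for the same data.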

Remember, to add insult to injury, Karppinen still hadn’t gotten his music back – but he did eventually figure out how to do that. And he includes all the gnarly details.

Interestingly, when we wrote about Glacier in 2012, we noted two points:

  • “The service is intended not for the typical consumer, but for people who are already using Amazon’s Web Services (AWS) cloud service. Amazon describes typical use cases as offsite enterprise information archiving for regulatory purposes, archiving large volumes of data such as media or scientific data, digital preservation, or replacement of tape libraries. ‘If you’re not an Iron Mountain customer, this product probably isn’t for you,’ notes one online commenter who claimed to have worked on the product. ‘It wasn’t built to back up your family photos and music collection.'” [Emphasis mine.]
  • “There is also some concern about the cost to retrieve data, particularly because the formula for calculating it is somewhat complicated.”

Not to say “I told you so” or anything, of course. And Karppinen sounds like he’s figured that out already – and has a lesson for all of us as well. “More and more, we expect cloud infrastructure to behave like an utility,” he writes. “And like with utilities, even though we might not always know how the prices are determined, we expect to understand the billing model we are charged under. Armed with that understanding, we can make informed decisions about the level of due diligence appropriate in a specific situation. The danger is when we think we understand a model, but in reality don’t.”

1  Comment on this Post

  • kboland
    Cloud retrieval costs are definitely something to be considered when evaluating the cost of a cloud DR solution.  Here is another interesting analysis http://blogs.unitrends.com/comparing-unitrends-cloud-amazon-aws-cloud-google-cloud-backup/ 