Storage Soup

Sep 30 2008   10:58AM GMT

Let’s manage data, not just storage

Brein Matturro

(Ed. note: This guest blog comes from Siemens Medical Solutions storage administrator Jim Hood, in response to the editorial in the July issue of Storage magazine, "Dedupe and virtualization don't solve the real problem.")

I was happy to see that someone finally acknowledged the root of some of the evils in the storage business. Your editorial, “Dedupe and virtualization don’t solve the real problem,” spoke to the heart of the matter: “The math is easy: More servers mean more apps, and more apps mean more data.” It cannot be spoken any clearer than that. I have been involved with storage for all of my 27 years in IT, from the early ’80s until now, spanning mainframe and open systems, and I have seen the amount of data expand exponentially. I wish my retirement fund had the same growth curve.

In our business, we continue to satisfy our hosted mainframe customers’ needs with relatively small amounts of data (our bread-and-butter apps in z/OS use customized VSAM [Virtual Storage Access Method] files hardly over the “4-gig limit” to provide databases for hospital clinical applications), while similar applications on Windows stretch the imagination – mine at least. As someone who has lived through this transformation and now has to support the backup processes for our open systems business, the amount of data we handle makes my head spin.

It isn’t unusual for us to process 25 TB of backup data every day (because we use Tivoli Storage Manager, this consists of only new or changed files). We have accumulated over 2 PB of capacity in our backup inventory. I don’t see it getting any less, even though we have an active relationship with users and encourage them to look at what they back up and how long they retain the backup data. The volume just keeps growing.

With all the technology at our disposal, the industry does not seem to want to address your basic math problem. I believe we live in an age where both technology and its pricing have brought us to a point where “creating data is cheap” — so cheap that there is no turning back. We seem to have lost the thought processes associated with data management: how many files, file size, other data spawned from these files, where does the data reside, what data should be backed up, etc. 

I’m not sure, going forward, how to make it appear as though storage costs are kept relatively level while at the same time incurring new costs for hardware, software and people to manage this growth. In our environment we pass on expenses by using a chargeback system, but pressure from the user base (application development) to reduce their costs from one fiscal year to the next usually translates to lower chargeback pricing while the real problem – too much data – persists. We can try to dedupe and virtualize our way out of this, but somebody will have to pay for it.

To really address this problem will require, as you stated, “an awful lot of manual work,” but it will be difficult for many organizations to cough up the resource costs to do so. Let’s face it, that grunt work doesn’t generate any new revenue through new products. So again, it becomes a storage management issue rather than a data management solution. 

My view is this: Twenty years ago we had a modest home with a one-car garage (mainframe) to keep all our stuff in. In the last decade we decided we needed more stuff — newer stuff — and moved to a larger house with a two-, heck, three-car garage (Windows). The reality of the economy and housing market is reshaping the world of real estate. I’m not sure what kind of “housing crunch” will be necessary to have us take a different look at how we create data. Getting people to do that would be a good first step in the right direction.  

Finally, on a more humorous note, I think one of the problems is in how we refer to amounts of data. One TB is no big deal, right? How do I sell my problem to those who write the checks when I speak in terms of one or two of something? “So, Jim, you say you can’t manage your 2 PB easily!” or “What is so hard about managing your growth from 1 PB to 2 PB? Come on, you only grew by one!” It is all about perception these days, and by truncating real capacities, we diminish the true state of affairs. Sometimes I try to communicate the reality by simply changing the language: 2,000 TB makes a larger impact than 2 PB. Maybe we all need to begin speaking in larger quantities than single digits.
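The arithmetic behind that reframing is trivial (1 PB = 1,000 TB in decimal units), but a tiny helper makes the point concrete. This is only an illustrative sketch; the function name and the choice of base-10 rather than binary (1,024) units are my own assumptions, not anything from Jim's environment.

```python
def in_terabytes(petabytes: float) -> str:
    """Express a capacity given in petabytes as terabytes,
    using decimal units (1 PB = 1,000 TB) and a thousands
    separator to make the number land harder."""
    return f"{petabytes * 1000:,.0f} TB"

# "You only grew by one" vs. how it reads in TB:
print(in_terabytes(2))              # 2 PB  -> "2,000 TB"
print(in_terabytes(2) , "minus", in_terabytes(1))  # growth of 1 PB is 1,000 TB
```

The same trick scales down, of course: a 25 TB daily backup load reads as 25,000 GB to an audience numbed by terabytes.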

Jim Hood

EHS Storage Management

Siemens Medical Solutions

2 Comments on this Post

  • CT
Just for fun, let us imagine that storage is not there any longer and all read and write activity must be performed with pen and paper by the requestor ...(!) ... Human nature drives us to pursue pleasure and to avoid pain. Consuming storage does not have (today) a good balance of pleasure and pain in our corporate and government culture. Perhaps the amount of data stored could be reduced more by managing people using the “drive” toward pleasure than by attempting to manage storage consumption using technology.
  • Storage4life
From the lack of responses I suspect that this subject has little interest outside of a few storage admins around the world. However, our cumulative data volume will continue to grow since, as CT suggests, there is little "pain" in the creation process. It is the outcome of people's work and/or pleasure, so data creation must continue. The truth is, people don't think about the (storage) consumption side and the additional resources necessary for storage, backup, archive, etc. I do, because my business is managing this. I believe that there is simply this large "assumption" mindset out there that it's all taken care of and doesn't really cost anything (or the cost is built into the process/price somewhere). Until the connection is made (that there is a storage cost to this), there is no motive to reduce the amount of data one generates. Even so, any reduction in one area of data growth would surely be overshadowed by some other net-new application in another area. Maybe someday the collective pain (cost) of many users (individuals, businesses, corporations, or departments within them) will help drive down consumption just enough to keep the overall storage growth curve at some reasonable trend so that all of this somehow works out. Makes me wonder what it will be like 10 years from now. Fifty years? Don't want to think about it. Jim