DataCenter archives - Taylor's Take on Storage

Dec 21 2008   5:14PM GMT

This Week in Storage (12-19-08)



Posted by: Taylor Allis
Storage, DataCenter, DataManagement, srm, Storage Vendors, HDS, storagemanagement

Jingle bell storage: What to buy a geek
A great list for the geek in your life. Best gift idea? Kiva.org! I’m the geek Beth mentions in the last paragraph. My buddy who gave me the gift? CBS News’ Hari Sreenivasan.

Economic Downturn Hits Storage Spending
OK – so I think we all know that our IT budgets will be flat or cut next year. We also know that data will continue to grow (and grow fast) regardless of what we do with our budgets. So what to do? Optimize, optimize, optimize. If you haven’t run a formal efficiency and optimization program in your storage infrastructure environment, then you are overlooking a huge chunk of wasted capacity and space. Storage utilization and allocation rates are far worse than most vendors are telling us (I know because I came from one!). It’s not in their best interest to sell less storage, but it is in yours to buy less…

The state of data backup in 2009, Part 3
Good reading in Beth’s Data Backup series – this issue covers disaster recovery and the emergence of Cloud backup. Amazon was first to market with S3, EMC came out with Mozy, and Beth writes about Symantec’s offering…

Symantec adds change management to SRM
Symantec mentions making agentless options available too. When I work with IT admins this is at the top of their list – they don’t like agents crawling all over their environment. I understand – but agents will still be around, though perhaps minimized. ESG says that SRM tools aren’t viewed as a must-have – this is unfortunate. There is HUGE value in them – but you need to know (or have a partner that knows) how to deploy them, interpret them, and ACT upon the data. If you don’t take this step, you just bought shelf-ware. Do this, and you can free between 30% and 70% of your capacity – I’ve seen this done with multiple infrastructures and count THAT as a must-have.

Brocade Buys Foundry Networks for $3B
Brocade drops $3B to pick up Foundry – a good move on the surface. This will make them more competitive with Cisco, since they can now offer LAN and WAN equipment. 10GbE still looks to be a great bet, and networking companies are investing in it. See Stephen Foskett’s blog.

HDS embraces SSDs
HDS is a little late to the party (EMC led the charge, followed by Sun and others). But USP is a great disk system, and SSD will make it better. STEC is making out as the SSD partner to have. Again – if you have any apps that live or die by latency times, you need to be researching SSD options.

“Despereaux” uses clustered storage
For you HPC junkies, the movie “Despereaux” had to chunk through 1,700 shots and 90 million images. They did it the way most do: Linux clusters running an HPC filesystem – in this case, Lustre. They stored 200TB of generated data on Infortrend’s EonStor RAID system (comment if you know anything about EonStor – I don’t). I’ll probably take my boy to the movie and bore him with the details of how it was made…

Dec 12 2008   7:37AM GMT

De-Dup Primary Storage?



Posted by: Taylor Allis
Storage, DataCenter, DataManagement, dedup

I was recently talking about a Storage Magazine article, Dedupe moves beyond backup.

The conversation led me to look back on some of my past analysis around de-dup. I ended up looking 5 years into the past.

Global Compression at StorageTek

AdTek

At StorageTek there used to be an engineering research and IP department called Advanced Technology Research or “AdTek.” My current business partner and boss, Randy Chalfant, used to run it. A brilliant engineer by the name of Chuck Milligan ran the group after Randy – Chuck is the one who hired me at StorageTek. I eventually ended up heading the department.

I was looking at an old list of Research Probes we were recommending to STK execs for productization – there were 11 cases we presented in 2003 (Grid Storage, Flash/SSD, Encryption, etc.) On the list was “Global Compression.” In our pitch to management, we stated that this yielded extremely high compression ratios and had the potential to disrupt tape. We recommended adding it as a feature to the backup disk products STK was looking to bring to market – we even recommended some companies to evaluate for investment. (Unfortunately, some other probes were picked for further research that year!)

Fast forward some years and my strategy team and I found ourselves briefing Sun executives (after the STK acquisition) on the future of de-duplication as it has come to be known. I remember saying two things:

1. De-duplication has officially moved from cutting edge to a must-have for disk backup, VTL, and secondary storage

2. Dedup will move from secondary storage to primary storage in the future (we backed up our claims with an excellent 451 Group report on the subject)

Dedup in Primary vs. Secondary Storage

Now we have dedup in primary storage. However, some think primary storage is not always the best place for dedup. The thinking is that de-dup works where there is a lot of…duplication. Primary storage tends to hold more transactional data, while secondary storage has more duplicate data. While this is true, there is more duplicate data on primary storage than users know.
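
To make the "dedup works where there is duplication" point concrete, here is a rough sketch of how a dedup ratio could be estimated with fixed-size block hashing. This is only an illustration: the function name `dedup_ratio` and the 4 KB block size are my own assumptions, and real products typically use more sophisticated (often variable-size) chunking.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Estimate a dedup ratio as (logical blocks) / (unique blocks).

    Hypothetical helper for illustration: splits the data into
    fixed-size blocks and counts how many distinct SHA-256 digests
    appear. A ratio of 1.0 means there is nothing to deduplicate.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    total_blocks = 0
    unique_digests = set()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique_digests.add(hashlib.sha256(block).digest())
        total_blocks += 1
    # Empty input has no blocks, so report "no savings" rather than divide by zero.
    return total_blocks / len(unique_digests) if unique_digests else 1.0


# Three identical blocks plus one different block: 4 logical, 2 unique.
copy_heavy = b"A" * 4096 * 3 + b"B" * 4096
print(dedup_ratio(copy_heavy))
```

Data with many repeated blocks (copies of copies) gives a ratio well above 1, while data where every block differs stays at 1.0, which is the intuition behind the primary-versus-secondary debate.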

I have moved from simply recommending storage strategies to actually implementing them in my new venture (which is much more fun!) Dedup is one of the steps we use with clients to get to a more efficient and optimized storage infrastructure.

We help storage users identify all of the inert data sitting on their primary storage – data that has not been referenced in more than six months. Users are almost always surprised by how much we find – around 40% on average.
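
As a rough illustration of the idea (not the tooling we actually use with clients), a scan for inert data on a file system could look something like the sketch below. The function name `inert_report` and the 182-day threshold are assumptions made for the example, and it relies on file access and modification timestamps, which may be unreliable on volumes mounted with access-time updates disabled.

```python
import os
import time

# Assumed threshold for "not referenced in more than six months".
SIX_MONTHS_SECONDS = 182 * 24 * 60 * 60


def inert_report(root, max_age_seconds=SIX_MONTHS_SECONDS, now=None):
    """Walk `root` and report files not touched within the threshold.

    Hypothetical helper: a file counts as inert when both its last
    access time and last modification time are older than the cutoff.
    Returns (inert_paths, inert_bytes, inert_fraction_of_total_bytes).
    """
    now = time.time() if now is None else now
    total_bytes = 0
    inert_bytes = 0
    inert_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += st.st_size
            last_touched = max(st.st_atime, st.st_mtime)
            if now - last_touched > max_age_seconds:
                inert_bytes += st.st_size
                inert_paths.append(path)
    fraction = inert_bytes / total_bytes if total_bytes else 0.0
    return inert_paths, inert_bytes, fraction
```

Calling `inert_report("/some/volume")` (the path is a placeholder) returns the list of stale files, how many bytes they occupy, and what fraction of the scanned capacity that represents.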

The next question is what to do with this data – it needs to be cleaned up or moved in order to return that 40% to free pool capacity.

One cleanup step is dedup – and in some instances a significant amount can be deduplicated. What are duplicates doing on primary storage? A lot of data management practices (or lack thereof) lead to this.

One example: In many cases application engineers will be testing new applications or updates. They need to run tests on real data – but obviously can’t run them on live, production data. So, they make a snap copy of the production data and run the tests against this data set. If they want to run another test, they’ll make another copy and so on. Do they remember to go back into the system and clean up their copies? Most often the answer is no – and this simple process (which is one of many) robs a primary disk system of its precious capacity.
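
Forgotten test copies like these can often be spotted with a simple file-level duplicate scan. The sketch below is a minimal example under my own assumptions (the names `file_digest` and `find_duplicate_files` are hypothetical, and it only catches byte-identical files, not copies that have since been modified); array-level dedup works on blocks rather than whole files.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Group byte-identical files under `root`.

    Returns (groups, reclaimable_bytes), where each group is a sorted
    list of paths with identical content and reclaimable_bytes assumes
    one copy per group is kept.
    """
    # Cheap first pass: only files of the same size can be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    reclaimable = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        # Second pass: confirm candidates by content hash.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same))
                reclaimable += size * (len(same) - 1)
    return groups, reclaimable
```

Grouping by size before hashing keeps the scan cheap, since most files can be ruled out without reading their contents.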

So, deduplication can have a significant impact on primary storage in addition to secondary storage. But like any storage technology, the way in which it is implemented is the critical part of the equation.