Data de-duplication, the hottest technology in storage
Data de-duplication, with its promise of reducing the storage capacity needed for backup environments by up to 95% (according to Forrester), has become fully mainstream, with more than 84% of Gartner survey respondents currently using it or planning to use it.
Data de-duplication (also called “intelligent compression” or “single-instance storage”) is a specialized data compression technique that reduces storage needs by eliminating redundant data and storing only one unique instance of it.
Unlike standard file compression, data de-duplication takes a very large volume of data, identifies large sections – even entire files – that are identical, and stores only one copy. The standard example is an email system holding 100 instances of the same 1 MB file attachment: with de-duplication, only one instance of the attachment is actually stored, reducing a 100 MB storage demand to just 1 MB. The single stored copy can in turn be compressed with a standard single-file compression technique, providing further storage reduction.
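The email-attachment scenario can be sketched as a minimal single-instance store that keys each stored blob on its content hash, so identical attachments are kept only once. The class and method names below are hypothetical, purely for illustration:

```python
import hashlib


class SingleInstanceStore:
    """Minimal sketch of single-instance (whole-file) de-duplicated storage.

    Content is keyed by its SHA-256 digest, so identical attachments are
    stored only once, however many messages reference them.
    """

    def __init__(self):
        self._blobs = {}  # digest -> content, one copy per unique blob
        self._refs = {}   # digest -> reference count

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:   # store only the first instance
            self._blobs[digest] = data
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest                   # caller keeps this as its reference

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())


# 100 copies of the same 1 MB attachment consume only ~1 MB of storage.
store = SingleInstanceStore()
attachment = b"x" * (1024 * 1024)
refs = [store.put(attachment) for _ in range(100)]
print(store.stored_bytes())  # 1048576 bytes, not 104857600
```

A real system would also persist the blobs and handle reference counting on deletion; the point here is simply that the store grows with unique content, not with the number of copies.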
The Storage Networking Industry Association (SNIA) defines data de-duplication as “The replacement of duplicate data with references to a shared copy in order to save storage space. This may be done at a whole-record level or at a sub-record level.” See http://searchstorage.techtarget.com/definition/data-deduplication for a detailed definition and the techniques used.
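SNIA’s “sub-record level” de-duplication can be illustrated by splitting data into blocks, storing each unique block once, and keeping per-file “recipes” of block references. The sketch below uses fixed-size blocks for simplicity (many real products use variable-size chunking); all names are illustrative:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size chunking for simplicity


def dedupe_blocks(data: bytes, blocks: dict) -> list:
    """Split data into fixed-size blocks, store each unique block once in
    `blocks`, and return the list of digests (the "recipe") for rebuilding."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        blocks.setdefault(digest, block)  # only new unique blocks are stored
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, blocks: dict) -> bytes:
    """Reassemble the original data from its recipe of block references."""
    return b"".join(blocks[d] for d in recipe)


# Two files that share most of their content share most of their blocks.
blocks = {}
file_a = b"A" * 8192 + b"B" * 4096
file_b = b"A" * 8192 + b"C" * 4096   # only the last block differs
recipe_a = dedupe_blocks(file_a, blocks)
recipe_b = dedupe_blocks(file_b, blocks)
print(len(blocks))  # 3 unique blocks stored instead of 6
```

This is why sub-record de-duplication pays off for backups: successive backups of mostly unchanged data contribute very few new blocks.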
With hardware costs continually falling, storage-saving techniques may hardly seem worth getting excited about. Their need and significance, however, can be understood from a Forrester report, which claims that:
- Storage capacity requirements for most organizations double every 12 to 18 months, driven by 30-40% annual data growth, duplicate copies created for recovery, and so on.
- Legal and other trends require that as much backup data as possible be retained on disk (rather than tape) and kept there for as long as possible.
And with backup vendors offering de-duplication technology claiming de-duplication ratios of 20:1 or more (up to 50:1), it is catching the interest of most IT professionals. According to Forrester, data de-duplication, along with cloud storage, has the potential to make disk as cheap as tape.
Forrester’s report dated July 2007 says “It does not expect tape to completely vanish for another five years at least, but we believe that firms will continue to shift more of their backups (as well as their investment) to disk as their first line of protection. Technology such as de-duplication will accelerate this shift”.
Storage and even non-storage vendors certainly seem to be doing all they can to keep the trend going. NetApp with its own NetApp de-duplication feature, EMC with its acquisition of Data Domain for de-dupe, IBM with its patented inline de-duplication technology, Dell with its intention to acquire Ocarina Networks for its content-aware de-dupe and compression technology, and Quest with its announcement that it will acquire BakBone Software and its data de-duplication software are all set to make this the hottest trend.
Data de-duplication provides cost savings directly, by lowering storage space requirements, and indirectly, by reducing power, cooling, and network bandwidth costs; it may also save software license costs. Beyond cost savings, it carries other benefits such as longer retention periods, better recovery time objectives, reduced I/O, and improved availability, making it an option definitely worth exploring (if you have not done so already).