Posted by: Randy Kerns
data deduplication, data reduction, Storage
The premise of doing data reduction of stored information is that more data can be put in the available physical space. Storing more data in a fixed amount of space drives down the price of storing data and gives added benefits of reducing the footprint, power consumption, and cooling required.
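The arithmetic behind that premise can be sketched in a few lines. This is a rough illustration only: the function names, the price, and the 4:1 ratio are hypothetical, and the sketch ignores any cost of the processing or hardware that performs the reduction.

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per logical GB stored, given a reduction ratio such as 4.0 for 4:1."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


def effective_capacity_gb(physical_gb, reduction_ratio):
    """Logical data that fits in a fixed physical space at the given ratio."""
    return physical_gb * reduction_ratio


# Hypothetical numbers: $2.00 per physical GB and a 4:1 reduction ratio.
print(effective_cost_per_gb(2.00, 4.0))   # 0.5
print(effective_capacity_gb(1000, 4.0))   # 4000.0
```

The same ratio that lowers the price per stored gigabyte also reduces the number of devices, and with them the footprint, power, and cooling, needed for a given amount of data.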
Performance requirements for data reduction vary depending on the type of data. If the data needs to be accessed frequently or in a time-critical manner, reducing the data and expanding it on access must have no measurable impact on performance. The performance demand relaxes as the data becomes less important or is accessed less often.
Performance impact is crucial when using data reduction with solid-state technology. Solid-state storage, implemented in NAND flash today, is used in performance-demanding environments. Response time is the most critical element in accelerating performance.
Data reduction is accomplished through deduplication and compression. Deduplication is most effective where there is repetitive data, such as with successive backups. Its effectiveness diminishes as the data becomes less repetitive. Compression uses an algorithm to shrink the representation of strings in the data as it is parsed. The effectiveness of compression varies with the type of data and how compressible it is, but it is relatively consistent for a given type and has predictable averages.
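A minimal sketch of the two techniques working together, assuming fixed-size chunks, SHA-256 fingerprints, and zlib compression. These choices, along with every name in the code, are illustrative assumptions and not a description of how any particular product works.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def dedupe_and_compress(streams, chunk_size=CHUNK_SIZE):
    """Store each unique chunk once, compressed; return (store, recipes)."""
    store = {}    # fingerprint -> compressed chunk
    recipes = []  # per stream: ordered list of fingerprints
    for data in streams:
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in store:
                # First time this chunk is seen: compress and keep it.
                store[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild a stream by expanding its chunks in order."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


def reduction_ratio(streams, store):
    """Logical bytes written divided by physical bytes kept."""
    logical = sum(len(s) for s in streams)
    physical = sum(len(c) for c in store.values())
    return logical / physical


# Two successive "backups" that differ only by one appended record.
backup1 = b"".join(str(i).encode() + b" record\n" for i in range(20000))
backup2 = backup1 + b"new record\n"
store, recipes = dedupe_and_compress([backup1, backup2])
print(len(store), "unique chunks for", sum(len(r) for r in recipes), "written")
print("reduction ratio:", round(reduction_ratio([backup1, backup2], store), 1))
```

In this toy case the second backup adds only one new chunk, which is the successive-backup behavior described above. Fixed-size chunking is the simplest option to show; it loses most of its benefit if data is inserted near the start of a stream and shifts every later chunk.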
There are arguments for using either dedupe or compression, but many of the arguments are parochial. For primary data, compression in a storage system has proven effective for a long time, going back to the StorageTek Iceberg/IBM RVA virtual disk products from the 1990s.
There are several ways to reduce data on NAND flash. One method relies on standard solid-state drives (SSDs) packaged to replace hard disk drives (HDDs), attaching and transferring data using disk drive protocols. These standard devices have an internal flash controller and flash memory chips, along with the protocol interfaces to mimic a disk drive. With these drives, data reduction is added outside the SSD, in what we would call the storage controller. The storage controller implements it using its internal processor or custom hardware. In this case, data reduction uses controller resources and may have a noticeable performance impact.
A performance impact is less likely if the reduction is done inline, while the data is being written. Other implementations store the data first and reduce it later (called post-storage data reduction, or sometimes post-processing data reduction). Post-storage reduction consumes resources, which may or may not affect performance, and response time may be delayed while the data is expanded before access.
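The difference between the two approaches can be sketched with a toy block store. The class and method names are hypothetical, zlib stands in for whatever reduction a real system uses, and the sketch shows only where the reduction step sits, not how it performs.

```python
import zlib


class ToyVolume:
    """Hypothetical block store; keys are block addresses."""

    def __init__(self):
        self.blocks = {}  # address -> (is_reduced, payload)

    def write_inline(self, address, data):
        # Inline: reduce in the write path, before the data lands.
        self.blocks[address] = (True, zlib.compress(data))

    def write_deferred(self, address, data):
        # Post-storage: land the data as-is and reduce it later.
        self.blocks[address] = (False, data)

    def post_process(self):
        # Background pass that reduces blocks stored unreduced.
        for address, (is_reduced, payload) in list(self.blocks.items()):
            if not is_reduced:
                self.blocks[address] = (True, zlib.compress(payload))

    def read(self, address):
        # Reduced blocks must be expanded before they are returned.
        is_reduced, payload = self.blocks[address]
        return zlib.decompress(payload) if is_reduced else payload

    def used_bytes(self):
        return sum(len(payload) for _, payload in self.blocks.values())


volume = ToyVolume()
data = b"abc" * 1000
volume.write_deferred(0, data)
print(volume.used_bytes())          # 3000, nothing reduced yet
volume.post_process()
print(volume.used_bytes() < 3000)   # True once the background pass has run
```

With the deferred path, the data occupies its full size until the background pass runs, and that pass competes for the same resources that serve reads and writes.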
Other designs using flash storage have custom flash controllers with flash memory. These are unique designs for the different storage system implementations. Often, shadow RAM is used in these designs to optimize page updating. A processor element is included to control the algorithms for flash usage. Data reduction in the flash controller is transparent to the storage controller that manages the access to the storage. The flash controller is expected to do the data reduction without impacting performance.
Over time, data reduction will become an important competitive feature for solid-state storage, and designs and capabilities will continue to advance. This does not mean that compressing data elsewhere will not be useful. There is value in compressing data on HDDs and for transferring data, especially to remote sites. The important thing to understand is that reducing data stored in solid-state technology is an evolutionary development with compelling value, and it will result in competing vendor implementations.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm.)