Posted by: Randy Kerns
Compression of data on primary storage has taken center stage in the storage wars with IBM’s release of Real-Time Compression for the Storwize V7000 and the SAN Volume Controller (SVC).
IBM’s is not the first product to offer data reduction for primary storage, but IBM raised the bar by compressing inline (in real time) without, by its account, any performance impact. Other open systems storage solutions mostly compress data, and sometimes deduplicate it, as a post-processing task after the data has been written.
Competition for storage business is intense, and inline compression of primary storage data will be a major competitive area because of the economic value it brings customers. If compression effectively reduces the amount of data stored, the reduction ratio acts as a multiplier on the capacity purchased.
IBM claims a 5x capacity improvement, which would give customers five times the capacity they pay for. Even if IBM’s compression achieves only 2x, that would still be a significant saving, despite the additional license fee for the feature.
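To make the multiplier argument concrete, here is a minimal sketch of the arithmetic. The prices, the 100 TB purchase, and the license fee are hypothetical figures chosen for illustration, not IBM pricing; the point is only how the compression ratio divides the cost per effective terabyte even after a license fee is added.

```python
def effective_capacity_tb(purchased_tb: float, ratio: float) -> float:
    """Logical capacity available when data compresses at ratio:1."""
    return purchased_tb * ratio


def cost_per_effective_tb(purchased_tb: float, price_per_tb: float,
                          ratio: float, license_fee: float) -> float:
    """Total spend (hardware plus a hypothetical compression license) per effective TB."""
    total_cost = purchased_tb * price_per_tb + license_fee
    return total_cost / effective_capacity_tb(purchased_tb, ratio)


# Hypothetical figures: 100 TB at $1,000/TB, $20,000 compression license.
baseline = cost_per_effective_tb(100, 1000, 1.0, 0)  # no compression, no license
for ratio in (2.0, 5.0):
    eff = effective_capacity_tb(100, ratio)
    cost = cost_per_effective_tb(100, 1000, ratio, 20000)
    print(f"{ratio:g}x: {eff:.0f} TB effective, ${cost:.0f} per effective TB")
```

With these assumed numbers, even the conservative 2x case cuts the cost per effective terabyte from $1,000 to $600 after paying for the license.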
Compressing inside the storage system with no performance impact makes compression transparent to the application and the server operating system. Customers get the added capacity without having to make accommodations such as installing another driver or a different version of an application. The effective compression ratio varies with data type, but data compression has a long history, and the ratios achievable for different data types are well understood. Vendors usually publish an expected average and sometimes offer a guarantee with the purchase.
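A quick way to see why the ratio depends on data type is to compress two very different samples. This sketch uses DEFLATE through Python’s standard `zlib` module, which is an assumption for illustration and not the algorithm in any particular storage product; the sample data is made up.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, level))


samples = {
    # Highly repetitive text, typical of logs and many database pages.
    "log text": b"2011-06-07 INFO request served in 12 ms\n" * 2000,
    # Random bytes behave like already-compressed or encrypted data.
    "random bytes": os.urandom(80_000),
}

for name, data in samples.items():
    print(f"{name}: {compression_ratio(data):.2f}x")
```

The repetitive text compresses by a large factor, while the random bytes come out at roughly 1x (slightly worse, because of format overhead), which is why vendors quote averages rather than a single guaranteed ratio for every workload.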
In the mainframe world, real-time compression goes back to the StorageTek Iceberg (later also offered as the IBM RAMAC Virtual Array), which compressed mainframe count-key-data (CKD) in the 1990s. That system compressed data at the channel interface and then stored the compressed data on disk.
Its log-structured file system and the intelligence in the embedded storage software let the system manage the variable size of compressed data (compression was done per track) and removed the direct mapping of data to a fixed physical location. It was an effective compression implementation and demonstrated that compression multiplies the actual capacity.
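The reason a log-structured layout suits compression is that compressed tracks vary in size, so they cannot simply be overwritten in place. The toy store below is a minimal sketch of that idea, not the Iceberg’s actual design: the class and method names are hypothetical, the track map lives in memory, and reclaiming the space held by stale copies (garbage collection) is omitted.

```python
import zlib


class LogStructuredStore:
    """Toy store: compressed tracks are appended to a log, never rewritten in place."""

    def __init__(self) -> None:
        self.log = bytearray()  # stands in for physical disk space
        # Logical track id -> (offset, compressed length) in the log.
        self.track_map: dict[int, tuple[int, int]] = {}

    def write_track(self, track_id: int, data: bytes) -> None:
        compressed = zlib.compress(data)
        offset = len(self.log)
        self.log += compressed  # variable-length append
        # Remap the track; any previous copy becomes stale space in the log.
        self.track_map[track_id] = (offset, len(compressed))

    def read_track(self, track_id: int) -> bytes:
        offset, length = self.track_map[track_id]
        return zlib.decompress(bytes(self.log[offset:offset + length]))

    def live_bytes(self) -> int:
        """Bytes in the log still referenced by the track map."""
        return sum(length for _, length in self.track_map.values())


store = LogStructuredStore()
store.write_track(0, b"A" * 50_000)
store.write_track(1, bytes(range(256)) * 100)
store.write_track(0, b"B" * 50_000)  # rewrite: appended, mapping updated
print(len(store.log), store.live_bytes())  # log holds more than the live data
```

Because the map, not the physical position, locates each track, a track that compresses to 40 bytes and one that compresses to 4,000 bytes are handled the same way.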
One of the more significant aspects of compressing data at the interface was its effect on the rest of the system’s resources. With data reduced by something like 5x or 6x, the other resources in the system benefited as well:
• Cache capacity was effectively multiplied by the same factor, so more data could be resident in cache, giving higher read hit ratios and more opportunity for write coalescing.
• The effective data transfer bandwidth of the device interface was multiplied as well, so data moved much faster from the disk drive buffers.
• The disk drives, besides storing more data, also transferred more data per unit of time to the disk buffers and the controller.
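The cache effect in the first bullet can be simulated with an LRU cache whose budget is measured in bytes. This is a minimal sketch under stated assumptions: the block contents, the 100-block working set, the 20-block budget, and the uniform random workload are all hypothetical, the synthetic blocks compress far better than 5x, and decompression cost is not modeled.

```python
import random
import zlib
from collections import OrderedDict
from typing import Callable


class ByteBudgetLRUCache:
    """LRU cache whose capacity is a byte budget rather than an entry count."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key: int, load: Callable[[int], bytes]) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        self.used += len(value)
        while self.used > self.capacity:  # evict least recently used entries
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return value

    def hit_ratio(self) -> float:
        return self.hits / (self.hits + self.misses)


BLOCK_COUNT = 100


def raw_block(i: int) -> bytes:
    return (f"block {i:04d} " * 512).encode()  # 5,632 bytes per block


def compressed_block(i: int) -> bytes:
    return zlib.compress(raw_block(i))


budget = 20 * len(raw_block(0))  # room for 20 uncompressed blocks
rng = random.Random(42)
workload = [rng.randrange(BLOCK_COUNT) for _ in range(10_000)]

plain = ByteBudgetLRUCache(budget)
packed = ByteBudgetLRUCache(budget)
for key in workload:
    plain.access(key, raw_block)
    packed.access(key, compressed_block)

print(f"uncompressed cache hit ratio: {plain.hit_ratio():.2f}")
print(f"compressed cache hit ratio:   {packed.hit_ratio():.2f}")
```

The same byte budget holds only a fifth of the working set uncompressed but all of it compressed, so the read hit ratio rises sharply; the same reasoning applies to bytes moved over the device interface.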
The benefits gained by the StorageTek implementation can also be achieved in new primary storage systems for open systems.
In the StorageTek system, compression was a hardware-intensive implementation on the channel interface card. In IBM’s Storwize V7000 and SVC, the implementation is done in software, capitalizing on the multi-core processors in the storage systems, and faster processors with more cores in succeeding generations should bring further improvement. Because data stays compressed in cache, on the device-level interface, and on the devices themselves, the performance gains there offset the time spent in the compression algorithm.
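One common way software compression takes advantage of multiple cores is to split data into fixed-size chunks and compress each chunk independently. The sketch below assumes a hypothetical 32 KiB chunk and uses Python’s `zlib` with a thread pool; in CPython, `zlib.compress` releases the GIL while compressing, so the threads can run on separate cores. It illustrates the general technique, not how IBM’s implementation works.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024  # hypothetical compression unit


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress chunks independently so they can be processed on separate cores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so chunk i stays at position i.
        return list(pool.map(zlib.compress, split_chunks(data)))


def decompress_chunks(chunks: list[bytes]) -> bytes:
    return b"".join(zlib.decompress(c) for c in chunks)


payload = b"customer record 0001;" * 50_000
chunks = compress_parallel(payload)
print(len(payload), sum(len(c) for c in chunks))
```

Compressing chunks independently costs a little ratio, since each chunk starts with an empty history, but it lets more cores share the work and lets a read decompress only the chunk it needs.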
Transparent compression could also be done elsewhere. Compressing data in the device itself, for example in the controller of a solid-state drive, is another option.
Customers benefit from reducing the amount of data actually stored, through inline compression that is transparent to operations. The benefits are economic, and this will be a competitive area for vendors.
Expect a considerable number of competing claims about implementations until this becomes a standard capability in storage systems from most vendors, along with a rush to bring competitive solutions to market.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm.)