Using DB2 Deep Compression for significant cost savings
DB2 provides the Storage Optimization feature, which enables transparent compression of data on disk, thereby decreasing disk space and storage infrastructure requirements. Although various data compression techniques are available in DB2 today, row compression – also referred to as Deep Compression – delivers the most significant savings.
While IBM cites customers reporting up to 83% reduction, most environments see typical savings in the range of 40 to 70%. Because data is kept compressed both on disk and in memory, memory consumption is also reduced. Compressed data requires fewer I/O operations, which further improves performance. Log records are also compressed.
Before delving into deep compression usage and its advantages, it is important to note that:
- Deep compression incurs an additional license cost: to use it, you must have a license for the DB2 Storage Optimization feature.
- Compression techniques naturally increase processor utilization to handle compression and decompression. In the case of deep compression, CPU utilization is reported to increase by around 3 to 5%.
- Deep compression has been available since DB2 9.1, with improvements and new features added in each later version.
Row compression (or deep compression) compresses data rows by replacing patterns of values that repeat across rows with shorter symbol strings. Row compression uses a variant of the Lempel-Ziv algorithm to compress data by row.
Row compression is enabled by setting the COMPRESS attribute of the table to YES, either when creating the table or by altering an existing table. Once enabled, row compression applies to the entire table, even if it is partitioned. The database manager creates a compression dictionary for each table that is enabled for row compression.
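For example, compression can be enabled either at table creation or on an existing table (the SALES table name and columns here are purely illustrative):

```sql
-- Enable row compression when creating a table
CREATE TABLE sales (
    id      INTEGER NOT NULL,
    region  VARCHAR(30),
    notes   VARCHAR(200)
) COMPRESS YES;

-- Or enable it on an existing table
ALTER TABLE sales COMPRESS YES;
```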
The table is scanned for repetitive and duplicate data, and once these entries are collected, the compression dictionary is built, assigning short numeric keys to those entries. Entire rows – not just certain fields or parts of rows – are examined to identify repeating entries or patterns. Tables containing text data generally compress more effectively than those containing mostly numeric data.
The compression dictionary does not occupy much space; even for extremely large tables, it is expected to occupy only about 100 KB. The dictionary is stored in hidden objects within the database and is cached in memory for quick lookup.
Once compression is enabled, operations that add data to the table, such as INSERT, LOAD INSERT, or IMPORT INSERT operations, use compression. To disable compression for a table, set the COMPRESS attribute to NO in the ALTER TABLE statement; rows added subsequently are not compressed. To compress or decompress the entire table, reorganize the table.
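The disable-then-decompress sequence looks like this (again using a hypothetical SALES table; note that a classic, offline reorg is required to rewrite the existing rows):

```sql
-- Stop compressing newly added rows
ALTER TABLE sales COMPRESS NO;

-- Rebuild the existing rows uncompressed and discard the dictionary
REORG TABLE sales RESETDICTIONARY;
```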
The largest tables are the most obvious candidates for compression, considering the storage savings. However, a large number of small tables in an environment can also make them eligible candidates. Small tables may not be suitable for compression, because the dictionary is stored within the physical table data object itself. DB2 creates the compression dictionary only after the table size reaches a threshold that makes compression worthwhile. Therefore, it may be easier and faster to alter all tables to be compressed and let DB2 decide.
Running the INSPECT command with the ROWCOMPESTIMATE clause helps to estimate the storage savings. Before DB2 9.5, the INSPECT command was also used to create the compression dictionary if compression was enabled for a table and no dictionary existed.
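A typical estimation run from the DB2 command line processor looks like this (the table and result-file names are illustrative):

```sql
-- Estimate how much space row compression would save for the SALES table
INSPECT ROWCOMPESTIMATE TABLE NAME sales RESULTS KEEP sales.insp;
```

The unformatted result file is written to the diagnostic data directory and can be converted to readable text with the db2inspf utility.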
The quality of a compression dictionary depends on the data used to create it. Theoretically, optimal compression dictionaries – and hence compression ratios – are achieved when the dictionary is built from an inclusive sample set. Table reorganization (and INSPECT) builds a dictionary based on all table data and thus produces the most optimal dictionary.
Test results demonstrated that compression rates achieved by building a dictionary on a 10% sample of the entire data set were almost as good as building the dictionary using the data from the entire table. Creating the dictionary based on 10% of the data saved over 50% of the total time required to load and reorganize the tables.
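One way to exploit this finding is to build the dictionary from a representative sample before loading the bulk of the data. A rough sketch of that workflow, assuming the input has been split into hypothetical sample.del and remainder.del delimited files:

```sql
-- 1. Load roughly a 10% sample into the empty, compression-enabled table
LOAD FROM sample.del OF DEL INSERT INTO sales;

-- 2. Build the compression dictionary from that sample
REORG TABLE sales RESETDICTIONARY;

-- 3. Load the remaining data; rows are compressed with the existing dictionary
LOAD FROM remainder.del OF DEL INSERT INTO sales;
```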
Considering this factor, DB2 Version 9.5 introduced the automatic dictionary creation (ADC) feature, with which the compression dictionary for a compression-enabled table is created automatically once a sufficient amount of data (approximately 700 KB) is stored in the table.
The KEEPDICTIONARY keyword (the default) in the REORG TABLE command reuses the existing compression dictionary while reorganizing the data in the table. The RESETDICTIONARY keyword replaces the compression dictionary, building a new one that considers all the existing data.
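The two variants side by side (table name illustrative):

```sql
-- Reorganize the table, reusing the existing dictionary (default behavior)
REORG TABLE sales KEEPDICTIONARY;

-- Discard the old dictionary and build a new one from all current data
REORG TABLE sales RESETDICTIONARY;
```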
When the DB2 Storage Optimization feature is enabled, compression for temporary tables is enabled automatically and cannot be disabled. The DB2 optimizer weighs the storage savings against the impact on query performance to determine whether to use compression.
Useful rules of thumb regarding the applicability of compression are:
- Compression is suitable for tables larger than 1 MB.
- The threshold for creating the compression dictionary is a table size of around 2 MB.
- Read-only tables are ideal candidates for compression.
- Tables with a read/write ratio of 70/30 or higher are excellent candidates for compression.
- Tables with heavy updates may not be suitable candidates for compression.
- For compressed tables with high update activity, increase the percentage of each page to be left as free space after reorganization.
- Compression works best when I/O accesses are sequential. BI and other decision-support systems that rely on complex queries involving large volumes of sequential access are ideal candidates to benefit from compression.
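For the update-heavy case in the list above, the free-space percentage is controlled with the PCTFREE table attribute, which takes effect at the next reorganization (the 20% figure and table name here are illustrative):

```sql
-- Leave 20% of each page free after the next reorg to absorb row growth
ALTER TABLE sales PCTFREE 20;
REORG TABLE sales;
```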
DB2 9.7 provides the Reclaimable Storage feature, with which the space freed by compressing an existing table can be returned to the system.
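After compressing and reorganizing tables, the freed extents can be handed back at the table space level. A minimal sketch, assuming a hypothetical table space named TBSP_SALES that was created with DB2 9.7 or later (reclaimable storage does not apply to table spaces created with earlier releases):

```sql
-- Return all unused extents in the table space to the operating system
ALTER TABLESPACE tbsp_sales REDUCE MAX;
```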
When you are considering running DB2 in the cloud, definitely explore deep compression: it can mean significant savings in your cash outflow, as costs are reduced both for storage usage and for moving the data to the cloud.