Using DB2 Hash organized tables for improved performance
In DB2 version 10 for z/OS, IBM introduces a new access type called Hash Access and access method called Hash space. This new option of organizing tables using hash improves performance of queries that access individual rows using equal to predicate (say getting data using customer number or product number).
DB2 uses an internal hash algorithm with the Hash space to reference the location of the data rows. Thus using hash provides the advantage of selecting a hash access path which (in most cases) means only one I/O to retrieve a row from the table in turn reducing the CPU usage and improved response time makes it a very compelling proposition.
Using Hash access also means no need for maintaining the data sequence or clustering the index (for that matter, index clustering is not allowed if the table is hash organized). This results in efficient insert processing and avoids data sharing contention in maintaining a clustering sequence or clustering index.
While creating a table, the hash access can be enabled by adding the organization-clause in the CREATE TABLE statement.
- o ORGANIZE BY HASH UNIQUE (column-names) HASH SPACE (hash-space-value)
“Organize by Hash” specifies that a hash is to be used for organizing the data of the table. The list of column names defines the hash key based on which the placement of a row is determined. Mentioning UNIQUE prevents the table from containing more than one row with the same value of the hash key (this applies even for the NULL value).
The amount of fixed hash space to be pre-allocated (default value is 64M) for the table can be specified. For tables partitioned by range, this space is for each partition. The value specified can be in KB, MB or even GB.
Hash organization is optimal when a table is of a stable or predictable size. When a table is organized by hash, DB2 automatically creates an overflow index. If the table exceeds the specified hash space, the extra rows are placed in the overflow index. The rows in the index are not hash access enabled and DB2 scans through the index to retrieve them.
An existing table can be altered to use hash access organization by specifying the “ADD ORGANIZE BY HASH” in the ALTER TABLE command. The table has to be reorganized and incompatible functions like index clustering would be disabled in the process. In case of existing tables by specifying AUTOESTSPACE(YES), we can let DB2 automatically estimate the best size for the hash space using the real-time statistics.
Using hash, obviously requires additional disk storage. Hash access can be monitored and the storage space used can be tuned for improved performance. The ACCESSTYPE column in the plan table has a value of ‘H’, ‘HN’, or ‘MH’, hash access is used to access the data. HASHACCESS (in SYSTABLESPACESTATS) indicates the number of times that hash access paths have been used to access the table. Using HASHLASTUSED, we can find out if DB2 has used the hash access path recently.
If HASHACCESS value is very low (even after several queries having accessed the table) or if DB2 has not used the hash access path recently – hash organization from the table can probably be removed thereby saving storage space.
The size of the hash space can be altered, if required, using the ALTER TABLE statement. Hash space must be increased:
- If the TOTALENTRIES (in SYSINDEXSPACESTATS) value is greater than 10% of TOTALROWS (in SYSTABLESPACESTATS).
- If the DATASIZE (in SYSTABLESPACESTATS) is greater than HASHSPACE (in SYSTABLESPACE). The HASHSPACE must at least be 20% larger than DATASIZE.
Hash organization is only available on Universal Table Spaces (UTS) that are both segmented and partitioned. Hash access cannot be used in the following cases:
o LOB and XML table spaces
o Tables defined with APPEND YES.
o Materialized Query tables
o Tables in basic row format (for hash, using reordered row format – RRF – is required)
Some of the restrictions in using hash organized tables include:
o Parallelism is not used for parallel groups when hash access is used
o Hash access is not used to access fact or dimension tables in a qualified star join
o For queries that use multi-row fetch, hash access is chosen only if the query contains an IN clause that returns multiple rows.
Hash table organization would be most useful where unique keys are accessed using equal predicates – customer id, product id, document id – and the order of the data is immaterial.