Posted by: Randy Kerns
erasure codes, forward error correction, information dispersal algorithms, RAID
Recent developments point to a change in how we protect against the loss of a data element on a failed disk. RAID is the venerable method for guarding against damage from a lost disk, but it has limitations, especially with large-capacity drives that can hold terabytes of data. New developments address RAID’s limitations and provide advantages that are not specific to disk drives.
The new protection technology goes by several names. The name most associated with university research is information dispersal algorithms, or IDA. Probably the more accurate term for the technology as it has been implemented is forward error correction, or FEC. Another name, based on implementation details, is erasure codes.
The technology can address the loss of a disk drive, the failure that RAID was designed to protect against. It can also prevent the loss of a data element when data is distributed across geographically dispersed systems. The following diagram gives an overview of the protection coverage for data elements. The implementation lets you select the amount of protection applied to the data. A commonly used example is a protection setting of 12 of 16, which means only 12 of the 16 data elements are needed to recreate the data, so up to four elements (for example, four failed disk drives) can be lost.
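To make the 12 of 16 arithmetic concrete, here is a minimal sketch in Python. The function name `protection_profile` and its field names are made up for illustration and do not come from any vendor's product. The sketch assumes only that any k of the n stored elements are enough to rebuild the data.

```python
def protection_profile(k, n):
    """Summarize a k-of-n protection setting (hypothetical helper).

    Assumes any k of the n stored elements are enough to rebuild the data.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return {
        "tolerated_losses": n - k,   # elements that can fail without data loss
        "usable_fraction": k / n,    # share of raw capacity holding user data
        "raw_per_usable": n / k,     # raw capacity written per unit of user data
    }


profile = protection_profile(12, 16)
print(profile["tolerated_losses"])  # 4
print(profile["usable_fraction"])   # 0.75
```

For 12 of 16, four elements can be lost and three quarters of the raw capacity holds user data. By comparison, a two-way mirror behaves like a 1 of 2 setting: it tolerates one loss and leaves half the raw capacity usable.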
Vendors with products that use FEC/erasure codes include Amplidata, Cleversafe, and EMC (Isilon and Atmos). Each uses a slightly different implementation, but they are all a form of dispersal and error correction.
The main reason to use erasure codes is protection from multiple failures. This means multiple drives in a disk storage system could fail before data loss would occur. If data is stored at different geographic locations, several locations can be unavailable and you still do not lose data. This makes erasure codes a good fit for cloud storage.
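How data survives multiple failures can be illustrated with a toy erasure code. The sketch below is not how any of the products mentioned above implement it. It is a small Reed-Solomon-style polynomial code over the integers modulo 257, chosen for readability rather than speed. The names `encode`, `decode`, and `_interpolate` are illustrative, and the code assumes Python 3.8 or later (for the modular inverse via `pow`) and no more than 257 fragments.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P (Lagrange)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Turn k data symbols into n fragments of the form (index, value).

    The first k fragments are the data itself; the remaining n - k are
    extra evaluations of the polynomial that passes through the data.
    """
    k = len(data)
    points = list(enumerate(data))
    return [(x, data[x] if x < k else _interpolate(points, x)) for x in range(n)]


def decode(fragments, k):
    """Rebuild the k data symbols from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("not enough fragments to rebuild the data")
    points = fragments[:k]
    return [_interpolate(points, x) for x in range(k)]


data = list(b"erasure code")           # 12 data symbols
fragments = encode(data, 16)           # a 12 of 16 setting
lost = {0, 5, 9, 14}                   # four fragments become unavailable
survivors = [f for f in fragments if f[0] not in lost]
print(bytes(decode(survivors, 12)))    # b'erasure code'
```

Each fragment could live on a different drive or at a different site. Which four are lost does not matter, because any 12 points pin down the same polynomial.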
Other advantages include shorter rebuild times after a data element fails and less performance impact during a rebuild. A disadvantage of erasure codes is that they can add latency and require more compute power when making small writes.
One of the potentially most valuable benefits of using erasure codes is the reduction in service costs for disk storage systems. With a protection ratio that has a long-term coverage probability (meaning enough failures to cause data loss are unlikely to accumulate for a long period of time), a storage system may not require a failed device to be replaced over its economic lifespan. This would reduce the service cost. For a vendor, this reduces the amount of warranty reserve.
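Whether a system can go its whole economic life without a device replacement comes down to a probability estimate. Here is a rough sketch of that estimate, under some loud assumptions: devices fail independently, each has the same chance `p_fail` of failing during the period (a hypothetical input, not a measured figure), and nothing is replaced or rebuilt along the way. The function name `loss_probability` is illustrative.

```python
from math import comb


def loss_probability(k, n, p_fail):
    """Chance that more than n - k of n devices fail during the period.

    Simple binomial model: independent failures, identical p_fail per
    device, and no replacement or rebuild (all assumptions of this sketch).
    """
    return sum(
        comb(n, f) * p_fail**f * (1 - p_fail) ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


# Compare two settings at the same hypothetical per-device failure chance.
p = 0.05
print(loss_probability(12, 16, p))  # data is lost only after a fifth failure
print(loss_probability(15, 16, p))  # data is lost after a second failure
```

Real drives do not fail independently, so a model like this is a starting point for comparing protection ratios rather than a prediction.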
This form of data protection is not prevalent today and it will take time before a large number of vendors offer it. There are good reasons for using this type of protection and there are circumstances when it is not the best solution. Storage pros should always consider the value it brings to their environment.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).