Storage Soup

A SearchStorage.com blog.

Jan 5 2012   9:13AM GMT

Life after RAID



Posted by: Randy Kerns
Tags:
erasure codes
forward error correction
information dispersal algorithms
RAID

Recent developments point to a change in how we protect against the loss of a data element on a failed disk. RAID is the venerable method used to guard against damage from a lost disk, but RAID has limitations, especially with large-capacity drives that can hold terabytes of data. New developments address RAID's limitations and provide advantages that are not specific to disk drives.

The new protection technology has been called several things. The name most associated with university research is information dispersal algorithms, or IDA. Probably the more correct term for the technology as it has been implemented is forward error correction, or FEC. Another name, based on implementation details, is erasure codes.

The technology can address the loss of a disk drive, the failure RAID was designed to protect against. It can also prevent the loss of a data element when data is distributed across geographically dispersed systems. The following diagram gives an overview of the protection coverage for data elements. The implementation lets you select how much protection is applied across the data. A commonly used example is a protection setting of 12 of 16, which means only 12 of the 16 data elements are needed to recreate the data, so as many as four can be lost, for example to failed disk drives.
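To make the 12 of 16 setting concrete, here is a minimal sketch in Python of one way a k-of-n erasure code can work: a Reed-Solomon-style scheme built on polynomial interpolation over a small prime field. It is an illustration only, not the implementation used by any vendor mentioned in this post. The function names (`interpolate`, `encode`, `decode`) and the field size 257 are assumptions chosen for the example, and it needs Python 3.8 or later for the modular inverse.

```python
# Illustrative k-of-n erasure code (hypothetical names, not a vendor API).
P = 257  # assumed prime field size, just large enough to hold byte values


def interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Turn k data symbols into n elements: the k originals plus n - k extra."""
    k = len(data)
    if not (0 < k <= n < P):
        raise ValueError("need 0 < k <= n < P")
    points = list(enumerate(data, start=1))  # data symbol i sits at x = i
    extra = [(x, interpolate(points, x)) for x in range(k + 1, n + 1)]
    return points + extra


def decode(elements, k):
    """Recreate the k data symbols from any k surviving (x, y) elements."""
    if len(elements) < k:
        raise ValueError("too many elements lost to recreate the data")
    subset = elements[:k]
    return [interpolate(subset, x) for x in range(1, k + 1)]


# Usage: 12 data bytes, 16 stored elements, 4 of them lost.
data = list(b"twelve of 16")
elements = encode(data, 16)
survivors = [e for e in elements if e[0] not in {2, 5, 9, 16}]
recovered = decode(survivors, 12)
```

In this sketch the extra elements can take the value 256, which does not fit in a byte; production codes typically work in a field such as GF(256) to avoid that, a detail left out here for brevity.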

Vendors with products that use FEC/erasure codes include Amplidata, Cleversafe, and EMC (in Isilon and Atmos). Each uses a slightly different implementation, but they are all a form of dispersal and error correction.

The main reason to use erasure codes is protection from multiple failures. Multiple drives in a disk storage system could fail before data loss would occur. If data is stored at different geographic locations, several locations can be unavailable without any data being lost. This makes erasure codes a good fit for cloud storage.
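How many locations can be unavailable depends on how the data elements are placed. The following Python sketch counts the worst-case number of site outages a k-of-n setting can absorb, assuming elements are spread as evenly as possible across sites. The even placement and the function name `site_failures_tolerated` are assumptions made for illustration; real products may place data differently.

```python
def site_failures_tolerated(n, k, sites):
    """Worst-case number of whole sites that can be lost while at least
    k of the n data elements remain, assuming even placement."""
    # Elements per site, largest first (assumed placement policy).
    counts = [n // sites + (1 if i < n % sites else 0) for i in range(sites)]
    budget = n - k  # elements that may be lost without losing data
    lost = 0
    tolerated = 0
    for c in counts:  # worst case: the fullest sites fail first
        if lost + c > budget:
            break
        lost += c
        tolerated += 1
    return tolerated


# 12 of 16 with one element per site versus four elements per site.
one_per_site = site_failures_tolerated(16, 12, 16)
four_per_site = site_failures_tolerated(16, 12, 4)
```

Under these assumptions, spreading the 16 elements over more sites raises the number of simultaneous site outages the data can survive.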

Other advantages include shorter rebuild times after a data element fails and less performance impact during a rebuild. A disadvantage of erasure codes is that they can add latency and require more compute power when making small writes.
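One way to see the small-write cost is to count disk operations. The Python sketch below assumes a layout in which original data elements are stored unmodified alongside n - k redundant elements, and a small write is handled as a read-modify-write that touches one data element and every redundant element. That layout and the function name `small_write_ios` are assumptions for illustration; an actual product may batch or log writes and behave differently.

```python
def small_write_ios(n, k):
    """Reads and writes for updating one data element under the assumed
    read-modify-write scheme: the old data element and each of the n - k
    redundant elements are read, then all of them are rewritten."""
    redundant = n - k
    reads = 1 + redundant
    writes = 1 + redundant
    return reads, writes


single_parity = small_write_ios(5, 4)    # RAID 5-like layout
double_parity = small_write_ios(6, 4)    # RAID 6-like layout
twelve_of_16 = small_write_ios(16, 12)   # the 12 of 16 example
```

Under these assumptions the operation count grows with the number of redundant elements, which is the price of tolerating more failures.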

One of the potentially most valuable benefits of using erasure codes is the reduction in service costs for disk storage systems. With a protection ratio that gives a long-term coverage probability (meaning that enough failures to cause data loss are unlikely to accumulate for a long period of time), a storage system may not require a failed device to be replaced over its economic lifespan. This would reduce the service cost. For a vendor, this reduces the amount of warranty reserve.
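A rough way to reason about long-term coverage is a binomial estimate: the chance that more than n - k of the n devices holding a piece of data fail during the lifespan when nothing is replaced. The Python sketch below assumes independent failures and a single group of n devices. The 3 percent annual failure rate and five-year lifespan are hypothetical inputs, not measured figures, and the function name `loss_probability` is made up for this example.

```python
from math import comb


def loss_probability(n, k, p):
    """Probability that more than n - k of n devices fail, given each fails
    independently with probability p over the period (no replacements)."""
    return sum(
        comb(n, f) * p**f * (1 - p) ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


# Hypothetical inputs: 3% annual failure rate, five-year economic lifespan.
annual_failure_rate = 0.03
years = 5
p_lifetime = 1 - (1 - annual_failure_rate) ** years

risk_12_of_16 = loss_probability(16, 12, p_lifetime)
risk_no_protection = loss_probability(16, 16, p_lifetime)
```

Comparing the two results shows how much the redundant elements lower the estimated risk for the same devices. A real reliability model would also need to account for correlated failures and unrecoverable read errors.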

This form of data protection is not prevalent today, and it will take time before a large number of vendors offer it. There are good reasons for using this type of protection, and there are circumstances when it is not the best solution. Storage pros should always consider the value it brings to their environment.

(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).

1  Comment on this Post

 
  • Randomitdude
    The MTDL (mean time to data loss) is over 100 years in a RAID6. google: mtdl raid6 intel Triple parity ala ZFS? sheesh. Reminds me of a pissy exchange I had years ago with someone that was insisting we would see 128 bit CPUs. The whole idea of multiple drive failures is not a fun discussion either. Imagine a data center with 3000 drives (BTDT) and it loses a drive every few weeks (maybe once every 2 months) now imagine it loses a drive in a RAID6 array and THEN loses another drive during rebuild and THEN hits a UBE during that rebuild. Sure.. possible, once every 100 years (see above). If the data is that stinking important it would be in-sync in at least 2 dispersed datacenters. I'm not seeing RAID protection beyond RAID6 having much uptick. My opinion. Rob
