What you'll learn: RAID 6 is rapidly becoming a standard component of modern storage systems, with nearly every midrange or larger array adding this capability. We tell you why the erasure codes in RAID 6 arrays allow today's high-capacity disk drives to be used without fear of data loss due to unrecoverable read errors (UREs).
The data storage industry has relied on a few basic technologies for decades, and chief among these is the concept of redundant array of independent disks (RAID). But the mathematics that underlie traditional RAID are being supplemented in response to increasing disk capacity and demands for greater flexibility. That means cutting-edge storage systems, from RAID 6 arrays to cloud storage, are using more powerful erasure codes rather than simple parity alone.
Many early RAID systems simply mirrored data between two independent disk drives. Referred to as RAID 1 systems, these were simple to implement, required no advanced calculations and could accelerate read performance in a fairly uncomplicated way.
However, RAID 1 wasn't efficient when it came to capacity, since every byte is stored twice. As a result, parity-based schemes such as RAID 5 became popular. These systems apply a simple even-or-odd (XOR) computation across the data, providing redundancy across multiple disk drives while giving up only one disk's worth of capacity to parity instead of half the array, as RAID 1 does. Calculating and updating parity on every write adds overhead and slows write operations, but this penalty can be largely offset with specialized hardware and DRAM cache.
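The even-or-odd computation is a byte-wise XOR: each parity bit is set so that every bit column across the stripe holds an even number of 1s. The minimal Python sketch below illustrates the idea; the helper names `xor_parity` and `rebuild_missing` are hypothetical, and each "disk" is just a short `bytes` object standing in for one block of a stripe.

```python
def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks: the RAID 5-style parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving, parity):
    """XOR the surviving blocks with the parity block to regenerate the one lost block."""
    return xor_parity(list(surviving) + [parity])


stripe = [b"RAID", b"five", b"demo"]      # three data disks, one block each
parity = xor_parity(stripe)

# Disk 1 fails; rebuild it from disks 0 and 2 plus the parity block
rebuilt = rebuild_missing([stripe[0], stripe[2]], parity)
print(rebuilt)  # b'five'
```

Because XOR is its own inverse, the same routine both computes parity and rebuilds a block. It can only solve for one unknown per stripe, which is why a second failure, or a read error during a rebuild, defeats it.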
One common misconception is that RAID ensures the integrity of data. Although RAID allows data to be recovered in the event of a disk failure, mirroring and parity on their own can't detect smaller problems such as an unrecoverable read error on a single sector; they rely on the drive to report it. Physical hard disk drives employ a variety of mechanisms to improve the reliability of read and write operations. One of the most important is an error-checking code, such as a cyclic redundancy check (CRC), that can detect bit errors and force the disk drive to retry reading the data.
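To illustrate what a CRC does and doesn't do, here is a minimal Python sketch that uses the standard library's `zlib.crc32` as a stand-in for a drive's internal check; real drives use their own sector-level codes, and the `store` and `verify` helpers are hypothetical. A single flipped bit is always caught, but the checksum only reports the damage; it can't say what the original data was.

```python
import zlib


def store(sector: bytes):
    """Return the sector together with its CRC-32, as if both were written to disk."""
    return sector, zlib.crc32(sector)


def verify(sector: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 on read; a mismatch means the data was corrupted."""
    return zlib.crc32(sector) == stored_crc


sector, crc = store(b"important customer record")

corrupted = bytearray(sector)
corrupted[5] ^= 0x01  # flip a single bit

print(verify(sector, crc), verify(bytes(corrupted), crc))  # True False
```

When a retry still fails the check, the drive has to report an unrecoverable read error, and recovering the data falls to whatever redundancy sits above the drive.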
Hard disk manufacturers rate conventional drives as extremely reliable: a typical specification is one unrecoverable read error per 10^14 bits read, or roughly one error for every 12 TB of data read. That was reassuring when hard disk drives were measured in megabytes or gigabytes, but with today's multiterabyte drives, the chance of hitting an unrecoverable read error (URE) is unacceptably high.
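A back-of-the-envelope calculation shows how that rating turns into risk. It assumes the common consumer-drive figure of r = 1 URE per 10^14 bits read and treats bit errors as independent, which is a simplification (real-world errors tend to cluster, and enterprise drives are often rated an order of magnitude or more better):

```latex
\begin{aligned}
10^{14}\ \text{bits} &= \tfrac{10^{14}}{8}\ \text{bytes} = 1.25\times10^{13}\ \text{bytes} \approx 12.5\ \text{TB}\\[4pt]
P(\text{at least one URE in } B \text{ bits}) &= 1-(1-r)^{B} \approx 1-e^{-rB}, \qquad r = 10^{-14}\\[4pt]
\text{Reading a full 3 TB drive: } B = 2.4\times10^{13},\quad rB &= 0.24,\quad P \approx 1-e^{-0.24} \approx 0.21
\end{aligned}
```

In other words, under these assumptions, simply reading one 3 TB drive end to end carries roughly a one-in-five chance of hitting a URE.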
Erasure codes add reliability
That's how we arrive at erasure codes and RAID 6. Decades ago, mathematicians developed algorithms to detect errors in a data stream, and some of these can even recover the original data when one or more errors are detected. These calculations were too computationally demanding for early RAID controllers, but today's microprocessor technology has brought them within reach. Many of these advanced data protection systems rely on a class of mathematics called erasure codes.
When erasure codes are applied to data storage, we can store a few extra bits and make a storage system far more reliable than a conventional parity scheme can. Note that RAID 6 isn't a standardized technology, and vendors implement it in different ways. Most implementations, however, pair a standard RAID 5-style parity block with a second block computed using a Reed-Solomon code. This combination keeps a RAID 6 array's data intact even if two disks fail at once, or if an unrecoverable read error turns up while a failed disk is being rebuilt. Because of this capability, RAID 6 is more reliable than RAID 5. More importantly, RAID 6 arrays allow today's high-capacity disk drives to be used without fear of data loss due to an unrecoverable read error.
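The parity-plus-Reed-Solomon combination is often called P+Q. The Python sketch below is a minimal illustration, not any vendor's implementation: P is plain XOR parity, and Q weights each data block by a power of a generator g in the finite field GF(2^8), the byte-sized arithmetic Reed-Solomon codes are built on. The field polynomial 0x11D and generator g = 2 are assumptions (a common choice for RAID 6), and the function names are hypothetical.

```python
# Minimal P+Q RAID 6 sketch over GF(2^8).
# Assumptions: field polynomial 0x11D, generator g = 2, one short bytes block per disk.
GF_POLY = 0x11D

# Antilog (EXP) and log (LOG) tables; EXP is doubled so gf_mul needs no modulo.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= GF_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply in GF(2^8) using log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide in GF(2^8) using log tables."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(blocks, skip=()):
    """Return (P, Q): P = XOR of the blocks, Q = sum of g^i * block_i in GF(2^8)."""
    size = len(next(b for b in blocks if b is not None))
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        if i in skip:
            continue
        coeff = EXP[i]  # g^i with g = 2
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, p, q, x, y):
    """Rebuild lost data blocks x and y (x != y) from the survivors plus P and Q."""
    px, qx = syndromes(blocks, skip=(x, y))
    a = bytes(pi ^ pj for pi, pj in zip(p, px))  # a = Dx + Dy
    b = bytes(qi ^ qj for qi, qj in zip(q, qx))  # b = g^x*Dx + g^y*Dy
    gx, gy = EXP[x], EXP[y]
    denom = gx ^ gy  # nonzero because x != y
    dx = bytes(gf_div(bj ^ gf_mul(gy, aj), denom) for aj, bj in zip(a, b))
    dy = bytes(aj ^ dxj for aj, dxj in zip(a, dx))
    return dx, dy


stripe = [b"RAID", b"six!", b"demo", b"data"]
p, q = syndromes(stripe)

damaged = [stripe[0], None, stripe[2], None]  # disks 1 and 3 have failed
d1, d3 = recover_two(damaged, p, q, 1, 3)
print(d1, d3)  # b'six!' b'data'
```

P and Q give two independent equations for every byte position, so any two missing data blocks can be solved for; a lost P or Q block is simply recomputed from the data.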
Closely related error-correcting codes are used to make Blu-ray Discs less susceptible to scratches and to make DSL and WiMAX data transmissions more reliable. Erasure codes are also used in many object and cloud storage systems, which distribute data across multiple locations. The ability of erasure codes to allow data to be recovered from unreliable sources makes them extremely valuable in many modern systems.
RAID 6 becoming standard component of modern storage systems
As hard disk drives grow, RAID 6 becomes increasingly important. With Seagate's recent introduction of a 3 TB hard disk drive, we have reached the point where a typical 4- or 5-disk RAID 5 set is likely to encounter an unrecoverable read error while rebuilding after a disk failure, because a rebuild must read every surviving disk in full. A degraded RAID 5 set has no redundancy left to correct such an error, so RAID 6 has become an absolute must-have for RAID sets employing such large drives. Systems that still use RAID 5 should include advanced techniques like verify-on-write and data scrubbing to reduce the risk of data loss.
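The rebuild risk can be estimated with a short Python sketch. It assumes a simple model: independent bit errors at a rated 1 URE per 10^14 bits read, decimal terabytes, and a rebuild that reads every surviving disk once. Real drives often beat their ratings and errors cluster, so treat the numbers as illustrative. The function names are hypothetical.

```python
import math


def ure_probability(tb_read: float, ure_rate: float = 1e-14) -> float:
    """P(at least one URE) when reading tb_read decimal terabytes,
    assuming independent bit errors at ure_rate per bit read."""
    bits = tb_read * 1e12 * 8
    # 1 - (1 - r)^bits, computed stably with log1p/expm1
    return -math.expm1(bits * math.log1p(-ure_rate))


def raid5_rebuild_risk(drive_tb: float, disks: int, ure_rate: float = 1e-14) -> float:
    """Rebuilding one failed disk in RAID 5 reads every surviving disk in full."""
    return ure_probability(drive_tb * (disks - 1), ure_rate)


for disks in (4, 5):
    risk = raid5_rebuild_risk(3, disks)
    print(f"{disks}-disk RAID 5 of 3 TB drives: {risk:.0%} chance of a URE during rebuild")
```

Under these assumptions the 4-disk set comes out around 51% and the 5-disk set around 62%. Passing `ure_rate=1e-15`, a rating often quoted for enterprise drives, drops the 4-disk case to roughly 7%: better, but still not negligible.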
Happily, the average end user need not worry about the gloomy prospect of read errors and data loss. Storage manufacturers have been adding integrity-enhancing features for years, and nearly all enterprise data is protected using backups, replication or snapshot technology. RAID 6 is rapidly becoming a standard component of modern storage systems as well, with nearly every midrange or larger array adding this capability. Advances in processor technology make it likely that RAID 6 will spread to every corner of the market just in time to head off disaster.
Erasure codes in action
Erasure code math sounds complicated, but the idea is easy to grasp. Imagine your phone number were 123-4567. It would be easy to remember because the digits follow a simple linear pattern: you could tell someone to start at 1 and dial each successive number until they reach 7. Because they know the pattern, they would easily recognize if they had dialed a wrong digit, and they could even work out a missing digit from its neighbors.
Erasure coding fits a mathematical function, typically a polynomial, to a set of data values so that the values can be checked for accuracy and recovered if some are lost. Storing extra points from that function (oversampling) and reconstructing missing values from the points that survive (polynomial interpolation) is the key concept behind erasure codes. Although the math used is more complicated than our phone number example, the fundamental idea is the same.
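Here is a minimal Python sketch of a Reed-Solomon-style code built on polynomial interpolation. It assumes a small prime field (arithmetic modulo 257) for readability, whereas production systems typically work in GF(2^8); the names `interpolate`, `encode` and `decode` are hypothetical. The k data values are treated as points f(0), ..., f(k-1) on a polynomial of degree less than k, extra points are stored as redundancy, and any k surviving points pin down the polynomial and therefore the data.

```python
PRIME = 257  # assumption: prime field GF(257) for readability; real systems usually use GF(2^8)


def interpolate(points, x):
    """Lagrange interpolation: evaluate at x the unique polynomial of
    degree < len(points) passing through the (xi, yi) points, mod PRIME."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def encode(data, n):
    """Systematic code: data[i] is f(i); append f(k), ..., f(n-1) as redundancy."""
    points = list(enumerate(data))
    return list(data) + [interpolate(points, x) for x in range(len(data), n)]


def decode(shares, k):
    """Rebuild the k data values from any k surviving (position, value) shares."""
    if len(shares) < k:
        raise ValueError("not enough shares to recover the data")
    return [interpolate(shares[:k], x) for x in range(k)]


digits = [1, 2, 3, 4, 5, 6, 7]      # the "123-4567" example
codeword = encode(digits, n=9)     # store two extra values
print(codeword)                    # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Lose any two values (here positions 0 and 4) and recover the original digits
survivors = [(i, v) for i, v in enumerate(codeword) if i not in (0, 4)]
print(decode(survivors, k=7))      # [1, 2, 3, 4, 5, 6, 7]
```

For the phone number the polynomial is just the line f(i) = i + 1, so the extra values continue the pattern with 8 and 9. Arbitrary data produces less obvious redundancy values, and in GF(257) one of them can be 256, which doesn't fit in a byte; that is one reason practical codes use GF(2^8) instead.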
BIO: Stephen Foskett is an independent consultant and author specializing in enterprise storage and cloud computing. He is responsible for Gestalt IT, a community of independent IT thought leaders, and organizes their Tech Field Day events. He can be found online at GestaltIT.com, FoskettS.net, and on Twitter at @SFoskett.