It doesn't matter if you have a hard disk drive, a solid-state drive or a hybrid drive -- they can all fail. Simple math demonstrates that as you increase the number of drives you have, the frequency of drive failures also grows. Data storage growth continues to accelerate, and today's drive capacities just aren't keeping pace. That means more drives are necessary, but protecting those drives can be a challenge for IT pros relying on traditional
The trouble with RAID
RAID has been the primary method for protecting data on hard disk drives (HDDs) for the past quarter of a century. RAID protects data from loss in the event of a drive failure by distributing data across the drives in a RAID group. You can think of RAID as a form of virtualization: Different RAID levels provide different degrees of redundancy within a RAID set or group of drives, and RAID allows data to be accessed even when one or more drives are lost or inaccessible. But there's always a tradeoff among performance, resilience and cost. For example, RAID 5 costs less than RAID 6, and has less overhead (20% to 25% overhead for RAID 5 versus approximately 25% to 30% for RAID 6) and higher performance, but it can tolerate only a single concurrent drive failure or unrecoverable bit error (referred to here as an unrecoverable BER) in the RAID group versus two concurrent drive failures or unrecoverable BERs for RAID 6. When the allowable number of concurrent drive failures or unrecoverable BERs is exceeded, data is lost.
RAID rebuilds a failed drive using parity and by exercising every bit of every drive in the RAID group. But rebuilding a drive is an I/O-intensive process that also takes time and storage resources. And the bigger the drive, the longer it takes to rebuild. A 4 TB drive can take days to weeks to rebuild, not minutes or hours. Not only do these long rebuild times raise the risk of additional drive failures or unrecoverable BERs, but most storage system performance also will degrade noticeably during a RAID drive rebuild.
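To see why a parity rebuild has to touch every surviving drive, here is a minimal Python sketch of single-parity (RAID 5-style) reconstruction. The block contents and the helper name `xor_blocks` are made up for illustration; real controllers work on stripes across whole drives, but the XOR principle is the same.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per drive, plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive. Rebuilding its block requires reading
# the matching block from every surviving drive, including parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Because every rebuilt block needs a read from each surviving drive, the work scales with drive capacity, which is why larger drives stretch the rebuild window.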
Storage administrators will commonly assign RAID HDD rebuilds as a background task to mitigate controller performance loss, but dragging out the HDD rebuild time widens the risk window. HDDs tend to fail in bunches, so the risk of another drive failing or returning an unrecoverable BER in a RAID group after the first drive fails goes up anywhere from two to 10 times, depending on HDD capacity, the number of HDDs in the RAID group, and whether rebuilds run as a primary or secondary task. In addition, the risk of a third concurrent HDD failure increases by another two to 10 times.
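The effect of a longer rebuild window can be illustrated with a rough Python sketch. It assumes drives fail independently at a constant rate, which is optimistic given that HDDs tend to fail in bunches, so treat the result as a lower bound. The function name, the 3% annual failure rate, and the drive counts are illustrative assumptions, not measured figures.

```python
import math

def prob_additional_failure(surviving_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails during the rebuild window.

    Assumes independent failures at a constant hazard rate (an assumption;
    correlated failures would push the real risk higher).
    """
    hours_per_year = 24 * 365
    rate_per_hour = -math.log(1 - annual_failure_rate) / hours_per_year
    p_one_drive = 1 - math.exp(-rate_per_hour * rebuild_hours)
    return 1 - (1 - p_one_drive) ** surviving_drives

# Hypothetical 8-drive group with one drive already failed (7 survivors).
fast = prob_additional_failure(7, 0.03, rebuild_hours=24)       # one-day rebuild
slow = prob_additional_failure(7, 0.03, rebuild_hours=24 * 14)  # two-week rebuild

print(f"1-day rebuild:  {fast:.4%}")
print(f"2-week rebuild: {slow:.4%}")
```

Even under this simplified model, stretching the rebuild from a day to two weeks raises the exposure roughly in proportion to the window length.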
Each additional simultaneous loss further drains storage controller processor performance. The HDD failure cycle becomes a positive feedback loop in which each subsequent failure increases the probability of more failures and data loss. In my experience, IT shops have been reporting an increase in non-recoverable drive failures, which require data recovery from snapshots or backups.
The most common workaround is replication, but replication adds a substantial amount of cost to the process for marginally better data resilience. For example, replicating a RAID 6 set that carries 25% overhead adds another 125% on top (the second copy plus its own RAID overhead). This means 100 TB of data requires 250 TB of usable storage. That may work for small amounts of data, but the Capex and Opex costs become unsustainable for hundreds of terabytes, petabytes or exabytes of data. It becomes an even greater problem when protection must cover three or more drive failures.
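The replication arithmetic above can be checked with a few lines of Python. The function name `capacity_needed_tb` is an illustrative assumption; the inputs are the figures from the example.

```python
def capacity_needed_tb(data_tb, raid_overhead, copies):
    """Storage needed for data_tb of data, given RAID overhead and copy count."""
    return data_tb * (1 + raid_overhead) * copies

raid6_only = capacity_needed_tb(100, 0.25, copies=1)
raid6_replicated = capacity_needed_tb(100, 0.25, copies=2)

print(raid6_only)        # 125.0
print(raid6_replicated)  # 250.0
```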
Multi-copy mirroring and erasure codes
Multi-copy mirroring (MCM) and erasure codes (EC) are two alternatives to RAID technology that are heavily utilized in many IT shops and promoted by many vendors in their newest products or versions.
Multi-copy mirroring replaces RAID by making multiple concurrent copies of the data on different drives in different drawers, processors and even nodes. It is often bundled with some form of autonomic healing that continually checks the health of the data. When it finds data that is inaccessible or corrupted, it goes to a good copy of the data and makes another copy. The number of mirrored copies is typically decided by a user-set policy. Multi-copy mirroring is built into OpenStack Swift Object Storage and even into the Hadoop Distributed File System. It eliminates the need for RAID, and can dial up the resilience to survive as many concurrent drive failures or unrecoverable BERs as required. However, each copy of the data requires an additional 100% of usable storage. So, protecting against four concurrent drive failures or unrecoverable BERs means keeping five copies, or 500% of the usable storage (i.e., 1 PB of data requires 5 PB of usable storage). This can make MCM an expensive proposition.
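The mirroring cost is simple to work out: surviving f concurrent failures takes f + 1 full copies. A short Python sketch, with `mirrored_capacity_pb` as an illustrative helper name:

```python
def mirrored_capacity_pb(data_pb, failures_to_survive):
    """Copies and total storage needed to survive the given number of failures."""
    copies = failures_to_survive + 1  # one copy must remain after all failures
    return copies, data_pb * copies

copies, total_pb = mirrored_capacity_pb(1, failures_to_survive=4)
print(copies, total_pb)  # 5 5
```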
Erasure codes, or forward error correction, also replace RAID, and the technology is quickly becoming the No. 1 choice for data durability in shops with large amounts of data. These codes primarily utilize information dispersal algorithms that separate and write data into n subsets or chunks of information. The chunks are then distributed to multiple drives, drawers or storage nodes. Chunks can also be distributed to disparate storage locations in the same physical location, campus, city, region, country or around the world. Reading EC data requires reading only a subset s (the width) of the n total chunks (the breadth). The width required to produce the data is directly tied to the level of resilience required against the total number of concurrent drive failures or unrecoverable BERs. A common EC has a breadth of 16 and a width of 12. This protects the data against four concurrent drive failures or unrecoverable BERs, drawer failures, storage node failures, site failures and so on. And it does so for a mere 33% storage overhead (i.e., 1 PB of data requires 1.33 PB of usable storage). That's a major reduction in cost versus multi-copy mirroring.
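The 16/12 figures can be reproduced with a small Python sketch. It assumes any 12 of the 16 chunks are enough to read the data; the function name `erasure_code_profile` is illustrative.

```python
def erasure_code_profile(breadth, width, data_pb):
    """Failures tolerated, storage overhead, and total storage for an EC layout.

    breadth: total chunks written (n); width: chunks needed to read (s).
    """
    tolerated = breadth - width
    expansion = breadth / width
    return tolerated, expansion - 1, data_pb * expansion

tolerated, overhead, stored_pb = erasure_code_profile(16, 12, data_pb=1.0)
print(tolerated, f"{overhead:.0%}", f"{stored_pb:.2f} PB")  # 4 33% 1.33 PB

# Mirroring needs tolerated + 1 full copies for the same protection.
mirrored_pb = 1.0 * (tolerated + 1)
print(f"{mirrored_pb:.2f} PB")  # 5.00 PB
```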
Similar to MCMs, ECs are commonly bundled with autonomic healing that automatically checks the data. When it finds a chunk that is not readable or accessible, it creates a new chunk somewhere else. This makes data far more durable than the underlying storage media would suggest, allowing it to be accessible for decades and potentially centuries.
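The check-and-repair cycle can be sketched in Python. For simplicity this toy version heals mirrored copies rather than erasure-coded chunks, and it rewrites the bad copy in place instead of creating a new one elsewhere. The drive names, the `heal` function, and the use of SHA-256 checksums are assumptions for illustration, not any vendor's implementation.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def heal(replicas, expected_digest):
    """Replace missing or corrupted replicas with a verified good copy.

    replicas maps a (hypothetical) drive name to bytes, or None if inaccessible.
    Returns the list of drive names that were repaired.
    """
    good = next(
        (d for d in replicas.values()
         if d is not None and checksum(d) == expected_digest),
        None,
    )
    if good is None:
        raise RuntimeError("no intact copy left; restore from backup")
    repaired = []
    for drive, data in replicas.items():
        if data is None or checksum(data) != expected_digest:
            replicas[drive] = good
            repaired.append(drive)
    return repaired

original = b"customer-records"
replicas = {"drive-a": original, "drive-b": b"customer-recXrds", "drive-c": None}
repaired = heal(replicas, checksum(original))
print(repaired)  # ['drive-b', 'drive-c']
```

An erasure-coded system follows the same loop, except that a lost chunk is recomputed from the surviving chunks rather than copied.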
Due to latency issues, ECs shouldn't be used with data requiring high performance. Small datasets are another problem, as they don't break up efficiently into chunks. However, ECs are an excellent choice for large data sets, large amounts of data and data that is less frequently accessed.
Erasure codes are most commonly found today in object storage systems from such vendors as Amplidata, Caringo, Cleversafe, DataDirect Networks, EMC, NEC and Scality. Expect ECs to become more prevalent in storage systems of all types as data storage continues with its unrelenting growth.
About the author:
Marc Staimer is founder and senior analyst at Dragon Slayer Consulting in Beaverton, Ore. The 15-year-old consulting practice focuses on the areas of strategic planning, product development, and market development. Marc can be reached at firstname.lastname@example.org.
This was first published in October 2013