What's the best way to protect against HDD failure?

HDD failure can put bytes of data at risk. Is multi-copy mirroring or erasure coding the more efficient data protection approach?

Erasure coding and multi-copy mirroring were developed in response to the inability of traditional RAID to keep up with hard disk drive (HDD) density gains. Even as HDDs have increased in areal density, neither the bit error rate nor the number of heads per platter has improved. Because each drive now holds far more bits, the probability of hitting a non-recoverable bit error has increased, raising the potential for HDD failure and subsequent RAID group data loss. Throughput per gigabyte has also fallen, which lengthens HDD rebuild times and widens the risk window for a concurrent HDD failure and RAID group data loss.
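To make the density argument concrete, here is a minimal Python sketch of the two effects described above. It assumes bit errors are independent and uses a bit error rate of 1 in 10^14 and a sustained rebuild throughput of 100 MB/s; both figures, and the function names, are illustrative assumptions rather than numbers from a specific drive.

```python
import math


def p_unrecoverable_read(bytes_read, bit_error_rate=1e-14):
    """Probability of at least one non-recoverable bit error while reading
    `bytes_read` bytes, assuming independent errors (a simplification)."""
    bits = bytes_read * 8
    # 1 - (1 - ber)^bits, computed in a numerically stable way for tiny rates.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


def rebuild_hours(capacity_bytes, throughput_bytes_per_s):
    """Best-case time to read or write an entire drive at a sustained rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    for tb in (1, 4, 8):
        size = tb * 10**12
        print(tb, "TB:",
              round(p_unrecoverable_read(size), 3), "error probability,",
              round(rebuild_hours(size, 100 * 10**6), 1), "rebuild hours")
```

Under these assumptions, reading 1 TB carries roughly an 8% chance of a non-recoverable error, and reading 8 TB roughly 47%, while the best-case rebuild time grows linearly with capacity.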

RAID 6, RAID 60 and triple-parity RAID have helped to a degree; however, long HDD rebuild times and the adrenaline-fueled fire drills that follow an HDD failure created an urgent need for a sound alternative. That need is most acute for nearline data that must be retained for years or even decades and cannot be recreated if it is lost.

Multi-copy mirroring solves the problem by making multiple copies of the data on different HDDs behind various storage controllers (commonly called nodes). When an HDD fails or returns a non-recoverable bit error, a good copy of the data is simply copied to another drive. The number of concurrent HDD or node failures that must be tolerated determines the number of additional copies: surviving two concurrent failures requires two extra copies of the data, while surviving three requires three extra copies. Because recovery is a straight copy from a surviving replica, this is a very fast data protection and recovery option, but it is very expensive. Each copy consumes as much storage capacity as the original data, which adds up quickly.

Erasure coding is designed to be more efficient because it breaks data into chunks. The total number of chunks is called the width, while the number of chunks required to read the entire datagram is called the breadth. Each chunk holds either part of the data or a coded representation of the data (such as a formula), plus metadata about the whole datagram. A common width-to-breadth ratio for erasure codes is 16:10, meaning that once any 10 chunks have been read, the entire datagram can be recreated. If any chunks (up to six) are missing, they are recreated and written to other HDDs and/or nodes.
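The "any 10 of 16 chunks" property can be illustrated with a toy erasure code. The sketch below is not how any particular product implements it: it treats each chunk as a single value, works over the prime field of integers modulo 257 for readability (production codes often use Reed-Solomon over GF(2^8) with much larger chunks), and the names `encode` and `decode` are hypothetical. The data values are treated as points on a polynomial, and the extra chunks are further points on the same polynomial, so any `breadth` points are enough to recover it.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through `points`, using Lagrange interpolation mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, width):
    """Return `width` chunks as (index, value) pairs. The first len(data)
    chunks are the data itself; the rest are coded values (0..256)."""
    points = list(enumerate(data))
    coded = [(x, _interpolate(points, x)) for x in range(len(data), width)]
    return points + coded


def decode(chunks, breadth):
    """Rebuild the original values from any `breadth` surviving chunks."""
    if len(chunks) < breadth:
        raise ValueError("too few chunks survive to rebuild the data")
    points = chunks[:breadth]
    return [_interpolate(points, x) for x in range(breadth)]


if __name__ == "__main__":
    data = list(b"HDD-failed")      # 10 values: the breadth
    chunks = encode(data, 16)       # 16 chunks: the width
    survivors = chunks[6:]          # lose six chunks, including data chunks
    print(bytes(decode(survivors, 10)))
```

In this toy version a lost chunk is rebuilt by interpolating its value from the survivors, which also hints at why erasure coding costs more processing than copying a mirror.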

Erasure coding is also much more economical than multi-copy mirroring. The 16:10 example protects against up to six concurrent HDD or node failures without losing a byte of data. Doing so requires only 60% more storage, vs. the 600% needed for multi-copy mirroring to survive the same six failures. If the width-to-breadth ratio were 26:20, the additional storage consumed would be a mere 30%, and the data would still be protected against up to six concurrent HDD or node failures. The downside is that chunking adds considerable processing overhead, slowing writes and reads. This makes erasure coding mostly useful for secondary data or nearline storage, such as public and private cloud object storage.
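The overhead figures above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the mirroring formula assumes one full extra copy per tolerated failure, which is the reading that yields the 600% figure.

```python
def erasure_overhead(width, breadth):
    """Extra raw capacity as a fraction of the data size."""
    if not 0 < breadth <= width:
        raise ValueError("need 0 < breadth <= width")
    return (width - breadth) / breadth


def erasure_tolerated_failures(width, breadth):
    """Chunks that can be lost while `breadth` chunks still survive."""
    return width - breadth


def mirroring_overhead(tolerated_failures):
    """Assumes one full extra copy of the data per tolerated failure."""
    return float(tolerated_failures)


if __name__ == "__main__":
    for width, breadth in ((16, 10), (26, 20)):
        failures = erasure_tolerated_failures(width, breadth)
        print(f"{width}:{breadth} tolerates {failures} failures, "
              f"erasure coding {erasure_overhead(width, breadth):.0%} extra, "
              f"mirroring {mirroring_overhead(failures):.0%} extra")
```

Both ratios tolerate six lost chunks because the difference between width and breadth is six in each case; the wider code simply spreads the same six extra chunks over more data.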

This was last published in June 2015
