Erasure coding, mirroring offer data protection for cloud storage

This year, several data storage vendors are hyping erasure coding as a feature that can provide superior data protection and improve object storage technology. Erasure coding is a technique that splits data into fragments and adds redundant parity fragments so that lost pieces can be rebuilt. It is currently available in a handful of cloud storage offerings.

Although erasure coding has been commercially viable for more than a decade, most storage customers are likely more familiar with multicopy mirroring, which is available in almost every cloud storage offering. Both technologies are alternatives to traditional RAID technology.

"The reason you're seeing all these alternatives to RAID is because of the time it takes to rebuild these high-density drives," said Marc Staimer, president of Dragon Slayer Consulting in Beaverton, Ore.

In his Storage Decisions seminar presentation "How to build a storage cloud: What applications are most 'cloud worthy' and who does what?," Staimer discusses erasure coding and multicopy mirroring technology. But, he noted, "This isn't an either-or situation. You'll find multicopy mirroring in some cloud storage that uses erasure codes."

Erasure coding is a CPU-intensive process, Staimer warned, and that processing overhead comes at a cost: "They affect latency. It's really meant for data that isn't accessed frequently."

On the flip side, erasure coding consumes less storage than multicopy mirroring.

That's because multicopy mirroring stores complete additional copies of the data. With erasure codes, protecting one copy of the data takes roughly "one and one-fourth times my total storage," Staimer said. "Four copies [as would be required in a typical multicopy mirroring situation] is going to need four times the amount of storage."
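The storage comparison comes down to simple arithmetic. An erasure code that splits data into k data fragments and adds m parity fragments consumes (k + m) / k bytes of raw storage per byte of user data, while keeping N full copies consumes N bytes. Staimer does not say which layout produces his 1.25 figure; the 8+2 layout below is a hypothetical example (12+3 or 16+4 would give the same ratio), and the function names are illustrative only.

```python
def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw storage consumed per byte of user data for a k+m erasure code."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need k > 0 data fragments and m >= 0 parity fragments")
    return (data_fragments + parity_fragments) / data_fragments


def mirroring_overhead(copies: int) -> float:
    """Raw storage consumed per byte of user data when keeping N full copies."""
    if copies <= 0:
        raise ValueError("need at least one copy")
    return float(copies)


# Hypothetical 8+2 layout: 8 data fragments plus 2 parity fragments.
print(erasure_overhead(8, 2))  # 1.25
# Four full copies, as in the mirroring example above.
print(mirroring_overhead(4))   # 4.0
```

The two schemes are not equivalent in fault tolerance: an 8+2 code survives the loss of any two fragments, while four full copies survive the loss of any three. The gap in overhead is still large.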

Despite the extra storage required, multicopy mirroring still offers huge resiliency advantages over RAID 6. "If a copy turns up bad during a checksum … it will pull from another copy of the data," Staimer explained. "When [a copy] comes up unhealthy, it will just call from a good copy and delete the one that's not healthy."
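The check-and-replace behavior Staimer describes can be sketched as a "scrub" pass over the copies. This is a minimal sketch under stated assumptions: SHA-256 stands in for whatever checksum a real system stores, `None` represents a copy that has been lost outright, and the names `checksum` and `scrub` are hypothetical rather than any vendor's API.

```python
import hashlib
from typing import List, Optional


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; real systems may use CRCs or other hashes.
    return hashlib.sha256(data).hexdigest()


def is_healthy(copy: Optional[bytes], expected: str) -> bool:
    # A copy is healthy if it exists and still matches the stored checksum.
    return copy is not None and checksum(copy) == expected


def scrub(copies: List[Optional[bytes]], expected: str) -> List[bytes]:
    """Verify every copy and replace unhealthy ones with a good copy."""
    good = next((c for c in copies if is_healthy(c, expected)), None)
    if good is None:
        raise RuntimeError("no healthy copy left; the data is lost")
    # Keep healthy copies; discard bad or missing ones and re-copy the good one.
    return [c if is_healthy(c, expected) else good for c in copies]
```

For example, with one corrupted and one missing copy, every slot comes back holding the original bytes:

```python
original = b"archive object"
digest = checksum(original)
copies = [original, b"archive objecT", None, original]
repaired = scrub(copies, digest)
```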

And erasure coding can improve upon that. When it writes data, it breaks the data into fragments and computes additional parity fragments from them, so the encoded set as a whole carries redundant information about the entire data set. As a result, the original data can be rebuilt from any sufficiently large subset of the fragments, even if some of them are lost.
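The simplest possible erasure code illustrates the principle: split the data into k fragments and add one parity fragment equal to the XOR of all of them. Any k of the resulting k + 1 fragments are enough to rebuild the data. This single-parity scheme (the same idea as RAID 5 parity) tolerates only one lost fragment; the cloud systems discussed here typically use stronger codes such as Reed-Solomon that survive multiple losses, so treat this as an illustrative sketch, not the technique any particular product uses. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length fragments.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)            # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so every fragment is the same size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def decode(fragments: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the original data when at most one of the k+1 fragments is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can recover only one lost fragment")
    if missing:
        # XOR of all surviving fragments equals the missing one.
        survivors = [f for f in fragments if f is not None]
        fragments = list(fragments)
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    # Join the data fragments and strip the padding.
    return b"".join(fragments[:k])[:length]
```

Losing any single fragment, data or parity, still lets `decode` return the original bytes:

```python
msg = b"erasure coding demo"
frags = encode(msg, 4)
frags[2] = None
restored = decode(frags, 4, len(msg))
```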

Staimer said erasure code technology is at least 10,000 times more resilient than RAID 6. "When the absolutely most important thing, like in an archive, is data resiliency, you might want to consider [erasure coding]," he explained.

To learn more about erasure coding and multicopy mirroring as they relate to cloud storage and data resiliency, view this expert video.
