What advice do you have for storage IT professionals on the decision between erasure coding and replication?
At this point, most IT shops use erasure coding for their local storage. People tend to install RAID 5 and RAID 6 arrays, and those are forms of erasure coding. But if you're concerned about disaster recovery, and about what happens if one of your sites burns down, you probably want replication between sites. Most organizations don't have more than two or perhaps three sites: a primary and one or two backups. In that situation, you're going to replicate between your data centers, since there aren't enough sites to warrant erasure coding in any form. Of course, you would still have erasure coding within each site, likely in the form of RAID 5 or RAID 6, to make the individual sites more reliable. But for the vast majority of IT shops, there's no benefit to doing erasure coding between multiple sites.
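To make the "RAID 5 is a form of erasure coding" point concrete, here is a minimal sketch of single-parity protection using XOR. The block contents and the helper name `xor_blocks` are illustrative assumptions, not anything taken from a particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-length data blocks (hypothetical contents).
data = [b"site", b"data", b"here"]

# One parity block protects all three, as single parity does in RAID 5.
parity = xor_blocks(data)

# Simulate losing block 1, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Note that rebuilding the one lost block required reading every surviving block, which is cheap inside an array but becomes the central cost once the blocks live at different sites.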
If you use two or three cloud providers and three sites of your own, you could do erasure coding between sites. The difficulty is that erasure coding requires you to read much of the surviving data in order to recover, which would be impractical across that many cloud providers and private sites. If you started with six sites (three cloud providers plus three sites of your own), for example, you would need to read from three of them to get any piece of data.
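The tradeoff in the six-site scenario can be sketched with a little arithmetic. The comparison below assumes a k-of-n code in which any k of n fragments rebuild an object, with k = 3 and n = 6 chosen to match "read three of six"; the function names and dictionary keys are hypothetical.

```python
def replication(copies):
    """Full copies at `copies` sites: any single site can serve a read."""
    return {
        "storage_overhead": float(copies),    # bytes stored per byte of data
        "sites_read_per_object": 1,
        "site_losses_tolerated": copies - 1,
    }

def erasure_code(k, n):
    """k-of-n code: one fragment per site, any k fragments rebuild the object."""
    return {
        "storage_overhead": n / k,
        "sites_read_per_object": k,
        "site_losses_tolerated": n - k,
    }

two_site_replication = replication(2)   # primary plus one backup
six_site_code = erasure_code(3, 6)      # assumed 3-of-6 layout

for name, plan in [("replication x2", two_site_replication),
                   ("erasure 3-of-6", six_site_code)]:
    print(name, plan)
```

Under these assumptions both layouts store two bytes per byte of data, but the 3-of-6 code survives three site losses instead of one, at the price of contacting three sites for every read.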
Inter-site erasure coding makes sense for recovering from disasters, as long as you're OK with waiting days to get back up. If you need to be up in minutes, you need replication, or at least a very high-bandwidth, low-latency network between all of your sites.
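A rough way to see where "days" versus "minutes" comes from is to estimate how long it takes to pull the data needed for recovery across the inter-site network. The sketch below is a back-of-the-envelope calculation only: the 100 TB data set and the link speeds are hypothetical, and it ignores protocol overhead, decoding time, and contention.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(data_terabytes, link_gigabits_per_second):
    """Days needed to move the data over one link at full line rate."""
    bits_to_move = data_terabytes * 1e12 * 8
    bits_per_second = link_gigabits_per_second * 1e9
    return bits_to_move / bits_per_second / SECONDS_PER_DAY

# Hypothetical 100 TB that must be fetched from remote sites to recover.
slow_link_days = transfer_days(100, 1)     # 1 Gbps WAN link
fast_link_days = transfer_days(100, 100)   # 100 Gbps WAN link

print(f"1 Gbps:   {slow_link_days:.1f} days")
print(f"100 Gbps: {fast_link_days * 24:.1f} hours")
```

With replication, the surviving site already holds a complete copy, so failover does not wait on a transfer like this; with inter-site erasure coding, the fragments have to cross the network before the data is usable.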
About the expert:
Ethan Miller is a professor of computer science at the University of California at Santa Cruz. Miller does research on erasure coding and how to use it in storage systems. In addition to his work at the university, Miller works part time at Pure Storage Inc., which sells enterprise solid-state storage arrays. But, erasure coding is not the primary focus of his work at the company, and Pure Storage's products currently use no erasure coding beyond combinations of RAID 5 and RAID 6.