Enterprise IT shops are well familiar with the most common forms of erasure coding -- RAID 5 and RAID 6 -- but they're likely to hear more about erasure codes that can protect against data loss from more than two disks, storage nodes or geographic locations.
In this podcast interview, Ethan Miller, a professor of computer science at the University of California, Santa Cruz, describes how erasure coding works, the different types, the applications and workloads for which erasure coding is most appropriate and the characteristics that distinguish one product's erasure coding from another's. He also shares his views on the future potential of erasure coding.
Professor Miller does research on erasure coding and how to use it in storage systems. In addition to his work at the university, Miller is a part-time employee at Pure Storage Inc., which sells enterprise solid-state storage arrays. Erasure coding is not the primary focus of his work at the company, and Pure Storage's products currently use no erasure coding beyond combinations of RAID 5 and RAID 6.
Can you provide us with a good working definition of erasure coding?
Ethan Miller: Erasure coding is a set of algorithms that allows the reconstruction of missing data from a set of original data. For example, if I had six pieces of data that I wanted to protect, I could use an erasure coding algorithm to generate two additional pieces, giving me a total of eight, and at least for the types of erasure coding we'll probably be talking about, any six of those eight pieces will be sufficient to rebuild the missing two.
Now the point of erasure coding is that you can pick the number of original data pieces to be essentially as large as you want. I know of some that go up to 200-odd pieces. And you can pick the number of redundant pieces to be as large as you want also. So, I might say that I would have 10 original data pieces and eight redundant ones, giving me a total of 18, and any 10 would allow me to rebuild the original data.
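The arithmetic behind those two examples can be sketched in a few lines of Python. The helper name `code_profile` and its output fields are illustrative assumptions, not part of any product. The sketch only assumes a code in which any k of the k + m pieces are enough to rebuild the data.

```python
from fractions import Fraction

def code_profile(k, m):
    """Summarize a layout with k data pieces and m redundant pieces.

    Hypothetical helper; assumes any k of the k + m pieces can rebuild the data.
    """
    total = k + m
    return {
        "total_pieces": total,
        "tolerated_losses": m,                    # pieces that may go missing
        "storage_overhead": Fraction(total, k),   # bytes stored per byte of data
    }

# The two layouts mentioned in the interview: 6 + 2 and 10 + 8.
for k, m in [(6, 2), (10, 8)]:
    profile = code_profile(k, m)
    print(k, m, profile["total_pieces"], profile["tolerated_losses"],
          float(profile["storage_overhead"]))
```

Under these assumptions, the 6 + 2 layout stores about 1.33 bytes per byte of data and survives two losses, while the 10 + 8 layout stores 1.8 bytes per byte and survives eight.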
Just about every IT shop uses RAID 5 and RAID 6, which are very commonly used types of erasure coding. Most IT shops don't really have to worry too much about using erasure codes that protect against more than two data losses, at least for now. The reason is that the chance of even two devices in the same group failing simultaneously is relatively small. If you run a very large IT shop, let's say a petabyte or more, maybe you do want to think about alternate ways of doing erasure coding that can protect against more than two data losses simultaneously.
Are there different types of erasure coding?
Miller: Yes, there are two basic types of which people should be aware. In one type -- and something called Reed-Solomon falls into this category -- if you had let's say 12 data elements and four erasure code elements, any 12 elements from that group of 16 would be enough to rebuild the missing ones. Any 12. It doesn't matter which four fail; you can always rebuild.
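The "any 12 of 16" property can be illustrated with a toy code built on polynomial interpolation, the idea that underlies Reed-Solomon. This is a sketch under stated assumptions, not how any product implements it: it works on small integers modulo the prime 257 rather than the GF(2^8) arithmetic that production implementations typically use, and the names `interpolate`, `encode` and `recover` are hypothetical. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
from itertools import combinations

P = 257  # prime modulus chosen for the sketch (assumption; symbols must be < P)

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (position, value) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, m):
    """Keep the k data symbols at positions 0..k-1 and append m redundant
    symbols: the same polynomial evaluated at positions k..k+m-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate(points, x) for x in range(k, k + m)]

def recover(survivors, k):
    """Rebuild the k data symbols from any k surviving {position: symbol} pairs."""
    points = list(survivors.items())[:k]
    return [interpolate(points, x) for x in range(k)]

data = [7, 1, 200, 33, 94, 5, 18, 250, 61, 0, 129, 42]   # 12 data elements
stored = encode(data, 4)                                  # 16 elements in total

# Try every possible set of 12 survivors out of the 16 stored elements.
ok = all(
    recover({i: stored[i] for i in keep}, len(data)) == data
    for keep in combinations(range(len(stored)), len(data))
)
print(ok)
```

The final check walks through all 1,820 ways of losing four of the 16 elements and confirms that the original 12 are rebuilt in each case, which is the "it doesn't matter which four fail" guarantee in miniature.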
There is a second type of erasure codes that are becoming a little more common today, where again you might have 12 data elements and four erasure code elements, and the erasure code can recover from most failures of four elements but not all failures. Now of course, this is slightly more risky, so you must be getting something back in return. And what you get back in return is that instead of needing 12 elements to rebuild anything missing, you might need only three, four, five, six elements. So you might need to read a lot less to rebuild, but the tradeoff is there are some sets of four failures that you can't recover from.
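The tradeoff in this second type can be seen in a deliberately simple toy layout: 12 data elements split into four groups of three, each group protected by its own XOR parity. This is an assumption made purely for illustration, and it is much weaker than the codes Miller describes; real codes of this kind typically combine local and global parities so that most four-failure patterns remain recoverable. The names `encode_groups`, `repair` and `recoverable` are hypothetical.

```python
from functools import reduce
from itertools import combinations

GROUP = 3  # data elements per local group (assumption for the sketch)

def xor_all(values):
    """XOR together an iterable of integers."""
    return reduce(lambda a, b: a ^ b, values, 0)

def encode_groups(data):
    """Split data into groups of GROUP elements and append one XOR parity to each."""
    groups = []
    for start in range(0, len(data), GROUP):
        chunk = data[start:start + GROUP]
        groups.append(chunk + [xor_all(chunk)])
    return groups

def repair(group, missing_index):
    """Rebuild one lost element by reading only the rest of its own group."""
    return xor_all(v for i, v in enumerate(group) if i != missing_index)

def recoverable(failed_positions, group_size=GROUP + 1):
    """In this toy layout a failure pattern is recoverable only if
    no group loses more than one element."""
    groups_hit = [pos // group_size for pos in failed_positions]
    return len(groups_hit) == len(set(groups_hit))

groups = encode_groups(list(range(1, 13)))   # 4 groups: 3 data + 1 parity each
rebuilt = repair(groups[1], 2)               # reads 3 elements instead of 12

patterns = list(combinations(range(16), 4))  # every way to lose 4 of 16 elements
survivable = sum(recoverable(p) for p in patterns)
print(rebuilt, survivable, len(patterns))
```

In this toy, a single lost element is rebuilt from three reads rather than 12, but only 256 of the 1,820 possible four-element failure patterns are survivable. Production codes in this family aim to keep the cheap repairs while recovering from far more patterns than that.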
So when you're looking at a system, try to ensure, if you care about guaranteed recovery from any four failures, that they use something like Reed-Solomon. But if you're looking for performance, maybe accepting the slight chance that four failures result in a loss of data is the more attractive option for you. It's important to be aware of these differences. Depending on your application, you may choose one or the other.
For which applications, workloads and types of data is erasure coding most appropriate?
Miller: One type of data for which it's extremely appropriate is archival data. This is because archival data is going to be around for a long time, so the chance of losing one or even two or more devices over its lifetime goes up dramatically.
The second thing about archival data that makes it very amenable to erasure coding is you don't write it very often. You write it once, and then maybe you'll read it later on and maybe you won't. Erasure coding imposes most of its overhead on writes, but doesn't impose very much overhead on reads. In fact, the only overhead on reads is if there's a failure. If there's no failure, erasure codes really don't have an impact on reads in most cases.
For applications, that means that if you have a very active thing, where there's lots of reading and writing going on, erasure codes might not be the right choice for you. But if you have applications which are largely read-only, erasure codes can give you very good reliability, resistance to losing data at relatively little overhead, because the overhead in terms of performance comes at write time.
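The asymmetry between writes and reads can be sketched with the simplest case, a single XOR parity of the kind RAID 5 uses. The sketch assumes an in-place update of one data element and counts device I/Os only; the function names are hypothetical, and codes with more redundant pieces have to update each of them on a write.

```python
def small_write(data, parity, index, new_value):
    """Update one data element in place under a single XOR parity.

    Returns the number of device I/Os the update needed.
    """
    old_value = data[index]                            # I/O 1: read old data
    old_parity = parity[0]                             # I/O 2: read old parity
    data[index] = new_value                            # I/O 3: write new data
    parity[0] = old_parity ^ old_value ^ new_value     # I/O 4: write new parity
    return 4

def healthy_read(data, index):
    """With no failure, a read touches only the data element itself."""
    return data[index], 1

data = [1, 2, 3, 4]
parity = [1 ^ 2 ^ 3 ^ 4]

write_ios = small_write(data, parity, 2, 9)
value, read_ios = healthy_read(data, 2)
print(write_ios, read_ios, parity[0] == (1 ^ 2 ^ 9 ^ 4))
```

Here one logical write costs four device I/Os, while a read with no failure costs one, which is why write-once, read-mostly data is such a good fit.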
What distinguishes one product's erasure coding from another product's erasure coding?
Miller: I think the first thing that distinguishes them is basic performance. In other words, based on their implementation, how fast is it at writing? And how fast is it at reading when you're doing recovery?
Another question is how do they break up the units across which erasure coding is done? It could be entire disks. It could be volumes. It could be parts of disks or devices. It could be any of those things. Obviously there's a question of how big the erasure coding word is -- 10 data devices and five erasure coded devices versus 12 and four -- but usually most products are pretty configurable in that respect.
Another difference which is starting to become an issue, but isn't one necessarily, is multiple types of erasure codes. Most products that are out there use Reed-Solomon coding, and all Reed-Solomon coding works pretty much the same way, [dependent on] how fast your implementation is. But there are other forms of erasure codes where, in order to make things faster to recover, for example, they'll say that instead of being able to recover from all failures of four devices, we can recover from 99.9% of failures of four devices but not all of them. Making that small tradeoff could make your erasure coding faster, but it does slightly increase your risk. So in situations like that, it's important to understand: Can I recover from all four device failures or just most of them?
And so, there are little things like that around the edges of things that will differ between different companies' implementations. It's important to try and get a little bit more background in how an erasure code works and what kinds of failures it can recover from in deciding the difference between one product's erasure coding and another.
Do you think erasure coding is the wave of the future?
Miller: I think that for some things, erasure coding is the wave of the future. Certainly for archival storage, erasure coding in my opinion is an absolute must, because if you're going to store data for 10 or 20 years, you're going to want to survive a lot more failures. And again, for long-term archival storage, when you access the data you're almost always reading it rather than writing it. As we said before, most of the performance overhead is on writing. So for archival storage, erasure coding is absolutely the wave of the future. I think there's no question about that.
For other things, I think the question of whether erasure coding will become the wave of the future depends on the relationship between the capacity of a device and the speed at which you can read or write it. The bigger devices get relative to the speed at which you read and write them, the more erasure coding matters. That's why we've gone from RAID 5 back in the mid-1990s, when I was part of the RAID group at UC Berkeley, to RAID 6 today. It's because back when I was starting, it took you literally a few minutes to read the entire hard drive. Now it can take you the better part of a day. If it goes to a week to read your media at maximum speed, you might need even better erasure codes, because in the time you spend to rebuild it, a week, you might have two or three failures. You want to be able to recover from those. So I think that as our storage devices get bigger and not necessarily faster at the same rate, we're going to need erasure coding more and more.
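The capacity-versus-speed argument comes down to simple arithmetic, sketched below. All device figures are hypothetical round numbers chosen for illustration, not measurements, and the failure estimate assumes independent devices with a constant failure rate, which real drive populations only approximate.

```python
def full_read_hours(capacity_tb, throughput_mb_s):
    """Hours needed to read a whole device end to end at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000      # decimal units: 1 TB = 10^6 MB
    return capacity_mb / throughput_mb_s / 3600

def expected_failures_during_rebuild(devices, rebuild_hours, mttf_hours):
    """Rough expected number of further failures while a rebuild is running.

    Assumes independent devices, each failing at a constant rate of 1 / mttf_hours.
    """
    return devices * rebuild_hours / mttf_hours

# Hypothetical older drive: 1 GB read at 5 MB/s.
old_minutes = full_read_hours(0.001, 5) * 60

# Hypothetical current drive: 16 TB read at 200 MB/s.
new_hours = full_read_hours(16, 200)

# Hypothetical fleet: 100 devices, week-long rebuild, 1,000,000-hour MTTF.
extra = expected_failures_during_rebuild(100, 7 * 24, 1_000_000)

print(round(old_minutes, 1), round(new_hours, 1), round(extra, 4))
```

With these assumed figures the older drive reads in a few minutes and the newer one in the better part of a day. The longer the rebuild window and the larger the group of devices, the more likely it becomes that further failures arrive before the rebuild finishes.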