We interviewed Fusion-io Inc. CTO David Flynn for one of our news stories today–here’s some nitty-gritty bonus footage on how the company’s product goes about protecting data, and how that compares to spinning-disk systems.
Beth: So one ioDrive is 320 GB. Is data striped across all the chips or do you have separate data sets?
Flynn: Each one of the Flash modules looks like a volume and you can either stripe them or mirror them to make them look like one volume. Or if you have multiple cards you can aggregate all of those volumes with RAID 10. We have RAID 5-like redundancy on the chips, then RAID between the memory modules. What we’ve come to realize after we introduced FlashBack is that it actually lets you get more capacity.
Most SSDs are 64 GB at most—32 GB, 64 GB. With this technology we put five to 10 times as many chips within our card. That would increase the failure rate because the individual chip’s failure rates add up. With our ability to compensate, we can get to higher capacities, and with that we can increase endurance, because you can spread the data out.
Internally it’s more like RAID 50 because I have eight die in my redundancy chip. There’s one parity die for each package. It’s 24+1 and then that quantity times eight, because there’s eight of those sets. If you were to line it up like disk drives, it would look exactly like that, 24 disk drives and then an extra one, 8 rows. So when we talk about this as a SAN in the palm of your hand we really mean it, because we’ve taken die within the various NAND packages and arrayed them together just like a disk array. It’s also self-healing in that if you have a fault the system reconstructs the data that otherwise might’ve gone missing and moves it to a different spot and turns off the use of the spot that failed. You don’t have to service it. It automatically just maps it out. Like Xiotech’s ISE product—that’s bleeding edge stuff for disk arrays, and it’s built into the silicon here.
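The layout Flynn describes can be checked with a little arithmetic. This is a sketch based only on the numbers in the interview (24 data dies plus one parity die per row, eight rows); the resulting die counts and overhead figure are derived, not Fusion-io specifications.

```python
# Die-level parity layout as described: 24 data dies + 1 parity die
# per row, 8 rows per card (numbers from the interview).
DATA_DIES_PER_ROW = 24
PARITY_DIES_PER_ROW = 1
ROWS = 8

total_dies = (DATA_DIES_PER_ROW + PARITY_DIES_PER_ROW) * ROWS  # dies on the card
data_dies = DATA_DIES_PER_ROW * ROWS                           # dies holding user data
parity_overhead = 1 - data_dies / total_dies                   # fraction lost to parity

print(total_dies, data_dies, parity_overhead)
```

Run as written, this gives 200 dies total, 192 carrying data, and a parity overhead of 4 percent, which is why per-row (RAID 50-style) parity is so much cheaper in capacity than mirroring.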
Beth: What about double parity protection? That’s all the rage in the disk drive world these days. What if more than one die fails at once?
Flynn: For us to rebuild and heal takes a split second. Having a second failure during that time is not going to happen. It takes so long to rebuild a disk drive—it can take more than a day now—that the probability of a double failure goes up. The other thing is that disk drive failures are often highly correlated—the drives come from the same batch. They tend to fail randomly but close to each other in time. Our portfolio does cover N+M redundancy as well as N+1, because we anticipate a day when we’re putting not hundreds of these die on the boards but thousands, and going into the tens and hundreds of thousands.
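The rebuild-window argument can be made concrete with a rough model: treat failures of the surviving devices as a Poisson process and ask how likely a second failure is before the rebuild completes. The MTBF figure and rebuild times below are illustrative assumptions, not vendor numbers.

```python
import math

def p_second_failure(n_remaining, mtbf_hours, rebuild_hours):
    """Probability that any of n_remaining devices fails during the
    rebuild window, assuming independent exponential failures."""
    rate = n_remaining / mtbf_hours  # combined failure rate, per hour
    return 1.0 - math.exp(-rate * rebuild_hours)

MTBF = 1.2e6  # hours; a typical spec-sheet figure, assumed for illustration

disk = p_second_failure(23, MTBF, 24.0)         # day-long disk rebuild
flash = p_second_failure(23, MTBF, 1.0 / 3600)  # ~1-second on-chip heal

print(disk, flash)
```

With these assumptions the day-long rebuild is tens of thousands of times more exposed to a second failure than a sub-second heal, which is the point Flynn is making (the model deliberately ignores the batch-correlation effect he also mentions, which makes disks look even worse).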
Beth: At the same time the Flash memory has finite write endurance, so the chips are all going to wear out at some point. How do you compensate for that?
Flynn: We account for how many write cycles it’s been through so we can give somebody a running count—like an odometer, or tread wear on a tire. You can go five years or 50,000 miles. We warranty it, and you can swap out the modules without needing a new carrier card. Because we have such high capacity we naturally get a longer lifespan. It’ll last for five years even if you’re doing nothing but writing constantly. Wear-out has been overrated, I think, because most of the failures people are seeing have nothing to do with wear-out—they have to do with internal events that cause chips to lose data.
Here are the four factors. This is the dirty little secret of the NAND world. First, it’s the newest fab process, which means it has its kinks. Second, it’s the tightest feature size—they’re going to 32 nm. Third, the density of the array of cells is achieved by sharing control lines. And fourth, the real killer: moving the electrons into the floating-gate cell takes 20 volts internally. Most core voltages are well under a volt nowadays.
These four factors mean you can get a short-out event on one of these tiny little control lines. If you have just one chip it’s no big deal—the odds are 40 out of a million, which for a thumb drive nobody would notice; it’s more likely to get shorted out in your pocket. But when you put hundreds of them together, now you have hundreds of those 40-out-of-a-million chances for something to go bad, and that adds up to something like one or two percent of these things fielded having a data loss event. For a normal SSD, the way they compensate is to put fewer chips on it, or to try to sweep the problem under the carpet—what they say if you talk to them is, ‘Well, we screen it very well, we run it in advance to make sure it’s not going to happen.’ You can screen it up front, but there’s still a probability of failure.
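The "40 out of a million" arithmetic compounds quickly. A minimal sketch, taking the per-chip probability from the interview and treating chips as independent (the chip counts are illustrative):

```python
# Per-chip data-loss probability from the interview: 40 out of a million.
P_CHIP = 40 / 1_000_000

def p_any_failure(n_chips):
    """Probability that at least one of n independent chips has a
    data-loss event."""
    return 1 - (1 - P_CHIP) ** n_chips

for n in (1, 250, 500):
    print(n, p_any_failure(n))
```

One chip stays at a negligible 0.004 percent, but 250 chips compound to about 1 percent and 500 chips to about 2 percent—matching the "one or two percent fielded" figure cited above.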
Here’s the thing: disk drives wear out, too. The trouble is, it’s unpredictable. One of the strongest motivators for going to solid state technology is the predictability of when you’re going to need to service it. And after a couple of years, you’re going to be able to replace it for a fraction of what it cost initially.