
IBM sheds light on work to boost flash endurance

IBM has finally hopped onto the bandwagon with solid-state array vendors that use multilevel cell (MLC) NAND technology and guarantee the read/write endurance of flash modules. Those changes came after lots of behind-the-scenes work.

Engineers from the company’s Texas Memory Systems acquisition and IBM researchers from Zurich and other locations combined to develop the new FlashCore technology at the heart of the FlashSystem V9000 and 900 arrays.

Like only a small number of vendors, IBM buys NAND chips and makes the modules that go into its all-flash arrays (AFAs). But last year IBM was the only AFA vendor to make flash drives using enterprise MLC flash (eMLC). In the summer, IBM Fellow and CTO Andrew Walls said eMLC was an important part of IBM’s strategy, bringing a 10x improvement in endurance over typical MLC-based solid-state drives (SSDs).

Last week, Walls said, “Our design goal with the FlashCore technology, with our advanced flash management, was to take endurance out of the equation. You simply run it and don’t worry about it.”

Flash can wear out over time due to the program/erase process for writing data to NAND chips. All the bits in a flash block need to be erased before a write takes place. The program/erase process eventually breaks down the oxide layer that traps electrons at floating gate transistors, leading to errors. The industry’s wear-out figures are about 30,000 program/erase cycles for eMLC flash and, for MLC, 10,000 or even as few as 3,000 cycles.

But anecdotal evidence is mounting that flash is not wearing out as once feared.

“It’s not happening at all,” asserted Gartner Research VP Joe Unsworth, speaking at IBM’s FlashSystem launch event last week. “We see very few failures of drives period, and of course, let’s not forget SSDs fail predictably. So, you can see as this occurs. Right now, we’re seeing about every six months, 2% to 4% flash wear across the solid-state array. That’s not much at all.”
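To put Unsworth's figure in perspective, a rough extrapolation helps. The sketch below assumes wear accumulates at a constant rate, which real arrays need not do, and the function name is purely illustrative.

```python
def years_to_wear_out(wear_pct_per_six_months: float) -> float:
    """Years until 100% of rated wear is consumed, assuming a constant rate."""
    if wear_pct_per_six_months <= 0:
        raise ValueError("wear rate must be positive")
    periods = 100.0 / wear_pct_per_six_months  # six-month periods to reach 100%
    return periods / 2.0                       # two periods per year


# The 2% and 4% rates are the ones quoted above; linear wear is an assumption.
for rate in (2.0, 4.0):
    print(f"{rate:.0f}% per six months: about {years_to_wear_out(rate):.1f} years")
```

Under that assumption, the quoted rates work out to roughly 12.5 to 25 years before the rated wear is used up, well beyond a typical warranty period.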

Plenty of vendors have worked hard to improve the endurance of flash. Here’s a glimpse of what Walls said IBM did to improve the endurance of its MicroLatency flash modules without sacrificing performance or low latency.

—Collaborated with Micron, which provided the interface to the “inner workings of the flash,” enabling IBM to monitor and control the flash and change read thresholds.

—Set up a characterization lab in Poughkeepsie, New York, to test flash devices and observe how flash blocks behave as engineers tried different error correcting code (ECC) and garbage collection algorithms and other techniques.

—Developed an ECC algorithm that Walls said allows IBM to correct a high bit error rate and read data only once. “That is a significant step forward. It also allows us to stay in FPGA technology, and it is an algorithm that allows us to get extremely good endurance,” he said.

—Developed health binning and heat segregation technology instead of using the symmetric wear-leveling algorithms that Walls said ensure all cells handle about the same amount of writes in typical SSDs.

“When you do that, unfortunately the endurance of your flash is now going to be determined by your weakest cells, because they’re going to get punished the same as all the rest, and you will wear out depending on that,” he said.
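A toy calculation shows why the weakest cells set the limit under symmetric wear leveling. This is a sketch, not IBM's model: the per-block program/erase budgets are hypothetical, the first worn-out block is treated as end of life, and spare blocks and ECC are ignored.

```python
def total_writes_uniform(endurance):
    """Every block takes the same number of erase cycles,
    so the weakest block wears out first and caps the total."""
    return len(endurance) * min(endurance)


def total_writes_health_aware(endurance):
    """Idealized upper bound: each block is cycled in proportion
    to its own budget, so all blocks wear out together."""
    return sum(endurance)


# Hypothetical budgets: one weak block among three stronger ones.
endurance = [3_000, 10_000, 10_000, 10_000]

print(total_writes_uniform(endurance))       # 12000 block erases
print(total_writes_health_aware(endurance))  # 33000 block erases
```

In this simplified picture, the single weak block limits the evenly worn device to 12,000 block erases, while steering work away from it could in principle reach 33,000.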

Walls compared IBM’s approach to loading pack mules in the Grand Canyon with 50, 100 or 200 pounds, depending on what each animal can carry, so that the job gets done with half the number of animals.

“We monitor the health and assess the health of each flash block as they age, and we determine and grade each of the flash blocks. The flash blocks that are the healthiest [are] going to get the hottest data,” Walls said. He said, as flash blocks age and get weaker, they handle colder data.

“That technique alone has given us a 57% improvement in endurance in most typical workloads,” he said.
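The placement idea Walls describes can be sketched in a few lines. Everything here is an assumption made for illustration: the `health` score, the `writes_per_day` measure of heat, and the one-to-one pairing are hypothetical and are not taken from IBM's FlashCore implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    health: float  # hypothetical grade: higher means healthier


@dataclass
class DataGroup:
    name: str
    writes_per_day: int  # "heat": how often the data is rewritten


def place_by_health(blocks, groups):
    """Pair the hottest data with the healthiest blocks."""
    by_health = sorted(blocks, key=lambda b: b.health, reverse=True)
    by_heat = sorted(groups, key=lambda g: g.writes_per_day, reverse=True)
    return {g.name: b.block_id for g, b in zip(by_heat, by_health)}


blocks = [Block(0, 0.55), Block(1, 0.95), Block(2, 0.75)]
groups = [DataGroup("archive", 1), DataGroup("logs", 500), DataGroup("index", 50)]

placement = place_by_health(blocks, groups)
print(placement)  # {'logs': 1, 'index': 2, 'archive': 0}
```

The frequently rewritten "logs" data lands on the healthiest block, while the rarely rewritten "archive" data goes to the weakest one. A real controller would regrade blocks continuously as they age rather than sort once.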

Walls claimed that IBM reduced write amplification by up to 45% by grouping like heat levels.
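Write amplification is the ratio of data physically written to flash to data written by the host. A toy steady-state model suggests why grouping data by heat can lower it: when garbage collection reclaims a block, any pages still valid must be copied elsewhere first, and a block holding only hot data tends to have few valid pages left by then. The block size and valid-page counts below are hypothetical and are not meant to reproduce IBM's 45% figure.

```python
def write_amplification(pages_per_block: int, valid_pages_at_reclaim: int) -> float:
    """Toy model: reclaiming a block frees (pages - valid) pages for host
    writes but costs `valid` extra page copies, so
    WA = 1 + valid / (pages - valid) = pages / (pages - valid)."""
    if not 0 <= valid_pages_at_reclaim < pages_per_block:
        raise ValueError("valid pages must be in [0, pages_per_block)")
    return pages_per_block / (pages_per_block - valid_pages_at_reclaim)


PAGES = 64  # hypothetical pages per block

mixed = write_amplification(PAGES, 32)       # hot and cold data share blocks
segregated = write_amplification(PAGES, 16)  # hot data grouped together

print(f"mixed: {mixed:.2f}, segregated: {segregated:.2f}")
```

With these made-up numbers, halving the valid pages left at reclaim time cuts write amplification from 2.00 to about 1.33. Lower write amplification means fewer program/erase cycles consumed per byte the host writes.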

One result of IBM’s efforts is a new FlashSystem Tier 1 Guarantee, which covers “MicroLatency” performance and read/write endurance for up to seven years, as long as the system is under warranty or maintenance.

That brings IBM up to par with other all-flash array vendors. In a June 2014 guide to 15 all-flash arrays, IBM was the only vendor that would not replace flash modules if they wore out before the warranty expired. Dell’s Compellent all-flash model carried the caveat that the SSDs had to be within the “rated life” period. None of the other vendors mentioned any restrictions, although the length of their guarantees varied.