Managing and protecting all enterprise data

SSD reliability, performance aid flash storage adoption

Solid-state storage is most often viewed in terms of performance, but concerns about SSD reliability may be unwarranted.

Conversations about solid-state storage have focused primarily on performance -- understandably so. As IT demands grow, so too do the demands for performance. As a result, interest in solid-state storage has grown regardless of the form -- all-flash arrays, hybrid arrays, or even solid-state server-side caching products. While much has been said about the obvious performance benefits, we are starting to see evidence that there may also be benefits to data reliability.

Enterprise Strategy Group (ESG) recently polled 373 storage decision-makers on a wide variety of enterprise storage technology topics. Respondents familiar with solid-state storage technology were asked to select the most important factor that led them to consider solid-state storage. It wasn't surprising that the top response was improved performance, but the second most popular answer -- improved solid-state drive (SSD) reliability -- was less anticipated.

It may seem logical that fewer moving parts translate into higher reliability, but a quick glance at spec sheets comparing the mean time between failures or annualized failure rate statistics for SSDs and hard disk drives (HDDs) shows little difference.

Cheaper prices + lower endurance = higher SSD adoption?

That organizations are moving to solid-state storage for higher SSD reliability is even more notable given that the industry is in the middle of a massive effort to drive down the price of solid-state storage by reducing one aspect of SSD reliability: endurance. As solid-state deployments shift from higher-endurance single-level cell (SLC) flash to multi-level cell (MLC) or triple-level cell (TLC) flash, the relative cost of storing data falls because each cell stores more bits. As data density per cell increases, the number of program-erase (P/E) cycles, or wear, that an individual cell can endure decreases.

With solid-state storage, each write operation generates a P/E cycle, and each P/E cycle damages the media to a small extent. Different types of solid-state technology, such as SLC or MLC, support different numbers of P/E cycles over the life of the drive. That's the endurance level. SLC-based solid-state storage may, for example, support endurance levels as high as 100,000 write cycles per cell. With TLC solid-state storage, the endurance levels may be as low as 1,000 write cycles per cell.

While endurance levels can be used to project how much data can be written to solid-state storage over the life of the media, the calculation is not as simple as multiplying the endurance level by the capacity. Software factors also come into play. Behind-the-scenes activities, such as wear leveling and garbage collection, alter the number of P/E cycles incurred by each write operation, and the impact varies by storage manufacturer. Even if those activities increase the number of P/E cycles per write by a factor of five, a 1 TB SSD with an endurance of 1,000 write cycles could still support 200 TB of writes over its lifetime (1 TB x 1,000 cycles / 5). While this represents a significant decrease from the higher endurance levels supported by SLC, the real-world impact may not be as significant.
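The arithmetic behind that 200 TB figure can be sketched in a few lines of Python. This is a simplified model, not a vendor formula: the function name and the single `write_amplification` parameter are assumptions made for illustration, standing in for the combined effect of wear leveling and garbage collection.

```python
def lifetime_writes_tb(capacity_tb: float, pe_cycles: int,
                       write_amplification: float) -> float:
    """Estimate how many TB of host writes a drive can absorb.

    capacity_tb         -- usable capacity in TB
    pe_cycles           -- rated program-erase cycles per cell
    write_amplification -- P/E cycles consumed per host write
                           (hypothetical single factor; real drives vary)
    """
    if write_amplification <= 0:
        raise ValueError("write_amplification must be positive")
    return capacity_tb * pe_cycles / write_amplification


# The article's example: 1 TB drive, 1,000 cycles, factor of five.
print(lifetime_writes_tb(1.0, 1000, 5.0))     # 200.0
# Same drive at an SLC-class rating of 100,000 cycles.
print(lifetime_writes_tb(1.0, 100000, 5.0))   # 20000.0
```

The gap between the two results is large on paper, but it only matters if an application would actually write more than the lower figure before the drive is retired.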

The industry is still learning how many writes different applications require over a typical three- to five-year storage lifecycle. The strong reliability perception of solid-state storage despite decreasing endurance levels may simply reflect that the initial endurance levels offered by SLC were over-engineered in the first place and exceeded the requirements of a typical application environment.
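One way to put a lifetime write figure in context is to spread it across the storage lifecycle and see how much could be written each day. The sketch below does that for the 200 TB example; the function name, the 365-day year, and the decimal conversion of 1 TB to 1,000 GB are assumptions for illustration.

```python
def daily_write_budget_gb(lifetime_writes_tb: float,
                          lifecycle_years: float) -> float:
    """Average GB that can be written per day over the lifecycle."""
    if lifecycle_years <= 0:
        raise ValueError("lifecycle_years must be positive")
    days = lifecycle_years * 365          # assumption: ignore leap days
    return lifetime_writes_tb * 1000 / days  # assumption: 1 TB = 1,000 GB


for years in (3, 5):
    budget = daily_write_budget_gb(200.0, years)
    print(f"{years}-year lifecycle: {budget:.1f} GB/day")
# 3-year lifecycle: 182.6 GB/day
# 5-year lifecycle: 109.6 GB/day
```

Whether a budget of that size is generous or tight depends entirely on the application's write pattern, which is the open question the article describes.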

SSD reliability depends on data unavailability, loss

Another, more plausible explanation may be that perceived SSD reliability depends more on reducing the risk to data than on how often a particular technology component fails. If an SSD or an HDD fails, the data must be recreated in a separate location. Solid-state storage performance dramatically accelerates this rebuild process. So even in cases where solid-state storage and HDDs experience the same failure rates, the amount of time spent in a degraded state will likely be dramatically reduced with solid-state storage.
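The relationship between rebuild speed and time spent in a degraded state can be illustrated with a rough calculation. This is a minimal sketch that assumes a rebuild proceeds at one sustained rate; the function name and the two throughput values are hypothetical placeholders, not measurements of any product.

```python
def rebuild_hours(capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Hours needed to recreate capacity_tb of data at a sustained rate."""
    if rebuild_rate_mb_s <= 0:
        raise ValueError("rebuild_rate_mb_s must be positive")
    seconds = capacity_tb * 1_000_000 / rebuild_rate_mb_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# Hypothetical sustained rebuild rates, chosen only to show the scaling.
slow = rebuild_hours(1.0, 100.0)
fast = rebuild_hours(1.0, 500.0)
print(f"slower device: {slow:.2f} h, faster device: {fast:.2f} h")
```

Because rebuild time is inversely proportional to the sustained rate in this model, a device that rebuilds five times faster spends one-fifth as long exposed to a second failure, even if both devices fail equally often.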

The manner in which a device fails can also affect data unavailability or loss. Spinning hard drives are susceptible to a variety of unpredictable mechanical issues, such as scratches or head crashes, plus failures due to excessive heat or dust. Solid-state storage isn't susceptible to mechanical failures and can survive at higher temperatures. Common solid-state failures are often software-related, such as firmware issues, or result from exceeding the endurance levels discussed earlier. But those types of failures are often more predictable and can trigger remedial actions before a failure actually occurs.

Solid-state storage offers numerous capabilities that help reduce the risks of storing data, and organizations familiar with solid-state are catching on. As solid-state endurance levels and prices continue to decline, the resulting increased adoption rate will help the industry become better educated on the SSD reliability trade-offs for different application types. For now, however, many concerns about endurance may be unwarranted.
