Hard disk vs. flash storage: The fight of the century?
Hard disk drives are hard to like. This 60-year-old device is slow and accesses data clumsily, but it stores volumes of data at a price that's hard to beat. Since the middle of the last decade, solid-state drive advocates have argued that SSDs would completely displace HDDs within a few years. Yet here we are in 2017, and the HDD vs. SSD controversy continues. HDDs still play a greater role than SSDs in PCs and servers, and HDD unit shipments aren't much below 2005 levels.
Why hasn't flash SSD taken over the market? In a nutshell, it's because HDD replacement isn't how to make the best use of SSD technology. Figure 1 shows a continuum of memory and storage that fits into a tidy hierarchy. HDD vs. SSD technology plays a significant, but not the only, role in that hierarchy. Let's walk through this continuum and perform a detailed analysis to see why.
The chart plots price per gigabyte on the horizontal axis and bandwidth on the vertical axis. The axes are labeled in orders of magnitude -- 10^3 = 1,000, 10^4 = 10,000, and so on -- to make the largest and smallest values clearly visible.
At the bottom left, we have tape, the slowest and cheapest element still in common use. At the top right we have the L1 cache, the fastest and most expensive memory in the system. Between these, from left to right, lie HDD; SSD, labeled NAND for this figure; dynamic RAM (DRAM); and then the various other cache levels. Everything fits together neatly as long as each orb is faster than the one below it, slower than the one above, cheaper than the orb to the right and more expensive than the orb to the left.
The memory-storage hierarchy uses algorithms to automatically move hot data into the faster and more costly elements of this hierarchy and cold data into slower and cheaper elements. Cache management logic handles this at the cache-DRAM interface and between the cache levels. The DRAM-storage interface -- whether HDD or SSD-NAND -- is managed by the demand-paging operating system, which is how virtual memory is implemented. SSDs are a relatively new element in this hierarchy, so caching software, rather than the OS, manages the SSD-HDD interface. Most systems don't automate data management between tape and HDD.
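The hierarchy's "tidy" property can be expressed as a simple invariant: each tier must be faster than the tier below it and more expensive per gigabyte. The sketch below checks that invariant over a list of tiers; the latency and price figures are rough placeholder assumptions for illustration, not values read off Figure 1.

```python
# Illustrative memory-storage hierarchy, ordered from cheapest/slowest
# to fastest/most expensive. Latency (seconds) and $/GB figures are
# rough placeholder assumptions, not measurements from the article.
tiers = [
    ("tape",     10.0,      0.01),
    ("HDD",      0.005,     0.03),
    ("NAND SSD", 0.0001,    0.25),
    ("DRAM",     1e-7,      5.00),
    ("L3 cache", 1e-8,     50.00),
    ("L1 cache", 1e-9,    500.00),
]

def is_well_ordered(tiers):
    """Each tier must be faster AND pricier per GB than the tier below it."""
    return all(
        lower[1] > upper[1] and lower[2] < upper[2]
        for lower, upper in zip(tiers, tiers[1:])
    )

print(is_well_ordered(tiers))  # True: every orb fits between its neighbors
```

If any tier violated the ordering -- say, a storage class slower but more expensive than the one below it -- there would be no reason to include it in the hierarchy at all.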
To optimize a system's price/performance, IT managers must balance the amount of each memory or storage element they use. If they were to provide too much of any of these elements, the price would be higher than necessary. And if they were to use too little, then performance would suffer.
Using SSDs to reduce DRAM requirements
Because the speed difference between DRAM and HDD -- at about 1 million-to-1 -- is large, system administrators used to increase the system's DRAM to offset HDD delays. Since the introduction of inexpensive SSDs, this approach has lost its appeal. These same sys admins have learned that an SSD provides greater benefit than a larger DRAM for the same price. In other words, they're using SSDs, not to replace HDDs, but to reduce DRAM use.
HDD vs. SSD: The flash-cloud model
Some facilities have moved to a flash-cloud model, where the cloud is used for cold storage. How does this change the pricing model? Because cloud latencies are considerably longer than HDD latencies, this approach appears to be a misfit for the storage-memory hierarchy of Figure 1. By inserting on-premises HDD storage in front of the cloud, users should be able to achieve the performance of an all-flash-plus-cloud system at a lower price with an SSD-plus-HDD-plus-cloud configuration, in the same way that an SSD lets users reduce dynamic RAM usage. It's a simple case of balancing resources.
Cloud storage, meanwhile, is largely HDD based, so HDDs haven't been eliminated from the memory-storage hierarchy. They have simply been moved from one facility to another, with longer latency, different management and possibly a higher cost.
I'll explain this counterintuitive finding -- that SSDs pay off more as a DRAM offset than as an HDD replacement -- with a little math. As a rule of thumb, let's assume that DRAM is 1,000 times as fast as an SSD, and an SSD is 1,000 times as fast as an HDD. Furthermore, let's assume that $100 will buy you 10 times as many gigabytes of NAND-based SSD as DRAM.
Software accesses the memory-storage hierarchy in something similar to a bell curve; certain bytes will be accessed frequently over a short time period, other bytes far less frequently. Figure 2 is an illustration of this.
The base amount of DRAM in this chart is represented by the yellow box in Figure 3. The narrow distribution software takes advantage of this relatively small DRAM to perform well, but the broad distribution software will require much more DRAM to reach its full advantage.
In Figure 4, with double the amount of DRAM, we cover the vast majority of the accesses for the narrow distribution, but leave a significant number of accesses uncovered for the broad distribution. Each of these accesses, without an SSD, would suffer a million-times access penalty.
Instead of doubling DRAM, let's explore what happens when we add the same dollar-value worth of SSD storage to the chart. In Figure 5, the second yellow box is significantly wider than one of the DRAM boxes because you can buy 10 times as many gigabytes of SSD storage capacity for a given dollar amount than DRAM, albeit at a slower speed.
This configuration satisfies nearly all accesses of both the narrow and broad distribution programs, with the million-times penalty only hitting the smallest percentage of accesses. This is the mechanism that makes SSD a better choice than DRAM in these systems.
Let's put some numbers around that. With the minimum amount of DRAM, about 55% of the narrow distribution's accesses are satisfied. That is the area under the black curve in Figure 3 in the single DRAM box. When you double the DRAM, you satisfy about 90% of the narrow distribution's accesses, shown by the area under the black curve in Figure 4 in the two DRAM boxes. If we decide not to put in the additional DRAM, but instead add an SSD that costs the same amount as the additional DRAM, then only about 1% of accesses need to go to the HDD -- represented by the area under the black curve in Figure 5 that falls outside of both boxes.
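The coverage percentages above can be approximated by treating access frequency as a normal distribution over the address space and asking what fraction of accesses lands within a given amount of fast memory. The sketch below does this with the standard error function; the sigma "widths" and capacities are made-up illustrations, not figures taken from the article's charts.

```python
# Sketch of the bell-curve coverage idea: model access frequency as a
# normal distribution over the address space and compute the fraction
# of accesses that falls inside a given amount of fast memory.
from math import erf, sqrt

def coverage(capacity_gb, sigma_gb):
    """Fraction of accesses landing within +/- capacity/2 of the hot spot."""
    # Integral of the normal distribution over a symmetric window
    return erf((capacity_gb / 2) / (sigma_gb * sqrt(2)))

narrow, broad = 4.0, 16.0  # assumed sigmas (GB) for the two workloads

print(round(coverage(6, narrow), 2))  # small DRAM covers ~55% of the narrow workload
print(round(coverage(6, broad), 2))   # ...but far less of the broad one
print(round(coverage(60, broad), 2))  # a 10x-wider SSD tier covers nearly all of it
```

With these assumed parameters, a small fast tier captures a little over half of the narrow workload's accesses, while the broad workload only approaches full coverage once a tier roughly ten times wider -- the SSD box in Figure 5 -- is added.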
Table 1 takes all of this into account to calculate the average latency of the HDD, SSD and DRAM portion of the memory-storage hierarchy. The latency figures are expressed relative to DRAM's latency: SSD's latency of 10^3 is 1,000 times that of DRAM, and HDD's latency of 10^6 is 1 million times that of DRAM.
Doubling the DRAM satisfies 90% of all accesses, and only 10% suffer the million-times latency of the HDD. This sounds pretty good, but that million-times latency penalty has a huge impact, causing the average latency of the system to be about one-tenth that of the HDD, or 10^5. In other words, the system's overall average latency is close to 100,000 times the latency of DRAM.
If we use SSDs instead of additional DRAM, then DRAM satisfies 55% of accesses, another 44% are satisfied at 1,000 times DRAM's latency, and only 1% suffer the million-times penalty. The result is an average latency about one-tenth that of the DRAM-only system, or about 10^4.
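The two averages work out directly from the percentages and the rule-of-thumb latency ratios above. This short calculation reproduces them, with everything expressed in units of DRAM latency:

```python
# Weighted average latency for the two configurations, in units of
# DRAM latency (DRAM = 1, SSD = 1,000, HDD = 1,000,000), using the
# hit percentages from the article's Figures 4 and 5.
DRAM, SSD, HDD = 1, 1_000, 1_000_000

# Double-DRAM system: 90% of accesses hit DRAM, 10% go to the HDD.
avg_double_dram = 0.90 * DRAM + 0.10 * HDD
print(f"{avg_double_dram:,.0f}")    # on the order of 10^5

# DRAM-plus-SSD system: 55% hit DRAM, 44% hit SSD, 1% go to the HDD.
avg_dram_plus_ssd = 0.55 * DRAM + 0.44 * SSD + 0.01 * HDD
print(f"{avg_dram_plus_ssd:,.0f}")  # on the order of 10^4
```

The HDD term dominates both sums, which is why shaving the HDD's share of accesses from 10% down to 1% buys roughly a tenfold improvement even though most SSD hits are still 1,000 times slower than DRAM.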
You can see that the broad distribution software would receive an even greater benefit from using SSDs, because even the larger DRAM covers a much smaller percentage of its accesses.
It can be difficult to determine whether software has a wide or narrow distribution, so most sys admins simply try their systems with varying amounts of DRAM and SSD storage to see what provides the best price/performance.
This is where SSDs help out in servers and storage -- not as a replacement for HDDs, but as a means of coaxing more performance out of a system than could be done with larger amounts of DRAM. A growing number of sys admins are learning this, moving the focus away from HDD vs. SSD to SSD vs. DRAM.
The best system doesn't provide too much of any one element in the memory-storage hierarchy, but instead balances them for the best price/performance.
The TCO argument
The argument above only pertains to performance vs. capital costs. There is also a good argument in favor of SSDs from the total cost of ownership (TCO) perspective. You can calculate TCO using a cost model, usually an Excel spreadsheet like the free one the Storage Networking Industry Association offers.
SSDs reduce system power and cooling requirements and, in some cases, lower maintenance costs. This tips the balance toward SSDs in facilities where capital spending and operating budgets fall under the same management umbrella. That's typically the case in large data centers, but in most installations, the two budgets are managed separately and the TCO argument isn't considered.
Where TCO does play a part in storage decisions, the result is sensitive to the input numbers fed into the model. The large banking firm Citi has decided to phase all of its data centers into an all-flash approach. Citi will replace HDD-based systems at the end of their lives with flash, based on a TCO model that uses a two-year average life for HDDs and a five-year life for SSDs. Few HDD makers would agree with this assumption; they would argue that HDDs should be given the same lifetime as SSDs in these calculations, which would tilt the TCO model's results toward HDDs.
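The sensitivity to the lifetime assumption is easy to demonstrate with a toy model. The sketch below is not SNIA's spreadsheet or Citi's actual analysis -- all dollar figures are hypothetical placeholders -- but it shows how the assumed drive lifetime can flip the outcome:

```python
import math

# Toy TCO model: capital cost of drive replacements over a planning
# horizon, plus power/cooling. All figures are hypothetical placeholders.
def five_year_tco(unit_price, lifetime_years, annual_power_cost, horizon=5):
    replacements = math.ceil(horizon / lifetime_years)
    return replacements * unit_price + horizon * annual_power_cost

# Citi-style assumption: 2-year HDD life vs. 5-year SSD life.
hdd = five_year_tco(unit_price=300, lifetime_years=2, annual_power_cost=40)
ssd = five_year_tco(unit_price=800, lifetime_years=5, annual_power_cost=10)
print(hdd, ssd)  # three HDD purchases vs. one SSD purchase: SSD wins

# Equal 5-year lifetimes, as HDD makers would argue.
hdd_equal = five_year_tco(unit_price=300, lifetime_years=5, annual_power_cost=40)
print(hdd_equal)  # a single HDD purchase: now the HDD comes out ahead
```

With the short HDD lifetime, the HDD side pays for three drives over five years and loses; grant both technologies the same lifetime and the cheaper HDD wins, even with its higher assumed power cost.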
HDD vs. SSD in PCs
What about PCs? Why do most PCs still ship with HDDs?
The HDD vs. SSD purchasing decision has more to do with consumer buying patterns than price/performance. In a typical PC advertisement, the price and manufacturer's name are listed at the top in large print. After that, the processor is named. Immediately below that the DRAM and HDD sizes are given in gigabytes or terabytes. Everything else falls below. Depending on the size of an ad, it may list additional specs and descriptions, but they're always preceded by the above figures.
With this limited information, the consumer, who may not be so technically inclined, will choose which PC to buy. Should it be the one that costs $500 and has 2 GB of DRAM and a 1 TB HDD, or the one that also costs $500 but has the same 2 GB of DRAM and a 128 GB SSD? Alternatively, should the buyer pay $500 for a PC with 2 GB of DRAM and a 1 TB HDD or $900 for one with the same 2 GB of DRAM and a 1 TB SSD?
When you look at it this way and are interested in getting the most for your money, an SSD doesn't look that good. This is especially true if you don't know the SSD's benefits, or if you've used an SSD and all it did was make your PC boot and launch programs faster. With the notable exception of high-end games, most PC workloads fit into a relatively small DRAM, so an SSD doesn't provide other noticeable performance improvements.
From this perspective, it's easy to understand why the majority of PC purchasers steer away from SSDs and buy HDD-based PCs.
Intel Optane's role in HDD vs. SSD
Intel has recently introduced its Optane memory products in PCIe nonvolatile memory express and DIMM formats. Both fit between dynamic RAM and NAND in Figure 1, and they will drive sys admins to rethink their storage hierarchy balance in favor of reduced DRAM needs, as the Optane layer fills a growing speed gap between NAND flash and DRAM at a price below that of DRAM.
When will SSDs be cheaper than HDDs?
For more than a decade, SSD advocates have noted that SSD and HDD prices are rapidly converging, and argued that the price differential between them will disappear, eliminating HDDs' price advantage over SSDs. In reality, although this convergence has been projected for more than a decade, it's still a long way off.
Figure 6 shows historical HDD and NAND flash pricing.
A careful examination of this chart indicates that the lines are gradually converging, but probably won't meet for 15 to 20 years. Overall, they're moving on roughly parallel paths, and for good reason:
- SSD prices are driven by Moore's Law, with flash prices declining by about 30% per year.
- The storage capacity of an HDD is increasing at about 30% per year, according to the ASTC Technology Roadmap. And the highest-volume HDDs on the market, those that offer the highest storage capacity for the lowest price at any point in time, tend to be sold at the same price -- about $50 -- leading to a similar annual price reduction.
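The consequence of those two bullet points can be sketched numerically: if both technologies' price per gigabyte declines at about the same 30% annual rate, the SSD/HDD price ratio simply doesn't move. The starting prices below are rough 2017-era placeholders, not data from Figure 6.

```python
# Price-per-gigabyte projection under the article's assumptions: both
# SSD and HDD $/GB decline ~30% per year. Starting prices are rough
# placeholder assumptions, not figures from the pricing chart.
ssd_per_gb, hdd_per_gb = 0.25, 0.03   # $/GB today (assumed)
ssd_decline, hdd_decline = 0.30, 0.30  # annual price decline rates

for year in range(0, 21, 5):
    ratio = (ssd_per_gb * (1 - ssd_decline) ** year) / \
            (hdd_per_gb * (1 - hdd_decline) ** year)
    print(year, round(ratio, 2))  # the ratio never changes

# With equal decline rates, the SSD/HDD price ratio stays constant;
# convergence requires SSD prices to fall faster than HDD prices.
```

In other words, the gap closes only during periods when flash prices fall faster than 30% a year -- which happens, but not consistently enough to close an 8-to-10x gap any time soon.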
The bottom line is that SSDs won't be competing directly on price against capacity HDDs any time soon.
Room for HDDs
Now that we've looked at the use of HDD vs. SSD in the data center and PC applications, some simple conclusions are clear:
- The data center will continue to use HDDs for storage in order to maintain the best price/performance benefits, while carefully balancing capacity at all stages of the memory-storage hierarchy.
- PCs will continue to embrace HDDs for their capacity-per-dollar value even though sophisticated users may prefer the advantages SSDs offer.
There is, indeed, room for HDDs in a solid-state world, today and in the foreseeable future.