Almost every all-flash array vendor has made a claim that flash storage prices have reached parity with disk-based arrays. This is despite the fact that, on a per-gigabyte basis, hard disks are still much less expensive. So, are these price-parity claims legitimate, and if so, how are they making the math work?
Performance flash vs. performance hard disk
It is important to remember that when all-flash array vendors claim price parity with hard disk drives (HDDs), they are comparing their arrays not with high-capacity, low-performance hard disk arrays, but with high-performance HDD arrays. These arrays typically use much lower-capacity drives, because they need a large number of HDDs to meet both the performance and capacity demands of the applications or environments they support. The rest of the components in these systems are also tuned for high performance, which means high-priced CPUs and networking. As a result, an HDD-based system may have twice as many drives as an all-flash system, yet still deliver less performance and cost more than the equivalent flash array.
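The sizing logic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the workload size, the IOPS target and the per-drive figures are hypothetical values rather than vendor specifications, and `drives_needed` is a name made up for the example. It shows that an array must hold enough drives to satisfy whichever requirement, capacity or performance, demands more of them.

```python
def drives_needed(capacity_gb, iops, drive_capacity_gb, drive_iops):
    """Drives required to satisfy both the capacity and the IOPS target."""
    for_capacity = -(-capacity_gb // drive_capacity_gb)  # ceiling division
    for_performance = -(-iops // drive_iops)
    # The larger of the two requirements dictates the drive count.
    return max(for_capacity, for_performance)


# Hypothetical workload and drive figures, chosen only for illustration.
WORKLOAD_GB = 50_000
WORKLOAD_IOPS = 40_000

hdd_count = drives_needed(WORKLOAD_GB, WORKLOAD_IOPS,
                          drive_capacity_gb=600, drive_iops=200)
ssd_count = drives_needed(WORKLOAD_GB, WORKLOAD_IOPS,
                          drive_capacity_gb=1_000, drive_iops=20_000)

print(hdd_count, ssd_count)  # 200 50
```

With these assumed numbers, the HDD array is sized by performance (200 drives where 84 would cover the capacity), while the flash array is sized by capacity alone.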
Beyond that, the high performance of flash allows for denser virtual machine implementations. This reduces the number of servers that need to be purchased for a given virtual server or virtual desktop environment, lowering overall cost. Flash also increases database performance. With flash storage, a single database server can support hundreds, if not thousands, more users. This reduces the number of database servers that need to be purchased and simplifies database design. It is easier to create and manage a large database on a single server than one that is scaled across multiple servers.
The other part of the equation impacting flash storage prices is the widespread use of data efficiency technologies like deduplication and compression, which essentially squeeze more data onto fewer flash drives. Again, because flash does not need additional drives to meet the performance demand, having fewer drives is not an issue for an all-flash array. Thanks to the raw performance of all-flash systems, data efficiency can be applied to these systems with almost no noticeable penalty.
The effectiveness of deduplication and compression largely depends on the type of data being stored. Generally, virtual server or desktop workloads will see the best return on the deduplication investment -- as much as 9:1. Databases, on the other hand, will see much less benefit from compression, typically around 3:1. The industry has settled on effective rates of 5:1, assuming some mixture of virtualization and databases on the same system.
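To see how a mixed workload could land near 5:1, and what that does to the price per gigabyte, here is a rough Python sketch. The 60/40 split between virtualization and database data and the raw price of $10 per GB are assumptions made for the example, not figures from any vendor, and both function names are hypothetical. Because each workload's physical footprint is its logical size divided by its ratio, the blended ratio is a weighted harmonic mean rather than a simple average.

```python
def blended_ratio(mix):
    """Overall data reduction ratio for a mix of workloads.

    mix is a list of (share_of_logical_data, reduction_ratio) pairs
    whose shares should sum to 1.
    """
    # Physical space consumed per unit of logical data stored.
    physical = sum(share / ratio for share, ratio in mix)
    return 1 / physical


def effective_cost_per_gb(raw_cost_per_gb, ratio):
    """Cost per logical GB once data reduction is taken into account."""
    return raw_cost_per_gb / ratio


# Assumed mix: 60% virtual machine data at 9:1, 40% database data at 3:1.
ratio = blended_ratio([(0.6, 9), (0.4, 3)])
print(round(ratio, 2))  # 5.0

# Assumed raw flash price of $10 per GB, for illustration only.
print(round(effective_cost_per_gb(10.0, ratio), 2))  # 2.0
```

Under these assumptions, a 5:1 reduction cuts the effective price per gigabyte to one-fifth of the raw price, which is the arithmetic behind many price-parity claims.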
While HDD-based systems can certainly implement data efficiency technologies, there is greater concern about the performance impact because, unlike flash systems, they do not have performance to spare. Deduplication in particular requires an extensive metadata table structure to verify the uniqueness of data. The time it takes to traverse that table on an HDD-based system would have a noticeable performance impact on production applications.
Together, fewer physical devices (SSDs vs. HDDs) and the capacity gains from deduplication and compression create a scenario where all-flash systems have not only reached price parity with disk-based systems but, in some cases, actually have a price advantage. And these are hard cost savings; they do not include the soft cost savings that come from the simpler storage tuning and design of all-flash systems.
The all-flash delusion
Today, some vendors are predicting the coming of the all-flash data center. This prediction assumes HDD manufacturers and HDD storage system vendors will sit still and stop innovating. While there were a couple of years when capacity per drive stalled, those days are over. Thanks to helium drives and shingled magnetic recording, capacity per drive is back on the rise. 8 TB drives are slowly working their way into the data center, and 10 TB drives should be a reality by the end of 2015.
Storage system vendors are offering new ways to deliver high-capacity drives to the data center -- particularly with object-based storage. These systems are scale-out in nature and use new data protection technologies to eliminate lengthy drive rebuilds. They also have the ability to store an unlimited number of objects (files) per volume. From a capacity standpoint, they can reach a much lower price per TB than flash systems, without using the data efficiency techniques described earlier.
These systems will be used to store unstructured data, the fastest-growing type of data in the data center. Not only are users creating files from office productivity applications, but file data is also coming from machines and sensors. This data has an extremely high creation rate and is often unique, so it cannot be deduplicated. It will be very difficult, if not impossible, for flash storage prices to reach parity with these systems.
So, flash storage prices have indeed reached parity with performance HDD-based arrays and that part of the storage infrastructure will likely become all-flash in the near future. But virtualization and database environments will grow more slowly than the unstructured data sets, which HDD-based systems are much better equipped to handle.
Because of this, most data centers will likely have two tiers: a flash tier for virtual infrastructures and databases, and a high-capacity, object-based storage system for the unstructured data set. The data center may never be able to consolidate all data into a single storage system, but the two-tier architecture is the best application of available tools for two vastly different jobs.