Published: 04 May 2017
The concept of an all-flash data center has been floating around for several years, driven by the continual decrease in cost of the media and ongoing improvement in the technology that surrounds it. The result is almost every all-flash vendor is claiming price parity with hard disk-based systems.
If you can substantiate the claim of price parity with HDDs, and apply it liberally, then there is no reason for an organization not to go all flash. Imagine a data center where every I/O request is answered instantly, no matter the type of data. It would permanently eliminate data management and performance tuning headaches. The future of all-flash storage systems is bright. The cost per gigabyte of flash media continues to decline, while the software technology managing those systems delivers increasing levels of reliability. Storage system software is also evolving, becoming increasingly multithreaded to extract additional performance out of CPU cores to keep pace with flash performance in a more cost-effective way.
Hurdles of the all-flash data center
In reality, asterisks associated with the price parity claim make creating a single flash tier difficult. First, the price comparison is being made to high-performance hard disk systems, not more cost-effective high-capacity systems. Second, most all-flash vendors design their arrays for block-based data. While some incorporate NAS functionality, few have integrated object storage, which is quickly becoming the storage method of choice for unstructured, machine-generated data. There are some exceptions, but generally, most unstructured data use cases simply can't take advantage of flash performance. Flash is almost always a perfect match for block storage, but not always the ideal storage type for NAS or object storage, where hard disk drives are often the better, more cost-effective choice. Third, even within the flash category, there are multiple types of flash media, each better for certain use cases (see "Capacity-centric all-flash").
Then there is the data. Even though organizations create and store more data than ever, they will never access most of that data again after the initial creation and modification phase. But, of course, we keep that data "just in case." If there weren't cheaper alternatives, then storing all this data on flash would be fine: when "just in case" occurs, the response would be instant.
The problem is, at least for all-flash data center proponents, there are less expensive storage options available, three to be exact: tape, high-capacity HDDs and the cloud. Each of these has its negatives: Tape is operationally expensive, cloud introduces new security considerations and all have administrative challenges with transferring data from an all-flash tier. But there is little doubt moving older data to one of these older and slower technology tiers should save money.
Capacity-centric all-flash
While high-capacity hard disk drive arrays, tape and public cloud remain less expensive on a cost-per-gigabyte basis vs. performance-oriented all-flash arrays, there are high-capacity all-flash arrays that may sway some organizations away from more traditional storage methods. These capacity-first all-flash arrays leverage an advantage that, as of yet, HDDs have not been able to compete with: density.
Because flash storage is memory-based, it is possible to squeeze far more capacity into a rack unit. Some capacity-focused all-flash arrays even deliver petabytes of capacity in just a few rack units.
These systems are flash-based, so performance is excellent, but it won't match that of a high-performance flash array. The reason is that these systems don't offer the processing power of a performance-oriented all-flash array. While the speed difference between the flash media used is negligible, the ability of the surrounding system to exploit that speed is reduced. But for particular use cases (e.g., analytics, active archive and secondary storage), performance is well ahead of any other competing technology.
That brings us back to the original question, "How much flash is enough?" The answer -- as is almost always the case in IT -- is, it depends. Generally, use as little flash as possible, as long as you can find cost-compelling alternatives. To realize these savings, however, you will need a method to move old data from high-performance storage to cost-effective storage as seamlessly as possible. If transparent data movement isn't achievable, then the savings gained by implementing a cost-effective tier of storage will quickly be lost.
That means the deciding factor to the all-flash data center is the effectiveness of your data movement strategy.
Effective data movement strategies
The concept of moving inactive data from an expensive tier of storage to a less expensive tier has been around for decades. The problem is most data movement options were terrible, and IT professionals came to the conclusion that managing their data wasn't worth the effort. But things changed in recent years, and moving data between tiers of storage is now easier to implement and manage.
The first step in making the movement of data easier was the introduction of hybrid storage systems. These storage systems move data within themselves, typically from a small flash tier to a large hard disk tier. Interestingly, most hybrid vendors now also provide all-flash products and indicate their all-flash arrays outsell their hybrid systems.
Some vendors point to their all-flash products outselling hybrid systems as proof that the move to the all-flash data center is well underway. While hybrid systems make sense for many environments, the appeal of all-flash is too alluring. The challenge for hybrid systems is they need to be capable of delivering all-flash performance and hard disk performance. Since the internals -- compute, memory, networking -- of a hybrid system must sustain the performance capabilities of flash, the price of those internals will be higher than if it were a standard array that only needed to sustain the performance capabilities of hard disk drives.
To an extent, then, all-flash vendors are correct in stating that for high-performance, active or near-active data, an all-flash system may be ideal. The future of hybrid systems (see "The role of hybrid arrays") is in managing different types of flash, not in managing flash and HDDs.
The role of hybrid arrays
With all-flash arrays capturing more and more share of the primary storage market, what is the role of hybrid arrays today and in the future? Essentially, hybrid arrays were gateways to all-flash arrays. While the use case for a single storage system with mixed flash and hard disk drives is still practical, the reality is most organizations will decide that all-flash is simply the path of least resistance. As a result, most hybrid-array vendors now deliver all-flash configurations.
Despite the all-flash movement, hybrid vendors will see a renaissance in the use of their technology. It will be able to move data from faster, lower-latency forms of flash, such as memory bus-attached or nonvolatile memory express (NVMe) flash, to SAS- or SATA-connected flash.
So, in spite of what you may have heard, hybrid vendors are well positioned to leverage nonvolatile dynamic RAM as a small, primary tier and then destage to a flash tier. It won't be hard disk, but it will still be hybrid.
There is still inactive data to contend with, and there must be a more cost-effective way to store it. A storage system solely focused on that data type enjoys a significant cost-per-gigabyte advantage over hybrid arrays, but it introduces a separate system. Data centers need to move data between these different types of storage systems on a policy-driven basis. The good news is there are a growing number of software products to do just that.
Hardware-independent data movement
Hardware-independent data movement products come in several forms. The first is a global file system. These are file systems that stretch across physical storage hardware components and move data between them based on policy. The range of hardware that can be part of a global file system is almost unlimited, with most supporting movement between flash and hard disk-based systems, some supporting movement to cloud storage (private and public) and some even supporting movement to tape. Given that global file systems are enhanced file systems, they limit support to data that can reside on NFS or SMB mounts. All the storage hardware that is part of a global file system must also be able to create an NFS or SMB share for it to manage.
In addition to global file systems, there's a new type of hardware-independent data movement from companies such as Primary Data, ioFabric and Strongbox. Their software does not require all data be hosted on a file share or all participating storage hardware to present its storage as an NFS or SMB share. Some of these can even encompass block or object-based storage systems and move data between them.
In either case, hardware-independent data movement allows you to implement all-flash arrays to increase performance of active data and less expensive, capacity-optimized storage systems for inactive data. It transparently moves data between these task-specific systems based on policies established by the organization, the most common of which would be "If data has not been accessed in a set number of days, move it from the flash arrays to capacity storage."
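The age-based policy quoted above can be illustrated with a minimal Python sketch. It assumes both tiers are visible as ordinary directory trees and that the file system records last-access times reliably (many Linux mounts use relatime or noatime, which makes access times coarse or stale). The function names, the directory arguments and the 90-day threshold in the usage note are hypothetical and do not come from any product named in this article. Unlike those products, the sketch is not transparent: it simply relocates files and leaves no stub or unified namespace behind.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_inactive_files(flash_root, max_idle_days, now=None):
    """Return files under flash_root not accessed in the last max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    inactive = []
    for path in Path(flash_root).rglob("*"):
        # st_atime is the last-access time reported by the file system.
        if path.is_file() and path.stat().st_atime < cutoff:
            inactive.append(path)
    return sorted(inactive)


def move_to_capacity(flash_root, capacity_root, max_idle_days, now=None):
    """Move inactive files to the capacity tier, keeping their relative paths."""
    moved = []
    for src in find_inactive_files(flash_root, max_idle_days, now):
        dest = Path(capacity_root) / src.relative_to(flash_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Calling move_to_capacity("/mnt/flash", "/mnt/capacity", 90) with those placeholder mount points would relocate every file idle for more than 90 days. A real data mover would also have to handle files that are open, preserve permissions and recall data on access.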
So while all-flash arrays are at price parity with hard disk-based arrays for active data, there are still less expensive alternatives for inactive data. High-capacity HDD arrays, object storage, public cloud storage and even tape perform adequately at a lower price than flash arrays. The case for an all-flash data center is that the effort involved in deciding what data should reside where, and how to move that data to the alternate storage types, can be more costly than the upcharge for flash everywhere. The problem is this stance ignores just how inexpensive capacity storage products are and how easy it is becoming to seamlessly move data to them.
Most data centers should only consider all-flash arrays for active and near-active data, at most 20% of total capacity. The remaining 80% of data center storage should reside on capacity storage -- high-capacity NAS, object storage or public cloud storage.
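To see how such a split affects cost, the blended cost per gigabyte can be worked out as a weighted average of the two tiers. The Python sketch below is only an illustration of that arithmetic. The function names are hypothetical, the prices are placeholders to be replaced with quoted figures, and the optional movement_overhead_per_gb parameter is an assumed way to account for the cost of running a data mover.

```python
def blended_cost_per_gb(flash_fraction, flash_cost_per_gb,
                        capacity_cost_per_gb, movement_overhead_per_gb=0.0):
    """Weighted-average cost per GB of a two-tier design.

    flash_fraction is the share of total capacity on flash (0.0 to 1.0).
    movement_overhead_per_gb is an assumed per-GB cost of the data
    movement software and administration, spread over all capacity.
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    tiered = (flash_fraction * flash_cost_per_gb
              + (1.0 - flash_fraction) * capacity_cost_per_gb)
    return tiered + movement_overhead_per_gb


def savings_vs_all_flash(flash_fraction, flash_cost_per_gb,
                         capacity_cost_per_gb, movement_overhead_per_gb=0.0):
    """Per-GB saving of the tiered design compared with flash everywhere."""
    return flash_cost_per_gb - blended_cost_per_gb(
        flash_fraction, flash_cost_per_gb,
        capacity_cost_per_gb, movement_overhead_per_gb)


if __name__ == "__main__":
    # Placeholder prices, not market data: flash at 1.00 and capacity
    # storage at 0.25 per GB, with 20% of capacity on flash.
    print(blended_cost_per_gb(0.2, 1.00, 0.25))
    print(savings_vs_all_flash(0.2, 1.00, 0.25))
```

A negative result from savings_vs_all_flash means the movement overhead has wiped out the benefit of the cheaper tier, which is the situation all-flash proponents argue for.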
So maybe we should be talking about a one-fifth-flash data center instead of an all-flash data center.