Flash storage systems changed the enterprise. The move to an all-flash array has almost eliminated storage performance problems, for now. But user expectations and the sophistication of applications will quickly catch up, and the need to improve performance will never stop. Storage vendors are not standing still, however, and there are several innovations on the horizon that will allow all-flash systems to stay ahead of users' expectations.
Making flash faster
The key to making flash faster has very little to do with the media itself and more to do with the infrastructure that surrounds it. For the most part, flash has actually gotten slower as density has increased, especially on write I/O. That said, its performance is still substantially better than that of hard disk alternatives, and the media is still much faster, with much lower latency, than the components that surround it.
The challenge facing flash vendors is that the media is so fast and adds so little latency that the rest of the solid-state package slows it down. Whether it is a flash drive or an all-flash array, vendors need to improve the packaging in order to improve performance.
It's the CPU
Today's flash storage systems are primarily software and most often run on relatively standard Intel server hardware. At the heart of the hardware is the CPU. The faster the CPU, the faster the software executes and the faster the all-flash array appears to be. In fact, most performance upgrades to all-flash arrays over the past three to four years have had much more to do with the power of the CPU than improvements to the media itself.
The problem facing storage software vendors is that CPUs are becoming more powerful less through raw speed boosts than through increasing core density. Only a few vendors have fully exploited multithreading to make proper use of all the cores in the hardware their software runs on. Those vendors have achieved industry-leading performance with fewer CPUs, since they can use all the available cores, which gives them a competitive cost advantage.
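To make the multicore point concrete, here is a minimal Python sketch, not any vendor's implementation. It treats block fingerprinting as a stand-in for per-I/O software work and spreads it across a thread pool. The block size, function names and workload are all assumptions chosen for illustration.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical 64 KiB block size


def fingerprint(block: bytes) -> str:
    """Return a SHA-256 digest for one block (stand-in for per-I/O work)."""
    return hashlib.sha256(block).hexdigest()


def fingerprint_serial(blocks):
    """Process every block on a single thread."""
    return [fingerprint(b) for b in blocks]


def fingerprint_parallel(blocks, workers=None):
    """Spread the same work across a pool of worker threads.

    hashlib releases the GIL while hashing large buffers, so the
    threads can run on separate cores for this particular workload.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with blocks.
        return list(pool.map(fingerprint, blocks))


# Illustrative data: 256 random blocks.
blocks = [os.urandom(BLOCK_SIZE) for _ in range(256)]
```

Both functions return the same digests in the same order. The only difference is how many cores can work on them at once, which is the gap the article says few vendors have closed.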
Are hard drives dead?
With all the advancements in flash storage and the publicity that the technology gets, it is fair to ask about the future of the hard disk drive. Most flash vendors now claim price parity with HDD-based systems. If an organization can get a flash array for the same price as a hard drive system, why buy a hard disk?
First, you have to closely examine the first part of that question. Have flash systems really reached price parity with HDD systems? When comparing flash to high-performance hard disk arrays, the answer is yes. But when comparing to capacity HDDs, the answer is, in general, no. Modern object storage systems can safely use 8 TB-plus hard drives, and even apply deduplication to them, while maintaining a considerable cost advantage over flash systems.
Certainly, there is a significant performance difference, but for data that is being archived or simply doesn't require the performance of a flash array, these systems are more cost-effective options.
More efficient storage services
Storage systems, by and large, are known for the features they deliver -- especially all-flash storage systems. In addition to standard software features like snapshots and replication, most all-flash arrays provide cost-saving features like deduplication and compression. Hybrid flash storage systems, meanwhile, automatically move data between flash and HDD tiers. Eventually, this data movement may happen between multiple types of flash offering different levels of performance.
The problem is that each of these features requires computational overhead and, in most cases, adds to the I/O burden. Software vendors are working on making their applications more efficient so they reduce the amount of latency their products add to the overall flash storage system. Obviously, one way to address this is to leverage multicore processors, as described above. In addition, vendors need to improve deduplication and compression efficiency. This improvement will come largely by changing the way the array manages the metadata overhead that each of these features requires.
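The metadata cost of deduplication and compression can be illustrated with a toy block store. This is a sketch under stated assumptions, not how any particular array works. The class name, the 32-byte SHA-256 fingerprint and the rough metadata estimate are all hypothetical choices.

```python
import hashlib
import zlib


class DedupStore:
    """Toy block store: deduplicate by content hash, then compress."""

    def __init__(self):
        self.chunks = {}     # fingerprint -> compressed unique block
        self.refcount = {}   # fingerprint -> number of logical references
        self.volume = []     # logical block order, stored as fingerprints

    def write(self, block: bytes) -> None:
        fp = hashlib.sha256(block).digest()
        if fp not in self.chunks:
            # First time this content is seen: compress and store it once.
            self.chunks[fp] = zlib.compress(block)
            self.refcount[fp] = 0
        self.refcount[fp] += 1
        self.volume.append(fp)

    def read(self, index: int) -> bytes:
        return zlib.decompress(self.chunks[self.volume[index]])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def metadata_bytes(self) -> int:
        # Rough estimate: one 32-byte fingerprint per logical block
        # plus one per unique chunk; ignores container overhead.
        return 32 * (len(self.volume) + len(self.chunks))


store = DedupStore()
block_a = b"A" * 4096
block_b = bytes(range(256)) * 16
for blk in (block_a, block_b, block_a, block_a):
    store.write(blk)
```

Every write pays for a hash computation and an index lookup, and every read pays for a lookup and a decompression. That is the computational and metadata overhead the paragraph above describes, and it is where efficiency gains would have to come from.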
NVMe: Faster flash connections
Another area to explore is the connections within the flash array. Today, most all-flash arrays are essentially servers running storage software. Those servers have CPUs connected to the flash drives, typically through a SAS connection. While SAS has plenty of raw bandwidth, the technology was designed in the hard drive era, not the flash era. That means it uses standard SCSI protocols to attach SAS flash drives.
The SCSI protocol adds latency, so vendors looked for something better, with some even creating their own proprietary protocols. While these proprietary protocols improved performance, if left to continue, every flash vendor offering would require its own driver. In the enterprise, this means that one server would need a flash driver for each flash device it wants to store data on. The vendors would also have to develop drivers for every OS and environment.
What vendors and IT professionals needed was a standard protocol specifically for accessing flash storage systems. The industry responded with nonvolatile memory express (NVMe), a standardized protocol designed specifically for memory-based storage technology.
NVMe streamlines the software I/O stack by reducing the unnecessary overhead introduced by the SCSI stack. It also supports far more queues: up to 64,000, compared with the single queue supported by the legacy Advanced Host Controller Interface (AHCI) used for SATA drives. And since each NVMe queue can hold 64,000 commands (up from the 32 commands AHCI supports in its single queue), an NVMe drive should be 2x to 3x faster than a comparable SAS or SATA drive. Also, since NVMe is an industry standard, drives from one vendor should interoperate with another vendor's drives.
Flash drive vendors are quickly adopting and implementing NVMe in their drives, while most flash array vendors have either announced or are set to announce NVMe-based versions of their products. The result is that the movement of data within the storage system should improve significantly over the next year. For shared storage systems, however, there is still a storage network that needs traversing.
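The queue figures above are easier to compare as a quick calculation. The sketch below uses the rounded numbers quoted in this article; the exact limits in the NVMe specification differ slightly. The constant and function names are illustrative only.

```python
# Figures as quoted in the article (rounded); the NVMe specification's
# exact limits differ slightly.
AHCI_QUEUES = 1
AHCI_COMMANDS_PER_QUEUE = 32
NVME_QUEUES = 64_000
NVME_COMMANDS_PER_QUEUE = 64_000


def max_outstanding(queues: int, commands_per_queue: int) -> int:
    """Upper bound on commands that can be queued at once."""
    return queues * commands_per_queue


ahci_limit = max_outstanding(AHCI_QUEUES, AHCI_COMMANDS_PER_QUEUE)
nvme_limit = max_outstanding(NVME_QUEUES, NVME_COMMANDS_PER_QUEUE)

print(f"AHCI: {ahci_limit:,} outstanding commands")
print(f"NVMe: {nvme_limit:,} outstanding commands")
print(f"Ratio: {nvme_limit // ahci_limit:,}x")
```

The ratio in queue capacity is far larger than the 2x to 3x speedup cited, because queue depth is a ceiling on parallelism, not a measure of speed. Real gains depend on how much concurrent I/O the software and the media can actually sustain.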
Most major networking vendors, including Brocade and Cisco, have announced support for NVMe over Fabrics, which should be available in both Ethernet and Fibre Channel flavors. This standard will take longer to work its way into the data center, but over the next few years, many data centers will make the transition. The good news is that most products coming to market will support both legacy SCSI-type access and NVMe simultaneously.
For now, most gains in connectivity will come from continuing increases in bandwidth and the more intelligent use of that bandwidth.
Most NVMe products install through the PCIe interface, but there is a faster channel available to storage memory providers: the memory bus itself. While the PCIe bus is a shared bus used for a variety of connections, the only devices on the memory bus are memory modules. Obviously, the memory bus has primarily been the domain of dynamic RAM (DRAM), but now, flash manufacturers are looking to exploit this high-speed path to the CPU as well. While a flash DIMM is slower than DRAM, it offers a much higher capacity per DIMM and is much less expensive.
Vendors have delivered two forms of flash DIMM technology. In the first form, the flash DIMM looks like a flash drive, and it is used as a high-speed storage device. The DIMM-as-storage option is an ideal place to put very active files like virtual memory paging files.
The other form of flash DIMM technology is to have the flash DIMM act as memory instead of storage. The same advantages apply -- density and cost -- and the disadvantage -- lower performance than DRAM -- is not as significant as you might think. In most designs, the flash DIMM is paired with a DRAM DIMM that acts as a cache in front of it. New writes go to DRAM and are later destaged to the larger flash area; the data is brought back into DRAM when it needs to be read again.
Flash memory is not the endgame of memory-based storage technology. Remember that DRAM is still faster (especially on write I/O), and it is more durable. But DRAM's volatility is its biggest weakness. The next step in memory evolution is to add persistence to DRAM. Several technologies, collectively known as nonvolatile memory, are competing for the attention of systems manufacturers and IT professionals.
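One reading of the DRAM-plus-flash arrangement described above can be sketched as a two-tier store. This is a toy model, not a description of any shipping product. The class name, the least-recently-used destaging policy and the slot count are all assumptions.

```python
from collections import OrderedDict


class TieredMemory:
    """Toy model: small fast DRAM tier in front of a large flash tier."""

    def __init__(self, dram_slots: int):
        if dram_slots < 1:
            raise ValueError("dram_slots must be at least 1")
        self.dram_slots = dram_slots
        self.dram = OrderedDict()  # key -> value, oldest entry first
        self.flash = {}            # larger, slower tier

    def _make_room(self) -> None:
        # Destage least recently used entries until DRAM has a free slot.
        while len(self.dram) >= self.dram_slots:
            key, value = self.dram.popitem(last=False)
            self.flash[key] = value

    def write(self, key, value) -> None:
        if key in self.dram:
            self.dram.pop(key)
        else:
            self.flash.pop(key, None)  # drop any stale flash copy
            self._make_room()
        self.dram[key] = value         # new writes always land in DRAM

    def read(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)  # mark as recently used
            return self.dram[key]
        value = self.flash.pop(key)     # KeyError if never written
        self._make_room()
        self.dram[key] = value          # promote back into DRAM
        return value


mem = TieredMemory(dram_slots=2)
mem.write("a", 1)
mem.write("b", 2)
mem.write("c", 3)  # DRAM is full, so "a" is destaged to flash
```

Reading "a" at this point would pull it back from flash into DRAM and push the least recently used entry out in its place. Only reads that miss DRAM pay the flash latency, which is why the lower performance of flash matters less than it might seem.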
One of those technologies is Intel's 3D XPoint. Intel claims that these devices will have lower latency, higher write performance and better durability for about double the price of flash memory. But Intel is not the only company offering nonvolatile memory products. Companies like Crossbar, Everspin and others are also bringing products to market.
The key payoff for flash as system memory is the potential of deploying twice as much memory per server at about half the cost. That combination is ideal for modern scale-out applications like Cassandra, Couchbase, Spark and Splunk. Most of these environments face the challenge of managing node proliferation, but that proliferation is caused by a shortage of memory, not CPU performance.
Another interesting use for flash DIMM is to prevent servers from ever losing data on a system crash. Think of a server that acts like a laptop. It simply goes to sleep if it loses power, instead of losing data. Then, when you restore power, it picks up where it left off.
For the first time, enterprises have the opportunity to provide more flash performance than most of their applications and users will need. But this is not true of all applications. In addition, as environments become more virtualized and applications continue to scale, this performance surplus will evaporate quickly.
Vendors remain focused on improving performance, but the next step will be harder than just adding flash to our normal system configurations. Keeping pace will require more efficient software as well as the improved internal and external connectivity outlined in this article.