Nonvolatile memory express (NVMe) represents one of the biggest changes in years to the way storage devices connect to servers. The flash storage protocol puts storage closer to the CPU, reducing latency and increasing the number of parallel sessions a single storage device can handle. NVMe over Fabrics, meanwhile, aims to do what Fibre Channel did for SCSI by creating a high-speed network for accessing shared storage that retains the benefits of centralization.
The question is: How will enterprises adopt NVMe flash storage, and how can we expect it to work with existing technology? In other words, how much will the adoption of NVMe technology rock the enterprise boat? Let's find out.
NVMe technology primer
NVMe is a protocol, like SAS or SATA, that defines how a server's processor communicates with persistent storage. Traditional SAS and SATA hard drives and solid-state drives connect to servers either through internal controllers or external host bus adaptors (HBAs) that attach to the PCIe bus. NVMe devices connect directly to the PCIe bus, putting them physically closer to the processor and eliminating some hardware overhead. At the same time, NVMe flash storage significantly simplifies the I/O software stack, reducing the impact of many layers of indirection on storage I/O performance.
The tech industry developed the NVMe protocol specifically for SSDs, which are capable of handling many simultaneous I/O requests. Managing multiple traffic sources was always a performance issue for HDDs. Because a hard drive's read/write heads move together on a single mechanical actuator, the drive can service only one read or write request at a time. As an example of the difference in parallelization, NVMe supports up to 65,535 I/O queues, each with up to 65,535 requests, compared with a single queue with a depth of 32 for SATA and around 254 for SAS, depending on the implementation.
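To put those queue figures side by side, here is a minimal sketch that multiplies queue count by queue depth for each protocol. The numbers are the ones quoted above (the SAS value is approximate), and the function name is purely illustrative. These are protocol ceilings, not what a given drive delivers; real devices typically expose far fewer queues.

```python
# Queue limits as quoted in the text; the SAS depth is approximate.
QUEUE_LIMITS = {
    "SATA": {"queues": 1, "depth": 32},
    "SAS": {"queues": 1, "depth": 254},
    "NVMe": {"queues": 65_535, "depth": 65_535},
}


def max_outstanding(protocol: str) -> int:
    """Upper bound on requests in flight: number of queues x depth per queue."""
    limits = QUEUE_LIMITS[protocol]
    return limits["queues"] * limits["depth"]


if __name__ == "__main__":
    for name in QUEUE_LIMITS:
        print(f"{name:5s} {max_outstanding(name):>13,d} outstanding requests (ceiling)")
```

The gap matters less as a raw number than as headroom: a host can give every core its own queue without the cores contending for a single shared one.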
NVMe is generally used with NAND flash devices. There are also NVMe drives on the market using 3D XPoint technology, which Intel and Micron developed jointly. 3D XPoint has lower latency and greater endurance than NAND. Intel brands its products under the name Optane. Micron has yet to release any 3D XPoint products.
NVMe devices deliver about 10 times the IOPS and bandwidth of SAS and SATA devices, with about a 20% reduction in latency. Optane can reduce latency to as little as a tenth of what's seen with NAND flash.
Let's look at how NVMe's low latency, high throughput and highly parallel I/O will help the enterprise.
We'll start with NVMe's impact on hyper-converged infrastructure (HCI) systems. One benefit of hyper-convergence is the colocation of storage in the server. I/O latency is lower than what's experienced with shared storage that has to traverse either a Fibre Channel (FC) or Ethernet network. This localization means applications should run faster, particularly those with a bias toward read I/O.
NVMe technology brings an immediate performance improvement, as HCI systems can benefit from the lower latency and higher bandwidth. One caveat with NVMe is how write I/O is managed. HCI systems protect data by writing updates to multiple nodes -- usually at least the local node and one other, depending on the protection scheme. NVMe-based HCI systems require a fast, low-latency network node interconnect to achieve the benefits of NVMe.
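The write caveat can be sketched with a simple latency model. It assumes synchronous two-way mirroring with the local and remote writes issued in parallel, so the write is acknowledged only when the slower path finishes. The function name and every figure below are hypothetical, chosen only to show the shape of the effect.

```python
def replicated_write_latency_us(local_us: float, remote_us: float,
                                interconnect_rtt_us: float) -> float:
    """Latency of a mirrored write acknowledged after both copies land.

    Assumes the local write and the remote write start at the same time,
    so the result is the slower of the two paths.
    """
    remote_path_us = interconnect_rtt_us + remote_us
    return max(local_us, remote_path_us)


if __name__ == "__main__":
    # (local device, remote device, interconnect round trip), all hypothetical.
    scenarios = {
        "SAS/SATA SSD, slow interconnect": (100, 100, 200),
        "NVMe, slow interconnect": (20, 20, 200),
        "NVMe, fast interconnect": (20, 20, 10),
    }
    for name, args in scenarios.items():
        print(f"{name}: {replicated_write_latency_us(*args)} us")
```

With these made-up numbers, swapping in NVMe alone barely moves write latency because the interconnect dominates; the gain only appears once the network round trip shrinks too.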
We can expect a significant efficiency benefit from using NVMe with HCI, though the amount of performance improvement will depend on the way in which the HCI software stack is implemented. NVMe flash storage performance is so good that inefficiencies in the I/O data path, such as how data services are implemented, can negate the benefit of using faster flash media. HCI vendors are therefore under pressure to optimize their internal storage stacks. Native hyper-converged storage products, such as VMware vSAN and Scale Computing Reliable Independent Block Engine, will likely see the most benefits because they deliver storage out of the hypervisor kernel. Products that manage storage within a virtual machine (VM) will see less benefit.
Datrium is a startup developing a slightly different HCI architecture it bills as open convergence. In this model, servers and storage are physically separated while each server has a local high-speed read cache. The architecture delivers application reads from cache, with write I/O written through cache to the shared storage.
One benefit of this design is the resiliency of the architecture. If a host running VMs fails, the work can be brought up on another server without any reprotection of data. NVMe flash drives could be used tactically in the host to deliver fast I/O, with cheaper SAS and SATA devices comprising a capacity layer.
Because Datrium compute servers deploy a cache rather than store data in the host, the move to Optane should be easy. That will provide an instant uptick in VM performance from the reduced latency.
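The host-side read cache with write-through to shared storage can be illustrated with a generic sketch. This is a textbook write-through cache, not Datrium's implementation. The class name is hypothetical, and plain dictionaries stand in for the shared storage and the host-local flash.

```python
class WriteThroughCache:
    """Generic write-through cache in front of a shared backing store."""

    def __init__(self, backing_store: dict):
        self.backing = backing_store  # stands in for shared storage
        self.cache = {}               # stands in for host-local flash
        self.hits = 0
        self.misses = 0

    def read(self, key):
        """Serve from the local cache when possible, else fetch and keep a copy."""
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]
        self.cache[key] = value
        return value

    def write(self, key, value):
        """Write through: update shared storage, then refresh the cached copy."""
        self.backing[key] = value
        self.cache[key] = value


if __name__ == "__main__":
    shared = {"blk0": b"old"}
    host_a = WriteThroughCache(shared)
    host_a.read("blk0")            # miss, fills the cache
    host_a.read("blk0")            # hit, served locally
    host_a.write("blk1", b"new")   # lands on shared storage immediately

    # Simulate host A failing: a fresh host starts with an empty cache
    # but finds every write already present on shared storage.
    host_b = WriteThroughCache(shared)
    print(host_b.read("blk1"))
```

Because the cache never holds the only copy of a block, losing a host costs a cold cache rather than data. That is why a faster cache device can be dropped in without changing the protection scheme.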
Array and appliance vendors can benefit from deploying NVMe in their products. Today, storage arrays use SAS for back-end device connectivity, having migrated from Fibre Channel Arbitrated Loop years ago. The next stage is to use NVMe as the back-end protocol. Vendors are already touting NVMe-ready arrays that use NVMe SSDs in place of SAS and SATA storage devices. While NVMe will definitely increase performance, you must ask by how much and whether it's worth the price increase.
Let's look for a moment at the standard architecture of all-flash arrays. Most are either dual-controller or of limited scale-out design. All I/O is funneled through these controllers to implement features such as deduplication, compression and snapshots. Even the fastest Intel Xeon processors can't fully drive NVMe storage. Where HDDs and SSDs used to be the system bottleneck, NVMe technology is so fast that even two or three drives will max out a single processor.
The risk for storage array vendors in refitting existing architectures to be NVMe-capable is that they won't fully exploit the bandwidth NVMe offers. NVMe drives sell at a premium over SATA and SAS drives, so an array has to use as much of their performance as possible to be cost-competitive.
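A rough way to picture the controller bottleneck is to cap the aggregate drive bandwidth at whatever the controllers can process. The sketch below does only that. The function names are illustrative, and the per-drive and controller figures are hypothetical placeholders rather than measurements of any product.

```python
def delivered_throughput_gbs(drive_count: int, per_drive_gbs: float,
                             controller_limit_gbs: float) -> float:
    """Array throughput is the smaller of raw media bandwidth and controller capacity."""
    raw_gbs = drive_count * per_drive_gbs
    return min(raw_gbs, controller_limit_gbs)


def fraction_of_media_used(drive_count: int, per_drive_gbs: float,
                           controller_limit_gbs: float) -> float:
    """Share of the purchased drive bandwidth that the array can actually deliver."""
    raw_gbs = drive_count * per_drive_gbs
    delivered = delivered_throughput_gbs(drive_count, per_drive_gbs, controller_limit_gbs)
    return delivered / raw_gbs


if __name__ == "__main__":
    PER_DRIVE_GBS = 3.0     # hypothetical NVMe drive bandwidth, GB/s
    CONTROLLER_GBS = 8.0    # hypothetical controller ceiling, GB/s
    for drives in (2, 3, 24):
        used = fraction_of_media_used(drives, PER_DRIVE_GBS, CONTROLLER_GBS)
        print(f"{drives:2d} drives: {used:.0%} of drive bandwidth usable")
```

Under these assumptions the controller saturates at the third drive, and every drive added after that contributes capacity but no extra throughput. That is the cost problem in a nutshell.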
The degree to which vendors can effectively use NVMe in existing all-flash arrays depends on the platform architecture. We expect that vendors are working hard to optimize code paths and derive the most benefit from NVMe. In addition, arrays that have multiple layers of internal caching may see quick wins simply by upgrading these caches to NVMe devices. The same logic applies to adding a small amount of Optane storage to an array. We've already seen HPE 3PAR arrays and Western Digital's Tegile systems take advantage of this.
Bespoke NVMe arrays
What about flash arrays specifically built to use NVMe technology? There are few of these around so far. Vexata's VX platform is an example of a new array architecture built for NVMe. It can be integrated into existing storage environments that use FC. Latency is quoted to be as low as 200 microseconds (µs) with NAND flash and 40 µs with Intel Optane. Vexata can also support NVMe over Fabrics (NVMe-oF).
NVMe over Fabrics
One NVMe approach under development by vendors -- typically startups -- is to disaggregate components of shared storage architectures to remove the bottleneck the storage controller imposes. The client host talks directly to the NVMe flash storage devices across a high-speed network rather than funneling the I/O through one or more controllers.
In this design, clients connect to storage using NVMe-oF, a spec that can use FC, Ethernet or InfiniBand as the physical network. NVMe over FC uses existing Fibre Channel hardware -- HBAs and switches -- although it requires a minimum hardware level. On Ethernet, NVMe-oF can run over Remote Direct Memory Access using adaptor cards that implement either RDMA over Converged Ethernet (RoCE) or iWARP, the Internet Wide-Area RDMA Protocol. InfiniBand supports RDMA natively.
Directly connecting hosts and storage without going through centralized controllers will provide more scalability than traditional architectures. We're already starting to see products from newcomers such as E8 Storage and Exelero.
One disadvantage of disaggregation is that services previously delivered by the storage array controllers are now pushed to the client. Each client needs additional software to implement compression and other features. This consumes some host resources, but the trade-off is that it eases the bottleneck of implementing these services in a shared controller.
Vastly improved storage performance is great, but how does this help the enterprise? Most applications are constrained either by storage throughput or latency. Structured databases, for instance, are generally latency-dependent, so reducing the time the application spends waiting on I/O can result in higher transaction throughput and host utilization. Where database licenses are based on socket count, using NVMe flash storage to get more work out of every processor can reduce or avoid costs. If you've already bought the sockets and licenses, you can put in NVMe drives and increase utilization, cutting back on the need to buy more sockets and licenses.
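The link between I/O latency, transaction throughput and processor utilization can be sketched with a simple single-session model. It assumes each transaction does a fixed amount of CPU work plus a fixed number of storage I/Os performed one after another, with the CPU idle while it waits. The function name and all figures are hypothetical.

```python
def session_profile(cpu_us: float, ios_per_txn: int, io_latency_us: float) -> dict:
    """Throughput and CPU busy fraction for one session running transactions back to back."""
    txn_us = cpu_us + ios_per_txn * io_latency_us  # total time per transaction
    return {
        "txn_per_sec": 1_000_000 / txn_us,
        "cpu_busy_fraction": cpu_us / txn_us,
    }


if __name__ == "__main__":
    # Hypothetical workload: 200 us of CPU and 4 serialized I/Os per transaction.
    for label, latency_us in (("slower storage", 500), ("faster storage", 100)):
        p = session_profile(cpu_us=200, ios_per_txn=4, io_latency_us=latency_us)
        print(f"{label}: {p['txn_per_sec']:.0f} txn/s, "
              f"CPU busy {p['cpu_busy_fraction']:.0%}")
```

In this model the processor does the same work per transaction either way; lower latency just shortens the idle gaps between bursts of work. That is where the extra throughput per licensed socket comes from.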
The same logic applies to other I/O-dependent applications such as analytics, including AI and machine learning, where lower latency results in shorter execution times. For HCI, NVMe may let end users run more VMs per server, so densities could increase significantly.
NVMe fast facts
- NVMe drives deliver low latency and high throughput while handling many more concurrent requests. The challenge for vendors is implementing NVMe in products that can exploit the technology's full potential.
- The last comparable jump in performance and latency came with the move to all-flash. Initially, NVMe will carry a cost premium over existing all-flash products, but that gap will erode over time.
- NVMe introduces and demands new architectures. Vendors are using techniques such as disaggregation to remove the bottlenecks seen with traditional architectures.
- Disaggregated storage requires new skills and new technologies, so it will be implemented first for the applications that benefit most from ultra-low latency and high performance.
- NVMe offers additional benefits, including improved processor utilization and better value from applications licensed per socket.
NVMe technology adoption
Looking at how NVMe flash storage could be adopted, upgrading to an NVMe-based array or HCI system is an easy choice that delivers quick wins for enterprises because of the immediate improvement in performance. Meanwhile, bespoke NVMe arrays, such as Vexata's, could offer even better performance without the need to rip and replace existing infrastructure because they can use existing technology, such as FC, for host connectivity.
Finally, the disaggregated architecture approach to NVMe adoption will take longer to implement than traditional networking such as FC SANs or FC-enabled NVMe SANs. Storage administrators must learn new skills for technologies, such as InfiniBand, that enterprises don't widely use. Disaggregated offerings could also have a bigger impact on application design and will initially be targeted only at applications requiring high performance, such as financial trading applications where the lowest latency is critical.
We can see a parallel in NVMe adoption with the early days of all-flash, when the cost of all-flash arrays precluded their deployment across all application uses. The same scenario is likely to play out with NVMe in the enterprise. Over time, NVMe-based storage will become the default form of connectivity. This may take a few years to evolve, but it will happen.