For many years, SAS and SATA HDDs have been the data center's media of choice. The introduction of NAND flash has pushed SAS and SATA devices to their performance limits because of restrictions in the storage protocols. To fully exploit flash, the storage industry created a new protocol -- nonvolatile memory express (NVMe). This architecture lets storage array vendors and end users unlock the performance potential of both flash and new solid-state media, delivering better performance from NVMe SSDs and other storage devices.
Storage device protocols have evolved in both server and consumer-based devices. Advanced Technology Attachment (ATA) grew out of the PC architecture; it later became known as Parallel ATA (PATA) and then evolved into the Serial ATA (SATA) that we know today. SCSI was developed in the late 1970s as a server-based connection for HDDs and other storage devices like tape. SCSI, in turn, evolved into Serial-Attached SCSI, or SAS.
SCSI is the underlying storage protocol for Fibre Channel-based SANs and iSCSI Ethernet networks. (We'll discuss the impact of NVMe on storage networks later.) Even today, both SAS and SATA protocols need a host bus adapter or chipset controller to operate. This holds true for Fibre Channel and iSCSI as well.
SAS and SATA derive from a time when physical storage media was slow compared with processors and system memory. Hard drive access times were measured in milliseconds compared with nanoseconds for memory -- some six orders of magnitude, or 1 million times, faster. NAND flash SAS and SATA SSDs fall somewhere in-between at around 100 microseconds, which is still one to two orders of magnitude faster than traditional hard drives.
With this level of performance improvement in the transition from HDD to SSD, cracks in the SAS and SATA protocols started to appear. Both offer only single queues for accessing I/O, with SATA having a relatively small queue depth of 32. SAS is better at around 254, but having only one queue of I/O to process doesn't fully exploit the parallel capabilities of writing to NAND flash media.
What is NVMe?
NVMe is a new protocol that addresses many of the shortcomings of SAS and SATA. This includes a more direct connection to the processor, optimized I/O channels and a simplified software stack. NVMe devices are designed to connect to the PCIe root complex, putting them closer to the processor, sitting on what used to be called the Northbridge. This reduces latency, and introduces new ways to talk to remotely connected devices.
Drive manufacturers have implemented NVMe in a range of form factors, including standard add-in cards that plug into PCIe motherboard slots, 2.5-inch drives that require an NVMe interface adaptor and smaller M.2 devices that look similar to DIMMs in size. The specific choice of device can be dictated by a combination of capacity, performance and environmental requirements such as power and space. Typically, M.2 and add-in card devices aren't hot-swappable.
- NVMe offers significantly more efficient storage I/O compared with traditional SAS and SATA. Without some change in shared architecture, however, NVMe drive IOPS will be wasted, and customers will overpay. Disaggregation and abandoning traditional architectures are two ways to fully use the capabilities of NVMe SSDs.
- Traditional storage networking is set to embrace NVMe, with NVMe over Fabrics on Fibre Channel and Ethernet offering customers a degree of investment protection.
- It's all about latency and throughput. Moore's Law drives processors and system memory ever faster. Storage must keep up. NVMe will help storage do just that.
- The most interesting NVMe development probably will be in how the nonvolatile-DIMM form factor brings persistent storage onto the memory bus. It could be time for some new software paradigms.
NVMe introduces parallelism through the use of multiple I/O queues -- up to 64,000 -- and greater queue depths -- also up to 64,000. Flash storage is capable of processing many requests in parallel, because devices are built from multiple NAND chips, each of which contains many individual silicon dies. It therefore makes sense to process requests in parallel to get the best bandwidth.
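A rough way to see why queue count and depth matter is Little's Law: the sustained IOPS ceiling is the number of outstanding commands divided by the latency of each command. The sketch below is back-of-the-envelope arithmetic only. It assumes the round figures used in this article (about 100 microseconds per flash I/O, a single queue of 32 for SATA, a single queue of 254 for SAS, and roughly 64,000 queues of 64,000 entries for NVMe), and the function name is hypothetical.

```python
# Little's Law as a ceiling: IOPS <= outstanding commands / per-command latency.
# All inputs are the article's round numbers, used here as assumptions.

FLASH_LATENCY_S = 100e-6  # ~100 microseconds per I/O for NAND flash


def iops_ceiling(queues: int, depth: int, latency_s: float = FLASH_LATENCY_S) -> float:
    """Upper bound on IOPS if every slot in every queue is kept full."""
    outstanding = queues * depth
    return outstanding / latency_s


# (number of queues, depth of each queue)
protocols = {
    "SATA": (1, 32),
    "SAS": (1, 254),
    "NVMe": (64_000, 64_000),
}

for name, (queues, depth) in protocols.items():
    ceiling = iops_ceiling(queues, depth)
    print(f"{name:5s} outstanding={queues * depth:>13,} ceiling={ceiling:,.0f} IOPS")
```

No real drive comes close to the NVMe figure. The point is that the protocol limit moves far beyond what the media can deliver, so the queue structure stops being the bottleneck.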
Simplification of the I/O stack includes new signaling methods used to indicate when I/O requests are ready to process. This includes the doorbell concept, where the host writes to a doorbell register to tell the NVMe device that new commands are waiting in a submission queue. The device posts the results to a completion queue and can raise an interrupt, so the host doesn't have to continually check the status. This process translates to savings in the CPU overhead on the server and less time in software to process each I/O.
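The queue-and-doorbell flow can be pictured with a toy model. The sketch below is not driver code and does not reflect any real NVMe library; the class and method names are hypothetical. It assumes the general NVMe pattern in which the host places commands in a submission queue and rings a doorbell once for the batch, and the device posts results to a paired completion queue for the host to collect.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Toy model of one submission/completion queue pair (illustrative only)."""
    depth: int
    submission: deque = field(default_factory=deque)
    completion: deque = field(default_factory=deque)
    doorbell_rings: int = 0

    def submit(self, commands: list[str]) -> None:
        """Host side: enqueue a batch of commands, then ring the doorbell once."""
        if len(self.submission) + len(commands) > self.depth:
            raise RuntimeError("submission queue full")
        self.submission.extend(commands)
        self.doorbell_rings += 1  # a single doorbell write covers the whole batch

    def device_process(self) -> None:
        """Device side: consume submitted commands and post completions."""
        while self.submission:
            cmd = self.submission.popleft()
            self.completion.append(f"{cmd}:ok")

    def reap(self) -> list[str]:
        """Host side: collect posted completions, for example after an interrupt."""
        done = list(self.completion)
        self.completion.clear()
        return done


qp = QueuePair(depth=8)
qp.submit(["read lba=0", "read lba=8", "write lba=16"])
qp.device_process()
results = qp.reap()
print(results, "doorbell writes:", qp.doorbell_rings)
```

Three commands cost one doorbell write in this model, which is the kind of per-I/O software saving the simplified stack is aiming for.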
New storage architectures
The adoption of NVMe SSDs leads to a range of new storage architectures, some of which build on what we have today. Server vendors, such as Dell EMC and Hewlett Packard Enterprise, already support NVMe drives in traditional PCIe slots or the 2.5-inch form factor through adaptor cards. This approach makes it possible to build NVMe into a stand-alone server application or as part of a hyper-converged infrastructure (HCI). Most operating systems already support NVMe, so there are few or no issues with drivers or compatibility.
Platforms such as VMware vSphere also support NVMe. We have seen significant performance improvements in HCI from companies such as Scale Computing and X-IO Technologies with its Axellio edge computing systems.
What about NVMe in a traditional storage array? Some vendors have announced products as being NVMe-ready and capable of accepting NVMe SSDs. Physically, this simply means they support NVMe connectivity and offer the same functionality that was available with SAS and SATA, such as being able to hot swap failed drives or dynamically add capacity.
There's a potential problem with traditional architectures that results in the benefits of NVMe not being fully exploited. A typical, single NVMe drive can deliver 300,000 or more random read IOPS and 40,000 to 50,000 write IOPS with up to 3 gigabytes per second read and 1 GBps write throughput. That's at least an order of magnitude greater than what's possible with SAS and SATA solid-state drives. Put a dozen of these devices into a server, and even the latest Xeon processors would be unable to drive these devices to anywhere near their capability. Furthermore, front-end connectivity with Fibre Channel, for example, wouldn't be able to deliver the throughput required to keep these drives busy.
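The scale of the mismatch is easy to check with simple arithmetic. The sketch below uses the per-drive read figures quoted above; the Fibre Channel per-port throughput values are assumed nominal rates added for illustration, not measurements, and the helper name is hypothetical.

```python
import math

# Per-drive figures come from the text; everything else is an assumption.
DRIVES = 12
READ_IOPS_PER_DRIVE = 300_000
READ_GBPS_PER_DRIVE = 3.0  # gigabytes per second

# Assumed nominal throughput per Fibre Channel port, in gigabytes per second.
FC_PORT_GBPS = {"16G FC": 1.6, "32G FC": 3.2}

total_iops = DRIVES * READ_IOPS_PER_DRIVE
total_gbps = DRIVES * READ_GBPS_PER_DRIVE


def ports_needed(total: float, per_port: float) -> int:
    """Whole number of ports required to carry the aggregate throughput."""
    return math.ceil(total / per_port)


print(f"aggregate read IOPS: {total_iops:,}")
print(f"aggregate read throughput: {total_gbps:.0f} GBps")
for link, rate in FC_PORT_GBPS.items():
    print(f"{link}: {ports_needed(total_gbps, rate)} ports to match the drives")
```

Under these assumptions, a dozen drives would need a double-digit number of front-end ports to stay busy, which is why the controller and the network become the next bottleneck.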
This represents a problem for vendors upgrading existing products to use NVMe. There's an increase in performance because the limitations on SAS and SATA throughput have been eliminated. However, the controller-based architecture becomes the next bottleneck, and solving that means using scale-out or new types of architecture.
Vendors, particularly startups, are separating the data and control paths of a traditional array. This disaggregation approach allows host servers to write directly to NVMe drives across high-speed networks.
Rather than use traditional Fibre Channel, vendors are implementing storage networks using Converged Ethernet and InfiniBand. Technologies such as Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) and Internet Wide Area RDMA Protocol (iWARP) allow host servers to talk directly to an NVMe drive without having to go through a traditional storage controller.
Companies, including E8 Storage and Apeiron Data Systems, have built new storage products that package NVMe SSDs into a storage enclosure that offers features such as hot-swappable drives and centralized monitoring and management. The data path between the enclosure and the host flows over Ethernet -- usually 40 and 100 Gigabit Ethernet -- with no intermediate controller.
To support this model, host servers must have RDMA- or InfiniBand-capable network cards and run vendor-provided client software and drivers. The drivers handle the new network transport. The additional client software is required to manage the metadata defining LUNs and volumes mapped to each host. The functions that the controller in an array would perform are dispersed to the clients, resulting in some host overhead, but with a significant improvement in I/O performance.
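To make the idea of client-held metadata concrete, here is a minimal sketch of how a host might resolve a volume offset to a specific remote drive with no controller in the data path. It does not describe any vendor's product: the striping layout, the class, the field names and the drive identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical client-side volume metadata: a plain stripe across drives."""
    name: str
    drives: tuple[str, ...]  # made-up identifiers for remote NVMe drives
    stripe_bytes: int        # size of each stripe unit in bytes

    def locate(self, offset: int) -> tuple[str, int]:
        """Map a byte offset in the volume to (drive, byte offset on that drive)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        stripe_index, within = divmod(offset, self.stripe_bytes)
        drive = self.drives[stripe_index % len(self.drives)]
        drive_stripe = stripe_index // len(self.drives)
        return drive, drive_stripe * self.stripe_bytes + within


vol = Volume("vol0", ("nvme-a", "nvme-b", "nvme-c"), stripe_bytes=4096)
print(vol.locate(0))           # first stripe unit, first drive
print(vol.locate(4096 + 100))  # second stripe unit, second drive
print(vol.locate(3 * 4096))    # wraps around to the first drive
```

Because every host performs this lookup itself, the work a central controller would do is spread across the clients, which is the source of the host overhead mentioned above.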
Excelero has taken a different route, building out a software-based offering that can be used for hyper-convergence. NVMesh is software that permits multiple servers to be connected via either Ethernet or InfiniBand and have any server write to any NVMe device. Excelero uses its proprietary Remote Direct Drive Access, or RDDA, protocol, claiming linear scaling and near-100% utilization of the performance of each NVMe drive. The vendor attributes these claimed levels of performance to the fact that remote hosts can talk to the drives in another server directly, as if the drive were on the local PCIe bus, bypassing the CPU of the server where the drive is installed.
Classic storage area networking
What about the future of traditional networking like Fibre Channel? We've already discussed the use of NVMe over Fabrics with fast networks, such as InfiniBand and Ethernet. A standard for using NVMe with existing Fibre Channel networks, known as FC-NVMe, is also in development through the T11 Fibre Channel standards committee, in line with the NVM Express group's NVMe over Fabrics work. Products based on the new standard are expected to work with existing hardware, albeit with a limit on how far back support for legacy host bus adapters will extend.
What this means is FC-NVMe should support existing technology, with performance gains achieved from the use of NVMe rather than the SCSI protocol. There should be no need to rip and replace the latest Fibre Channel technology.
Identifying use cases
What are the likely use cases for the technology, given an expected cost premium for using NVMe? Looking back to the early days of SSDs and all-flash storage, there's a parallel with the introduction of NVMe. Solid-state drives were initially expensive and used as server cache and for specific applications where latency was a problem. This opportunity extended to all-flash systems that were used to fix applications deemed too expensive to rewrite.
The deployment of NVMe will no doubt follow the same path. NVMe within the server will deliver low-latency flash to improve application performance. Unlike the move from HDD to SSD, however, NVMe also reduces the software overhead of each I/O, so it may deliver much better resource utilization as well.
At the array level, NVMe will improve latency and throughput. Disaggregation will offer low latency, but without the mature set of features, such as deduplication and compression, delivered on all-flash arrays. This may result in slower NVMe adoption, because many all-flash systems were sold on a dollar-per-gigabyte ratio that exploited the savings of deduplication.
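The dollar-per-gigabyte argument comes down to one division: effective cost is the raw media cost divided by the data reduction ratio. The sketch below illustrates this with made-up prices and ratios; none of the numbers come from a vendor or from this article, and the function name is hypothetical.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Effective cost per usable gigabyte after deduplication and compression."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


# Hypothetical inputs for illustration only.
all_flash_array = effective_cost_per_gb(raw_cost_per_gb=1.00, reduction_ratio=4.0)
disaggregated_nvme = effective_cost_per_gb(raw_cost_per_gb=1.50, reduction_ratio=1.0)

print(f"all-flash array with 4:1 reduction: ${all_flash_array:.2f}/GB")
print(f"disaggregated NVMe, no reduction:   ${disaggregated_nvme:.2f}/GB")
```

With these illustrative inputs, the platform without data reduction costs several times more per usable gigabyte, even though its raw media premium is modest.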
Today, we're seeing application NVMe use cases such as high-frequency financial trading, high-performance analytics and other latency-sensitive applications. NVMe also will be a good fit for traditional applications like relational databases where individual I/O response time is critical.
Looking ahead, we're starting to see new classes of persistent storage, such as 3D XPoint, which Intel sells under the Optane brand. This technology offers lower latencies and higher endurance than flash, but at a higher dollar-per-gigabyte cost.
We're starting to see a hierarchy of NVMe-enabled persistent memory devices, with a range of characteristics. The future of distributed systems and shared storage will likely be a blend of technologies that exploit the features of each of these types of media.
Should you move to NVMe? Well, that depends on your requirements. NVMe SSDs and Optane drives provide another tool in the storage administrator's armory and more price, performance and endurance options. If you have the need, then NVMe could well be the route for you.