The NVMe interface is rapidly becoming the preferred interconnect for flash drives and all-flash arrays (AFAs) because of its superior performance and lower system overhead. Indeed, even as IDC recently warned of a sales slowdown for external storage in the EMEA region, it noted that the AFA market was red hot -- up 20% annually to 35% market share. Storage buyers are clearly ready to spend precious capital on better-performing NVMe devices rather than SSDs using legacy storage interfaces.
NVMe drives have begun displacing SSDs with legacy interfaces in servers. However, with organizations using clusters of VMs and container servers for most workloads, locking storage to a particular system hampers workload portability. Fortunately, NVMe-oF has finally made it through the specifications process, and NVMe-oF products are starting to provide a viable option for NVMe-based shared storage.
A fundamental feature of NVMe-oF is binding support, namely the ability to operate over various underlying transport fabrics including Fibre Channel, InfiniBand and Ethernet. Protocol bindings serve as the connection between NVMe and the network transport, but due to the technical differences of various network protocols, binding also places restrictions on NVMe capabilities and defines how NVMe is managed using the underlying fabric.
We have previously given advice on evaluating NVMe-oF options. Although the specification for the TCP binding lagged the other bindings, it was finally ratified in late 2018 and ushered in an era in which NVMe-oF hosts and controllers can communicate over any standard IP network. While the TCP specification focuses on software implementations using the TCP stacks on host OSes and AFAs, it doesn't preclude hardware-accelerated implementations.
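To make the idea of a transport binding concrete, here is a minimal sketch of how a Linux host might assemble an `nvme connect` command for an IP-based fabric using the nvme-cli tool. It only builds the argument list and does not run anything. The helper name `build_connect_command`, the target address and the subsystem NQN are illustrative assumptions, not values from any product discussed here. Fibre Channel is deliberately left out because its addressing differs from the IP-based transports.

```python
from typing import List

# nvme-cli transport names for the IP-based fabrics discussed above.
# "rdma" covers RoCE and InfiniBand; "tcp" is the binding ratified in 2018.
IP_TRANSPORTS = {"tcp", "rdma"}

# 4420 is the IANA-registered port for NVMe-oF I/O connections.
DEFAULT_PORT = 4420


def build_connect_command(transport: str, address: str, subsystem_nqn: str,
                          port: int = DEFAULT_PORT) -> List[str]:
    """Return the argv list for an `nvme connect` call (hypothetical helper)."""
    if transport not in IP_TRANSPORTS:
        raise ValueError(f"unsupported transport for this sketch: {transport}")
    return [
        "nvme", "connect",
        "--transport", transport,
        "--traddr", address,
        "--trsvcid", str(port),
        "--nqn", subsystem_nqn,
    ]


if __name__ == "__main__":
    # Placeholder address (documentation range) and a made-up NQN.
    cmd = build_connect_command("tcp", "192.0.2.10",
                                "nqn.2014-08.org.example:subsystem1")
    print(" ".join(cmd))
```

Switching the first argument from `"tcp"` to `"rdma"` is the only change the host side needs in this sketch, which is the practical meaning of a binding: the NVMe subsystem stays the same while the fabric underneath it varies.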
NVMe-oF is a young technology, and its product ecosystem is rapidly evolving. However, it has now reached a degree of maturity that makes NVMe-oF devices suitable for production workloads. The following is a look at some of the NVMe-oF product categories, devices, software and supported systems.
The state of NVMe integration testing
Each week, the tech news wires are full of NVMe product announcements, many of which won't be released for months. The hype makes it hard to keep track of the market, much less figure out which products are compatible with one another.
Fortunately for technology buyers, the University of New Hampshire InterOperability Laboratory (UNH-IOL) has long served as a clearinghouse for reliable information on network devices and their support of various network standards. UNH-IOL has been testing NVMe-oF products for two years and its integrators list is an excellent resource for finding standards-compliant products.
The list includes several product types, such as:
- NVMe host platforms
- NVMe drives
- NVMe switches
- NVMe management interfaces
- NVMe-oF hardware and software targets: Fibre Channel, TCP and remote direct memory access (RDMA) over Converged Ethernet, or RoCE
- NVMe-oF initiators
- NVMe-oF switches
Compliance testing is a tedious process that's dependent upon vendors submitting products they believe will pass. Hence, it necessarily lags product introductions, particularly in a market as dynamic as NVMe.
The following is a sample of some of the storage systems supporting NVMe-oF:
- Apeiron Data has a proprietary alternative to NVMe-oF it calls NVMe over Ethernet (NoE). It delivers NoE via a 2U appliance supporting as many as 24 2.5-inch drives and 32 40-GbE ports. The design is reminiscent of the ATA-over-Ethernet systems built by Coraid, but with an NVMe performance boost.
- E8 Storage offers 1U and 2U appliances with as many as eight 100-GbE ports and 68 to 136 TB of usable capacity with drivers for the following Linux systems: RHEL 7.1 and above or CentOS 6.7 and above; Ubuntu 14 and above; SUSE Linux Enterprise 12 and above; and Debian 8.6 and above. E8 also offers a software product for servers with NVMe drives and compatible network interface cards (NICs) that provides similar capabilities to its hardware by aggregating capacity into volumes available over an NVMe fabric.
- Excelero's NVMe-based storage software platform pools NVMe capacity on standard servers, regardless of the local or network file system used. It creates distributed block volumes over any network fabric and protocol. For example, a capacity-optimized platform with 24 NVMe drives provides almost 1 million IOPS at less than 200 microseconds latency and can scale to millions of IOPS in a single-server rack.
- Pavilion Data has a 4U array providing 14 TB to 1 PB of capacity, using 2.5-inch NVMe drives with up to 20 million IOPS and 20-microsecond latency. It can scale from one to 40 100-GbE network ports and two to 20 active controllers. The array connects via RDMA-capable Ethernet, but also supports NVMe-over-TCP for clients that don't support RDMA. Pavilion has drivers for RHEL, CentOS 7.4 or greater and Ubuntu 16 or greater.
- Pure Storage FlashArray//X with Purity software offers from 55 TB (3U) to 3 PB (6U) of capacity. The Purity DirectFlash software supports RoCE and provides a range of storage services, such as data compression, deduplication, high availability, snapshots, data encryption and Windows file services.
- Vast Data offers both software and hardware products using NVMe-oF. It provides a scale-out, transactional file system that can scale to server clusters of up to 10,000 systems. Vast has developed what it calls a Disaggregated, Shared-Everything architecture with three distinctive elements:
1. Separate control and data plane hardware with x86 servers running the storage software and data spread across NVMe-oF disk enclosures.
2. A high-speed Ethernet or InfiniBand network fabric connecting compute and storage nodes. Compute nodes are typically housed in 2U quad-server enclosures with four 100-GbE NICs, while each storage node is also a 2U chassis, holding 44 x 15.36 TB quad-level cell flash disks and 12 x 1.5 TB U.2 XPoint devices.
3. Metadata distributed across the storage nodes on dedicated, fast Intel Optane 3D XPoint memory in the NVMe-oF enclosures. This distributed metadata design enables the compute nodes to be stateless and use non-redundant, lower-cost systems.
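As a quick check on the Vast storage-node figures quoted above, the raw capacity of one 2U enclosure can be worked out from the drive counts. The short sketch below does only that arithmetic. The function name is illustrative, and the result is raw capacity before any data protection or reserved space, so it should not be read as the usable figure Vast would quote.

```python
from typing import Iterable, Tuple


def raw_capacity_tb(drive_groups: Iterable[Tuple[int, float]]) -> float:
    """Sum raw capacity in TB over (drive count, TB per drive) pairs."""
    return sum(count * size_tb for count, size_tb in drive_groups)


# Drive counts and sizes as quoted in the article for one storage enclosure.
QLC_FLASH = (44, 15.36)   # quad-level cell flash disks
XPOINT = (12, 1.5)        # U.2 XPoint devices

total_tb = raw_capacity_tb([QLC_FLASH, XPOINT])
print(f"{total_tb:.2f} TB raw per enclosure")  # prints "693.84 TB raw per enclosure"
```

Almost all of that raw capacity, 675.84 TB, comes from the flash disks. The XPoint devices add only 18 TB, which fits their role as a fast tier for metadata rather than bulk storage.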
OS and application support
As the UNH-IOL test data indicates, most major Linux distributions, including RHEL, CoreOS, SUSE and Ubuntu, provide NVMe support, as do recent versions of Windows Server (2012, 2016) and the Microsoft Windows 10 client.
NVMe-oF is a critical foundation for composable infrastructure that enables physical hardware components -- servers, storage capacity and network interfaces -- to be logically carved up into virtual instances that are allocated to particular VMs, container clusters and application servers. Aside from being the foundation for next-generation cloud data centers, NVMe-oF products are particularly attractive for analytics, AI training and high-performance computing applications with high levels of storage IO that can take advantage of RDMA to significantly reduce IO overhead on compute servers.