Work around NVMe storage controller drive limitations

Innovation in the storage and memory market helps with the challenge of NVMe drive limits. Be sure to make the right comparisons and scale out effectively.

Storage controllers have limitations when it comes to the number of NVMe drives they can support. This is a PCIe bandwidth issue -- there isn't enough.

This problem exists regardless of whether the NVMe storage controller CPU is Intel, AMD, ARM or custom ASICs.

PCIe bandwidth depends on the PCIe generation, as shown in the chart below.

PCIe evolution chart

Most storage controllers today are PCIe 3.0, which has a maximum bandwidth of about 32 GBps on a 16-lane link, counting both directions. As PCIe 4.0 and PCIe 5.0 storage controllers become more prevalent, that bandwidth will double and quadruple, respectively. This is much needed and great news for storage consumers.
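The bandwidth figures can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes a 16-lane link, counts both directions and uses the published per-lane transfer rates and 128b/130b line encoding for PCIe 3.0 through 5.0. The function and variable names are made up for this example.

```python
# Per-lane raw transfer rate in gigatransfers per second (GT/s).
TRANSFER_RATE_GT_S = {"3.0": 8, "4.0": 16, "5.0": 32}

# PCIe 3.0, 4.0 and 5.0 all use 128b/130b encoding:
# 128 payload bits are carried in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(generation, lanes=16, bidirectional=True):
    """Approximate usable link bandwidth in GBps (protocol overhead ignored)."""
    # Divide by 8 to convert gigabits to gigabytes.
    per_lane_per_direction = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY / 8
    total = per_lane_per_direction * lanes
    return total * 2 if bidirectional else total


for gen in ("3.0", "4.0", "5.0"):
    print(f"PCIe {gen} x16: about {link_bandwidth_gbytes(gen):.1f} GBps")
```

Under these assumptions, PCIe 3.0 comes out at roughly 31.5 GBps, and each later generation doubles the one before it. Real-world throughput is lower once packet and protocol overhead are counted.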

However, not all that bandwidth can be assigned to the NVMe SSDs. The front-end network interconnect consumes a lot of it, and bandwidth in that arena is escalating faster than PCIe bandwidth. Ethernet NICs and InfiniBand host channel adapters are rapidly reaching new highs, with throughput per port of up to 400 Gbps (50 GBps). Even Fibre Channel is now available at 64 Gbps (8 GBps). It does not take many of these NICs or adapters to saturate the PCIe bus.

That means the PCIe bandwidth for the back-end interconnect to NVMe drives is going to be limited for any given storage controller, regardless of the PCIe generation. Determining the maximum number of NVMe drives is a balancing act among the front end, the back end and required throughput. A PCIe 3.0-based NVMe storage controller commonly tops out at 24 drives. That's not a hard and fast limit, because it ultimately depends on total throughput requirements. It is, however, a practical one.
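This balancing act can be sketched as simple arithmetic. The Python example below is a rough model, not a sizing tool. It assumes the front-end adapters and the drives draw from one shared PCIe budget, and every number in it (two 100 Gbps front-end ports, 3.5 GBps per drive) is an illustrative assumption rather than a measurement. The function names are hypothetical.

```python
import math


def drives_at_full_speed(total_pcie_gbytes, front_end_gbytes, per_drive_gbytes):
    """Number of drives that could run at full rate on the leftover bandwidth."""
    back_end = total_pcie_gbytes - front_end_gbytes
    if back_end <= 0:
        return 0  # the front end alone already saturates the bus
    return math.floor(back_end / per_drive_gbytes)


def oversubscription(drive_count, total_pcie_gbytes, front_end_gbytes, per_drive_gbytes):
    """Ratio of aggregate drive throughput to the back-end bandwidth left over."""
    back_end = total_pcie_gbytes - front_end_gbytes
    if back_end <= 0:
        raise ValueError("no PCIe bandwidth left for the back end")
    return drive_count * per_drive_gbytes / back_end


# Illustrative inputs only (all assumptions):
total = 32.0       # GBps, the PCIe 3.0 figure cited in the article
front_end = 25.0   # GBps, two hypothetical 100 Gbps ports (2 x 12.5 GBps)
per_drive = 3.5    # GBps, a hypothetical per-drive sequential read rate

print(drives_at_full_speed(total, front_end, per_drive))
print(oversubscription(24, total, front_end, per_drive))
```

With these inputs, only a couple of drives could run flat out at once, and a 24-drive shelf would be heavily oversubscribed. That is acceptable when workloads rarely drive every SSD at full rate, which is why 24 is a practical limit rather than a hard one.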

The conventional wisdom in the industry is that as PCIe 4.0 and 5.0 become more prevalent in storage controllers, the practical number of NVMe drives supported should rise as well. That's possible, but other issues suggest differently. NVMe drive throughput is also going to increase. Everything connected to the PCIe bus is increasing its bandwidth consumption in step with the available PCIe bandwidth.

Then there's the efficiency of the storage software stack. Software bloat will likely need to be addressed; otherwise, the CPU is prone to becoming the performance bottleneck. It is possible, even probable, that the total supported drives per storage controller will not change. If that's the case, then how can storage systems increase their total performance? The answer is quite simple per the storage vendors -- add more controllers.

Scaling the NVMe storage controller

The most common shared storage system today uses active-active controllers. Each of the two controllers has access to the other's drives in case one controller fails. A second controller does not generally increase the number of fully addressable NVMe drives. This is to prevent huge performance degradation when one NVMe storage controller is unavailable. Scaling beyond two storage controllers requires a more creative architecture.

Scale out the number of controllers. Block storage scale-out is typically done through some type of clustering. Scale-out for file and object storage generally uses a global namespace, but not always. Every scale-out architecture is either shared everything or shared nothing. Block is more likely to be shared everything, but not always. Shared nothing is more common for file and object.

One issue with some scale-out block architectures is diminishing marginal returns. Each new controller adds less performance than the one before, until eventually the next one negatively affects performance. This is generally an issue for clustered shared everything architectures. Shared nothing architectures are more likely to scale near linearly to much greater numbers. However, all scale-out architectures increase the number of rack units (RU), switch ports, switches, NICs or adapters, cables, transceivers, cable management, conduit, power, cooling, UPS, etc. This adds noticeably more cost to the storage systems.
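The article does not name a model for these diminishing returns. One commonly used model is Neil Gunther's Universal Scalability Law, which is swapped in here purely for illustration. The Python sketch below computes the performance gain from each added controller. The contention and coherency values are hypothetical and not measurements of any product.

```python
def relative_throughput(controllers, contention, coherency):
    """Universal Scalability Law: throughput relative to a single controller."""
    n = controllers
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def marginal_gains(max_controllers, contention, coherency):
    """Throughput added by each successive controller, from the 1st to the Nth."""
    gains = []
    previous = 0.0
    for n in range(1, max_controllers + 1):
        current = relative_throughput(n, contention, coherency)
        gains.append(current - previous)
        previous = current
    return gains


# Hypothetical parameters: a clustered shared everything design pays a
# coherency cost as controllers coordinate; a shared nothing design does not.
shared_everything = marginal_gains(10, contention=0.05, coherency=0.02)
shared_nothing = marginal_gains(10, contention=0.01, coherency=0.0)

for n, (se, sn) in enumerate(zip(shared_everything, shared_nothing), start=1):
    print(f"controller {n}: shared everything {se:+.2f}, shared nothing {sn:+.2f}")
```

With these made-up parameters, the shared everything gains shrink with every controller and turn negative at the eighth, while the shared nothing gains stay positive and close to linear. The crossover point in a real system depends entirely on its actual coordination overhead.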

Products help solve NVMe drive limits

There is some good news. There has been meaningful innovation by several vendors over the past few years specifically aimed at solving the NVMe drive limitations. This article highlights four of them. But by no means are they the only four.

Pavilion Data Systems' HyperParallel Flash Array, powered by Pavilion HyperOS. Pavilion figured out how to fit 20 storage controllers and 72 NVMe drives in a single 4 RU chassis. All controllers have access to all drives via an internal PCIe switch. This appreciably reduces consumed rack space, power, cooling, UPS, etc., as it scales. It supports block, file and object storage and can scale out each linearly, simply by adding chassis. The product clusters block while using a file global namespace and an object global namespace. This is a clever mix of shared everything and shared nothing architectures.

StorOne S1 storage systems. StorOne is currently only a scale-up active-active architecture, uniquely focused on the efficiency of the storage software stack and services. It supports block, file and object storage. Its more efficient software frees up CPU cycles to squeeze maximum performance from the attached NVMe and SAS/SATA drives. Although this does not increase the number of addressable NVMe drives, it decreases latency and increases the IOPS and throughput of those drives within the storage system.

Fungible. This product is block only. What makes Fungible innovative is the complete separation of the control path from the data path. Storage services logic and metadata are separated from storage I/O and run on separate controllers outside the data path. All the storage I/O controllers are in the data path and share nothing with each other. The out-of-data-path controllers handle routing, snapshots, replication and management in general.

Vast Data Universal Storage. Universal Storage is file and object storage that scales linearly into the exabyte range. It is a disaggregated shared everything architecture. Vast Data also separated the storage logic controllers from the storage I/O controllers. The storage logic controllers have a global namespace and are what the application servers connect to. A Vast Data storage logic controller can be an appliance or a container in a server. Each of the storage logic controllers then accesses the QLC SSD capacity storage I/O controllers via NVMe-oF over Ethernet. Each storage I/O controller is front-end cached with 18 TB of NVMe storage class memory. Every one of the logic controllers has access to every one of the capacity storage I/O controllers.

These are not the only innovative, scalable NVMe storage systems on the market. Startups such as Excelero (block), WekaIO (file) and Rozo (file and object) all have clever architectures to deal with NVMe storage controller drive limitations. Even the major traditional storage players, including Dell, HPE, IBM, NetApp and Hitachi Vantara, have comparatively innovative scale-out storage products that deal with NVMe storage controller drive limits.

Comparing these system architectures is difficult. Their architectures are how they solve storage problems, such as NVMe storage controller drive limitations. Don't compare the architectures; compare the results in terms of how well they meet the organization's requirements. Make sure the comparisons are apples to apples on the requirements, not on how the products meet them.

It is best to determine the required performance and capacity over the projected life of the storage system. Then measure how well each product can meet those requirements and its total cost for doing so.

In the interest of full transparency, none of these vendors are current clients of this author. However, most of them have been at one time or another over the past 23 years.
