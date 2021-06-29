Storage controllers have limitations when it comes to the number of NVMe drives they can support. This is a PCIe bandwidth issue -- there isn't enough.

This problem exists regardless of whether the NVMe storage controller CPU is an Intel, AMD or ARM processor or a custom ASIC.

PCIe bandwidth is dependent on the PCIe generation as seen in the table below.

Most storage controllers today are PCIe 3.0, which has a maximum bandwidth of about 32 GBps for an x16 link, counting both directions. As PCIe 4.0 and PCIe 5.0 storage controllers become more prevalent, that bandwidth doubles and quadruples, respectively. This is much needed and great news for storage consumers.
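The per-generation figures can be derived from the per-lane signalling rate. The following is a minimal sketch, assuming an x16 link and counting both directions; the function and table names are illustrative, and protocol overhead above the line encoding is ignored, so real-world throughput will be somewhat lower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
# PCIe 3.0 and later use 128b/130b encoding.
PCIE_GENERATIONS = {
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
}


def pcie_bandwidth_gbytes(generation: str, lanes: int = 16, bidirectional: bool = True) -> float:
    """Approximate link bandwidth in GB/s (illustrative helper, not a vendor formula)."""
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    per_lane_one_way = gigatransfers * efficiency / 8  # bits -> bytes
    directions = 2 if bidirectional else 1
    return per_lane_one_way * lanes * directions


for gen in PCIE_GENERATIONS:
    print(f"PCIe {gen} x16: {pcie_bandwidth_gbytes(gen):.1f} GB/s")
# PCIe 3.0 x16: 31.5 GB/s
# PCIe 4.0 x16: 63.0 GB/s
# PCIe 5.0 x16: 126.0 GB/s
```

Under these assumptions, PCIe 3.0 lands at roughly 32 GBps, and each later generation doubles the one before it.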

However, not all that bandwidth can be assigned to the NVMe SSDs. The front-end network interconnect will consume a lot of it, and bandwidth in that arena is escalating faster than PCIe bandwidth. Ethernet NICs and InfiniBand host channel adapters are rapidly reaching new highs, with throughput per port of up to 400 Gbps (50 GBps). Even Fibre Channel is now available at 64 Gbps (8 GBps). It does not take many of these NICs or adapters to saturate the PCIe bus.
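The arithmetic behind that claim is a simple unit conversion. Here is a rough sketch, assuming the approximate 32 GBps PCIe 3.0 figure quoted above and raw line rates with no protocol overhead; the helper names are made up for illustration.

```python
import math


def gbps_to_gbytes(gigabits_per_second: float) -> float:
    """Convert a line rate in Gbps to GBps (8 bits per byte, overhead ignored)."""
    return gigabits_per_second / 8


def ports_to_saturate(pcie_gbytes: float, port_gbps: float) -> int:
    """Smallest number of ports whose combined line rate meets or exceeds the PCIe budget."""
    return math.ceil(pcie_gbytes / gbps_to_gbytes(port_gbps))


PCIE3_BUDGET_GBYTES = 32  # approximate PCIe 3.0 figure used in the text

for name, gbps in [("400 Gbps Ethernet/InfiniBand", 400), ("64 Gbps Fibre Channel", 64)]:
    print(f"{name}: {gbps_to_gbytes(gbps)} GBps per port, "
          f"{ports_to_saturate(PCIE3_BUDGET_GBYTES, gbps)} port(s) to saturate")
# 400 Gbps Ethernet/InfiniBand: 50.0 GBps per port, 1 port(s) to saturate
# 64 Gbps Fibre Channel: 8.0 GBps per port, 4 port(s) to saturate
```

By this estimate, a single 400 Gbps port already exceeds the whole PCIe 3.0 budget, and four 64 Gbps Fibre Channel ports match it.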

That means the PCIe bandwidth for the back-end interconnect to NVMe drives is going to be limited for any given storage controller, regardless of the PCIe generation. Determining the maximum number of NVMe drives is a balancing act between the front end, the back end and required throughput. A PCIe 3.0-based NVMe storage controller commonly tops out at 24 drives. That's not necessarily a hard-and-fast limit, since it ultimately depends on total throughput requirements. It is, however, a practical one.
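One way to picture the balancing act is as a lane budget. The sketch below is illustrative only: the lane counts are hypothetical inputs, not figures from any particular controller, and the only outside assumption is that NVMe SSDs commonly attach over an x4 link. Controllers that fan out through PCIe switches can attach more drives than this, at the price of oversubscribing the back end.

```python
from dataclasses import dataclass


@dataclass
class ControllerBudget:
    total_lanes: int          # hypothetical: PCIe lanes the controller CPU exposes
    front_end_lanes: int      # hypothetical: lanes reserved for NICs or adapters
    lanes_per_drive: int = 4  # assumption: each NVMe SSD uses an x4 link

    def back_end_lanes(self) -> int:
        """Lanes left for drives once the front end is provisioned."""
        return self.total_lanes - self.front_end_lanes

    def max_drives_without_oversubscription(self) -> int:
        return self.back_end_lanes() // self.lanes_per_drive

    def oversubscription(self, drives: int) -> float:
        """Ratio of lanes the drives could use to the back-end lanes available."""
        return drives * self.lanes_per_drive / self.back_end_lanes()


# Example inputs chosen for illustration: 128 lanes, two x16 front-end adapters.
budget = ControllerBudget(total_lanes=128, front_end_lanes=32)
print(budget.max_drives_without_oversubscription())  # 24
print(budget.oversubscription(48))                   # 2.0
```

With these example inputs the budget happens to land on 24 drives. Different lane counts or front-end requirements would move the number, which is why the limit is practical and not fixed.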

The conventional wisdom in the industry is that as PCIe 4.0 and 5.0 become more prevalent in storage controllers, the practical number of NVMe drives supported should rise as well. That's possible, but other issues suggest otherwise. NVMe drive throughput is also going to increase. Everything connected to the PCIe bus is increasing its bandwidth consumption commensurately with the PCIe bandwidth.

Then there's the efficiency of the storage software stack. Software bloat will likely need to be addressed; otherwise, the CPU is prone to becoming the performance bottleneck. It is possible, even probable, that the total supported drives per storage controller will not change. If that's the case, then how can storage systems increase their total performance? The answer is quite simple per the storage vendors -- add more controllers.

Scaling the NVMe storage controller

The most common shared storage system today is the active-active controller pair. Each of the two controllers has access to the other's drives in case one controller fails. Two controllers do not generally increase the number of fully addressable NVMe drives; this prevents huge performance degradation when one NVMe storage controller is unavailable. Scaling the number of storage controllers beyond two requires a bit more creative architecture.

Scale out the number of controllers. Block storage scale-out is typically through some type of clustering. Scale-out for file and object storage is generally through a global namespace, but not always. Every scale-out architecture is either shared everything or shared nothing. Block is more likely to be shared everything, but not always. Shared nothing is more common for file and object.

One issue with some scale-out block architectures is diminishing marginal returns. Each new controller adds less performance than the one before, until the next one negatively affects performance. This is generally an issue for clustered shared everything architectures. Shared nothing architectures are more likely to scale near linearly to much greater numbers.

However, all scale-out architectures increase the number of rack units (RU), switch ports, switches, NICs or adapters, cables and transceivers, along with cable management, conduit, power, cooling and UPS requirements. This adds noticeably more cost to the storage systems.
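The diminishing-returns behavior described above can be illustrated with a simple scaling model. The sketch below uses the Universal Scalability Law, which is a modeling choice made here and not something the storage vendors are known to use. The contention and coherency coefficients are hypothetical, picked only to contrast a curve that peaks and then declines with one that stays near linear.

```python
def relative_throughput(controllers: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: throughput relative to a single controller."""
    n = controllers
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def marginal_gain(controllers: int, contention: float, coherency: float) -> float:
    """Throughput added by the most recently added controller."""
    return (relative_throughput(controllers, contention, coherency)
            - relative_throughput(controllers - 1, contention, coherency))


# Hypothetical coefficients, chosen only to show the two shapes.
SHARED_EVERYTHING = dict(contention=0.05, coherency=0.02)
SHARED_NOTHING = dict(contention=0.01, coherency=0.0)

for n in (2, 4, 8, 16):
    print(n,
          round(relative_throughput(n, **SHARED_EVERYTHING), 2),
          round(relative_throughput(n, **SHARED_NOTHING), 2))
```

With these made-up coefficients, the shared everything curve peaks at about seven controllers and the eighth reduces total throughput, while the shared nothing curve keeps growing almost linearly through 16.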