If you're not an aficionado of contemporary science fiction or theoretical quantum physics, you may not be familiar with the term multiverse. But if your job entails working with the storage of electronic data, you should probably bone up on the concept.
A theoretical multiverse assumes the existence of an infinite number of parallel universes -- that is, universes existing in parallel with our own. A parallel universe may have an alternate timeline to the one we know (for example, a historical event didn't happen or had a different outcome). Or it may have a completely different set of physical laws (for example, gravity may not apply). Or it may exist on a plane orthogonal to what we perceive (so Mr. Spock may have emotions and a goatee).
If only we could harness multiverse physics, the idea of dimensional compression, or "flat-space technology," would become possible. Dimensional compression assumes "bubbles" in the multiverse in which an entire universe can be stored. With it, we could create a storage device with virtually unlimited capacity -- like the bauble containing an entire galaxy in Men in Black, or the wristband that the heroine of the 2006 sci-fi action romp Ultraviolet used to store an insane amount of weaponry and ammunition.
A dimensionally compressed storage medium could, in essence, store all data forever. Unfortunately, I saw no sessions on the subject at any of last year's storage conferences. Discussions of how parallelism can be harnessed to improve workload performance -- especially for workloads virtualized under hypervisors -- are actually happening, however. They may be less mind-boggling than parallel universe theory, but they are at least as confusing and irritating.
Storage: Not guilty
Many consumers of virtual computing have accepted, without question, hypervisor vendors' claim that slow storage is responsible for slow virtual machine (VM) workload performance. The I/O performance of shared electromechanical storage devices (such as HDDs), configured in shareable topologies using bus-extending cables and switches (e.g., SANs, NAS and shared arrays), creates a choke point in the I/O path, they claim. This causes latency and "back pressure," slowing VMs to a crawl.
Think of a clogged bathroom sink. In this view, so many I/Os are queued awaiting their turn to be written, whether because of slow storage devices or slow bus-extending interconnects, that they form the equivalent of a clog of hair and soap scum blocking the drain. As a result, the sink backs up, and you have to stop the workflow (shaving, washing and so on) until the clog is cleared.
To remedy I/O queuing, hypervisor vendors recommend ripping out and replacing shared storage with internal or direct-attached storage, which they now choose to call converged or hyper-converged storage because it sounds cool. Jumping on the bandwagon, flash storage vendors are, of course, recommending we take the opportunity to replace all aging electromechanical storage with silicon-based nonvolatile flash memory.
"Do these things and your VM performance problems will be solved," the pitchman promises. Only this diagnosis doesn't fit the reality, or the budgets, of our particular universe.
The culprit is ...
A simple check of slow VM systems usually shows I/O queues are either "shallow" or nonexistent. In short, I/Os aren't queuing up, waiting to be written to storage. Given this fact, it's clear that neither the storage interconnect (cables and switches) nor the shared storage platform (SAN, NAS and so on) can reasonably be viewed as the source of latency in the system.
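What might that simple check look like? On a Linux host or guest, one option is to do what iostat does for its average queue size (aqu-sz) column: sample /proc/diskstats twice and divide the growth in "weighted time spent doing I/Os" by the sampling interval. The sketch below assumes Linux and the standard /proc/diskstats field layout; the function names are illustrative, and other hypervisors expose comparable counters through their own tools.

```python
from typing import Dict


def parse_diskstats(text: str) -> Dict[str, dict]:
    """Return {device: {"in_flight": int, "weighted_ms": int}} per diskstats line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        stats[fields[2]] = {
            "in_flight": int(fields[11]),    # I/Os currently in progress
            "weighted_ms": int(fields[13]),  # weighted time spent doing I/Os (ms)
        }
    return stats


def average_queue_depth(before: str, after: str, interval_ms: float) -> Dict[str, float]:
    """Average number of queued plus in-flight I/Os per device over the interval."""
    a, b = parse_diskstats(before), parse_diskstats(after)
    return {
        dev: (b[dev]["weighted_ms"] - a[dev]["weighted_ms"]) / interval_ms
        for dev in b
        if dev in a
    }


def read_diskstats(path: str = "/proc/diskstats") -> str:
    """Read one raw sample of the kernel's block-device counters."""
    with open(path) as f:
        return f.read()
```

To use it, read two samples about a second apart and call average_queue_depth(first, second, 1000.0). A result near or below 1 means requests are rarely waiting behind one another, which is the "shallow" queue described above.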
You will also find the system CPU cycling at an above-normal rate -- "running hot." This usually reflects some impediment to raw I/O handling at the CPU, such as sequential I/O processing. The workload generates I/O, but the chip can't unload it onto the I/O bus fast enough. So, in reality, multicore chips have made the problem of sequential I/O processing more pronounced.
The problem is no longer just north-south: how quickly a single workload can process its I/O onto the bus. Multicore chips have added an east-west problem: several adjacent sequential I/O processing functions, running in multiple logical cores, that must all be handled by a single sequential I/O handler on the CPU.
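Put the two observations together and you have a rough triage rule: deep queues point at the storage path, while shallow queues plus a hot CPU point at I/O handling on the host. The sketch below computes host CPU utilization from two samples of the aggregate "cpu" line in Linux's /proc/stat and applies that rule. The thresholds (85% busy, queue depth of 4) are hypothetical values chosen for illustration, not vendor guidance.

```python
def cpu_times(stat_text: str) -> tuple:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Fields: user nice system idle iowait irq softirq steal.
            # guest/guest_nice are already counted in user/nice, so they are skipped.
            values = [int(v) for v in line.split()[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            total = sum(values)
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_busy_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent busy between two /proc/stat samples."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    return (busy1 - busy0) / (total1 - total0)


def triage(cpu_busy: float, avg_queue_depth: float,
           hot_cpu: float = 0.85, deep_queue: float = 4.0) -> str:
    """Hypothetical rule of thumb; thresholds are illustrative assumptions."""
    if avg_queue_depth >= deep_queue:
        return "I/Os are backing up: look at the storage path"
    if cpu_busy >= hot_cpu:
        return "shallow queues, hot CPU: look at I/O handling on the host"
    return "inconclusive"
```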
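A toy model makes the east-west effect concrete. Assume, purely for illustration, that every I/O costs the handler a fixed service time, that each core issues the same number of requests at once, and that requests reach the single handler in round-robin order across cores. The function name and parameters are hypothetical.

```python
from typing import List


def single_handler_finish_times(cores: int, ios_per_core: int, service_us: int) -> List[int]:
    """Finish time (microseconds) of each core's last I/O when one sequential
    handler serves every core's requests, one at a time, in round-robin order."""
    finish = [0] * cores
    clock = 0
    for _ in range(ios_per_core):
        for core in range(cores):
            clock += service_us  # the handler can only process one I/O at a time
            finish[core] = clock
    return finish
```

With one core issuing four 10 µs I/Os, single_handler_finish_times(1, 4, 10) returns [40]. Give four cores the same per-core workload and single_handler_finish_times(4, 4, 10) returns [130, 140, 150, 160]: no single workload got heavier, yet each now waits roughly four times as long because every core sits behind the same sequential handler.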
The parallel view
Flat-space technology isn't required to resolve this problem with workload performance, thank goodness. Instead, we need to take what is sequential and make it parallel so that a lot of sequential I/Os from adjacent logical cores in the multicore chip make it onto the bus efficiently. There's a rub, however: Parallelism means different things to different vendors in the storage multiverse.
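Under the same simplifying assumptions (fixed per-I/O service times, all requests available at once), "making sequential parallel" can be sketched as greedy list scheduling: each I/O goes to whichever handler frees up first. This is a generic scheduling illustration, not any particular vendor's design.

```python
import heapq
from typing import List


def makespan(service_times_us: List[int], handlers: int) -> int:
    """Time until every I/O is processed when each one is assigned to the
    earliest-available handler (greedy list scheduling)."""
    if handlers < 1:
        raise ValueError("need at least one handler")
    free_at = [0] * handlers  # time at which each handler next becomes idle
    heapq.heapify(free_at)
    for t in service_times_us:
        start = heapq.heappop(free_at)  # earliest-available handler takes the I/O
        heapq.heappush(free_at, start + t)
    return max(free_at)
```

For sixteen 10 µs I/Os, makespan([10] * 16, 1) is 160 µs, while makespan([10] * 16, 4) drops to 40 µs. The greedy rule is only a simple heuristic; it keeps the sketch short and is not an optimal scheduler.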
To ioFABRIC, for example, parallelism refers to a storage design intended to alleviate latency and scale workload performance simply by spreading operations across many targets, such as multiple disk or flash drives. This enables more efficient use of resources. According to ioFABRIC, its software delivers up to 250,000 IOPS from each chip core using its parallelization strategy.
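ioFABRIC's actual placement logic isn't described here, so the sketch below shows a generic alternative: round-robin striping (RAID 0-style), one common way to spread operations across multiple drives. The stripe unit size and the function names are illustrative assumptions.

```python
from typing import List, Tuple


def stripe_map(lba: int, targets: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (target index, block offset on that target)
    under simple round-robin striping."""
    stripe = lba // stripe_blocks      # which stripe unit the block falls in
    within = lba % stripe_blocks       # position inside that stripe unit
    target = stripe % targets          # stripe units rotate across the targets
    offset = (stripe // targets) * stripe_blocks + within
    return target, offset


def spread(lbas: List[int], targets: int, stripe_blocks: int) -> List[int]:
    """Count how many of the given blocks land on each target."""
    counts = [0] * targets
    for lba in lbas:
        counts[stripe_map(lba, targets, stripe_blocks)[0]] += 1
    return counts
```

With four targets and 8-block stripe units, block 35 maps to target 0 at offset 11, and a sequential run of 64 blocks lands evenly, 16 blocks per drive, so all four drives can work on it at once.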
To nonvolatile memory express (NVMe) folks, who want everyone to adopt their strategy for deploying flash storage on a PCIe bus, parallelism refers to the use of many -- up to 64K -- parallel command queues between the CPU and the flash storage in their kit to expedite I/O delivery. This should enable much faster transfer of data from the CPU to NVMe storage than is possible when flash is presented through SAS/SATA disk emulation. The benefits of lower latency and faster VM performance have been more theoretically expounded than practically demonstrated, though.
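Why should more queues help, at least on paper? Little's law says throughput equals outstanding requests divided by latency, so the number of commands a host can keep in flight caps IOPS. The sketch below computes that concurrency-only ceiling. It assumes AHCI-attached SATA's single 32-command queue, and for NVMe a hypothetical, modest setup of 16 queues with 1,024 entries each; the 100 µs latency is likewise an assumed round number.

```python
def iops_ceiling(queues: int, depth_per_queue: int, latency_us: int) -> float:
    """Upper bound on IOPS from concurrency alone (Little's law:
    throughput = outstanding requests / latency). Ignores real device limits."""
    outstanding = queues * depth_per_queue
    return outstanding * 1_000_000 / latency_us


# AHCI-attached SATA: one command queue, 32 commands deep (NCQ).
sata = iops_ceiling(queues=1, depth_per_queue=32, latency_us=100)
# NVMe, hypothetical setup: 16 queues (say, one per core), 1,024 entries each.
nvme = iops_ceiling(queues=16, depth_per_queue=1024, latency_us=100)

print(f"SATA/AHCI ceiling: {sata:,.0f} IOPS")
print(f"NVMe ceiling:      {nvme:,.0f} IOPS")
```

This prints 320,000 and 163,840,000 IOPS. The second figure is exactly the kind of theoretical number in question: real flash media, controllers and host CPUs saturate long before the queueing ceiling does.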
Then there's the adaptive parallel I/O technology from DataCore Software, which delivered more than 5 million IOPS with a response time of 0.28 milliseconds in a Storage Performance Council benchmark in 2016. Its technology works with idle or unused logical cores of the multicore chip itself, using them to create a "parallel I/O processing engine" that services the north-south/east-west output of all the other logical cores with great alacrity.
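DataCore's engine is proprietary and isn't detailed here. The sketch below only illustrates the general pattern the paragraph describes: "busy" producer threads standing in for logical cores that generate I/O, and a separate pool of worker threads standing in for otherwise idle cores dedicated to servicing it. The function name and parameters are hypothetical, and CPython threads are used for readability; they show the structure, not the speedup.

```python
import queue
import threading


def run_io_engine(producers: int, ios_each: int, io_workers: int) -> int:
    """Producer threads enqueue I/O requests; dedicated worker threads drain
    the shared queue. Returns the number of requests serviced."""
    requests = queue.Queue()
    completed = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal completed
        while True:
            item = requests.get()
            if item is None:  # sentinel: no more work for this worker
                return
            # A real engine would submit the I/O to the device here;
            # this sketch only counts it as serviced.
            with lock:
                completed += 1

    def producer(core_id: int) -> None:
        for seq in range(ios_each):
            requests.put((core_id, seq))  # (issuing core, request number)

    workers = [threading.Thread(target=worker) for _ in range(io_workers)]
    for w in workers:
        w.start()
    cores = [threading.Thread(target=producer, args=(c,)) for c in range(producers)]
    for c in cores:
        c.start()
    for c in cores:
        c.join()
    for _ in workers:
        requests.put(None)  # one sentinel per worker, queued after all real I/O
    for w in workers:
        w.join()
    return completed
```

For example, run_io_engine(4, 250, 2) has four producing "cores" hand 1,000 requests to two dedicated I/O workers and returns 1000. Because the queue is FIFO and the sentinels go in only after every producer finishes, no worker exits while real requests remain.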
Are these parallelism techniques mutually exclusive, or can they be used in conjunction to derive the highest order of throughput and the lowest possible latencies? Perhaps in another part of the multiverse, the possibilities have already been fully explored.