
Parallel I/O technology boosts data storage performance

DataCore Software's use of parallel I/O has caught the data storage industry's attention as a low-cost option for fast performance with existing hardware.

The most recent Storage Performance Council SPC-1 benchmark results from DataCore Software have helped pique interest in parallel I/O technology. Using its software with a generic server-storage kit comprising a multicore Lenovo server and some commodity solid-state drives and hard drives, DataCore achieved I/O speeds rivaling the most expensive high-performance storage arrays on the market at a fraction of the cost.

Parallel I/O technology is not new. It was part of development efforts around multiprocessor system designs from the 1970s until the mid-1990s. Those efforts lost momentum when Intel and others introduced unicore (single-core) chips whose clock rates doubled approximately every two years. That pace of improvement in uniprocessor capacity and speed squashed the demand for multiprocessor and parallel computing designs.

When continuous clock speed improvements dropped off, the chip industry embraced multicore chip design -- a single chip die with many physical CPU cores. That once again created a multiprocessor environment that parallel computing architecture could exploit. But most applications had been designed for unicore processing and a sequential program execution model, so parallelism wasn't fully exploited.

Hyper-threading offers an opportunity

Intel added hyper-threading to its chips to facilitate multitasking and multi-tenant or hypervisor-based computing at the chip level. Threads are logical constructs that share the execution resources of a physical core, so each physical core appears to software as multiple logical cores. Even with multiple logical cores per physical core on a chip, these resources were not exploited very efficiently by applications, operating systems or hypervisors.

Leveraging multicore chip capabilities, DataCore Software engineers introduced a workable form of parallel I/O. Simply put, parallel I/O dedicates a portion of the logical cores in a multicore chip to processing the I/O emanating from all the applications and virtual machines (VMs) serviced by the other logical cores on the chip. Parallel I/O technology establishes a highly efficient engine to handle multiple concurrent read and write operations between the application workloads running on those other logical cores and the back-end storage resources. The results from DataCore's SPC-1 benchmark test bolster its promise as a low-cost storage performance enhancer: 459K IOPS from low-cost commodity storage, delivered for approximately eight cents per SPC-1 IOPS.
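
The division of labor described above, in which a fixed set of workers handles I/O on behalf of everything else, can be sketched in a few lines of Python. This is only an illustration of the general idea, not DataCore's implementation: the names `io_worker`, `run_parallel_io` and `demo` are hypothetical, and Python threads are scheduled by the operating system rather than pinned to dedicated logical cores.

```python
import os
import queue
import tempfile
import threading

# Hypothetical names throughout; this shows the division of labor only.
# Python threads are scheduled by the OS and are not pinned to logical cores.


def io_worker(requests, completed, lock):
    """Service write requests until a None sentinel arrives."""
    while True:
        job = requests.get()
        if job is None:  # sentinel: no more work for this worker
            break
        path, payload = job
        with open(path, "wb") as f:
            f.write(payload)
        with lock:  # the results dict is shared by all workers
            completed[path] = len(payload)


def run_parallel_io(jobs, io_workers):
    """Fan write jobs out to a fixed pool of dedicated I/O workers."""
    requests = queue.Queue()
    completed = {}
    lock = threading.Lock()
    pool = [
        threading.Thread(target=io_worker, args=(requests, completed, lock))
        for _ in range(io_workers)
    ]
    for t in pool:
        t.start()
    for job in jobs:  # "applications" only enqueue; they never touch storage
        requests.put(job)
    for _ in pool:  # one sentinel per worker shuts the pool down
        requests.put(None)
    for t in pool:
        t.join()
    return completed


def demo():
    """Write eight small files using two I/O workers; return the byte counts."""
    with tempfile.TemporaryDirectory() as d:
        jobs = [
            (os.path.join(d, f"vm{i}.bin"), bytes([i]) * (i + 1))
            for i in range(8)
        ]
        done = run_parallel_io(jobs, io_workers=2)
        return sorted(done.values())
```

The point of the sketch is that callers hand requests to a queue and move on, while a small, separately sized pool drains that queue concurrently.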


The results have drawn interest from data storage manufacturers exploring ways to create their own parallel I/O products, especially given DataCore's claim that its preliminary results, as certified by SPC, are only the first step toward continued large gains in I/O throughput and further reductions in cost per I/O.

Parallel I/O works with multicore processing systems, which means it will probably work with most currently deployed servers. DataCore's software was designed to be installed directly on an application server or hypervisor host where it can operate in conjunction with the server's applications and VMs.

How DataCore does parallel I/O

DataCore's parallel I/O can be set up to adaptively use available cores, or a user can designate a portion of the logical cores for I/O processing to create a parallel I/O processing resource on the server. The number of logical cores provided to this resource can be adjusted to strike a proper balance between the processor cores available for discrete application processing and those used for general I/O servicing. In the future, the service could be even more granular, allowing parallel I/O resources to be preferentially allocated to specific workloads to provide quality of service guarantees.
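
To make the balancing act concrete, here is a minimal sketch of how a server's logical cores might be split between I/O servicing and application work. The function `partition_cores` and its parameters are assumptions made for illustration; the article does not describe how DataCore's software exposes this setting.

```python
def partition_cores(logical_cores, io_share, min_each=1):
    """Split logical core IDs into an I/O set and an application set.

    logical_cores: total logical cores on the server
    io_share:      fraction (0..1, exclusive) to reserve for I/O processing
    min_each:      floor so neither side is starved (assumed policy)
    """
    if logical_cores < 2 * min_each:
        raise ValueError("not enough logical cores to split")
    if not 0.0 < io_share < 1.0:
        raise ValueError("io_share must be between 0 and 1")

    io_count = round(logical_cores * io_share)
    # Clamp so both the I/O side and the application side keep min_each cores.
    io_count = max(min_each, min(io_count, logical_cores - min_each))

    io_cores = list(range(io_count))
    app_cores = list(range(io_count, logical_cores))
    return io_cores, app_cores
```

For example, `partition_cores(16, 0.25)` reserves cores 0 through 3 for I/O and leaves 4 through 15 for applications. On Linux, a process could then confine itself to one of the sets with `os.sched_setaffinity`; that step is omitted here because it is platform-specific.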

Behind the parallel I/O engine, the data storage infrastructure itself remains pretty much the same. However, DataCore optimizes caches, interconnects and storage media through its own form of storage virtualization, familiar to anyone who has used the vendor's SANsymphony product. The technology works with legacy storage infrastructure and contemporary software-defined storage stacks and products, or with a combination of both. There's no hardware lock-in or rip-and-replace requirement with DataCore's implementation.

I/O acceleration alternatives may not suffice

Other vendors are beginning to seize on the terminology of parallel I/O, using it to describe any number of ways to accelerate I/O. However, most of these approaches are either spoofs or grounded in expensive and proprietary hardware configurations. One popular technique, for example, involves buffering reads and writes to DRAM or flash memory, thereby spoofing applications into believing their data has been written to target storage devices before writes happen. That strategy may result in a bump in speed that translates into perceived application performance improvement, but it's rarely a cost-effective or reliable solution to the I/O throughput challenge. In virtual server environments, that approach will likely run afoul of the I/O blender effect.
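
The reliability concern with acknowledging writes early can be shown with a toy model. In the sketch below, the class `WriteBackCache` and its methods are hypothetical, and a plain dictionary stands in for durable storage; it is not a model of any particular vendor's product.

```python
class WriteBackCache:
    """Toy write-back buffer: writes are acknowledged before they are durable."""

    def __init__(self):
        self.buffer = {}   # volatile staging area (stands in for DRAM)
        self.backing = {}  # stands in for the target storage device

    def write(self, block, data):
        """Stage the write and acknowledge immediately."""
        self.buffer[block] = data
        return True  # the application believes the data is safely stored

    def flush(self):
        """Destage buffered writes to the backing store; return how many."""
        count = len(self.buffer)
        self.backing.update(self.buffer)
        self.buffer.clear()
        return count

    def crash(self):
        """Simulate losing the volatile buffer; return the blocks that were lost."""
        lost = sorted(self.buffer)
        self.buffer.clear()
        return lost
```

Any block written after the last `flush()` has been acknowledged to the application but exists only in the buffer, so a `crash()` at that moment loses data the application considers committed.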


That has led some firms to introduce technologies such as log restructuring, which re-sorts data from different VMs into a more coherent and less random form before writing the cached data to a storage device. That not only further delays the write of data to its target, but also introduces a potential weak link in storage reliability, and it may require a significant investment in solid-state storage to serve as the temporary repository for data while it is restructured.
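
As a rough illustration of what re-sorting might involve, the sketch below takes an interleaved log of writes from several VMs and reorders it by VM and block address before destaging. This is one plausible reading of the idea, not any vendor's algorithm; the `Write` record and the `restructure` function are hypothetical.

```python
from typing import NamedTuple


class Write(NamedTuple):
    vm: str      # which virtual machine issued the write
    lba: int     # logical block address within that VM's volume
    data: bytes


def restructure(log):
    """Reorder an interleaved write log into per-VM, ascending-address order.

    A later write to the same (vm, lba) supersedes an earlier one, so only
    the most recent data for each block is destaged.
    """
    latest = {}
    for w in log:
        latest[(w.vm, w.lba)] = w
    return [latest[key] for key in sorted(latest)]
```

Given a blended log such as vm2 block 40, vm1 block 7, vm2 block 8, vm1 block 3, the function emits vm1 blocks 3 and 7 followed by vm2 blocks 8 and 40. Everything held in `latest` is data that has not yet reached its target, which is the delay and exposure the paragraph above describes.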

Some storage vendors try to address the speeds-and-feeds problem by dedicating explicit storage resources to specific workloads using technologies like VVOLs. The complexity of infrastructure configuration, and the challenges involved in managing infrastructure, increase labor costs and potentially the downtime associated with such strategies. DataCore has announced support and certification for its universal VVOLs so users can get parallel I/O performance and still drive their storage policies directly from their VMware vSphere environment.

The software approach used by DataCore is readily implemented, and easily adjusted and maintained, requiring little if any additional infrastructure or personnel. If it achieves market success, we're likely to see other vendors leveraging variants of parallel I/O technology in competing products.
