
Primary Data primes the pump for data virtualization software

Startup Primary Data prepares GA launch of DataSphere data virtualization software, which is designed to make data available any time on any storage.

Startup Primary Data today filled in details about its DataSphere data virtualization software, which the vendor plans to make generally available in a few months.

Primary Data CEO Lance Smith said DataSphere will manage data across storage from different vendors, regardless of its location, protocol and media type. He said the startup will demonstrate DataSphere at VMworld next week, showing how it can manage data across EMC Isilon, NetApp FAS, Intel NVMe and Amazon Simple Storage Service (S3) cloud storage under a global dataspace.

"DataSphere has a metadata engine that produces a layer of abstraction between an application's logical view of data and where it's physically placed," Smith said. "It puts data where and when it's needed based on an application's demand."

He said approximately 50 customers have been evaluating DataSphere data virtualization software, mostly for test and development workloads. It will be ready for primary storage when it becomes generally available in late fall.

Smith said a long testing period is necessary because DataSphere is looking to manage large organizations' critical data.

"We're going after nothing but the enterprise," he said. "We're not talking to SMBs [small and medium-sized businesses]."

He listed media and entertainment, health services, and oil and gas exploration as verticals with the most interest in DataSphere data virtualization software.


DataSphere can run on any tier of storage, including flash or hard drives in storage arrays or servers. It even works with storage in the cloud, Smith said. It can manage file, block or object data. DataSphere will support Amazon S3 and Glacier, as well as OpenStack Swift at release, with plans to add Microsoft Azure and Google clouds.

DataSphere operates out of band, a design Primary Data claims avoids performance impact and lets the software scale linearly.

"We split the control path from the data path," Smith said. "When a file is accessed from an application, we go to our metadata engine to find the location and provide that to the client. And the client accesses the file natively through any type of storage."
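The out-of-band design Smith describes can be sketched roughly as follows: a client consults a metadata service only to resolve a file's current location, then reads the data natively from the backing storage. This is an illustrative sketch, not Primary Data's code; the `MetadataEngine` and `Client` classes and the tier names are hypothetical.

```python
# Illustrative sketch of an out-of-band control-path/data-path split.
# All class and tier names here are hypothetical, not Primary Data's API.

class MetadataEngine:
    """Control path: maps logical file paths to physical locations."""

    def __init__(self):
        # logical path -> (tier, physical location)
        self.catalog = {}

    def place(self, logical_path, tier, physical_location):
        self.catalog[logical_path] = (tier, physical_location)

    def resolve(self, logical_path):
        # The client calls this once, then talks to storage directly.
        return self.catalog[logical_path]


class Client:
    """Data path: reads natively from whichever tier holds the file."""

    def __init__(self, engine, tiers):
        self.engine = engine
        self.tiers = tiers  # tier name -> storage backend (plain dicts here)

    def read(self, logical_path):
        tier, location = self.engine.resolve(logical_path)  # control path
        return self.tiers[tier][location]                   # data path


engine = MetadataEngine()
tiers = {"nvme-flash": {}, "s3-archive": {}}

tiers["nvme-flash"]["blk-001"] = b"hot data"
engine.place("/projects/render/frame.exr", "nvme-flash", "blk-001")

client = Client(engine, tiers)
assert client.read("/projects/render/frame.exr") == b"hot data"

# Moving the file to a cheaper tier is a copy plus a metadata update;
# the application still reads the same logical path.
tiers["s3-archive"]["obj-777"] = tiers["nvme-flash"].pop("blk-001")
engine.place("/projects/render/frame.exr", "s3-archive", "obj-777")
assert client.read("/projects/render/frame.exr") == b"hot data"
```

The point of the split is that only the small location lookup touches the metadata engine; the bulk data transfer never passes through it, which is why the vendor argues the approach avoids a performance bottleneck.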

Customers set policies based on service-level agreements to move and manage data across the tiers. The software's analytics track storage utilization and performance.
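Policy-driven placement of this kind can be illustrated with a minimal sketch. The policy fields, thresholds, and tier names below are invented for illustration; they are not DataSphere's actual policy language.

```python
# Minimal sketch of SLA-driven tiering. The policy fields and tier
# names are hypothetical, not DataSphere's API.

from dataclasses import dataclass

@dataclass
class SLAPolicy:
    max_latency_ms: float          # performance floor the file must meet
    min_days_idle_to_demote: int   # how long a file may sit cold

@dataclass
class FileStats:
    observed_latency_ms: float
    days_since_last_access: int

# Tiers ordered fastest (and most expensive) first.
TIERS = ["nvme-flash", "nas-disk", "cloud-object"]

def choose_tier(current_tier, policy, stats):
    """Promote when the SLA is violated, demote when data goes cold."""
    idx = TIERS.index(current_tier)
    if stats.observed_latency_ms > policy.max_latency_ms and idx > 0:
        return TIERS[idx - 1]   # promote to a faster tier
    if (stats.days_since_last_access >= policy.min_days_idle_to_demote
            and idx < len(TIERS) - 1):
        return TIERS[idx + 1]   # demote cold data to cheaper storage
    return current_tier

policy = SLAPolicy(max_latency_ms=5.0, min_days_idle_to_demote=30)

# A hot file missing its latency SLA gets promoted off disk...
assert choose_tier("nas-disk", policy, FileStats(12.0, 1)) == "nvme-flash"
# ...while an idle file is demoted toward the archive tier.
assert choose_tier("nas-disk", policy, FileStats(1.0, 90)) == "cloud-object"
```

In a real system the stats would come from the analytics the article mentions, and the tier change would trigger a data copy plus a metadata update rather than a simple return value.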

Primary use cases include improving application performance, migrating data across tiers, load balancing, archiving and disaster recovery. Smith said DataSphere can provide better visibility for application owners, as well as storage administrators, and help troubleshoot problems.

Primary Data will sell DataSphere software as a physical or virtual appliance, with a subscription based on the number of files managed. Each appliance -- usually sold in pairs for high availability -- will have a subscription for 10 million files or 1 billion files, starting at $25,000 per year.

Smith said the hardware appliance usually provides better performance, but the virtual appliance allows customers to deploy faster.

Primary Data describes DataSphere in terms similar to those EMC used for ViPR when it launched that software-defined storage platform in 2013. Smith acknowledged similarities to ViPR and EMC's open source Project CoprHD, but said EMC supports only a handful of storage vendors' arrays.

"This technology needs to be storage agnostic, yet high performing," Smith said. "It's not easy to do."

Arun Taneja, founder and consulting analyst of the Taneja Group Inc., based in Hopkinton, Mass., said Primary Data is the latest to try storage virtualization. The concept of pooling data across different types of storage has been available in arrays (Hitachi Data Systems, NetApp V-Series), controllers (IBM SVC), network devices (EMC Invista) and software (DataCore, ViPR), without ever taking off.

"I think of Primary Data as the 21st century version of what the industry was trying to do in the early part of the century with heterogeneous virtualization," Taneja said. "But in this era, one can start to believe that's possible. It was pretty much impossible in 2003 and 2004, not only for technological reasons but political reasons [because vendors were reluctant to enhance competitors' products].

"Primary Data is trying to break through the barrier. ViPR will always favor EMC products, but Primary Data has an opportunity to do something as a third-party independent where there is no favoritism."

Primary Data founders David Flynn (CTO) and Rick White (chief marketing officer) come from server-side flash pioneer Fusion-io, as does Smith. Apple co-founder Steve Wozniak is chief scientist at Primary Data, a role he also played at Fusion-io. Fusion-io rode early success to become a public company, but put itself on the market after sales suddenly dropped; SanDisk acquired it for $1.1 billion in 2014.
