Startup Formation Data Systems offered a sneak peek of its data and storage virtualization software this week at VMworld in San Francisco, weeks after the release of its first product in limited availability.
The FormationOne Dynamic Storage Platform runs on standard x86 hardware, bare metal or virtual machines (VMs), and supports block, file, object storage, Amazon Simple Storage Service (S3) and Hadoop Distributed File System (HDFS) through data connectors via its eXtensible Data Interface (XDI). The software aims to provide an abstraction layer to enable storage functions, such as deduplication, backup and replication, to run the same way across all data and applications.
Chairman and CEO Mark Lewis hopes the third time will be the charm with his company's FormationOne product. Lewis blogged nearly a year ago about the "miserable failure" of his past two attempts at creating a "data virtualization layer" -- first, with VersaStor at Compaq/Hewlett-Packard, and later, with Invista at EMC, where he spent nearly 10 years in various senior executive roles, including CTO.
Formation Data Systems plans to do an official launch at year's end after customers, references and other key pieces of the puzzle are in place. In the meantime, we caught up with Lewis to discuss his company's strategy and technology.
Mark Lewis: The major radical thing that we did differently at Formation was we're not embracing legacy storage models. We're not trying to do what Invista did, what ViPR tried to do, what all these things tried to do. When you're a big company and you have a lot of products, and you want to keep selling those products, and you've sold all those products, you try to build software that runs on top of those products to make all those complicated products look simple.
The problem with that is that it's not simple, and it's not cheap. You still need to manage the core products, and then you've just added more software on top. So, the lesson learned in products like Invista was: You can't really add complexity on top of complexity and create simplicity. It just doesn't work.
If you look at something like [Amazon Web Services] AWS and public cloud in Amazon, they don't use anybody's storage below their system for AWS. They used bare metal. They wrote software. And they wrote modern software.
We're taking the approach that in order to reinvent, you have to adopt as-a-service models, on-demand models. And so our real movement is to move people from the network storage generation to the cloud storage generation.
What do you see as the prime use cases for your product?
Lewis: Initially, we're looking at three use cases. The first is we have a lot of customers looking at truly elastic, on-demand data center architectures to build out their true, next-generation, software-defined data center. We're not trying to replace the storage you own. But if you look at deploying a brand new data center and want to save 75% to 90% of your costs in deploying storage, you have to look at new models. We're getting a lot of folks looking at us to be the storage as a service, data as a service layer within these new private clouds.
The second is Hadoop. Hadoop has been in a lot of business unit deployments and is starting to move over the wall into production deployments, IT infrastructure deployments. One of the data connectors we offer is a direct HDFS connection. The benefit there is that we can provide things like inline deduplication, snapshotting, cloning, multiple forms of protection and replication, [and] other forms of data management and governance, all for your Hadoop infrastructure without changing anything.
The third is a multi-tier cloud object store. We have seen a general shift away from [Network File System] NFS environments toward more scalable object stores. Most of those object stores are S3 compatible. And we built a very scalable, very performant object store for use within private data centers.
Is the Formation Data Systems product primarily geared for tier 1 storage, or for secondary storage and archiving?
Lewis: We are a data virtualization platform across all three. If you take Cohesity, Rubrik, Actifio or guys like that, they're trying to create a very deep silo -- kind of like Oracle and Exadata. 'We're going to own the app. We're going to own the database. We're going to own the hardware. We're going to own the storage.' And they try to build that all into a single-function silo.
We're trying to do the exact opposite. We're trying to say that flash and disk are hardware resources, even in an erasure-coded infrastructure. It's kind of a core service. And we're going to provide a data virtualization layer onto that, and to be able to migrate data across those tiers, provide [quality of service] QoS and do other functions. But we're not trying to be this super-high stack. We're not trying to be the database. We're not trying to be the backup management tool. All tools can connect to us. We're virtualizing infrastructure and virtualizing the data layer.
How do you define data virtualization?
Lewis: With data virtualization, the idea is that you can take a set of resources -- flash or disk hardware -- and have a consistent data layer. Then, when we build capabilities -- snapshots, replication, deduplication -- all of those features can be put in at the data layer. So, no matter what type of data you want to project -- blocks, files, objects -- we have this consistent data layer for data management, and that saves us a fortune.
Our objective is to write that [code] once for a virtual data platform, and not need to write block data replication, file data replication, object data replication, and on and on. It gives us a great advantage for data management and consistent data management for the consumer.
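The write-once idea Lewis describes can be illustrated with a toy sketch. Everything in it is hypothetical: the `DataLayer` class, its content-hash deduplication and the two write helpers are assumptions made for illustration, not FormationOne's design or API. The point is only that a data service implemented once at a shared layer applies to block and object writes alike.

```python
import hashlib


class DataLayer:
    """Hypothetical common data layer: stores chunks keyed by content hash."""

    def __init__(self):
        self.chunks = {}  # content hash -> bytes
        self.index = {}   # logical name -> content hash

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Deduplication lives here, once, for every data type.
        self.chunks.setdefault(digest, data)
        self.index[name] = digest

    def get(self, name: str) -> bytes:
        return self.chunks[self.index[name]]


def write_block(layer: DataLayer, volume: str, lba: int, data: bytes) -> None:
    """Hypothetical block front end: maps a volume/LBA pair to a logical name."""
    layer.put(f"block/{volume}/{lba}", data)


def write_object(layer: DataLayer, bucket: str, key: str, data: bytes) -> None:
    """Hypothetical object front end: maps a bucket/key pair to a logical name."""
    layer.put(f"object/{bucket}/{key}", data)


layer = DataLayer()
payload = b"same bytes" * 100
write_block(layer, "vol0", 42, payload)
write_object(layer, "photos", "a.jpg", payload)
stored_copies = len(layer.chunks)  # identical payloads share one stored chunk
```

Neither front end contains any deduplication logic of its own; both inherit it from the layer beneath them.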
How is the Formation Data Systems FormationOne architected?
Lewis: At the base layer, we will virtualize storage infrastructure -- meaning flash or disk drives -- across x86 commodity infrastructure. You can start with as few as four nodes, and you can scale to literally 1,000 nodes within the system. It's all built on a flexible, scale-out infrastructure architecture.
The second layer we've added is a universal data virtualization layer that virtualizes any type of data. No matter how you're dealing with different types of data, different data services [and] different needs in your environment, we normalize everything into a specific data virtualization model that makes it easy for us to manage the data, provide different service levels, [and] do replication [and] deduplication. All of that happens on a common data model.
The third tier of our architecture is built around a data connector model. You can think of it as an application model, the same as an app model on your iPhone. Storage resources are virtualized, just like on a smartphone. You really have a screen, [and] a few buttons. You have a camera, a wireless connection and a few other things. You can kind of build any device from it. That's our model for data connectors. We have a universal data model that virtualizes the data. And above that, we are providing data connectors and others can create data connectors for any type of data service they wish to provide. We have a block data connector, a file NFS data connector, HDFS data connectors, [and] an S3 object data connector.
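One way to picture the app-style connector model is a plug-in registry, sketched below. The names `Connector`, `register` and `CONNECTORS`, and the key formats, are invented for this example; the article does not describe the actual XDI interface. A third party would add a connector by registering one more subclass.

```python
from abc import ABC, abstractmethod

CONNECTORS = {}  # hypothetical registry: protocol name -> connector class


def register(protocol: str):
    """Class decorator that adds a connector to the registry."""
    def wrap(cls):
        CONNECTORS[protocol] = cls
        return cls
    return wrap


class Connector(ABC):
    """Translates a protocol-specific address into a key in the common model."""

    def __init__(self, store: dict):
        self.store = store  # a plain dict stands in for the common data model

    @abstractmethod
    def key_for(self, *address) -> str:
        ...

    def put(self, data: bytes, *address) -> None:
        self.store[self.key_for(*address)] = data

    def get(self, *address) -> bytes:
        return self.store[self.key_for(*address)]


@register("s3")
class S3Connector(Connector):
    def key_for(self, bucket: str, key: str) -> str:
        return f"s3/{bucket}/{key}"


@register("nfs")
class NfsConnector(Connector):
    def key_for(self, path: str) -> str:
        return "nfs" + path  # path is expected to start with "/"


common_store = {}
s3 = CONNECTORS["s3"](common_store)
nfs = CONNECTORS["nfs"](common_store)
s3.put(b"report", "finance", "q3.pdf")
nfs.put(b"log line", "/var/log/app.log")
```

Both connectors write into the same underlying store, which is what would let data services at that level stay protocol-agnostic.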
Is your product software-only?
Lewis: We are software-only; we don't run on our own hardware, so there is no appliance. We'll be selling, initially, under a subscription model only. And while we will help you select the hardware for your particular use cases, we want you to have your choice of vendors on the hardware side. What we found with customers is that most of them have one or two vendors. They're already using IBM, or HP, or Dell, or Supermicro or some flavor of hardware. So, we're pretty much saying, 'We'll make sure whatever system you've already bought will be OK for what you want to do.'
How do you get performance out of the system?
Lewis: First and foremost, Formation is built on a true hyperscale architecture approach. The closest thing to our architecture is Google Search. The way Google gets performance out of Google Search is by distributing the load, by distributing the data and by being able to use lots of small nodes to deliver a lot of horsepower, and then use that scale-out model to be able to scale up performance and capacity.
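The general principle of spreading data across many small nodes can be sketched with plain hash partitioning. This is a generic textbook technique, not a description of how FormationOne or Google Search places data; the `node_for` function and the four-node cluster are assumptions chosen for the example.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the key; the same key always maps to the same node."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Place 10,000 hypothetical objects on a four-node cluster and count per node.
keys = [f"object-{i}" for i in range(10_000)]
placement = Counter(node_for(k, 4) for k in keys)
```

Because the hash scatters keys roughly evenly, reads and writes are shared across the nodes instead of landing on one. Simple modulo placement remaps most keys whenever the node count changes, which is why scale-out systems commonly use consistent hashing or a similar scheme instead.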