Published: 01 May 2015
For the longest time, the aim of all storage arrays was to execute one simple task: complete an I/O request as quickly as possible and get the response back to the requesting host. This focus was hardly surprising, for a number of reasons.
First, hard drives were (and still are) slow devices, while processor speeds are, by comparison, light years faster. Second, data was typically stored as blocks: the ubiquitous LUN or volume. Storage arrays had no idea what the content within a block looked like, which left little opportunity to optimize the data path or the data flow (for example, by implementing features such as quality of service).
As storage systems have evolved, arrays have become content-aware, initially through the introduction of NAS appliances and, later, object storage systems. Both NAS and object stores can store extra information that helps describe the data being stored: metadata, in other words, data about the data. In the case of NAS, this metadata is stored in a file system and presented to the connecting host using the NFS or SMB protocols. It includes the most basic attributes, such as file name, date/time written, file size and access permissions, but it can be extended to include additional information such as performance and availability requirements.
In object-based systems, information is not stored in the hierarchy of a file system, but in a "flat namespace" where data is represented as a binary object, referenced by an object ID. Objects can be any type of information, including traditional files, media (audio/video) or more complex data such as satellite telemetry or seismic data. The object ID acts as the reference used to retrieve the data later. In the future, it is likely that object stores will also be used to manage block devices, with (for example) each object representing a block within a LUN.
Object stores can also hold metadata with each object, typically as key/value pairs. The "key" identifies the type of data (e.g., object owner), while the "value" is the data associated specifically with that object (e.g., a user name or department).
Until now, object stores have been used as large-scale repositories for data that is accessed infrequently. This is because storing and retrieving data in an object store involves accessing the entire object, which can take a relatively long time when objects are large (although some platforms do allow access to parts of an object). Object storage and retrieval can also be slow when data protection measures such as erasure coding are used, especially in geographically dispersed configurations. When slices of data are dispersed across the available hardware, retrieval runs at the speed of the slowest node or server.
Object storage has become more popular because the technology addresses the most common problem experienced by large organizations: managing data growth. Block-based storage systems aren't designed for the fastest-growing types of data, namely unstructured data and, more recently, machine-generated data. Object storage, by contrast, can manage abstract data formats (by simply storing large binary objects) and associate large amounts of metadata with each object, which makes it well suited to these types of data.
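To make the flat namespace and per-object key/value metadata concrete, here is a minimal in-memory sketch. The class and method names (`FlatObjectStore`, `put`, `get`, `find`) are hypothetical and don't correspond to any particular product's API; the point is only that objects are addressed by an opaque ID rather than a path, and that metadata can be queried without touching object content.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)  # key/value pairs


class FlatObjectStore:
    """Toy in-memory object store: a single flat namespace with no directories."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: dict[str, str] | None = None) -> str:
        # The store assigns an opaque object ID; the caller keeps it to retrieve the data later.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> bytes:
        # Retrieval is by ID only, and the whole object comes back.
        return self._objects[object_id].data

    def get_metadata(self, object_id: str) -> dict[str, str]:
        return dict(self._objects[object_id].metadata)

    def find(self, key: str, value: str) -> list[str]:
        # Metadata can be queried without reading any object content.
        return [oid for oid, obj in self._objects.items() if obj.metadata.get(key) == value]


store = FlatObjectStore()
oid = store.put(b"...seismic trace...", {"owner": "jsmith", "department": "geophysics"})
```

Here `store.find("department", "geophysics")` returns the new object's ID, without any directory hierarchy involved.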
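The effect of dispersal on read latency can be worked through with a small calculation. The sketch below assumes slices are requested in parallel and that a read completes when the last slice it needs arrives; the latency figures are invented for illustration. If every slice must be fetched, the slowest node sets the pace. A system using a k-of-n erasure code that requests all n slices but only waits for any k of them finishes at the k-th fastest response instead.

```python
from __future__ import annotations


def dispersed_read_latency(slice_latencies_ms: list[float], slices_needed: int | None = None) -> float:
    """Latency of reading one object whose slices live on different nodes.

    Slices are fetched in parallel, so the read finishes when the last
    required slice arrives: the slowest node if every slice is needed,
    or the k-th fastest response if any k slices suffice.
    """
    if slices_needed is None:
        slices_needed = len(slice_latencies_ms)  # every slice is required
    if not 1 <= slices_needed <= len(slice_latencies_ms):
        raise ValueError("slices_needed must be between 1 and the number of slices")
    return sorted(slice_latencies_ms)[slices_needed - 1]


# Five slices: four on nearby nodes, one at a distant site (illustrative numbers).
latencies = [4.0, 5.0, 6.0, 7.0, 120.0]
all_slices = dispersed_read_latency(latencies)           # waits for the distant site
any_four = dispersed_read_latency(latencies, slices_needed=4)
```

With these numbers, waiting for all five slices takes 120 ms, while waiting for any four takes 7 ms.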
Intelligent storage systems
Now that we have set the scene, how can object stores and metadata be used to create more intelligent storage systems?
Scalability. Object stores are capable of scaling to the multi-petabyte level within a single system, a capacity otherwise achieved only by the most expensive high-end storage arrays, scale-out NAS systems and tape libraries. However, object stores are more flexible in how they grow capacity: extra nodes (or servers) are simply added to the configuration, in many cases based on commodity hardware. This ability to grow in a scale-out fashion increases reliability and can provide greater geographic resiliency than deploying more monolithic, single-instance block-based storage systems. The metadata within object stores scales with the data, either embedded in the data itself or managed in dedicated parts of the infrastructure.
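Adding nodes only helps if objects can be spread across them without reshuffling everything already stored. One common technique for this is consistent hashing, shown here purely as an illustration and not as the placement method of any product mentioned in this article. Node names and object IDs are hashed onto the same ring, and each object lives on the first node point after its own position. The node names and the 100 virtual points per node are arbitrary assumptions.

```python
from __future__ import annotations

import bisect
import hashlib


def ring_position(key: str) -> int:
    # Stable 64-bit position on the ring, taken from the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: list[int] = []       # sorted ring positions
        self._owners: dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims many "virtual" points so objects spread evenly.
        for i in range(self._vnodes):
            point = ring_position(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, object_id: str) -> str:
        # First node point after the object's position, wrapping around the ring.
        idx = bisect.bisect(self._points, ring_position(object_id)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node1", "node2", "node3", "node4"])
ids = [f"object-{n}" for n in range(10_000)]
before = {oid: ring.node_for(oid) for oid in ids}
ring.add_node("node5")  # scale out by one node
moved = [oid for oid in ids if ring.node_for(oid) != before[oid]]
```

When the fifth node joins, only the objects whose positions now fall just before one of its points change location (roughly a fifth of them), and every one of them moves to the new node; nothing is shuffled between the existing nodes.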
Versioning. Object stores allow multiple versions of objects to be retained, based on policy settings. For example, a system may allow up to 10 versions of each object to be retained for a set period of time. Version retention makes it possible to implement data recovery features such as snapshots and continuous data protection.
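A count-based retention policy like the "up to 10 versions" example can be sketched in a few lines. In this hypothetical in-memory class, each write creates a new version, a bounded `deque` discards the oldest version once the limit is reached, and any retained version can be read back, which is the basic building block for snapshot-style recovery. Time-based expiry is left out for brevity.

```python
from __future__ import annotations

import itertools
from collections import deque


class VersionedStore:
    """Retains at most max_versions versions per key; the oldest is discarded first."""

    def __init__(self, max_versions: int = 10) -> None:
        self._max = max_versions
        self._history: dict[str, deque[tuple[int, bytes]]] = {}
        self._next_version = itertools.count(1)  # store-wide, ever-increasing version IDs

    def put(self, key: str, data: bytes) -> int:
        history = self._history.setdefault(key, deque(maxlen=self._max))
        version = next(self._next_version)
        history.append((version, data))  # a full deque silently drops its oldest entry
        return version

    def get(self, key: str, version: int | None = None) -> bytes:
        history = self._history[key]
        if version is None:
            return history[-1][1]  # latest version
        for v, data in history:
            if v == version:
                return data
        raise KeyError(f"version {version} of {key!r} is no longer retained")

    def versions(self, key: str) -> list[int]:
        return [v for v, _ in self._history[key]]


store = VersionedStore(max_versions=3)
for text in (b"draft 1", b"draft 2", b"draft 3", b"draft 4"):
    store.put("report.docx", text)
```

After four writes with a limit of three, versions 2 to 4 remain; asking for version 1 raises `KeyError` because the policy has already expired it.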
Extensibility. At heart, object-based systems are relatively simple: they store objects and object metadata. However, metadata makes it possible to extend the capabilities of an object store by keeping information that allows the system to take action and manage objects with a degree of intelligence. For example, an object store could associate a storage tiering level with each object and automatically migrate it to cheaper storage as it ages, or track the users accessing and updating a particular file.
The ability to assign attributes to data means actions can be taken on object store contents automatically, without storage administrator involvement. This is achieved through policies that assign service-level attributes such as data protection, availability and resiliency. The ability to automate actions based on policies is key to achieving high levels of scalability, because this level of management typically can't be performed manually at scale.
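As a rough illustration of policies acting on metadata without administrator involvement, the sketch below assigns each object a storage tier from the time since it was last accessed and plans the resulting migrations. The tier names, the 30- and 365-day thresholds, and the `pin` metadata key are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ObjectInfo:
    object_id: str
    last_accessed: datetime
    metadata: dict[str, str]


@dataclass
class TierPolicy:
    tier: str
    min_age: timedelta


# Hypothetical policy set, ordered coldest first; the first matching rule wins.
POLICIES = [
    TierPolicy("archive", timedelta(days=365)),
    TierPolicy("capacity", timedelta(days=30)),
    TierPolicy("performance", timedelta(0)),
]


def choose_tier(obj: ObjectInfo, now: datetime, policies: list[TierPolicy] = POLICIES) -> str:
    # An explicit metadata override keeps an object on the fast tier regardless of age.
    if obj.metadata.get("pin") == "performance":
        return "performance"
    age = now - obj.last_accessed
    for policy in policies:
        if age >= policy.min_age:
            return policy.tier
    return policies[-1].tier


def plan_migrations(objects: list[ObjectInfo], now: datetime) -> list[tuple[str, str, str]]:
    """Return (object_id, current_tier, target_tier) for every object that should move."""
    moves = []
    for obj in objects:
        current = obj.metadata.get("tier", "performance")
        target = choose_tier(obj, now)
        if target != current:
            moves.append((obj.object_id, current, target))
    return moves


now = datetime(2015, 5, 1)
objects = [
    ObjectInfo("a", now - timedelta(days=2), {"tier": "performance"}),
    ObjectInfo("b", now - timedelta(days=45), {"tier": "performance"}),
    ObjectInfo("c", now - timedelta(days=400), {"tier": "capacity"}),
    ObjectInfo("d", now - timedelta(days=400), {"tier": "performance", "pin": "performance"}),
]
moves = plan_migrations(objects, now)
```

Object "b" is planned to move to the capacity tier and "c" to the archive tier, while "a" is still recent and "d" is held in place by its metadata.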
Object stores and metadata are being used in two distinct ways today. First, some object store platforms are used simply to store object data. In many cases, these are supplemented by gateways or additional functionality that permits the use of non-object protocols. Some vendors, such as Scality, are integrating protocol support natively into their software, providing the capability to perform analytics on data content.
More interesting is the second use case, in which object and metadata technology is used as the basis for storing data within storage platforms that don't explicitly expose an object storage interface. A number of storage vendors sell these kinds of systems today.
Intelligent storage vendors
Coho Data has developed a storage platform that combines SDN (software-defined networking) with an object store to deliver scale-out storage with NFS protocol support. The system is built from a number of MicroArrays (small servers) connected through pairs of redundant Ethernet switches, which handle load balancing and data placement across the infrastructure. Coho's product is designed for high-performance environments, in particular server and desktop virtualization.
Data Gravity has developed a product that performs data analytics on file- and block-based content as it is written to and ingested by its Discovery series appliance. The architecture is based on dual controllers: one actively serves data (the primary node), while the other provides data management and analysis (the intelligence node). The design rests on the observation that, in an active/passive dual-controller architecture, the passive controller is generally idle and can therefore be put to work on analytics. Data Gravity claims the Discovery platform can identify and analyze more than 400 different file and data types.
Exablox has been delivering its OneBlox system to the market for the last two years. The system provides scale-out NAS capabilities (with SMB/CIFS support) using an object store as the underlying architecture. Customers buy the appliance but bring their own disk storage, so any SAS or SATA drive (including the latest 6 TB models) can be deployed in the system. Features such as variable block deduplication, CDP (continuous data protection) and support for non-uniform disk sizes are enabled by dividing files into objects, which are then distributed around the storage "ring" of connected devices. For example, CDP is easily implemented by retaining multiple versioned updates to objects, a standard feature of object stores. Today, OneBlox systems scale up to seven nodes, each providing 48 TB of raw capacity, for a maximum of 336 TB raw.
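The article doesn't say how OneBlox finds its variable-size blocks, so the sketch below uses a generic technique, content-defined chunking, purely to show how variable block deduplication can work. A rolling hash over the last 48 bytes decides where chunk boundaries fall, so an edit only changes the chunks around it; each chunk is stored once under its SHA-256 fingerprint, and a file becomes an ordered list of fingerprints. All constants (window size, cut probability, minimum and maximum chunk sizes) and the `DedupStore` class are assumptions.

```python
from __future__ import annotations

import hashlib
import random

WINDOW = 48                         # bytes covered by the rolling hash
MASK = (1 << 12) - 1                # cut when the low 12 bits are zero (~1 in 4096 positions)
MIN_CHUNK, MAX_CHUNK = 1024, 16384
BASE, MOD = 257, (1 << 61) - 1
POWER = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Rabin-Karp style rolling hash."""
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POWER) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                   # add the newest byte
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # fingerprint -> chunk, stored once
        self.files: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for piece in chunk(data):
            fp = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(fp, piece)  # duplicate chunks are not stored again
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
original = bytes(random.Random(0).getrandbits(8) for _ in range(200_000))
edited = original[:50_000] + b"INSERTED" + original[50_000:]
store.write("v1", original)
store.write("v2", edited)
```

Because boundaries depend on content rather than offsets, the edited copy shares almost all of its chunks with the original, whereas fixed-size blocks after the insertion point would all shift and fail to match.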
Primary Data is a start-up vendor that emerged from stealth only in November 2014. The company acquired technology developed by Tonian Systems, an Israeli start-up that was rumored to be working on a product based around pNFS (parallel NFS). The Primary Data storage offering operates as a data hypervisor but doesn't sit inline with the data. Instead, it separates the data plane from the management plane, using a cluster of highly available "data directors" that store information on the physical location of data at the hardware level. This separation removes the overhead of passing all data through a central appliance, allowing the product to scale much further than traditional inline products. The trade-off is the need to deploy driver software on each client accessing data, which is presumably where the original pNFS software comes in. The data directors can also move data around the infrastructure to best meet performance and availability needs.
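The split between the management plane and the data plane can be illustrated with a toy model. The class names below (`DataDirector`, `StorageNode`, `Client`) are hypothetical and are not Primary Data's software: the director only answers "where is this file?", the client-side driver then reads from the storage node directly, and the director can relocate data by moving it and updating its location map.

```python
from __future__ import annotations


class StorageNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: dict[str, bytes] = {}


class DataDirector:
    """Out-of-band metadata service: tracks where data lives but never carries it on reads."""

    def __init__(self) -> None:
        self.locations: dict[str, StorageNode] = {}

    def place(self, file_id: str, node: StorageNode, data: bytes) -> None:
        node.blobs[file_id] = data
        self.locations[file_id] = node

    def locate(self, file_id: str) -> StorageNode:
        return self.locations[file_id]

    def migrate(self, file_id: str, target: StorageNode) -> None:
        # Move the data to a node that better fits its needs, then update the map.
        source = self.locations[file_id]
        target.blobs[file_id] = source.blobs.pop(file_id)
        self.locations[file_id] = target


class Client:
    """Stands in for the driver software each client must run."""

    def __init__(self, director: DataDirector) -> None:
        self.director = director

    def read(self, file_id: str) -> bytes:
        node = self.director.locate(file_id)  # control path: small metadata lookup
        return node.blobs[file_id]            # data path: straight to the storage node


ssd, hdd = StorageNode("ssd"), StorageNode("hdd")
director = DataDirector()
director.place("vm01.img", hdd, b"virtual disk contents")
client = Client(director)
first_read = client.read("vm01.img")
director.migrate("vm01.img", ssd)  # e.g. the file became hot
second_read = client.read("vm01.img")
```

The client code doesn't change when data moves; only the director's map does, which is what lets the management layer rebalance data without sitting in the read path.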
Qumulo recently emerged from stealth with a platform that it describes as the "world's first data-aware scale-out NAS." The company was formed by a number of the original founders of Isilon, which was later acquired by EMC. Given the team's previous experience, we can expect to see some similar scale-out NAS features in its main product, Qumulo Core. The Core system is delivered as a set of appliances (scaling from four to 1,000 nodes), but the main focus is on software, in particular data analytics, which Qumulo claims are delivered in real time.
Like many other products, Qumulo has developed its own file system, QSFS (Qumulo Scalable File System) that sits atop the storage layer. Owning the file system allows the Qumulo software to collect and analyze statistics at the file level, providing more insight and actionable information than can be achieved with traditional scale-out NAS systems.
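One simple way file-level statistics become actionable is by rolling them up the directory tree, so an administrator can see at once which directories hold the most capacity. The function below is a generic, hypothetical sketch of that idea and makes no claim about how QSFS implements its analytics; the file paths and sizes are invented.

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePosixPath


def rollup_sizes(files: dict[str, int]) -> Counter[str]:
    """Aggregate per-file sizes into a total for every ancestor directory."""
    totals: Counter[str] = Counter()
    for path, size in files.items():
        for parent in PurePosixPath(path).parents:  # e.g. /projects/a, /projects, /
            totals[str(parent)] += size
    return totals


files = {
    "/projects/a/x.dat": 100,
    "/projects/a/y.dat": 50,
    "/projects/b/z.dat": 25,
    "/home/u/notes.txt": 5,
}
totals = rollup_sizes(files)
```

Here `/projects` totals 175 and `/projects/a` totals 150, so the largest consumer is visible without walking every file at query time.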