Managing data with an object storage system
Object technology has received a lot of attention lately, so there should be plenty of use cases for it. But object storage might be the answer to a problem not yet discovered.
At a recent analyst meeting held by a Fortune 100 computer manufacturer, the senior vice president and general manager of the storage business made the bold claim that NAS will disappear and object-based storage technology is the future of unstructured data. He isn't alone in his enthusiasm for object technology. Barely a week goes by without another major storage vendor (and a few minor ones) staking a claim on the object storage market. At last count, there were nearly 20 vendors in the object storage market, including all the major storage vendors. There aren't nearly as many object stores on the market as there are DAS, SAN and NAS products, but there is a groundswell of support that is gaining momentum.
But that momentum seems more of a vendor push than a customer pull. Many smaller vendors offering object storage are struggling for customer adoption and acceptance; users just don't know why they need to introduce yet another (non-standards-based) storage platform into their environment. It may just be that object stores are still a bit ahead of their time and IT isn't in enough pain yet to compel the technology change. As one vendor recently told me, "it solves a problem that's still around the corner." How fast that problem is approaching often depends on the industry and use case.
Object storage defined
Object storage operates differently from standard file system storage. With a standard storage infrastructure, content is managed through a hierarchical file system using an index table that points to the physical storage location of each file and tracks only simple metadata. This approach limits the number of files that can be managed in a single directory. Object storage data is organized into containers of flexible sizes ("objects"). Each object has a unique ID (instead of a file name) with metadata that can include detailed attributes. This metadata can be used to set up automatic storage policies such as the migration of aging data from high-performance to more cost-efficient capacity-based disk or the deletion of data when it expires. Object storage offers a simpler design and greater scalability, easily managing billions of individual objects. Historically, the disadvantage of object storage has been performance, as data retrieval is generally considered slower than with a file system. However, recent market entries from vendors like DataDirect Networks and Scality are challenging that notion.
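To make the model concrete, here is a minimal in-memory sketch of the ideas described above: a flat namespace keyed by unique ID, free-form metadata on each object, and a policy pass that moves aging data to a capacity tier or deletes it on expiry. The class names, the "performance" and "capacity" tier labels, the "created" and "expires" metadata keys and the 90-day threshold are all assumptions made for illustration; they don't reflect any particular vendor's product or API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StoredObject:
    data: bytes
    metadata: dict
    tier: str = "performance"  # hypothetical tier label
    # A unique ID takes the place of a file name and directory path.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ObjectStore:
    """Flat namespace: objects are looked up by ID, with no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, data, created, **metadata):
        # Arbitrary attributes travel with the object as metadata.
        metadata["created"] = created
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        return self._objects[object_id]

    def __contains__(self, object_id):
        return object_id in self._objects

    def apply_policies(self, today, archive_after_days=90):
        """Delete expired objects; move aging ones to the capacity tier."""
        for object_id in list(self._objects):
            obj = self._objects[object_id]
            expires = obj.metadata.get("expires")
            if expires is not None and expires <= today:
                del self._objects[object_id]
            elif today - obj.metadata["created"] >= timedelta(days=archive_after_days):
                obj.tier = "capacity"


store = ObjectStore()
scan_id = store.put(b"...image bytes...", created=date(2012, 1, 1), kind="medical-image")
temp_id = store.put(b"scratch", created=date(2013, 5, 1), expires=date(2013, 5, 15))
store.apply_policies(today=date(2013, 6, 1))
print(store.get(scan_id).tier)  # the old scan has moved to the capacity tier
print(temp_id in store)         # the expired object has been deleted
```

The point of the sketch is that the policy logic reads only the metadata, never a path or directory listing, which is what lets this style of management scale without a hierarchical index.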
Object storage has been around for many years. One of the earliest (and perhaps best known) object storage systems was EMC's Centera, which came out in the early 2000s. Centera was called content-addressable storage (CAS) because it derived an object ID directly from the content itself, generating a digital fingerprint of the data. It was targeted at long-term "active archive" storage -- data that needed to be retained, grew like kudzu, was occasionally required for something and needed to be retrieved in a timely manner (records for e-discovery or medical images for patient care, for example). Centera integrated with applications via a proprietary API, and EMC built a strong ISV partner ecosystem and sales channel to drive adoption.
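Content addressing is easy to sketch in a few lines. The example below derives each object ID from a hash of the data, so identical content always maps to the same ID and any change to the bytes produces a different one. It is an illustration of the general idea only: the use of SHA-256 and the class name are assumptions, and nothing here represents Centera's actual API or fingerprinting scheme.

```python
import hashlib


class ContentAddressedStore:
    """Toy content-addressable store: the ID is a fingerprint of the data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 as the fingerprint function.
        object_id = hashlib.sha256(data).hexdigest()
        # Storing the same content twice simply overwrites an identical copy.
        self._blobs[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._blobs[object_id]
        # Re-hashing on read verifies the content still matches its ID.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("stored content no longer matches its ID")
        return data


cas = ContentAddressedStore()
first = cas.put(b"patient record, version 1")
again = cas.put(b"patient record, version 1")
changed = cas.put(b"patient record, version 2")
print(first == again)    # True: same content, same ID
print(first == changed)  # False: different content, different ID
```

Because the ID doubles as an integrity check, this approach suits retention-oriented archives, where proving that a record hasn't changed matters more than raw speed.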
Centera was not a high-performance system, and it didn't need to be. Many of the next-generation object stores now on the market are still targeted at "semi-active data" use cases like active archive or Web content depots, with performance attributes that match the need. The scalability and manageability of object stores make them a natural back end for cloud deployments.
Some object stores are designed for higher-performance primary storage use cases: applications that require high throughput, such as those found in media and entertainment, research and development, and analytics. Object stores are architecturally a good match for these use cases because their scale-out design provides a basis for greater bandwidth and overall throughput. But these offerings have something of an uphill battle -- they face the adoption challenges object stores have seen in general, and they need to overcome the perception that object stores are mainly archive solutions. However, both DataDirect Networks and Scality (and to some degree Cleversafe with its Hadoop integration) have seen some uptake.
The general adoption challenge, regardless of use case, is that these object stores require integration with proprietary APIs; they don't "talk" to applications through standard interfaces such as NFS, CIFS/SMB or SCSI. That lack of a standards-based interface has been something of an adoption inhibitor, primarily because users don't want to write new application interfaces. But perhaps a bigger inhibitor is the lock-in this creates to a single vendor's storage architecture. This has been an issue for the past couple of years, but it is likely to become a non-issue soon, as most object storage vendors have added NFS and CIFS support (or will soon). Additionally, many have added support for the Amazon Web Services Simple Storage Service (S3) APIs, which are quickly becoming the de facto RESTful interface standard. These advancements should finally remove the biggest obstacles to adoption and stimulate business.
Netting it out
Object store adoption is still largely confined to service providers or large enterprises that have the money and resources to deal with integration. In enterprise IT, NFS and CIFS will still be around. (Nothing ever dies in IT. HP even builds a couple of minicomputers every year for legacy applications that can't be ported to other systems.) But over time, as interfaces standardize and applications are developed with object storage back ends in mind, expect object stores to show measurable gains as a platform of choice even in the enterprise.
Over time, object stores will likely coexist with NFS and CIFS as just another way to store and manage unstructured data. Adoption will be use-case dependent. When data growth, manageability, cost and/or throughput trump random I/O performance and advanced features such as point-in-time copy and synchronous remote replication (as is the case for long-term archive or data-intensive, high-growth apps), object will likely rule the day. That's not to say object stores will never have advanced feature sets, but they're playing catch-up in that area and enterprise storage vendors aren't standing still. And for specialty use cases, some of the integrations with Hadoop that take advantage of the parallel, scale-out object store architecture look extremely promising as a way to store and process massive amounts of data affordably. Ultimately, it will come down to cost and manageability.
About the author:
Terri McClure is a senior storage analyst at Enterprise Strategy Group, Milford, Mass.