Unstructured data storage showdown: Object storage vs scale-out NAS
Object storage is the latest alternative to traditional file-based storage, offering greater scalability and (potentially) better data management with its extended metadata. Until recently, however, object storage has largely been a niche technology for enterprises, while simultaneously becoming one of the basic underpinnings of cloud storage.
Rapid data growth, the proliferation of big data lakes in the enterprise, an increased demand for private and hybrid cloud storage and a growing need for programmable and scalable storage infrastructure are pulling object storage from its niche existence to the mainstream.
An expanding list of object storage products, from both major storage vendors and startups, is another indication of object storage's increasing relevance. Moreover, object storage is reaching into network-attached storage (NAS) use cases, with some object storage vendors positioning their products as viable NAS alternatives.
Object storage use cases
Backup and archival: Object storage systems are cost-effective, highly scalable backup and archival platforms, especially if data needs to be available for continuous access.
Enterprise collaboration: Geographically distributed object storage systems are used as collaboration platforms where content is accessed and shared across the globe.
Storage as a service: Object storage powers private and public clouds of enterprises and service providers.
Content repositories: Used as content repositories for images, videos and other content accessed through applications or via file system protocols.
Log storage: Used to capture massive amounts of log data generated by devices and applications, ingested into the object store via a message broker like Apache Kafka.
Big data: Several object storage products offer certified S3 HDFS interfaces that allow Hadoop to directly access data on the object store.
Content distribution network: Used to globally distribute content like movies using policies to govern access with features like automatic object deletion based on expiration dates.
Network-Attached Storage (NAS): Used in lieu of dedicated NAS systems, especially if there is another use case that requires an object storage system. – J.G.
To overcome the limitations of traditional file- and block-level storage systems and support massive amounts of data reliably and cost-effectively, object storage systems break new ground in scalability, resiliency, accessibility, security and manageability. Let's examine how object storage systems do this.
Scalability is key to object storage
Complexity is anathema to extreme scalability. Object storage systems employ several techniques that are simple in nature, but essential to achieving unprecedented levels of scale.
To start with, object storage systems are scale-out systems that scale capacity, processing and networking resources horizontally by adding nodes. While some object storage products implement self-contained multifunction nodes that perform access, storage and control tasks in a single node, others consist of specialized node types. For instance, IBM Cleversafe, OpenStack Swift and Red Hat Ceph Storage differentiate between access and storage nodes; conversely, each node in Caringo Swarm 8 and EMC Elastic Cloud Storage (ECS) performs all object storage functions.
Unlike the hierarchical structure of file-level storage, object storage systems are flat, with a single namespace in which objects are addressed via unique object identifiers, thereby enabling unprecedented scale. "With 10^38 object IDs available per vault, we support a yottabyte-scale namespace, and with each object segmented into 4 MB segments, our largest deployments today are north of 100 petabytes of capacity, and we are prepared to scale to and beyond exabyte-level capacity," according to Russ Kennedy, IBM senior vice president product strategy, Cleversafe.
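As a rough illustration of the flat-namespace idea, the following Python sketch keeps every object in a single dictionary keyed by a randomly generated identifier and splits each payload into 4 MB segments. The class and method names are hypothetical and do not reflect any vendor's implementation. A 128-bit random ID is an assumption, chosen only because 2^128 is roughly 3.4 x 10^38, the same order of magnitude as the ID space Kennedy cites.

```python
import secrets

SEGMENT_SIZE = 4 * 1024 * 1024  # 4 MB segments, as in the quote above


class FlatObjectStore:
    """Toy in-memory store: one flat namespace, no directory hierarchy."""

    def __init__(self):
        self._segments = {}  # object ID -> list of byte segments

    def put(self, data: bytes) -> str:
        # 128-bit random identifier (an assumption for this sketch).
        object_id = secrets.token_hex(16)
        self._segments[object_id] = [
            data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)
        ]
        return object_id

    def get(self, object_id: str) -> bytes:
        # Reassemble the object from its segments.
        return b"".join(self._segments[object_id])

    def segment_count(self, object_id: str) -> int:
        return len(self._segments[object_id])
```

Storing a 9 MB payload with `put` produces three segments (4 MB, 4 MB and 1 MB), and the object is retrieved by its ID alone, with no path to traverse.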
Furthermore, object storage vendors are quick to note that their systems replace the locking that file-level storage uses to prevent multiple concurrent updates with versioning of objects on update. Versioning enables capabilities like rollback and undeleting of objects, as well as the inherent ability to access prior object versions. Finally, object storage systems replace the limited and rigid file system attributes of file-level storage with rich customizable metadata that not only capture common object characteristics, but can also hold application-specific information.
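To make the versioning and metadata ideas concrete, here is a minimal Python sketch of a store that appends a new version on every write instead of locking, and that carries free-form metadata with each version. All names are hypothetical, and real products track versions and metadata in their own ways.

```python
class VersionedStore:
    """Toy store where every update appends a new version instead of locking."""

    def __init__(self):
        self._versions = {}  # object ID -> list of (data, metadata) tuples

    def put(self, object_id, data, metadata=None):
        history = self._versions.setdefault(object_id, [])
        history.append((data, dict(metadata or {})))
        return len(history)  # 1-based version number of this write

    def get(self, object_id, version=None):
        # Without a version, return the newest one; otherwise a prior version.
        history = self._versions[object_id]
        return history[-1] if version is None else history[version - 1]

    def rollback(self, object_id, version):
        """Make an older version current by writing it again as the newest."""
        data, metadata = self.get(object_id, version)
        return self.put(object_id, data, metadata)
```

Because older versions are never overwritten, a rollback in this sketch is just another write, and application-specific tags such as a department or review status travel with the object.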
Object offers greater resiliency
Traditional block- and file-level storage systems run into fundamental limitations when they have to support massive capacity. A case in point is data protection. It's simply unrealistic to back up hundreds of petabytes of data. Object systems are designed to not require backups; instead, they store data with sufficient redundancy so that data is never lost, even while multiple components of the object storage infrastructure are failing.
Keeping multiple replicas of objects is one way of achieving this. On the downside, replication is capacity-intensive. For instance, maintaining six replicas requires six times the capacity of the protected data. As a result, object storage systems support the more efficient erasure coding data protection method in addition to replication. In simple terms, erasure coding uses advanced math to create additional information that allows for recreating data from a subset of the original data, analogous to RAID 5's ability to retrieve the original data from the remaining drives despite one failing drive. The degree of resiliency is typically configurable in contemporary object storage systems. The higher the level of resiliency, the more storage is required.
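The capacity tradeoff and the RAID 5 analogy can be sketched in a few lines of Python. The overhead functions are simple arithmetic; the 10+4 fragment layout used in the usage note is an assumed example, not a figure from any product. The XOR parity shown is the single-parity case only, whereas production erasure codes use more general schemes that tolerate multiple failures.

```python
from functools import reduce


def replication_overhead(replicas: int) -> float:
    """Raw capacity needed per unit of protected data."""
    return float(replicas)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity per unit of data for a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def xor_parity(fragments):
    """Byte-wise XOR of equal-length fragments (single parity, RAID 5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def recover_missing(surviving_fragments, parity):
    """Rebuild the one lost fragment from the survivors plus the parity."""
    return xor_parity(list(surviving_fragments) + [parity])
```

Six replicas cost 6.0 times the protected capacity, while a hypothetical layout of 10 data fragments plus 4 parity fragments costs 1.4 times. With fragments `b"abcd"`, `b"efgh"` and `b"ijkl"`, losing the middle fragment and calling `recover_missing` with the other two and the parity returns `b"efgh"`.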
Erasure coding saves capacity, but impacts performance, especially if erasure coding is performed across geographically dispersed nodes. "Although we support geographic erasure coding, performing erasure coding within a data center, but using replication between data centers is often the best capacity/performance tradeoff," said Paul Turner, chief marketing officer at Cloudian. With large objects yielding the biggest erasure coding payback, some object storage vendors recommend data protection policies based on object size. EMC ECS uses erasure coding locally and replication between data centers, but combines replication with data reduction, achieving an overall data reduction ratio similar to that of geo-dispersed erasure coding without the performance penalty of the latter.
The ability to detect and, if possible, correct object storage issues is pertinent for a large, geographically dispersed storage system. Continuous monitoring of storage nodes, automatic relocation of affected data, and the ability to self-heal and self-correct without human intervention are critical capabilities to prevent data loss and ensure continuous availability.
Object storage is accessed via an HTTP RESTful API to perform the various storage functions, with each product implementing its own proprietary APIs. All object storage products also support the Amazon Simple Storage Service (S3) API, which has become the de facto object storage API standard -- with by far the largest number of applications using it. The S3 API is also extensive, going beyond simple PUT, GET and DELETE operations to support complex storage operations.
The one thing to be aware of, though, is that most object storage vendors only support an S3 API subset, and understanding the S3 API implementation limitations is critical to ensuring wide application support. Besides Amazon S3, most object storage vendors also support the OpenStack Swift API.
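For readers who have not worked with a RESTful storage API, the sketch below shows how the basic PUT, GET and DELETE operations map onto plain HTTP requests, using only Python's standard library. The endpoint and the helper names are hypothetical. The requests are built but not sent, and authentication is omitted entirely, so this illustrates the shape of the calls rather than a working client. The `x-amz-meta-` header prefix is how S3 carries user-defined metadata.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; real deployments differ and require
# signed (authenticated) requests, which this sketch omits.
ENDPOINT = "https://objects.example.com"


def object_url(bucket: str, key: str) -> str:
    # Path-style addressing: the bucket and object key are part of the URL path.
    return f"{ENDPOINT}/{bucket}/{quote(key)}"


def build_put(bucket, key, body: bytes, metadata=None) -> Request:
    # User-defined metadata travels as x-amz-meta-* headers.
    headers = {f"x-amz-meta-{k}": v for k, v in (metadata or {}).items()}
    return Request(object_url(bucket, key), data=body, headers=headers, method="PUT")


def build_get(bucket, key) -> Request:
    return Request(object_url(bucket, key), method="GET")


def build_delete(bucket, key) -> Request:
    return Request(object_url(bucket, key), method="DELETE")
```

Whether a given product accepts a particular S3 operation or header is exactly the kind of subset limitation worth verifying against that vendor's documentation.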
File system protocol support is common in object storage systems, but implementations vary by product. For instance, EMC ECS has geo-distributed active/active NFS support, and with ECS' consistency support, it's a pretty strong geo-distributed NAS product. Scality claims EMC Isilon-level NAS performance, and the NetApp StorageGRID Webscale now offers protocol duality by having a one-to-one relationship between objects and files.
Other object storage products provide file system support through their own or third-party cloud storage gateways like the ones offered by Avere, CTERA Networks, Nasuni and Panzura. Both Caringo Swarm and EMC ECS offer Hadoop HDFS interfaces, allowing Hadoop to directly access data in their object stores. HGST Active Archive System and Cloudian provide S3-compliant connectors that enable Apache Spark and Hadoop to use object storage as a storage alternative to HDFS.
Encryption provides needed security
A common use case of an object storage product by service providers is public cloud storage. Although at-rest and in-transit encryption are a good practice for all use cases, encryption is a must for public cloud storage. The majority of object storage products support both at-rest and in-transit encryption, using a low-touch at-rest encryption approach where encryption keys are generated dynamically and stored in the vicinity of encrypted objects without the need for a separate key management system.
Cloudian HyperStore and HGST Active Archive System support client-managed encryption keys in addition to server-side managed encryption keys, giving cloud service providers an option to allow their customers to manage their own keys. Caringo Swarm, the DDN WOS Object Storage platform and Scality RING currently don't support at-rest encryption, relying on application-based encryption of data before it's written to the object store.
Support for LDAP and AD authentication of users accessing the object store is common in contemporary object storage systems. Support of AWS v2 or v4 authentication to provide access to vaults -- and objects within vaults -- is less common and should be an evaluation criterion when selecting an object storage system.
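To give a sense of what AWS v4 authentication involves, the following Python sketch derives the Signature Version 4 signing key from a secret key through a chain of HMAC-SHA256 operations. It covers only the key-derivation step; building the canonical request and the final signature is not shown. The function names are hypothetical, and the values in the usage note are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive a Signature Version 4 signing key (date is YYYYMMDD)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

Calling `derive_signing_key("example-secret", "20160101", "us-east-1", "s3")` yields a 32-byte key scoped to that date, region and service, which is one reason v4 support takes more effort for a vendor to implement than simple shared-secret schemes.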
Object storage minimizes management
Object storage systems are designed to minimize human storage administration through automation, policy engines and self-correcting capabilities. "The Cleversafe system enables storage administrators to handle 15 times the storage capacity of traditional storage systems," claims Kennedy.
Object storage systems are designed for zero downtime, and all administration tasks can be performed without service disruption -- from upgrades, hardware maintenance and refreshes to adding capacity and changing data centers. Policy engines enable the automation of object storage behavior, such as when to use replication vs. erasure coding, under what circumstances to change the number of replicas to support usage spikes and what data centers to store objects in based on associated metadata.
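A policy engine of the kind described above can be pictured as a set of small rules evaluated per object. The Python sketch below chooses a protection scheme by object size and a placement by metadata. The 1 MB threshold, the site names and the "residency" tag are all assumptions made for illustration; actual products expose their own policy languages and settings.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: small objects are replicated, large ones erasure coded.
ERASURE_CODING_MIN_BYTES = 1 * 1024 * 1024


@dataclass
class StoredObject:
    size_bytes: int
    metadata: dict = field(default_factory=dict)


def protection_scheme(obj: StoredObject) -> str:
    """Pick replication or erasure coding based on object size."""
    if obj.size_bytes >= ERASURE_CODING_MIN_BYTES:
        return "erasure-coding"
    return "replication"


def placement(obj: StoredObject, default_sites=("dc-east", "dc-west")) -> tuple:
    """Pick data centers; a hypothetical 'residency' tag pins an object to one site."""
    pinned = obj.metadata.get("residency")
    return (pinned,) if pinned else tuple(default_sites)
```

In this sketch a 4 KB object is replicated across both default sites, while a 50 MB object tagged with `residency` set to `dc-frankfurt` is erasure coded and kept in that single data center.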
While commercial object storage products typically provide management tools, technical support and professional services to deploy and keep object storage systems humming, the open-source OpenStack Swift product demands a higher degree of self-reliance. For companies that don't have the internal resources to deploy and manage OpenStack Swift, SwiftStack sells an enterprise offering of Swift with cluster and management tools, enterprise integration and 24-7 support.
Without question, object storage systems are on the rise. Their scalability and API-based access make them suitable for use cases where traditional storage systems simply can't compete. They're also increasingly becoming a NAS alternative, with some object storage vendors claiming parity with NAS systems.
With a growing list of object storage products, however, choosing an object storage system is becoming increasingly challenging. The primary decision criteria when investigating object storage systems are overall features, support for your use cases, cost and vendor viability. Because object storage is still a relatively new technology whose capabilities vary and are in flux, reference checks and (if possible) proof-of-concept testing are highly advisable before finalizing your object storage product selection.