Object storage systems in demand for big data, archiving

Object storage system use cases are growing, and their unique metadata capabilities are a good match for big data environments.

Object storage systems first gained attention because of characteristics that make them a good fit for the cloud. Now, the technology is finding use cases outside of the cloud, helping the adoption of object storage to grow significantly.

Object storage is getting a lot of attention in big data circles, specifically in vertical markets such as oil and gas, and media and entertainment. In these markets, scalability is crucial for "incredibly large amounts of data," said Randy Kerns, a partner at Boulder, Colo.-based Evaluator Group who oversees SAN and network-attached storage analysis and education.

"So companies are in a big data environment," he said. "They want to do analytics. They do not want to use shared primary storage -- meaning they want to bring it in, analyze it on the fly and have node-centralized storage in those environments."

Kerns recently wrote a report, Usage of Object Storage, which highlighted three use cases for object storage outside of the cloud: content repository, big data analytics and archiving. The content repository can consist of research data or libraries of assets (material from multiple geographic locations).

Increasing number of use cases draw on metadata capability

Object storage remains a common choice for cloud storage because of features such as multi-tenancy and geographical awareness. But other object characteristics, including users' ability to define metadata however they wish, offer advantages in other use cases.
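To make the metadata point concrete, here is a minimal sketch of the idea, not any vendor's implementation. It assumes a flat key space in which every object carries whatever metadata its owner chooses, and it uses hypothetical names throughout (`TinyObjectStore`, `put`, `find`, and the sample keys and metadata fields).

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object is just its bytes plus free-form, user-defined metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class TinyObjectStore:
    """Hypothetical in-memory store: a flat key space with no directories."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # The caller decides which metadata fields exist; no schema is imposed.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def find(self, **criteria):
        """Return the sorted keys whose metadata matches every criterion."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = TinyObjectStore()
store.put("img-0001", b"...", instrument="microscope-a", study="cell-growth")
store.put("img-0002", b"...", instrument="microscope-b", study="cell-growth")
store.put("seismic-17", b"...", region="north-sea")

print(store.find(study="cell-growth"))  # ['img-0001', 'img-0002']
```

Because objects are located by their metadata rather than by a directory path, an analytics job can select its inputs with a query such as `find(study="cell-growth")` instead of walking a file tree.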

One of the organizations Kerns examined in his report was the Albert Einstein College of Medicine of Yeshiva University, a New York research university focused on biomedical investigation.

"The Einstein case [is an example of] a repository," Kerns said. The data the scientists collected was going "directly from the microscopes to object storage." The organization's choice of DataDirect Networks' Web Object Scaler (WOS) was a surprise because the IT team did not set out to look for an object storage offering, Kerns said.

"The selection was made based on the characteristics to meet potentially very large scaling in the number of images, immutability of the images, and ease of management of the storage system. Some of the surprises from the deployment of the object storage system included the similarity to an Amazon cloud instead of a traditional file system, and the ease in which researchers adjusted to the use of the objects," Kerns wrote in his report.

DataDirect Networks Inc. CEO Alex Bouzari said his company has several customers using 10 PB or more of storage with WOS, including one customer with 70 billion objects under management. He credited Amazon Web Services with helping customers become comfortable with the notion of object storage. He sees it as complementary to file storage for unstructured data.

"Amazon has done a good job of evangelizing the notion," Bouzari said. "Organizations are starting to understand why it might make sense. A lot of early offerings were lacking in features and robustness, and it's just starting to get to the point where products do have features and reliability that customers expect to see. I don't think it will grow into tens of billions of dollars in a short period."

Customers are "looking for solutions that can handle objects and work under a framework that also serves files," Bouzari said. "Many are deploying WOS in conjunction with our file system-based products. The ability to combine these things is getting people to adopt that."

Newcomer Exablox said its OneBlox appliance with a file-based interface to object storage complements SANs and can be used as a backup target for organizations with rapidly growing storage.

"People are not deploying us to displace SANs, but to complement their block storage," Exablox CEO Doug Brockett said. "People who have an expensive, high-performance SAN that they use to support virtual machines or a [virtual desktop infrastructure] VDI deployment are using us to offload all the other stuff they don't want to put on their expensive SAN. They may use us for a lot of primary file storage. We've become the storage for home directories, storage for app data, storage for backup, storage for snapshots. We look good for people with big unstructured data problems."

Brockett said Exablox's focus market is companies with 250 to 2,000 employees, and from 20 TB to 200 TB of unstructured data.

'Fragmented, proprietary APIs' remain a challenge

A report this month by Gartner Research, Critical Capabilities for Object Storage, said a combination of factors, including cloud and mobile options, is "propelling object storage to relevance in environments that demand scalability, ease of access and security."

However, the report pointed out that the object storage market "continues to suffer from fragmented, proprietary APIs," despite the growing number of vendors that offer support for the Amazon Simple Storage Service (S3) API.

The report cited a new class of startups -- vendors that are "pushing the performance limitations of object storage [in order] to position it for a variety of cloud and big data workloads."

Startup Scality Inc. said bookings of its Ring object storage software increased 500% from 2012 to 2013. Its customer base is still largely service providers, but CEO Jerome Lecat said the company is picking up customers in new markets, such as banking, pharmaceuticals, healthcare and media companies.

"We think all medium-sized businesses will use the cloud in the form of Software as a Service with providers like Salesforce, Box or Amazon, while large companies will be big enough to run their own IT," he said. "These large companies' IT will be software-defined data centers, and we plan to be the storage piece for them."

Lecat said Scality's deals typically range from 200 TB to 6 PB of storage.

In Kerns' report, many cloud service providers cited object storage as "the best way to continue to scale their environment with a single-level platform" and as a way to avoid growing into hierarchical systems.

Marc Staimer, founder of Beaverton, Ore.-based Dragon Slayer Consulting, who has written extensively on object storage, said: "Object storage is rapidly becoming the ideal repository for online passive or secondary data."

He added: "With so many backup and archive applications now writing natively to object storage -- while concurrently many of the object storage systems have added standard storage interfaces such as NFS, CIFS, HDFS and iSCSI -- the trend has been quietly crystalizing."

(Senior news director Dave Raffo contributed to this story).

Join the conversation


The article is good, but it misses one of the major benefits of object storage: as long as the objects are replicated, the content does not need to be backed up again, because the objects never change. This assumes that the object storage system is properly designed and has no dependency on a single lookup table, database or file system. Another benefit is that I can validate that whatever I read is identical to what I originally wrote.

Well, the three major RESTful APIs for object storage are the S3 API, the Swift API and the CDMI API. The AWS S3 API is the de facto standard and is supported to various degrees by most object storage software vendors. The Swift API is part of the Swift storage project (program), which is part of the OpenStack framework, so it is widely supported thanks to the growing support for OpenStack. Many of the object storage software vendors, like Basho, Caringo, Cloudian and Scality, support both S3 and Swift. CDMI has been promulgated by the Storage Networking Industry Association (SNIA). It is probably the least used of the object storage APIs. Scality supports CDMI. The important thing about RESTful object storage APIs is the ecosystem of third-party software, appliances and gateways that can be used with them. Currently the AWS S3 API-compatible ecosystem is the largest and most dominant, which should not come as a surprise. AWS now stores over 2 trillion objects in S3. Archiving is a valid use case for object storage, but the author didn't venture far into the subject of active archives or deep archives and the use of AWS Glacier or new entrants in the object storage archiving market like SageCloud or Spectra Logic with its BlackPearl DS3 (Deep Simple Storage Service).
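The first comment's point about validating reads can be illustrated with content addressing, which is one common way to get that property and is not claimed here to be how any product named in the article works. In this sketch, an object's ID is the SHA-256 digest of its bytes, so objects never change under a given ID and every read can be checked against it. The class and method names (`ImmutableStore`, `put`, `get`) are hypothetical.

```python
import hashlib


class ImmutableStore:
    """Hypothetical content-addressed store: the ID is a hash of the data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Identical content always maps to the same ID, so a write can
        # never alter the bytes stored under an existing ID.
        object_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._blobs[object_id]
        # Recompute the digest on read to detect silent corruption.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("object failed integrity check")
        return data


store = ImmutableStore()
oid = store.put(b"scan from microscope")
assert store.get(oid) == b"scan from microscope"
```

A replica that returns altered bytes fails the check in `get`, so the reader can fetch the object from another copy instead of trusting damaged data.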