An expert's guide to big data storage architecture

Last updated: July 2015

Editor's note

Designing storage systems that can handle the requirements of big data applications is a task that many storage administrators are starting to tackle in their own environments. In this SearchStorage Ask the Expert guide, John Webster, a senior partner analyst at the firm Evaluator Group based in Boulder, Colo., answers some of the most frequently asked questions on big data storage architecture, including how to spec a storage system for use by big data applications, guidance on protecting big data sets, advice on selecting an object or scale-out file system to manage data, and what to look for when using the Hadoop Distributed File System. Whether you're concerned about using RAID or defining a data warehouse project, Webster's expert analysis offers practical insight into big data architectures and the technology that supports them.

1. Big data needs big protection

The capacity requirements of big data operations mean new challenges when protecting your organization's data. Find out what Webster has to say about RAID technology today and whether it's up to par for protecting the massive, multi-petabyte data sets associated with big data architectures.

2. Making the storage decision

Object-based systems and scale-out file systems are both suitable for handling the requirements associated with a big data storage architecture, but they're not created equal. Scale-out systems offer a global namespace file system for easy management of network-attached storage, writes Webster, while object storage's use of metadata means better performance for larger file sets.

3. Gain new insights with big data analytics

Big data analytics and data warehousing are both methods of processing large amounts of data, but they aren't one and the same. Traditional data warehousing isn't always equipped to handle the frequently accessed or updated data common to big data environments. According to Webster, there are several reasons big data analytics and related technologies such as Hadoop can better pull valuable information from big data sets.

4. Extending the uses for Hadoop in the data center

Considering how expensive it can be to build a Hadoop storage architecture, IT administrators are looking for ways to implement Hadoop systems for multiple applications. Webster details some of the Hadoop uses he's seen, including the incorporation of Hadoop in preconfigured products and using shared storage pools for data protection, archiving and security purposes.

5. Working around common Hadoop issues

Hadoop might be an effective way to handle the storage and performance requirements of big data systems, but it still has issues. In this answer, Webster tackles the most common Hadoop problems, as well as what to expect in Hadoop 2.0.