An expert's guide to big data storage architecture

Last updated: July 2015

Essential Guide

Editor's note

Designing storage systems that can handle the requirements of big data applications is a task that many storage administrators are starting to tackle in their own environments. In this SearchStorage Ask the Expert guide, John Webster, a senior partner analyst at Evaluator Group in Boulder, Colo., answers some of the most frequently asked questions on big data storage architecture. Topics include how to spec a storage system for big data applications, how to protect big data sets, how to choose between an object store and a scale-out file system for managing data, and what to look for when using the Hadoop Distributed File System. Whether you're concerned about using RAID or defining a data warehouse project, Webster's expert analysis offers practical insight into big data architectures and the technology that supports them.

1. Making the storage decision

Object-based systems and scale-out file systems are both suitable for handling the requirements associated with a big data storage architecture, but they're not created equal. Scale-out systems offer a global namespace file system for easy management of network-attached storage, writes Webster, while object storage's use of metadata means better performance for larger file sets.

2. Gain new insights with big data analytics

Big data analytics and data warehousing are both methods of processing large amounts of data, but they aren't one and the same. Traditional data warehousing isn't always equipped to handle the frequently accessed or frequently updated data that is common in big data environments. According to Webster, there are several reasons big data analytics and related technologies such as Hadoop can better pull valuable information from big data sets.

3. Working around common Hadoop issues

Hadoop might be an effective way to handle the storage and performance requirements of big data systems, but it still has issues. In this answer, Webster tackles the most common Hadoop problems, as well as what to expect in Hadoop 2.0.