Essential Guide

An expert's guide to big data storage architecture

Analyst John Webster answers six questions regarding storage architectures for today's big data applications. Find out what he has to say about system specifications, big data protection and Hadoop.


Designing storage systems that can handle the requirements of big data applications is a task that many storage administrators are starting to tackle in their own environments. In this SearchStorage Ask the Expert guide, John Webster, a senior partner analyst at the firm Evaluator Group based in Boulder, Colo., answers some of the most frequently asked questions on big data storage architecture, including how to spec a storage system for use by big data applications, guidance on protecting big data sets, advice on selecting an object or scale-out file system to manage data, and what to look for when using the Hadoop Distributed File System. Whether you're concerned about using RAID or defining a data warehouse project, Webster's expert analysis offers practical insight into big data architectures and the technology that supports them.

1. Storage specs

Evaluating big data app requirements

Big data applications vary by organization, but according to Webster, there are two general categories they fall under: large-capacity applications that need to hold hundreds of terabytes of data, and performance-intensive big data analytics apps. In this Ask the Expert, Webster details why bandwidth and response times are such critical factors when dealing with both of these application types.


What storage system specifications do I need to worry about with big data apps?

Big data applications require varying bandwidth and capacity when it comes to storage. Find out how to determine the storage specs for big data sets.
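As a rough illustration of how bandwidth and capacity interact, the sketch below estimates the sustained read bandwidth needed to scan a data set within a fixed time window. The function name, the 200 TB data set and the 8-hour window are hypothetical values chosen for the example, not figures from Webster's answer, and the calculation ignores protocol overhead, caching and contention.

```python
def required_bandwidth_mb_s(dataset_tb: float, window_hours: float) -> float:
    """Sustained bandwidth (MB/s) needed to read dataset_tb within window_hours.

    Uses decimal units (1 TB = 1,000,000 MB). This is a back-of-envelope
    estimate only; it assumes a single sequential pass with no overhead.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    dataset_mb = dataset_tb * 1_000_000
    return dataset_mb / (window_hours * 3600)


if __name__ == "__main__":
    # Hypothetical workload: scan 200 TB once in an 8-hour batch window.
    print(f"{required_bandwidth_mb_s(200, 8):,.0f} MB/s")
```

Under these assumptions, a capacity-oriented system holding hundreds of terabytes still needs several gigabytes per second of aggregate bandwidth if an analytics job must read all of it in one working day.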

2. Data protection

Big data needs big protection

The capacity requirements of big data operations mean new challenges when protecting your organization's data. Find out what Webster has to say about RAID technology today and whether it's up to par for securing the massive, multi-petabyte data sets associated with big data architectures.


Is RAID sufficient to protect big data sets?

The rise in popularity of multi-terabyte single-disk capacities is pushing RAID to the sidelines in the data protection game, says Webster.
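One common reason larger disks strain RAID is rebuild time: the bigger the failed drive, the longer the array runs in a degraded state. The sketch below is a simplified estimate, assuming the rebuild is limited only by a constant write rate to the replacement drive. The function name and the 100 MB/s rate are hypothetical; real rebuilds under production load are often slower.

```python
def rebuild_hours(disk_tb: float, rebuild_mb_s: float) -> float:
    """Estimated hours to rebuild one failed disk of disk_tb terabytes.

    Assumes a constant rebuild rate (rebuild_mb_s) and decimal units
    (1 TB = 1,000,000 MB). Competing I/O is ignored.
    """
    if rebuild_mb_s <= 0:
        raise ValueError("rebuild_mb_s must be positive")
    seconds = disk_tb * 1_000_000 / rebuild_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical drive sizes at an assumed 100 MB/s sustained rebuild rate.
    for size_tb in (1, 2, 4):
        print(f"{size_tb} TB disk: about {rebuild_hours(size_tb, 100):.1f} hours")
```

Because the estimate scales linearly with capacity, doubling the drive size doubles the window during which a second failure could cause data loss.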

3. Infrastructure choice

Making the storage decision

Object-based systems and scale-out file systems are both suitable for handling the requirements associated with a big data storage architecture, but they're not created equal. Scale-out systems offer a global namespace file system for easy management of network-attached storage, writes Webster, while object storage's use of metadata means better performance for larger file sets.


Should I choose object-based or scale-out file systems for my big data apps?

When comparing highly scalable storage systems for a big data architecture, the difference is in the metadata.


Gain new insights with big data analytics

Big data analytics and data warehousing are both methods of processing large amounts of data, but they aren't one and the same. Traditional data warehousing isn't always equipped to handle data common to big data environments that needs to be frequently accessed or updated. According to Webster, there are several reasons big data analytics and related technologies such as Hadoop can better pull valuable information from big data sets.


What are the differences between big data storage analytics and data warehousing?

Find out what Webster believes are the three distinguishing characteristics to consider when comparing big data analytics and data warehousing operations.


Extending the uses for Hadoop in the data center

Considering how expensive it can be to build a Hadoop storage architecture, IT administrators are looking for ways to implement Hadoop systems for multiple applications. Webster details some of the Hadoop uses he's seen, including the incorporation of Hadoop in preconfigured products and using shared storage pools for data protection, archiving and security purposes.


How do I use Hadoop for applications that aren't related to big data?

With more storage vendors incorporating Hadoop in their products, users can now move away from the do-it-yourself approach to analytics.

6. Problem areas

Working around common Hadoop issues

Hadoop might be an effective way to handle the storage and performance requirements of big data systems, but it still has issues. In this answer, Webster tackles the most common Hadoop problems, as well as what to expect in Hadoop 2.0.


What problems might I encounter when using Hadoop Distributed File System?

Users can expect NameNode single points of failure to be resolved in the next version of Hadoop.