Designing storage systems that can handle the requirements of big data applications is a task that many storage administrators are starting to tackle in their own environments. In this SearchStorage Ask the Expert guide, John Webster, a senior partner analyst at the firm Evaluator Group based in Boulder, Colo., answers some of the most frequently asked questions on big data storage architecture, including how to spec a storage system for use by big data applications, guidance on protecting big data sets, advice on selecting an object or scale-out file system to manage data, and what to look for when using the Hadoop Distributed File System. Whether you're concerned about using RAID or defining a data warehouse project, Webster's expert analysis offers practical insight into big data architectures and the technology that supports them.
Evaluating big data app requirements
Big data applications vary by organization, but according to Webster, they fall into two general categories: large-capacity applications that need to hold hundreds of terabytes of data, and performance-intensive big data analytics apps. In this Ask the Expert, Webster details why bandwidth and response times are such critical factors when dealing with both of these application types.
Big data applications require varying bandwidth and capacity when it comes to storage. Find out how to determine the storage specs for big data sets.
Big data needs big protection
The capacity requirements of big data operations mean new challenges when protecting your organization's data. Find out what Webster has to say about RAID technology today and whether it's up to par for securing the massive, multi-petabyte data sets associated with big data architectures.
Making the storage decision
Object-based systems and scale-out file systems are both suitable for handling the requirements associated with a big data storage architecture, but they're not created equal. Scale-out systems offer a global namespace file system for easy management of network-attached storage, writes Webster, while object storage's use of metadata means better performance for larger file sets.
When comparing highly scalable storage systems for a big data architecture, the difference is in the metadata.
Gain new insights with big data analytics
Big data analytics and data warehousing are both methods of processing large amounts of data, but they aren't one and the same. Traditional data warehousing isn't always equipped to handle the frequently accessed or updated data that is common in big data environments. According to Webster, there are several reasons big data analytics and related technologies such as Hadoop can better pull valuable information from big data sets.
Find out what Webster believes are the three distinguishing characteristics to consider when comparing big data analytics and data warehousing operations.
Extending the uses for Hadoop in the data center
Considering how expensive it can be to build a Hadoop storage architecture, IT administrators are looking for ways to implement Hadoop systems for multiple applications. Webster details some of the Hadoop uses he's seen, including the incorporation of Hadoop in preconfigured products and using shared storage pools for data protection, archiving and security purposes.
With more storage vendors incorporating Hadoop in their products, users can now move away from the do-it-yourself approach to analytics.
Working around common Hadoop issues
Hadoop might be an effective way to handle the storage and performance requirements of big data systems, but it still has issues. In this answer, Webster tackles the most common Hadoop problems, as well as what to expect in Hadoop 2.0.
Users can expect NameNode single points of failure to be resolved in the next version of Hadoop.