Designing storage systems that can meet the requirements of big data applications is a task many storage administrators are starting to tackle in their own environments. In this SearchStorage.com Ask the Expert guide, John Webster, a senior partner at Boulder, Colo.-based analyst firm Evaluator Group, answers some of the most frequently asked questions about big data storage architecture: how to spec a storage system for big data applications, how to protect big data sets, how to choose between an object-based system and a scale-out file system for managing data, and what to look for when using the Hadoop Distributed File System (HDFS). Whether you're weighing RAID's fitness for big data or defining a data warehouse project, Webster's expert analysis offers practical insight into big data architectures and the technology that supports them.
Table of contents:
Big data applications vary by organization, but according to John Webster, they fall into two general categories: large-capacity applications that must hold hundreds of terabytes of data, and performance-intensive big data analytics apps. In this Ask the Expert, Webster details why bandwidth and response time are such critical factors for both application types.
The capacity requirements of big data operations create new challenges for protecting your organization's data. Find out what Webster has to say about today's RAID technology and whether it's up to the task of protecting the massive, multi-terabyte data sets associated with big data architectures.
Object-based systems and scale-out file systems can both handle the requirements of big data storage architectures, but they're not created equal. Scale-out file systems offer a global namespace for easier management of network-attached storage, writes Webster, while object storage's use of metadata means better performance with larger file sets.
Considering how expensive it can be to build a Hadoop storage architecture, IT administrators are looking for ways to put Hadoop systems to work for multiple applications. Webster details some of the Hadoop uses he's seen, including Hadoop incorporated into preconfigured products and shared storage pools used for data protection, archiving and security purposes.
Find out what Webster believes are the three distinguishing characteristics to consider when comparing big data analytics and data warehousing operations.
Hadoop might be an effective way to handle the storage and performance requirements of big data systems, but it still has issues. In this answer, Webster tackles the most common Hadoop problems, as well as what to expect in the newest version.