Can you compare RAID against other, newer forms of data protection?
The acronym RAID originally stood for Redundant Arrays of Inexpensive Disks. Vendors later worked to change Inexpensive to Independent, I presume because they wanted to sell RAID arrays to enterprise users who would balk at the thought of using inexpensive, PC-quality disks in production data center environments. But the creators of RAID wanted to build highly scalable disk storage systems from the small-form-factor, inexpensive disk drives commonly found in early PCs. Back then, drive failure during operation was a basic assumption, so we needed ways to engineer around drive failure without data loss or disruption. The redundant RAID levels (e.g., RAID 1, 3, 5, 6, 10 and so on) offered ways to achieve those two goals; RAID 0, by contrast, only stripes data and provides no protection against drive failure.
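As a minimal sketch of how parity-based RAID levels engineer around a single drive failure, the Python below keeps one block on each of three hypothetical drives, stores the XOR of those blocks as parity, and rebuilds a lost block from the survivors. The function names and block contents are illustrative assumptions, not any vendor's implementation; real arrays work on fixed-size stripes and, in RAID 5, rotate parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three hypothetical data blocks, one per drive (all the same length).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block from the rest.
survivors = [data[0], data[2]]
recovered = rebuild_missing(survivors, parity)
print(recovered == data[1])
```

Single parity can only recover one missing block per stripe. If two blocks are lost at once, the XOR no longer determines either of them, which is the limitation that dual-parity RAID 6 and erasure coding address.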
However, with the advent of multi-terabyte, single-disk capacities, RAID is beginning to show signs of obsolescence. When rebuilding after the failure of a single multi-terabyte drive in a large RAID array takes anywhere from many hours to days, it’s time to look for an alternative. One of the most popular goes by three terms that are often used interchangeably for the same information dispersal approach to data protection: Reed-Solomon, forward error correction and erasure coding. (Strictly speaking, Reed-Solomon is one specific family of erasure codes, and forward error correction is the broader category.) Here, terabyte-capacity drives in large-capacity arrays, and the possibility of multiple concurrent drive failures, are basic design assumptions. Such storage systems can tolerate and recover quickly from multiple drive failures, including the failure of an entire drive module, without disruption.
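To make the rebuild-time concern concrete, here is a back-of-the-envelope sketch. The sustained rebuild rate of 50 MB/s and the 10+4 erasure-coding layout are assumptions chosen purely for illustration, not figures from any particular product; actual rates depend on the controller, the workload and the drive type. The fault-tolerance figure assumes a maximum distance separable code such as Reed-Solomon, where any k of the k+m fragments are enough to reconstruct the data.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Hours to rewrite a whole drive at a sustained rebuild rate (decimal units)."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return capacity_mb / rebuild_mb_per_s / 3600


def erasure_profile(k, m):
    """For a k+m erasure code: tolerated fragment losses and raw-to-usable ratio."""
    return {"tolerated_failures": m, "storage_overhead": (k + m) / k}


# Assumed values for illustration only.
print(round(rebuild_hours(4, 50), 1))  # hypothetical 4 TB drive at 50 MB/s
print(erasure_profile(10, 4))          # hypothetical 10 data + 4 coding fragments
```

Under these assumptions a single 4 TB drive takes roughly 22 hours to rebuild, while the 10+4 layout survives any four lost fragments at 1.4 times the raw capacity of the data it protects.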
Related Q&A from John Webster
John Webster describes how changes to HDFS and the NameNode can help to improve Hadoop infrastructure.
Analyst John Webster details issues with Hadoop architecture and what users can expect from Hadoop Version 2.0.
Understanding big data analytics, and how it differs from data warehousing, depends on time to information, content complexity and cost.