This article can also be found in the Premium Editorial Download "Storage magazine: Best practices for cloud backup integration."
This could mean an opportunity for storage and IT infrastructure companies. As data sets continue to grow with both structured and unstructured data, and analysis of that data gets more diverse, current storage system designs will be less able to meet the needs of big data. Storage vendors have begun to respond with block- and file-based systems designed to accommodate many of these requirements. Here’s a list of some of the characteristics storage infrastructures need to incorporate to meet the challenges presented by big data.
Capacity. “Big” often translates into petabytes of data, so big data storage systems certainly need to be able to scale. But they also need to scale easily, adding capacity in modules or arrays transparently to users, or at least without taking the system down.
Big data also means a large number of files. Managing the accumulation of metadata for file systems at this level can reduce scalability and impact performance, a situation that can be a problem for traditional NAS systems. Object-based storage architectures, on the other hand, can allow big data storage systems to expand file counts into the billions without suffering the overhead problems that traditional file systems encounter. Object-based storage systems can also scale geographically, enabling large infrastructures to be spread across multiple locations.
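To make the contrast with hierarchical file systems concrete, here is a minimal Python sketch of a flat object namespace. It assumes a hypothetical cluster in which each object ID is hashed to choose a storage node, and the names `FlatObjectStore`, `put` and `get` are illustrative only, not any vendor's API. Real object stores add replication, rebalancing and durable media.

```python
import hashlib


class FlatObjectStore:
    """Toy object store: a flat ID -> (data, metadata) map spread over nodes.

    Hypothetical illustration only; not modeled on a specific product.
    """

    def __init__(self, node_count):
        # Each "node" is just an in-memory dict in this sketch.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, object_id):
        # Hash the ID to choose a node; no directory tree is walked.
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, object_id, data, metadata=None):
        # Metadata is stored alongside the object, not in a central table.
        self.nodes[self._node_for(object_id)][object_id] = (data, metadata or {})

    def get(self, object_id):
        return self.nodes[self._node_for(object_id)][object_id]


store = FlatObjectStore(node_count=4)
store.put("sensor-0001", b"42.7", {"site": "us-east"})
data, meta = store.get("sensor-0001")
```

Because a lookup depends only on the object ID, the cost of finding an object in this sketch does not grow with the depth or size of a directory tree, which is the property the paragraph above attributes to object-based designs.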
Latency. Big data may also have a real-time component, especially in use cases involving web transactions or finance. For example, tailoring web advertising to each user’s browsing history requires real-time analytics. Storage systems must be able to grow to the aforementioned proportions while maintaining performance, because latency can produce “stale data.” Here, too, scale-out architectures enable the cluster of storage nodes to increase in processing power and connectivity as they grow in capacity. Object-based storage systems can parallelize data streams, further improving throughput.
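The idea of parallelized data streams can be sketched in a few lines of Python. This is a toy model under stated assumptions: the "nodes" are in-memory dictionaries, chunks are placed round-robin, and the helper names `split_into_chunks` and `read_parallel` are hypothetical. It shows only the fan-out and reassembly pattern, not real network or disk throughput.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(payload, chunk_size):
    # Cut the payload into fixed-size pieces (the last one may be shorter).
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def read_parallel(nodes, chunk_count):
    """Fetch chunk i from node i % len(nodes), one task per chunk."""
    def fetch(i):
        return nodes[i % len(nodes)][i]

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # map() yields results in input order, so reassembly is a simple join.
        return b"".join(pool.map(fetch, range(chunk_count)))


payload = b"big data object striped across nodes"
chunks = split_into_chunks(payload, chunk_size=8)

# Assumed placement policy: round-robin across three toy nodes.
nodes = [dict() for _ in range(3)]
for i, chunk in enumerate(chunks):
    nodes[i % len(nodes)][i] = chunk

result = read_parallel(nodes, len(chunks))
```

In a real cluster each fetch would be a network request to a separate node, so adding nodes would add streams that can be served concurrently.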
Many big data environments, such as those in high-performance computing (HPC), will also need high IOPS performance. Server virtualization will drive high IOPS requirements as well, just as it does in traditional IT environments. To meet these challenges, solid-state storage devices can be implemented in many different formats, from a simple server-based cache to all-flash-based scalable storage systems.
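As a rough illustration of the simplest format mentioned above, the server-based cache, the following Python sketch places a small least-recently-used (LRU) read cache in front of a slower store. The LRU policy, the class name `ReadCache` and the dictionary standing in for a disk array are all assumptions made for the example; actual flash caches use a variety of policies and operate at the block-device level.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache standing in for a server-side flash cache."""

    def __init__(self, backend, capacity):
        self.backend = backend      # slower tier, e.g. a disk array
        self.capacity = capacity    # maximum number of cached blocks
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        value = self.backend[block_id]          # fall through to the slow tier
        self.blocks[block_id] = value
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used
        return value


disk = {n: f"block-{n}" for n in range(100)}
cache = ReadCache(disk, capacity=2)
for block_id in [1, 2, 1, 3, 2]:
    cache.read(block_id)
```

With a capacity of two blocks, only the repeated read of block 1 is served from the cache in this access pattern. A larger cache, or a workload with more locality, would turn more of those slow-tier reads into hits.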
This was first published in April 2012