Choosing storage for streaming large files in big data sets
In this video, Ben Woo, managing director of Neuralytix Inc., sits down with Rich Castagna, editorial director of TechTarget's Storage Media Group, to discuss Hadoop and storage for big data projects.
Big data is a term that's often used by vendors and IT pros, especially in the fields of health care and media/entertainment, but finding a definitive way to classify it can be difficult.
According to Woo, dealing with large sets of unstructured data isn't so much about the volume, but about putting that data "in context to create value." Big data projects usually rely on external tools such as Hadoop or other data processing technologies.
Because big data is large and unstructured, storage environments might not have enough processing power to handle the high rate of I/O requests, or enough bandwidth to deliver data in a timely manner. It can also lead to storage sprawl and management difficulties.
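To make the distinction between I/O request rate and bandwidth concrete, here is a rough back-of-the-envelope sketch. The function names and the sample figures are illustrative assumptions, not numbers from the interview, and the arithmetic ignores protocol overhead, caching and contention.

```python
def required_bandwidth_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth (MB/s) needed to sustain `iops` requests of `io_size_kb` KB each."""
    return iops * io_size_kb / 1024.0


def transfer_time_s(file_size_gb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed to stream a file of `file_size_gb` GB at `bandwidth_mb_s` MB/s."""
    return file_size_gb * 1024.0 / bandwidth_mb_s


if __name__ == "__main__":
    # Hypothetical workload: 2,048 requests per second, 64 KB per request.
    bandwidth = required_bandwidth_mb_s(iops=2048, io_size_kb=64)
    print(f"Bandwidth needed: {bandwidth:.0f} MB/s")

    # Hypothetical 10 GB media file streamed at that rate.
    print(f"Time to stream 10 GB: {transfer_time_s(10, bandwidth):.0f} s")
```

A system can fall short on either axis: many small requests strain the request rate, while a few very large files strain bandwidth.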
The most popular storage for big data projects, Woo said, is scale-out object storage because of its ability to handle metadata at a more granular level. "It's not like traditional NAS [network-attached storage] or SAN. It's more distributed -- it's object-based," he said of big data environments. "It's requiring scale-out capabilities, which we've tended to not have done in the past."
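As a loose illustration of what "object-based" and "distributed" can mean in practice, the toy sketch below stores each object with its own metadata in a flat key space and spreads objects across nodes by hashing the key. The class names, the `find` method and the sample metadata fields are hypothetical. Production object stores use more sophisticated placement schemes (for example, consistent hashing) and replication, which are omitted here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    """An object's payload together with its own free-form metadata."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory object store spread across a fixed number of nodes."""

    def __init__(self, node_count: int) -> None:
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each node is a flat key -> object map; there is no directory tree.
        self.nodes: List[Dict[str, StoredObject]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # Simple hash-modulo placement (an assumption made for brevity).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self.nodes[self._node_for(key)][key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self.nodes[self._node_for(key)][key]

    def find(self, **criteria: str) -> List[str]:
        """Return sorted keys of objects whose metadata matches every criterion."""
        return sorted(
            key
            for node in self.nodes
            for key, obj in node.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    store = ObjectStore(node_count=4)
    store.put("scan-001", b"...", modality="mri", department="radiology")
    store.put("scan-002", b"...", modality="ct", department="radiology")
    store.put("clip-001", b"...", modality="video", department="media")
    print(store.find(department="radiology"))
```

Because metadata travels with each object, queries such as `find(department="radiology")` do not depend on where the object sits in a folder hierarchy, and adding nodes adds capacity without restructuring a single file system tree.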
Hadoop is gaining popularity alongside big data, though it isn't the only approach to processing it. Still, the benefit of the Hadoop Distributed File System (HDFS) -- more unified data management -- is leading some vendors to adapt their products to work better with it. "We can probably envision a time 10 or 20 years in the future in which HDFS, or some variant of that, will be our sole file system across all types of applications and storage," Woo said.