Choosing storage for streaming large files in big data sets
When considering goals for a big data project, Neuralytix managing director Ben Woo encourages clients to think of the "ability to associate any piece of data with any application or any context at any time. It gives rise to more opportunity to create more innovation and value."
In this video excerpt from his big data presentation at the Storage Decisions conference, Woo discussed the first step toward data consolidation, or centralized data. "Naturally most people would say NAS," Woo said. "You aren’t wrong. NAS is a great solution."
However, Woo encouraged storage pros to take a "look at a scale-out option. The reason for that is a singular namespace. It just makes life easier and reduces the chances for human errors along the way."
There are also a number of systems that support HDFS natively, and Woo is a fan of those because "now all of a sudden you can have data that's accessed in more traditional NAS protocols but also with this additional component of HDFS -- which, again, means your data doesn't have to move."
A crucial step in big data projects is "minimizing data movements within your storage architecture," Woo said. "Ultimately, what we’re building toward is essentially not only a virtualized infrastructure but also virtualized data, where the data exists independent of the application and the infrastructure."
Consolidating data requires simplifying the network, and there are several options to consider, Woo said. He offered this short list:
- Simplify the storage network(s): FCoE, NFS/SMB and HDFS
- Consider running RDBMS over NFS (or SMB3)
- Consider running VM over NFS/SMB
- Consider running VDI over NFS/SMB
"Right now, we have virtualized servers [and] virtualized OS, but at least for now, our data is still tied one-to-one to the storage -- and that's what we have to break," explained Woo.
"That's where concepts like object-based storage systems, HDFS, etc., which are all variants of the same thing, become very applicable. Object-based storage systems allow you to add metadata and metadata is key here. If I can associate any piece of data with any application or any context at any time, it gives rise to more opportunity to create more innovation and value."
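Woo's point about metadata can be illustrated with a minimal sketch. It assumes a toy in-memory store rather than any real object storage product, and the names `ToyObjectStore`, `put`, `tag` and `find` are hypothetical, invented here for illustration. The idea it shows is that context is attached to the data as metadata, so different applications can find the same object without the data being moved or copied.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: an opaque payload plus free-form metadata (hypothetical type)."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store; not a real product API."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Store the payload once, together with its initial metadata.
        self._objects[key] = StoredObject(data, dict(metadata))

    def tag(self, key, **metadata):
        # Attach new context later without moving or copying the payload.
        self._objects[key].metadata.update(metadata)

    def find(self, **criteria):
        # Return the sorted keys whose metadata matches every criterion.
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ToyObjectStore()
store.put("logs/clicks.gz", b"...", source="web", kind="clickstream")
store.put("db/orders.dump", b"...", source="rdbms", kind="orders")

# A new project associates itself with existing data purely through metadata.
store.tag("logs/clicks.gz", project="churn-analysis")
store.tag("db/orders.dump", project="churn-analysis")

churn_inputs = store.find(project="churn-analysis")
web_inputs = store.find(source="web")
```

In this sketch, the two objects stay where they were written; only their metadata changes when a new application or context needs them.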