Why a big data project requires virtualized data, simplified networks
Date: Jan 30, 2014
When considering goals for a big data project, Neuralytix managing director Ben Woo encourages clients to think of the "ability to associate any piece of data with any application or any context at any time. It gives rise to more opportunity to create more innovation and value."
In this video excerpt from Woo’s Storage Decisions conference presentation on big data, Woo discussed the first step toward data consolidation, or centralized data. "Naturally most people would say NAS," Woo said. "You aren’t wrong. NAS is a great solution."
However, Woo encouraged storage pros to take a "look at a scale-out option. The reason for that is a singular namespace. It just makes life easier and reduces the chances for human errors along the way," he told conference attendees.
There are also a number of systems that support HDFS natively, and Woo is a fan of those because "now all of a sudden you can have data that's accessed in more traditional NAS protocols but also with this additional component of HDFS -- which, again, means your data doesn't have to move."
A crucial step in big data projects is "minimizing data movements within your storage architecture," Woo said. "Ultimately, what we’re building toward is essentially not only a virtualized infrastructure but also virtualized data, where the data exists independent of the application and the infrastructure."
Simplifying the network is a necessary step toward consolidating data, and there are several options to consider, Woo said. He offered this short list:
- Simplify the storage network(s): FCoE, NFS/SMB and HDFS
- Consider running RDBMS over NFS (or SMB3)
- Consider running VMs over NFS/SMB
- Consider running VDI over NFS/SMB
"Right now, we have virtualized servers [and] virtualized OS, but at least for now, our data is still tied one-to-one to the storage -- and that's what we have to break," explained Woo.
"That's where concepts like object-based storage systems, HDFS, etc., which are all variants of the same thing, become very applicable. Object-based storage systems allow you to add metadata and metadata is key here. If I can associate any piece of data with any application or any context at any time, it gives rise to more opportunity to create more innovation and value."
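Woo's point about metadata can be illustrated with a small sketch. The Python below is a toy in-memory object store, not any vendor's API and not something shown in the presentation; the `ObjectStore` class, its `put`, `tag` and `find` methods, and the sample keys and tags are all hypothetical names chosen for illustration. It shows the idea of attaching context to data after the fact, so that a new application can find the data it needs without the data being copied or moved.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: opaque bytes plus free-form metadata tags."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store (hypothetical; for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Store the bytes once, together with any initial metadata.
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def tag(self, key, **metadata):
        # Add context later; the stored bytes are not copied or moved.
        self._objects[key].metadata.update(metadata)

    def find(self, **criteria):
        # Return the keys whose metadata matches every criterion.
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ObjectStore()
store.put("sales/2014-01.csv", b"region,total\n", department="sales")
store.put("logs/web-01.log", b"GET /index.html\n", department="it")

# A new analytics project claims both objects by tagging them in place.
store.tag("sales/2014-01.csv", project="churn-analysis")
store.tag("logs/web-01.log", project="churn-analysis")

print(store.find(project="churn-analysis"))
print(store.find(department="sales"))
```

In this sketch the same two objects answer both a departmental query and a project query, which is the "any piece of data with any application or any context" association in miniature. A production object store would persist the metadata and index it rather than scanning every object.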