Data storage options: Exploring clustering, pooling, unified systems
Date: Sep 30, 2013
One of the biggest obstacles for storage administrators today is determining the best way to store large and growing volumes of data. To discuss several data storage options that can help with this, the Storage Media Group's Editorial Director Rich Castagna spoke to Phil Goodwin, principal architect in Cognizant’s IT Infrastructure Services Group. Watch the video or read the transcript below to hear Goodwin's take on data storage options, such as clustering network-attached storage (NAS), pooling capacity or using unified storage systems to deal with growing sets of data.
Clustering NAS systems can be an effective way to accommodate big volumes of data, but what are the drawbacks to clustered NAS?
Phil Goodwin: I'm actually a pretty big proponent of clustered NAS because a lot of the clients I talk to have a sprawl problem. Maybe they have 14 or more NAS systems, and clustering them can be one way to bring those all together and reduce the management overhead.
The bad news is that you now have a monolithic system. At least functionally it's a monolithic system, where you have all of your eggs in one basket. Yes, it's clustered. Yes, it has some failover. It has some other advantages to it. But, nevertheless, you still have to deal with this one massive environment that may have 16 PB of information. I talked to one client at Storage Decisions yesterday that had 13 PB in a single NAS environment. We're talking really big scale these days, and being able to manage that is one of the challenges IT organizations are facing.
Another alternative is to link servers and pool their internal capacity, like Google and Amazon do on a huge scale. Is that a good alternative, or is it too much of a science project for most companies?
Goodwin: Well, what you're really seeing in implementations on that scale are things like Hadoop, where you're scaling massive numbers of systems together. The limitation of having internal storage on each one of those systems is that you now need a metadata server to manage all the metadata across them, both to give you cache coherency and to give every system the ability to see the data held on all the others.
Ben Woo had a really nice session on it yesterday at Storage Decisions. His guidance, and I agree with him, is really to keep the data on more traditional storage, because that solves the metadata problem in that environment.
Now, to answer your original question: Is it a science project? The fact is, if you have petabytes and petabytes of information that you're trying to use in a big data environment, then it's a problem you have to deal with, so at that scale it really isn't a science project. For the average IT organization, it probably is something of a science project, but if you're a really large-scale environment, then no, it's something you have to deal with.
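As a rough illustration of the metadata server Goodwin describes, the Python sketch below pools the internal storage of three servers and keeps a central map of which server holds each block. Every name in it (StorageNode, MetadataServer, the round-robin placement, the 4-byte block size) is hypothetical and chosen only for illustration; it is not how Hadoop or any particular product works, and it ignores replication, failures and caching.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A server holding blocks on its own internal disks (hypothetical)."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


class MetadataServer:
    """Records which node holds each block of each file (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.locations = {}  # file name -> ordered list of (node name, block_id)

    def write(self, filename, data, block_size=4):
        # Spread fixed-size blocks across the nodes round-robin
        # and remember where each block was placed.
        names = list(self.nodes)
        placement = []
        for offset in range(0, len(data), block_size):
            index = offset // block_size
            node_name = names[index % len(names)]
            block_id = f"{filename}#{index}"
            self.nodes[node_name].blocks[block_id] = data[offset:offset + block_size]
            placement.append((node_name, block_id))
        self.locations[filename] = placement

    def read(self, filename):
        # Look up the block locations, then fetch and reassemble the file.
        return b"".join(
            self.nodes[node_name].blocks[block_id]
            for node_name, block_id in self.locations[filename]
        )


nodes = [StorageNode("node-a"), StorageNode("node-b"), StorageNode("node-c")]
meta = MetadataServer(nodes)
meta.write("report.txt", b"pooled storage")
print(meta.read("report.txt"))
```

No single node holds the whole file here, so nothing can be read without the location map. That dependency is the metadata problem raised in the answer above.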
Multiprotocol or unified storage systems combine file and block in a single box. Are there any shortcomings to those data storage options?
Goodwin: Yeah, I think the real use case for unified storage is when you have vendors A, B and C, and you're trying to consolidate them into a single environment to reduce the number of systems and the number of storage software applications you have to manage. That's really the primary use case. However, once you do that, you've made one system the gate for everything else.
So, now you have a unified environment. That's a nice thing, but you now have the limitations of a single system that's controlling all of the data movement and all of the system access to the data through that one point. These systems are massively scalable, so you could overcome some of those issues, but you have consolidated everything back into a single unit at that point.
About the expert:
Phil Goodwin is a senior manager and principal architect in Cognizant’s IT Infrastructure Services group, where he assists clients in the development of adaptive storage architectures, storage management best practices, backup and recovery, disaster recovery and data archiving.