Primary storage deduplication and compression may not be widespread today, but industry analysts expect the data reduction techniques to become more pervasive as growing numbers of vendors roll out products in the coming years.
Explosive data storage growth could ultimately drive more IT shops to start listing deduplication and compression in their requests for proposals (RFPs) for primary storage as they cope with mandates to do more with less, according to Tony Asaro, a senior analyst and founder at The INI Group LLC.
In this podcast interview, Asaro discusses the challenges associated with data reduction in primary storage, and offers up advice to IT shops preparing for a future that will inevitably include data deduplication and compression in their primary storage systems.
You can read the transcript below or listen to/download the MP3 file.
SearchStorage.com: Deduplication gained a foothold in the backup space. What are the main challenges associated with deploying primary storage deduplication?
Asaro: There aren't a lot of solutions out there in primary storage today, and vendors have to make sure they don't overwhelm their systems and add more latency. Data deduplication requires CPU processing cycles and memory, so there's a cost and performance issue with it. Vendors have to be very careful about how they implement and architect dedupe solutions, and they also don't want to get in the business of rehydrating data. When they're taking a look at how it's going to impact performance -- and performance is always an issue with primary storage -- they want to make sure they implement an architecture that's cost effective and doesn't introduce any latency into the process.
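To make the CPU and memory cost concrete, here is a minimal sketch of fixed-block deduplication in Python. It is an illustration only, not any vendor's implementation: the 4 KB block size, the SHA-256 fingerprint, and the names `dedupe_blocks` and `rehydrate` are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes (hypothetical choice)


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep each unique block once."""
    store = {}   # fingerprint -> block contents; this index consumes memory
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Hashing every block is where the CPU cycles go.
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block
        recipe.append(fingerprint)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    sample = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    store, recipe = dedupe_blocks(sample)
    print(len(recipe), "logical blocks,", len(store), "unique blocks stored")
    assert rehydrate(store, recipe) == sample
```

Every write pays for a hash computation and an index lookup, and every read of deduplicated data has to follow the recipe back to the stored blocks, which is why the placement of this work (inline or after the fact) matters for latency.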
SearchStorage.com: NetApp paved the way for deduplication in primary storage by promoting it, in particular, for use with VMware virtual machine disk files. For what other types of data is primary storage dedupe especially helpful?
Asaro: It's all relative. The VMware story is a great one because there's a lot of duplication in that. But there's a lot of duplication in databases. There's a lot of duplication in files as well. I think that in people's minds they have backup dedupe and say 'Oh, we can get 20:1, 30:1.' But even if you get 2:1 or 4:1 in a primary dedupe environment, that's a lot of savings. If you have 100 TB of data and can store that on 50 TB or 25 TB of disk, that has a lot of impact. Across the board, in database applications and in certain file types, you'll get a great return on it, but it all depends on your expectations. If you expect 20:1 in a non-VMware environment, you're probably not going to get that. But even 2:1, 4:1, 5:1 is a great return on your investment.
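The arithmetic behind those ratios can be sketched in a few lines of Python. The function names `physical_capacity_tb` and `savings_percent` are hypothetical helpers introduced for this example; the 100 TB figure and the ratios come from the interview.

```python
def physical_capacity_tb(logical_tb: float, ratio: float) -> float:
    """Disk needed to hold logical_tb of data at a given reduction ratio (N:1)."""
    if ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return logical_tb / ratio


def savings_percent(ratio: float) -> float:
    """Share of capacity saved at a given reduction ratio (N:1)."""
    if ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return 100 * (ratio - 1) / ratio


if __name__ == "__main__":
    for ratio in (2, 4, 5, 20):
        needed = physical_capacity_tb(100, ratio)
        saved = savings_percent(ratio)
        print(f"{ratio}:1 -> {needed:.0f} TB of disk, {saved:.0f}% saved")
```

The curve flattens quickly: going from no reduction to 2:1 saves half the disk, while going from 4:1 all the way to 20:1 adds only another 20 percentage points, which is why modest primary storage ratios still pay off.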
SearchStorage.com: Why haven't deduplication and compression for primary storage caught on in a bigger way?
Asaro: No. 1, people haven't implemented it. And when vendors were taking a look at it, they didn't necessarily have all the core competencies to do the dedupe. They had to go out and research it and understand it, and the market wasn't demanding it in primary storage. It's not until the competition starts throwing it at them that they say 'Well, wait a minute, we're losing deals now. We have to go out and address this.'
You have NetApp with it. BlueArc and Xiotech just announced it. Compellent has announced some stuff there, and you're going to see more and more people announcing dedupe [for primary storage]. Dell went out and bought Ocarina. There's a lot of attention beginning to come to the surface, and other storage vendors are now trying to address that and say 'OK. Now we're going to have to take a defensive mode and do something along these lines.'
But it doesn't come cheap; there's a price associated with it. You're going to have to have memory. You're going to have to have CPU. You have to have a strategy of how you're going to address those things, and it isn't easy to architect it. It isn't just plopping something in. It requires some design, thought and effort to do it in a way that isn't going to impact performance or break the bank.
SearchStorage.com: In what ways do you foresee primary storage deduplication and compression technology changing in the next few years?
Asaro: We've begun to see more and more announcements. You begin to see acquisitions occurring. I think you're going to see more and more storage vendors stepping up. And I think you'll see in 2011 more products coming out midyear and then after that, more and more products coming out. In the year 2012, you're going to see it widely deployed and supported by a number of vendors out there and, as a result, you're going to see a lot more education. You're going to be seeing end users putting that on their RFPs, and I believe that it's going to become much more widespread than it is today.
SearchStorage.com: Do you foresee the day when the majority of end users will be using dedupe and/or compression for primary storage?
Asaro: It's an inevitable thing. You talk to IT professionals, and they're being mandated with doing more with less. Data compression is a tried and true technology; it's just a matter of implementing it into your environment. If you can get 2:1, 4:1 lossless data compression, it kind of is a no-brainer, depending on the implementation of it, of course. And then dedupe on top of it -- these are complementary technologies.
If you're going to be living in a world where it's going to be normal for people to have hundreds of terabytes and even petabytes of capacities, and 10 years from now hundreds of petabytes and potentially exabytes, then I do think it's an inevitability.
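As a rough illustration of why compression and dedupe are complementary, the following Python sketch first removes duplicate blocks and then compresses the unique blocks that remain. It is a toy model under stated assumptions: the 4 KB block size, the use of SHA-256 and `zlib`, and the function name `reduce_data` are choices made for this example, not a description of any shipping product.

```python
import hashlib
import zlib


def reduce_data(data: bytes, block_size: int = 4096):
    """Return (logical, after_dedupe, after_compression) sizes in bytes."""
    unique = {}  # fingerprint -> block; duplicate blocks are stored once
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique.setdefault(hashlib.sha256(block).digest(), block)
    after_dedupe = sum(len(block) for block in unique.values())
    # Lossless compression is applied only to the blocks that survive dedupe.
    after_compression = sum(len(zlib.compress(block)) for block in unique.values())
    return len(data), after_dedupe, after_compression


if __name__ == "__main__":
    text_block = b"primary storage " * 256  # one 4,096-byte block of repetitive text
    sample = text_block * 8 + bytes(4096) * 2  # many duplicate blocks
    logical, deduped, compressed = reduce_data(sample)
    print(f"logical={logical} B, after dedupe={deduped} B, after compression={compressed} B")
```

Dedupe removes repetition across blocks and compression removes repetition within each block, so the two reductions stack rather than compete.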
SearchStorage.com: What advice do you have for end users right now?
Asaro: We're still on the threshold. There are only a handful of solutions that are even being talked about today and fewer that are actually in the marketplace. It's important for end users to ask their vendors what their strategies are in this area. Is it going to impact performance? How much is it going to impact costs? You're going to save me costs on the back end, but what am I going to invest on the front end? What kind of modes? Is it real-time? Is it a hybrid process, a post-process? Am I going to be able to support it across multiple volumes? What's the limit on how much of the storage system can support this? What kind of applications is it better with? What are my best practices? Is it going to impact certain applications and not other applications? I would ask a series of questions as to what the actual details are behind the implementation.