Data deduplication is used mainly for data backups today, but Valdis Filks, a research director for storage technologies and strategies at Gartner Inc., predicts that primary storage deduplication will become much more prominent in the next few years.
To what degree are end users now doing primary storage data deduplication?
Valdis Filks: Very few people actually use [primary storage data deduplication] today. However, some companies have offered deduplication for primary storage for a long time, primarily NetApp with its appliance products. Many more will be coming quite soon; there's a lot of development in the area, along with promises and announcements in that direction.
In what ways do you think the landscape for primary storage data deduplication will change during the coming year?
Filks: Primary [storage data] deduplication will start predominantly in the NAS [network-attached storage] space, on NAS appliances, and then move into block devices. So we will see more vendors, especially those with products in the NAS space, rolling out primary deduplication on those devices first. There will be quite a few in the next six to 12 months, and probably within 12 to 18 months some of the block vendors will offer primary deduplication as well. It will become ubiquitous and available at many levels of the stack; some companies already have deduplication in their file systems.
It will exist in some servers, in some server operating systems and their file systems, and in some hypervisors and the file systems they use. As I mentioned earlier, the NAS devices will be the first to have these features, and later they will appear in block devices such as Fibre Channel, FCoE and iSCSI storage. When that happens, the price of purchasing and using primary deduplication, or any deduplication function, will obviously drop, and it will probably shake up the deduplication market a bit.
What's the primary piece of advice you would offer organizations considering primary storage data deduplication?
Filks: My primary piece of advice is that the technology is mature. Deduplication algorithms and technologies are pretty mature. Obviously, test the appliances first. Some vendors, such as NetApp, have had the feature in their products for a long time, but with any new product, do a proof of concept first.
The problem with deduplication is that every company's data is different, and we can't really be sure what the deduplication ratios and savings will be until we store data on those devices. Because dedupe will happen at the source, or on primary storage, it may cause problems with existing dedupe backup, restore and archiving products. So I recommend that people try to architect their backup and restore solutions to take primary [storage] deduplication into account. They may have to change the deduplication systems they use at the back end. And systems, like nature, hate complexity. If data is deduped at the source, we will probably not want to undedupe it and then dedupe it again just for the backup systems.
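Because savings depend on each company's data, a rough estimate on a sample of your own data can help set expectations before a proof of concept. The following is a minimal Python sketch, not any vendor's method: it assumes fixed-size 4 KiB chunks and SHA-256 fingerprints (both illustrative choices), and the names `dedupe_ratio` and `file_chunks` are hypothetical. Real products use their own chunking (often variable-size, content-defined), hashing and compression, so their ratios will differ.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def file_chunks(paths: Iterable[Path], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size chunks from each file (hypothetical helper; 4 KiB is an assumed size)."""
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk


def dedupe_ratio(chunks: Iterable[bytes]) -> float:
    """Estimate a dedupe ratio as logical bytes divided by unique bytes.

    A chunk counts as a duplicate if its SHA-256 digest has been seen before.
    Returns 1.0 (no savings) when there is no data.
    """
    seen: set = set()  # digests of chunks already stored
    logical = 0        # every byte that was written
    unique = 0         # bytes that would actually be stored
    for chunk in chunks:
        logical += len(chunk)
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            unique += len(chunk)
    return logical / unique if unique else 1.0
```

For example, `dedupe_ratio(file_chunks(p for p in Path("/data/sample").rglob("*") if p.is_file()))` would scan a sample directory (the path is only an illustration). Fixed-size chunking is the simplest choice but misses duplicates that are shifted by an insertion, which is one reason real systems often chunk on content boundaries instead.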
People often say that a new technology turns the world upside down. That often isn't quite true. With primary [storage data] deduplication, however, the analogy fits, because deduplication itself will be turned upside down: today it is used primarily in the backup space, but in the future it will be used primarily on primary storage.
This was first published in December 2009