This article can also be found in the Premium Editorial Download "Storage magazine: Primary storage dishes up dedupe."
Depending on the type of data, a sub-file-level deduplication system can substantially reduce the amount of data stored. The most dramatic results come from virtual system images, especially virtual desktop images, where reductions of 75% to 90% are not uncommon. In other environments, the reduction depends on how often users duplicate their own data. Some users, for example, save multiple versions of their files in their home directories. They reach a "good point," save the file, and then save it again under a new name. That way, no matter what they do next, they can always revert to the previous version. This practice can leave many versions of an individual file, and users rarely go back and delete the older ones. In addition, many users download the same files as their coworkers and store them in their own home directories. These habits are why sub-file-level deduplication works even within a typical user home directory.
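To make the home-directory scenario concrete, here is a minimal Python sketch of sub-file-level deduplication. It assumes fixed-size 8 KB segments, SHA-256 fingerprints and an in-memory dictionary as the segment store; the function names, file names and test data are all hypothetical, and commercial products differ in many details.

```python
import hashlib
import random

CHUNK_SIZE = 8 * 1024  # hypothetical fixed segment size (8 KB)


def dedupe(files, chunk_size=CHUNK_SIZE):
    """Split each file into fixed-size segments and keep one copy of each unique segment.

    Returns (store, recipes): store maps a SHA-256 digest to segment bytes;
    recipes maps each file name to the ordered digests needed to rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), chunk_size):
            segment = data[offset:offset + chunk_size]
            digest = hashlib.sha256(segment).hexdigest()
            store.setdefault(digest, segment)  # only the first copy is kept
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(store, recipe):
    """Reassemble a file from its list of segment digests."""
    return b"".join(store[d] for d in recipe)


def reduction(files, store):
    """Fraction of logical bytes not physically stored (ignores recipe metadata)."""
    logical = sum(len(data) for data in files.values())
    physical = sum(len(segment) for segment in store.values())
    return 1 - physical / logical


def random_bytes(seed, n):
    """Deterministic pseudo-random test data standing in for real file contents."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


# Hypothetical home-directory contents: one 80 KB document saved several ways.
base = random_bytes(1, 10 * CHUNK_SIZE)
edited_ending = random_bytes(2, CHUNK_SIZE)
files = {
    "alice/report.doc": base,
    "alice/report_good_point.doc": base,                      # saved again under a new name
    "alice/report_v2.doc": base[:-CHUNK_SIZE] + edited_ending,  # only the last segment changed
    "bob/report.doc": base,                                   # coworker stored the same download
}

store, recipes = dedupe(files)
print(f"{len(store)} unique segments, {reduction(files, store):.1%} reduction")
```

With one re-saved copy, one lightly edited version and a coworker's duplicate download, only 11 of the 40 segments written are unique, so the sketch reports a 72.5% reduction. Fixed-size segments are the simplest choice but are shift-sensitive: inserting a single byte near the start of a file moves every later segment boundary, which is one reason many systems use variable-size, content-defined segments instead.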
The advantage of sub-file-level deduplication is that it finds duplicate patterns wherever they occur, no matter how the data was saved. The disadvantage is that it works at the macro level, whereas compression works at the micro level: deduplication can eliminate a segment only if an identical copy is already stored, while compression removes redundancy inside each segment. Deduplication might identify a redundant 8 KB segment, for example, but a good compression algorithm might reduce the size of that segment to 4 KB. That's why some
Is archiving data reduction?
Some vendors consider archiving and hierarchical storage management (HSM) to be data reduction technologies. Both archiving and HSM systems can reduce the amount of disk you need for your primary data, but they do so by moving data from one storage system to another. While they may save you money, they don't actually reduce the size of the data; they just move it to less-expensive storage. So while these are good technologies that companies with a lot of data should explore, they're not data reduction per se.
This was first published in April 2010