The terms compression and deduplication are often used interchangeably, but they are distinct data reduction methods, albeit ones that are very similar to one another.
Compression has been around for decades in one form or another. One familiar form, in use since the 1980s, is the zip file, which combines multiple files into a single file and removes redundant copies of text strings and binary data. When the contents of a zip file are extracted, the data is rehydrated by a process that reinserts the data that was removed.
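The zip round trip described above can be sketched with Python's standard zipfile module. This is only an illustration: the file names and contents are hypothetical and deliberately repetitive so that the size reduction is easy to see, and the archive is built in memory rather than on disk.

```python
import io
import zipfile

# Hypothetical file contents, chosen to be highly repetitive.
files = {
    "report.txt": b"quarterly totals unchanged\n" * 500,
    "log.txt": b"status=OK\n" * 2000,
}


def make_archive(files):
    """Pack several in-memory files into one DEFLATE-compressed zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def extract_archive(archive_bytes):
    """Extract every member, restoring (rehydrating) the original bytes."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


archive_bytes = make_archive(files)
restored = extract_archive(archive_bytes)

original_size = sum(len(data) for data in files.values())
print(f"original: {original_size} bytes, archive: {len(archive_bytes)} bytes")
```

Because the compression is lossless, the extracted files match the originals byte for byte, even though the archive itself is much smaller than the data it holds.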
Compression can also be used in other ways. Media formats such as JPEG and MPEG are natively compressed and are designed to consume as little space as possible.
Like compression, deduplication exists in many different forms. Many of the deduplication offerings in use today remove redundancy at the storage block level.
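As a rough illustration of block-level deduplication, the sketch below splits data into fixed-size blocks, fingerprints each block with SHA-256 and stores every unique block only once. The DedupStore class, the 4 KB block size and the sample data are assumptions made for this example; commercial products differ in block size, hashing and metadata handling.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class DedupStore:
    """Hypothetical in-memory store that keeps each unique block once."""

    def __init__(self):
        self.blocks = {}  # block fingerprint -> block bytes

    def write(self, data):
        """Store data and return the ordered list of block fingerprints."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only the first copy of a block is kept; repeats reuse it.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its block fingerprints."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def stored_bytes(self):
        """Physical bytes consumed by unique blocks."""
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # four blocks, two unique
recipe = store.write(data)
print(f"logical: {len(data)} bytes, physical: {store.stored_bytes()} bytes")
```

The list of fingerprints returned by write acts as a recipe for rebuilding the data, so redundant blocks cost only a reference rather than a second copy.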
Virtual machines and data reduction methods
When it comes to data reduction methods for virtual servers, both deduplication and compression have their place. Compression usually works at the file level, while deduplication tends to work at the block level.
Compression is probably best suited for use on file servers that contain seldom-accessed data, such as archive data. Because data must be compressed when it is written and decompressed when it is read, compressing everything usually isn't an option.
Another common use for compression in a virtualized environment is NTFS file system compression, which some administrators use to reduce the data footprint on the underlying physical storage volume. But NTFS compression is a legacy feature that is beginning to fall out of favor because it consumes CPU cycles. As such, it is a poor choice for virtual machines (VMs) that run CPU-intensive workloads. More importantly, some Windows Server features, such as Continuous Availability, are not compatible with NTFS compression.
Deduplication can be implemented at the storage level if the storage hardware supports native deduplication, and it can work from outside of the VM. The nice thing about this type of deduplication is that it can help eliminate the redundancy that exists across VMs. For example, VMs that run the same operating system have identical system files. Deduplication can help to remove this redundancy, reducing the amount of physical storage required by the VMs. Deduplication is currently one of the preferred data reduction methods in virtualized environments, while volume-level compression is used less frequently.
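To make the cross-VM savings concrete, the following sketch compares the number of logical blocks in three hypothetical VM disk images with the number of unique blocks a deduplicating array would need to keep. The images, the 4 KB block size and the assumption that identical system files line up on identical block boundaries are all simplifications for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


def block_digests(image):
    """Return the SHA-256 fingerprint of every fixed-size block in an image."""
    return [
        hashlib.sha256(image[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(image), BLOCK_SIZE)
    ]


# Hypothetical operating system files: eight distinct blocks shared by all VMs.
os_files = b"".join(bytes([value]) * BLOCK_SIZE for value in range(8))

# Each hypothetical VM adds one block of its own application data.
vm_images = {
    "vm1": os_files + b"\xa1" * BLOCK_SIZE,
    "vm2": os_files + b"\xb2" * BLOCK_SIZE,
    "vm3": os_files + b"\xc3" * BLOCK_SIZE,
}

# Blocks the VMs believe they occupy versus blocks that must be stored.
logical_blocks = sum(len(block_digests(image)) for image in vm_images.values())
unique_blocks = len(
    {digest for image in vm_images.values() for digest in block_digests(image)}
)

print(f"logical blocks: {logical_blocks}, unique blocks: {unique_blocks}")
print(f"reduction ratio: {logical_blocks / unique_blocks:.2f}:1")
```

The shared operating system blocks are counted once no matter how many VMs reference them, which is why the savings tend to grow as more VMs built from the same operating system are added.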