In the second video in a series of five, Microsoft MVP Brien Posey discusses how Windows Server 2012 data deduplication features work and shows users which tools Microsoft has included to help with estimating cost and space savings. Read some of Posey's remarks below or view his presentation above to learn more.
Data deduplication is also great for virtual machines because you can shrink the footprint of your virtual machines. In virtual data centers, you often have a single volume that contains multiple virtual machines. That means a lot of the virtual machines are potentially going to be running the same operating systems -- or maybe even some of the same applications.
View the rest of Brien Posey's WS 2012 tip series:
Video tip 1: Native iSCSI Target Software
Video tip 3: Resilient File System (ReFS)
Video tip 4: Windows Storage Spaces
Video tip 5: Offloaded Data Transfer (ODX)
If you can perform volume-level data deduplication on a volume that hosts a lot of these virtual machines, then you can greatly reduce the amount of space those virtual machines take up.
Of course, file system data deduplication isn't just for virtual environments. It can be used on physical servers as well. In either case, the use of data deduplication might make use of solid-state storage more practical. If you use data deduplication, and you're able to shrink the footprint of your data, you might be able to get away with using solid-state storage without breaking the bank in the process.
The Windows Server 2012 data deduplication mechanism is what's known as "post process." This means that data deduplication doesn't happen in real time as files are being written to the storage medium. Instead, files are written in an uncompressed form, and then there's a scheduled process that comes along later and does the actual data deduplication.
The most important thing you need to know about post-process data deduplication is that even though the goal is to save some storage space, you could actually consume more storage space -- at least temporarily. That's because there is some space that's needed for performing the data deduplication process. You're going to have an uncompressed copy of your data right alongside a compressed copy, at least temporarily while the data deduplication is taking place.
Of course, once the data deduplication completes, the uncompressed data can be removed, and you get all that storage space back.
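The two-phase workflow described above can be sketched in a few lines of Python. This is purely illustrative of the post-process idea (write first, deduplicate later on a schedule), not how Windows Server 2012 implements it internally; the class and method names, the tiny 4-byte chunk size, and the in-memory dictionaries are all assumptions made for the sake of a small, runnable example.

```python
# Illustrative sketch of post-process deduplication: files land on disk whole,
# and a later job chunks them and keeps only one copy of each unique chunk.
# Not Windows' actual implementation; names and chunk size are invented here.
import hashlib

FIXED_CHUNK = 4  # tiny chunk for demonstration; real systems use KB-scale chunks


def chunk(data: bytes, size: int = FIXED_CHUNK):
    return [data[i:i + size] for i in range(0, len(data), size)]


class PostProcessStore:
    def __init__(self):
        self.files = {}        # filename -> raw bytes, as originally written
        self.chunk_store = {}  # chunk hash -> chunk bytes (filled by the job)
        self.manifests = {}    # filename -> ordered list of chunk hashes

    def write(self, name: str, data: bytes):
        # Phase 1: the write path stores the file in full, undeduplicated form.
        self.files[name] = data

    def run_dedup_job(self):
        # Phase 2: the scheduled job replaces raw files with chunk references.
        # Note that chunk_store grows while files still holds the originals --
        # this is the temporary extra space the article warns about.
        for name, data in self.files.items():
            hashes = []
            for c in chunk(data):
                h = hashlib.sha256(c).hexdigest()
                self.chunk_store.setdefault(h, c)  # keep each unique chunk once
                hashes.append(h)
            self.manifests[name] = hashes
        self.files.clear()  # only now are the undeduplicated copies reclaimed

    def read(self, name: str) -> bytes:
        if name in self.files:           # not yet deduplicated
            return self.files[name]
        return b"".join(self.chunk_store[h] for h in self.manifests[name])
```

For example, two VM images that begin with the same bytes end up sharing those leading chunks in `chunk_store`, while both files still read back byte-for-byte identical to what was written.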
Probably the biggest question that gets asked with regard to data deduplication is reliability. When you apply deduplication, you're pulling out redundant chunks of data and discarding them. What guarantees that file integrity will be maintained?
Windows actually takes several different steps to preserve integrity and make sure the volume doesn't crash on you. For one thing, Windows automatically creates duplicate copies of any metadata. Also, some chunks are more popular than others. Chunks that have been referenced more than 100 times are automatically duplicated, so if something happens to that chunk of data, you've got a redundant copy that you can fall back on.
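The "popular chunk" safeguard just described amounts to a reference-count threshold: once enough files point at the same chunk, a redundant copy is kept. Here is a minimal sketch of that policy. The threshold of 100 comes from the article; everything else (the class, the dictionaries, the fallback read path) is an invented illustration, not Windows' actual on-disk structure.

```python
# Sketch of a hot-chunk redundancy policy: chunks referenced more than
# HOT_THRESHOLD times get a second, redundant copy to fall back on.
# Data structures here are illustrative, not Windows Server 2012 internals.
HOT_THRESHOLD = 100


class ChunkStore:
    def __init__(self):
        self.primary = {}   # chunk hash -> chunk bytes
        self.refcount = {}  # chunk hash -> number of references to this chunk
        self.mirror = {}    # chunk hash -> redundant copy of popular chunks

    def add_reference(self, h: str, data: bytes):
        self.primary.setdefault(h, data)
        self.refcount[h] = self.refcount.get(h, 0) + 1
        if self.refcount[h] > HOT_THRESHOLD:
            self.mirror.setdefault(h, data)  # duplicate the hot chunk

    def read(self, h: str) -> bytes:
        # Fall back to the mirror if the primary copy is lost or corrupt.
        return self.primary.get(h) or self.mirror[h]
```

The design trade-off is the one the article implies: a little extra space spent on the most widely shared chunks buys protection for the many files that would otherwise all be damaged by the loss of a single chunk.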
Microsoft provides a tool that can help you determine what kind of space savings you're actually going to see. You run it against a volume, and it examines the data on that volume and reports how much space you could save by deduplicating it. That's a great tool to use, and I would definitely suggest running it before you implement data deduplication.
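Conceptually, such an evaluator hashes chunks of every file and compares the unique bytes it finds against the total bytes scanned. The sketch below shows that idea in Python; the fixed 64 KB chunk size and SHA-256 hashing are assumptions for illustration, and a real evaluator (Windows Server 2012 ships one as DDPEval.exe) uses its own chunking and reporting.

```python
# Rough sketch of how a dedup savings evaluator can work: hash fixed-size
# chunks of each file, then compare unique bytes to total bytes scanned.
# Chunk size and hash choice are assumptions, not the real tool's behavior.
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks, chosen arbitrarily for this sketch


def estimate_savings(paths):
    total = 0
    unique = {}  # chunk hash -> chunk length, counted once per unique chunk
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(CHUNK_SIZE)
                if not block:
                    break
                total += len(block)
                unique.setdefault(hashlib.sha256(block).digest(), len(block))
    unique_bytes = sum(unique.values())
    return total, unique_bytes, total - unique_bytes  # (scanned, kept, saved)
```

Running this over a directory of VM images, for instance, would show how much of the data is duplicated across files before you commit to enabling deduplication on the volume.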
About the presenter:
Brien Posey is a regular SearchStorage.com contributor and a Microsoft MVP with two decades of IT experience. Before becoming a freelance technical writer, Brien was CIO for a national chain of hospitals and health care facilities. He also served as a network administrator for some of the nation's largest insurance companies and for the Department of Defense at Fort Knox.