This article can also be found in the Premium Editorial Download "Storage magazine: Slimmer storage: How data reduction systems work."
Optimizing for data protection
It’s imperative to bring those optimizations as close to the production workload as possible. To enable this, VMware vStorage 5 provides Changed Block Tracking (CBT), which lets the hypervisor and storage track which disk blocks have been written to since the last backup, eliminating much of the comparison and other I/O that a backup would otherwise require. Other hypervisors’ file systems provide similar capabilities.
In addition, because the hypervisor and its storage have to send less data, the network, the backup server and its storage carry less load. That usually results in faster backups and lets each backup server or appliance protect more production servers.
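To make the idea concrete, here is a minimal sketch of changed block tracking in Python. It is a toy model, not VMware's CBT implementation or API: the `TrackedDisk` class, its methods and the block size are all hypothetical names and values chosen for illustration.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written since the last backup."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set()  # indices of blocks written since the last backup

    def write(self, index, data):
        # Pad or truncate the payload to exactly one block, then record the change.
        self.blocks[index] = data.ljust(self.block_size, b"\0")[: self.block_size]
        self.changed.add(index)

    def incremental_backup(self):
        """Return only the changed blocks and reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=1000)
disk.incremental_backup()        # baseline backup; nothing has been written yet
disk.write(7, b"new data")
disk.write(42, b"more data")
disk.write(7, b"overwritten")    # rewriting a block does not add a second entry
delta = disk.incremental_backup()
print(sorted(delta))             # [7, 42]
```

Because the change set is maintained at write time, the backup never has to read and compare the other 998 blocks, which is the I/O saving described above.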
Dedupe in a virtual world
Along with overall optimization, deduplication has particular ramifications for protecting virtual environments. Aside from where dedupe happens (source, backup server or storage), the “how” and “how wide” must also be considered.
At its simplest, some deduplication works only on iterations of each file being protected. For example, if a VM is made up of two virtual hard disk (VHD) files, then somewhere on a production hypervisor’s storage system are two VHD files. If one were to back up a Word document and then change only a portion of it, it might be acceptable to keep only the new blocks of the .doc file. Some dedupe methods apply only that per-file logic to VHDs, so each time a VHD is backed up, only its new blocks are retained. But a VHD is constantly changing and, moreover, many of the blocks within it are likely to be present in many other VHDs, such as the blocks that make up the OS for each VM.
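The limitation of per-file deduplication can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not any vendor's algorithm: it uses fixed-size chunks and SHA-256 digests, and the `PerFileDedupeStore` class, the file names and the sample data are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the sample data stays readable


def split_chunks(data, size=CHUNK_SIZE):
    """Cut a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class PerFileDedupeStore:
    """Keeps one chunk index per file, so duplicates are found only within a file's own history."""

    def __init__(self):
        self.stores = {}  # file name -> {digest: chunk}

    def backup(self, name, data):
        """Store chunks not yet seen for this file; return how many were new."""
        store = self.stores.setdefault(name, {})
        new_chunks = 0
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                store[digest] = chunk
                new_chunks += 1
        return new_chunks

    def total_chunks(self):
        return sum(len(s) for s in self.stores.values())


os_image = b"OSOSBOOTKERN"  # stands in for OS blocks shared by both VMs
store = PerFileDedupeStore()
first = store.backup("vm1.vhd", os_image + b"APP1")       # full first backup
second = store.backup("vm1.vhd", os_image + b"APP1LOGS")  # only the new chunk is kept
other = store.backup("vm2.vhd", os_image + b"APP2")       # OS chunks are stored again
print(first, second, other, store.total_chunks())         # 4 1 4 9
```

The second backup of `vm1.vhd` adds a single chunk, but the three OS chunks are stored once per VHD, so the store holds nine chunks where six would be enough.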
Thinking more broadly than per-VHD deduplication, other methods retain only the unique components across VHDs but limit that comparison to a single LUN or volume on the hypervisor. So, if a hypervisor has four volumes, each with 10 VMs all running Windows, volume-centric deduplication would end up with four copies of the Windows OS or other application binaries. That’s better than 40 copies, one per VHD, but not ideal.
Other dedupe scenarios might keep unique blocks per hypervisor (so one Windows OS instance across those 40 VMs) but not deduplicate among the multiple hypervisors that are likely being protected by the same backup server or appliance. This is most often a consequence of source-side deduplication design: the deduplication engine has no awareness of what else is protected beyond what it can see from its hypervisor-centric view.
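The effect of deduplication scope in the example above can be worked through with a short Python calculation. It counts how many copies of one identical OS image survive under each scope. The scope names and the `os_copies_retained` function are illustrative labels rather than product terms, and the two-host case is an added assumption to show the cross-hypervisor gap.

```python
from itertools import product

# Each scope maps a VM's location to the boundary within which duplicates are detected.
SCOPES = {
    "per-VHD": lambda host, volume, vm: (host, volume, vm),
    "per-volume": lambda host, volume, vm: (host, volume),
    "per-hypervisor": lambda host, volume, vm: (host,),
    "global": lambda host, volume, vm: (),
}


def os_copies_retained(hosts, volumes, vms_per_volume):
    """Count retained copies of one shared OS image under each dedupe scope."""
    vms = list(product(range(hosts), range(volumes), range(vms_per_volume)))
    # One copy is kept per distinct boundary, so count the distinct boundary keys.
    return {name: len({key(*vm) for vm in vms}) for name, key in SCOPES.items()}


# One hypervisor with four volumes of 10 VMs each, as in the text.
print(os_copies_retained(hosts=1, volumes=4, vms_per_volume=10))
# {'per-VHD': 40, 'per-volume': 4, 'per-hypervisor': 1, 'global': 1}

# Two such hypervisors behind the same backup server (hypothetical extension).
print(os_copies_retained(hosts=2, volumes=4, vms_per_volume=10))
# {'per-VHD': 80, 'per-volume': 8, 'per-hypervisor': 2, 'global': 1}
```

Only the widest scope stays at a single copy as hosts are added, which is why hypervisor-centric source-side deduplication still leaves duplicates on a shared backup server.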
The last deduplication consideration is file-/object-level deduplication across both physical and virtual servers. Most environments aren’t 100% virtualized, so some physical servers will remain. Moreover, the block-level logic used by some deduplication mechanisms may not identify files that reside on physical servers as matching those that reside within VHDs, or across VHDs when spread across a hypervisor farm with varying storage systems.
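One reason block-level logic can miss such matches is alignment. The Python sketch below assumes fixed-size blocks and a toy "disk image" in which the file starts at an offset that is not a multiple of the block size. Real VHD formats and file systems are far more complex, and all names and data here are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8


def block_digests(data, size=BLOCK_SIZE):
    """Digest every fixed-size block, starting at offset 0 of the container."""
    return {hashlib.sha256(data[i:i + size]).hexdigest() for i in range(0, len(data), size)}


# A file as it sits on a physical server.
physical_file = b"REPORT-2012-Q3-FINANCIALS-DATA!!"

# The same file inside a toy disk image, shifted by a 3-byte header.
HEADER = b"HDR"
vhd_image = HEADER + physical_file + b"\0" * 5

# Block-level view: compare block digests of the file and of the image.
shared_blocks = block_digests(physical_file) & block_digests(vhd_image)

# File-level view: assumes the backup tool can locate the file inside the image.
extracted = vhd_image[len(HEADER):len(HEADER) + len(physical_file)]
same_file = hashlib.sha256(extracted).digest() == hashlib.sha256(physical_file).digest()

print(len(shared_blocks), same_file)  # 0 True
```

The bytes are identical, yet no block digest matches because every block boundary falls at a different point in the file. A file- or object-aware comparison finds the match, at the cost of needing visibility into the contents of the VHD.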
The call to action is to understand:
- What optimization methods are in use to reduce potential I/O impact on the protected VM, its neighboring VMs, the host and its storage?
- Which method(s) are in use to determine what can be deduped within VHDs, across VHDs, and across hosts and their storage?
You’ll also need to watch for the evolution of hypervisors, whose plumbing (such as VMware vStorage APIs for Data Protection or Microsoft Hyper-V VSS) enables most of the backup software and hardware products to achieve what they do for better backups in virtualized environments.
BIO: Jason Buffington is a senior analyst at Enterprise Strategy Group. He focuses primarily on data protection, Windows Server infrastructure, management and virtualization. He blogs at CentralizedBackup.com and tweets as @JBuff.
This was first published in October 2012