Leveraging storage replication for VM disaster recovery
By Chris Wolf
What you will learn: When you're sizing up storage solutions for DR in a virtual environment, you should consider the issues of vendor support; storage architecture; replication options; deduplication; and recovery options.
Storage replication is a popular method for synchronizing production and disaster recovery (DR) sites in virtual server environments. Whether you're using array-based replication or leveraging a storage virtualization appliance for replication, several variables will influence the efficiency of your storage topology as it relates to DR.
When you're sizing up storage solutions for DR, you should consider five issues:
- Vendor support
- Storage architecture
- Replication options
- Deduplication or single instance storage support
- Recovery options
Of course, there are several ways to get data from a production site to a DR site. Rather than simply give a high-level overview of these alternative virtual machine (VM) replication methods, this article will take a deeper look at specific storage array considerations. However, when it comes to architecting replication for virtual environments, this article can only scratch the surface. Many storage and DR optimization tricks are vendor-specific. Be sure to check your storage and server virtualization vendors' documentation and architecture guides for details relevant to your particular environment.
Let's set a baseline by assuming the high-level storage replication architecture shown below. Note: The network storage could be network attached storage (NAS) or either a Fibre Channel (FC) or iSCSI storage array.
All major network storage vendors offer tools for replicating data on an array from one site to another. Most of them use asynchronous replication for site-to-site network storage synchronization, since the WAN network throughput or distance between sites is usually inadequate for synchronous replication. With asynchronous replication, writes are committed to primary storage, then replicated based on the replication policy set by the storage administrators.
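To make the asynchronous model concrete, here is a minimal Python sketch. It is a toy model, not any vendor's implementation: the class name, the in-memory dictionaries and the `replicate()` call standing in for the replication policy are all assumptions made for illustration.

```python
from collections import deque


class AsyncReplicatedVolume:
    """Toy model of asynchronous replication between two sites (hypothetical)."""

    def __init__(self):
        self.primary = {}       # block number -> data at the production site
        self.replica = {}       # block number -> data at the DR site
        self.pending = deque()  # writes committed locally but not yet shipped

    def write(self, block, data):
        # The write is committed at the primary immediately;
        # it is only queued for the DR site.
        self.primary[block] = data
        self.pending.append((block, data))

    def replicate(self):
        # Stands in for the replication policy firing (for example, on a schedule).
        shipped = len(self.pending)
        while self.pending:
            block, data = self.pending.popleft()
            self.replica[block] = data
        return shipped


vol = AsyncReplicatedVolume()
vol.write(0, b"boot")
vol.write(1, b"data")
lag = len(vol.pending)  # writes the DR site would be missing if the primary failed now
vol.replicate()
```

The `lag` value is the point of the sketch: until the next replication cycle runs, the DR copy trails the primary by whatever is still queued.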
Although most storage array vendors offer some form of asynchronous replication, the choice of array vendor nevertheless usually matters. When evaluating storage options, vendor support is a key criterion. A storage array should be supported on products from your environment's virtualization vendor and OS vendor. Support should also be considered for enterprise application vendors that name supported storage platforms. Storage platforms that leave a portion of your infrastructure unsupported constitute a risk.
You should also look at your backup vendor's list of supported storage platforms. Many enterprise backup products are capable of managing snapshots on most popular network storage platforms. A storage platform that integrates with your existing data protection software should be given more consideration than one that does not.
The way in which storage is architected to support virtualization can have a dramatic effect on replication performance, and thus DR response. Fault tolerance through RAID is a requirement: any storage array should be deployed at RAID level 5 at a minimum.
In terms of DR response, you need to look at how each VM's virtual disk storage is allocated, as well as how temporary file locations are configured in each VM's guest operating system. When a storage array is configured to support virtualization, you should set aside a volume set for transient or temporary data. How you deal with transient data should be determined by the service level requirements of the VMs you support. For VM data that is synchronously mirrored over dark fiber between two locations, certain application- or service-centric temporary files may be critical and will need to be replicated too. However, for VMs that are asynchronously replicated to a DR site, in most cases replicating temp files would be a waste of bandwidth and storage space.
Getting back to the storage configuration details, assume you've set aside enough volume space (e.g. storage LUN, NFS mount, etc.) for your virtual infrastructure's temporary data. Once the storage for transient data has been allocated, you should configure the virtual infrastructure so that the following files are stored on the transient data volumes:
- Hypervisor swap files
- Virtual machine guest OS:
  - Swap file
  - OS and application temp folders
  - User temp directories
For individual VMs, you'll need to create a separate virtual hard disk just for transient data, which in turn would be stored on the "transient" volume space of your network storage device. While this may seem like a lot of work, it can result in substantial savings in storage requirements for your DR site, since you won't have to replicate any of the transient data to your DR facility. A VM's pagefile will generally require a high degree of storage I/O, so you may want to use a dedicated virtual hard disk just for the VM's paging file or swap file to gain better control of pagefile quality of service (QoS).
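A rough back-of-the-envelope calculation shows how the savings add up. The sketch below is illustrative only: the function name and the sample figures (100 VMs, 40 GB per VM, 4 GB of swap and 2 GB of temp data each) are assumptions, not measurements from any real environment.

```python
def replicated_gb(vms, disk_gb, swap_gb, temp_gb):
    """Return (total_gb, replicated_gb) when transient data stays off replicated volumes.

    disk_gb is the full per-VM footprint, including swap and temp data.
    """
    total = vms * disk_gb
    transient = vms * (swap_gb + temp_gb)
    return total, total - transient


# Hypothetical environment: 100 VMs, 40 GB each, of which 4 GB is swap and 2 GB is temp.
total, replicated = replicated_gb(vms=100, disk_gb=40, swap_gb=4, temp_gb=2)
savings_pct = 100 * (total - replicated) / total
print(f"{total} GB total, {replicated} GB replicated, {savings_pct:.0f}% saved")
```

Under these assumed numbers, 600 GB of the 4,000 GB never has to cross the WAN or occupy space at the DR site.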
Each application's service level requirements should drive the replication requirements of any storage platform. Platforms that offer synchronous and asynchronous replication features, along with block level incremental replication and granular snapshot features, are more likely to meet all of your storage replication requirements. The bottom line should always be the storage solution's ability to leverage replication in order to meet your recovery time objectives (RTOs) and recovery point objectives (RPOs).
Deduplication or single instance storage support
A high number of VMs with identical OSes, applications or services will often reside on the same storage array. Storage nodes with built-in data deduplication or single instance storage support will offer significant storage savings by eliminating data redundancy on storage blocks. Note: To realize these storage savings, the storage array should also support thin provisioning. Otherwise a virtual hard disk file (for example, a .vmdk file on a VMFS volume) would consume all of its allocated space at the time it is provisioned. Thin provisioning would allow the virtual hard disk to consume its assigned storage as the virtual hard disk grows in size. With ESX server, thin provisioning is supported by thin formatting VMDKs.
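The idea behind block-level deduplication can be sketched in a few lines of Python. This is a simplified illustration, not how any particular array implements it: the 4 KB block size, the use of SHA-256 fingerprints and the toy "disk images" are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedup_stats(disks):
    """Return (logical_blocks, unique_blocks) across a list of disk images (bytes)."""
    seen = set()
    logical = 0
    for image in disks:
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            # Identical blocks produce identical fingerprints and are stored once.
            seen.add(hashlib.sha256(block).digest())
            logical += 1
    return logical, len(seen)


# Three toy virtual disks: the same 8-block "OS image" plus one unique data block each.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
disks = [os_image + bytes([100 + n]) * BLOCK_SIZE for n in range(3)]
logical, unique = dedup_stats(disks)
print(f"{logical} logical blocks stored as {unique} unique blocks")
```

In this toy case the shared OS blocks are stored once, so 27 logical blocks shrink to 11 unique ones. Real savings depend on how similar your VMs actually are.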
One of the key benefits of deduplicated storage is that the amount of data to be replicated to the DR site will be significantly reduced, by as much as 60%. You could optimize WAN throughput with a WAN accelerator device, but this won't reduce storage costs. Deduplicated storage will not only reduce the WAN bandwidth needed to replicate storage but will also reduce the total amount of storage needed for a given virtual infrastructure. By reducing the amount of storage you need to replicate, you'll also be able to replicate storage more frequently and thus reduce your RPO.
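To see why less data means more frequent replication, consider a simple transfer-time estimate. The sketch below is a rough model under stated assumptions: the function name, the 50 GB of changed data and the 100 Mbps of usable WAN throughput are hypothetical, and the 60% figure is the best-case reduction quoted above. It ignores protocol overhead and latency.

```python
def replication_seconds(changed_gb, dedup_reduction, wan_mbps):
    """Seconds needed to ship one replication cycle over the WAN.

    changed_gb       -- data changed since the last cycle, in gigabytes
    dedup_reduction  -- fraction of that data eliminated by deduplication (0.0 to 1.0)
    wan_mbps         -- usable WAN throughput in megabits per second
    """
    to_send_megabits = changed_gb * (1 - dedup_reduction) * 8 * 1000
    return to_send_megabits / wan_mbps


without_dedup = replication_seconds(50, 0.0, 100)
with_dedup = replication_seconds(50, 0.6, 100)
print(f"{without_dedup / 60:.0f} min without dedup, {with_dedup / 60:.0f} min with dedup")
```

With these assumed inputs, a cycle that takes roughly 67 minutes without deduplication finishes in roughly 27 minutes with it, so cycles can be scheduled closer together.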
Many storage arrays only provide volume-level recovery for virtual machines. While volume-level recovery is usually what you need for DR, you should look at storage platforms that offer granular file level recovery for files residing in virtual hard disks. Platforms that offer you the ability to recover previous volumes or previous versions of single files from snapshots allow you to leverage the storage solution for both DR and day-to-day file recovery operations. Such solutions would save on the required storage space for data protection operations, as most file-level backups would be unnecessary since file recovery could come from previous volume-level snapshots.
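Conceptually, file-level recovery from snapshots amounts to finding the newest snapshot that still holds a good copy of the file. The Python sketch below models that lookup only. The snapshot structure (a list of timestamp and file-map pairs), the function name and the sample paths are all hypothetical, and real platforms expose this through their own tools.

```python
def latest_version(snapshots, path, before):
    """Return (timestamp, content) of the newest snapshot copy of `path`
    taken earlier than `before`, or None if no snapshot holds the file."""
    candidates = [
        (ts, files[path])
        for ts, files in snapshots
        if ts < before and path in files
    ]
    return max(candidates, key=lambda item: item[0], default=None)


# Hypothetical volume-level snapshots: (hour taken, {file path: content}).
snapshots = [
    (1, {"/data/report.doc": "draft 1"}),
    (2, {"/data/report.doc": "draft 2"}),
    (3, {"/data/report.doc": "corrupted"}),
]

# Recover the last copy taken before the corruption appeared in snapshot 3.
restored = latest_version(snapshots, "/data/report.doc", before=3)
```

Here `restored` is the copy from snapshot 2, which is the kind of day-to-day file recovery that would otherwise require a separate file-level backup.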
About the author: Chris Wolf is a senior analyst for Burton Group and author of several IT books. Check out a chapter on backup from Wolf's book, Virtualization: From the Desktop to the Enterprise.
20 Feb 2008