By Eric Siebert
Designing and implementing a disaster recovery (DR) infrastructure is often complicated, expensive and challenging. Virtualization technologies -- for both storage and servers -- can help reduce the expense with unique approaches that differ from traditional DR methods and can provide increased flexibility and responsiveness. Server virtualization encapsulates an entire server into a single file, which makes transporting it to other locations much easier. Storage virtualization presents multiple storage devices as a single storage resource, which helps hide some of the back-end complexities of the storage devices and network. Either of these virtualization technologies will ease the implementation of a DR plan; used together, they can provide a very effective DR strategy.
For most companies, the type of DR environment they devise is typically determined by balancing the amount of money they have to spend on one-time and ongoing costs, with the required recovery time to ensure that any downtime is limited and doesn't significantly impact their business. Traditional DR scenarios usually called for maintaining a lot of physical servers at an offsite location and then using tape backups/restores or storage replication to transfer data between sites. With virtualization, there are more options for DR and the hardware requirements for the recovery site are greatly reduced. Even if your production data center hasn't been virtualized, you can still leverage virtualization at your remote location and convert your physical servers into virtual machines (VMs).
A variety of virtualization approaches
We'll look at some of the methods involving both server and storage virtualization that can be used as a foundation for a DR strategy. Our focus is on products and processes related to VMware Inc.'s vSphere, but many are very similar for other hypervisors like Citrix Systems Inc.'s XenServer and Microsoft Corp.'s Hyper-V. Depending on the virtualization methods used, recovery times can vary from seconds to hours to days and, accordingly, the cost and infrastructure to implement these methods will also vary. The approach you choose may be determined by whether you want a cold, warm or hot recovery site. Cold sites have no network connectivity with the main site, and limited or no hardware. Warm sites have network connectivity, and server and storage hardware, but typically lack real-time synchronization. Hot sites are almost mirror copies of critical production systems and use real-time synchronization for minimal disruption of services. The cost and recovery times differ greatly from cold to hot, but all of these types of sites can benefit greatly from using virtualization technology in their design and implementation.
The importance of proper quiescing
Whatever data protection method you use -- storage replication, backups or virtual machine (VM) replication -- it's very important that you properly quiesce your VMs to ensure data integrity. Quiescing is the process of pausing the operating system and applications, and forcing all pending data in memory to be written to disk. There are two ways to quiesce. The first way is done at the operating system level and it tells the OS to write all pending data in memory to disk; however, because the OS isn't aware of what applications are doing, this could cause corrupt or incomplete application data. The second method is done at the application level where applications like Microsoft Exchange or SQL Server are notified so that they can complete any pending transactions before writing the pending data to disk. The latter is called "application consistent" quiescing, and it ensures that all application data is properly backed up without any loss of data.
Without any quiescing, a VM is considered to be in a "crash consistent" state, meaning the backup is equivalent to one taken of a VM that abruptly lost power, with any data still held in memory unaccounted for. Microsoft Windows VMs have a Volume Shadow Copy Service (VSS) driver built into the OS to quiesce the operating system, but it often won't provide application quiescing. To achieve application-consistent backups, you may have to install a special driver inside the guest OS. When choosing any backup or replication application, make sure it includes the proper quiescing for the critical application data you're trying to protect.
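The difference between the three consistency levels can be illustrated with a toy model. The sketch below is only an illustration of the idea, not how VSS or any hypervisor works internally; the `ToyVM` class, its fields and the transaction names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Toy model: a backup only ever sees what is in `disk`."""
    disk: list = field(default_factory=list)         # data already on disk
    os_cache: list = field(default_factory=list)     # writes the OS has not flushed yet
    app_pending: list = field(default_factory=list)  # transactions the app has not completed

    def os_quiesce(self):
        # OS-level quiesce: flush the OS cache; the application is not notified.
        self.disk.extend(self.os_cache)
        self.os_cache.clear()

    def app_quiesce(self):
        # Application-level quiesce: the app completes its pending
        # transactions first, then the OS flushes everything to disk.
        self.os_cache.extend(self.app_pending)
        self.app_pending.clear()
        self.os_quiesce()

    def backup(self):
        return list(self.disk)


def make_vm():
    return ToyVM(disk=["tx1"], os_cache=["tx2"], app_pending=["tx3"])


# Crash consistent: no quiescing at all.
crash_consistent = make_vm().backup()

# OS-level quiescing only.
vm = make_vm()
vm.os_quiesce()
os_consistent = vm.backup()

# Application-consistent quiescing.
vm = make_vm()
vm.app_quiesce()
app_consistent = vm.backup()
```

In this model only the application-consistent backup contains all three transactions; the other two silently miss data that was still in memory when the backup ran.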
Virtual machine replication
Virtual machine replication works at the server virtualization layer and relies on replication software that can copy all changes made to a virtual machine disk file (VMDK) to another host. It requires a warm or hot DR site with dedicated network connectivity linking the production and recovery sites. A snapshot is taken of the VM at the virtualization layer, which deflects writes to the virtual disk to a separate delta file. The virtual disk is then mounted by the replication software and any updates since the last replication cycle are copied to another identical virtual disk on a virtual host at the disaster recovery site. VMware vSphere's new vStorage APIs enhance this process because of the new Changed Block Tracking (CBT) feature. CBT provides much quicker incremental backups and replications because the VMkernel tracks which disk blocks have changed since the last replication. This allows shorter intervals between replication operations, resulting in nearly continuous data protection (CDP). A big advantage of this method is that any type of storage can be used on both the source and target virtual hosts. When it's necessary to cut over to the DR site due to an outage at the main site, you can power on the replicated VM at the DR site and begin using it; changed blocks are then tracked on the remote site VM so they can be replicated to the main site for failback. Applications that support this method include:
- Double-Take Software Inc.'s Double-Take Availability can replicate both physical servers and VMs to a virtual host at a DR site. Replication can occur either inside the guest OS or at the virtual host level.
- PHD Virtual Technologies' esXpress combines disk-to-disk backup with replication; it can do a simple full-VM copy to another site or incremental block-level updates.
- Veeam Software's Veeam Backup & Replication combines disk-to-disk (D2D) backup and replication in one product. It has built-in data deduplication and uses CBT to achieve near CDP; changed blocks are injected into the target VMDK during each replication cycle.
- VizionCore Inc.'s vReplicator is a dedicated replication product for virtual machines (vRanger Pro is their backup product). It supports CBT and Active Block Mapping (ABM) to detect white space in a VM so it can be ignored.
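To make the changed-block idea concrete, here is a minimal sketch of incremental replication driven by a list of changed blocks. It models the concept only: in vSphere the tracking is done by the VMkernel and exposed through the vStorage APIs, whereas the `TrackedDisk` class, the `replicate` function and the tiny block size below are assumptions made for illustration.

```python
BLOCK_SIZE = 4  # bytes; unrealistically small, chosen for readability


class TrackedDisk:
    """A virtual disk that remembers which blocks were written."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.changed = set()  # indexes written since the last replication cycle

    def write(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.changed.add(index)


def replicate(source, target_blocks):
    """Copy only the changed blocks to the target; return how many were sent."""
    to_send = sorted(source.changed)
    for index in to_send:
        target_blocks[index] = source.blocks[index]
    source.changed.clear()  # start tracking for the next cycle
    return len(to_send)


# One replication cycle: the DR copy starts out identical to the source.
source = TrackedDisk(num_blocks=8)
dr_copy = list(source.blocks)

source.write(2, b"AAAA")
source.write(5, b"BBBB")

sent_first_cycle = replicate(source, dr_copy)   # only blocks 2 and 5 are copied
sent_second_cycle = replicate(source, dr_copy)  # nothing changed, nothing copied
```

Because each cycle moves only the blocks that changed, cycles can be run more often, which is what allows replication intervals to shrink toward near-CDP.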
Storage replication
Storage replication works at the storage subsystem layer and is mostly transparent to virtual hosts and VMs. This approach relies on storage hardware or software that can do synchronous or asynchronous replication from one storage device to another. Because it happens at the storage layer, the virtualization layer is unaware of the process and all virtual machine data is copied to the disaster recovery site where it will be ready to be used by the virtual hosts if needed. Storage replication requires significant network bandwidth between the main site and recovery site because of the large amounts of data that must be transferred quickly. Many vendors employ technologies such as data deduplication and compression to reduce the amount of data sent over the network. Storage replication is commonly used to achieve near-CDP or CDP to allow for very fast recoveries. VMware's vCenter Site Recovery Manager (SRM) product was designed to work with this method; it relies on storage replication to copy data between the two sites and SRM handles the cutover to the DR site by bringing up the virtual machines at the DR site in case of a disruption at the main site (see "About VMware vCenter Site Recovery Manager," below). Most storage arrays either have replication built-in or available as a software add-on; a sampling of products that support this method includes the following:
- EMC Corp. has a wide variety of products that support replication, including its entry-level Celerra Replicator and MirrorView products, and higher-end RecoverPoint (journal-based) and Symmetrix Remote Data Facility (SRDF) products.
- FalconStor Software offers Network Storage Server (NSS), a storage virtualization product that supports replication, as well as Continuous Data Protector, a high-end CDP product.
- Hewlett-Packard (HP) Co. builds replication into its StorageWorks EVA and XP disk arrays, and offers add-on products such as Business Copy, Cluster Extension and Continuous Access software for both the EVA and XP product lines.
- Hitachi Data Systems has both a journal-based replication product called Universal Replicator and a high-end CDP product, TrueCopy Remote Replication.
- NetApp Inc. provides an affordable replication option with MetroCluster; SnapMirror is its high-end flagship replication product.
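The bandwidth requirement for storage replication can be roughed out with simple arithmetic. The sketch below is a back-of-the-envelope estimate only: the function name, the 50 GB change rate, the one-hour window and the 2:1 data reduction ratio are all hypothetical, and the calculation ignores protocol overhead and latency.

```python
def required_mbps(changed_gb, window_hours, reduction_ratio=1.0):
    """Estimate the sustained link speed (megabits per second) needed to
    replicate `changed_gb` of changed data within `window_hours`.

    `reduction_ratio` models dedupe/compression (2.0 means a 2:1 reduction).
    """
    if window_hours <= 0 or reduction_ratio <= 0:
        raise ValueError("window_hours and reduction_ratio must be positive")
    megabits = changed_gb * 8 * 1000   # 1 GB = 8,000 megabits (decimal units)
    seconds = window_hours * 3600
    return megabits / reduction_ratio / seconds


# Hypothetical example: 50 GB of changed data replicated within one hour.
raw = required_mbps(50, 1)                           # about 111 Mbps
reduced = required_mbps(50, 1, reduction_ratio=2.0)  # about 56 Mbps
```

Halving the data sent halves the required link speed, which is why dedupe and compression matter so much when replicating over a WAN.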
About VMware vCenter Site Recovery Manager
VMware Inc. developed its vCenter Site Recovery Manager (SRM) product to help automate and simplify the recovery process to a disaster recovery (DR) site. SRM by itself isn't a complete solution for disaster recovery and relies on a supported third-party array replication application to handle the replication of virtual machine (VM) data to a DR site. To certify that storage arrays are supported and integrated with vCenter SRM, VMware works with many storage vendors, including 3PAR, Compellent Technologies, Dell Inc., EMC Corp., FalconStor Software, Hewlett-Packard Co., Hitachi Data Systems, IBM Corp., NetApp Inc., Sun Microsystems (now owned by Oracle Corp.) and Xiotech Corp. vCenter SRM allows you to create recovery plans using vCenter Server, extend recovery plans with custom scripts, perform nondisruptive testing, automate execution of recovery plans with a single command and reconfigure VM networking at the DR site. vCenter SRM provides a nice front-end application that both integrates storage replication with virtualization and automates DR failover in VMware environments.
Disk and tape backups
While tape backups are used less frequently today for disaster recovery, they're still useful for storing data offsite in a secure location. The most effective way to back up a VM is to back up the single large virtual disk file (image level) at the virtualization layer, rather than the traditional method of using an agent inside the guest operating system (file level). Image-level backups are very useful for disaster recovery as they provide a bare-metal restore capability for virtual machines. Instead of having to restore physical servers one by one, you can restore them all to a single virtual host. While using tape for DR is slower than other alternatives, it's still a low-cost way to restore multiple virtual machines. A disk-to-disk recovery is much faster than tape, and is very similar to VM replication as a virtual machine's virtual disk is mounted and then copied to another disk storage device. But, unlike replication, this approach is usually run on a scheduled basis and can be done incrementally or as a full backup. The disk target that's used can then be backed up to tape or copied to a DR site and used to quickly restore virtual machines as needed. Some apps that support this method include:
- EMC's Avamar Virtual Edition for VMware supports backups of both physical servers and virtual machines by operating at the guest OS or VM layer, and can also globally dedupe backup data. It can also do physical-to-virtual (P2V) and virtual-to-physical (V2P) recovery for maximum flexibility.
- PHD Virtual Technologies' esXpress also does both backup and replication, providing data protection and business continuity.
- Symantec Corp.'s NetBackup has very good virtualization integration and supports both disk-to-disk and disk-to-tape backups. It supports both physical and virtual servers, and can perform both image- and file-level virtual machine backups.
- Veeam Backup & Replication provides disk-to-disk backup and takes advantage of many of the new features in vSphere.
- VizionCore vRanger Pro is VizionCore's dedicated disk-to-disk backup product for virtual machines, and it supports many of the same features as their vReplicator app.
- VMware Data Recovery is included with some vSphere editions. While not as feature-rich as other products, it does provide dedupe as well as good integration with vSphere.
Simple and built-in methods
There are some very low-cost and simple alternatives for virtual DR, as well as some built-in tools in vSphere. At the most basic level, you can use scripts to take a snapshot of a VM's disk to deflect writes to it and then copy the data using FTP/SCP to another disk target such as a CIFS or NFS share. The disk target could be as basic as a removable hard disk that can be transported off-site or a device at a DR site that you copy to over a network connection. Once the virtual disk files are at the DR site, you load them on a virtual host and you'll be up and running. VMware vCenter Converter is another tool that can be used to copy a physical server or a virtual machine to either a disk target or a virtual host; it's not very sophisticated, but it can be scripted and scheduled to make copies of servers. vSphere has some built-in high-availability (HA) and fault-tolerance technology, as well as VMware VMotion. Those features currently all require a local-area network (LAN) and aren't suitable for long-distance wide-area network (WAN) use. VMware has announced its intention to enhance the features to function over slower WAN networks.
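The scripted snapshot-and-copy approach might look something like the following sketch. It assumes the DR disk target (for example, an NFS or CIFS share) is already mounted as a local path, and it deliberately leaves the snapshot commands to the caller: `create_snapshot` and `remove_snapshot` are hypothetical callbacks that would wrap whatever your hypervisor's tooling provides. The file and directory names in the demo are also made up.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def vm_snapshot(create_snapshot, remove_snapshot):
    """Hold a snapshot open for the duration of the copy, then always remove it."""
    create_snapshot()
    try:
        yield
    finally:
        # Runs even if the copy fails, so snapshots don't pile up.
        remove_snapshot()


def copy_vm_disks(disk_files, target_dir, create_snapshot, remove_snapshot):
    """Copy the given virtual disk files to target_dir while a snapshot is open."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    with vm_snapshot(create_snapshot, remove_snapshot):
        for disk in disk_files:
            disk = Path(disk)
            shutil.copy2(disk, target / disk.name)
            copied.append(disk.name)
    return copied


# Demo with a stand-in disk file and callbacks that only record what happened.
events = []
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "vm" / "web01.vmdk"
    source.parent.mkdir()
    source.write_bytes(b"virtual disk contents")

    copied = copy_vm_disks(
        [source],
        Path(tmp) / "dr_share",
        create_snapshot=lambda: events.append("snapshot created"),
        remove_snapshot=lambda: events.append("snapshot removed"),
    )
    restored = (Path(tmp) / "dr_share" / "web01.vmdk").read_bytes()
```

Wrapping the copy in a context manager is a design choice: a snapshot left behind after a failed copy keeps growing its delta file, so the cleanup step should run whether or not the copy succeeds.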
DR apps for Microsoft Hyper-V and Citrix Systems XenServer
Many of the same disaster recovery (DR) principles for VMware implementation also apply to Microsoft Corp.'s Hyper-V and Citrix Systems Inc.'s XenServer. There are also some applications designed specifically for Hyper-V and XenServer that can be used to implement a DR solution for those environments.
Watch out for virtualization gotchas
Using virtualization technology as part of your DR plan has some great benefits, but there are also related challenges and costs. It's often assumed that server virtualization will save lots of money on server hardware. Lower operational costs will save money in the long run, but you'll have some up-front costs beyond the new physical servers themselves. For example, using two or three physical servers with virtualization at your DR site in place of eight to 10 physical servers at your main site will obviously reduce hardware costs. But you'll also have to consider the cost of virtualization software, management and data protection applications.
If you're already using virtualization at your main site, using it at your disaster recovery site is an easy decision. If not, expect a learning curve in understanding how to properly implement, configure and manage it. Also, virtual machines usually require management and backup applications designed specifically for virtualization that may not work with physical servers. So, you might need separate tools for virtual and physical environments, which increases costs and management complexity. Some apps, like Microsoft System Center, can manage both environments via a single interface; similarly, Symantec's NetBackup can back up both environments.
There are some clear advantages to using server virtualization at a DR site. Disaster recovery rack space is often expensive and with fewer racks your ongoing costs will be lower. Fewer physical servers also mean fewer network port requirements and less gear to maintain. You can also replicate VMs running on hosts with shared storage at your main site to hosts with direct-attached storage (DAS) at your DR site, which can result in more savings. Server virtualization allows physical hardware independence, so you can use any type of server hardware at your DR site without having to worry about operating system and application compatibility.
Virtual DR options
There are many options that you can choose from when using virtualization. The route you decide upon will likely be dictated by the amount of bandwidth available between your main and DR sites. The benefits of using virtualization as part of your disaster recovery setup include:
- Fewer physical servers needed at a DR site reduces one-time and ongoing costs, and results in less idle hardware
- Lower-cost VM-level replication is storage independent and doesn't require expensive storage arrays
- Hardware independence allows for more hardware options without compatibility issues
- Encapsulation turns a VM into a single portable file for easier transport and deployment
- Snapshots provide an effective method for backup of virtual machines
- Automated failover and easier testing
- Easier server deployment; scripting can be used to help automate many configuration and operational tasks
Virtualization can provide some clear advantages for disaster recovery; help save money, time and effort; and make the often daunting task of designing and implementing a DR plan easier.
BIO: Eric Siebert is an IT industry veteran with more than 25 years of experience who now focuses on server administration and virtualization. He's the author of VMware VI3 Implementation and Administration (Prentice Hall, 2009).