This article can also be found in the Premium Editorial Download "Storage magazine: Exploring systems that detect and repair hard disk problems automatically."
One of the big benefits of employing data deduplication for secondary disk targets is its ability to reduce the size of data sets, enabling replication with lower bandwidth requirements.
By Lauren Whitehouse
If you're a storage pro, you should be familiar with the phrase "time to protection." This is the time required to complete all of the activities that must occur between the initiation of a backup and the arrival of the backup copies at an offsite location for disaster recovery (DR) purposes. For tape-based DR schemes, this includes the time it takes to execute the backup, prepare offsite tape copies and transport them to a remote location.
For disk-based DR strategies, this is the time it takes to back up to disk and move the data offsite via replication, which varies with the amount of data to transfer and the available bandwidth. One of the big benefits of employing data deduplication for secondary disk targets is its ability to reduce the size of data sets and enable replication with lower bandwidth requirements. This makes automated electronic vaulting of data less time-consuming and less costly.
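To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. The function name, the decimal unit conversion and the sample figures (450 GB, a 100 Mbps link, a 10:1 dedupe ratio) are illustrative assumptions, not numbers from any vendor, and the model ignores protocol overhead and link contention.

```python
def replication_hours(data_gb: float, bandwidth_mbps: float,
                      dedupe_ratio: float = 1.0) -> float:
    """Estimate hours needed to replicate a backup set over a WAN link.

    data_gb        -- size of the backup set before dedupe, in gigabytes
    bandwidth_mbps -- usable link bandwidth, in megabits per second
    dedupe_ratio   -- assumed reduction ratio (10.0 means 10:1); 1.0 = no dedupe
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    if dedupe_ratio < 1.0:
        raise ValueError("dedupe_ratio must be at least 1.0")

    transferred_gb = data_gb / dedupe_ratio   # only unique data crosses the wire
    megabits = transferred_gb * 8 * 1000      # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / bandwidth_mbps
    return seconds / 3600


if __name__ == "__main__":
    full = replication_hours(450, 100)
    reduced = replication_hours(450, 100, dedupe_ratio=10.0)
    print(f"without dedupe: {full:.1f} h, with 10:1 dedupe: {reduced:.1f} h")
```

Under these assumptions the same link moves the deduplicated set in a tenth of the time, which is the effect the paragraph above describes. Actual ratios depend heavily on the workload.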
So we know dedupe helps, but does it also hinder? The added process of identifying and eliminating redundant data could affect performance between initiation of a backup and initiation of replication. Deduplicating during the backup process (inline, before data is written to disk) could impact backup performance, while deduplicating after the backup process is complete (post-process) could delay replication.
The path to DR readiness
When it comes to recovery, there are two points in the data path to focus on: the point of local protection, which is when a copy of production data is onsite for operational recovery; and the point of offsite protection, which is when a copy is offsite for DR and which marks the end of the time to protection.
Systems with inline dedupe capabilities -- such as those from Data Domain Inc., Hewlett-Packard (HP) Co. (with its StorageWorks D2D Backup Systems), IBM Corp. (Diligent) and NEC Corp. -- promote the efficiency of enabling replication initiation as soon as data "hits" the disk, allowing for fast time to protection. Post-process approaches take a different point of view. Vendors including ExaGrid Systems Inc., FalconStor Software Inc., HP (with its Virtual Library System) and Sepaton Inc. maintain that it's more important for backup to disk to complete at wire speed, and that initiating dedupe outside the backup window guarantees better backup service-level agreements (SLAs). Replication initiation varies here -- some vendors begin within a few minutes, while others have a longer lag time.
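The trade-off between the two camps can be sketched with a simple timeline model. Everything here is a simplifying assumption for illustration: the names, the idea that inline dedupe stretches the backup by a fixed slowdown factor, the idea that replication overlaps the backup completely, and the sample numbers. Real products behave in more nuanced ways.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    local_protection_h: float    # hours until the backup copy is on local disk
    time_to_protection_h: float  # hours until the copy is complete offsite


def inline_timeline(backup_h: float, inline_slowdown: float,
                    replication_h: float) -> Timeline:
    """Inline: the backup may run slower, but replication starts as data lands.

    inline_slowdown is a hypothetical multiplier (1.25 = backup takes 25% longer).
    Replication is assumed to overlap the backup, so the offsite copy is done
    when the slower of the two activities finishes.
    """
    local = backup_h * inline_slowdown
    return Timeline(local, max(local, replication_h))


def post_process_timeline(backup_h: float, replication_start_h: float,
                          replication_h: float) -> Timeline:
    """Post-process: the backup runs at full speed, replication starts later.

    replication_start_h is the assumed lag, measured from the start of the
    backup, before deduplicated data begins to replicate.
    """
    offsite = max(backup_h, replication_start_h + replication_h)
    return Timeline(backup_h, offsite)


if __name__ == "__main__":
    print(inline_timeline(backup_h=4, inline_slowdown=1.25, replication_h=1.0))
    print(post_process_timeline(backup_h=4, replication_start_h=4.5,
                                replication_h=1.0))
```

With these sample inputs the post-process system reaches local protection sooner, while the inline system reaches offsite protection sooner. Changing the slowdown factor or the replication lag can reverse either result, which is why both numbers are worth measuring in a product evaluation.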
EMC Corp. and Quantum Corp. fall into both the inline and post-process camps because their products let an admin decide when dedupe occurs. By offering choice, policies can be set for specific backup workloads. And flexibility is good because there's a place for each approach. For example, if you have workloads where you expect a lot of redundant data, then inline dedupe may be preferred. If the workload has a lot of new data or if the backup window is small, then a post-process approach may be better.
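As a rough illustration of how such a per-workload policy might be expressed, the sketch below encodes the rule of thumb from the paragraph above. The function, its parameters and the threshold values are hypothetical and do not correspond to any vendor's actual policy settings.

```python
def choose_dedupe_mode(expected_redundancy: float, backup_h: float,
                       window_h: float,
                       redundancy_threshold: float = 0.5,
                       window_margin: float = 0.8) -> str:
    """Pick "inline" or "post-process" for one backup workload.

    expected_redundancy  -- estimated fraction of the data that is redundant (0 to 1)
    backup_h             -- expected backup duration at full speed, in hours
    window_h             -- length of the backup window, in hours
    redundancy_threshold -- assumed cutoff below which data counts as mostly new
    window_margin        -- assumed fraction of the window a backup may fill
                            before the window is considered tight
    """
    tight_window = backup_h > window_h * window_margin
    mostly_new_data = expected_redundancy < redundancy_threshold

    # A lot of new data or a small backup window favors post-process;
    # otherwise a highly redundant workload favors inline.
    if tight_window or mostly_new_data:
        return "post-process"
    return "inline"


if __name__ == "__main__":
    print(choose_dedupe_mode(expected_redundancy=0.9, backup_h=4, window_h=8))
    print(choose_dedupe_mode(expected_redundancy=0.2, backup_h=4, window_h=8))
```

The thresholds would need tuning against measured dedupe ratios and backup durations for each workload.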
Another dimension to consider is time to recovery. Once the data has been duplicated at a second site, how much time is required to restore data from the deduplicated DR copy? How quickly can data be read and reconstituted to an application-usable state? Some vendors keep a non-deduplicated full backup image just for this scenario. This approach will aid in providing more rapid recovery, but will use additional storage capacity.
Can the process be accelerated?
For Symantec Corp. Veritas NetBackup 6.5 customers, the Symantec OpenStorage (OST) option can help. Veritas NetBackup OST, when used in conjunction with an OST-enabled dedupe storage system (Data Domain, FalconStor and Quantum are currently certified), eliminates many of the challenges associated with the creation and management of duplicate backup images, the transportation of backup copies to an alternate site and the centralized creation of tape-based copies for long-term retention. In this case, Veritas NetBackup maintains knowledge and control of backups written to the OST-interface disk storage units of vendors' devices. Its "optimized duplication" technology improves performance for creation of replicas stored at the secondary site. For example, Data Domain, the first vendor with a certified OST interface, has been able to demonstrate replication performance improvements of 75% or more in OST environments.
The business benefits of storage capacity optimization via data dedupe are well recognized. But dedupe can also enable significant efficiency when it comes to disaster recovery. When deciding on a dedupe investment, in addition to evaluating products based on local dedupe processing and operational recovery on the premises, it makes sense to investigate each product's ability to provide DR readiness offsite.
BIO: Lauren Whitehouse is an analyst focusing on backup and recovery software and replication solutions at Enterprise Strategy Group, Milford, Mass.
This was first published in June 2009