1. Disk-based backups
While data deduplication usually relies on disk for storage, it should not be confused with data mirroring or snapshot technologies. In most cases, data is written to disk by backup software and must be written back (restored) to a host in its native format before it can be accessed again. Although deduplication vendors are quick to point out that disk is faster than tape, backing up to disk is not the same as mirroring. In other words, for an application that can tolerate little or no downtime, deduplicated backup storage is not the best choice as the primary data protection target.
2. Replication is a must
Unless deduplicated data is also replicated offsite, it offers only limited disaster recovery capability. Some organizations deduplicate backup data onsite but still rely on tape for offsite storage and disaster recovery. In many cases, data is no longer deduplicated once it is copied to tape. This will eventually be addressed once all backup applications are dedupe-aware or dedupe-capable. In the meantime, relying on tape for offsite storage undoes the benefits of data reduction and disk-based backup, bringing recoverability back to the level of traditional tape backup.
One of the advantages of data deduplication is the ability to replicate a reduced data set to a remote location using less network bandwidth than conventional replication requires. Even so, the initial replication is still likely to take a significant amount of time or bandwidth, because data reduction gains are usually not immediate and typically improve over the course of several backups. To work around network bandwidth limitations, the first replication pass is sometimes performed with the replication target installed locally; the secondary deduplication appliance is then shipped offsite, where it resumes replication of deduplicated data.
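To illustrate why the first replication pass is the expensive one, here is a minimal Python sketch of chunk-level deduplicated replication. It assumes fixed-size 4 KiB chunks identified by SHA-256 hashes; commercial products may use variable-size chunking and very different indexing, and the names `ReplicationTarget` and `make_backup` are hypothetical, chosen only for this example. The target keeps one copy of each unique chunk, so only chunks it has never seen cross the link.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products vary


class ReplicationTarget:
    """Toy remote dedupe store that keeps one copy of each unique chunk."""

    def __init__(self) -> None:
        self.stored: set[str] = set()

    def replicate(self, data: bytes) -> int:
        """Send only chunks the target has not already stored; return bytes sent."""
        sent = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.stored:
                self.stored.add(digest)
                sent += len(chunk)
        return sent


def make_backup(num_chunks: int) -> bytearray:
    """Build deterministic test data in which every chunk is distinct."""
    data = bytearray()
    for n in range(num_chunks):
        data += n.to_bytes(4, "big") * (CHUNK_SIZE // 4)
    return data


target = ReplicationTarget()
night1 = make_backup(1000)  # 1,000 chunks, about 4 MB
print(target.replicate(bytes(night1)))  # 4096000: first pass, every chunk is new

night2 = bytearray(night1)
for n in range(0, 1000, 100):  # change one byte in 10 of the 1,000 chunks
    night2[n * CHUNK_SIZE] ^= 0xFF
print(target.replicate(bytes(night2)))  # 40960: only the changed chunks are sent
```

The first call moves the full data set while the second moves about 1% of it, which is the reason seeding the target locally before shipping it offsite can avoid pushing the largest transfer over the WAN.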
Any bandwidth limitation must also be taken into account when planning the large restore operations typically associated with disaster recovery. It is equally important to choose a suitable disaster recovery location for the remote replication target, so that a lack of bandwidth or space does not force the storage to be relocated before large restores can proceed.
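A back-of-the-envelope calculation shows how quickly link speed dominates a large restore. The sketch below assumes that restored data crosses the WAN in its native, non-deduplicated format (consistent with data having to be restored to a host in native format) and that the link sustains only a fraction of its nominal rate; the 80% `efficiency` default and the function name `restore_hours` are hypothetical planning values, not vendor figures.

```python
def restore_hours(logical_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move `logical_tb` terabytes across a WAN link.

    Assumes the restored data crosses the link in its native (non-deduplicated)
    format, and that the link sustains only `efficiency` of its nominal rate.
    Both are hypothetical planning assumptions, not vendor figures.
    """
    bits = logical_tb * 1e12 * 8                  # decimal terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits / effective_bps / 3600            # seconds -> hours


for mbps in (100, 1000):
    hours = restore_hours(10, mbps)
    print(f"10 TB over {mbps} Mbps: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, 10 TB takes roughly 28 hours over a 1 Gbps link but about 278 hours (more than 11 days) over 100 Mbps, which is the kind of gap that can make relocating the storage, rather than restoring over the network, look necessary.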
Deduplication products also differ in the way they process data, and these differences can have a significant impact on recovery capabilities. Some deduplication technologies are described as "out-of-band" or "offline," meaning data is first written to disk and then deduplicated before the final write. While this approach can speed up the backup itself, it delays replication, which can affect the Recovery Point Objective (RPO) for some data. If a catastrophic failure struck the primary storage target before the data was replicated offsite, that data would be lost, forcing a restore from the last known good copy stored offsite.
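The exposure window created by offline processing can be modeled with simple timestamps. This is a deliberately simplified, hypothetical model: it assumes replication of a backup starts only after that backup has been deduplicated, treats inline deduplication as adding no delay, and ignores any overlap between backup and replication. The times, durations, and function names are illustrative assumptions, not measurements from any product.

```python
from datetime import datetime, timedelta


def offsite_at(backup_end: datetime, dedupe_delay: timedelta,
               replication_time: timedelta) -> datetime:
    """When a backup is safely offsite, assuming replication can start only
    after that backup has been deduplicated (a simplified, hypothetical model)."""
    return backup_end + dedupe_delay + replication_time


def backup_lost(failure: datetime, backup_end: datetime, protected_at: datetime) -> bool:
    """True if the primary target fails after the backup finished but before
    it reached the remote target, forcing a restore from the previous offsite copy."""
    return backup_end <= failure < protected_at


backup_end = datetime(2008, 2, 1, 2, 0)   # hypothetical: backup finishes at 02:00
replication = timedelta(hours=3)          # hypothetical replication time
failure = datetime(2008, 2, 1, 6, 30)     # hypothetical primary target failure

inline = offsite_at(backup_end, timedelta(0), replication)               # 05:00
post_process = offsite_at(backup_end, timedelta(hours=4), replication)   # 09:00

print(backup_lost(failure, backup_end, inline))        # False
print(backup_lost(failure, backup_end, post_process))  # True
```

With the same failure at 06:30, the inline backup is already offsite, while the post-processed one is still only on the primary target, so the restore falls back to the previous night's offsite copy.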
Data deduplication vastly improves the efficiency of backup and archive storage. When external factors such as network bandwidth are taken into consideration and the chosen solution meets the organization's recovery requirements, deduplication definitely has its place in a disaster recovery strategy.
About the author: Pierre Dorion is a certified business continuity professional for Mainland Information Systems Inc.
This was first published in February 2008