WAN backups, archives and disaster recovery

When it comes to data protection, the disciplines of backup, archiving and disaster recovery have the same goal: preserving data in accordance with regulatory compliance requirements and guarding data against loss.

But local data protection has limited effectiveness: the same fire, flood or theft that damages your production data can just as easily destroy the local copies of that data. Historically, companies moved backups to a remote location by transporting tapes offsite, but the emergence of the Internet, along with storage-saving technologies, has made remote data protection more practical.

Today, a backup, archive or disaster recovery set (the group of files or data that constitutes a disaster recovery package) can be stored on the other side of the world as easily as next door. With backups and disaster recovery sets, the primary consideration for the wide area network (WAN) is efficiency: making the most of available bandwidth to move the maximum amount of data in the minimum amount of time. Faster transfers allow more frequent remote copies, which shortens the achievable recovery point objective (RPO), the maximum amount of recent data, measured in time, that an organization can afford to lose.

But as discussed in chapter 1, high-bandwidth WAN connections cost too much for many organizations. Several techniques have emerged to reduce the sheer amount of data needed to perform a remote backup or disaster recovery set.

A full backup is an essential starting point, but it can take a long time, which can greatly extend the remote RPO. If a complete backup takes 36 hours across the WAN, the smallest possible RPO is 36 hours, far longer than most organizations can tolerate.
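Because a remote copy can't be newer than the time it takes to send it, transfer time sets a floor on the RPO. The sketch below shows that arithmetic under stated assumptions: decimal units, a steady link rate and an optional efficiency factor for protocol overhead. The data sizes and link speeds are hypothetical illustrations, not figures from any specific product.

```python
# Lower bound on remote RPO: a copy can't be newer than the time needed to send it.
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes

def transfer_hours(data_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over a link; efficiency < 1 models protocol overhead."""
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    return data_bytes * 8 / (link_bits_per_sec * efficiency) / 3600

full_backup = transfer_hours(20 * TB, 1e9)   # hypothetical 20 TB over 1 Gbit/s
daily_delta = transfer_hours(10 * GB, 10e6)  # hypothetical 10 GB over 10 Mbit/s
print(f"full: {full_backup:.1f} h, daily delta: {daily_delta:.1f} h")  # full: 44.4 h, daily delta: 2.2 h
```

Even on a fast link, the full copy alone takes nearly two days, while a small daily change set fits comfortably in an overnight window on a much slower one.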

When backing up remotely, often to a remote virtual tape library (VTL), most organizations start with a full backup and then switch to incremental backups (which save only files changed since the previous backup) or differential backups (which save only files changed since the last full backup). A finer-grained technique, "delta differencing," saves just the changed blocks or bytes within those files. So while an initial 20 TB backup or disaster recovery set may take many hours, an average daily delta of 10 GB can be transferred in just a few hours, well within an acceptable daily backup window.
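A minimal sketch of block-level delta differencing, assuming fixed-size blocks compared by SHA-256 digest. The 4 KB block size and the function names are hypothetical; commercial products use their own block sizes and change-tracking methods. Only the blocks returned by `changed_blocks` would cross the WAN, and the remote site rebuilds the new version from its existing copy plus that delta.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; real products choose their own

def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for idx, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        # New blocks past the end of the old file, or blocks whose digest changed, go in the delta.
        if idx >= len(old_hashes) or hashlib.sha256(block).hexdigest() != old_hashes[idx]:
            delta[idx] = block
    return delta

def apply_delta(old: bytes, delta: dict[int, bytes], new_len: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version at the remote site from the old copy plus the delta."""
    blocks = []
    for idx, offset in enumerate(range(0, new_len, block_size)):
        blocks.append(delta.get(idx, old[offset:offset + block_size]))
    return b"".join(blocks)

old_version = bytes(10 * BLOCK_SIZE)       # 40 KB file of zeros
edited = bytearray(old_version)
edited[5000] = 1                           # a one-byte change inside block 1
new_version = bytes(edited)

delta = changed_blocks(old_version, new_version)
rebuilt = apply_delta(old_version, delta, len(new_version))
```

Here one changed byte costs one 4 KB block instead of the whole file. Fixed-size blocks are the simplest design; an insertion that shifts later data would mark every following block as changed, which is why tools such as rsync use rolling checksums instead.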

Another data reduction method is data compression, which searches for repetitive data patterns within a file and encodes them more compactly. The same algorithm reverses the process to rebuild the original file when it is read later. Compression typically cuts data volumes in half, but because not all files compress well, the actual compression ratio varies with file type.
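To illustrate why the ratio depends on file type, the sketch below uses Python's standard `zlib` (DEFLATE) compressor as a stand-in for whatever algorithm a backup product actually uses. Random bytes stand in for already-compressed media such as MP3 or JPEG files; the sample data is hypothetical.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by compressed size (higher means better compression)."""
    compressed = zlib.compress(data, level)
    # Lossless: decompressing must rebuild the original bytes exactly.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip failed")
    return len(data) / len(compressed)

repetitive = b"INFO backup job completed successfully\n" * 10_000  # log-like text
random_like = os.urandom(400_000)  # stands in for already-compressed media

print(f"log text:    {compression_ratio(repetitive):.1f}:1")
print(f"random data: {compression_ratio(random_like):.2f}:1")
```

The repetitive log text shrinks dramatically, while the random data comes out slightly larger than it went in, which is why a blended backup set lands somewhere in between.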

Use of the data reduction technique known as data deduplication continues to grow. Data deduplication saves only one unique copy of each file, block or byte sequence to remote storage and replaces duplicates with references to that copy. You can learn more about data deduplication in our special report.
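A toy sketch of block-level deduplication, assuming fixed 4 KB chunks identified by their SHA-256 digest. The `DedupStore` class is hypothetical and keeps everything in memory; it only shows the core idea that a chunk already held by the store costs nothing to send again.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # digest -> unique chunk
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file; return how many new bytes actually had to be written."""
        digests, new_bytes = [], 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:      # only unseen chunks are stored
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)             # duplicates become references
        self.files[name] = digests
        return new_bytes

    def get(self, name: str) -> bytes:
        """Rebuild a file from its chunk references."""
        return b"".join(self.chunks[d] for d in self.files[name])

store = DedupStore()
report = b"".join(bytes([i]) * 4096 for i in range(8))  # 8 distinct 4 KB chunks
first = store.put("q1-report.doc", report)
second = store.put("q1-report-copy.doc", report)        # same content, new name
```

The first copy writes 32 KB; the second writes nothing new because every chunk is already present.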

The traditional concepts of full backups or disaster recovery sets are changing. Storage administrators realize they don't need to back up every single PowerPoint presentation or include each MP3 file in a disaster recovery set. More businesses are focusing on protecting mission-critical applications, while ignoring secondary or nonessential file types.
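A selective policy like this often comes down to a simple exclusion rule. The sketch below assumes a hypothetical list of nonessential extensions chosen by an administrator; real backup software expresses such rules in its own configuration format.

```python
from pathlib import PurePath

# Hypothetical policy: file types an administrator has decided are nonessential.
EXCLUDED_EXTENSIONS = {".mp3", ".ppt", ".pptx"}

def select_for_backup(paths: list[str]) -> list[str]:
    """Keep only paths whose extension is not on the exclusion list (case-insensitive)."""
    return [p for p in paths if PurePath(p).suffix.lower() not in EXCLUDED_EXTENSIONS]

candidates = [
    "/data/erp/ledger.db",
    "/home/ann/music/song.MP3",
    "/home/ann/slides/kickoff.pptx",
    "/data/crm/customers.csv",
]
to_back_up = select_for_backup(candidates)
```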

Recoverability from the WAN

Because backups and disaster recovery sets are useless unless they can be recovered from a remote location, storage administrators must also consider recovery time objectives (RTOs), the maximum acceptable time to restore data and operations. RTOs can differ from RPOs: an organization might need an extremely short RPO to minimize potential data loss yet tolerate 12 to 24 hours for recovery. What's critical is that remote data can be recovered within the allotted RTO. In some cases, a business may temporarily obtain additional bandwidth from a service provider to meet a tight RTO. Recovery drills can be used to train personnel and streamline the recovery operation.
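Checking a restore against the RTO is the same bandwidth arithmetic run in reverse. This sketch assumes a hypothetical 2 TB restore, decimal units and a link running at full line rate; it reports both how long the restore takes and the minimum link speed that would meet a given RTO.

```python
TB = 10**12  # decimal terabyte, in bytes

def restore_hours(data_bytes: float, link_bits_per_sec: float) -> float:
    """Hours to pull data_bytes back across the WAN at full line rate."""
    return data_bytes * 8 / link_bits_per_sec / 3600

def bandwidth_for_rto(data_bytes: float, rto_hours: float) -> float:
    """Minimum link speed (bits/s) needed to restore data_bytes within rto_hours."""
    return data_bytes * 8 / (rto_hours * 3600)

dataset = 2 * TB  # hypothetical amount that must be restored
print(f"restore over 100 Mbit/s: {restore_hours(dataset, 100e6):.1f} h")
print(f"needed for a 24 h RTO:   {bandwidth_for_rto(dataset, 24) / 1e6:.0f} Mbit/s")
```

With these assumed figures the restore takes about 44 hours over 100 Mbit/s, so meeting a 24-hour RTO would require roughly 185 Mbit/s, the kind of gap that temporary extra bandwidth from a provider could close.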

Administrators should be concerned not just with the reliability of the WAN service itself but also with the WAN hardware, such as the actual wires, routers and servers. During the disaster recovery planning process, the team should make sure that the remote site is located outside the danger area. As Hurricane Katrina demonstrated, sending data to a facility across town is pointless if the entire town can be flooded. WAN service must be reliable, because any disruption can not only ruin a backup process but also prevent recovery. Companies with critical WAN recovery requirements may want to use a secondary or backup WAN provider.
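With a secondary provider in place, transfer logic can fall back automatically when the primary link fails. The sketch below assumes each link is represented by a hypothetical send function that raises `OSError` (the base class of `ConnectionError` and `TimeoutError`) on failure; the stub links only simulate an outage.

```python
from typing import Callable, Sequence

def send_with_failover(payload: bytes,
                       links: Sequence[tuple[str, Callable[[bytes], None]]]) -> str:
    """Try each WAN link in priority order; return the name of the link that succeeded."""
    errors = []
    for name, send in links:
        try:
            send(payload)
            return name
        except OSError as exc:            # covers ConnectionError, TimeoutError, etc.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all WAN links failed: " + "; ".join(errors))

def primary_link(data: bytes) -> None:   # hypothetical stand-in for the main carrier
    raise ConnectionError("primary carrier outage")

delivered: list[bytes] = []
def secondary_link(data: bytes) -> None: # hypothetical stand-in for the backup carrier
    delivered.append(data)

used = send_with_failover(b"backup chunk",
                          [("primary", primary_link), ("secondary", secondary_link)])
print(f"sent via {used}")
```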

Implications of archiving data

Unlike backups or disaster recovery sets, which are typically accessed only after a problem occurs, archival data can be accessed at any time (albeit infrequently). Patient records are one example: a doctor may access a patient's history and medical images only during an annual physical or follow-up visit. Remote archives add a measure of data protection by placing the data in another location.

With remote archives, WAN bandwidth is not a major concern because the individual files being saved or accessed are small relative to the total archive size. For example, a patient's X-ray image may only be a few megabytes that can be pulled across a low-bandwidth WAN link. But if the WAN goes down, the archive becomes inaccessible. One way to mitigate the impact of WAN disruptions is to use a local archive platform, then mirror to a remote archive for data protection.
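A minimal sketch of the local-plus-mirror approach, assuming both tiers are plain in-memory dictionaries and a hypothetical `wan_up` flag models link availability. Reads are served locally, and writes queue for the remote mirror until the WAN is available; the remote copy is used only if the local one is lost.

```python
from collections import deque

class MirroredArchive:
    """Local archive that serves reads locally and mirrors every write to a remote copy."""

    def __init__(self):
        self.local: dict[str, bytes] = {}
        self.remote: dict[str, bytes] = {}
        self.pending: deque[str] = deque()  # keys not yet mirrored
        self.wan_up = True                  # hypothetical link-status flag

    def store(self, key: str, data: bytes) -> None:
        self.local[key] = data
        self.pending.append(key)
        self.sync()

    def sync(self) -> None:
        """Push queued writes to the remote mirror while the WAN is available."""
        while self.wan_up and self.pending:
            key = self.pending.popleft()
            self.remote[key] = self.local[key]

    def retrieve(self, key: str) -> bytes:
        if key in self.local:
            return self.local[key]          # normal path: no WAN round trip
        if not self.wan_up:
            raise ConnectionError("local copy missing and WAN is down")
        return self.remote[key]             # recovery path after local loss

archive = MirroredArchive()
archive.wan_up = False                      # simulate a WAN outage
archive.store("patient-42/xray", b"...image bytes...")
local_read = archive.retrieve("patient-42/xray")        # still served locally
mirrored_during_outage = "patient-42/xray" in archive.remote
archive.wan_up = True
archive.sync()                              # catch up once the link returns
```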

Remote storage and security issues

Because companies are obligated to protect sensitive data, as well as customers' personal information, against theft or loss, any remote operation should factor in data security. Remote data is often beyond your direct physical control, and unlike tapes, which are typically vaulted, WAN connectivity carries the added risk of unauthorized electronic access (aka hacking). Consequently, remote storage should include authentication plans along with the use of encryption. Encryption is generally needed only for sensitive data or personal information, so it's usually not necessary to encrypt the entire backup or disaster recovery set. Encryption can be implemented in hardware or in software.
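The Python standard library has no general-purpose symmetric cipher, so this sketch covers only the authentication side: each backup chunk carries an HMAC-SHA256 tag computed with a key shared by both sites, and the receiver rejects any chunk whose tag doesn't match. The key-distribution step is an assumption; encrypting the sensitive portion would require a vetted cryptography library and is not shown.

```python
import hashlib
import hmac
import secrets

def tag_chunk(key: bytes, chunk: bytes) -> bytes:
    """Sender side: compute an HMAC-SHA256 tag to send alongside the chunk."""
    return hmac.new(key, chunk, hashlib.sha256).digest()

def verify_chunk(key: bytes, chunk: bytes, tag: bytes) -> bool:
    """Receiver side: accept the chunk only if its tag matches (constant-time compare)."""
    return hmac.compare_digest(tag_chunk(key, chunk), tag)

shared_key = secrets.token_bytes(32)  # assumption: distributed to both sites out of band
chunk = b"customer records batch 7"
tag = tag_chunk(shared_key, chunk)

ok = verify_chunk(shared_key, chunk, tag)                # untouched chunk
tampered = verify_chunk(shared_key, chunk + b"!", tag)   # altered in transit
```

An HMAC proves a chunk came from a key holder and wasn't altered, but it does not hide the contents; that is the job of encryption.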
