Catching up with deduplication



However, using deduplication at the source or target introduces performance and management issues. Backup software-based deduplication products impose a heavy initial processing toll on the host. In addition, users should carefully examine how swapping current backup software for a deduplication backup product, or running two backup software products concurrently, will affect the performance and stability of servers and applications.

Conversely, deduplicating data on a disk library may require users to deploy multiple disk libraries to handle the performance overhead created during peak backup periods. This creates more management overhead because each disk library creates its own unique deduplicated data store; administrators must also manage and direct backup jobs to multiple physical disk libraries as opposed to just one logical one. Which backup software, disk library or combination of the two to select, and under what circumstances, depends on how each product handles these potential bottlenecks.

Breaking the bottlenecks
Asigra Televaulting attempts to break the management bottleneck by taking an agentless approach that expedites deployments while minimizing user involvement. Users initially install the Asigra Televaulting gateway software on a Windows or Linux server. The Televaulting backup software accesses client files over the internal network using CIFS, NFS or SSH (SSH adds security but is slower) and reads the files. As it reads each file, the Asigra Televaulting server performs a hash on the file. If the file is determined to be unique, it is broken into chunks; the unique blocks are stored, while redundant blocks are indexed and discarded.
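The two-step check described above (hash the whole file first, then chunk only the files that are new) can be sketched in a few lines of Python. This is a minimal illustration, not Asigra's implementation: the SHA-256 hash, the fixed 4 KB block size and the `DedupStore` class are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical block size; the article does not give one


class DedupStore:
    """Toy in-memory deduplicating store (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # block digest -> block bytes, stored once
        self.files = {}     # whole-file digest -> ordered list of block digests
        self.catalog = {}   # file name -> whole-file digest

    def backup(self, name, data):
        """Back up one file and return how many new blocks were stored."""
        file_digest = hashlib.sha256(data).hexdigest()
        self.catalog[name] = file_digest
        if file_digest in self.files:
            return 0  # identical file already protected; nothing to store
        recipe, stored = [], 0
        for start in range(0, len(data), CHUNK_SIZE):
            block = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # unique block: keep the bytes
                stored += 1
            recipe.append(digest)  # redundant block: keep only the index entry
        self.files[file_digest] = recipe
        return stored

    def restore(self, name):
        """Rebuild a file from its ordered list of block digests."""
        recipe = self.files[self.catalog[name]]
        return b"".join(self.blocks[d] for d in recipe)
```

Backing up a second copy of a file stores nothing new, and a file that shares some blocks with an earlier one stores only the blocks that differ. Production products generally use more elaborate chunking than fixed-size blocks; the fixed size here only keeps the sketch short.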

All hash processing takes place on the Asigra Televaulting server, which maintains a database of all of the unique file blocks on the different servers it's assigned to protect. Once the initial backup and index are complete, subsequent server backups execute faster because they can use this common repository of unique blocks created from the first server's backup.
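To see why later backups benefit from the common repository, consider the rough sketch below. It assumes a plain dictionary as the shared block index, SHA-256 digests and fixed 4 KB blocks; `backup_server` is a hypothetical helper, not part of any Asigra interface.

```python
import hashlib


def backup_server(repository, server_files, block_size=4096):
    """Add one server's files to the shared repository.

    repository   -- dict of block digest -> block bytes, shared by all servers
    server_files -- dict of file name -> file contents (bytes)
    Returns (blocks_read, blocks_stored) for this server's backup.
    """
    blocks_read = blocks_stored = 0
    for data in server_files.values():
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            blocks_read += 1
            if digest not in repository:
                repository[digest] = block  # first time any server sent it
                blocks_stored += 1
    return blocks_read, blocks_stored


if __name__ == "__main__":
    repository = {}
    first = {"os.img": b"O" * 8192, "app.db": b"1" * 4096}
    second = {"os.img": b"O" * 8192, "app.db": b"2" * 4096}
    print(backup_server(repository, first))
    print(backup_server(repository, second))
```

In this toy run the second server shares its operating system image with the first, so only its application data adds a block to the repository. The hashing work for every block read still lands on the central server.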

This approach still doesn't completely eliminate the performance toll of deduplication. By running the deduplication on a central server, the Televaulting software transfers the performance overhead from the client servers to the Televaulting server. Multiple servers with unusually large daily data change rates (more than 10%), or large numbers of servers (100 or more) that need to run backups at the same time, could lengthen backup times and force the deployment of more Asigra Televaulting servers to manage the overhead.
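A back-of-the-envelope calculation shows how change rate and server count combine. The sketch below is only an estimate under simplifying assumptions: load is taken to scale linearly with changed data, and the per-server capacity, throughput and backup window figures are hypothetical inputs rather than Asigra specifications.

```python
import math


def changed_gb_per_night(servers, gb_per_server, change_percent):
    """Data the central server must process nightly, in GB."""
    return servers * gb_per_server * change_percent / 100


def dedup_servers_needed(servers, gb_per_server, change_percent,
                         throughput_gb_per_hour, window_hours):
    """Central servers required to finish inside the backup window.

    throughput_gb_per_hour is an assumed sustained rate for one
    deduplication server; measure it rather than trusting a guess.
    """
    hours = changed_gb_per_night(servers, gb_per_server, change_percent)
    hours /= throughput_gb_per_hour
    return max(1, math.ceil(hours / window_hours))


if __name__ == "__main__":
    # 100 servers, 200 GB each, 8-hour window, 250 GB/hour (all hypothetical)
    for percent in (5, 10, 15):
        print(percent, dedup_servers_needed(100, 200, percent, 250, 8))
```

With these made-up inputs, a 10% change rate exactly fills the window on one server, and anything higher pushes the estimate to a second server, which is the kind of threshold the paragraph above warns about.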

Click here for a chart showing Deduplicating backup software (PDF).

This was first published in June 2007
