Catching up with deduplication


EMC Avamar and Symantec Veritas NetBackup PureDisk take a slightly different approach to the performance issue. Both use agents that draw on the computing resources of each client server to perform the initial file hash. As part of this process, the agents communicate with the main backup server, which maintains a central database of unique file hashes. As an Avamar or PureDisk agent hashes each file, it checks with the central server to see whether that hash already exists. If the hash exists, the agent skips the file; if it doesn't, the agent breaks the file into smaller segments and looks for new, unique segments to store. From that point, the two products diverge in their implementations.
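The agent-side check described above can be pictured with a short Python sketch. It is an illustration only, not either vendor's implementation: the `CentralIndex` class, the `backup_file` function, the use of SHA-256 and the tiny fixed-size segments are all assumptions made for the example.

```python
import hashlib


class CentralIndex:
    """Hypothetical stand-in for the backup server's central hash database."""

    def __init__(self):
        self.file_hashes = set()   # hashes of whole files already protected
        self.segments = {}         # segment hash -> segment bytes

    def has_file(self, digest):
        return digest in self.file_hashes

    def has_segment(self, digest):
        return digest in self.segments

    def store_segment(self, digest, data):
        self.segments[digest] = data

    def register_file(self, digest):
        self.file_hashes.add(digest)


def sha(data):
    # SHA-256 is an assumption; the article does not name a hash function.
    return hashlib.sha256(data).hexdigest()


def backup_file(data, index, segment_size=4):
    """Run the agent-side check and return how many new segments were sent."""
    file_digest = sha(data)
    if index.has_file(file_digest):
        return 0  # whole file already known: the agent skips it

    sent = 0
    # Fixed-size segmentation is a simplification chosen for the sketch.
    for start in range(0, len(data), segment_size):
        segment = data[start:start + segment_size]
        seg_digest = sha(segment)
        if not index.has_segment(seg_digest):
            index.store_segment(seg_digest, segment)
            sent += 1
    index.register_file(file_digest)
    return sent


index = CentralIndex()
first = backup_file(b"AAAABBBBCCCC", index)    # new file, three new segments
repeat = backup_file(b"AAAABBBBCCCC", index)   # identical file, skipped
changed = backup_file(b"AAAABBBBDDDD", index)  # only the last segment is new
```

In this sketch the hashing happens on the client, so only segments the server has not seen cross the network, which is the point of pushing the work out to the agents.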

EMC Avamar allows server storage capacity to grow to approximately 1.5TB. Symantec Veritas NetBackup PureDisk servers can grow to manage nearly 4TB of PureDisk storage capacity, but EMC Avamar uses segment sizes that are about one-fourth the size of PureDisk's, which allows it to better identify redundant data in files, asserts Jed Yueh, EMC Avamar's VP of product management. For users who need more capacity and scale, EMC Avamar uses a redundant array of independent nodes (RAIN) clustering architecture. Organizations can add server nodes to the RAIN cluster to increase server capacity and performance by striping the data across multiple nodes.
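Why a smaller segment size can find more redundant data is easy to see in a toy example. The Python sketch below is not either product's algorithm; the function name `unique_bytes_stored`, the 64-byte sample data and the segment sizes of 16 and 4 bytes are hypothetical, chosen only to preserve the four-to-one size relationship mentioned above.

```python
import hashlib


def unique_bytes_stored(versions, segment_size):
    """Count bytes kept after deduplicating fixed-size segments across versions."""
    seen = set()
    stored = 0
    for data in versions:
        for start in range(0, len(data), segment_size):
            segment = data[start:start + segment_size]
            digest = hashlib.sha256(segment).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(segment)
    return stored


# Two versions of a 64-byte "file" that differ in a single byte.
original = bytes(range(64))
edited = bytearray(original)
edited[10] = 255
versions = [original, bytes(edited)]

large = unique_bytes_stored(versions, 16)  # a one-byte change re-stores 16 bytes
small = unique_bytes_stored(versions, 4)   # the same change re-stores 4 bytes
```

With the larger segments the second version adds 16 bytes to the store; with segments one-fourth that size it adds only 4. The trade-off, not modeled here, is that smaller segments mean more hashes to compute and track.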

In a PureDisk environment, a single server can manage 4TB of PureDisk storage and up to 100 million files, which equates, according to Symantec, to a little more than 80TB of source data. Additional servers can be added to expand PureDisk's storage capacity or to handle a larger number of files.
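The figures Symantec cites imply a rough reduction ratio and average file size, worked out in the short Python calculation below. The results are back-of-the-envelope values derived from the numbers in the text, not vendor-stated specifications; the calculation assumes decimal terabytes and rounds "a little more than 80TB" down to 80TB.

```python
TB = 10**12  # decimal terabyte, an assumption; the article does not specify

stored_capacity = 4 * TB    # PureDisk storage managed by one server
source_data = 80 * TB       # approximate source data that capacity represents
max_files = 100_000_000     # files one server can track

# Implied ratio of source data to deduplicated storage.
reduction_ratio = source_data / stored_capacity

# Implied average source file size in megabytes.
avg_file_size_mb = source_data / max_files / 10**6

print(f"Implied reduction ratio: {reduction_ratio:.0f}:1")
print(f"Implied average file size: {avg_file_size_mb:.1f} MB")
```

On those assumptions, one server's limits work out to roughly 20:1 data reduction and an average source file of a little under 1MB.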

PureDisk manages file metadata outside of the file system using a MetaBase Server and MetaBase Engines. As an environment grows, a storage manager uses PureDisk to add new instances of MetaBase Engines; because the MetaBase Server controls communication to all MetaBase Engines, expanding the deduplication environment is a relatively simple process. Separating the file metadata from the file system allows PureDisk to improve search- and maintenance-related activities on the underlying storage system, grow to hundreds of terabytes and billions of files, and retain a single logical instance of deduplicated data across the enterprise.

Click here for a chart showing deduplicating disk libraries (PDF).

Early adopters
Early adopters of EMC Avamar and Symantec Veritas NetBackup PureDisk report minimal issues with installing backup software agents or server performance hits, but there are some specific circumstances that they monitor more carefully: the initial round of backups and the age of the server on which agents are deployed.

This was first published in June 2007
