Catching up with deduplication


This article can also be found in the Premium Editorial Download "Storage magazine: Surprise winner: BlueArc earns top NAS quality award honors."

The difference between the two determines what size environments they best fit. You can't add more controllers to ExaGrid to allow it to deduplicate the large amounts of data that enterprise backups generate. Sepaton uses a grid architecture in the S2100-ES2, so additional controllers can be added for more processing and capacity as deduplication requirements grow.

Hidden issues
Regardless of the deduplication approach, there are some hidden issues. For postprocessing disk libraries, as the amount of data increases, it may take much longer to deduplicate the data once the backups are complete. If the deduplication takes longer than the time between the end of one backup window and the start of the next, not all of the data from the first backup will be deduplicated before the next backup begins, so users will need to ensure they can add more processing power to handle this load.
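The timing constraint above can be checked with simple arithmetic. The following is a minimal sketch, not a vendor sizing tool: the function names, the assumption of a constant deduplication rate, and every figure in the usage note are hypothetical and chosen only for illustration.

```python
def dedup_backlog_gb(backup_gb, dedup_rate_gb_per_hr, gap_hours):
    """Return the data (GB) still not deduplicated when the next backup starts.

    Assumes a constant postprocessing rate, which real systems may not have.
    """
    if dedup_rate_gb_per_hr <= 0 or gap_hours < 0:
        raise ValueError("rate must be positive and gap must be non-negative")
    processed_gb = dedup_rate_gb_per_hr * gap_hours
    return max(0.0, backup_gb - processed_gb)


def required_rate_gb_per_hr(backup_gb, gap_hours):
    """Return the minimum rate needed to finish deduplication within the gap."""
    if gap_hours <= 0:
        raise ValueError("gap must be positive")
    return backup_gb / gap_hours
```

With hypothetical figures of a 10,000 GB backup, a 500 GB/hr deduplication rate and a 14-hour gap between backup windows, `dedup_backlog_gb(10000, 500, 14)` leaves 3,000 GB unprocessed, and `required_rate_gb_per_hr(10000, 14)` suggests roughly 714 GB/hr would be needed to finish in time.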

Another potential problem may arise with inline or postprocessing disk libraries that aren't replicating the data to a remote disk library: the need to create tapes. The disk library needs sufficient time to first deduplicate the data and then undeduplicate a copy of the data to be spun off to tape. Both ExaGrid Systems' ExaGrid and Sepaton's S2100-ES2 avoid this undeduplication overhead because the last backup is only compressed, not deduplicated, so users can copy the job directly to tape.

Other postprocessing disk libraries like Spectra Logic Corp.'s nTier appliance allow users to run a local master or media server within their nTier appliance that alleviates some of the pain of this process. The nTier appliance eliminates the need to move data from host to media server to deduplication box to media server to tape, and allows the data to move from host to nTier appliance to tape. This design also eliminates the need to undeduplicate the data before storing it to tape.
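The two data paths described above can be compared by counting the transfers each one requires. This is only an illustrative sketch: the hop names are taken from the paragraph, while the variable and function names are hypothetical, and the count says nothing about the actual throughput of any hop.

```python
# Data paths as described in the text, listed hop by hop.
CONVENTIONAL_PATH = ["host", "media server", "deduplication box", "media server", "tape"]
NTIER_PATH = ["host", "nTier appliance", "tape"]


def transfer_count(path):
    """Return the number of data movements needed to traverse a path."""
    return max(0, len(path) - 1)


saved = transfer_count(CONVENTIONAL_PATH) - transfer_count(NTIER_PATH)
print(f"conventional: {transfer_count(CONVENTIONAL_PATH)} transfers")
print(f"nTier: {transfer_count(NTIER_PATH)} transfers ({saved} fewer)")
```

Under this simple count, the conventional path needs four transfers and the nTier path two.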

Deduplicating backup software products that must operate in conjunction with enterprise backup products like Symantec Veritas NetBackup or EMC NetWorker face a different problem: allowing the enterprise backup software product to recognize and catalog the data it has backed up. Neither Asigra Televaulting nor EMC Avamar has any formal integration in place with any enterprise backup software product yet. Symantec Veritas NetBackup PureDisk 6.1, however, includes a NetBackup export engine that allows an administrator to copy a backed-up data selection from a PureDisk content router to NetBackup. NetBackup then catalogs the data and copies it to tape or disk, and from the NetBackup administration console the storage administrator can treat those files as if they were native NetBackup files. Both EMC and Symantec anticipate tighter integration between their enterprise and deduplicating backup software products in the near future.

This was first published in June 2007
