Feature

Dedupe myths and methods


Data deduplication products can dramatically lower capacity requirements, but picking the best one for your needs can be tricky.

Exaggerated claims, rapidly changing technology and persistent myths make navigating the deduplication landscape treacherous. But the rewards of a successful dedupe installation are indisputable.

"We're seeing the growing popularity of secondary storage and archival systems with single-instance storage," says Lauren Whitehouse, analyst at Enterprise Strategy Group (ESG), Milford, MA. "A couple of deduplication products have even appeared for use with primary storage."

The technology is maturing rapidly. "We looked at deduplication two years ago and it wasn't ready," says John Wunder, director of IT at Milpitas, CA-based Magnum Semiconductor, which makes chips for media processing. Recently, Wunder pulled together a deduplication process by combining pieces from Diligent Technologies Corp. (deduplication engine), Symantec Corp.'s Veritas NetBackup and Quatrio (servers and storage).

Assembling the right pieces requires a clear understanding of the different dedupe technologies, thorough testing of products before they go into production, and attention to major product changes such as the introduction of hybrid deduplication (see "Dedupe alternatives," below) and the emergence of global deduplication.


Dedupe alternatives

Until recently, deduplication was performed either in-line or post-processing. Now vendors are blurring those boundaries.

  • FalconStor Software Corp. offers what it calls a hybrid model, in which post-process deduping of a backup job begins on a series of tapes without waiting for the entire backup to complete, thereby speeding the post-processing effort.


  • Quantum Corp. offers what it calls adaptive deduplication, which starts as in-line processing, with data deduped as it's written. When the incoming data volume outpaces processing, a buffer grows dynamically to absorb it, and the buffered data is then deduped in post-processing style.
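
The difference between the two styles can be sketched in a few lines of Python. This is only an illustration of the adaptive idea described above, not Quantum's (or any vendor's) actual implementation: blocks are deduped in-line while the engine keeps up, spilled to a buffer when ingest outpaces processing, and deduped post-process style when the buffer is drained.

```python
import hashlib
from collections import deque

seen = {}           # block hash -> stored block (the dedupe index)
overflow = deque()  # raw blocks buffered when ingest outpaces deduping

def dedupe_block(block: bytes) -> str:
    """Hash the block and store it only if it hasn't been seen before."""
    digest = hashlib.sha256(block).hexdigest()
    if digest not in seen:
        seen[digest] = block
    return digest

def ingest(block: bytes, keeping_up: bool) -> None:
    """In-line path while we keep up; otherwise accept the write and buffer it."""
    if keeping_up:
        dedupe_block(block)
    else:
        overflow.append(block)  # dedupe later, post-process style

def drain_overflow() -> None:
    """Post-process path: dedupe whatever piled up in the buffer."""
    while overflow:
        dedupe_block(overflow.popleft())

# A pure in-line engine would call dedupe_block() on every write and never buffer;
# a pure post-process engine would buffer everything and dedupe after the backup ends.
```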

"Global deduplication is the process of fanning in multiple sources of data and performing deduplication across those sources," says ESG's Whitehouse. Currently, each appliance maintains its own index of duplicate data. Global deduplication requires a way to share those indexes across appliances (see "Global deduplication," below).


Global deduplication
"Global deduplication is the process of fanning in multiple sources of data and performing deduplication across those sources," says Lauren Whitehouse, analyst at Enterprise Strategy Group (ESG), Milford, MA. Global dedupe generally results in higher ratios and allows you to scale input/output. The global dedupe process differs when you're deduping on the target side or the source side, notes Whitehouse.
  • Target side: Replicate indexes of multiple silos to a central, larger silo to produce a consolidated index that ensures only unique files/segments are transported.


  • Source side: Fan in indexes from remote offices/branch offices (ROBOs) and dedupe to create a central, consolidated index repository.
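
In either case, the fan-in step amounts to merging per-site indexes so that only chunks the central repository hasn't already seen need to be transported. A rough sketch, assuming simple hash indexes (the site names and data structures are hypothetical, not any vendor's design):

```python
def consolidate(site_indexes: dict[str, set[str]]):
    """Merge per-site dedupe indexes into one consolidated (global) index.

    Returns the global index plus, per site, the chunk hashes the central
    repository still lacks -- the only data that has to be transported.
    """
    global_index: set[str] = set()
    to_transport = {}
    for site, index in site_indexes.items():
        to_transport[site] = index - global_index  # new to the fan-in so far
        global_index |= index
    return global_index, to_transport

sites = {
    "robo-east": {"a1", "b2", "c3"},
    "robo-west": {"b2", "c3", "d4"},  # b2 and c3 are already known globally
}
global_index, to_transport = consolidate(sites)
print(sorted(global_index))       # ['a1', 'b2', 'c3', 'd4']
print(to_transport["robo-west"])  # {'d4'}
```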

This was first published in September 2008
