Catching up with deduplication

This article can also be found in the Premium Editorial Download "Storage magazine: Surprise winner: BlueArc earns top NAS quality award honors."

Diligent Technologies' inline ProtecTier Data Protection Platform attempts to avoid the performance penalty incurred by hash lookups by doing a computational compare instead. Using its proprietary HyperFactor technology, it doesn't open the backup data stream to examine its content; rather, it scans and indexes the stream, looking for data that's similar to data already stored.

When the ProtecTier Data Protection Platform finds data it considers similar to data already recorded in its index, it does a byte-level compare of the two sets of data; if they match, it discards the duplicate and stores a reference to the existing copy. Diligent claims this compare-and-compute technique allows its ProtecTier Data Protection Platform to scale to manage hundreds of terabytes. However, the technique still requires some processing power on the part of the disk library, both to do the computational compare and to compress the data after it has been deduplicated.
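The general idea of "find a likely candidate cheaply, then confirm with a byte-level compare" can be sketched as follows. This is only an illustration under stated assumptions, not Diligent's HyperFactor algorithm, which is proprietary: the class name `SimilarityStore`, the `SAMPLE` constant and the use of a CRC over the first bytes of a block as a similarity hint are all hypothetical choices made for this example.

```python
import zlib

SAMPLE = 64  # bytes sampled per block for the similarity hint (assumed value)


class SimilarityStore:
    """Toy inline deduplicating store: cheap hint first, byte compare second."""

    def __init__(self):
        self.blocks = []   # compressed unique blocks, addressed by position
        self.index = {}    # similarity hint -> ids of blocks sharing that hint

    def _hint(self, block):
        # Deliberately weak signature: it only nominates candidates and is
        # never trusted as proof that two blocks are identical.
        return zlib.crc32(block[:SAMPLE])

    def write(self, block):
        """Return (block_id, was_duplicate) for the incoming block."""
        hint = self._hint(block)
        for block_id in self.index.get(hint, []):
            # Byte-level compare against the stored candidate.
            if zlib.decompress(self.blocks[block_id]) == block:
                return block_id, True  # discard the data, keep a reference
        # No exact match: compress the unique data and index it.
        self.blocks.append(zlib.compress(block))
        block_id = len(self.blocks) - 1
        self.index.setdefault(hint, []).append(block_id)
        return block_id, False
```

Because the final decision rests on the byte compare, two blocks that merely share a hint are still stored separately; the price is the extra CPU spent decompressing and comparing candidates, which is the processing overhead described above.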

NEC Corp. of America's Hydrastor also uses an inline approach, but it employs two different techniques to offset the performance overhead. In the first phase, Hydrastor deduplicates larger, variable-sized chunks of data to eliminate large pieces of redundant data. In the second phase, Hydrastor analyzes smaller, variable-sized chunks of data. In both cases, unique data is compressed.
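A two-phase, variable-sized chunking scheme of this kind might look roughly like the sketch below. It is not NEC's implementation: the rolling-sum boundary test, the mask and minimum-size constants, and the names `chunks` and `TwoPhaseStore` are assumptions chosen to keep the example short.

```python
import hashlib
import zlib

LARGE_MASK, LARGE_MIN = 0x3FF, 512   # assumed settings for the coarse pass
SMALL_MASK, SMALL_MIN = 0x3F, 32     # assumed settings for the fine pass


def chunks(data, mask, min_size, window=16):
    """Yield variable-sized chunks cut at content-defined boundaries."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                     # sum of the last `window` bytes
        if i >= window:
            rolling -= data[i - window]
        if i + 1 - start >= min_size and (rolling & mask) == mask:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                  # trailing partial chunk


class TwoPhaseStore:
    def __init__(self):
        self.large_index = {}   # large-chunk digest -> list of small digests
        self.small_store = {}   # small-chunk digest -> compressed bytes

    def write(self, data):
        """Deduplicate `data`; return the recipe needed to rebuild it."""
        recipe = []
        for big in chunks(data, LARGE_MASK, LARGE_MIN):
            big_key = hashlib.sha256(big).digest()
            if big_key not in self.large_index:
                # Phase 2: only large chunks not seen before are re-examined
                # at the finer granularity.
                small_keys = []
                for small in chunks(big, SMALL_MASK, SMALL_MIN):
                    small_key = hashlib.sha256(small).digest()
                    if small_key not in self.small_store:
                        self.small_store[small_key] = zlib.compress(small)
                    small_keys.append(small_key)
                self.large_index[big_key] = small_keys
            # Phase 1 hit (or freshly indexed chunk): reference stored pieces.
            recipe.extend(self.large_index[big_key])
        return recipe

    def read(self, recipe):
        return b"".join(zlib.decompress(self.small_store[k]) for k in recipe)
```

The coarse pass removes large runs of redundant data with few lookups, and the fine pass then catches smaller repeats inside the chunks that remain. Running both passes is where the additional processing overhead comes from.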

To compensate for the performance overhead this multiphased approach creates, Hydrastor uses a grid architecture. This allows users to add nodes to the cluster at any time to deliver additional performance or capacity. Unlike some other disk libraries, Hydrastor doesn't offer an option to present itself as a virtual tape library. Rather, it presents itself to hosts as a NAS filer using standard NFS and CIFS interfaces and creates one large storage pool on the back end. Because of this single pool, the Hydrastor architecture may present a problem for enterprises that need to allocate and reserve specific amounts of storage for particular departments or business units.

Postprocessing disk libraries
With postprocessing, the disk library stores the data in its native format before deduplicating it, which allows the disk library to dedupe the data during nonpeak backup times. Vendors implement postprocessing in a variety of ways.
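The separation between the backup window and the later deduplication pass can be illustrated with a small sketch. It assumes fixed-size chunks and an in-memory staging area purely for brevity; `PostProcessLibrary`, `CHUNK` and the method names are hypothetical and don't describe any vendor's product.

```python
import hashlib

CHUNK = 4096  # fixed chunk size, chosen only for illustration


class PostProcessLibrary:
    def __init__(self):
        self.staging = {}   # backup name -> native (undeduplicated) bytes
        self.chunks = {}    # digest -> unique chunk
        self.recipes = {}   # backup name -> ordered list of chunk digests

    def ingest(self, name, data):
        """Backup window: land the data as-is, with no dedupe on the write path."""
        self.staging[name] = data

    def dedupe_pass(self):
        """Off-peak: chunk staged backups, keep unique chunks, free native copies."""
        for name in list(self.staging):
            data = self.staging.pop(name)
            recipe = []
            for i in range(0, len(data), CHUNK):
                piece = data[i:i + CHUNK]
                digest = hashlib.sha256(piece).digest()
                self.chunks.setdefault(digest, piece)
                recipe.append(digest)
            self.recipes[name] = recipe

    def used_bytes(self):
        staged = sum(len(d) for d in self.staging.values())
        unique = sum(len(c) for c in self.chunks.values())
        return staged + unique

    def restore(self, name):
        if name in self.staging:            # not yet deduplicated
            return self.staging[name]
        return b"".join(self.chunks[d] for d in self.recipes[name])
```

In this toy model, `used_bytes()` peaks before `dedupe_pass()` runs, because every backup sits on disk in full until then. That is the capacity cost of postprocessing: the library must have room for the native backups as well as the deduplicated data.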

For example, Quantum's DXi-Series deduplicates data after it's stored, but initiates the deduplication process without waiting for the entire backup job to finish. By starting deduplication and then compressing the data while the backup is still running, it overcomes one of the principal downsides of postprocessing: the requirement for sufficient capacity to house the native backups. However, deduplication uses the DXi-Series' cache and processor, which can potentially slow the backup process because the backup job may need to write the data directly to slower-responding disk instead of storing it in the DXi-Series' cache.

This was first published in June 2007
