This article can also be found in the Premium Editorial Download "Storage magazine: Surprise winner: BlueArc earns top NAS quality award honors."
Diligent Technologies' inline ProtecTier Data Protection Platform tries to avoid the performance penalty of hash lookups by doing a computational compare instead. Using its proprietary HyperFactor technology, it does not open the backup data stream to examine its content. Instead, it scans and indexes the data stream, looking for data that's similar to data already stored.
When the ProtecTier Data Protection Platform finds data it considers similar to data already in its index, it does a byte-level compare of the two sets of data. If they match, it discards the new copy and stores a reference to the existing data. Diligent claims this compare-and-compute technique allows the ProtecTier Data Protection Platform to scale to hundreds of terabytes. However, the disk library still needs processing power to do the computational compare and to compress the data after it has been deduplicated.
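To make the "find similar, then verify byte for byte" idea concrete, here is a minimal sketch in Python. HyperFactor is proprietary, so nothing here reflects Diligent's actual method: the fixed 4 KiB block size, the min-hash-style similarity signature built from CRC-32 shingle hashes, and the names `signature` and `SimilarityDedupStore` are all hypothetical choices for illustration. What the sketch does share with the description above is the order of operations: a cheap similarity lookup nominates candidates, a byte-level compare decides whether the data really matches, and only unique blocks are compressed and kept.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def signature(block: bytes, k: int = 8, samples: int = 4) -> tuple:
    """Cheap similarity signature: the smallest few CRC-32 hashes of k-byte shingles.

    Similar blocks tend to share their smallest shingle hashes (a min-hash idea),
    and identical blocks always produce the same signature.
    """
    hashes = {zlib.crc32(block[i:i + k]) for i in range(max(len(block) - k + 1, 1))}
    return tuple(sorted(hashes)[:samples])


class SimilarityDedupStore:
    """Hypothetical store: similarity index plus byte-level verification."""

    def __init__(self):
        self.blocks = []        # compressed unique blocks, addressed by list index
        self.index = {}         # signature -> ids of stored blocks with that signature
        self.byte_compares = 0  # how many full byte-level compares were needed

    def _read(self, block_id: int) -> bytes:
        return zlib.decompress(self.blocks[block_id])

    def write(self, data: bytes) -> list:
        """Store data; return a recipe (list of block ids) to rebuild it."""
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            sig = signature(block)
            match = None
            for candidate in self.index.get(sig, []):
                self.byte_compares += 1
                if self._read(candidate) == block:  # byte-level compare confirms it
                    match = candidate
                    break
            if match is None:
                # No verified match: keep the block, compressed after dedupe.
                match = len(self.blocks)
                self.blocks.append(zlib.compress(block))
                self.index.setdefault(sig, []).append(match)
            recipe.append(match)  # a duplicate costs only this reference
        return recipe

    def read(self, recipe: list) -> bytes:
        return b"".join(self._read(block_id) for block_id in recipe)
```

Writing the same backup twice stores its blocks once; the second write only adds references, at the cost of one decompress-and-compare per candidate. That compare is the processing overhead the article mentions, traded for not having to trust a hash alone.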
NEC Corp. of America's Hydrastor also uses an inline approach, but it employs two different techniques to offset the performance overhead. In the first phase, Hydrastor deduplicates larger, variable-sized chunks of data to eliminate large pieces of redundant data. In the second phase, Hydrastor analyzes smaller, variable-sized chunks of data. In both cases, unique data is compressed.
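The two-phase idea can be sketched with content-defined chunking run at two granularities. This is an assumption-laden illustration, not NEC's algorithm. It assumes a Gear-style rolling hash for picking chunk boundaries, SHA-256 fingerprints, hypothetical size parameters, and, as an interpretive choice, that the second phase examines only the large chunks the first phase did not already find. The names `cdc_chunks` and `TwoPhaseStore` are made up for this example.

```python
import hashlib
import random
import zlib

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # hypothetical random gear table
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int, min_size: int, max_size: int):
    """Yield variable-sized chunks whose boundaries depend on content.

    A Gear rolling hash is updated per byte; a boundary is declared where its
    top avg_bits bits are all zero (about 1 in 2**avg_bits positions), subject
    to min_size and max_size limits.
    """
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h >> (64 - avg_bits)) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class TwoPhaseStore:
    """Hypothetical two-phase inline dedupe: large chunks first, then small ones."""

    def __init__(self):
        self.large = {}  # large-chunk fingerprint -> list of small-chunk fingerprints
        self.small = {}  # small-chunk fingerprint -> compressed unique bytes

    def write(self, data: bytes) -> list:
        recipe = []
        # Phase 1: large variable-sized chunks remove big runs of redundant data.
        for big in cdc_chunks(data, avg_bits=16, min_size=16 * 1024, max_size=256 * 1024):
            fp = hashlib.sha256(big).hexdigest()
            if fp not in self.large:
                # Phase 2: re-chunk the unmatched large chunk into smaller pieces.
                parts = []
                for small in cdc_chunks(big, avg_bits=12, min_size=1024, max_size=16 * 1024):
                    sfp = hashlib.sha256(small).hexdigest()
                    if sfp not in self.small:
                        self.small[sfp] = zlib.compress(small)  # only unique data is stored
                    parts.append(sfp)
                self.large[fp] = parts
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        return b"".join(
            zlib.decompress(self.small[s]) for fp in recipe for s in self.large[fp]
        )
```

Because boundaries follow content rather than fixed offsets, a one-byte change usually produces only a few new small chunks instead of shifting every later chunk. Running two chunking passes over new data is also extra work per byte, which illustrates the kind of overhead the next paragraph says Hydrastor's grid is meant to absorb.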
To compensate for the performance overhead this multiphase approach creates, Hydrastor uses a grid architecture, which allows users to add nodes to the cluster.
Postprocessing disk libraries
With postprocessing, the disk library first stores the data in its native format and deduplicates it afterward, which allows the disk library to dedupe the data outside peak backup windows. Vendors implement postprocessing in a variety of ways.
For example, Quantum's DXi-Series deduplicates data after it's stored, but it starts the deduplication process without waiting for the entire backup job to finish. By deduplicating and then compressing the data while the backup is still running, it overcomes one of the principal downsides of postprocessing: the need for enough capacity to hold the native backups. However, deduplication uses the DXi-Series' cache and processor, which can slow the backup process, because the backup job may have to write data directly to slower disk instead of to the DXi-Series' cache.
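The "dedupe while the backup is still running" pattern can be sketched as a producer/consumer pipeline: the backup path lands each segment natively, and a background worker deduplicates finished segments and frees their native copies, so staging space does not have to hold the whole job. This is a hypothetical illustration of the general pattern, not Quantum's implementation. The class name `PostProcessLibrary`, the idea of handing over data in segments, the fixed 4 KiB chunks, and SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib
import queue
import threading
import zlib

CHUNK = 4096  # hypothetical fixed chunk size, kept simple for the sketch


class PostProcessLibrary:
    """Hypothetical post-process library: land data natively, dedupe in background."""

    def __init__(self):
        self.staging = {}      # segment id -> native (not yet deduplicated) bytes
        self.chunks = {}       # SHA-256 fingerprint -> compressed unique chunk
        self.recipes = {}      # segment id -> list of fingerprints
        self.peak_staging = 0  # most native bytes held at once (timing-dependent)
        self._lock = threading.Lock()
        self._work = queue.Queue()
        self._worker = threading.Thread(target=self._dedupe_loop, daemon=True)
        self._worker.start()

    def ingest(self, seg_id, data: bytes):
        """Backup path: store the segment natively, then queue it for dedupe."""
        with self._lock:
            self.staging[seg_id] = data
            held = sum(len(d) for d in self.staging.values())
            self.peak_staging = max(self.peak_staging, held)
        self._work.put(seg_id)  # dedupe can start before the whole job ends

    def _dedupe_loop(self):
        # This worker competes with ingest for CPU, mirroring the trade-off
        # the article describes for the appliance's processor and cache.
        while True:
            seg_id = self._work.get()
            if seg_id is None:  # shutdown signal from finish()
                break
            with self._lock:
                data = self.staging[seg_id]
            fingerprints = []
            for off in range(0, len(data), CHUNK):
                piece = data[off:off + CHUNK]
                fp = hashlib.sha256(piece).hexdigest()
                with self._lock:
                    if fp not in self.chunks:
                        self.chunks[fp] = zlib.compress(piece)  # compress after dedupe
                fingerprints.append(fp)
            with self._lock:
                self.recipes[seg_id] = fingerprints
                del self.staging[seg_id]  # native copy no longer needed
            self._work.task_done()

    def finish(self):
        """Wait for queued segments to be deduplicated, then stop the worker."""
        self._work.join()
        self._work.put(None)
        self._worker.join()

    def read(self, seg_id) -> bytes:
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.recipes[seg_id])
```

A classic post-process design would instead call the dedupe step only after the last `ingest`, so staging would have to hold every native segment at once; starting the worker immediately keeps `peak_staging` lower whenever it keeps pace with ingest.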
This was first published in June 2007