Scalability may be a dedupe dilemma

This article can also be found in the Premium Editorial Download "Storage magazine: RAID turns 20: Do you still need it?"

Craig Wilson, IT director at the Minneapolis law firm of Winthrop & Weinstine, became convinced that a deduplication strategy could benefit him as he watched the recurring costs associated with his tape management system climb. Last year, Wilson invested in a Data Domain Appliance, a product that promises cost-effective, long-term, onsite retention and highly efficient WAN vaulting for disaster recovery.

At that point, Wilson tried to forecast his data growth. But within a month, the law firm exceeded the appliance's initial capacity because of a court case that generated reams of files and required Wilson to back up 1.5TB of data. "I try to anticipate our growth rate but then, Bam!, everything changes," he says.

Wilson now has two choices for scalability: purchase a second appliance or buy a new, larger Data Domain DDX Array. He'll present a case to management for both options, but his vote would be for a larger Data Domain array. "The new Data Domain DDX arrays have expansion cabinets and expansion is a necessity," says Wilson.

Wilson could add a second appliance, but that wouldn't double his effective capacity because there's no way (yet) to cluster two separate deduplication appliances from Data Domain. Data deduplication can reduce storage requirements tenfold, shorten backup windows and create new possibilities for offsite data replication. But as administrators hit the upper capacity limits of a deduplication appliance, they need to choose between upgrading to a larger appliance or buying another one.

Upgrading to a larger deduplication appliance requires data migration. In addition, a unit with 50% more capacity will likely cost more than double the price of the entry-level model. Adding a second appliance has its own drawback: the new appliance has to repeat the work of deduplicating and compressing the backup data, because it can't access the deduplication pattern-matching schemes created by the first one.

Brian Garrett, a technical director at ESG Labs in Milford, MA, which provides independent analysis of emerging storage hardware, finds that some capacity limitations of deduplication appliances are minimized by 750GB and 1TB SATA disk drives. Companies with 10TB or less of raw data can now deploy entry-level deduplication appliances that scale to 20TB of deduplicated data which, says Garrett, "allows for companies to deploy a deduplication appliance with twice the capacity the company needs to protect and then [allow them to] grow into it."

New products from companies like NEC Corp. of America and Sepaton offer a more scalable, clustered and multinode architecture. Clustered architectures allow companies to add new deduplication appliances to their backup environment, which can then access the pattern matches created by the first appliance.
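The difference between standalone and clustered appliances can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's implementation: the `DedupeNode` class, the fixed 4KB chunk size and the SHA-256 fingerprints are all assumptions made for illustration, and real appliances typically use more sophisticated chunking and indexing.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


class DedupeNode:
    """Toy deduplicating store that keeps one copy of each unique chunk."""

    def __init__(self, index=None):
        # The index is a set of chunk fingerprints already stored.
        # Passing in an existing set models nodes that share one index.
        self.index = index if index is not None else set()
        self.stored_bytes = 0

    def ingest(self, data: bytes) -> None:
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in self.index:
                # New pattern: record it and pay for the storage.
                self.index.add(fingerprint)
                self.stored_bytes += len(chunk)


# Synthetic "backup" made of 100 distinct 4KB chunks.
backup = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(100))

# Two standalone appliances: each builds its own index from scratch,
# so the same backup is stored in full on both.
first = DedupeNode()
second = DedupeNode()
first.ingest(backup)
second.ingest(backup)
standalone_total = first.stored_bytes + second.stored_bytes

# Two clustered nodes sharing one index: the second node recognizes
# every chunk the first node has already stored.
shared_index = set()
node_a = DedupeNode(shared_index)
node_b = DedupeNode(shared_index)
node_a.ingest(backup)
node_b.ingest(backup)
clustered_total = node_a.stored_bytes + node_b.stored_bytes

print(standalone_total, clustered_total)
```

In this sketch the standalone pair stores the backup twice, while the clustered pair stores it once, which is the behavior a shared pattern index is meant to provide.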

Garrett also warns that backup software such as IBM's Tivoli Storage Manager (TSM), which may compress data before storing it on a deduplication appliance, can potentially negate the appliance's effectiveness. Because deduplication appliances perform compression as part of their own data-reduction process, data that arrives already compressed is harder for the appliance to recognize and match against patterns it has already stored.
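A rough way to see why pre-compression hurts pattern matching is the Python sketch below. It is a simplified model under stated assumptions: the backup is compressed as one `zlib` stream (actual backup software may compress per file or per block), chunks are a fixed 4KB, and the synthetic data, `make_block` and `duplicate_fraction` are hypothetical helpers invented for the illustration.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size
WORDS = ["motion", "exhibit", "plaintiff", "defendant",
         "court", "filing", "brief", "order"]


def make_block(seed: int) -> bytes:
    """Build one deterministic, compressible 4KB block of text."""
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(1200))
    return text.encode("ascii")[:CHUNK_SIZE]


def chunks(data: bytes):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def duplicate_fraction(first: bytes, second: bytes) -> float:
    """Fraction of the second stream's chunks already seen in the first."""
    index = {hashlib.sha256(c).digest() for c in chunks(first)}
    second_chunks = chunks(second)
    hits = sum(1 for c in second_chunks
               if hashlib.sha256(c).digest() in index)
    return hits / len(second_chunks)


# Monday's backup: 50 blocks. Tuesday's backup: one new block, then
# the same 50 blocks again.
monday = b"".join(make_block(i) for i in range(50))
tuesday = make_block(999) + monday

# Sent uncompressed, nearly every chunk of Tuesday's backup matches.
raw_fraction = duplicate_fraction(monday, tuesday)

# Compressed by the backup software first, the two byte streams diverge
# from the start, so far fewer chunks line up.
compressed_fraction = duplicate_fraction(zlib.compress(monday),
                                         zlib.compress(tuesday))

print(f"raw: {raw_fraction:.2f}  pre-compressed: {compressed_fraction:.2f}")
```

The uncompressed streams share 50 of 51 chunks, while the compressed streams share a noticeably smaller fraction, because a small change early in the input alters the compressed bytes that follow it.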

"Users may not receive as much benefit with TSM because it is already doing some of the same magic as the deduplication appliance," says Garrett.

--Jerome M. Wendt

This was first published in November 2007
