This article can also be found in the Premium Editorial Download "Storage magazine: What you need to know about all solid-state arrays."

Today’s data deduplication conversation shouldn’t focus on “when” or even “how well” the dedupe solution compresses per se. Almost every vendor touts its own offering as having the most efficient compression ratios or methodology. Behind closed doors, these vendors can tell you exactly which data types their products dedupe best and which data types competing products will choke on.

Of course, since storage vendors often battle in “speeds and feeds,” dedupe has inevitably led to a lot of marketing hype. To legitimately quantify dedupe effectiveness, we would all have to agree on an industry-standard methodology for measuring deduplication efficiency (and a way to publish the findings) before we discuss how well a given product dedupes. Until then, you’ll have to test the products on your short list to determine which ones fit your needs.
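For testing products on a short list, it helps to have a simple baseline of your own. The sketch below shows one possible way to express a dedupe ratio on sample data: logical bytes divided by the bytes left after duplicate chunks are removed. It assumes fixed-size 4 KiB chunks and SHA-256 fingerprints; the function name `dedupe_ratio` and the chunk size are illustrative assumptions, not an industry standard or any vendor's method.

```python
import hashlib


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical bytes divided by unique bytes, using fixed-size chunks.

    Assumption: fixed-size chunking and SHA-256 fingerprints. Real products
    use their own chunking and fingerprinting, so results will differ.
    """
    if not data:
        return 1.0  # nothing to reduce

    unique_sizes = {}  # fingerprint -> size of the first chunk seen with it
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique_sizes.setdefault(hashlib.sha256(chunk).digest(), len(chunk))

    return len(data) / sum(unique_sizes.values())


# Four 4 KiB chunks, only two of them distinct.
sample = b"A" * 4096 * 3 + b"B" * 4096
print(dedupe_ratio(sample))  # 2.0
```

Running the same measurement over your own file types (databases, virtual machine images, office documents) gives a rough, vendor-neutral reference point to compare against each product's reported numbers.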

Dedupe is still a “where” discussion. In that regard, I’d suggest that deduplication lends itself to a “good, better, best” assessment -- in other words, dedupe 1.0, 1.5 or 2.0. Here’s a guide to help answer the “where” question:

• Deduplicated storage is good (dedupe 1.0). Everyone should incorporate deduplication into their disk-based protection architecture, so simply having deduplicated storage is a good thing.

• Deploying smarter backup servers is better (dedupe 1.5). With legacy deduplication, the backup server is oblivious to the storage being deduplicated. It sends everything it backs up to storage, and then the storage discards most of it because the data already exists in the deduplicated storage pool. Extending deduplication intelligence (or even just awareness) to the backup server solves that problem. The backup server won’t send data the deduplicated storage array already has.

• Client-side deduplication is best (dedupe 2.0). Why send everything from the production server if it will be discarded by the storage array (dedupe 1.0) or the backup server (dedupe 1.5)? Instead, when the production server is dedupe-aware, it sends only new data: the fragments or blocks that aren’t already in deduplicated storage.
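The difference between the three approaches comes down to how much data crosses each hop. The following is a minimal sketch that counts bytes sent from the production server to the backup server, and from the backup server to storage, depending on where the duplicate check happens. The function `simulate`, the `where` labels, the 4 KiB fixed chunks and the SHA-256 fingerprints are all assumptions made for illustration, and the model ignores the small cost of exchanging fingerprints.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size, for illustration only


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def simulate(data: bytes, stored: set, where: str) -> tuple:
    """Return (client->backup server bytes, backup server->storage bytes, bytes written).

    where = "storage"       -> dedupe 1.0: only the storage array checks for duplicates
    where = "backup_server" -> dedupe 1.5: the backup server skips known chunks
    where = "client"        -> dedupe 2.0: the production server skips known chunks
    """
    if where not in ("storage", "backup_server", "client"):
        raise ValueError(f"unknown dedupe location: {where}")

    known = set(stored)  # copy so each run starts from the same pool
    to_server = to_storage = written = 0
    for offset in range(0, len(data), CHUNK):
        chunk = data[offset:offset + CHUNK]
        fp = fingerprint(chunk)
        is_new = fp not in known

        if where == "client" and not is_new:
            continue  # never leaves the production server
        to_server += len(chunk)

        if where == "backup_server" and not is_new:
            continue  # dropped at the backup server
        to_storage += len(chunk)

        if is_new:  # the storage pool keeps only chunks it has not seen
            known.add(fp)
            written += len(chunk)
    return to_server, to_storage, written


# The pool already holds chunks A and B; the new backup is A, B, C, C.
a, b, c = b"A" * CHUNK, b"B" * CHUNK, b"C" * CHUNK
pool = {fingerprint(a), fingerprint(b)}
backup = a + b + c + c

for where in ("storage", "backup_server", "client"):
    print(where, simulate(backup, pool, where))
# storage (16384, 16384, 4096)
# backup_server (16384, 4096, 4096)
# client (4096, 4096, 4096)
```

In all three cases the pool grows by the same 4 KiB. What changes is how much redundant data travels across the network before being discarded.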

This was first published in August 2012
