This article can also be found in the Premium Editorial Download "Storage magazine: What you need to know about all solid-state arrays."
Today’s data deduplication conversation shouldn’t focus on “when” or even “how well” a given dedupe solution reduces data. Almost every vendor touts its own offering as having the most efficient reduction ratios or methodology. Behind closed doors, these vendors can tell you exactly which data types their product dedupes best, and which data types their competitors’ products will choke on.
Of course, since storage vendors often battle in “speeds and feeds,” dedupe has inevitably led to a lot of marketing hype. To legitimately quantify dedupe effectiveness, we would all have to agree on an industry-standard methodology for measuring deduplication efficiency (and a way to publish the findings) before we discuss how well a given product dedupes. Until then, you’ll have to test the products on your short list to determine which ones fit your needs.
Dedupe is still a “where” discussion. In that regard, I offer that deduplication lends itself to a “good, better, best” assessment -- in other words, dedupe 1.0, 1.5 or 2.0. Here’s a guide to help answer the “where” question:
• Deduplicated storage is good (dedupe 1.0). Everyone should incorporate deduplication into their disk-based protection architecture, so simply having deduplicated storage is a good thing.
• Deploying smarter backup servers is better (dedupe 1.5). With legacy deduplication, the backup server is oblivious to the storage being deduplicated; it sends everything across the network, and the storage array discards the redundant data on arrival. A dedupe-aware backup server instead deduplicates the backup stream itself, so redundant data never has to be written to the array.
• Client-side deduplication is best (dedupe 2.0). Why send everything from the production server if it will be discarded by the storage array (dedupe 1.0) or the backup server (dedupe 1.5)? Instead, by making the production server dedupe aware, only new data whose fragments or blocks aren’t already in deduplicated storage are sent.
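The client-side (dedupe 2.0) approach can be illustrated with a minimal sketch. This is a hypothetical example, not any vendor’s actual implementation: it uses fixed-size chunking and SHA-256 fingerprints, with a simple in-memory set standing in for the deduplicated storage’s chunk index. Real products typically use variable-length chunking and a persistent index, but the principle is the same -- the production server checks each chunk’s fingerprint first and sends only the chunks the storage hasn’t already seen.

```python
import hashlib

def client_side_dedupe(data: bytes, known_hashes: set, chunk_size: int = 4096):
    """Split data into fixed-size chunks and return only the chunks whose
    fingerprints are not already in deduplicated storage (known_hashes)."""
    new_chunks = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_hashes:  # only unseen data crosses the wire
            known_hashes.add(digest)
            new_chunks.append((digest, chunk))
    return new_chunks

# Three identical 4 KB chunks plus one unique chunk: only 2 chunks are sent.
known = set()
payload = b"A" * 4096 * 3 + b"B" * 4096
sent = client_side_dedupe(payload, known)
print(len(sent))  # 2
```

Because the fingerprint check happens on the production server, the redundant chunks never leave it, which is what saves network bandwidth relative to dedupe 1.0 and 1.5.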
This was first published in August 2012