Dedupe myths and methods



Dedupe tips
Sorting out the deduplication myths is just the first part of a storage manager's job. The following tips will help managers deploy deduplication while avoiding common pitfalls.

  1. Know your data. "People don't have accurate data on their daily changes and retention periods," says Wunder. That data, however, is critical in estimating what kind of dedupe ratio you'll get and planning how much disk capacity you'll need. "We planned for a 60-day retention period to keep the cost down," he says.

    "The vendors will do capacity estimates and they're pretty good," says ESG's Whitehouse. Adventist Health's Aubry, for example, asked Data Domain and ExaGrid to size a deduplication solution. "We told them what we knew about the data and asked them to look at our data and what we were doing. They each came back with estimates that were comparable," says Aubry. Almost two years later, the estimates have proven pretty accurate.

  2. Know your applications. Not all deduplication products handle all applications equally. Special data structures, unusual data formats, variable-length data and other application-specific ways of handling data can all fool a dedupe product.

    When Philadelphia law firm Duane Morris LLP finally got around to using Avamar Technologies' Axiom (now EMC Avamar) for deduplication, the company had a surprise: "It worked for some applications, but it didn't work with Microsoft Exchange," says Duane Morris CIO John Sroka.

    Avamar had no problem deduping the firm's 6 million Word documents, but when it hit Exchange data "it saw the Exchange data as completely new each time, no duplication," he reports. (The latest version of Avamar dedupes Exchange data.) Duane Morris, however, won't bother to upgrade Avamar. "We're moving to Double-Take [from Double-Take Software Inc.] to get real-time replication," says Sroka, which is what the firm wanted all along.

  3. Avoid deduping compressed data. As a corollary to the above tip, "it's a waste of time to try to dedupe compressed files. We tried and ended up with some horrible ratios," says Kevin Fiore, CIO at Thomas Weisel Partners LLC, a San Francisco investment bank. A Data Domain user for more than two years, the company gets ratios as high as 35:1 with uncompressed file data. With database applications and others that compress files, the ratios fell into the single digits.

    When deduping a mix of applications, Thomas Weisel Partners experiences acceptable ratios ranging from 12:1 to 16:1. Similarly, data the company doesn't keep very long isn't worth deduping at all. Unless the data is kept long enough to be backed up multiple times, there's little to gain from deduplication for that data.

  4. Avoid the easy fix. "There's a point early in the process where companies go for a quick fix, an appliance. Then they find themselves plopping in more boxes when they have to scale. At some point, they can't get the situation under control," says ESG's Whitehouse. Appliances certainly present an easy solution, but until the selected appliance supports some form of global dedupe, a company will find itself managing islands of deduplication. In the process, it will miss opportunities to remove duplicate data held on more than one appliance.

    Magnum Semiconductor's Wunder quickly spotted this trap. "We looked at Data Domain, but we realized it wouldn't scale. At some point we would need multiple appliances at $80,000 apiece," he says.

  5. Test dedupe products with a large quantity of your real data. "This kind of testing is time consuming, so many companies avoid it. Usually a company will try the product with little bits of data, and the results won't compare with large data sets," says GlassHouse Technologies' Preston. Ideally, you should demo the product onsite by having it do real work for a month or so before opting to buy it. However, most vendors won't go along with this unless they believe they're on the verge of losing the sale.

Adventist Health got lucky. After lengthy onsite meetings with engineers from Data Domain and ExaGrid, and its own internal analysis, it opted for ExaGrid. Once the decision was made, Adventist Health's Aubry called Data Domain as a courtesy. Data Domain wouldn't give up and offered to send an appliance.

"I was a little nervous I might have made a wrong decision. We put in both and ran a bake-off," says Aubry. ExaGrid was already installed on Adventist Health's routed network. Adventist Health put the Data Domain appliance on a private network connected to its media server.

"I was expecting Data Domain to outperform because of the private network," he says. Measuring the time it took to complete the end-to-end process, ExaGrid performed 20% faster, much to Aubry's relief as he was already committed to buying the ExaGrid.

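To make the first tip concrete, the link between daily change, retention period and disk capacity can be roughed out in a few lines of Python. This is only a sketch under a deliberately simple model, not any vendor's sizing method: the function name, the daily-full-backup schedule and the 2:1 compression factor are all assumptions made for illustration, as are the 10 TB and 2% inputs. Only the 60-day retention figure comes from the article.

```python
def estimate_backup_capacity(primary_tb, daily_change_rate, retention_days,
                             compression=2.0):
    """Rough dedupe sizing under a simple, assumed model.

    Assumptions (illustrative only, not a vendor formula):
      - one full backup is taken every day and kept for retention_days
      - the first backup is stored in full; each later backup adds only
        the data that changed that day (primary_tb * daily_change_rate)
      - everything that is stored is also compressed by `compression`
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    if compression <= 0:
        raise ValueError("compression must be positive")

    # Logical data: everything the backup application sends during retention.
    logical_tb = primary_tb * retention_days

    # Unique data: the first full plus the daily changes on the remaining days.
    unique_tb = primary_tb * (1 + daily_change_rate * (retention_days - 1))

    # Physical data: unique data after the assumed compression.
    physical_tb = unique_tb / compression

    return {
        "logical_tb": logical_tb,
        "physical_tb": physical_tb,
        "ratio": logical_tb / physical_tb,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB of primary data changing 2% per day.
    for days in (1, 7, 60):
        est = estimate_backup_capacity(primary_tb=10, daily_change_rate=0.02,
                                       retention_days=days)
        print(f"{days:>2} days retained: {est['physical_tb']:.1f} TB on disk, "
              f"ratio about {est['ratio']:.1f}:1")
```

With these made-up inputs, a single day of retention yields nothing beyond the assumed 2:1 compression, while 60 days works out to roughly 55:1 on about 10.9 TB of disk. Real ratios depend on the backup schedule and the data itself, which is why accurate figures for daily change and retention matter more than the formula.
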
Just about every consumer cliché applies to deduplication today: buyer beware, try before you buy, your mileage may vary, past performance is no indicator of future performance, one size doesn't fit all and so on. Fortunately, the market is competitive and price is negotiable. With the technology-industry analyst firm The 451 Group projecting the market to surpass $1 billion by 2009, up from $100 million just three years earlier, dedupe is hot. Shop around. Informed storage managers should be able to get a deduplication product that fits their needs at a competitive price.

This was first published in September 2008
