Tips for an effective data deduplication implementation


Alan Radding
07.29.2009

Data deduplication has been identified by nearly every analyst firm as a hot IT trend. Many storage vendors offer products that can handle deduplication, but there's still considerable confusion about the technology, which leads to implementation mistakes. Key areas for users to concentrate on include knowing their data, testing vendor dedupe claims with actual data and not deduping compressed data.

For example, one significant argument is whether inline deduplication is more efficient than post-processing dedupe. Either way, dedupe requires processing, which takes time and resources. The real issue is when to spend that time (as the data arrives during the backup, or after the backup has landed on disk) and which CPU you want to absorb the processing overhead.

The City of Lenexa, Kan., prefers post-processing deduplication. "It's just a question of how fast we can get our data onto the box," said Michael Lawrence, CISO and network administrator for the city. The box is an ExaGrid Systems Inc. storage device used for virtual tape backup. With data deduplication technology, the city can keep 15 days' worth of backups on the ExaGrid. Once the data lands there, it can be deduped, further backed up to actual tape or processed in other ways.

Another source of confusion is the vendor deduplication ratio, which compares the amount of data at the start of the dedupe process to the amount at the end. Claims of 40:1, 60:1 and 80:1 are common, and a 400:1 claim isn't unheard of. Under some circumstances and depending on how you calculate it, almost any ratio may be correct. It just won't reflect what you're likely to achieve with your data and backup process.
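
To put those ratios in perspective, here is a minimal Python sketch that converts an N:1 ratio into the percentage of capacity saved. The function names are illustrative and not taken from any vendor tool; the only assumption is the definition above, data in divided by data stored.

```python
def dedupe_ratio(bytes_in: int, bytes_stored: int) -> float:
    """Data presented to the dedupe process divided by data actually stored."""
    if bytes_stored <= 0:
        raise ValueError("bytes_stored must be positive")
    return bytes_in / bytes_stored


def space_saved_percent(ratio: float) -> float:
    """Percentage of capacity saved for a given N:1 ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


if __name__ == "__main__":
    # Ratios quoted in the article, plus 10:1 for comparison.
    for ratio in (10, 40, 60, 80, 400):
        print(f"{ratio}:1 -> {space_saved_percent(ratio):.2f}% saved")
```

Going from 40:1 to 400:1 moves the savings from 97.5% to 99.75%, so the headline ratio grows far faster than the disk space you actually get back.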

"Vendors will tout incredible ratios, but that may not be realistic for you," said Tim Malfara, storage architect at GSI Commerce Solutions Inc. in King of Prussia, Pa. Not every workload or backup benefits from data deduplication. GSI Commerce opted not to deploy deduplication. "The biggest backup areas we have, high-rez images and structured databases, don't dedupe well," Malfara said.

The City of Lenexa's Lawrence doesn't yet know what his dedupe ratio will be. "The ratio gets better over time," he noted, because the chance of newly arriving data being a duplicate of previously stored data increases as more backups are made.
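
A rough model shows why the ratio climbs. The sketch below assumes, purely for illustration, that every backup is a full backup of the same size and that a fixed fraction of the data (the hypothetical `change_rate`) is new each time. Real change rates vary from site to site and day to day.

```python
def cumulative_ratio(num_backups: int, change_rate: float) -> float:
    """Cumulative dedupe ratio after num_backups equal-sized full backups.

    Assumption: the first backup is stored in full, and each later backup
    adds only its changed fraction (change_rate) as new unique data.
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    if not 0 <= change_rate <= 1:
        raise ValueError("change_rate must be between 0 and 1")
    logical = num_backups                          # in units of one full backup
    stored = 1 + (num_backups - 1) * change_rate   # unique data kept on disk
    return logical / stored


if __name__ == "__main__":
    # 2% daily change is a made-up figure, not a measurement.
    for days in (1, 5, 15, 30):
        print(f"after {days:2d} backups: {cumulative_ratio(days, 0.02):.1f}:1")
```

Under this model the ratio starts at 1:1 and levels off toward 1 / change_rate, which is one reason a ratio quoted after months of retention says little about the first few weeks.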

Another debate focuses on the particular dedupe algorithms: proprietary or public. Algorithms may seem exotic, but the science of hash-based and content-aware algorithms is widely known and debated online. As a result, you'll end up with roughly the same performance regardless of the algorithm.

Public algorithms, such as SHA-1 or MD5, are good for most situations. There are so many points in the process where latency creeps in or bits are dropped that a slightly better algorithm hardly matters. Many storage managers don't even know what specific data deduplication algorithm they use.
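
For readers who have never seen hash-based dedupe spelled out, the following is a toy sketch in Python using the standard library's SHA-1. The `DedupeStore` class, its fixed block size and its in-memory index are assumptions made for illustration; commercial products use their own chunking, indexing and verification schemes.

```python
import hashlib
from typing import Dict, List


class DedupeStore:
    """Toy fixed-size-block deduplicating store keyed by SHA-1 digest."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: Dict[bytes, bytes] = {}  # digest -> unique block
        self.bytes_in = 0                     # total data presented

    def write(self, data: bytes) -> List[bytes]:
        """Store data and return the digests (the recipe) needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha1(block).digest()
            # Keep the block only if this digest has not been seen before.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.bytes_in += len(data)
        return recipe

    def read(self, recipe: List[bytes]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupeStore(block_size=4)
    data = b"AAAABBBBAAAACCCC"
    recipe = store.write(data)
    assert store.read(recipe) == data
    print(store.bytes_in, "bytes in,", store.bytes_stored(), "bytes stored")
```

The sketch trusts the digest alone when deciding that a block is a duplicate, which is exactly where the hash collision question comes from.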

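You also don't need to worry much about hash collisions, in which two different blocks produce the same hash and the system discards a block it should have kept. It's statistically true that the chance of a collision rises as the environment grows, but the odds remain so small that you don't need to lose sleep over it.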

W. Curtis Preston, executive editor of TechTarget's Storage Media Group and an independent backup expert, did the math in his blog and found that with 95 exabytes of data there's a 0.00000000000001110223024625156540423631668090820313% chance your system will discard a block from a hash collision that it should have kept. The chance that the corrupted block will actually be needed in a restore is even more remote.
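
To get a feel for the order of magnitude, the sketch below applies the standard birthday-bound approximation in Python. It is not a reconstruction of Preston's calculation: the 8 KB block size, the 160-bit hash and the assumption that hash values are uniformly distributed are all choices made here for illustration.

```python
import math


def collision_probability(total_bytes: int, block_size: int, hash_bits: int) -> float:
    """Birthday-bound estimate that any two distinct blocks share a hash value."""
    n = total_bytes // block_size   # number of blocks hashed
    pairs = n * (n - 1) // 2        # number of distinct block pairs
    # p = 1 - exp(-pairs / 2**hash_bits); expm1 keeps precision when p is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


if __name__ == "__main__":
    ninety_five_eb = 95 * 10 ** 18  # 95 exabytes, the figure used in the article
    for name, bits in (("MD5 (128-bit)", 128), ("SHA-1 (160-bit)", 160)):
        p = collision_probability(ninety_five_eb, 8192, bits)
        print(f"{name}: {p:.3e}")
```

Halving the data cuts the estimate to roughly a quarter, because the number of block pairs grows with the square of the number of blocks.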

"And if you have something less than 95 exabytes of data, then your odds don't appear in 50 decimal places," reads a quote from Preston's blog. "I think I'm OK with these odds."

Four simple steps to maximize your data deduplication experience

So what can you do to maximize your dedupe experience? Here are four simple steps:

1. Know your data. Is it structured database data, graphical data or general office files? Some types of data, such as general office files, lend themselves better to deduplication than others.

2. Test dedupe with your actual data and insist vendors demonstrate their systems with a large chunk of your actual data. Better yet, ask them to let you demo the system with your data for a month before committing to a purchase.

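3. Don't bother deduping compressed data. Deduplication is just another form of compression, and compressed data, in effect, has already been deduped: most of the redundancy a dedupe engine looks for is gone, so there is typically little left to save.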

4. Understand that deduplication is a feature, not a product. You don't have to buy a dedupe product to get deduplication. The capability is increasingly being incorporated into a range of storage products, including virtual tape libraries (VTLs), backup software and storage arrays.
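
Steps 2 and 3 can be tried on a small scale before any vendor is involved. The Python sketch below counts how many fixed-size blocks two versions of a file have in common, first raw and then after zlib compression. The block size, the generated sample text and the helper names are hypothetical stand-ins; substitute two real versions of your own files to learn anything meaningful.

```python
import hashlib
import random
import zlib
from typing import List

BLOCK_SIZE = 1024  # hypothetical fixed dedupe block size


def block_digests(data: bytes) -> List[bytes]:
    """SHA-1 digest of each fixed-size block in data."""
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def shared_blocks(old: bytes, new: bytes) -> int:
    """Count blocks of new that already exist in old, i.e. would dedupe."""
    seen = set(block_digests(old))
    return sum(1 for d in block_digests(new) if d in seen)


def sample_text(size: int = 64 * 1024) -> bytes:
    """Deterministic, compressible stand-in for an office document."""
    rng = random.Random(0)
    words = [b"backup", b"tape", b"disk", b"restore", b"archive", b"block"]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


if __name__ == "__main__":
    v1 = sample_text()
    v2 = b"X" * 16 + v1[16:]  # the same file after a small in-place edit
    print("raw:", shared_blocks(v1, v2), "of", len(block_digests(v2)), "blocks shared")
    c1, c2 = zlib.compress(v1), zlib.compress(v2)
    print("compressed:", shared_blocks(c1, c2), "of", len(block_digests(c2)), "blocks shared")
```

In the raw case every block except the edited one is shared. After compression, a small edit near the start of the file generally changes the byte stream that follows it, so few if any blocks match.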

With the right data in the right situation, data deduplication works well. While dedupe continues to be used primarily to reduce backup volumes, the technology should eventually expand and may even be applied to archiving.

SearchStorage.com
All Rights Reserved, Copyright 2000 - 2010, TechTarget