Is data deduplication right for your primary storage infrastructure?

Data deduplication is a firmly established technique for reducing storage demands on data backup, but a handful of vendors are now applying this technology to primary storage. However, the demands on primary storage are considerably different than those on data backup, so if you're undertaking a primary deduplication project, you'll need to learn about the different requirements and techniques.

In both primary storage and data backup, deduplication technology scans the data to be stored and replaces duplicate blocks or files with pointers to the previously stored blocks or files that have been duplicated.
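The pointer mechanism described above can be illustrated with a toy block store. This is a minimal sketch, not any vendor's implementation: the `DedupeStore` class, the 4 KB `BLOCK_SIZE` and the file names are all hypothetical, and it assumes fixed-size blocks identified by a SHA-256 hash.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


class DedupeStore:
    """Toy block store: each unique block is kept once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (each unique block stored once)
        self.files = {}   # file name -> list of digests (the "pointers")

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if it has not been seen before.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers to rebuild the original data.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupeStore()
original = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write("report.doc", original)
store.write("report-copy.doc", original)  # adds pointers only, no new blocks
```

After both writes, the store holds two blocks rather than four, and reading either file returns the original bytes. The sketch trusts the hash alone to decide that two blocks match; a more cautious design could also compare the bytes before discarding a block.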

In backup, data deduplication is highly space efficient, resulting in storage savings of as much as 20:1. But because primary storage offers fewer opportunities for deduplication, primary dedupe usually doesn't produce the same kind of space savings. Rather than 20:1, primary dedupe is more likely to result in ratios of 2:1.
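To put those ratios in capacity terms, a ratio of N:1 means the stored data occupies 1/N of its original size. The short sketch below does the arithmetic; the function name `space_saved` is just an illustration.

```python
def space_saved(ratio):
    """Fraction of capacity saved at a dedupe ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


for ratio in (20, 2):
    print(f"{ratio}:1 -> {space_saved(ratio):.0%} saved")
```

In other words, 20:1 corresponds to a 95% reduction, while 2:1 corresponds to 50%. A 2:1 ratio still halves the capacity needed, so the smaller ratio on primary storage does not by itself mean the savings are trivial.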

Before you start a primary deduplication project

If you're considering a primary deduplication project, it's important to determine what you intend to dedupe. You need to study your data and look for likely candidates, such as applications with data that rarely changes, and to rule out poor ones, such as transactional databases where you can't afford performance penalties. You should run tests to measure the performance impacts before you commit to deduping your primary storage.
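One rough way to study a data set is to estimate how much of it is duplicated before involving any product. The sketch below is an assumption-laden illustration: `estimate_dedupe_ratio` is a hypothetical helper that walks a directory, splits each file into fixed 4 KB blocks and compares SHA-256 hashes. Real products may use different block sizes or variable-length chunking, so treat the result as an indication only. It also says nothing about performance, which still has to be measured on the actual system.

```python
import hashlib
import os


def estimate_dedupe_ratio(root, block_size=4096):
    """Return logical bytes / unique bytes for all files under `root`."""
    seen = set()          # digests of blocks already counted
    logical = unique = 0  # bytes before and after deduplication
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                while True:
                    block = f.read(block_size)
                    if not block:
                        break
                    logical += len(block)
                    digest = hashlib.sha256(block).digest()
                    if digest not in seen:
                        seen.add(digest)
                        unique += len(block)
    # An empty tree has nothing to dedupe; report a neutral 1:1.
    return logical / unique if unique else 1.0
```

Running it against a representative sample of each application's data gives a first pass at which data sets are worth testing further.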

Because each system is unique, you should carefully consider the effects of primary dedupe before applying it. The effectiveness of primary dedupe depends in large part on the characteristics of the system it's applied to, including:

  • The mix of applications

  • Usage patterns

  • The rate of change in the data

  • Processor power, storage configuration and network throughput

Latency is another issue that separates primary storage and data backup when it comes to dedupe.

Because every block or file has to be checked for duplication, data deduplication exacts a performance penalty and consumes resources for checking data. Latency is more likely to affect users in primary storage than in backups. As a result, many primary dedupe products emphasize performance, but this often comes at the expense of data storage efficiency.

Data deduplication options for primary storage

A number of companies, including NetApp Inc. and the recently acquired Data Domain Inc., offer options for primary dedupe. Other vendors offer dedupe capabilities combined with features such as in-line compression to reduce the footprint of data not suited for dedupe or to automatically identify deduplication opportunities in the data stream. Storwize Inc. offers real-time compression, while Ocarina Networks uses an extraction compression technique to identify dedupe candidates.

Data deduplication is also becoming an increasingly popular option in virtualized systems because the multiple instances of the OS are highly redundant and seldom change. Other contents of C: drives on virtual machines (VMs) are also highly redundant and barely change.
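The effect of redundant OS instances can be shown with a toy model. Every number here is invented for illustration: 10 hypothetical VM images, each made of 90 shared "OS" blocks plus 10 blocks unique to that VM. Real images will have a different mix, so the resulting ratio is not a prediction.

```python
import hashlib
import os

BLOCK = 4096  # hypothetical block size, in bytes


def make_image(os_blocks, unique_count):
    """Build a toy VM image: shared OS blocks plus per-VM random blocks."""
    return os_blocks + [os.urandom(BLOCK) for _ in range(unique_count)]


def dedupe_ratio(images):
    """Total blocks across all images divided by distinct blocks."""
    total = sum(len(image) for image in images)
    unique = {hashlib.sha256(b).digest() for image in images for b in image}
    return total / len(unique)


os_blocks = [os.urandom(BLOCK) for _ in range(90)]  # stand-in for a shared OS
images = [make_image(os_blocks, 10) for _ in range(10)]
ratio = dedupe_ratio(images)
print(f"{ratio:.2f}:1")
```

In this model the 1,000 blocks across all images reduce to 190 distinct blocks, because the shared OS blocks are stored only once no matter how many VMs are added.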

Generally speaking, the more closely an application or a data file resembles write once, read many (WORM) data, the more suited it is for deduplication. Therefore, CAD files and graphics files are also good candidates because of how little they change.

A classic example of where data deduplication is a poor choice is a transactional database where data frequently changes. The constant changes result in increased dedupe activity and place a heavy load on system resources. File sizes are often also too small, which can make it difficult to efficiently match standard block sizes.

This was first published in October 2009
