This article can also be found in the Premium Editorial Download "Storage magazine: Boosting data storage array performance."


The goal is to improve backup performance and minimize capacity requirements by capturing only the actual changes, and then minimizing the amount of data backed up and stored. Capacity savings are achieved by recognizing redundant data and avoiding the storage of multiple copies of data.

The data coalescence process examines data at a specified unit of granularity to identify redundancies, indexes common units of data and then stores only additional unique data. Specific approaches vary based on several factors, such as the level of granularity. With a finer granularity (i.e., smaller data units), greater commonality can be discovered and, therefore, greater storage savings can be realized. Indexing at the file level, for example, eliminates storing multiple backups of the same file; but adding one word to a 1MB Word document represents a change that would require another entire file to be saved. Indexing at the subfile or block level would store only the portion of the file with changed data. But finer granularity means more units to index, and the larger index could slow access to information.
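
The trade-off between file-level and block-level indexing can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function names, the 4KB block size, the SHA-256 index keys and the synthetic 1MB "document" are all assumptions chosen for the example, and the extra word is appended at the end of the file so that earlier blocks do not shift.

```python
import hashlib

def file_level_index(files):
    """Index whole files: each distinct file is stored once."""
    index = {}
    for data in files:
        index.setdefault(hashlib.sha256(data).hexdigest(), data)
    return index

def block_level_index(files, block_size=4096):
    """Index fixed-size blocks: each distinct block is stored once."""
    index = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            chunk = data[offset:offset + block_size]
            index.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return index

def stored_bytes(index):
    """Total bytes held by an index (file-level or block-level)."""
    return sum(len(unit) for unit in index.values())

# Synthetic 1 MiB "document" whose 4 KiB blocks are all distinct (hypothetical data).
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(32768))
edited = original + b" word"  # second backup: one word added at the end

file_index = file_level_index([original, edited])
block_index = block_level_index([original, edited])

print("file-level :", len(file_index), "entries,", stored_bytes(file_index), "bytes")
print("block-level:", len(block_index), "entries,", stored_bytes(block_index), "bytes")
```

Under these assumptions the file-level index stores both versions in full (about 2MB), while the block-level index stores the original blocks once plus one small new block (about 1MB). The block-level index holds 257 entries rather than 2, which is the indexing cost of finer granularity.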

Another consideration is the use of fixed- vs. variable-length data elements. A fixed-element approach may be fine for structured data such as databases, but with unstructured data, many commonalities may be missed. Consider the files shown in the figure "Fixed-block commonality factoring" at right. Using a fixed-length approach, no commonality would be discovered and each file would be stored as a separate set of objects.

With a variable-length algorithm (see "Variable-block commonality factoring," at right), it would be easy to detect that, aside from the first letter in File 2, the files are identical, so backing up File 2 would require storing only a fraction of the file.
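
The difference between the two approaches can be shown with a toy example. The sketch below is an assumption-laden illustration rather than the algorithm used by any particular product: `fixed_chunks`, `variable_chunks` and `new_bytes` are hypothetical names, the boundary rule (cut wherever the sum of the last 16 bytes is divisible by 64) is a deliberately simple stand-in for the rolling hashes real systems use, and the two files are synthetic, with File 2 equal to File 1 plus one extra leading byte.

```python
import hashlib

def fixed_chunks(data, size=64):
    """Split at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, divisor=64):
    """Split wherever the sum of the last `window` bytes is divisible by `divisor`.

    The boundary depends only on nearby content, not on the byte offset.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i >= window - 1 and rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

def new_bytes(indexed_chunks, incoming_chunks):
    """Bytes that must be stored for incoming chunks not already in the index."""
    known = {hashlib.sha256(c).digest() for c in indexed_chunks}
    total = 0
    for chunk in incoming_chunks:
        key = hashlib.sha256(chunk).digest()
        if key not in known:
            known.add(key)
            total += len(chunk)
    return total

# Hypothetical test data: File 2 is File 1 with one extra byte at the front.
file1 = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
file2 = b"X" + file1

fixed_new = new_bytes(fixed_chunks(file1), fixed_chunks(file2))
variable_new = new_bytes(variable_chunks(file1), variable_chunks(file2))

print("File 2 size:", len(file2), "bytes")
print("new bytes with fixed-length chunks   :", fixed_new)
print("new bytes with variable-length chunks:", variable_new)
```

Because the boundary test looks only at the last few bytes, a one-byte insertion disturbs the chunks near the change and the remaining chunks line up again. With fixed offsets, every block after the insertion shifts and nothing matches the index.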

Regardless of the level of factoring, the true benefits of a data-reduction technology are realized when aggregated across multiple servers. A single backed-up system has significant redundancy, but when many systems are backed up to a common server, there's likely to be an even greater potential for redundancy and, therefore, for data reduction. Imagine coalescing backup data down to one-tenth or one-twenty-fifth of its original total size. Properly deployed, this technology can make disk cheaper than tape, turning the tape vs. disk TCO comparison on its head.
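
To see why aggregation matters, consider a rough sketch comparing a separate index per server with one index shared by all servers. Every figure here is made up for illustration: 25 hypothetical servers, each backing up 24 blocks common to all of them (for example, an operating system image) plus one block of its own, with 1KB blocks and invented helper names. Real ratios depend entirely on the data.

```python
import hashlib

BLOCK = 1024  # assumed block size in bytes

def block(label):
    """Deterministic BLOCK-sized payload derived from a label (stand-in for real data)."""
    return hashlib.sha256(label.encode()).digest() * (BLOCK // 32)

def server_backup(name, shared=24, unique=1):
    """Blocks backed up by one server: common blocks plus its own private blocks."""
    common = [block(f"os-{i}") for i in range(shared)]
    private = [block(f"{name}-data-{i}") for i in range(unique)]
    return common + private

def unique_bytes(blocks):
    """Bytes stored when each distinct block is kept only once."""
    seen = {}
    for b in blocks:
        seen.setdefault(hashlib.sha256(b).digest(), len(b))
    return sum(seen.values())

servers = {f"srv{n:02d}": server_backup(f"srv{n:02d}") for n in range(25)}

logical = sum(len(b) for blocks in servers.values() for b in blocks)
per_server = sum(unique_bytes(blocks) for blocks in servers.values())
global_index = unique_bytes([b for blocks in servers.values() for b in blocks])

print("logical backup size    :", logical)       # 640000
print("one index per server   :", per_server)    # 640000
print("one index for all      :", global_index)  # 50176
print("reduction, shared index: %.1f:1" % (logical / global_index))
```

In this contrived case, indexing each server on its own saves nothing, because the duplication exists only between servers; a shared index stores the common blocks once and cuts the total to roughly one-thirteenth.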

This was first published in January 2006
