This article can also be found in the Premium Editorial Download "Storage magazine: Boosting data storage array performance."
The goal is to improve backup performance and reduce capacity requirements by capturing only the data that has actually changed, thereby minimizing the amount of data backed up and stored. Capacity savings come from recognizing redundant data and avoiding the storage of multiple copies of it.
The data coalescence process examines data at a specified unit of granularity to identify redundancies, indexes common units of data and stores only the unique data that remains. Specific approaches vary based on several factors, such as the level of granularity. Finer granularity (i.e., smaller data units) uncovers greater commonality and therefore yields greater storage savings. Indexing at the file level, for example, eliminates storing multiple backups of the same file, but adding one word to a 1MB Word document is a change that forces the entire file to be saved again. Indexing at the subfile or block level stores only the portion of the file with changed data. Greater granularity, however, requires more indexing, which can affect performance when accessing information.
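The block-level indexing described above can be sketched as follows. This is a minimal illustration, not any vendor's actual implementation: data is split into fixed-size blocks, each block is hashed, and only blocks whose hash is not already in the index are stored. The block size and data values are arbitrary choices for the example.

```python
import hashlib

def dedupe_blocks(data: bytes, block_size: int, store: dict) -> list:
    """Split data into fixed-size blocks and store only blocks whose
    hash is not already indexed. Returns the list of block hashes
    (a 'recipe') needed to reconstruct the data later."""
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:        # unique block: store it once
            store[digest] = block
        recipe.append(digest)          # known block: index only
    return recipe

# A small edit to a large file re-stores only the changed block.
store = {}
original = b"A" * 4096 + b"B" * 4096
edited   = b"A" * 4096 + b"C" * 4096   # only the second block changed
r1 = dedupe_blocks(original, 4096, store)
r2 = dedupe_blocks(edited, 4096, store)
# store now holds 3 unique blocks rather than 4
```

With file-level indexing, the edit would have forced a second full copy; here the unchanged first block is stored once and shared by both recipes.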
Another consideration is the use of fixed- vs. variable-length data elements. A fixed-length approach may be fine for structured data such as databases, but with unstructured data, many commonalities may be missed. Consider the files shown in the figure "Fixed-block commonality factoring" at right. Using a fixed-length approach, no commonality would be discovered and each file would have to be stored in its entirety.
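The fixed- vs. variable-length trade-off can be demonstrated with a toy comparison. The sketch below is illustrative only: the window size and boundary mask are arbitrary, and real products use purpose-built rolling hashes (e.g., Rabin fingerprints) rather than recomputing a digest per window. Inserting a few bytes at the front of a file shifts every fixed-length block, so none match, while content-defined boundaries realign after the insertion.

```python
import hashlib

def fixed_chunks(data: bytes, size: int) -> list:
    """Fixed-length blocking: boundaries at fixed byte offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data: bytes, window: int = 4, mask: int = 0x3F) -> list:
    """Variable-length (content-defined) chunking: a boundary is cut
    wherever a hash of the trailing byte window hits a target value,
    so boundaries follow the content, not the byte offsets."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.md5(data[i - window:i]).digest()[:2], "big")
        if h & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

# Deterministic high-entropy test data; file2 is file1 with 9 bytes inserted.
file1 = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
file2 = b"NEW DATA " + file1

shared_fixed = set(fixed_chunks(file1, 64)) & set(fixed_chunks(file2, 64))
shared_cdc = set(content_chunks(file1)) & set(content_chunks(file2))
```

Fixed blocking finds no common chunks after the insertion; content-defined chunking recovers most of them, which is why it suits unstructured data.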
Regardless of the level of factoring, the true benefits of a data-reduction technology are realized when aggregated across multiple servers. A single backed-up system has significant redundancy, but when many systems are backed up to a common server, there's likely to be an even greater potential for redundancy and, therefore, for data reduction. Imagine coalescing backup data down to one-tenth or one-twenty-fifth of its original total size. Properly deployed, this technology can make disk cheaper than tape, turning the tape vs. disk TCO comparison on its head.
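The cross-server effect can be sketched with synthetic numbers (the block counts and sizes below are invented for illustration, not measured figures). When 25 servers sharing a common OS image back up to one shared block store, the common blocks are stored once, and the reduction ratio grows with the number of servers.

```python
import hashlib

def global_dedupe(servers: list) -> float:
    """Back up several servers into one shared block store and
    return the reduction ratio: raw bytes in / unique bytes stored."""
    store, raw = {}, 0
    for blocks in servers:
        for block in blocks:
            raw += len(block)
            store.setdefault(hashlib.sha256(block).digest(), block)
    stored = sum(len(b) for b in store.values())
    return raw / stored

# Hypothetical fleet: 25 servers share a 10-block OS image and each
# contributes one unique 4KB block of its own.
os_image = [bytes([i]) * 4096 for i in range(10)]
servers = [os_image + [bytes([200 + n]) * 4096] for n in range(25)]
ratio = global_dedupe(servers)   # 275 raw blocks, 35 unique blocks
```

Within any single server here there is no redundancy at all; the savings appear only when the index is shared across the fleet, which is the article's point about aggregation.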
This was first published in January 2006