This article can also be found in the Premium Editorial Download "Storage magazine: Who owns storage in your organization?"
Imagine if each day you could parse every tiny segment of data in the enterprise and only back up or archive those parts that have truly changed, rather than backing up entire files, databases and objects. Object-based backup--an emerging field of storage technology that only a handful of companies are focusing on--introduces a new medium for data protection and retention. It presents a software infrastructure that reinvents the way we think about and visualize production data backup and archive activities. And it just may be the foundation for a strategy to use inexpensive commodity servers, disk arrays and IP networking far more effectively.
An object-based system can determine if any changes to a file or its attributes have occurred since it was last backed up. If modifications are detected, only the changes are backed up--not the entire file. This can eliminate the unnecessary copying of large amounts of data, thus significantly speeding up backups and reducing the amount of storage space required.
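The detection step described above can be illustrated with a minimal sketch. It assumes a backup catalog that records each file's size, modification time and permission bits at the last backup; the names `FileSignature`, `signature_of` and `changed_since_backup` are hypothetical and do not come from any particular product. It only answers the question "has this file or its attributes changed?" and does not identify which parts of the file differ.

```python
import os
from typing import NamedTuple, Optional


class FileSignature(NamedTuple):
    """Attributes recorded for a file at its last backup (hypothetical layout)."""
    size: int       # file length in bytes
    mtime_ns: int   # last modification time, in nanoseconds
    mode: int       # file type and permission bits


def signature_of(path: str) -> FileSignature:
    """Read the file's current attributes from the file system."""
    st = os.stat(path)
    return FileSignature(st.st_size, st.st_mtime_ns, st.st_mode)


def changed_since_backup(path: str, last: Optional[FileSignature]) -> bool:
    """Return True if the file is new or any recorded attribute differs."""
    if last is None:
        # No catalog entry: the file has never been backed up.
        return True
    return signature_of(path) != last
```

A check like this is cheap because it never reads the file's contents. Its weakness is that it trusts the file system's attributes, so a real system would likely pair it with a content-based comparison.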
Hashing and storage
Hashing algorithms--one of the key components of object-based storage--were developed in academic computing circles decades ago and are widely used in computer security, encryption and authentication technologies. With hashing, a string of data is analyzed to produce a unique value, or signature, that identifies the original segment of data. In security systems, hashing algorithms are commonly used
In object-based storage, hashing algorithms are used in a similar way to identify the segments of data stored in file systems. Incoming data files are parsed into uniformly sized objects and a hash value is calculated for each one. Each result is then compared to the entries in an existing hash index, which is essentially a database that holds the hash values--or metadata--for every data segment already stored. If an identical hash value already exists in the index, the piece of data it represents isn't copied (backed up) again. If the new hash value doesn't match any in the hash index, it's added to the index and the associated data object is copied into the object-based environment.
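The parse, hash and compare cycle described above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: the article names neither a hash algorithm nor an object size, so SHA-256 and 4,096-byte objects are assumed here, and `ObjectStore`, `split_into_objects` and `CHUNK_SIZE` are hypothetical names. The hash index is modeled as an in-memory dictionary rather than a database.

```python
import hashlib
from typing import Dict, Iterator, List

CHUNK_SIZE = 4096  # assumed object size; the article does not specify one


def split_into_objects(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Parse incoming data into uniformly sized objects (the last may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class ObjectStore:
    """Toy object-based store: a hash index mapping hash value -> data object."""

    def __init__(self) -> None:
        self.index: Dict[str, bytes] = {}

    def backup(self, data: bytes, size: int = CHUNK_SIZE) -> List[str]:
        """Store only objects whose hash is new; return the list of hashes
        needed to rebuild the data later."""
        recipe: List[str] = []
        for obj in split_into_objects(data, size):
            digest = hashlib.sha256(obj).hexdigest()  # assumed algorithm
            if digest not in self.index:
                # New hash value: add it to the index and copy the object.
                self.index[digest] = obj
            # Known hash value: nothing is copied, only referenced.
            recipe.append(digest)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Reassemble the original data from its list of hashes."""
        return b"".join(self.index[digest] for digest in recipe)


store = ObjectStore()
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096   # only the last object changed
store.backup(monday)
store.backup(tuesday)
print(len(store.index))  # 3 objects stored instead of 6
```

In this run the two unchanged objects are stored once and shared by both backups, which is the saving the article describes. Fixed-size objects keep the sketch simple, but an insertion near the start of a file would shift every later object boundary and make them all look new.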
[Figure: Traditional backup is storage-intensive]
This was first published in May 2004