Imagine if each day you could parse every tiny segment of data in the enterprise and back up or archive only those parts that have truly changed, rather than backing up entire files, databases and objects. Object-based backup--an emerging field of storage technology that only a handful of companies are focusing on--introduces a new approach to data protection and retention. It presents a software infrastructure that reinvents the way we think about and visualize production data backup and archive activities. And it just may be the foundation for a strategy to use inexpensive commodity servers, disk arrays and IP networking far more effectively.
An object-based system can determine if any changes to a file or its attributes have occurred since it was last backed up. If modifications are detected, only the changes are backed up--not the entire file. This can eliminate the unnecessary copying of large amounts of data, thus significantly speeding up backups and reducing the amount of storage space required.
Hashing and storage
Hashing algorithms--one of the key components of object-based storage--were developed in academic computing circles decades ago and are widely used in computer security, encryption and authentication technologies. With hashing, a string of data is analyzed to produce a fixed-length value, or signature, that for practical purposes uniquely identifies the original segment of data. In security systems, hashing algorithms are commonly used alongside public-key encryption protocols and in password authentication, where a simple string of password data may be run through a 128-bit hashing algorithm to produce a signature. A hash is one-way, so the signature can't be decrypted; an interloper would instead have to guess inputs, and finding one that produces a given 128-bit signature could take as many as 2^128 guesses.
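As a rough illustration of the signature idea, the sketch below uses Python's standard hashlib module with MD5. MD5 is an assumption here, chosen only because it yields a 128-bit value like the one mentioned above; the article doesn't name a specific algorithm, and MD5 is no longer considered safe for security purposes. The function name signature and the sample strings are hypothetical.

```python
import hashlib


def signature(data: bytes) -> str:
    """Return the 128-bit MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Hypothetical sample inputs: two identical strings and one that differs
# by a single character.
first = signature(b"quarterly-report-v1")
repeat = signature(b"quarterly-report-v1")
changed = signature(b"quarterly-report-v2")

print(first == repeat)   # same input, same signature
print(first == changed)  # slightly different input, different signature
print(len(first) * 4)    # each hex character carries 4 bits
```

The same input always produces the same signature, which is the property that lets a backup system recognize data it has already seen.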
In object-based storage, hashing algorithms are used similarly to uniquely identify segments of data that are stored in file systems. Incoming data files are parsed into uniformly sized objects and a hash value is calculated for each one. The results are then compared to those in an existing hash index, which is essentially a database that holds the hash values--or metadata--for each data segment. If an identical hash value already exists in the index, then the piece of data it represents isn't copied (backed up). If the new hash value doesn't match any in the hash index, it's added to the index and the associated data object is copied into the object-based environment.
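The process just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the 4,096-byte segment size, the SHA-256 hash, and the names split_segments, back_up and restore are all hypothetical, and a single in-memory dictionary stands in for both the hash index and the object store.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, in bytes


def split_segments(data: bytes, size: int = SEGMENT_SIZE):
    """Yield consecutive fixed-size segments (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def back_up(data: bytes, hash_index: dict, size: int = SEGMENT_SIZE):
    """Copy only the segments whose hash is not already in hash_index.

    Returns the ordered list of segment hashes needed to rebuild the data,
    plus the number of segments that were newly copied.
    """
    recipe = []
    copied = 0
    for segment in split_segments(data, size):
        digest = hashlib.sha256(segment).hexdigest()
        if digest not in hash_index:
            # Unknown hash: add it to the index and store the segment.
            hash_index[digest] = segment
            copied += 1
        recipe.append(digest)
    return recipe, copied


def restore(recipe, hash_index: dict) -> bytes:
    """Rebuild the original data from its ordered list of segment hashes."""
    return b"".join(hash_index[digest] for digest in recipe)


index = {}
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
modified = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

recipe_1, copied_1 = back_up(original, index)  # every segment is new
recipe_2, copied_2 = back_up(modified, index)  # only the changed segment is new
print(copied_1, copied_2)
```

In the second backup only the middle segment is copied, because the other two hashes are already in the index. A production system would keep the index in a database and would have to decide how to treat the remote possibility of two different segments sharing a hash.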