Storage Magazine


Object-based backup
by Jerome M. Wendt
Issue: Jul 2005

Hashing algorithms
Object-based storage products that store the entire file as an object may give users a choice of hashing algorithms to create each file's digital signature. To choose the correct hashing algorithm, users should understand what a "hash" is, how it works and why one might be better for a particular environment.

A hash is a cryptographic function that takes an input of any length and produces an output of a fixed length. For instance, a common hashing algorithm used in object-based storage products is MD5, which produces a fixed-size digital signature of 128 bits. Hashing algorithms are particularly appealing for creating digital signatures because two different inputs are extremely unlikely to produce the same output, so the output is effectively unique to the input; it's also considered computationally infeasible to reconstruct the input from the output.
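
The fixed-length property can be illustrated with a minimal Python sketch that uses the standard-library hashlib module. The sample inputs and the helper name md5_signature are illustrative assumptions, not part of any product discussed here.

```python
import hashlib


def md5_signature(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


short_input = b"a"
long_input = b"a" * 1_000_000  # one million bytes

for label, data in (("short", short_input), ("long", long_input)):
    signature = md5_signature(data)
    # Each hexadecimal character encodes 4 bits, so 32 characters = 128 bits.
    print(label, len(data), "bytes ->", len(signature) * 4, "bits:", signature)
```

Both inputs yield a 32-character hexadecimal string, even though one input is a million times longer than the other.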

The primary differences between the types of algorithms used are:

  • Security of the hash
  • Possibility of "collisions"
  • Speed in generating the digital signature

SHA1, which generates a 160-bit digital signature, is another hashing algorithm used frequently by object-based products. Although both MD5 and SHA1 are widely used, SHA1 is considered more secure than MD5 because its longer, 160-bit digital signature makes it a much harder hash to break. The longer signature also sharply reduces the possibility of a collision, in which two different files generate the same digital signature; collisions are theoretically possible with any fixed-length hash, but they're far less likely with 160 bits than with MD5's 128. The tradeoff is speed: SHA1 does more work to generate its larger digital signature, so it generally runs slower than MD5.
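
The effect of signature length on accidental collisions can be estimated with the birthday approximation. The sketch below is a rough calculation that assumes hash outputs behave like uniformly random values; it says nothing about deliberate attacks on either algorithm. The function name is an illustrative assumption.

```python
import hashlib
import math


def objects_for_half_collision_chance(bits: int) -> float:
    """Approximate number of random hashes needed for a 50% chance
    that at least two of them collide (birthday approximation)."""
    return math.sqrt(2 * math.log(2)) * 2 ** (bits / 2)


for name in ("md5", "sha1"):
    # digest_size is reported in bytes; convert to bits.
    bits = hashlib.new(name).digest_size * 8
    estimate = objects_for_half_collision_chance(bits)
    print("%s: %d bits, about %.2e objects" % (name, bits, estimate))
```

Under this approximation, the extra 32 bits in a SHA1 signature raise the number of objects needed for a likely accidental collision by a factor of 2^16.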

Content management
Object-based content-management products offer the following benefits:

  • Data preservation and consolidation
  • Capacity optimization
  • Regulatory compliance
  • Fast, random access to data
  • Constant data availability

OBS products for content management differ architecturally from OBS products focused on backup. Content-management products preserve the user's original data for a long period of time, make sure it's accessible when needed and ensure that organizations remain compliant. While storage administrators can set policies for individual objects, vendors say that most organizations set up a default policy for all files stored in a specific directory. For example, Archivas suggests admins go through the following preparatory steps for a new application:

  1. Create a directory on the ArC server for the application's files.
  2. Within the directory, create policies that get assigned to all files stored in that directory, such as retention period or what hashing algorithm is used to create the digital signature.
  3. Mount the directory and present it to the app.
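
The idea of a default policy that applies to every file in a directory can be sketched in Python as follows. This is not the Archivas ArC interface; the class, the directory path, the retention values and the lookup function are all hypothetical and serve only to show how a file might inherit the policy of the directory that contains it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DirectoryPolicy:
    retention_days: int
    hash_algorithm: str


# Hypothetical fallback used when no directory-level policy applies.
DEFAULT_POLICY = DirectoryPolicy(retention_days=365, hash_algorithm="md5")

# Hypothetical policy table: one entry per application directory.
policies = {
    PurePosixPath("/archive/xrays"): DirectoryPolicy(
        retention_days=7 * 365, hash_algorithm="sha1"
    ),
}


def policy_for(file_path: str) -> DirectoryPolicy:
    """Return the policy of the nearest ancestor directory that defines one."""
    for parent in PurePosixPath(file_path).parents:
        if parent in policies:
            return policies[parent]
    return DEFAULT_POLICY


print(policy_for("/archive/xrays/2005/scan001.img"))
print(policy_for("/scratch/notes.txt"))
```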

Unlike products intended for backup, content-management products don't change or break apart the incoming file to store it in smaller blocks. They store the file as the object--either in its native form or encrypted/compressed as products like Permabit allow--and then use hashing algorithms to check whether the file is unique compared with the files already in the repository. During this analysis stage, the product also creates the metadata associated with the file object.
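
A minimal sketch of this whole-file approach, assuming an in-memory dictionary stands in for the repository, might look like the following. The ObjectStore class and its put method are hypothetical names and do not reflect any vendor's implementation.

```python
import hashlib
from typing import Dict, Tuple


class ObjectStore:
    """Toy repository that stores each unique file once, keyed by its signature."""

    def __init__(self, algorithm: str = "sha1") -> None:
        self.algorithm = algorithm
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> Tuple[str, bool]:
        """Store the whole file as one object.

        Returns the signature and True if the file was new, or False if an
        identical file was already in the repository.
        """
        signature = hashlib.new(self.algorithm, content).hexdigest()
        is_new = signature not in self.objects
        if is_new:
            self.objects[signature] = content
        return signature, is_new


store = ObjectStore()
print(store.put(b"quarterly report"))
print(store.put(b"quarterly report"))  # same content, so nothing new is stored
```

The file is never split into blocks; the signature is computed over the entire content, so two files are treated as duplicates only if they match byte for byte.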

The metadata includes traditional file attributes such as file ownership; creation, modification and access dates; and user and group access rights. It can also include additional attributes such as which hashing algorithm should be used to create the object's digital signature, the retention period, backup requirements and the time of the last successful replication or backup.
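
One way to picture such a record is as a simple data structure. The Python sketch below groups the attributes named above; the field names, types and the retention_expires helper are illustrative assumptions rather than any product's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ObjectMetadata:
    # Traditional file attributes
    owner: str
    group: str
    permissions: int  # e.g. 0o640 for user and group access rights
    created: datetime
    modified: datetime
    accessed: datetime
    # Attributes specific to the stored object
    hash_algorithm: str
    signature: str
    retention_days: int
    backup_required: bool = True
    last_replicated: Optional[datetime] = None
    last_backup: Optional[datetime] = None

    def retention_expires(self) -> datetime:
        """Date after which the retention period no longer applies."""
        return self.created + timedelta(days=self.retention_days)
```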

EMC's Centera Seek software allows storage admins to search and retrieve files from all of the applications on their EMC Centera. For example, an administrator can retrieve all documents from John Doe that were created between May 1 and May 31 with keywords such as "change," "alteration" or "conversation," regardless of which app was used to create the specific file.
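
The kind of query described above can be sketched as a filter over stored records. This is not the Centera Seek API; the ArchivedDocument class, the search function and its parameters are hypothetical, and a real product would search an index rather than scan every record.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class ArchivedDocument:
    author: str
    created: date
    application: str
    text: str


def search(
    documents: Iterable[ArchivedDocument],
    author: str,
    start: date,
    end: date,
    keywords: Iterable[str],
) -> List[ArchivedDocument]:
    """Return documents by author, created within [start, end], that contain
    at least one keyword, regardless of the originating application."""
    wanted = [keyword.lower() for keyword in keywords]
    return [
        doc
        for doc in documents
        if doc.author == author
        and start <= doc.created <= end
        and any(keyword in doc.text.lower() for keyword in wanted)
    ]
```

A call such as search(docs, "John Doe", date(2005, 5, 1), date(2005, 5, 31), ["change", "alteration", "conversation"]) would correspond to the example in the text.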

Once files are secured, benefits like data consolidation and capacity optimization emerge. Users will see the most noticeable improvement with e-mail apps such as Exchange and Notes because an attachment sent to multiple users can be stored as a single instance. This reduces the amount of storage and overhead on the e-mail server while allowing the organization to meet compliance regulations.
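
The capacity effect of single-instance storage can be estimated with a short calculation. The Python sketch below uses made-up figures (a 1MB attachment sent to 50 users) and a hypothetical storage_usage function; actual savings depend on how often attachments are duplicated in a given mail system.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def storage_usage(deliveries: Iterable[Tuple[str, bytes]]) -> Tuple[int, int]:
    """Return (bytes used if every recipient keeps a copy,
    bytes used if each unique attachment is stored once)."""
    per_recipient_total = 0
    unique_sizes: Dict[str, int] = {}
    for _recipient, attachment in deliveries:
        per_recipient_total += len(attachment)
        # Identical attachments share a signature, so they are counted once.
        unique_sizes[hashlib.sha1(attachment).hexdigest()] = len(attachment)
    return per_recipient_total, sum(unique_sizes.values())


attachment = b"x" * 1_000_000  # illustrative 1MB attachment
deliveries = [("user%d" % i, attachment) for i in range(50)]
full_copies, single_instance = storage_usage(deliveries)
print("per-recipient copies:", full_copies, "bytes")
print("single instance:", single_instance, "bytes")
```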

While products like EMC's Centera, HP's Reference Information Storage System (RISS) and Permabit's Permeon present a standard NFS or CIFS mount point to the e-mail server, they add a new NAS device to the storage environment. With organizations moving toward global name spaces and standardized NAS interfaces, the last thing the storage or network group may want to see is another specialized NAS product added to the environment. There may also be other considerations. For example, with Centera, users will need to ensure their e-mail software has the necessary APIs to communicate with Centera; they'll also likely need to purchase and maintain that interface as part of their ongoing e-mail management.

Most of these OBS content-management products need to provide availability 24x7 and deliver acceptable performance. To meet these requirements, vendors are primarily using off-the-shelf Intel servers running a Linux operating system in some type of highly available configuration--clustered or N+1--with RAIDed ATA drives on the back end. They generally have their own software running on each server that constantly monitors the integrity of the data, and will either repair the data or copy it to another node if an error is detected.
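
The integrity monitoring described above can be sketched as a periodic check that recomputes each object's signature and repairs a damaged copy from an intact replica. The Python example below is a simplified illustration: nodes are plain dictionaries, SHA1 is assumed as the signature algorithm, and the function names are hypothetical rather than taken from any vendor's software.

```python
import hashlib
from typing import Dict, List

Node = Dict[str, bytes]  # maps an object's signature to the copy held on one node


def is_intact(signature: str, content: bytes) -> bool:
    """A copy is intact if its recomputed signature matches the stored one."""
    return hashlib.sha1(content).hexdigest() == signature


def scrub(nodes: List[Node]) -> int:
    """Check every copy on every node and overwrite corrupt copies with an
    intact replica from another node. Returns the number of repairs made."""
    repairs = 0
    for node in nodes:
        for signature, content in list(node.items()):
            if is_intact(signature, content):
                continue
            for other in nodes:
                replica = other.get(signature)
                if (
                    other is not node
                    and replica is not None
                    and is_intact(signature, replica)
                ):
                    node[signature] = replica
                    repairs += 1
                    break
    return repairs
```

If no intact replica exists, this sketch leaves the corrupt copy in place; a real system would need to report that condition.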

Joshua Freeman, IT director at the New York Botanical Garden, uses Archivas ArC because it's hardware-agnostic and built on open-source code. He also found that it gave him so much additional low-cost capacity that he was able to use it as both an archive server and a file server. This allowed Freeman's users to store and stage items such as field notes or images of plant specimens prior to their eventual placement in the Botanical Garden's main object database. Likewise, CTRC's Luter hopes to use his ArC configuration to automate the flow of X-rays from Tier-1 storage to lower cost storage, something his staff currently does manually.

Freeman, Luter and Power Integrations' Degner reflect the growing interest users have in better managing their storage and data. OBS products that minimize or eliminate duplicate data while taking advantage of low-cost storage technologies in highly available configurations are becoming more popular. These products will become even more useful when features such as replication and automated workflow are added.