This article can also be found in the Premium Editorial Download "Storage magazine: How HP is reloading its storage strategy."
Object-based backup products offer the following benefits:
- Reduced backup and restore times
- Smaller data stores
- Reduced bandwidth for offsite replication
- Elimination of tape
DD200 Restorer and Axion are the only two object-based products aimed squarely at backup. DD200 Restorer stores the incoming backup data as a file and then, in the background, breaks the file into individual 4K or 8K segments and creates a unique data object for each segment. The product then checks whether that object already exists. If it doesn't, DD200 Restorer stores the new object and records its existence in its database. If it does, DD200 Restorer records the object's occurrence and its logical associations with other objects in its database, but doesn't store the object again.
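The segment-and-check cycle described above can be illustrated with a short Python sketch. It is not Data Domain's implementation: the `SegmentStore` class, the SHA-256 hash and the in-memory dictionaries are all assumptions chosen for illustration, and only the 4K segment size comes from the description.

```python
import hashlib

SEGMENT_SIZE = 4096  # 4K segments; the article mentions 4K or 8K


class SegmentStore:
    """Toy in-memory store that keeps one copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # digest -> segment bytes (stored once)
        self.recipes = {}   # backup name -> ordered list of digests

    def ingest(self, name, data):
        """Split data into fixed-size segments and store only the new ones.

        Returns the number of segments that had to be stored.
        """
        recipe = []
        new_segments = 0
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            digest = hashlib.sha256(segment).hexdigest()
            if digest not in self.segments:
                self.segments[digest] = segment  # new object: store it
                new_segments += 1
            recipe.append(digest)  # always record the occurrence
        self.recipes[name] = recipe
        return new_segments

    def restore(self, name):
        """Rebuild a backup by concatenating its segments in order."""
        return b"".join(self.segments[d] for d in self.recipes[name])


store = SegmentStore()
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096
first = store.ingest("monday", monday)    # stores the "A" and "B" segments
second = store.ingest("tuesday", tuesday)  # stores only the "C" segment
```

In this toy run, the second backup adds one segment to the store even though it contains three, which is the effect behind the smaller data stores listed among the benefits.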
The Axion system consists of an industry-standard server running Avamar's software and a set of intelligent agents deployed on each backup client. Users also have the option of purchasing Avamar's software and deploying it on their own compliant platforms. Avamar's approach differs from DD200 Restorer's in that its agent on the host server breaks apart the file's blocks and creates a digital signature by running a hash against each block of data. The agent then sends the digital signature to the backup repository to check whether the block is already stored there. If the repository determines that the signature is new, it signals the agent to send the entire block of data from the server to the backup repository. The primary advantage this approach has over DD200 Restorer is that it minimizes the amount of data moved across the network from the server to the repository.
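A rough Python sketch shows why hashing on the host cuts network traffic. The `Agent` and `Repository` classes, the 4K block size and the SHA-256 signature are hypothetical stand-ins, not Avamar's actual protocol, and a direct method call takes the place of the network.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class Repository:
    """Stand-in for the central backup repository."""

    def __init__(self):
        self.blocks = {}  # signature -> block bytes

    def has_block(self, signature):
        return signature in self.blocks

    def put_block(self, signature, block):
        self.blocks[signature] = block


class Agent:
    """Stand-in for the agent running on the backup client."""

    def __init__(self, repository):
        self.repository = repository
        self.bytes_sent = 0  # simulated network traffic

    def backup(self, data):
        manifest = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            signature = hashlib.sha256(block).digest()
            self.bytes_sent += len(signature)  # the signature always travels
            if not self.repository.has_block(signature):
                # Only blocks the repository has never seen are sent in full.
                self.repository.put_block(signature, block)
                self.bytes_sent += len(block)
            manifest.append(signature)
        return manifest


repo = Repository()
agent = Agent(repo)
payload = b"x" * 4096 * 3 + b"y" * 4096  # four blocks, two of them unique

agent.backup(payload)
first_run = agent.bytes_sent               # signatures plus two full blocks
agent.backup(payload)
second_run = agent.bytes_sent - first_run  # signatures only
```

Under these assumptions the first backup moves 8,320 bytes and the repeat backup moves 128 bytes, because only the 32-byte signatures cross the wire once every block is known to the repository.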
Because neither of these products needs to manage large amounts of meta data, they can focus on and optimize data reduction and backup times. Frank Slootman, president and CEO of Data Domain, finds optimizing data reduction secondary to lowering backup times. "Users find data reduction interesting, but the product needs to be able to reduce the backup and restore times first," says Slootman.
Steve Degner, an IT manager at Power Integrations Inc. in San Jose, CA, was one of the early adopters of this technology. Degner found that DD200 Restorer reduced the time for his full backups from three days to one-and-a-half days and decreased his restore window from 14 hours to four hours.
From a network connectivity standpoint, Axion and DD200 Restorer use existing Ethernet IP LANs for communications. However, Avamar uses a secure SSL TCP/IP socket to send the backup data, while DD200 Restorer presents a standard NFS/CIFS interface as a target to the backup software.
The major difference--and it's a big one--between Axion and DD200 Restorer is how they interact with existing backup software. Avamar requires users to either replace their existing backup software agent with the Axion agent or to deploy Avamar's agent in addition to their existing backup software agent to work with Axion. The rationale behind this architecture is to compress the backup traffic at the host level, reducing the amount of network traffic.
"Avamar wants customers to abandon use of their backup software and tape libraries--solutions that took time, money and resources to put into place for operational backup and recovery," wrote Tony Asaro, a senior analyst at the Milford, MA-based Enterprise Strategy Group (ESG), in a recent report on Avamar. He praises the Avamar technology, but advises users to implement it in small doses, at the department level or at a remote office, before rolling it out to an entire storage environment.
Users who wish to maintain their existing backup software and use Axion will need to create two backup copies, one using their existing backup software and one using Axion. However, this approach increases network traffic, introduces more complexity into the environment and should only be considered as a short-term, stop-gap measure until the company standardizes its backup processes on Avamar.
Users with existing backup software will find DD200 Restorer a more palatable alternative. Data Domain's DD200 Restorer appliance works with backup applications such as CommVault Inc.'s Galaxy, EMC's Legato NetWorker and Veritas Software Corp.'s NetBackup and only requires backup administrators to redirect the backup output to the DD200 appliance.
Avamar and Data Domain can replicate data offsite once it's stored and compressed in a central repository, cutting down on the required bandwidth and time to transport the data. Data Domain reports that it has seen instances where bandwidth requirements for offsite replication are reduced to one-tenth that of a standard tape drive.
Object-based content-management products offer the following benefits:
- Data preservation and consolidation
- Capacity optimization
- Regulatory compliance
- Fast, random access to data
- Constant data availability
OBS products for content management differ architecturally from OBS products focused on backup. Content-management products preserve the user's original data for a long period of time, make sure it's accessible when needed and ensure that organizations remain compliant. While storage administrators can set policies for individual objects, vendors say that most organizations set up a default policy for all files stored in a specific directory. For example, Archivas suggests admins go through the following preparatory steps for a new application:
- Create a directory on the ArC server for the application's files.
- Within the directory, create policies that get assigned to all files stored in that directory, such as retention period or what hashing algorithm is used to create the digital signature.
- Mount the directory and present it to the app.
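The idea of a directory-level default policy can be sketched in a few lines of Python. This is not the Archivas ArC interface: the `Policy` fields mirror the two examples named above, while the directory paths, retention periods and the `policy_for` helper are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    retention_days: int
    hash_algorithm: str


# Hypothetical policy table: one default policy per application directory.
DIRECTORY_POLICIES = {
    PurePosixPath("/archive/xrays"): Policy(retention_days=7 * 365,
                                            hash_algorithm="sha256"),
    PurePosixPath("/archive/email"): Policy(retention_days=3 * 365,
                                            hash_algorithm="md5"),
}


def policy_for(path):
    """Return the policy of the nearest ancestor directory that has one."""
    for parent in PurePosixPath(path).parents:
        if parent in DIRECTORY_POLICIES:
            return DIRECTORY_POLICIES[parent]
    raise LookupError(f"no policy covers {path}")


xray_policy = policy_for("/archive/xrays/2005/scan01.dcm")
```

Every file written under the directory inherits its policy without any per-file configuration, which is why vendors report that most organizations stop at directory-level defaults.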
Unlike products intended for backup, content-management products don't change or break apart the incoming file to store it in smaller blocks. They store the file as the object--either in its native form or encrypted/compressed as products like Permabit allow--and then use hashing algorithms to analyze the file for uniqueness vs. other files already in its repository. During this analysis stage, the product's algorithm also creates the meta data associated with the file object.
The meta data includes traditional file attributes such as file ownership; creation, modification and access dates; and user and group access. It can also include additional attributes such as which hashing algorithm should be used to create the object's digital signature, retention period, backup requirements and last successful replication or backup.
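As a minimal sketch of whole-file object storage, the Python below hashes each file in one piece, keeps a single copy of identical content and attaches a meta data record to the object. The `ContentStore` and `ObjectMetadata` names, the chosen fields and the default values are assumptions for illustration and do not describe any vendor's product.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    hash_algorithm: str
    retention_days: int
    # Every (file name, owner) pair that refers to this object.
    references: list = field(default_factory=list)


class ContentStore:
    """Toy store that treats each whole file as one object."""

    def __init__(self, hash_algorithm="sha256", retention_days=365):
        self.hash_algorithm = hash_algorithm
        self.retention_days = retention_days
        self.objects = {}   # signature -> file content, stored once
        self.metadata = {}  # signature -> ObjectMetadata

    def put(self, name, content, owner):
        """Store a file and return its digital signature."""
        signature = hashlib.new(self.hash_algorithm, content).hexdigest()
        if signature not in self.objects:
            # First time this content is seen: keep it and create meta data.
            self.objects[signature] = content
            self.metadata[signature] = ObjectMetadata(
                self.hash_algorithm, self.retention_days)
        self.metadata[signature].references.append((name, owner))
        return signature


store = ContentStore()
sig_a = store.put("alice/report.pdf", b"quarterly report", "alice")
sig_b = store.put("bob/report.pdf", b"quarterly report", "bob")
```

Both calls return the same signature, so the store holds one object with two references. The file is never split into blocks; uniqueness is decided for the file as a whole.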
EMC's Centera Seek software allows storage admins to search and retrieve files from all of the applications on their EMC Centera. For example, an administrator can retrieve all documents from John Doe that were created between May 1 and May 31 with keywords such as "change," "alteration" or "conversation," regardless of which app was used to create the specific file.
Once files are secured, benefits like data consolidation and capacity optimization emerge. Users will see the most noticeable improvement with e-mail apps such as Exchange and Notes because they allow a single instance store of the same attachment sent to multiple users. This reduces the amount of storage and overhead on the e-mail server while allowing the organization to meet compliance regulations.
While products like EMC's Centera, HP's Reference Information Storage System (RISS) and Permabit's Permeon present a standard NFS or CIFS mount point to the e-mail server, they add a new NAS device to the storage environment. With organizations moving toward global name spaces and standardized NAS interfaces, the last thing the storage or network group may want to see is another specialized NAS product added to the environment. There may also be other considerations. For example, with Centera, users will need to ensure their e-mail software has the necessary APIs to communicate with Centera; they'll also likely need to purchase and maintain that interface as part of their ongoing e-mail management.
Most of these OBS content-management products need to provide availability 24x7 and deliver acceptable performance. To achieve these requirements, vendors are primarily using off-the-shelf Intel servers running a Linux operating system in some type of highly available configuration--clustered or N+1--with RAIDed ATA drives in the background. They generally have their own software running on each server that constantly monitors the integrity of the data, and will either repair or copy the data to another node if an error is detected.
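The monitor-and-repair behavior can be approximated with a short Python sketch. It assumes each node is a simple dictionary of objects and that a SHA-256 digest was recorded when each object was stored; the `scrub` function and its data layout are hypothetical and far simpler than a clustered product.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


def scrub(nodes, expected):
    """Check every copy against its recorded digest and repair bad copies.

    nodes: list of dicts mapping object id -> bytes, one dict per node.
    expected: dict mapping object id -> digest recorded at ingest time.
    Returns a list of (node index, object id) pairs that were repaired.
    """
    repaired = []
    for object_id, want in expected.items():
        # Find any copy whose content still matches the recorded digest.
        good = next((node[object_id] for node in nodes
                     if object_id in node and digest(node[object_id]) == want),
                    None)
        if good is None:
            continue  # no healthy copy left; a real system would raise an alert
        for index, node in enumerate(nodes):
            if object_id in node and digest(node[object_id]) != want:
                node[object_id] = good  # overwrite the damaged copy
                repaired.append((index, object_id))
    return repaired


nodes = [{"obj1": b"field notes"}, {"obj1": b"field notes"}]
expected = {"obj1": digest(b"field notes")}
nodes[1]["obj1"] = b"field n0tes"  # simulate silent corruption on node 1
fixed = scrub(nodes, expected)
```

After the call, `fixed` lists the damaged copy on node 1 and that copy matches the healthy one again. The same digest that identifies an object for duplicate detection doubles as its integrity check.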
Joshua Freeman, IT director at the New York Botanical Garden, uses Archivas ArC because it's hardware-agnostic and built on open-source code. He also found that it gave him so much additional low-cost capacity that he was able to use it as both an archive server and a file server. This allowed Freeman's users to store and stage items such as field notes or images of plant specimens prior to their eventual placement in the Botanical Garden's main object database. Likewise, CTRC's Luter hopes to use his ArC configuration to automate the flow of X-rays from Tier-1 storage to lower-cost storage, something his staff does manually.
Freeman, Luter and Power Integrations' Degner reflect the growing interest users have in better managing their storage and data. OBS products that minimize or eliminate duplicate data while taking advantage of low-cost storage technologies in highly available configurations are becoming more popular. These products will become even more useful when features such as replication and automated workflow are added.
This was first published in July 2005