This article can also be found in the Premium Editorial Download "Storage magazine: Who owns storage in your organization?"

Why is object-based backup so important?
What makes object-based backup particularly appealing is that it drastically reduces the amount of data that needs to be backed up: each unique piece of data is stored only once, no matter how many files or backup sets contain it. This data reduction clearly differentiates object-based backup vendors from vendors offering disk-only backup appliances.

Because object-based storage is relatively immature, products are available from only a handful of vendors. Given the technology's potential, however, more storage vendors are sure to follow (see "Object-based storage vendors and products"). Note that vendors often use different terminology to describe object-based backup, including reference data, commonality factoring, data coalescence, single-instance storage or content-addressable storage.

How it works
Vendor-specific software implementations commonly leverage the following hardware components in an object-based backup architecture:

  • Client interface. This could be a standalone client module, a networked file system interface (such as NFS or CIFS) or existing backup software clients.
  • Portal node. The portal node is typically a rack-mounted Intel server running a stripped-down version of Linux and specialized software that manages data and object-based processing (parsing, hashing, indexing, etc.).
  • Storage nodes. These are often rack-mounted servers containing high-density ATA or Serial ATA (SATA) disk drives. Usually, five or six storage nodes are clustered with a portal node. Storage nodes provide redundant storage for data and, in some cases, redundancy for meta data stores.
  • Gigabit Ethernet. These architectures use Gigabit Ethernet to connect clients to portals and storage nodes. Typically, Gigabit Ethernet switches interconnect the clustered portal and storage nodes to provide high-performance switched traffic among them.
These combined components form a disk-based storage environment in which every incoming data object is broken into subfile-level blocks, and the storage portal creates a hash-derived signature for each block. The signatures are then compared to the master index. If a signature doesn't exist in the index, the corresponding data segment is copied into the storage node environment. Segments whose signatures are already indexed are not copied again.
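The ingest path just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation. It assumes fixed 4 KiB subfile blocks, SHA-256 signatures and an in-memory dictionary standing in for the master index. The article specifies none of these, and real products may differ; some, for example, use variable-size blocks. `DedupStore`, `backup`, `restore` and the per-object "recipe" are hypothetical names.

```python
import hashlib

# Assumed block size; products may use other sizes or variable-size chunks.
BLOCK_SIZE = 4096


class DedupStore:
    """Toy single-instance store keyed by hash-derived signatures."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.master_index: dict[str, bytes] = {}  # signature -> stored segment
        self.bytes_stored = 0

    def backup(self, data: bytes) -> list[str]:
        """Break an object into subfile blocks and store only unseen ones.

        Returns the object's "recipe": the ordered signatures needed to
        reassemble it (a hypothetical structure for this sketch).
        """
        recipe = []
        for offset in range(0, len(data), self.block_size):
            segment = data[offset:offset + self.block_size]
            signature = hashlib.sha256(segment).hexdigest()
            if signature not in self.master_index:
                # Signature not in the master index: copy the segment in.
                self.master_index[signature] = segment
                self.bytes_stored += len(segment)
            # Duplicate or not, the recipe records a reference to the segment.
            recipe.append(signature)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble an object by looking up each signature in order."""
        return b"".join(self.master_index[sig] for sig in recipe)


store = DedupStore()
night1 = b"A" * 8192 + b"B" * 4096  # 3 blocks, two of them identical
night2 = b"A" * 8192 + b"C" * 4096  # only the last block changed
r1 = store.backup(night1)
r2 = store.backup(night2)
assert store.restore(r1) == night1 and store.restore(r2) == night2
print(store.bytes_stored)  # 12288 bytes stored for 24576 bytes backed up
```

In this example, two "nightly" backups total 24 KiB. Only three distinct 4 KiB blocks reach storage, which illustrates the data reduction that makes the approach attractive.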

Architecturally, the combined hardware components form a clustered storage environment where meta data (hash-derived index files) and data objects are stored and managed across multiple storage nodes. Some implementations use a checksum routine plus mirroring to ensure data integrity, while others use RAID technologies. To avoid the data loss that could occur if a single storage server or disk array failed, a RAID-type algorithm spreads data across the storage nodes in the cluster so that parity information protects all data in the cluster. Portal servers are also clustered to provide fault tolerance and load balancing for the front-end data processing function. In some deployments, data parsing, hashing and index processing is distributed across the portal and storage node clusters in a grid-computing style of workload distribution. The result is an intelligent storage cluster built from low-cost, high-density commodity hardware components.
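To illustrate how parity spread across storage nodes can survive a node failure, here is a hedged sketch of the simplest RAID-type scheme: single XOR parity, as in RAID 5. The article does not say which algorithm any vendor uses, so single parity is an assumption. The helper names `xor_bytes`, `stripe_with_parity` and `rebuild` are hypothetical. Each segment is split into equal fragments, one per data node, and an extra parity fragment goes to one more node.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(segment: bytes, data_nodes: int) -> list[bytes]:
    """Split a segment into `data_nodes` fragments plus one XOR parity fragment.

    The segment is zero-padded so it divides evenly; the caller keeps the
    original length so the padding can be stripped on restore.
    """
    frag_len = -(-len(segment) // data_nodes)  # ceiling division
    padded = segment.ljust(frag_len * data_nodes, b"\x00")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(data_nodes)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]  # one fragment per storage node


def rebuild(fragments: list[Optional[bytes]], length: int) -> bytes:
    """Recover the segment when at most one fragment (storage node) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one lost node")
    if missing:
        # XOR of all fragments, parity included, is zero, so the lost
        # fragment equals the XOR of the survivors.
        survivors = [f for f in fragments if f is not None]
        fragments = list(fragments)
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(fragments[:-1])[:length]  # drop parity, strip padding


# Four data fragments plus one parity fragment: a five-node cluster.
segment = b"object-based backup segment"
nodes = stripe_with_parity(segment, data_nodes=4)
nodes[2] = None  # simulate a failed storage node
assert rebuild(nodes, len(segment)) == segment
```

Single parity costs one extra node's worth of capacity and tolerates one failed node. With `data_nodes=1`, the "parity" fragment is simply a copy, which degenerates to the mirroring that some implementations use instead.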

Client access to the subsystems occurs either through standard NFS- or CIFS-mounted file systems or through a custom client software interface, depending on which vendor and implementation strategy best fit the client infrastructure. As with traditional backup systems, data is copied into the backup environment and transferred across IP networks into the object-based storage environment. Some object-based backup systems work with existing backup software tools, while others completely replace installed backup systems and their client applications (see "Object-based storage vendors and products").

Object-based backup has the potential to substantially reduce the cost and time required to transfer data to a remote location for disaster recovery, because only segments not already stored at the destination need to cross the network. These low bandwidth requirements may also enable distributed implementations, in which smaller satellite installations replicate data to other locations for disaster recovery purposes.
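The bandwidth argument can be illustrated with a hedged sketch. Assume each site keeps an index of hash-derived signatures. A replication pass then needs to send only the segments whose signatures the remote index lacks. The article does not describe any vendor's replication protocol. `signatures` and `replicate` are hypothetical helpers, and SHA-256 is an assumed hash function.

```python
import hashlib


def signatures(segments: list[bytes]) -> dict[str, bytes]:
    """Map each segment's hash-derived signature to the segment itself."""
    return {hashlib.sha256(s).hexdigest(): s for s in segments}


def replicate(local: dict[str, bytes], remote: dict[str, bytes]) -> int:
    """Copy to `remote` only the segments it lacks; return bytes sent."""
    sent = 0
    for sig, segment in local.items():
        if sig not in remote:
            remote[sig] = segment  # stands in for a transfer over the WAN
            sent += len(segment)
    return sent


local = signatures([b"a" * 4096, b"b" * 4096, b"c" * 4096])
remote = signatures([b"a" * 4096, b"b" * 4096])  # copied in an earlier cycle
print(replicate(local, remote))  # 4096: only the new segment is sent
print(replicate(local, remote))  # 0: nothing new on the second pass
```

In practice the sites must also exchange signature lists, which consumes some bandwidth. However, a 32-byte SHA-256 digest is far smaller than the segment it identifies.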

Object-based storage vendors and products

This was first published in May 2004
