This article can also be found in the Premium Editorial Download "Storage magazine: Who owns storage in your organization?"
To a backup software system, data currency indicates whether data has changed since the last incremental backup. If even a single file attribute has changed since the last backup, the entire file is backed up again. This is true of backup systems that use the incremental-forever methodology. The problem is compounded in environments that run a full backup daily, weekly or monthly in addition to differential or incremental backups. As a result, these environments store many redundant versions of files, even though the actual subfile-level changes to the data may be insignificant.
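File-level currency checking can be illustrated with a minimal Python sketch. It assumes a tool that compares only file size and modification time between runs; real products may also check archive bits, change times or checksums. The names `FileAttrs`, `snapshot` and `files_to_back_up` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAttrs:
    """The attributes this sketch compares between backup runs."""
    size: int
    mtime_ns: int


def snapshot(paths):
    """Record size and modification time (in nanoseconds) for each path."""
    attrs = {}
    for path in paths:
        st = os.stat(path)
        attrs[path] = FileAttrs(st.st_size, st.st_mtime_ns)
    return attrs


def files_to_back_up(previous, current):
    """Return, sorted, the paths that are new or whose attributes changed.

    Any difference selects the whole file for backup, however few bytes
    (if any) of its contents actually changed.
    """
    return sorted(path for path, attrs in current.items()
                  if previous.get(path) != attrs)
```

Updating only a file's timestamp, without altering a single byte of content, is enough to put the whole file back in the backup set.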
The volume of backup data stored can be extraordinary. Depending on the backup practices in place, the ratio of active to backup data may be as high as 1:25. "Traditional backup is storage-intensive" on this page shows ratios that are likely to occur with common backup scenarios. Specific ratios vary depending on site-specific data change rates, backup policies, archival retention periods and the underlying backup technologies.
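The arithmetic behind ratios like these can be sketched with a simple model. The Python function below is illustrative only: it assumes a hypothetical schedule of full backups plus file-level incrementals, where each incremental copies every file touched that day, and it ignores compression and any extra monthly or yearly fulls. The parameters are assumptions, not the figures behind "Traditional backup is storage-intensive."

```python
def backup_to_active_ratio(retained_fulls: int,
                           incrementals_per_full: int,
                           daily_touched_fraction: float) -> float:
    """Units of backup storage held per unit of active data.

    Model (hypothetical): each retained cycle holds one full copy of the
    active data plus its incrementals. A file-level incremental copies every
    touched file in full, so its size is the fraction of data in touched
    files, not the fraction of bytes that changed.
    """
    if not 0.0 <= daily_touched_fraction <= 1.0:
        raise ValueError("daily_touched_fraction must be between 0 and 1")
    per_cycle = 1.0 + incrementals_per_full * daily_touched_fraction
    return retained_fulls * per_cycle
```

For example, keeping 12 weekly fulls with six incrementals per week, and touching 5% of the data each day, gives roughly 15.6 units of backup storage per unit of active data. Because retention multiplies the whole cycle, keeping more fulls pushes the ratio up fastest.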
This redundancy exists because data currency is evaluated exclusively at the file level. Object-based backup vendors aim to reduce the ratio of primary (active) to secondary (backup) storage to 1:2. At that ratio, a 20TB object-based backup disk pool would be sufficient for an environment with 10TB of active data. Because the ratio is low, a large pool of
Where it can work best
Two common enterprise file types, e-mail file stores and databases, illustrate how the benefits of object-based backup can be realized. These tend to be large files that are backed up in their entirety whenever they change, even when the changes are minor. The problem is significant enough to have spawned a cottage industry of software tools.
The ubiquitous Microsoft Outlook personal folder files (.pst) residing on vast numbers of corporate file servers are a case in point. Each time Outlook runs, it opens and updates a personal folder file, regardless of whether any e-mail messages were added to or deleted from the file store. Because the application updated the .pst, the file appears to have changed and will be backed up during the next backup session. For a company with hundreds or thousands of Outlook users, the volume of unnecessary backup activity can be staggering.
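The contrast with subfile-level tracking can be sketched in Python. This is a minimal illustration assuming fixed-size 4KB chunks identified by SHA-256 digests; the chunk size, the hash and the function names `chunk_hashes` and `new_chunks` are hypothetical choices, not any vendor's design. Editing one byte of a large file such as a .pst makes a file-level tool copy the whole file, while a chunk-level comparison finds only one new chunk.

```python
import hashlib

# Assumption: fixed-size chunks; real products may use variable-size chunking.
CHUNK_SIZE = 4096


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 digest."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def new_chunks(stored: set[str], data: bytes,
               chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the indexes of chunks whose content is not already stored."""
    return [i for i, h in enumerate(chunk_hashes(data, chunk_size))
            if h not in stored]
```

Fixed-size chunks keep the sketch short, but inserting a byte shifts every later chunk boundary; content-defined chunking avoids that problem.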
Database applications pose a similar problem. In ideal circumstances, backup applications interface directly with the database's backup API to extract only the incremental changes. These are effective solutions, but the more common database backup practices are less efficient: exporting the database to a file for backup, quiescing the database for a full backup, or copying the database to a split-mirror volume for backup. As with the mail store file, the database file's attributes are modified whenever the database application runs, so the entire file is backed up each time.
Object-based disk technologies are also being adapted to serve a primary storage role. EMC Corp.'s Centera applies a similar concept to provide content-addressed storage (CAS) for content management applications, allowing large volumes of redundant files to be kept as a single instance in a disk-based appliance. Centera lets applications write data into a content-addressable primary storage environment through an API designed to be accessible to a wide array of content management applications. The object-based backup market is different, however, because those products manage data objects at a subfile level.
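Content addressing itself is easy to illustrate. The toy Python class below is a hypothetical sketch, not Centera's API: it derives each object's address from a SHA-256 hash of its bytes (the hash choice is an assumption), so storing the same file twice keeps a single instance. Because the address covers the whole object, changing one byte yields a new address and a complete new copy, which is the limitation subfile-level products aim to avoid.

```python
import hashlib


class ContentAddressedStore:
    """Toy single-instance store: each object is addressed by a hash of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content only if it is not already present; return its address."""
        address = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve an object by its content address."""
        return self._objects[address]

    def __len__(self) -> int:
        """Number of unique objects actually stored."""
        return len(self._objects)
```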
Object-based backup vendors offer a different breed of technology, but with the same basic principles and goals: to help organizations manage large volumes of infrequently changing data on disk in an extremely efficient manner. Whether object-based backup will serve as primary storage, secondary storage or both remains to be determined, so expect to see multiple product marketing strategies from the vendor community.
This was first published in May 2004