Content-addressed storage

CAS system purchase considerations

Tutorial:

Guide to purchasing CAS systems

SearchStorage.com

Archives provide a repository for data that must be retained for long periods of time but doesn't need to be accessed frequently. Consequently, most archives use high-capacity SATA disks that offer a low cost per gigabyte. Content-addressed storage (CAS) enhances archival storage environments by supporting immutability. Each object is stored under an address derived from its content (the "content address") and cannot be changed once it's been written. This is a critical attribute for compliance and litigation. CAS platforms also prevent data from being deleted until its established retention period has elapsed. The addition of a litigation hold feature can prevent relevant data from being deleted even if its retention period is over.

Since data is typically added to an archive faster than it's deleted, long-term storage capacity must also be considered, along with data reduction features that make the most of drive space. Users must also evaluate means of protecting archived data through backup or replication schemes. Now that you've reviewed the issues involved in purchasing compliance products, this segment will cover the considerations related to archive/content-addressed storage system purchases. After that, you'll find a series of product specifications to help you compare products from vendors like EMC Corp., Hitachi Data Systems (HDS) and Network Appliance Inc. (NetApp).

Media life and reliability. Archival storage demands may exceed the reliable life of storage media, such as disk, tape or optical disc. Prospective users must recognize the value of their archival data and understand the associated regulatory or legal issues. For organizations that need to store data for just a few years, a disk-based platform is usually sufficient. For long-term storage of data that is rarely accessed, removable optical media may be preferable. Disk storage users must also consider protective measures, such as remote replication or RAID groups with hot spares and accelerated disk rebuild features.

How important is immutability? Although it is possible to archive data on virtually any platform, CAS prevents changes to the data, thus guaranteeing its authenticity. Not every organization needs this feature, but those that must address compliance or litigation may find it invaluable. Users may also opt for traditional WORM media, such as CD-R, DVD-R or emerging media like holographic disc. Disk-based CAS platforms can also offer virtual immutability through software-based storage management tools.
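
As a rough illustration of why content addressing supports authenticity, the sketch below derives each object's address from a hash of its bytes. The `ContentStore` class and the choice of SHA-256 are assumptions made for this example; commercial CAS platforms use their own addressing schemes and on-disk formats.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content address -> stored bytes

    def put(self, data: bytes) -> str:
        # The address is computed from the content itself.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address; existing objects
        # are never overwritten.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read: altered bytes would no longer match the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its content address")
        return data


store = ContentStore()
addr = store.put(b"Q3 audit report")
edited_addr = store.put(b"Q3 audit report (edited)")
```

An edited document receives a different address, so the original remains retrievable and verifiable under `addr`.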

Evaluate storage capacity and data reduction technologies. Select an archive platform that can scale to meet future storage needs. This is usually accomplished using additional disks or clustered storage systems. Archives also depend on data reduction technologies to pare down the sheer amount of redundant data. Data deduplication is now standard on many archive platforms. Deduplication works best when used on long-term archives that allow plenty of time to eradicate duplicate data, but it's best to test this type of feature in the lab prior to a purchase. [See Data deduplication explained for more details on this technology.]
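
When testing deduplication in the lab, it can help to estimate how much a sample data set might shrink. The sketch below assumes simple fixed-size chunking with SHA-256 fingerprints. The function name `dedup_sizes` and the 4 KB chunk size are illustrative choices, and real products may use variable-size chunking or file-level single instancing instead.

```python
import hashlib


def dedup_sizes(blobs, chunk_size=4096):
    """Return (raw_bytes, stored_bytes) after fixed-size chunk deduplication."""
    seen = set()  # fingerprints of chunks already stored
    raw = stored = 0
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            raw += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                # Only chunks with a new fingerprint consume space.
                seen.add(digest)
                stored += len(chunk)
    return raw, stored


# Two synthetic "files" that share most of their content.
sample = [b"A" * 8192, b"A" * 8192 + b"B" * 4096]
raw, stored = dedup_sizes(sample)
print(f"raw={raw} stored={stored} ratio={raw / stored:.1f}:1")
```

Running the same measurement against a representative copy of production data gives a more realistic ratio than vendor averages.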

Consider practical object scalability. An archive will typically index each file or other key elements ("objects"), and future search requests are made against the index that develops over time. However, search performance can deteriorate as the archive grows to hundreds of millions or even billions of objects. Prospective users should evaluate archive performance with a full system, determine if there are any practical limits to the number of files or objects and evaluate any means of overcoming those limits. As an example, the Content Archive Platform from HDS can support a total archive size up to 20 PB consisting of up to 32 billion objects.
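
The HDS figures above imply a break-even average object size, below which the object-count limit is reached before the capacity limit. The short calculation below works that out, assuming decimal petabytes (the source does not say whether PB is decimal or binary); `limiting_factor` is a hypothetical helper.

```python
PB = 10**15  # decimal petabyte (assumption)

max_bytes = 20 * PB          # published capacity limit
max_objects = 32 * 10**9     # published object-count limit

# Average object size at which both limits are reached together.
break_even_bytes = max_bytes / max_objects


def limiting_factor(avg_object_bytes):
    """Report which published limit a workload would hit first."""
    if avg_object_bytes < break_even_bytes:
        return "object count"
    return "capacity"


print(break_even_bytes)             # 625000.0 bytes, about 625 KB
print(limiting_factor(50_000))      # small objects such as email messages
print(limiting_factor(5_000_000))   # larger objects such as scanned images
```

An archive dominated by small objects would therefore run out of objects well before it runs out of disk, which is worth checking against any platform's stated limits.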

Retention and deletion features. Although an archive stores data for the long term, data is not retained indefinitely. Organizations face the challenge of retaining necessary data and ensuring that the data is not deleted during its retention period. Conversely, the data must be securely deleted after the retention period expires, but any data involved in litigation must be exempted from deletion (litigation hold). The archive system must be able to implement retention policies, provide litigation hold for pertinent data and demonstrate secure deletion with a minimum of administrative oversight.
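
The interaction between retention periods and litigation holds can be expressed as a simple rule: an object may be deleted only after its retention date has passed and only if no hold applies. The sketch below is a minimal model of that rule. The `ArchivedObject` fields and the `can_delete` and `purge` functions are hypothetical and do not represent any vendor's policy engine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedObject:
    name: str
    retain_until: date
    litigation_hold: bool = False


def can_delete(obj: ArchivedObject, today: date) -> bool:
    """Allow deletion only after retention expires and with no hold in place."""
    if obj.litigation_hold:
        return False
    return today > obj.retain_until


def purge(objects, today):
    """Split objects into (kept, deleted) according to the policy above."""
    kept = [o for o in objects if not can_delete(o, today)]
    deleted = [o for o in objects if can_delete(o, today)]
    return kept, deleted


archive = [
    ArchivedObject("invoice-2001", date(2007, 12, 31)),
    ArchivedObject("contract-2002", date(2007, 12, 31), litigation_hold=True),
    ArchivedObject("memo-2006", date(2012, 12, 31)),
]
kept, deleted = purge(archive, today=date(2008, 1, 3))
print([o.name for o in deleted])  # only the expired object without a hold
```

A production system would also need to log each decision and overwrite the deleted data, which this sketch omits.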

Archival data protection. Consider how your archive data will be guarded against disaster. Some archive platforms include backup-to-tape features for off-site storage, or provide other data protection capabilities, like data migration or data replication between storage systems. For example, IBM's DR550 supports local and long-distance mirroring, allowing synchronous or asynchronous replication between two DR550s. Chances are your data retention needs will far outlast the physical archive platform, so consider the logistics of moving current data to the new platform, then moving data off that new platform to some future storage system. Ideally, data migration should cause minimum disruption while preserving the retention policies, holds, indexes and other data attributes.

Data retrieval and security options. Before purchasing an archive, decide how "accessible" the archived data should be and verify that any management tools can accommodate that level of accessibility while maintaining the necessary security. If you determine that users on the network should be able to access or restore files as needed, minimizing storage administrator involvement, make sure that the management tool supports those tasks, such as searching, and implements access restrictions that will prevent casual or unauthorized access. Users should not be able to invoke or rescind litigation holds and should not be able to alter or destroy any data currently under litigation hold.

Total cost of ownership. Archive systems carry incremental costs, such as future storage upgrades and annual system maintenance agreements. Power is another expense, especially for large archive systems full of spinning disks. Shop by total cost of ownership (TCO), not the lowest price. Features like wide-scale systematic disk spindown (a.k.a. MAID) and other disk power management techniques can dramatically reduce power requirements and ease overstretched power budgets.
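
A back-of-the-envelope TCO comparison can show how power draw offsets purchase price. The sketch below adds purchase price, maintenance and electricity over a service life. All figures are invented for illustration, and the model ignores cooling, floor space, upgrades and staff time.

```python
def total_cost_of_ownership(purchase, annual_maintenance, avg_watts,
                            cost_per_kwh, years):
    """Sum purchase price, maintenance and electricity over the service life."""
    hours = 24 * 365 * years
    energy_cost = avg_watts / 1000 * hours * cost_per_kwh
    return purchase + annual_maintenance * years + energy_cost


# Hypothetical systems: B costs more up front but spins disks down when idle.
system_a = total_cost_of_ownership(100_000, 10_000, avg_watts=4000,
                                   cost_per_kwh=0.10, years=5)
system_b = total_cost_of_ownership(110_000, 10_000, avg_watts=1500,
                                   cost_per_kwh=0.10, years=5)
print(f"A: ${system_a:,.0f}  B: ${system_b:,.0f}")
```

With these assumed numbers, the system with the higher purchase price comes out slightly cheaper over five years once electricity is counted.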

Protocol support and connectivity. Choose an archive platform that can integrate properly in your current environment. For example, the StorageTek 5800 storage system from Sun Microsystems Inc. includes two 1 Gigabit Ethernet ports in each system. By comparison, the FAS3020 from NetApp includes 20 Fibre Channel ports and 24 regular Ethernet ports. The Content Archive Platform from HDS supports a variety of common network protocols, including NFS, CIFS/SMB, HTTP, WebDAV, SMTP and NDMP.

The archive hardware specifications page in this chapter covers the following products:

  • EMC Corp.; Centera
  • Hitachi Data Systems; Content Archive Platform
  • Hewlett-Packard Corp.; HP Integrated Archive Platform
  • IBM; System Storage DR550
  • Network Appliance Inc.; NearStore platform
  • Nexsan Technologies Inc.; Assureon SA Archive Appliance
  • Permabit Technology Corp.; Permabit Archive
  • ProStor Systems Inc.; InfiniVault
  • Sun Microsystems Inc.; StorageTek 5800 system

    03 Jan 2008