
Evaluating and purchasing CAS archives

Content-addressed storage (CAS) is a specialized type of archive that provides the inexpensive, high-capacity storage needed to retain data that, although accessed infrequently, still has long-term relevance to the enterprise. More importantly, CAS makes stored data immutable: each object is retrieved by an address derived from its content (the content address) rather than by a file path. Archived data cannot be changed once it has been written, and cannot be deleted until the established retention period has expired. This combination of features makes CAS indispensable for organizations concerned with litigation and compliance.

Still, the acquisition of new archival storage carries ramifications. Today's archives must:

  • scale to billions of objects across petabytes of storage
  • maximize capacity through data reduction technologies
  • find individual files or folders based on established search criteria
  • retain each data type for the prescribed period and securely delete expired data
  • prevent the deletion of data subject to litigation, and
  • guard against data theft or loss.

In short, archives can be a complex and demanding storage tier.

SearchStorage.com has already covered the general criteria involved in purchasing an archive product. This chapter will cover the purchasing criteria for CAS systems. After that, you'll find a series of product specifications to help you compare CAS products from vendors such as EMC Corp., Hitachi Data Systems and NetApp.


Purchasing criteria for CAS systems

What impact will retention requirements have on your media? An archive may need to store data for decades, and such lengthy retention requirements can exceed the normal storage life of the media. Magnetic storage media such as tape and disk are usually fine for up to 10 years. Optical media such as CDs and DVDs extend storage life to as long as 20 years. Emerging holographic storage technologies tout retention periods of 50 years.

When evaluating CAS or any archival storage platform, you should know how access and retention requirements will influence media choices. Disk-based systems typically protect long-term data through RAID groups with hot spares and accelerated disk rebuild features. Some organizations mix media, systematically moving data from disk to tape or optical disc, and then refresh the media by periodically rewriting the data to fresh disks, tapes or optical discs to guard against data loss due to media degradation.
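As a rough planning aid, the arithmetic behind media refreshes can be sketched as follows. The media lifetimes are the upper bounds quoted above; the 30-year retention period and the function name `refresh_count` are assumptions made purely for illustration, and a cautious administrator would likely refresh well before the nominal end of media life.

```python
import math

# Upper-bound media lifetimes (in years) taken from the text above.
MEDIA_LIFE_YEARS = {"magnetic": 10, "optical": 20, "holographic": 50}


def refresh_count(retention_years: float, media_life_years: float) -> int:
    """Return how many times data must be rewritten to fresh media.

    The first set of media covers the first `media_life_years`; every
    additional span of that length requires one more rewrite.
    """
    if retention_years <= 0 or media_life_years <= 0:
        raise ValueError("both durations must be positive")
    return max(0, math.ceil(retention_years / media_life_years) - 1)


# Hypothetical 30-year retention requirement.
for media, life in MEDIA_LIFE_YEARS.items():
    print(f"{media}: {refresh_count(30, life)} refresh(es) over 30 years")
```

Under these assumptions, a 30-year retention period implies two rewrites on magnetic media, one on optical media and none on holographic media.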

How important is immutability? The main difference between CAS and other archival storage systems is immutability -- once data is written to the CAS platform, it cannot be altered or deleted until its established retention period has expired. Thus, CAS can guarantee the authenticity of its data. Not every organization needs this type of immutable storage, but it can play a role in compliance and litigation. Some CAS platforms rely on optical media for WORM (write once, read many) immutability, while other platforms use disk storage protected with software-based storage management tools.
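To make the idea concrete, here is a minimal in-memory sketch of write-once, content-addressed behavior. It assumes SHA-256 as the hashing scheme and uses a hypothetical `WormStore` class; commercial CAS platforms use their own addressing schemes and enforce retention in firmware or storage software, so treat this only as an illustration of the principle.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple


class WormStore:
    """Toy content-addressed store with write-once, retain-until semantics."""

    def __init__(self) -> None:
        # content address -> (data, retention expiry as epoch seconds)
        self._objects: Dict[str, Tuple[bytes, float]] = {}

    def put(self, data: bytes, retain_seconds: float,
            now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # The address is derived from the content itself (assumed: SHA-256).
        address = hashlib.sha256(data).hexdigest()
        expiry = now + retain_seconds
        if address in self._objects:
            # Same content, same address: never shorten an existing retention.
            expiry = max(expiry, self._objects[address][1])
        self._objects[address] = (data, expiry)
        return address

    def get(self, address: str) -> bytes:
        data, _ = self._objects[address]
        # Re-hashing on read shows whether the content is still authentic.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content no longer matches its address")
        return data

    def delete(self, address: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        _, expiry = self._objects[address]
        if now < expiry:
            raise PermissionError("retention period has not expired")
        del self._objects[address]
```

Because the address is a digest of the content, any altered copy would produce a different address, which is why a CAS platform can vouch for the authenticity of what it returns. There is deliberately no update method: changed data is simply a new object.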

Data reduction technologies, including data deduplication. Any archiving system should include data reduction technologies such as compression and data deduplication, which saves only a single unique instance of a file, block or byte to disk. Properly implemented, data deduplication can reduce storage requirements by as much as a factor of 50, meaning that a 500 GB disk can effectively store the equivalent of 25 TB. Deduplication allows far more data to be retained on disk, for much longer periods, without offloading to tape or optical media. Note: Since encryption removes the data redundancy that deduplication relies on, data should be deduplicated before it's encrypted.
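The following sketch illustrates block-level deduplication and the capacity arithmetic above. It assumes fixed-size 4 KB chunks identified by SHA-256 digests; the function names are hypothetical, and real products may use variable-size chunking or other fingerprinting methods.

```python
import hashlib
from typing import Tuple


def dedup_stats(data: bytes, chunk_size: int = 4096) -> Tuple[int, int]:
    """Return (logical_bytes, stored_bytes) after fixed-size chunk dedup."""
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Only the first copy of each distinct chunk is kept.
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    stored = sum(len(chunk) for chunk in unique.values())
    return len(data), stored


def effective_capacity_gb(raw_gb: float, reduction_factor: float) -> float:
    """Logical capacity a disk can hold at a given reduction factor."""
    return raw_gb * reduction_factor


# Ten identical 4 KB blocks plus one different block.
logical, stored = dedup_stats(b"A" * 4096 * 10 + b"B" * 4096)
print(f"logical={logical} bytes, stored={stored} bytes")

# The 50:1 example from the text: 500 GB raw -> 25,000 GB (25 TB) effective.
print(effective_capacity_gb(500, 50))
```

The ratio actually achieved depends heavily on how repetitive the archived data is, so the 50:1 figure should be read as a best case rather than a typical result.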

Storage platform scalability. Although storage space is vital, it's more important to consider the total number of "objects" (files, blocks or bytes) that an archive system can support. As each object is stored, it is also indexed for search purposes. For example, the Content Archive Platform from Hitachi Data Systems can support a total archive size up to 20 PB, containing up to 32 billion objects. Storage administrators should select an archiving system that can support the anticipated number of objects stored by the end of the platform's working life, and should test the prospective storage system to see that its performance remains steady as object counts increase.
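A simple projection can help frame that sizing exercise. The sketch below assumes compound annual growth in object count; the starting count and growth rate are hypothetical, and only the 32 billion object ceiling comes from the Hitachi example above.

```python
def projected_objects(current: int, annual_growth: float, years: int) -> int:
    """Estimate object count after `years` of compound growth.

    `annual_growth` is a fraction, e.g. 0.4 for 40% more objects per year.
    """
    return round(current * (1 + annual_growth) ** years)


def fits_platform(current: int, annual_growth: float, years: int,
                  object_limit: int) -> bool:
    """True if the projected count stays within the platform's object limit."""
    return projected_objects(current, annual_growth, years) <= object_limit


OBJECT_LIMIT = 32_000_000_000  # ceiling cited for the Hitachi platform

# Hypothetical archive: 1 billion objects today, doubling every year.
for years in (3, 5, 6):
    count = projected_objects(1_000_000_000, 1.0, years)
    ok = fits_platform(1_000_000_000, 1.0, years, OBJECT_LIMIT)
    print(f"year {years}: {count:,} objects, within limit: {ok}")
```

In this made-up scenario the archive reaches the platform's object ceiling in year five, long before a typical retention period ends, which is why object count deserves as much attention as raw capacity.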

Indexing and search features. Time is the worst enemy of data. It's not enough to simply deposit file data to a CAS archive and worry about finding the data later. Data may be retained for decades before it's referenced by users or legal counsel. By then, no one will remember filenames, folders, creators or other key attributes of the data. This can make it impossible to locate relevant information, defeating the purpose of an archive in the first place. The archive system must capture relevant metadata as objects are stored, and index the metadata so that it is searchable. Evaluate the CAS platform's indexing and search capabilities to understand how searches are performed and results are presented -- especially as the CAS platform scales to billions of objects.
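As an illustration of indexing metadata at ingest time, here is a minimal inverted-index sketch. The `MetadataIndex` class and the metadata field names are hypothetical; production archives rely on full search engines and far richer metadata.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class MetadataIndex:
    """Toy inverted index: (field, value) -> set of object addresses."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, address: str, metadata: Dict[str, str]) -> None:
        """Index an object's metadata as the object is stored."""
        for field, value in metadata.items():
            self._index[(field, value.lower())].add(address)

    def search(self, **criteria: str) -> Set[str]:
        """Return addresses matching every field=value criterion."""
        matches = [self._index.get((field, value.lower()), set())
                   for field, value in criteria.items()]
        return set.intersection(*matches) if matches else set()


index = MetadataIndex()
index.add("addr-1", {"department": "HR", "type": "docx", "creator": "jsmith"})
index.add("addr-2", {"department": "HR", "type": "msg", "creator": "mlee"})
print(index.search(department="hr", type="docx"))
```

The point of the sketch is that searchable attributes have to be captured when the object is written; decades later, the index may be the only practical way to find it.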

Retention and deletion features of the platform. An archive platform should include (or integrate with) policy managers that will support data retention based on file type or user. A Microsoft Word document may have different retention requirements than Microsoft Exchange mailbox data. Similarly, archived data from an HR department may have different retention requirements than R&D test results. The policy manager should also support file deletion, eradicating data as its retention period expires and logging the activity for management purposes. Retention features should always include a litigation hold capability that will prevent deletion of data involved in compliance or litigation issues, regardless of the retention period.
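The interplay of retention periods, logged deletion and litigation holds can be sketched as follows. The categories, retention periods and names such as `purge` are assumptions chosen for illustration, not the behavior of any particular policy manager.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical retention periods, in years, per data category.
RETENTION_YEARS = {"email": 3, "hr_record": 7, "test_result": 10}


@dataclass
class ArchivedObject:
    name: str
    category: str
    stored_on: date
    on_hold: bool = False  # litigation hold flag


def expires_on(obj: ArchivedObject) -> date:
    """Date on which the object's retention period ends."""
    year = obj.stored_on.year + RETENTION_YEARS[obj.category]
    try:
        return obj.stored_on.replace(year=year)
    except ValueError:
        # Stored on Feb. 29 and the target year is not a leap year.
        return obj.stored_on.replace(year=year, day=28)


def purge(objects: List[ArchivedObject], today: date,
          log: List[str]) -> List[ArchivedObject]:
    """Delete expired objects, log each deletion, and honor holds."""
    kept = []
    for obj in objects:
        if obj.on_hold or today < expires_on(obj):
            kept.append(obj)
        else:
            log.append(f"{today.isoformat()} deleted {obj.name}")
    return kept
```

Note that the hold check comes first: an object under litigation hold survives the purge even when its retention period ended long ago.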

Impact on backups and other data protection schemes. Archival storage must be integrated into the enterprise backup strategy. Even though disk-based platforms use RAID to guard against unexpected disk faults, data protection needs to be extended down to the archival storage tier. However, archives are rarely included in traditional backups because archival data typically doesn't change and is accessed only infrequently. It's more practical for the archive platform to support its own protection schemes, such as backup-to-tape features or mirroring and remote replication capabilities for off-site duplication.

When considering backup and recovery, evaluate data migration capabilities related to technology refreshes. Just because a piece of data must be held for 30 years doesn't mean you're going to keep that archive platform for 30 years, so consider how an archive can be migrated to new platforms in the future while minimizing service disruptions and maintaining retention policies, holds, indexes and other archiving information.
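One way to think about validating such a migration is to compare a manifest of the old archive against the new one. The sketch below assumes each archive can be represented as a mapping from an object ID to its data and its retention metadata; the function names and data layout are hypothetical, and real platforms provide their own migration and verification tools.

```python
import hashlib
from typing import Dict, List, Tuple

Archive = Dict[str, Tuple[bytes, dict]]  # object ID -> (data, metadata)


def manifest(archive: Archive) -> Dict[str, Tuple[str, dict]]:
    """Map each object ID to (SHA-256 digest, retention metadata)."""
    return {oid: (hashlib.sha256(data).hexdigest(), meta)
            for oid, (data, meta) in archive.items()}


def migration_problems(old: Archive, new: Archive) -> List[Tuple[str, str]]:
    """List objects that are missing or differ after migration."""
    old_manifest, new_manifest = manifest(old), manifest(new)
    problems = []
    for oid, (digest, meta) in old_manifest.items():
        if oid not in new_manifest:
            problems.append((oid, "missing"))
        elif new_manifest[oid][0] != digest:
            problems.append((oid, "content changed"))
        elif new_manifest[oid][1] != meta:
            problems.append((oid, "metadata changed"))
    return problems
```

An empty result suggests that content, retention settings and holds all survived the move; any entry in the list identifies an object that needs attention before the old platform is retired.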

Encryption and other security features. Security features are increasingly important in long-term archival platforms. Evaluate the use of encryption in the archive system itself to protect sensitive data such as client names, addresses, Social Security numbers and credit card information. Note: Data reduction technologies are largely ineffective once data has been encrypted, so be sure that the archive platform will deduplicate data prior to encryption.

Evaluate other security features that regulate the accessibility of data. For example, a physician should be able to access patient records from the archive, but should not have access to archived personnel records. Conversely, an HR specialist should be able to access employee records, but not open patient records on that same archive. Implement authentication and restrictions to ensure that a user can only access limited elements of the archive.
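The physician and HR example above amounts to role-based access control, which can be sketched in a few lines. The role names, record types and the `can_read` function are hypothetical; an actual archive would tie such checks to the organization's authentication and directory services.

```python
from typing import Dict, Set

# Hypothetical mapping of roles to the record types they may read.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "physician": {"patient_record"},
    "hr_specialist": {"personnel_record"},
}


def can_read(role: str, record_type: str) -> bool:
    """Allow access only if the role is explicitly granted the record type."""
    return record_type in ROLE_PERMISSIONS.get(role, set())


print(can_read("physician", "patient_record"))      # True
print(can_read("physician", "personnel_record"))    # False
print(can_read("hr_specialist", "patient_record"))  # False
```

The sketch denies by default: an unknown role, or a record type that has not been granted, gets no access.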

Power saving. Even with data reduction technologies, long-term archives can grow to thousands of disks, presenting significant power and cooling demands. When evaluating an archive platform, look at available power reduction capabilities such as disk throttling and systematic disk spindown (sometimes dubbed MAID, for massive array of idle disks).

This chapter includes product specifications for the following CAS products:

  • Caringo; CAStor
  • EMC Corp.; Centera
  • Hitachi Data Systems; Content Archive Platform
  • Hewlett-Packard Corp.; HP Integrated Archive Platform
  • IBM; System Storage DR550
  • Network Appliance Inc.; NearStore platform
  • Nexsan Technologies Inc.; Assureon SA Archive Appliance
  • Permabit Technology Corp.; Enterprise Archive
  • ProStor Systems Inc.; InfiniVault
  • Sun Microsystems Inc.; StorageTek 5800 system
