Finding the right CAS system for your organization

Content-addressed storage (CAS) is a specialized type of archive that provides the inexpensive and high-capacity storage needed to retain data that, although accessed infrequently, still has long-term relevance to the enterprise.

The following CAS products were selected based on input from industry analysts and SearchStorage.com editors. The specifications, which were provided by the vendors, are current as of July 2008 and are updated periodically. Vendors are welcome to submit their updates and new product specifications to Matt Perkins.

The product snapshots below cover the following products:

  • Caringo Inc.; CAStor
  • EMC Corp.; Centera
  • Hitachi Data Systems; Content Archive Platform
  • Hewlett-Packard Corp.; HP Integrated Archive Platform
  • IBM; System Storage DR550
  • NetApp; NearStore platform
  • Nexsan Technologies Inc.; Assureon SA Archive Appliance
  • Permabit Technology Corp.; Enterprise Archive
  • ProStor Systems Inc.; InfiniVault
  • Sun Microsystems Inc.; StorageTek 5800 system

    Product Snapshot #1

    Caringo Inc.; CAStor

    Maximum storage capacity: Scalable to petabytes
    Immutability: CAStor delivers WORM storage, ensuring that content cannot be changed once stored in the system and cannot be deleted until its retention period has expired. Content integrity is protected over time by a patent-pending upgradeable hash capability, which allows the digital fingerprint associated with a file to be regenerated with a more robust algorithm if the original algorithm is compromised, as has happened with MD5. CAStor provides applications and content owners with a content integrity seal that includes the hash key, so they can independently verify the authenticity of stored content.
    Litigation hold: Applications generating and managing electronic records and content are able to fully manage the retention cycle to ensure that any items that must be maintained due to litigation or regulatory investigation cannot be deleted from the system.
    Retention/deletion features: CAStor allows the retention period to be set for each individual file or at the directory level, which gives the same retention period to all files stored in that directory. Content can be deleted from CAStor only after the retention period has expired; deletion removes all replicas stored in the local cluster and in distributed clusters.
    Storage reduction features: CAStor is designed to provide single instance storage as a back-end process so that read/write performance is not impacted.
    Reporting and logging features: CAStor includes a browser-based administrative console for monitoring cluster activity, capacity and event logs. The interface is the same whether it is a three-node cluster or a cluster with hundreds of nodes. Administrators can also seamlessly retire nodes from the cluster.
    Encryption and security features: CAStor stores each content object with a UUID that is a 128-bit randomly generated number that must be known in order to retrieve content from the cluster. Files can be encrypted as a preprocess before storing in CAStor using commercial cryptographic technologies.
    Metadata features: CAStor stores metadata along with the actual file data as a whole object in the cluster associated with a UUID. Applications can set the values for standard metadata including file type, retention period and number of replicas that are immutable. Custom metadata is also supported, allowing applications to store additional descriptive information about content being stored. The hash or digital signature is calculated and stored as metadata, which provides the ability to update the hash algorithm as needed. A metadata element called a LifePoint allows an application to describe the lifecycle for a particular content object when it is stored. Mutable metadata is supported for records management scenarios in which there is a need to update certain metadata elements.
    Scalability: CAStor scales from 1TB to petabytes simply by adding nodes to the cluster, without provisioning or configuring the additional storage. Performance scales linearly with new nodes because the added processing power is immediately used to handle I/O traffic. This enables CAStor to deliver the performance needed for massive reads and writes of small files as well as the high throughput demands of large files.
    Management tools: CAStor is self-managing, self-healing and self-balancing, which minimizes the amount of active management and administrative effort required for storage clusters. It automatically migrates content within the cluster to balance workload and capacity and responds to disk or node failures without administrator intervention to recover content and without losing availability of data. Monitoring and administration are performed through the browser-based management console.
    Archiving software integration: CAStor is integrated with applications for medical image management, HSM/file management and email archiving, and continues to pursue other archiving software relationships. Native integration to CAStor is through a simple HTTP interface and also supports CIFS and NFS where a traditional file system interface is required.
    Connectivity: CAStor is IP-based storage and uses standard Gigabit Ethernet for internode communication of the cluster as well as to the production LAN.
    Base cost: Pricing is based on storage capacity and is below $3 per gigabyte.
    Detailed specs: http://www.caringo.com/products.html
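
The upgradeable hash and content integrity seal described under Immutability above can be illustrated with a minimal Python sketch. This is not Caringo's API: the function names and the dictionary-based seal format are hypothetical, and the choice of MD5 as the original algorithm and SHA-256 as the replacement is an assumption made only for the example.

```python
import hashlib


def make_seal(content: bytes, algorithm: str = "md5") -> dict:
    """Build a hypothetical integrity seal: algorithm name plus hex digest."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify(content: bytes, seal: dict) -> bool:
    """Recompute the digest with the seal's algorithm and compare it."""
    return hashlib.new(seal["algorithm"], content).hexdigest() == seal["digest"]


def upgrade_seal(content: bytes, seal: dict, new_algorithm: str = "sha256") -> dict:
    """Re-seal with a stronger algorithm, only if the old seal still verifies."""
    if not verify(content, seal):
        raise ValueError("content no longer matches its existing seal")
    return make_seal(content, new_algorithm)
```

Because the seal records which algorithm produced it, a content owner holding the seal can check stored content independently, and the stored fingerprint can be replaced without changing the content itself.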

    Product Snapshot #2

    EMC Corp.; Centera

    Maximum storage capacity: Scalable to petabytes
    Immutability: Centera's Content Addresses are digital fingerprints of the saved content. Centera-based content cannot be overwritten and it enforces an organization's retention and disposition policies intrinsic in the storage. The result is that information is non-erasable before its retention period expires and, if desired, it can also be configured so that information is kept forever. Additionally, Centera's WORM functionality helps clients address compliance with internal governance and regulatory requirements.
    Litigation hold: Centera's Advanced Retention capabilities enable an application to put a hold on a specific piece of or whole category of information. Litigation hold locks content irrespective of its original retention period and prevents deletion of information. Information may only be deleted when the hold is released and only if the original retention period has expired.
    Retention/deletion features: Centera Governance Edition and Centera Compliance Edition Plus allow retention periods to be set on any and all stored information. Centera also offers an e-Shredding feature that complies with requirements of DoD 5015.02. It makes the information non-recoverable but the media reusable.
    Storage reduction features: Centera does single instance storing of information, meaning that any unique piece of information is stored only once, no matter how many applications or users request that it be stored. This single instancing is based on digital fingerprinting.
    Reporting and logging features: Centera Console is a web-based UI that enables monitoring and reporting. CentraStar 3.1.3 (the Centera operating environment) does system logging for all successful and failed log-ins.
    Encryption and security features: Centera's Content Address (digital fingerprint) for each piece of unique information is an encrypted key of the content. Centera adheres to the EMC Corporate Security Compliance features of encrypted management connections, profile password persistency and restricts the IP addresses for a given profile. Encryption of information would be done by the application that sends the information to Centera to be stored.
    Metadata features: Centera calculates a Content Address for the information object. That Content Address goes into a metadata file specific to that user's use of the information. The metadata file (an XML file) contains user information, a time stamp, text annotations and tags as sent from the application, and the content address of the stored object. The stored metadata files in the archive repository can be searched with CenteraSeek (the Centera search engine) for eDiscovery or other purposes without opening the individual information items.
    Scalability: As a scalable repository that self-discovers new capacity and has the ability to containerize very small objects within one content address if desired, file and object count is not a meaningful measure of Centera capacity utilization.
    Management tools: CentraStar, Centera's operating environment, is self-configuring, self-managing and self-healing. It handles all the logistics of storing and retrieving data objects, including the creation of content addresses. CentraStar delivers storage, retrieval and network-aware intelligence. It facilitates non-disruptive maintenance and upgrades, and with its layered software, content replication for disaster recovery and business continuity. In addition, Centera Console is a web-based UI that enables monitoring and reporting.
    Archiving software integration: Over 250 ISVs have integrated applications with the Centera API. These applications cover over 20 categories of archiving, including email archiving, medical image archiving, content archiving, voice/video archiving and network intelligence archiving. Non-integrated applications can use Centera as their archive via NFS, CIFS, FTP, HTTP or Mainframe HSM interface access methods.
    Connectivity: Centera is IP-based storage with Gigabit copper (optionally optical) Ethernet connectivity to a LAN/WAN.
    Base cost: Pricing is based on the capacity purchased. Once the needed capacity has been determined, a price quotation can be furnished on request.
    Detailed specs: http://www.emc.com/products/systems/centera.jsp
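
The single-instance behavior described under Storage reduction features above follows directly from addressing content by its fingerprint. The sketch below is a generic illustration, not the Centera API: the class name, its methods and the use of SHA-256 are all assumptions.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self._blobs = {}  # content address -> stored bytes

    def put(self, content: bytes) -> str:
        # The address is derived from the content, so identical content
        # always maps to the same address and is stored a single time.
        address = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def stored_objects(self) -> int:
        return len(self._blobs)
```

Two applications that store the same attachment receive the same address back, and the store holds one copy.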

    Product Snapshot #3

    Hitachi Data Systems; Content Archive Platform

    Maximum storage capacity: 64 LUNs/node x 2 TB/LUN x 80 nodes = 10,240 TB
    Immutability: Yes
    Litigation hold: A retention setting corresponds to placing content on litigation hold.
    Retention/deletion features: Event-based retention (retention is changed when an external event occurs); infinite retention (retention remains infinite until converted to a discrete value); DoD 5220.22-M shredding (an object can be tagged for shredding upon deletion; a tagged object is removed and shredded when it is deleted).
    Storage reduction features: Collision-proof, object-based deduplication is supported. A background policy first identifies candidates for duplicate elimination. The next step performs a binary comparison between the two objects. Only if the comparison returns true is the content reduced to a single instance.
    Reporting and logging features: Logging of system functions is recorded internally and displayed on the administrative interface. Capacity and full content search indexing.
    Encryption and security features: Encryption is supported on ingestion via HTTPS. Encryption is supported over wide area replicated links via TLS. Encryption of archive objects to a backup application over NDMP is supported. Encryption over the SAN fabric is supported using AES encryption.
    Metadata features: Both explicit and implicit metadata are supported. Implicit metadata (object creation time, object size, cryptographic hashes) is computed by the system and linked to the object. Explicit metadata is set by users of the system, who may set properties such as retention time or link an arbitrary XML document to the objects archived by the system.
    Scalability: 32 billion
    Management tools: The system itself has an embedded element manager. It also integrates to Hitachi's HiCommand suite of storage management applications.
    Archiving software integration: See: http://hds.com/partners/solution-partners/
    Connectivity: LAN (two fully redundant ports per node are available to connect to a customer network); SAN (if a user chooses to connect the system to the SAN fabric, each node has up to 2 FC ports and can connect either directly or through a fabric to storage). Further multipathing and traditional active-active clustering is supported (swap of LUNs between node pairs).
    Base cost: Complete kit cost of the above system is ~$86K
    Detailed specs: http://www.hds.com/products/storage-systems/content-archive-platform/index.html
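
The two-step deduplication described under Storage reduction features above (identify candidates, then confirm with a binary comparison) can be sketched as follows. This is an illustration only: the function name is hypothetical, and using a SHA-256 hash to pick candidates is an assumption, since the source does not say how candidates are identified.

```python
import hashlib
from collections import defaultdict


def find_duplicates(objects: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (kept, duplicate) name pairs confirmed byte for byte."""
    # Step 1: group objects by hash to find duplicate candidates.
    candidates = defaultdict(list)
    for name, data in objects.items():
        candidates[hashlib.sha256(data).hexdigest()].append(name)

    # Step 2: a matching hash is not trusted on its own; only a full
    # binary comparison confirms that two objects are identical.
    confirmed = []
    for names in candidates.values():
        kept = names[0]
        for other in names[1:]:
            if objects[kept] == objects[other]:
                confirmed.append((kept, other))
    return confirmed
```

The byte-for-byte check is what makes the approach collision-proof: even if two different objects produced the same hash, they would fail the comparison and both would be kept.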

    Product Snapshot #4

    Hewlett-Packard Corp.; HP Integrated Archive Platform

    Product details not available at this time.

    Detailed specs: http://h18006.www1.hp.com/products/storageworks/riss/index.html

    Product Snapshot #5

    IBM; System Storage DR550

    Maximum storage capacity: 168 TB on disk, and petabytes with tape attached to the DR550.
    Immutability: The DR550 offers a secure, non-erasable and non-rewritable storage archiving repository for highly regulated industries and industries with long retention needs. DR550 is a policy-based data retention solution where data is maintained as non-erasable and non-rewritable until deletion is permitted by retention policy. Its internal management system is hardened to prevent any system administrator deletion, whether intentional or inadvertent.
    Litigation hold: A Deletion Hold feature allows the selected content to be protected against the normal end of life (policy expiration) process. This is useful should a record or set of records need to be retained for legal, audit or other reasons.
    Retention/deletion features: The DR550 enables management of data that has no explicit retention period, such as employee (as long as employed) and customer (as long as account is open) data, through an event-based records management feature. It can help protect these records from deletion until a specific event occurs. A designated object or group of objects can be protected against the normal end of life (policy expiration) process through the Deletion Hold feature. The DR550 enforces data retention policies that maintain data as non-erasable and non-rewritable until deletion is permitted by retention policy.
    Storage reduction features: Compression can be performed by the tape systems attached to the DR550. Deduplication can be performed by the content management applications that use the DR550 for archive storage.
    Reporting and logging features: Not provided
    Encryption and security features: Data encryption options include the option for transparent key management done by the DR550 or by an external key management application. The DR550 client (DR550 software that sits on the same server as the archiving application software) can encrypt archive data using AES 128 or DES 56 encryption. The archive data is in encrypted form during transmission to the DR550 and remains in encrypted form when stored in the system, including backup copies. Tape encryption capabilities offered with TS1120 or LTO 4 tape drives are supported.
    Metadata features: Archive policy metadata is added to the objects when they are added to the DR550.
    Scalability: Total number of objects supported depends on the amount of metadata stored with each object. On average, up to 500 million objects can be stored.
    Management tools: The DR550 supports SNMP managers by providing SNMP information on the DR550 components.
    Archiving software integration: DR550 integrates with more than 40 archiving applications through direct integration with the DR550 client. DR550 also supports NFS/CIFS file interfaces through the DR550 File System Gateway, which broadens application support options.
    Connectivity: DR550 is a network-attached product that connects over an Ethernet LAN.
    Base cost: List price for 900 GB raw capacity is approximately $23K.
    Detailed specs: https://www-03.ibm.com/systems/storage/disk/dr/index.html
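
Event-based retention, as described under Retention/deletion features above, means a record has no expiration date until some triggering event is recorded. The sketch below is a generic model, not DR550 software: the class, its field names and the idea of a fixed number of days retained after the event are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EventRetainedRecord:
    """Record whose retention clock starts only when an event occurs."""

    retain_days_after_event: int        # assumed policy parameter
    event_date: Optional[date] = None   # e.g. employee leaves, account closes

    def can_delete(self, today: date) -> bool:
        # Until the event is recorded, the record is protected indefinitely.
        if self.event_date is None:
            return False
        expires = self.event_date + timedelta(days=self.retain_days_after_event)
        return today >= expires
```

For example, a customer record would stay protected for as long as the account is open, and only begin counting down once the account closure date is recorded.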

    Product Snapshot #6

    NetApp; NearStore platform

    Product details not available at this time.

    Detailed specs: http://www.netapp.com/products/storage-systems/near-line-storage/index.html

    Product Snapshot #7

    Nexsan Technologies Inc.; Assureon SA Archive Appliance

    Maximum storage capacity: Assureon can be a single appliance using internal Assureon appliance storage or can be deployed in a scalable grid architecture sharing a single Assureon storage system or the resources of an entire Assureon SAN. The maximum capacity Assureon can scale to in a single system is 5400 TB (5.4 PB).
    Immutability: Assureon is a disk-based WORM solution. Once files are stored they can't be altered nor deleted unless the retention period has ended. Assureon storage systems are also fully hardened. Users can't bypass Assureon appliances and directly access the Assureon storage LUNs/systems.
    Litigation hold: Search results can be reviewed with a special viewer. Those files can be placed on legal hold, meaning they can be retrieved as a group, restored onto another system, copied to removable media for mailing, etc. Files that are on legal hold cannot be deleted even if their retention period has expired. Assureon supports thousands of different legal hold groups.
    Retention/deletion features: Assureon supports retention periods from one day to 999 years. Retention policy is applied on a file-by-file basis and can be set for a specific file folder being archived, a specific person or a specific file type. Once a file's retention period is met, the file is moved to a deletion folder so it can be reviewed by an admin. Deleted files can be migrated to other media or permanently destroyed, with any encryption keys shattered, using up to a 7-wipe process on the media.
    Storage reduction features: Data compression and deduplication. Each file entering Assureon is given a dual hash using SHA1 and MD5, which gives the file a unique CAS fingerprint string of 260 characters. If the file's CAS fingerprint is already stored in Assureon, the file is not saved again, but the metadata database is updated with the file's metadata. This process, along with data compression, can reduce the required storage by up to 50%.
    Reporting and logging features: Reports and logs include transaction logs, disposition logs, archive logs and manifest, replication logs, storage utilization reports. There are also audits that Assureon runs in the background to repair damage, load balance operations, replace lost files, etc.
    Encryption and security features: Provides AES 256-bit Smart Key Encryption. Each file has an individual AES 256 key, which is managed automatically by a multiply-redundant, remote, Key Server as well as a local Key Manager which caches unused keys and manages a local repository of keys used to protect files. All accesses to the assets are logged, providing an audit trail and requiring a secure login. The built-in RAID subsystems have firmware to prevent the deletion of volumes or RAID sets, and can be locked such that they will only respond to authenticated I/O from the Assureon cluster.
    Metadata features: Assureon processes every file and creates a uFID for the file. The uFID is stored in the system database. The uFID and other file data, such as its name, extension, creator, type, date of creation, retention policy, asset serial number, encryption key serial number, CAS fingerprint and source path are combined into a metadata record which is digitally signed and bound to the asset.
    Scalability: Assureon is built on a grid that can support up to 256 Assureon Appliance nodes and can federate multiple search databases and grow into a SAN. The number of objects is unlimited and can easily scale into the billions of objects/files.
    Management tools: A web-based GUI allows the admin to remotely configure the agents that Assureon uses to gather files from remote systems. On each system, there may be multiple watched directories, which can have individual retention policies and metadata. Also from the GUI, files may be searched, transaction logs examined, disposition behavior configured, and files or groups of files may be copied to remote systems. No user configuration of storage space or cluster behavior is required as that is all automatic. Extensible reporting tools are provided.
    Archiving software integration: Archiving products from Mimosa, Zantaz, Symantec, Messaging Architects, Digital Imaging, ZL Technology, Idatix, Jack Henry, Plasmon, EnterpriseVault/Symantec and others. Assureon also includes a NAS interface for NFS and CIFS, and Assureon archive agents for Windows.
    Connectivity: Dual Gigabit Ethernet ports and dual 4 Gb FC ports. Assureon can also be customized with different or additional ports.
    Base cost: Base price starts at under $49K
    Detailed specs: http://www.nexsan.com/assureon/saapp.php
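
The dual-hash fingerprint and the "store once, update metadata" behavior described under Storage reduction features above can be sketched in Python. This is not Assureon's implementation: the class and function names are hypothetical, and the hex encoding used here is an assumption that yields a 73-character string, not the 260-character fingerprint format the vendor describes.

```python
import hashlib


def dual_fingerprint(data: bytes) -> str:
    """Combine SHA-1 and MD5 digests into a single fingerprint string."""
    sha1 = hashlib.sha1(data).hexdigest()  # 40 hex characters
    md5 = hashlib.md5(data).hexdigest()    # 32 hex characters
    return f"{sha1}-{md5}"


class Archive:
    """Toy archive that stores each unique file once."""

    def __init__(self):
        self.files = {}     # fingerprint -> file data
        self.metadata = {}  # fingerprint -> list of source paths

    def ingest(self, path: str, data: bytes) -> bool:
        """Record the path; store the data only if it is new.

        Returns True if the data was newly stored.
        """
        fp = dual_fingerprint(data)
        is_new = fp not in self.files
        if is_new:
            self.files[fp] = data
        self.metadata.setdefault(fp, []).append(path)
        return is_new
```

Using two independent hash functions means an accidental match would require a simultaneous collision in both algorithms for the same pair of files.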

    Product Snapshot #8

    Permabit Technology Corp.; Enterprise Archive

    Maximum storage capacity: Scalable from 96 TB to 3 PB of raw disk storage.
    Immutability: Both application-set WORM and policy-based (user, group, file type) WORM are supported. Digital fingerprinting with the SHA-256 cryptographic hash also ensures the immutability of stored information.
    Litigation hold: Retention capabilities enable an application to put a hold on a specific piece of information or a whole category of information.
    Retention/deletion features: Retention periods can be set on any and all stored information. When information is deleted from the Permabit Enterprise Archive, the corresponding data chunks are removed; the links between the information and data chunks are broken such that the original information cannot be reassembled. Deleted data chunks are continuously overwritten, ensuring that information is completely deleted and the space is reclaimed.
    Storage reduction features: Scalable data reduction provides compression and data deduplication at the sub-file level, without capacity limitations; data reduction capabilities scale along with the system to multiple petabytes. Permabit's RAIN-EC functionality further reduces overhead and storage requirements.
    Reporting and logging features: Notification and event management maintains a log of monitored events which can be accessed through the web-based Permabit Management console. The log displays the date and time of each log entry, a description of the event and a URL link to the affected volume or node, if applicable.
    Encryption and security features: The volume encryption feature uses AES to protect information from physical media theft. Data is encrypted at rest, as well as during all replication actions. Authentication via Microsoft Active Directory as well as through corporate LDAP or NIS mechanisms. Security is further enhanced through the support of host and identity-based access controls, which permit administrators to specify specific hosts that may access specific Permabit volumes and which users can access specific files or directories on those volumes.
    Metadata features: Internal object structure is available through XML Content Certificates on WORM volumes. Standard NFS, CIFS and WebDAV interfaces allow simple application storage of additional metadata.
    Scalability: Scalable to multiple petabytes with grid architecture. The system automatically load balances and integrates new nodes into the grid. New storage components can be added at any time, including hardware of a different capacity, performance level or generation. Older hardware can be removed at any time, providing seamless internal media refresh over dozens of years without the need for costly and risky migration projects.
    Management tools: Enterprise Archive is self-configuring, self-managing and self-healing. It handles all the logistics of storing and managing information over its lifecycle. Permabit's health check technologies predict potential failures and preemptively alert administrators and Permabit's support team when performance is outside of customer-defined parameters.
    Archiving software integration: Thousands of ISV applications have been integrated with the Permabit Enterprise Archive. Permabit is a file share that is simply a target for these applications. With an open architecture and support for NFS, CIFS and WebDAV, Permabit enables any application to integrate with the Enterprise Archive without the need for expensive programming.
    Connectivity: Multiple Gigabit Ethernet interfaces (one per access node); NFS, CIFS and WebDAV open interfaces.
    Base cost: $5/GB with Scalable Data Reduction.
    Detailed specs: http://www.permabit.com/products/data-center-series.asp
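
Sub-file deduplication and the chunk-based deletion described above can be illustrated with a small sketch. It is not Permabit's design: the class is hypothetical, fixed-size chunking and SHA-256 chunk keys are assumptions, and the sketch simply drops unreferenced chunks instead of overwriting them.

```python
import hashlib


class ChunkStore:
    """Toy sub-file deduplicating store with reference-counted chunks."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}    # chunk hash -> chunk bytes
        self.refcount = {}  # chunk hash -> number of references
        self.files = {}     # file name -> ordered list of chunk hashes

    def write(self, name: str, data: bytes) -> None:
        if name in self.files:
            raise FileExistsError(name)  # write-once: no overwrites
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # store each chunk once
            self.refcount[key] = self.refcount.get(key, 0) + 1
            recipe.append(key)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[key] for key in self.files[name])

    def delete(self, name: str) -> None:
        # Break the links from the file to its chunks, then reclaim any
        # chunk that no remaining file references.
        for key in self.files.pop(name):
            self.refcount[key] -= 1
            if self.refcount[key] == 0:
                del self.refcount[key]
                del self.chunks[key]
```

Once a file's chunk list is gone, the file cannot be reassembled, while chunks shared with other files remain available to them.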

    Product Snapshot #9

    ProStor Systems Inc.; InfiniVault

    Maximum storage capacity: All RDX removable disks inserted into the RDA (removable disk array) slots are available as online capacity, which currently ranges up to 39 TB. Overall supported capacity is effectively unlimited, since RDX media can be removed for offsite vaulting and new RDX disks inserted.
    Immutability: WORM mode is enforced in InfiniVault and each RDX cartridge that is used in InfiniVault has hardware-enforced WORM to ensure immutability.
    Litigation hold: Legal hold is supported on a per-file basis where an administrative interface into InfiniVault is used either to do eDiscovery searches or file searches and then selectively apply a unique legal hold to those files. The legal hold will prevent any deletion due to retention expiration until the legal hold is removed. There may be multiple legal holds outstanding per file.
    Retention/deletion features: Retention period may be set and InfiniVault will manage the disposition of the data upon retention expiration. When the retention period expires on a file, based on the configuration selection for the independent archive, the file may be deleted with a standard delete or with a secure delete, which will recall all copies of the file and perform a digital overwrite of the data.
    Storage reduction features: Each independent archive has configuration settings for compression and single instancing. Compression uses a Lempel-Ziv algorithm. Because archiving is by definition a one-time move of a file to the archive, if the same file (one with the same hash-code digital fingerprint) is transferred again within a specific archive, only one instance is stored.
    Reporting and logging features: An audit trail is maintained for all data ingested on a per-file granularity. Reports for the chain of custody from the audit trail as well as operational reports regarding statistics on data ingested, etc., are available.
    Encryption and security features: AES 256 encryption of files on the removable RDX cartridges may be selected on a per-archive basis. Access control security for both file access and access to the management GUI is also required. All encryption keys are managed internally by the InfiniVault system.
    Metadata features: For standard format files, a content indexing operation is performed and the results are stored in a database for searching. Other information regarding a specific file (encryption, digital fingerprint, cartridges stored, etc.) is maintained in a file database.
    Scalability: Depends on the specific model. The high-end model does not have a defined storage capacity limit.
    Management tools: A web-based GUI is provided for all configuration, administration and management activities.
    Archiving software integration: InfiniVault is a target storage system for archiving that is accessed with UNC paths. Archiving software that can put data on a UNC path (a NAS device is the example) will be able to use the InfiniVault for archiving and compliance protection.
    Connectivity: A gigabit Ethernet port is used to connect InfiniVault to the network.
    Base cost: The InfiniVault Model 30 starts at $32,995; the InfiniVault Model 100 starts at $74,995.
    Detailed specs: http://www.prostorsystems.com/infinivault.php
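
The interaction between retention expiration and multiple per-file legal holds described under Litigation hold above can be modeled in a few lines. This is a generic sketch, not InfiniVault code: the class, method names and hold identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchivedFile:
    """File with a retention date and any number of named legal holds."""

    retention_expires: date
    legal_holds: set = field(default_factory=set)

    def add_hold(self, hold_id: str) -> None:
        self.legal_holds.add(hold_id)

    def release_hold(self, hold_id: str) -> None:
        self.legal_holds.discard(hold_id)

    def eligible_for_deletion(self, today: date) -> bool:
        # Every hold must be released AND retention must have expired.
        return not self.legal_holds and today >= self.retention_expires
```

Tracking holds as a set means that releasing one case's hold does not expose a file that is still needed for another case.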

    Product Snapshot #10

    Sun Microsystems Inc.; StorageTek 5800 system

    Maximum storage capacity: The system is designed to scale to multiple petabytes; today it supports 512 TB.
    Immutability: The 5800 computes a checksum for each block in the system, which is checked for accuracy when the block is retrieved. As part of each file's ID, the 5800 computes a cryptographic SHA-1 checksum. Each file is checked periodically against its block- and file-level checksums for integrity.
    Litigation hold and retention/deletion features: The system has built-in user definable metadata that can be used to add retention times and any legal hold flags. The product will not automatically delete an object at the end of its retention time.
    Storage reduction features: Not provided
    Reporting and logging features: The system collects log files. You can specify an external logging host to which the Sun StorageTek 5800 system sends detailed log messages for debugging purposes or to enable easy integration with existing monitoring systems. Email notifications and the external logging host are configured on a per-hive basis.
    Encryption and security features: Administrative access is provided through a separate dedicated virtual IP address and is password protected. There is no encryption.
    Metadata features: Metadata is stored in the system's object archive as an XML document the same way that file data is stored. Up to two disks or nodes can fail without impacting metadata availability. Metadata is indexed by the internal highly available database and kept cached in memory across the cluster for fast searches. Extended metadata goes beyond the system metadata to further describe each data object. For example, if the data stored on the system includes medical records, extended metadata attributes might include patient name, date of visit, doctor name, medical record number, and insurance company. The schema describes what attributes are available. Users can define attributes through the system's administrative interfaces.
    Scalability: Up to 10 million files/objects today.
    Management tools: Using the CLI or the GUI, the user can perform administrative tasks such as monitoring the system and individual components such as nodes or disks, specifying which clients are authorized to access data on the system, setting up the system schema, powering down, and rebooting the system.
    Archiving software integration: The system is integrated with the most popular open archiving frameworks (Fedora Commons, DSpace and EPrints) as well as medical imaging applications (CareStream, Telvent and Tianni Spirit); others are in progress. Using the Storage Switch or SAM gateway products, the system should integrate seamlessly with any archiving software.
    Connectivity: Two gigabit Ethernet ports: Broadcom Gigabit Ethernet (BGE) and Nvidia Gigabit Ethernet (NGE). The gigabit Ethernet ports are configured into Solaris Internet Protocol Multi Pathing (IPMP) group for transparent failover. An elected master node controls load spreading on the switch.
    Base cost: Not provided
    Detailed specs: http://www.sun.com/storagetek/disk_systems/enterprise/5800/
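
The combination of block-level and file-level checksums described under Immutability above can be sketched as follows. This is not the 5800's implementation: the function names and the tiny block size are hypothetical, and CRC-32 for the per-block checksum is an assumption, since the source specifies only SHA-1 for the file-level checksum.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def store(data: bytes) -> dict:
    """Split data into blocks and record block and file checksums."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return {
        "blocks": blocks,
        "block_checksums": [zlib.crc32(block) for block in blocks],
        "file_sha1": hashlib.sha1(data).hexdigest(),
    }


def verify(obj: dict) -> bool:
    """Check every block checksum, then the whole-file SHA-1."""
    for block, expected in zip(obj["blocks"], obj["block_checksums"]):
        if zlib.crc32(block) != expected:
            return False
    whole = b"".join(obj["blocks"])
    return hashlib.sha1(whole).hexdigest() == obj["file_sha1"]
```

The block checksums localize damage to a specific block, while the file-level hash confirms that the reassembled object is the one originally stored.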