Content-addressable storage FAQ

Depending on what industry you work in, compliance can be a major issue, and content-addressable storage (CAS) has emerged as an option for archiving data on disk. Terri McClure, analyst with Enterprise Strategy Group, discusses the pros and cons of CAS, the CAS market and whether CAS is really a good fit for small to midsized businesses (SMBs). Her answers are also available below as an MP3 download.

Table of contents:

>>What is CAS?
>>Is CAS affordable for SMBs?
>>What are the reasons for SMBs to archive information?
>>What are the alternatives to CAS?
>>What are the drawbacks of using CAS for archiving?
>>Who offers content-addressable storage?

What is CAS, and how does it work?

CAS is a way of storing information that can be retrieved based on its content, instead of its storage location. It's typically used for long-term storage and retrieval of fixed content, like documents stored for compliance with government regulations, or medical records like x-rays and MRIs. In other words, when you think about storage, everything has an address. For conventional file systems, it's a name and a location in a hierarchy of directories.

A CAS system uses the content itself as an address through a unique identifier, typically using a hashing algorithm performed against the content. That makes the content address unique. No two pieces of content have the same address unless the content is exactly the same.

Using content-addressing models provides three main benefits:

  1. The need for applications to understand the physical location of information is eliminated.
  2. A content address acts as a digital fingerprint, which can be used for irrefutable authenticity during a legal or regulatory investigation.
  3. Digital fingerprints are used to identify and eliminate duplicate records, such as emails with attachments, which ultimately reduces archive capacity requirements.
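
The addressing scheme and the three benefits above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular CAS product works; the class name `ContentStore`, its methods and the choice of SHA-256 as the hashing algorithm are all assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content address -> stored bytes

    @staticmethod
    def address_of(content: bytes) -> str:
        # The address is derived from the content itself (SHA-256 is an assumption).
        return hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        address = self.address_of(content)
        # Identical content maps to the same address, so it is stored only once.
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        # The caller needs only the address, not a path or physical location.
        return self._objects[address]

    def verify(self, address: str) -> bool:
        # Recompute the fingerprint to confirm the stored content is unchanged.
        return self.address_of(self._objects[address]) == address

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
a1 = store.put(b"quarterly report attachment")
a2 = store.put(b"quarterly report attachment")  # same attachment on a second email
a3 = store.put(b"a different record")
```

Here `a1` and `a2` are the same address and the store holds two objects rather than three, which is the single-instance effect described in the third benefit; `verify` corresponds to the fingerprint check in the second.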

Is CAS affordable for SMBs? How do SMBs justify the cost of archiving information?

Like the answer to almost everything in IT, it depends. CAS is much more affordable than litigation. So, if you're in an industry that is regulated, you have to balance your potential litigation costs against the CAS expenditure.

There are entry-level systems in the market, like Nexsan's Assureon product, which is targeted at SMBs, that offer advanced CAS capabilities. So, it is a little cheaper to get in these days. And the inherent utilization gains in CAS from single-instance storage are on the plus side because you are storing less data as a result of eliminating duplicates.

Other things to consider are that you're eliminating the overhead associated with handling, tracking, transporting and storing tapes, and the potential impact of losing even a single tape. They all need to be balanced against the initial cost outlay for a CAS system.

You could save a little money and go with a locked network-attached storage (NAS) system, but you'd have to invest in dedupe software and you wouldn't have the robust metadata that CAS offers.

There are all kinds of things you need to think about. You need to think about the gains from reducing your backup window by archiving onto an active archiving platform, and the gains that you get from moving less active data from expensive primary storage to a secondary CAS storage tier that leverages commodity hardware.

So, there are a lot of things to look at on top of the price of the CAS tier. And it's the overall risk and economics of the storage environment as a whole that need to be considered when you're looking at whether or not it's affordable.

What are the business reasons for SMBs to archive information?

Regulatory compliance has become as big a challenge for SMBs as it has been for large enterprises. Thousands of small community banks, hedge funds and investment advisors will face increased regulatory scrutiny due to the lax lending standards that are currently plaguing the U.S. economy. There is no question that a subset of these financial services firms will also find themselves required to produce records that range from loan applications to emails between mortgage brokers.

Enterprise Strategy Group (ESG) research suggests that one out of two SMBs has gone through an electronic discovery event, the same ratio that was seen with enterprise organizations two years ago. This current research also shows that two-thirds of all enterprises have now been through an electronic discovery, indicating that SMBs, including those outside of the financial services industry, will continue to face similar records management and retention challenges.

What are the alternatives to CAS that address the same problems?

Tape is one. But organizations that are forced to deal with increased accessibility and multiple retention periods with a long-term archive have found that tape storage has become less feasible. SMBs will likely be deterred from choosing tape as a medium of choice for archiving due to the lack of resources to handle tapes during discovery or investigation and the cost of restoration during a discovery event.

A legal service provider can charge between $500 and $2,000 per tape, depending on the amount of data and the format of that tape for recovery services. Additionally, since tapes are usually used in rotation, there is a risk that they may be accidentally erased when they should be preserved as part of an ongoing legal matter.

Another challenge for long-term data retention with tape is if you have to keep records for 20 years because of HIPAA regulations or 30 years because of OSHA regulations. What are the chances the tape will be even readable by then or that the tape drive format you are using will still work?

There are other platforms that use NFS or CIFS protocols and write-once, read-many (WORM) files for long-term active archives. For example, Hitachi entered the fixed-content archiving market when it partnered with Archivas in 2006 and cemented its commitment when it acquired Archivas in 2007. That platform uses open standards such as NFS and CIFS to access data archives and stores data in standard formats like XML and HTML.

Something else to consider is that while CAS has a mechanism for storing metadata about objects, these NAS devices don't. So you need to make sure that your archiving data can double as your metadata storage if you're using one of these alternative solutions.

So, while there are alternatives, there are pluses and minuses to every approach.

What are the drawbacks of using CAS for archiving?

There are the likely suspects like cost, capacity, power and cooling because there is more data online than you have in alternative archiving solutions like tape. Some people discuss the performance limitations of CAS, but CAS was not designed to be used for primary I/O-intensive applications. It was designed to be used as a secondary storage tier for an active archive. So, it's not expected to have really high I/O performance.

There is also some talk about hash collisions. A hash collision is when two objects generate the same CAS address or value, but the content is not identical. A CAS system would read the matching address, which is the unique identifier generated from the content, as an instance of duplicate data and would not store the second object, causing data loss. The chances of this happening are infinitesimal. But as long as there is a chance, there is a worry. So, most CAS vendors have addressed this concern by using multiple hash schemes in their algorithms.
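
One way to picture the multiple-hash approach is an address built from two independent digests, so that two different objects would have to collide under both algorithms at once. The Python sketch below is an assumption about the general idea, not a description of any vendor's algorithm; the function names `dual_address` and `safe_put` and the SHA-256/BLAKE2b pairing are hypothetical choices.

```python
import hashlib


def dual_address(content: bytes) -> str:
    # Concatenate two independent digests (the algorithm pairing is an assumption).
    sha = hashlib.sha256(content).hexdigest()
    blake = hashlib.blake2b(content, digest_size=16).hexdigest()
    return f"{sha}-{blake}"


def safe_put(objects: dict, content: bytes) -> str:
    """Store content under its dual address; refuse to overwrite on a mismatch."""
    address = dual_address(content)
    existing = objects.get(address)
    if existing is None:
        objects[address] = content
    elif existing != content:
        # Same address, different bytes: a collision, not a true duplicate.
        raise ValueError("hash collision detected at " + address)
    return address


objects = {}
first = safe_put(objects, b"loan application 1001")
again = safe_put(objects, b"loan application 1001")  # true duplicate, stored once
other = safe_put(objects, b"loan application 1002")
```

The byte-for-byte comparison in `safe_put` is an extra safeguard added for the example: even if both digests matched, different content would raise an error instead of being silently treated as a duplicate.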

Don't assume CAS is all that is needed to meet regulatory requirements and if your CAS vendor is telling you that, find a new CAS vendor. That's a trap -- don't fall for it. Application-specific software, policies and policy enforcement, and compliance best practices all need to be considered as part of an overall archiving strategy.

Who offers CAS?

Probably the most well known is EMC, with its Centera platform, which has had more than 4,500 customers to date. It has been out since about 2002. The other big players have jumped into the CAS pool: IBM with the DR550 and Hitachi Data Systems (HDS) with the Content Archive Platform. But probably because Centera has been out the longest, none of them has quite the traction that Centera has.

There are also some smaller players. Plasmon offers CAS with optical disk, and optical disk has its pluses and minuses for active archive uses. Nexsan has some traction with its Assureon product, which provides a good entry-level product for SMBs.

Caringo, a startup that is led by the original CAS inventors who worked for FilePool when EMC acquired it, offers a software-based platform that can run on commodity server hardware. But interestingly enough, most of the competition we see when Centera is involved in a deal comes from locked NAS, such as NetApp or Sun.

Terri McClure is an analyst with Enterprise Strategy Group.
