Content-addressed storage (CAS) can potentially offer significant benefits to the enterprise. On the surface, costs are lowered because less expensive
CAS systems are touted as offering improved manageability and scalability. Management is enhanced by enforcing corporate compliance and retention policies -- a notable step toward risk mitigation. "CAS should be easier to scale and manage," says Tony Asaro, senior analyst at Enterprise Strategy Group. "The idea is to have a system that is as easy to manage with 100 files as it is with 100 million files."
Other analysts highlight the value of enriched metadata. "By being more tightly integrated with the application, more metadata can be stored -- more 'information about the information'," says Greg Schulz, founder and senior analyst, StorageIO. "There's much more awareness of the content." Schulz explains that enriched metadata content supports more comprehensive content-based searches and enables powerful business capabilities such as data classification [see the searchStorage.com Tech Closeup on Data Classification].
Appropriate for any size of organization
Although it's easy to see how a large storage infrastructure can apply CAS technology, analysts are clear that the technology is appropriate for any size of organization -- as long as there is enough data to justify the cost of acquisition and deployment. Today, CAS is being deployed by companies that span all sizes and industries. "ESG has seen the financial sector embrace CAS," Asaro says. "The healthcare industry is another big market. Any publicly traded company that is concerned about SOX compliance should consider CAS."
Beyond the clear market segments like healthcare or legal, there are also creative uses for CAS technology. Schulz points to e-mail as one powerful use -- saving space up front by moving attachments directly to CAS. "Open those mailboxes up and enable a GB, but the active data is only going to be on fast disk. Everything else, whether it is archived or inactive, is moved off onto object-based storage," he says. "In other words, it's tiered storage for e-mail."
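The tiering idea Schulz describes for e-mail can be sketched as a simple placement rule. This is a minimal illustration only: the MailItem class, the choose_tier function, the tier names and the 90-day "active" window are all hypothetical choices for the example, not settings from any mail or CAS product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MailItem:
    """Hypothetical record for a message or attachment."""
    subject: str
    last_accessed: datetime


def choose_tier(item, now, active_window=timedelta(days=90)):
    """Return 'fast-disk' for recently used mail, 'object-store' otherwise.

    The 90-day default is an assumed policy value, not a recommendation.
    """
    if now - item.last_accessed <= active_window:
        return "fast-disk"
    return "object-store"


now = datetime(2006, 6, 1)
mailbox = [
    MailItem("Budget draft", now - timedelta(days=3)),
    MailItem("2004 audit attachments", now - timedelta(days=600)),
]

# Map each item to the tier the policy selects for it.
placement = {item.subject: choose_tier(item, now) for item in mailbox}
```

In practice the decision would likely draw on retention rules and message metadata rather than access time alone; the sketch only shows where such a rule would sit.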
Sorting data to CAS
Corporate information does not transition to a CAS system automatically. Storage administrators must make conscious decisions about which data is moved, and configure a CAS system accordingly. Such decisions about corporate data fit ideally as part of a data classification initiative, though a formal data classification process may not be necessary. "Yes, it [CAS] is impacted by data classification and regulatory requirements," says Jim Damoulakis, chief technical officer at GlassHouse Technologies Inc. "A lot of it is just dictated by retention requirements for certain kinds of data."
Other analysts take a somewhat broader view, noting the importance of holistic business consideration -- looking at each business function and determining what business interests are best served by moving particular data types to a CAS system. "How are those applications being used, and how is the data from those applications used?" Schulz says. "It's taking a little broader view of data and getting into 'information classification'."
Understanding CAS limitations
While CAS offers a number of benefits, it also presents several notable limitations that should be considered carefully. The first issue is performance. CAS was clearly designed around secondary (tier two) storage. A CAS system is much faster than tape, but CAS is not suitable for high-performance storage tasks. "Given the implementations today, it's not well suited for transaction-type processing applications, update-intensive or heavy read/write activity," Schulz says.
Schulz suggests the only time that CAS might be considered as a primary storage platform is when your primary application involves the collection and preservation (archiving) of data. However, CAS vendors like Bycast Inc. are challenging the common tier two status, and are pushing CAS toward primary storage (tier one) using products like Bycast StorageGRID. StorageTek's IntelliStor system is another example where archival data can be migrated and stored across multiple tiers.
Another area to guard against is "hash collisions". Ideally, a hashing algorithm produces a unique identification value for every portion of data processed onto the CAS system. When another portion of data produces the same hashing result, the CAS system assumes that the data is already on the system and does not save it again -- this is the cornerstone of data reduction (a.k.a. commonality factoring) technology. However, analysts note rare cases where a single hashing algorithm has returned the same result for two or more different data elements. When these hash collisions occur, new data can be lost because the CAS system treats it as a duplicate of data it already holds. Vendors like EMC claim to have resolved the problem of hash collisions by integrating multiple hash schemes into the same algorithm.
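The deduplication behavior and the collision risk described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's implementation: the ContentStore class, its digest_bytes parameter and the byte-for-byte comparison on write are all invented for the example. The second store truncates SHA-256 to a single byte purely to force a collision quickly; a full-length digest would not collide in a demonstration like this.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self, digest_bytes=32):
        # digest_bytes < 32 deliberately weakens the hash to provoke collisions.
        self._digest_bytes = digest_bytes
        self._blobs = {}  # address (hex string) -> stored bytes

    def address_of(self, data):
        """Derive the storage address from the content itself."""
        return hashlib.sha256(data).digest()[: self._digest_bytes].hex()

    def put(self, data):
        address = self.address_of(data)
        existing = self._blobs.get(address)
        if existing is None:
            self._blobs[address] = data  # first copy: store it
        elif existing != data:
            # Same address, different bytes: a hash collision.
            # Without this comparison the new data would be silently dropped.
            raise ValueError(f"hash collision at address {address}")
        return address  # identical content is not stored twice

    def get(self, address):
        return self._blobs[address]

    def __len__(self):
        return len(self._blobs)


# Duplicate content maps to one address and is stored once.
store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")

# With a 1-byte address there are only 256 possible values, so 257
# distinct inputs must produce at least one collision.
weak = ContentStore(digest_bytes=1)
collision_detected = False
try:
    for i in range(257):
        weak.put(str(i).encode())
except ValueError:
    collision_detected = True
```

Comparing the stored bytes on a matching address is one possible safeguard; it costs an extra read on every duplicate write, which is a trade-off a real system would have to weigh.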
CAS is often implemented to address regulatory requirements, but SNIA warns against CAS products that are sold specifically to meet government regulations. Regulatory requirements involve more than just products. Best practices and other protection schemes are necessary to meet regulatory requirements adequately. "The good vendors are not making those claims," says Mark Carlson, chair of the SNIA Technical Council. He emphasizes the need to understand the practices and procedures necessary to meet regulatory compliance first, then evaluate CAS products based on that comprehensive understanding. "Once you get down to the actual requirements for a storage box, look at it from that point of view."