Content-addressed storage (CAS) explained
No matter which term you choose, CAS technology continues to be particularly useful in addressing two problems: the long-term retention of content for compliance and/or regulatory purposes, and the archiving of massive amounts of records, images or other information that rarely (if ever) changes.

One reason CAS is so effective is its use of a hashing algorithm to assign a unique identifier, or digital fingerprint, to each stored object. That process, coupled with storage best practices, ensures that whatever goes into the system is exactly what comes out. If a data element changes, it receives a new unique identifier, also known as a content address. The stored object's physical location doesn't matter.

"CAS is not necessarily a category of storage, like SAN or NAS. It is a mechanism which allows you to do a number of things much more efficiently than would be possible using traditional techniques like file systems," says Paul Carpentier, chief technology officer at Caringo, Inc., a provider of content storage software. Carpentier developed CAS technology for Belgian software company FilePool BV, before the company was acquired by EMC in 2001 and its data archiving software became the forerunner for Centera.

A classic case for CAS is e-mail archiving. For instance, East Carolina University chose its CAS system over regular storage array disks after tests showed its IT department would need 60 man-hours to recover a year's worth of messages for any given employee with its existing backup system. Making matters worse, the existing backup process didn't ensure full recovery, since e-mails might have been deleted before the backups were performed.
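The hashing mechanism described above can be illustrated with a minimal sketch. It is not any vendor's implementation: the `ContentStore` class and its methods are hypothetical names, the store is an in-memory dictionary, and SHA-256 is an assumed choice, since the article does not say which hashing algorithm a given product uses.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # content address -> stored bytes

    @staticmethod
    def address_of(data: bytes) -> str:
        # SHA-256 is an assumption; the article does not name an algorithm.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        address = self.address_of(data)
        # Identical content hashes to the same address, so it is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read to confirm that what comes out is what went in.
        if self.address_of(data) != address:
            raise ValueError("stored object does not match its content address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
original = store.put(b"quarterly report, final")
duplicate = store.put(b"quarterly report, final")    # same bytes, same address
revised = store.put(b"quarterly report, final v2")   # changed bytes, new address
```

Storing the same bytes twice yields one object and one address, while the revised bytes get a different address. The caller never refers to a path or physical location, only to the address returned by `put`.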
CAS managing storage clusters

After reaching almost 130 TB in its 40 high-capacity PetaBox systems from Capricorn Technologies Inc., CIDR installed a double-density storage array with nine nodes, each with a dozen terabyte disks, from Rackable Systems Inc. Caringo's CAStor software now manages the storage clusters.

"You can set the replication for how much redundancy you want for the data, and it's so simple to use and to manage," says Lee Watkins Jr., director of bioinformatics at the CIDR. "When you need additional capacity, you add another node. You bring it up. It's part of the cluster. You're done. Honest, it's that simple."

CAS users often employ a redundant array of independent nodes (RAIN) architecture, allowing data to be copied to one or more servers in the cluster, instead of storing it on different disks in the same server.

"[RAIN] enables larger, more cost-effective scalability from a capacity standpoint," says Brian Garrett, technical director of the ESG Lab for storage research firm Enterprise Strategy Group. "I don't want to be encumbered by traditional RAID 5 rebuild penalties, so mirroring is better than parity. And if I'm going to mirror and use commodity servers, instead of mirroring within the servers, why don't I mirror the data between the servers over a commodity Ethernet network? What I get is cost-effective scalability."
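The RAIN idea of mirroring objects between servers rather than between disks in one server can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how CAStor or any other product places data: `Node` and `Cluster` are hypothetical in-memory classes, the replica count of two is arbitrary, and the hash-ranking placement rule is one simple option chosen for the example.

```python
import hashlib


class Node:
    """One commodity server in the cluster (hypothetical, in-memory)."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}  # content address -> stored bytes


class Cluster:
    """Keeps `replicas` copies of each object on different nodes."""

    def __init__(self, replicas: int = 2):
        self.replicas = replicas
        self.nodes = []

    def add_node(self, node: Node) -> None:
        # Adding capacity is just registering another node.
        self.nodes.append(node)

    def _targets(self, address: str):
        # Assumed placement rule: rank nodes by a hash of (node name + address)
        # and take the first `replicas` of them.
        ranked = sorted(
            self.nodes,
            key=lambda n: hashlib.sha256((n.name + address).encode()).hexdigest(),
        )
        return ranked[: self.replicas]

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        for node in self._targets(address):
            node.objects[address] = data  # mirror between servers, not disks
        return address

    def get(self, address: str) -> bytes:
        # Any surviving node that holds the address can serve the object.
        for node in self.nodes:
            if address in node.objects:
                return node.objects[address]
        raise KeyError(address)


cluster = Cluster(replicas=2)
for name in ("node-a", "node-b", "node-c"):
    cluster.add_node(Node(name))

address = cluster.put(b"scan-0001 image bytes")
holders = [n for n in cluster.nodes if address in n.objects]

cluster.nodes.remove(holders[0])   # simulate losing one server
recovered = cluster.get(address)   # still served by the other replica
```

Because the object is addressed by its content rather than by a location, a read can be satisfied by whichever node still holds a copy. A real system would also re-replicate after a failure and rebalance when nodes are added, which this sketch leaves out.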
CAS eliminates traditional file system

On the downside, the knock on CAS has been its lack of performance. Running every bit of data through a hashing algorithm is processor-intensive, making CAS prohibitive for anything but infrequently used content.

"If you're looking for very, very, very fast storage, you might want to rethink going with CAS," says Greg Schulz, founder and analyst at The StorageIO Group. "With CAS, you're trading performance for intelligence, for information, for optimization."
CAS performance improving

Gartner Inc. finally scrapped the narrow CAS category to take a broader market view, comparing Centera to other products aiming to solve the same user problems, even if the technology isn't strict CAS, notes analyst Pushan Rinnen. She uses the example of Hitachi Data Systems' Content Archive Platform (HCAP), which was acquired from Archivas. HCAP competes against Centera but isn't considered CAS because it makes use of a NAS file system on the front end, she says.

Other vendors with CAS offerings that compete with Centera include Hewlett-Packard with its Information Access Platform (formerly RISS), IBM with the DR550, NEC America Corp. with its HydraStor and Permabit Technology Corp.
"Legacy technology with scalability limitations"

The eXtensible Access Method (XAM) standard that EMC and other vendors worked on, through the Storage Networking Industry Association, aims to address the proprietary tag for connecting applications to object-based storage systems. But XAM wasn't ratified until July, and XAM-supporting products have yet to make an impact.

"We gave a huge amount of intellectual property as the starting point for this open API, so an application that writes to XAM can store information in Centera or someplace else," says Steve Spataro, director of Centera marketing at EMC. He countered the proprietary accusation, saying that Centera's API was always available to anyone via the Web and EMC's intention was never "to keep a customer locked into Centera."

It remains to be seen how the new XAM standard will affect the CAS space, but the sands had already been shifting for quite some time. "CAS used to have a lot of benefits in terms of single-instance store and some self-healing properties," says Rinnen, "but some of these [features] are getting less distinctive because other vendors have come up with deduplication technologies, which are even more superior than single-instance store."

The sweet spot for CAS is still secondary storage, although Caringo's technologists are trying to push the envelope. "Centera is really positioned as an archiving type product. Our ambition is much further. We are going after the active volume storage market as well as archiving," says Carpentier, conceding that it will be an uphill battle.

Garrett says he understands that argument, especially as processing power becomes more affordable. But for now, he says, content-addressable storage is still for secondary storage.