Content-addressed storage (CAS) provides a near-line archive of fixed-content business data. Although CAS hardware is primarily a secondary (Tier-2) storage array, a CAS system also offers data reduction technology that shrinks total storage requirements by eliminating redundant or duplicate data. Software is also used to integrate the CAS system with enterprise business applications, allowing for comprehensive search and management of the content. Major vendors in the CAS market include -- in no particular order -- EMC Corp., Nexsan Technologies, Sun Microsystems Inc., Storage Technology Corp. (StorageTek), Permabit Inc., Hewlett-Packard Co. (HP), Bycast Inc., IBM and Avamar Technologies Inc. Most vendors (see sidebar) share a remarkably similar view of CAS, though each puts its own stamp on the technology.
Data reduction and backup
Data reduction is a key attribute of CAS technology, but the effectiveness of data reduction often depends on the granularity of the particular CAS system. By breaking a file into smaller pieces, such as blocks or other elements, it is typically easier to find redundant elements. Avamar is particularly focused on data granularity, breaking files into small pieces dubbed "atomics." When changes are made to an existing file, only those changed elements are stored to disk. The process works with virtually any type of file.
"We can do it for all the operating systems, for all the applications on your primary systems, for virtually any kind of data that you have," says Jedidiah Yueh, founder of Avamar. "And then even if files and documents and databases change over time, we'll still only need to store the new atomics."
Avamar's focus on granularity and data reduction isn't just intended to save costs. It's also an essential strategy to save time. Avamar's Axion product is positioned as a backup and restoration system. Eliminating redundant data -- thus reducing the total amount of storage required -- allows for faster backups and restores, especially over a WAN where bandwidth limits can inhibit backup objectives. "They [customers] buy us to replicate the data from one environment to another," Yueh says.
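The piece-level data reduction described above can be illustrated with a minimal sketch. It is not Avamar's implementation: the fixed 4-byte chunk size, the `ChunkStore` class and its method names are all assumptions made for illustration, and production systems commonly use much larger or variable-size pieces. The sketch keys each piece by its SHA-256 digest, so a second version of a file writes only the pieces that changed.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks keep the demo readable


class ChunkStore:
    """Toy content-addressed store: each chunk is keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def put_file(self, data):
        """Store data; return (list of chunk digests, count of new bytes written)."""
        recipe = []
        new_bytes = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks consume storage
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def get_file(self, recipe):
        """Rebuild a file from its ordered list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore()
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # same file with one chunk changed

recipe1, written1 = store.put_file(v1)
recipe2, written2 = store.put_file(v2)
print(written1, written2)  # bytes written for each version
```

In this toy run the first version writes all 16 bytes, while the second writes only the 4 bytes of the changed chunk. That is the property that matters when a backup has to cross a bandwidth-limited WAN link.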
Avamar also takes a grid server approach to storage, allowing for significant scalability and very high availability in the face of individual server faults. "You can begin with just a single node which has internal disk and CPU, and then you can just scale by adding nodes," Yueh says, noting that this is substantially different from tape systems, where an entirely new system would be needed to overcome capacity limitations.
Over time, Yueh believes that CAS will increase in value as more applications become available for searching, replication and other tasks. He sees independent software vendors (ISVs) being a major source of CAS software products -- a trend that is likely to increase as CAS products adopt common interfaces, such as the emerging eXtensible Access Method standard from Storage Networking Industry Association [see Content-addressed storage: Future directions].
Part of business governance is the issue of security. Not only must important data be retained for a prescribed period, sensitive data must also be protected from theft or abuse, and obsolete data must be destroyed once its retention period expires. Nexsan's Assureon appliance emphasizes security by implementing AES 256-bit encryption. Encryption prevents unauthorized use of sensitive data and also prevents tampering or modification of content. "We've actually combined CAS with encryption," says Brendan Kinkade, vice president of marketing for Nexsan. "This provides immutability and a way to authenticate data."
The Assureon isn't limited to disk storage. It can support removable media with the same level of security. "It [Assureon] has integrated disk storage, which is scalable to PB [petabyte]," Kinkade says. "However, it does support removable media, such as tape and optical, and will also encrypt to those forms of media." Key management features in the Assureon allow storage administrators to selectively destroy individual files.
Nexsan also works to ensure data integrity by preventing lost or missing files -- a feature that Kinkade calls "serialization." "Each time we create a CAS address, each new asset that's created in this object-based CAS system is given a global unique sequential serial number," Kinkade says. "By serializing the assets (they're in numerical order), it delivers a very straightforward method to validate and retrieve those assets." Periodic scans ensure that each asset is accounted for, and missing items can be replaced from a redundant copy.
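The idea behind such a scan can be sketched in a few lines. This is an illustration only, not Nexsan's code: the function names and the use of plain dictionaries for the primary store and its redundant copy are assumptions. Because serial numbers are issued sequentially, any gap between 1 and the last number issued identifies a missing asset.

```python
def find_missing_serials(present_serials, last_issued):
    """Return serial numbers in 1..last_issued that are absent from the store."""
    present = set(present_serials)
    return [n for n in range(1, last_issued + 1) if n not in present]


def repair(primary, replica, last_issued):
    """Copy missing assets from the replica; return serials that stay unrecovered."""
    unrecovered = []
    for serial in find_missing_serials(primary.keys(), last_issued):
        if serial in replica:
            primary[serial] = replica[serial]
        else:
            unrecovered.append(serial)
    return unrecovered


# Hypothetical stores: serial number -> asset bytes
primary = {1: b"a", 2: b"b", 4: b"d"}
replica = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}

missing = find_missing_serials(primary, last_issued=5)
unrecovered = repair(primary, replica, last_issued=5)
print(missing, unrecovered)
```

Here serials 3 and 5 are reported missing. Serial 3 is restored from the replica, while serial 5 exists in neither copy and is returned for an administrator to investigate.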
Most products require APIs to interface a CAS product with an application. But in a unique departure, Assureon uses no APIs. "We've built a technology that is really a 'pull' feature," Kinkade says. "It requires no APIs." Like other CAS products, the Assureon attempts to minimize the overhead involved with management tasks. Although Kinkade says that some administrator management is needed to set up storage, access and retention policies, operation is very transparent once the setup is completed. Some management is also needed as retention periods expire. "It will alert you that there are expirations coming up so that the administrator can then effectively [manually] execute those dispositions," he says.
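An expiration alert of the kind Kinkade describes might look like the following sketch. The object IDs, the 30-day warning window and the function name are assumptions, not Assureon features. The function only reports what is due and deletes nothing, which leaves the disposition itself to the administrator.

```python
from datetime import date, timedelta


def upcoming_expirations(retention_until, today, warn_days=30):
    """Return object IDs whose retention ends within warn_days of today, soonest first."""
    horizon = today + timedelta(days=warn_days)
    due = [(end, oid) for oid, end in retention_until.items() if end <= horizon]
    return [oid for end, oid in sorted(due)]


# Hypothetical retention table: object ID -> last day the object must be kept
retention_until = {
    "obj-a": date(2030, 1, 10),
    "obj-b": date(2030, 3, 1),
    "obj-c": date(2029, 12, 20),  # already past retention on the date used below
}

alerts = upcoming_expirations(retention_until, today=date(2030, 1, 1))
print(alerts)
```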
Looking ahead, Kinkade sees widespread adoption for CAS, noting that the sheer size of penalties and judgments being assessed against noncompliant corporations offers a fearsome incentive to consider CAS technology. But perhaps the biggest obstacle today is a lack of knowledge about the technology and its capabilities.
Scalable and searchable
The very nature of CAS implies that data must be kept available for a prolonged period of time, and this presents storage managers with two dilemmas. First, data volumes are always growing because it may be years -- even decades -- before any CAS data becomes eligible for deletion. This requires scalability, allowing companies to limit their initial investment but grow their CAS infrastructure as needed to accommodate burgeoning content. Second, a CAS system can grow to handle hundreds of millions of objects (some can handle over 1 billion objects). Large CAS systems must be able to organize and search for a file within an ocean of data. While most CAS vendors make provisions for scaling and searching, major players like EMC and Sun have made strides to address these concerns for large enterprise products.
A conventional file system stores files in terms of nested folders or "hierarchies." As the file count spirals upward, the hierarchies expand. "As you continue to scale hierarchies, you get those tree-like structures that make it more challenging to find the files, and performance and robustness of the file system can be questioned," says Roy Sanford, vice president of CAS at EMC. He notes that CAS products, like Centera, can scale to high object counts without the need to reconfigure. "The fact that you can actually scale to very large object counts and still get subsecond response time to a random retrieval request is absolutely unique," he says. "This is what 'active archiving' needs to be."
Management relies on metadata, which Centera stores about each object. "The metadata field that we use, called a 'content descriptor file,' is an XML-based file," Sanford says. The metadata file can be accessed not only by application partners (ISVs) but also by CAS system operators. Metadata can hold retention details, as well as keywords and other descriptive content, which supports comprehensive searches using software like Centera Seek. This search potential also has a beneficial impact on system management, allowing huge volumes of data to be handled by a single IT staffer.
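To make the role of XML metadata concrete, here is a minimal sketch of a keyword search over per-object descriptors. The element and attribute names are invented for illustration and are not the actual Centera content descriptor file schema. Nor does the sketch reflect how Centera Seek works. It only shows retention details and keywords travelling with an object's address.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor layout (assumption); NOT the real Centera schema.
DESCRIPTOR = """
<descriptor address="3f9a">
  <retention until="2031-06-30"/>
  <keyword>invoice</keyword>
  <keyword>acme</keyword>
</descriptor>
"""


def parse_descriptor(xml_text):
    """Extract the address, retention date and keywords from one descriptor."""
    root = ET.fromstring(xml_text.strip())
    return {
        "address": root.get("address"),
        "retention_until": root.find("retention").get("until"),
        "keywords": {k.text for k in root.findall("keyword")},
    }


def search(descriptors, keyword):
    """Return the addresses of all objects tagged with the given keyword."""
    return [d["address"] for d in descriptors if keyword in d["keywords"]]


meta = parse_descriptor(DESCRIPTOR)
print(search([meta], "invoice"))
```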
Sun also places a strong emphasis on scalability and management. CAS architectures must accommodate billions of objects over a long period of time, and include centralized management tools to locate and retrieve those objects quickly. "I think those are the two areas where CAS has taken a knock in the past," says Russ Kennedy, ILMS chief technology officer for the data management group at Sun.
StorageTek's IntelliStor system provides long-term object retention for the enterprise. "The IntelliStor product is designed for multiple tiers of storage, and includes disk and tape in a single system," Kennedy says. "It can manage large repositories of fixed-content information and manage the long-term retention of that information across those tiers of storage."