The content-addressed storage (CAS) market that EMC Corp. pioneered with its Centera line is changing, and one piece of the new dynamic involves small vendors that are attempting to push the technology beyond traditional use cases.
The most recent example of this is Tarmin Technologies Ltd., a software company launched in October with the promise of combining elements of CAS, archiving, information lifecycle management (ILM), search and e-discovery, security and audit management.
Tarmin's GridBank product fulfills the basic tenets of CAS technology -- alternately known as content-addressed, content-addressable and content-aware storage -- by converting files into objects and assigning a unique identifier, or digital fingerprint, to each one.
The ability to guarantee that an object won't change, coupled with its highly scalable architecture, has made CAS particularly useful for IT departments that need to archive massive amounts of data, such as e-mail or medical records, for compliance or regulatory purposes.
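To make the idea of a digital fingerprint concrete, here is a minimal sketch of a content-addressed store. It is not the design of Centera, GridBank or any other product named here: the `ContentStore` class is hypothetical, and SHA-256 is an assumed hash function, since the article does not say which algorithms the vendors use.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        # Maps fingerprint (hex digest) -> object bytes.
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself.
        # SHA-256 is an assumption made for this sketch.
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address,
        # so storing it twice keeps a single copy.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read: any change to the stored bytes is detectable
        # because the content would no longer match its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("object no longer matches its fingerprint")
        return data


store = ContentStore()
address = store.put(b"patient record, scan 1")
assert store.get(address) == b"patient record, scan 1"
```

Because the address is computed from the bytes, changed content would yield a different address. That property is what lets a CAS system show that an archived object has not been altered.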
However, Tarmin is trying to break new ground by offering capabilities that are typically add-ons. GridBank, for instance, includes features such as policy-based data migration, object-level encryption of up to 448 bits, search and e-discovery, and compression.
"All of the smaller players in the space have added additional feature/function for a single price that the larger players don't do," says Eric Herzog, vice president of sales and marketing at Tarmin.
CAS for primary storage?
CAS vendors have traditionally targeted the secondary storage market because of the performance hit associated with running data through a hashing algorithm to create the digital fingerprints. But Caringo Inc., another small, software-only vendor, claims its CAStor-based clusters are fast enough for primary storage and cheaper than tape. The company is trying to eliminate the separation of archives from active content that's used daily.
"CAStor's symmetric, parallel architecture enables near-linear scalability in terms of performance because each new node added to the cluster increases the overall processing power of the cluster," says Derek Gascon, vice president of marketing at Caringo. He says some customers currently use CAStor for primary storage, if they have a specific need or application that suits it.
Charlotte, N.C.-based Yap Inc., which created a speech recognition platform for cell phone users who want to automatically convert spoken conversations to text, is using CAStor for its tier 1 storage, says CEO Igor Jablokov. Yap stores audio files in CAStor the moment it gets them, and its speech recognition cluster does the transaction processing against the files in CAStor. "We don't have any data on the actual processing cluster," says Jablokov.
Yap also uses the stored audio files and their text translations for research and development purposes, to derive higher accuracy models for its speech-to-text platform.
Jablokov says his firm compared CAS with NAS and SAN and ultimately chose CAS "based on the manageability and ability to grow the storage." NAS would have been the cheapest option, but it can get "fairly burdensome" creating directory structures and duplicating data sets as the storage demand grows, he adds. And a SAN would have cost more.
"[CAS is] obviously not as capable as SAN, but it's far less expensive," says Jablokov. "It gets 80% of the performance for 20% of the cost."
Caringo contends that CAStor is suitable for 90% of all fixed content, with the exception of dynamic content in databases, which is more appropriate for a SAN or NAS.
"We talk about single-tier ILM, and that's a very new sound, and many people will be skeptical about it," says Caringo chief technology officer Paul Carpentier, who first developed CAS technology at FilePool NV, which EMC acquired in 2001. "Yet that's the case where we think, ultimately, this market is going. It'll be in steps, but I very strongly believe we'll get there."
Both Caringo and Tarmin promote their software-only, hardware-agnostic approach as advantages, yet one of the earliest software-only CAS vendors broke ranks in 2006. Permabit Technology Corp. shifted to a hardware-software appliance that supports standard CIFS, NFS and WebDAV protocols because customers wanted a "turnkey solution," according to Mike Ivanov, the company's vice president of marketing.
"What we came to realize," he says, "is that people don't go out to look for a technology like CAS."
Permabit's latest offering, Enterprise Archive Data Center, released last year, represents a major architectural change to what Ivanov calls "next-generation grid," allowing for massive scalability, high availability and reliable data protection at a price cheaper than tape.
Ivanov says Permabit's Rain-EC is able to recover from multiple simultaneous failures with no loss of data or access, unlike products that employ mirroring and single-parity Rain, which can only recover from a single failure. Rain-EC also significantly reduces storage overhead, he says.
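The article does not describe how Rain-EC works internally. If it is assumed to behave like a generic erasure-coding scheme that splits data into k data shards plus m parity shards, the trade-off Ivanov describes can be worked through as simple arithmetic. The function names and the 4+2 layout below are hypothetical and chosen only for illustration.

```python
def mirroring(copies: int = 2) -> dict:
    # With n full copies, n - 1 can be lost, and the extra
    # capacity consumed is (n - 1) times the size of the data.
    return {"failures_tolerated": copies - 1, "overhead": float(copies - 1)}


def erasure_code(data_shards: int, parity_shards: int) -> dict:
    # An ideal k + m erasure code survives the loss of any m shards
    # and consumes m / k extra capacity relative to the data.
    return {
        "failures_tolerated": parity_shards,
        "overhead": parity_shards / data_shards,
    }


print(mirroring())         # two-way mirror
print(erasure_code(4, 2))  # hypothetical 4+2 layout
```

Under these assumptions, a two-way mirror tolerates one failure at 100% extra capacity, while the hypothetical 4+2 layout tolerates two simultaneous failures at 50% extra capacity.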
Permabit further differentiates itself with features such as data compression and sub-file data deduplication.
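Sub-file deduplication means that duplicate data is detected within and across files, not just between whole files. The sketch below shows the general idea using fixed-size chunks and SHA-256 fingerprints. Both choices are assumptions made for illustration, as are the `store_file` and `read_file` helpers, and none of this is taken from Permabit's implementation.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, to keep the example readable


def store_file(data: bytes, chunk_store: dict) -> list:
    """Split data into chunks, keep one copy of each, return the recipe."""
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk that has been seen before is not stored again.
        chunk_store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def read_file(recipe: list, chunk_store: dict) -> bytes:
    """Rebuild a file from its ordered list of chunk fingerprints."""
    return b"".join(chunk_store[digest] for digest in recipe)


chunks = {}
first = store_file(b"AAAABBBBCCCC", chunks)
second = store_file(b"AAAABBBBDDDD", chunks)  # shares two chunks with the first

assert read_file(first, chunks) == b"AAAABBBBCCCC"
assert read_file(second, chunks) == b"AAAABBBBDDDD"
assert len(chunks) == 4  # six chunks written, only four unique ones kept
```

The two files differ, so whole-file comparison would store both in full. Chunk-level comparison stores the shared portion only once.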
"We really don't place ourselves in the 'CAS' bucket as it's merely a piece of technology within the Permabit Enterprise Archive, rather than it defining what our entire solution is," says Ivanov, explaining that CAS came to be associated narrowly with archiving for compliance purposes. "We position ourselves as disk-based enterprise archiving," he emphasizes.
A broader definition of CAS
With its launch of Centera in 2002, EMC kicked off a campaign to promote CAS and create a new market segment, recalls Pushan Rinnen, a research director at Gartner Inc. Gartner, too, narrowly used the term CAS based on its technology definition rather than the application point of view until last year, when it broadened the term, says Rinnen.
"When we talk to users, they don't talk to us about CAS or NAS or object-based storage," she says. "They don't care about that. They care about what is the best way for [them] to store the data in the longer term."
Steve Spataro, director of Centera product marketing, says EMC created Centera because customers told the firm they wanted a disk-based solution to use as an archive, with low total cost of ownership and the immutability of burnt-in media.
EMC continues to focus on secondary storage, but it has made substantial performance improvements over the years. According to Spataro, some customers now build production archives on Centera. One hospital used to wait a year to put a patient's medical images on Centera, he says. It now does that while the patient is still in the hospital.
"To do that, the archive has to be exceptionally robust," says Spataro. "I need to make sure I'm confident in the archive's ability to be used as close to primary storage as humanly possible."
As for features beyond traditional CAS, EMC builds added functionality into the Centera operating environment, but a customer sometimes needs to buy a license key to unlock those extra features. One example is search.
"Rather than trying to compete against competitors," says Spataro, "what we're trying to do is stay ahead of what the customer wants."