This article can also be found in the Premium Editorial Download "Storage magazine: Exploring the most innovative midrange systems."
At a glance:
Waltham, MA-based Archivas Inc.'s ArC software lets companies store fixed-content files as write once, read many (WORM) data over the long term while still providing fast access to those files. It guarantees that files can be stored and retrieved even if the hardware and apps that created them have changed, because it stores each file together with the meta data needed to open it, turning the stored data into a self-contained object. Data can be deleted only at the end of its specified retention period or under some other rule recorded in the meta data. Data is stored on magnetic disk, not on optical WORM media, where data is permanently etched into the medium.
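To make the retention rule concrete, here is a minimal Python sketch, assuming a simple model in which each archived object carries its content, a content type, an ingest time and a retention period. The class `ArchivedObject`, its fields and the optional `delete_rule` hook are hypothetical illustrations, not ArC's actual interface; the frozen dataclass merely stands in for write-once behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


# Hypothetical model: content is bundled with the meta data needed to
# interpret it, and deletion is allowed only after retention expires or
# when an extra rule stored with the object permits it.
@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ArchivedObject:
    content: bytes
    mime_type: str            # tells a future app how to open the file
    ingested_at: datetime
    retention: timedelta
    # Optional extra rule; returns True to permit early deletion.
    delete_rule: Optional[Callable[["ArchivedObject", datetime], bool]] = None

    def retention_expires(self) -> datetime:
        return self.ingested_at + self.retention

    def may_delete(self, now: datetime) -> bool:
        # Deletion is allowed once the retention period has ended...
        if now >= self.retention_expires():
            return True
        # ...or earlier only if the object's own rule says so.
        return self.delete_rule is not None and self.delete_rule(self, now)


t0 = datetime(2005, 3, 1, tzinfo=timezone.utc)
scan = ArchivedObject(b"...", "image/tiff", t0, timedelta(days=7 * 365))
print(scan.may_delete(t0 + timedelta(days=30)))       # False: still under retention
print(scan.may_delete(t0 + timedelta(days=7 * 365)))  # True: retention has expired
```

In this model a deletion request made before the retention period ends is refused unless a rule carried in the object's own meta data allows it.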
Archivas' distributed architecture scales to petabytes. "Archivas is optimized to be a large-scale repository," says Brad O'Neill, senior analyst, Taneja Group, Hopkinton, MA. "It takes an object-based approach that's more scalable than any file system. It stores objects that include the content and the meta data associated with the content, so it's not restricted to the file system hierarchy," he says. An archive database, which is replicated across multiple nodes, manages meta data.
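O'Neill's point about escaping the file system hierarchy can be illustrated with a toy catalog in Python. This sketch assumes a flat namespace in which each object gets an opaque ID (here a random UUID) and is found by querying its meta data instead of walking a directory path. `MetadataCatalog`, `put` and `find` are hypothetical names, and the single in-memory dictionary stands in for ArC's replicated archive database, whose internal design the article does not describe.

```python
import uuid
from typing import Any, Dict, List


class MetadataCatalog:
    """Hypothetical flat, object-based catalog (not ArC's actual design).

    Objects are keyed by an opaque ID rather than a directory path, so any
    meta data attribute can be used to find them.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    def put(self, content: bytes, **metadata: Any) -> str:
        object_id = uuid.uuid4().hex  # no path, just a unique ID
        self._objects[object_id] = {"content": content, "meta": dict(metadata)}
        return object_id

    def find(self, **criteria: Any) -> List[str]:
        # Return IDs of objects whose meta data matches every criterion.
        return [
            oid
            for oid, rec in self._objects.items()
            if all(rec["meta"].get(k) == v for k, v in criteria.items())
        ]


catalog = MetadataCatalog()
ct = catalog.put(b"...", patient="P001", modality="CT")
mri = catalog.put(b"...", patient="P001", modality="MRI")
print(catalog.find(patient="P001", modality="CT") == [ct])  # True
```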
ArC uses a network-oriented object architecture built around redundant arrays of independent nodes (RAIN) that isolate archived data from the hardware layer. Archived files are represented as objects containing the data and meta data required to support apps and ensure content integrity. Archivas' proprietary Fixed Content File System (FCFS) provides access to archived objects. FCFS is transparent to users, who can reach archived files through standard network file-system protocols such as NFS.
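The article doesn't say how ArC ensures content integrity. One common approach, shown here as an assumption rather than a description of FCFS, is to store a cryptographic digest of the content inside the object and recompute it on every read. The helpers `make_object` and `verify` are hypothetical; SHA-256 comes from Python's standard `hashlib` module.

```python
import hashlib


def make_object(content: bytes, metadata: dict) -> dict:
    # Bundle content, its meta data, and a SHA-256 digest of the content
    # (the digest scheme is an assumption, not ArC's documented method).
    return {
        "content": content,
        "metadata": dict(metadata),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify(obj: dict) -> bool:
    # Recompute the digest and compare it with the one stored at ingest.
    return hashlib.sha256(obj["content"]).hexdigest() == obj["sha256"]


obj = make_object(b"patient scan", {"mime_type": "image/tiff"})
print(verify(obj))                 # True
obj["content"] = b"patient scam"   # simulate silent corruption on disk
print(verify(obj))                 # False
```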
The nodes are rack-mounted Linux servers (purchased separately). Each node has 1TB of storage, and 24 nodes fit in a cabinet. To scale the system, you add more nodes; ArC automatically discovers new nodes and redistributes the workload. Disks are attached directly to the nodes, and while any kind of disk can be used, they're usually low-cost ATA/SATA disks.
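The node figures above make for quick capacity arithmetic: 24 nodes at 1TB each put 24TB of raw disk in a full cabinet, so a (decimal) petabyte of raw capacity takes about 42 cabinets. The Python sketch below does that calculation. The `copies` parameter is an assumption added for illustration: if the system keeps extra copies for protection, which the article doesn't quantify, the number of cabinets for a given usable capacity grows accordingly.

```python
import math

NODE_TB = 1              # storage per node, per the article
NODES_PER_CABINET = 24   # nodes per cabinet, per the article


def cabinets_needed(usable_tb: float, copies: int = 1) -> int:
    """Cabinets required for a usable capacity in TB.

    `copies` is hypothetical: how many copies of each object are kept.
    """
    raw_tb = usable_tb * copies
    nodes = math.ceil(raw_tb / NODE_TB)
    return math.ceil(nodes / NODES_PER_CABINET)


print(NODE_TB * NODES_PER_CABINET)       # 24 (TB per full cabinet)
print(cabinets_needed(1000))             # 42
print(cabinets_needed(1000, copies=2))   # 84
```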
ArC distributes all runtime operations and physical storage among node clusters. If a node fails, the cluster redirects processing to the other nodes. This approach, says Archivas, can tolerate up to three simultaneous points of failure.
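The article doesn't explain how ArC spreads data so that it survives three simultaneous failures. One way to get that property, sketched here in Python under stated assumptions, is to keep four copies of every object on four distinct nodes, so at least one copy survives any three node failures. Placement uses rendezvous (highest-random-weight) hashing, a swapped-in technique chosen because every node can compute it independently and adding a node displaces at most one copy of any object. `place`, `readable` and `COPIES` are hypothetical names.

```python
import hashlib
from typing import Iterable, List, Set

COPIES = 4  # assumption: tolerating 3 simultaneous failures needs 3 + 1 copies


def _score(node: str, object_id: str) -> int:
    # Deterministic pseudo-random weight for a (node, object) pair.
    digest = hashlib.sha256(f"{node}:{object_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_id: str, nodes: Iterable[str], copies: int = COPIES) -> List[str]:
    """Rendezvous hashing: store copies on the highest-scoring nodes."""
    ranked = sorted(nodes, key=lambda n: _score(n, object_id), reverse=True)
    if len(ranked) < copies:
        raise ValueError("not enough nodes for the requested number of copies")
    return ranked[:copies]


def readable(replicas: List[str], failed: Set[str]) -> bool:
    # An object stays available while any node holding a copy is up.
    return any(node not in failed for node in replicas)


nodes = [f"node{i:02d}" for i in range(24)]
replicas = place("scan-0001", nodes)
print(len(replicas))                          # 4
print(readable(replicas, set(replicas[:3])))  # True: one copy survives
```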
Early adopters are pleased with ArC. The Cancer Therapy & Research Center in San Antonio uses ArC to store patient images. Because treatment is cyclical, "I don't want to keep [patient] data online during that [off] time," says Mike Luter, the center's CTO. When treatment resumes, ArC allows doctors to access older images as if they were stored on primary disk.
EMC's Centera is the established product in this market, but "Archivas is more cost-effective," says Peter Gerr, senior analyst, Enterprise Strategy Group. Luter agrees: "Centera was more costly and didn't do what we needed."
At this point, the biggest downside to ArC is the untested nature of the technology. "Nobody has pushed the upper limits in terms of multiple petabytes, so we don't know what happens when you get up there," says O'Neill. For most companies, however, multiple petabytes is still a long way away.
This was first published in March 2005