Building out an archive requires careful planning if you want to keep it manageable as it balloons in size.
"One [issue] we've seen get people is retrieval," says Jim Cuff, VP of engineering at Boston-based Iron Mountain, which provides a variety of electronic archiving and vaulting services. Working on the assumption that they're building a low-access archive, "they get caught flat-footed" as it grows and find they can't retrieve data at the rates they can write it.
Another issue is logistics: How do you procure, power, cool and protect that much disk? Iron Mountain has customers who archive 250GB daily, which at first "sounds like a conventional IT problem," Cuff says. But over, say, seven years, that 250GB per day approaches 650TB. What happens if that 250GB per day becomes 500GB? "You're in a different problem domain quite by accident," notes Cuff.
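The growth figures Cuff cites are easy to sanity-check with back-of-the-envelope arithmetic (using decimal terabytes, as storage vendors count them):

```python
# Back-of-the-envelope check of the archive growth quoted above.
GB_PER_DAY = 250
DAYS_PER_YEAR = 365
YEARS = 7

total_gb = GB_PER_DAY * DAYS_PER_YEAR * YEARS
total_tb = total_gb / 1000  # decimal TB

print(f"{GB_PER_DAY} GB/day over {YEARS} years = {total_tb:.0f} TB")
# 250 * 365 * 7 = 638,750 GB, i.e. roughly the ~650TB the article cites
print(f"At 500 GB/day that doubles to about {total_tb * 2:.0f} TB")
```

Doubling the daily intake to 500GB pushes the seven-year total toward 1.3PB, which is the "different problem domain" Cuff is describing.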
Massive array of idle disks (MAID) storage is one technology that may help the archive cause. In a nutshell, a MAID array spins down disks that aren't being used to reduce wear and tear, lengthen their lives, and save on power and cooling. An example of a MAID array is Copan Systems' Revolution 200T.
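The MAID policy itself is simple to sketch. The toy model below (class and timeout names are invented for illustration; real MAID arrays such as Copan's implement this in array firmware, not host software) spins down any disk that has sat idle past a threshold and spins it back up on the next access:

```python
import time

# Assumed idle threshold for illustration only; real arrays tune this.
IDLE_TIMEOUT = 300  # seconds of inactivity before spin-down


class Disk:
    """Toy model of one drive in a MAID shelf."""

    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.spinning = True
        self.last_access = time.monotonic()

    def access(self):
        # First access to a spun-down disk pays a spin-up latency penalty.
        self.spinning = True
        self.last_access = time.monotonic()


def spin_down_idle(disks, now=None):
    """Power down every disk idle longer than IDLE_TIMEOUT."""
    now = time.monotonic() if now is None else now
    for d in disks:
        if d.spinning and now - d.last_access > IDLE_TIMEOUT:
            d.spinning = False
```

The trade-off is visible even in the sketch: idle disks stop consuming power and accumulating wear, but the first read after a spin-down stalls while the platter comes back up, which is why MAID suits archives rather than primary storage.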
Now the question becomes "How do you manage the data?" Today, most shops simply front the archive with several large file servers and work around the scalability limits of traditional file systems (e.g., inode and directory-object limits) in application code. But Iron Mountain, for one, is "very interested in the global file system metaphor," says Cuff. Examples include ADIC's StorNext Management Suite and IBM's SAN File System.
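One common application-code workaround for directory-object limits is to hash each object name into a fixed fan-out of subdirectories, so no single directory accumulates millions of entries. The two-level, 256-way layout below is a generic illustration, not any vendor's actual scheme:

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(root, name, levels=2):
    """Map an object name to a hashed subdirectory under root.

    Each level uses two hex digits of the name's SHA-1, giving a
    256-way fan-out per level, so entries spread evenly across
    levels**256 ... i.e. 256**levels leaf directories.
    """
    digest = hashlib.sha1(name.encode()).hexdigest()
    shards = [digest[i * 2 : i * 2 + 2] for i in range(levels)]
    return PurePosixPath(root, *shards, name)


print(shard_path("/archive", "invoice-2006-04.pdf"))
```

Because the mapping is deterministic, the application can recompute an object's path from its name alone, with no central index required for lookups.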
But not everyone is sold on distributed file systems. In a joint research project with the Cornell Theory Center, Unisys ruled out the clustered file system approach "because we saw that it couldn't scale past 30TB or so," says Dr. Michael Salsburg, director for systems and technology at Unisys. "Data clusters look good and look inexpensive," he adds, but as the amount of storage grows, "you get totally bogged down with the communication."