This article can also be found in the Premium Editorial Download "Storage magazine: Exploring the solid-state storage advantage."
Capturing a firm's unstructured data is no easy task. That's what file archiving is designed to tackle, and it likely involves finding and cataloging years' worth of documents, spreadsheets, PowerPoint slides, and media files such as videos and MP3s. File archiving products are becoming widely available, but the burden of finding and choosing the files to archive may largely fall to storage admins.
A key problem, says Stephen Foskett, director of data practice at Mountain View, CA-based Contoural, is that files carry far less metadata than email messages do. "We don't know which things we should archive and which things we shouldn't because we don't have good describing words," he says. "If you're going to archive files effectively, you really need to have some human intelligence. File systems don't lend themselves to archiving."
File archiving has followed in email archiving's footsteps with similar features like single-instancing, version control, and full-content index and search, but file archiving tools aren't necessarily a must-have yet. Dave Campbell, senior product marketing manager for Symantec's Enterprise Vault, says email archiving was a "must do" for many firms because of message storage growth. But in terms of file archiving, he says, "it's almost a worse scenario, because the things people place on their file shares just never get managed and there aren't a lot of controls in place."
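To make the single-instancing feature mentioned above concrete, here is a minimal Python sketch of the general idea: files with identical content are detected by hashing, so an archive would need to keep only one copy per group. It is an illustration only, not how Enterprise Vault or any other product implements it; the function names `content_digest` and `group_by_content` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large media files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_by_content(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by content digest (hypothetical helper).

    Each group holds files with byte-identical content, so an archive
    could store one instance and record the other paths as references.
    """
    groups: Dict[str, List[Path]] = {}
    for path in paths:
        groups.setdefault(content_digest(path), []).append(Path(path))
    return groups
```

In this sketch, any group with more than one path represents copies that could share a single archived instance.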
According to Foskett, "email tends to be used roughly the same way: people send and receive messages. Unstructured data is all over the map. That's what's holding up the acceptance [of file archiving systems]." Users must set policies to archive files, both current and old (based on last access, file size, type or other available data), and may do it by application or group, and update the policies as internal processes or projects change.
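As a rough illustration of the kind of policy described above, the following Python sketch selects archive candidates by last access time, file size, and file type. It is a minimal sketch under stated assumptions, not any vendor's policy engine: the `ArchivePolicy` class, its fields, and `find_archive_candidates` are hypothetical names, and it assumes the file system records access times reliably, which is not the case on volumes mounted with access-time updates disabled.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional, Set

SECONDS_PER_DAY = 86400


@dataclass
class ArchivePolicy:
    """Hypothetical per-group policy, e.g. one for marketing."""
    name: str
    min_days_since_access: int = 365
    min_size_bytes: int = 0
    extensions: Optional[Set[str]] = None  # None means any file type

    def matches(self, path: Path, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        info = path.stat()
        # Assumption: st_atime reflects the last real access.
        idle_days = (now - info.st_atime) / SECONDS_PER_DAY
        if idle_days < self.min_days_since_access:
            return False
        if info.st_size < self.min_size_bytes:
            return False
        if self.extensions is not None:
            return path.suffix.lower() in self.extensions
        return True


def find_archive_candidates(root: str, policy: ArchivePolicy,
                            now: Optional[float] = None) -> Iterator[Path]:
    """Walk a directory tree and yield files the policy would archive."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            try:
                if policy.matches(path, now):
                    yield path
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
```

A group-specific policy might then be written as `ArchivePolicy(name="marketing", min_days_since_access=365, extensions={".ppt"})` and revised as that group's processes or projects change.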
Rick Chin, until recently the data center manager at Pinnacle Financial Corp. in Orlando, FL, beta tested Mimosa Systems' File Systems Archiving (FSA) product, part of its NearPoint offering, earlier this year. Chin says Pinnacle Financial went with Mimosa because it was a natural extension of using Mimosa's email archiving product. "Our biggest need initially was for email archiving because it was just an out-of-control resource," he says. "File archiving was equally out of control, but it was less of a pain point because we had more general storage than email storage."
Chin says it was during Pinnacle's data migration to a new SAN that the company realized how many of its files hadn't been touched for years but were still needed for documentation. He says Pinnacle's file archiving policy was similar to the one it used for email. "We had to define groups who all adopted similar policies, like marketing or executive administration," he says. Chin says what he most appreciated about Mimosa's product was a setting that lets files be called up directly from the archive without being rewritten to disk.
Barry Murphy, director of product marketing at Mimosa, says they wanted to get away from the typical method of leaving a stub for a file (which allows users access) because the stubs got in the way of backup routines.
"When a user double clicked, the file would be restored and then have to be rearchived," he says. With Mimosa's product, users can instead choose whether or not to leave a stub.
Ediscovery and compliance are big drivers of file archiving implementation, but "what I hear more [as a driver] from IT is cost savings, getting their data off production servers and onto cheaper options," says Murphy.
The next step on the file archiving path seems to be archiving SharePoint. Murphy thinks continued SharePoint adoption will add more structured management to the content to be archived. Campbell concurs; he also expects SharePoint archiving to increase as more companies adopt the tool, which carries more metadata. "The ability to archive across all the unstructured content is really getting a lot more visibility," he says.
"It's not the data movement that's the hard part," adds Foskett. "The hard part is actually figuring out what to move."
This was first published in July 2008