This article can also be found in the Premium Editorial Download "Storage magazine: Lessons learned from creating and managing a scalable SAN."
|What's here, what's missing|
Probably the biggest impediment to a successful ILM solution is classifying the stored data. Specialized data classification tools from vendors such as Abrevity Inc., Kazeon Systems Inc., Njini Inc., Scentric Inc. and StoredIQ Corp. are still quite new and haven't been battle-tested by many companies. "You can look at file metadata, but that's very limited--it tells you who created the data and when," says Michael Masterson, an information systems architect at a Fortune 500 manufacturing company in the life sciences industry. To appropriately handle the stored data, the company needs much more information. "We need to know if the file contains private data that must be kept confidential, regulated data that must be saved for a certain period of time, proprietary intellectual property or public information that we can make available to anyone," he says.
This can get very complicated. The company uses instruments that generate scientific data based on the Flow Cytometry Standard (FCS). "We need to be able to automatically read the FCS headers and classify data based on what's in there," explains Masterson. The company turned to Abrevity, one of the new breed of ICM tool vendors, to classify stored information so users can find it based on terms in the FCS file header.
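To make the idea of reading FCS headers concrete, here is a minimal Python sketch of pulling the keyword/value pairs out of an FCS file's TEXT segment and searching them for a term. It relies only on the published FCS layout (a fixed-width HEADER that gives the byte offsets of the TEXT segment, whose first character is the delimiter). The function names parse_fcs_text and header_matches are hypothetical, and this is not a description of how Abrevity's product works.

```python
def parse_fcs_text(raw: bytes) -> dict[str, str]:
    """Return the keyword/value pairs from the TEXT segment of an FCS file image.

    Simplified sketch: it does not handle delimiters escaped by doubling,
    or supplemental TEXT segments.
    """
    if raw[:3] != b"FCS":
        raise ValueError("not an FCS file")
    # HEADER bytes 10-17 and 18-25 hold right-justified ASCII offsets of the
    # first and last byte of the primary TEXT segment.
    start = int(raw[10:18].decode("ascii"))
    end = int(raw[18:26].decode("ascii"))
    text = raw[start:end + 1].decode("latin-1")
    delimiter = text[0]  # the first TEXT character defines the delimiter
    parts = text[1:].split(delimiter)
    if parts and parts[-1] == "":
        parts.pop()  # drop the empty piece after the trailing delimiter
    # Keywords and values alternate; keywords are case-insensitive.
    return {k.strip().upper(): v for k, v in zip(parts[0::2], parts[1::2])}


def header_matches(header: dict[str, str], term: str) -> bool:
    """Hypothetical search helper: True if any header value contains the term."""
    needle = term.lower()
    return any(needle in value.lower() for value in header.values())
```

A classification job could call parse_fcs_text on each instrument file and index the resulting pairs, so that a search on a header term returns the matching files.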
"Before Abrevity, we were managing data based on the need for storage space, like dumping in a landfill," says Masterson. "Now we're managing the data based on [its] meaning." Ironically, points out Masterson, today's landfills try to separate the different kinds of trash based on disposal requirements. That's what he's trying to do with his firm's stored data.
"The classification vendors typically are starting with archiving, legal discovery or something like that," says GlassHouse's Foskett. "These tools will enable ILM, but for now the vendors are staying more narrowly focused."
For example, St. Vincent Health in Indianapolis, part of the Ascension Health Network, uses StoredIQ's data classification product for HIPAA compliance. "We had been relying on users to tell us what the data was and that didn't work," says Karen Johnson, HIPAA security officer for the 16 hospitals in the network's St. Vincent region. The HIPAA compliance team found itself manually trying to classify the data. With well over 100TB of stored data on multiple SANs, that proved to be a nearly impossible task.
After looking at some of the emerging data classification tools, Johnson brought in StoredIQ. "They came in, plugged it in and it started data crawling," says Johnson. "I didn't have to give it keywords or taxonomies or anything." The tool classified the data using linguistics, pattern searching and keywords.
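As a rough illustration of what pattern searching and keyword matching can look like, the following Python sketch labels a piece of text using one regular expression and a short term list. Everything here is an assumption made for illustration: the pattern, the terms, the label names and the classify function are hypothetical, and the sketch does not represent StoredIQ's actual method, which also uses linguistic analysis.

```python
import re

# Hypothetical pattern: digits laid out like a U.S. Social Security number.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keyword list hinting at health information.
HEALTH_TERMS = ("patient", "diagnosis", "medical record")


def classify(text: str) -> set[str]:
    """Return a set of illustrative labels for one document's text."""
    labels = set()
    lowered = text.lower()
    if SSN_LIKE.search(text):
        labels.add("possible-identifier")
    if any(term in lowered for term in HEALTH_TERMS):
        labels.add("possible-health-info")
    # Fall back to a single label when no rule fires.
    return labels or {"unclassified"}
```

A real tool would need far richer rules and some way to limit false positives; the point is only that content-based rules can label files without asking users what the data is.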
However, data classification needs standard taxonomies that are consistent throughout a company. "In a large company, each department might have its own way of classifying things," says MacFarland at The Clipper Group. This makes ILM impossible, even with classification tools (see "What's here, what's missing").
This was first published in July 2006