
Fortune 500 firm takes a crack at data classification

By Jo Maitland, News Director
14 Feb 2007 | SearchStorage.com


Data classification is "like eating an elephant," according to Michael Masterson, IT manager at a Fortune 500 life sciences company that's in the middle of a data classification project. "Don't get discouraged," he said. "You can't do it all at once."

Masterson's office has 60 Windows servers and a handful of Unix machines, plus the latest EMC Corp. Clariion CX3 array for primary storage and a Nexsan Technologies system for nearline, noncritical data. He's using EMC Documentum for document management and has 9 terabytes (TB) of unstructured data floating around unmanaged.

About a year ago, Masterson's company decided it needed to better understand the files it was storing before throwing any more disk into its data center. Unfortunately, Masterson said, this information isn't available in the metadata provided by Windows systems.

Data classification is end users' job
"People have dumped stuff on me like I'm a landfill, but I'm not in the storage business," he noted. He is, however, responsible for ensuring that the company's scientists can find files months or even years after they've created them -- and retrieve them in minutes or hours, not days. Drug discovery is a competitive field, and it's heavily regulated, both by the Food and Drug Administration (FDA) and under the Sarbanes-Oxley Act (SOX). "The risk of not managing these files is huge," Masterson said.

Masterson uses what he calls a "folksonomic" approach to data classification. Folksonomy is Internet parlance for tagging Web content on the fly to make it easily discoverable to users of that content. "People will not adapt consistently to one system ... it's human nature to be constantly reorganizing," he said, "and files are no different."
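In practice, a folksonomic scheme amounts to letting users attach free-form tags and then indexing files by those tags. The Python sketch below is purely illustrative and does not describe how any product mentioned here works; the class name `TagIndex`, the file paths and the tags are hypothetical. It folds tags to lowercase so that inconsistent spellings by different users still land in the same bucket.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical inverted index mapping user-supplied tags to file paths."""

    def __init__(self) -> None:
        self._files_by_tag: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(tag: str) -> str:
        # Users tag inconsistently ("Assay", "assay ", "ASSAY"), so fold case
        # and trim whitespace before indexing.
        return tag.strip().lower()

    def add(self, path: str, *tags: str) -> None:
        # Tags can be added at any time; there is no fixed taxonomy to follow.
        for tag in tags:
            self._files_by_tag[self._normalize(tag)].add(path)

    def find(self, *tags: str) -> set[str]:
        # Return the files that carry *all* of the requested tags.
        # .get() avoids creating empty entries for unknown tags.
        sets = [self._files_by_tag.get(self._normalize(t), set()) for t in tags]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TagIndex()
index.add("/shares/lab1/run042.fcs", "Assay", "Project-X")
index.add("/shares/lab2/notes.docx", "assay ")
print(sorted(index.find("ASSAY")))  # ['/shares/lab1/run042.fcs', '/shares/lab2/notes.docx']
```

Because lookups hit the index rather than the files themselves, a search never has to open a document to answer a tag query.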

He's been piloting Abrevity Inc.'s FileData Classifier software for approximately one year and is impressed with its ability to work with legacy files and file systems, and to provide custom file classification and tagging. "It uses tags [that] users have already provided and words within the file system that they already understand," he said.

Aside from email and the usual Microsoft Office files, fluorescence-activated cell-sorting (FACS) files -- more commonly called instrument files -- make up much of the company's unstructured data. These are text files produced by flow cytometers, instruments used to measure microscopic particles in fluids. As the instruments become smarter, they crank out more data, all of which must be stored and managed. Analysts report that more dollars were spent last year on these types of instruments than on IT storage systems, and that at most life sciences facilities the instruments generated an order of magnitude more files than Microsoft Office or email users did.

Masterson notes that while other data classification tools (he looked at products offered by Arkivio Inc. and Kazeon Systems Inc.) are designed to extract known values from a single document and don't create indexes for multidocument searching, Abrevity's FileData Classifier can search and parse FACS headers, extract target data, tag files with new metadata for classification and then allow for policy-based management.
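The parse, extract and tag sequence described above can be illustrated with a short Python sketch. It assumes the instrument files follow the Flow Cytometry Standard (FCS) layout, whose TEXT segment stores keyword/value pairs separated by a delimiter character that is also the segment's first character. It also assumes the TEXT segment has already been read into a string and ignores escaped (doubled) delimiters. The function names, the sample header and the choice of target keywords are hypothetical; this is not how FileData Classifier works internally.

```python
def parse_text_segment(segment: str) -> dict[str, str]:
    """Split an FCS-style TEXT segment into a keyword -> value mapping.

    Assumes the first character is the delimiter and that no value contains
    an escaped (doubled) delimiter.
    """
    delim = segment[0]
    # Drop the leading delimiter and any trailing ones, then split.
    fields = segment[1:].rstrip(delim).split(delim)
    if len(fields) % 2:
        raise ValueError("unpaired keyword in TEXT segment")
    # FCS keywords are case-insensitive, so normalize them to upper case.
    return {fields[i].upper(): fields[i + 1] for i in range(0, len(fields), 2)}


# Hypothetical choice of header keywords worth turning into search tags.
TARGET_KEYWORDS = {"$CYT": "instrument", "$DATE": "acquired", "$TOT": "events"}


def tags_from_header(segment: str) -> dict[str, str]:
    """Extract the target values and rename them as classification tags."""
    header = parse_text_segment(segment)
    return {tag: header[key] for key, tag in TARGET_KEYWORDS.items() if key in header}


# Made-up sample header for illustration.
sample = "/$CYT/FACSCalibur/$DATE/14-FEB-2007/$TOT/10000/$PAR/4/"
print(tags_from_header(sample))
# {'instrument': 'FACSCalibur', 'acquired': '14-FEB-2007', 'events': '10000'}
```

The resulting tags can then be stored as new metadata alongside the file, which is what makes later policy-based management possible.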

"Engineers nest folders within folders, so it's important to be able to search across these without having to open each file, which can take hours or days," he said.

More significantly, FileData Classifier offers context-based discovery rather than plain text searching, and it relies on a proprietary database technology the vendor calls SLICEbase instead of a relational database. This "speed[s] up searches tremendously," Masterson claimed. "They've got the right approach [to] preserving context."

Still, Masterson said that showing users how to tag files with a business value is an arduous task. To that end, he built a survey and created interview questions to find out which files are important given business and regulatory requirements. The secret is to keep classifications simple. "We have security and retention tags only. Don't get too complex with it and create slices that people will forget are even there," he advised. He also recommends creating a short list of the most important data -- files for a legal discovery case or human resources files, for example -- rather than trying to tag everything.
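A two-tag scheme driven by a short list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Masterson's actual rules: the path patterns, security labels and retention periods below are all hypothetical. Files that match no entry on the short list are simply left untagged.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass(frozen=True)
class Classification:
    security: str         # hypothetical labels, e.g. "confidential" or "internal"
    retention_years: int  # hypothetical retention periods


# A deliberately short list covering only the most important data.
# Patterns and values are illustrative, not taken from any real policy.
SHORT_LIST = [
    ("*/legal/*", Classification("confidential", 10)),
    ("*/hr/*", Classification("confidential", 7)),
    ("*.fcs", Classification("internal", 5)),
]


def classify(path: str) -> Optional[Classification]:
    """Return the first matching classification, or None to leave the file untagged."""
    for pattern, label in SHORT_LIST:
        # Lowercase the path so that "HR" and "hr" folders match the same rule.
        if fnmatch(path.lower(), pattern):
            return label
    return None


print(classify("/shares/legal/case.pdf"))
print(classify("/shares/misc/menu.xlsx"))
```

Keeping only two attributes and a handful of rules mirrors the advice above: every extra tag is another "slice" users have to remember.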

So far, Masterson has indexed about one-third of his office's unstructured content. His next step is to turn on policy automation to force the back end to move files to the right location.
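Policy automation of this kind typically evaluates each file's tags and age, then plans moves between storage tiers. The Python sketch below is a dry-run planner under hypothetical assumptions: the tier names, the rule that confidential files stay on primary disk and the 180-day staleness threshold are invented for illustration, and no files are actually moved.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tier names; the article mentions an EMC Clariion CX3 for
# primary storage and a Nexsan system for nearline, noncritical data.
PRIMARY, NEARLINE = "primary", "nearline"

# Hypothetical threshold: untouched this long, a file can leave primary disk.
STALE_AFTER = timedelta(days=180)


def target_tier(security: Optional[str], last_access: datetime, now: datetime) -> str:
    """Pick a storage tier from a file's security tag and last-access time."""
    if security == "confidential":
        # Hypothetical rule: sensitive files stay on primary storage.
        return PRIMARY
    if now - last_access > STALE_AFTER:
        return NEARLINE
    return PRIMARY


def plan_moves(
    files: dict[str, tuple[Optional[str], datetime, str]], now: datetime
) -> list[tuple[str, str, str]]:
    """Return (path, current tier, target tier) for every file that should move."""
    moves = []
    for path, (security, last_access, current) in files.items():
        target = target_tier(security, last_access, now)
        if target != current:
            moves.append((path, current, target))
    return moves


now = datetime(2007, 2, 14)
files = {
    "/shares/lab1/run042.fcs": (None, datetime(2006, 3, 1), PRIMARY),
    "/shares/legal/case.pdf": ("confidential", datetime(2005, 1, 1), PRIMARY),
    "/shares/lab2/today.fcs": (None, datetime(2007, 2, 13), PRIMARY),
}
print(plan_moves(files, now))
```

Separating the plan from the move makes it possible to review what a policy would do before letting it act on the back end.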

"It will [take] a while for us to achieve nirvana," said Masterson. The dream is for users to tag files with the appropriate values when they save them. Ideally, this functionality will be built into the operating system, but for now the Abrevity tool is a good start, he said.
