Storage Magazine

Apps to classify and find data
by Greg Forest
Issue: Nov 2006

Unstructured data
Unlike information stored in well-defined application databases, or in semistructured e-mail servers and document management systems, the file shares in most companies are a dumping ground for more than 400 file formats. As corporations control the data in ERP apps and e-mail servers, users increasingly rely on Microsoft Office apps to store their personal productivity files, which are often critical to the business' day-to-day operation. This leads companies to provide higher levels of service for this data at an ever-increasing cost. Unfortunately, this data may be anything from pictures of the grandkids to highly confidential customer documents containing private information.

The problem is that this data is stored in a user-defined fashion that's rarely controlled, searchable or organized in any meaningful way. A company must therefore find a way to separate mission-critical data from data that requires less costly service levels without adversely affecting productivity. Equally critical is the need to identify older data, duplicate copies of data and orphaned data that's no longer needed and can be deleted. Finally, when the need to find, protect or destroy specific pieces of information arises because of litigation or new regulations, how do you quickly respond to these demands? The short answer is to classify data, ideally with automated, user-friendly tools.

The primary purpose of information classification and management (ICM) tools is to provide intelligence about the files residing in file shares or shared drives. These files may reside on individual Unix/Linux or Windows servers connected to SAN or DAS, in NAS filers or serviced by NAS blades inside a SAN chassis. Historically, management tools for file systems focused on the file attributes from the file systems themselves. ICM tools also discover the file attributes of a file system, but their power and functionality come from their ability to actually read the contents of a file and search for specific patterns (like Social Security or credit card numbers) or, in some cases, to create a complete index of all text in the document (including numeric information).
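To make the idea of content pattern searching concrete, here is a minimal Python sketch. It is not how any particular ICM product works; the regular expressions, the function names (`luhn_valid`, `find_sensitive_patterns`) and the use of a Luhn checksum to weed out random digit runs are all assumptions chosen for illustration, and the patterns are deliberately simple.

```python
import re

# Illustrative patterns only (assumptions, not taken from any ICM product).
# U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    checksum = 0
    # Walk the digits right to left, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_sensitive_patterns(text: str) -> dict:
    """Return SSN-like and card-like strings found in a file's text."""
    ssns = SSN_PATTERN.findall(text)
    cards = [
        match.group()
        for match in CARD_PATTERN.finditer(text)
        if luhn_valid(match.group())
    ]
    return {"ssn": ssns, "credit_card": cards}
```

Calling `find_sensitive_patterns` on the extracted text of each file would flag documents that deserve a closer look. A real tool would also need format-specific text extraction for the hundreds of file types found on a typical share, which this sketch leaves out.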

ICM tools create a repository of metadata by "crawling" a file system or reading a data stream and capturing the file attributes and/or the content of each file. They don't actually store the file in the repository, but instead store the data as individual attributes or entire full-text indexes. The initial process for all but two of the tools described in this article is fairly slow (they may have to read millions of documents); however, after the initial crawl, they all have the ability to do incremental crawls on a periodic basis, which run faster with less impact on the file system. The repositories are then searched to produce reports or take actions on the files. In general, the size of a repository will run from 3% to 15% of the total amount of storage being classified, depending on the type of files and the amount of data kept in the repository (file attributes, specific patterns or full text).
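The crawl-then-incremental-crawl cycle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any vendor's implementation: the repository is an in-memory dictionary, only size and modification time are captured, and the names `FileRecord`, `crawl` and `estimate_repository_size` are hypothetical. The sizing helper simply applies the 3% to 15% range quoted above.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """File attributes kept in the repository; the file itself is not stored."""
    size: int
    mtime: float


def crawl(root: str, repository: dict) -> list:
    """Walk root, update repository in place, return new or changed paths.

    With an empty repository this acts as the initial crawl. On later runs
    only files whose size or modification time differs are reported, so any
    expensive content reading can be limited to those files.
    """
    changed = []
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            seen.add(path)
            record = FileRecord(size=info.st_size, mtime=info.st_mtime)
            if repository.get(path) != record:
                repository[path] = record
                changed.append(path)
    # Forget files that no longer exist on the share.
    for path in list(repository):
        if path not in seen:
            del repository[path]
    return changed


def estimate_repository_size(total_bytes: int) -> tuple:
    """Return (low, high) repository size in bytes using the 3%-15% range."""
    return (total_bytes * 3 // 100, total_bytes * 15 // 100)
```

For 10 TB of classified storage, the 3% to 15% range works out to roughly 300 GB to 1.5 TB of repository space. Note that this sketch still visits every directory entry on each pass; the saving on an incremental crawl comes from not re-reading the contents of unchanged files.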

All Rights Reserved, Copyright 2000 - 2008, TechTarget