Buying Guide:

Data classification tool purchase considerations

30 Jan 2008 | Stephen J. Bigelow

Data classification is a two-pronged process. An organization must first understand the business value of its applications and data, then store and protect that data at the appropriate service levels. In effect, data classification aligns business applications with the storage infrastructure. Although this sounds simple, data classification is a difficult corporate initiative -- usually because an organization cannot locate all of its data, categorize it properly or determine its business value. Data classification tools can overcome these obstacles by helping organizations locate their data, then classify it based on user-defined rules. Once the data is classified, many tools can also migrate it to the appropriate storage subsystem.

Still, data classification is not a task for IT to shoulder alone. Proper classification depends on understanding the data's business value, and this normally requires the involvement of business units such as legal, manufacturing, human resources and finance. Data must be classified "on paper" first; only then can the data classification tool bring efficiency and automation to the process.

There are many data classification tools to choose from, with a variety of features, such as indexing, search, policy management and migration. Before looking at the criteria below for purchasing data classification tools, you should first review the issues involved in any tiered storage acquisition. After the list of the factors to consider, you'll find a series of specifications to help you compare data classification software from vendors such as Abrevity Inc., EMC Corp., Index Engines Inc. and StoredIQ Inc.

What is the product's scope? Once you know how many file types you'll need to support, select a data classification tool that can handle that number; otherwise, some file types may be left unclassified -- and probably stored improperly. Also, pick a tool that fully supports both structured and unstructured data. Tools that handle only one or the other, or that are intended only for certain applications (such as databases), may not meet your long-term objectives. Most products handle an array of structured and unstructured file types. For example, FileData Classifier from Abrevity claims to handle hundreds of file types, including Microsoft Office files, PDFs, email files, databases such as SQL or Access, and a variety of media file types.
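
One rough way to find out how many file types you need to support is to inventory a file share by extension before talking to vendors. The sketch below is only an illustration: the function name count_file_types is made up for this example, and a file's extension is at best a hint at its real format.

```python
from collections import Counter
from pathlib import Path


def count_file_types(root):
    """Count files under `root` by lowercase extension.

    Files with no extension are counted under the empty string ''.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalize case so ".PDF" and ".pdf" are counted together.
            counts[path.suffix.lower()] += 1
    return counts
```

Calling count_file_types on a share's mount point and listing the result with most_common() gives a ranked list of extensions to compare against each vendor's supported file types.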

How does the product support rule sets and automation? All data classification products rely on a set of rules that drive the classification engine. Early data classification tools relied on rule sets created in-house, but most current tools can import established rule sets (e.g., to support the medical or legal industries). You should determine if imported rule sets can be modified or adapted to your specific needs. For example, the auto-stor product from Arkivio Inc. includes standard classification categories out of the box, but classes can be adapted and new classes can be created as needs change. Manual classification is not universally available. The Information Server from Kazeon Systems Inc. allows manual classification to be performed by the user or administrator on a set of files (defined by a search query or a report), but Infoscape from EMC does not support manual classification.
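
To make the idea of a rule-driven classification engine concrete, here is a minimal sketch of an ordered rule set in which the first matching rule wins. It does not represent how any of the products named here work; the class labels, path patterns and age thresholds are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class FileInfo:
    path: str
    age_days: int  # days since the file was last modified


@dataclass
class Rule:
    label: str             # class assigned when the rule matches
    pattern: str           # lowercase shell-style pattern matched against the path
    min_age_days: int = 0  # rule applies only to files at least this old


def classify(info, rules, default="unclassified"):
    """Return the label of the first rule that matches, else `default`."""
    path = info.path.lower()
    for rule in rules:
        if fnmatchcase(path, rule.pattern) and info.age_days >= rule.min_age_days:
            return rule.label
    return default


# Hypothetical rule set; order matters because the first match wins.
RULES = [
    Rule("legal-hold", "*/legal/*"),
    Rule("archive", "*.pdf", min_age_days=365),
    Rule("active-documents", "*.docx"),
]
```

Because the rule set is ordinary data, adapting an imported rule set in this sketch amounts to editing, reordering or appending Rule entries, which is the kind of flexibility worth asking vendors about.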

Does the tool support tiered storage and migration? Up to 20% of corporate data is underprotected. Such data is not available at a service level needed by the business, so it may take too long to recover that underprotected data. Conversely, up to 60% of data is overprotected -- it's kept on expensive storage and probably replicated too much relative to its business value. This results in excess storage expense. Shop for a data classification tool that can migrate data between storage tiers so that each data type receives the appropriate service level once it's classified. This maintains adequate storage performance while minimizing costs. If the tool does not natively support data migration, be sure it can support a third-party data mover. Note: Migration will impact network performance to some extent because data in motion will contend with other storage network traffic.
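
The relationship between classification and tiering can be sketched as a lookup from data class to target tier, followed by a comparison with where each file lives today. The class names, tier names and function name below are assumptions made for illustration, not features of any particular product.

```python
# Hypothetical mapping from data class to the tier that provides
# the service level the business needs for that class.
TARGET_TIER = {
    "mission-critical": "tier1",
    "business-important": "tier2",
    "archive": "tier3",
}


def plan_migrations(files, target_tier=TARGET_TIER):
    """Return (path, current_tier, wanted_tier) for every misplaced file.

    `files` is an iterable of (path, data_class, current_tier) tuples.
    Files whose class has no target tier are skipped rather than moved.
    """
    moves = []
    for path, data_class, current in files:
        wanted = target_tier.get(data_class)
        if wanted is not None and wanted != current:
            moves.append((path, current, wanted))
    return moves
```

In this sketch, a mission-critical file found on tier2 is an example of underprotected data, while an archive file on tier1 is overprotected. The resulting list is what a native migration feature or a third-party data mover would then act on.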

What is the product's performance and scalability? A large company may need to classify and migrate millions or even billions of files. Since data classification products generally have a practical limit to the number of files that they support, select a product that can accommodate that volume while providing an acceptable level of performance. Furthermore, you should understand how the tool handles data in terms of file count and size. Some tools may be adept at handling a large number of small files; others may be suited for fewer large files. With data volumes growing rapidly, the tool should be able to accommodate projected future data volumes.
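
When comparing a product's practical file limit against future needs, a simple compound-growth estimate can help. The sketch below assumes a constant annual growth rate, which is a simplification; the function names and any figures you feed in are your own assumptions, not vendor data.

```python
def projected_file_count(current_files, annual_growth, years):
    """Project a file count forward, assuming constant compound growth.

    `annual_growth` is a fraction, e.g. 0.5 for 50% growth per year.
    """
    return round(current_files * (1 + annual_growth) ** years)


def years_until_limit(current_files, annual_growth, limit):
    """Return the number of whole years until the count exceeds `limit`.

    Returns 0 if the limit is already exceeded, and None if the count
    never grows past the limit (growth of zero or less).
    """
    if current_files > limit:
        return 0
    if annual_growth <= 0:
        return None
    years = 0
    count = current_files
    while count <= limit:
        count *= 1 + annual_growth
        years += 1
    return years
```

For example, projected_file_count(1_000_000, 0.5, 2) returns 2250000, so a share with one million files growing 50% a year would hold 2.25 million files after two years under this assumption.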

What is the tool's level of heterogeneity? A data classification tool must interface with other platforms in your environment. For example, a data classification tool without migration capability will need to interface with another policy manager or data mover. The tool should also support your current storage platforms. If you have data in three different storage systems, the classification tool needs to be compatible with all three in order to look inside them and perform its job -- otherwise you won't get full value from the tool. Lab testing is recommended to verify performance and interoperability.

Here are specifications for these nine data classification products:

  • Abrevity Inc.; FileData Classifier and FileData Manager
  • Arkivio Inc.; auto-stor software
  • Brocade Communications Systems Inc.; Storage X software
  • Brocade Communications Systems Inc.; File Lifecycle Manager (FLM) software
  • EMC Corp.; Infoscape
  • IBM; IBM Classification Module for OmniFind Discovery Edition
  • Index Engines Inc.; ILM and Data Classification appliance
  • Kazeon Systems Inc.; Information Server software
  • Scentric; Destiny software
