This article can also be found in the Premium Editorial Download "Storage magazine: Salary survey reveals storage skills are in demand."
While all the products described in this article provide canned reports, data classification exercises often produce reports with thousands, or even millions, of lines that need further refinement before they're useful for data management tasks. To understand and analyze the trends in a report, you'll probably need to consolidate, filter and otherwise manipulate its data.
In most cases, a relational database is the best tool for analyzing and reporting on the data. Even if your information and classification management (ICM) product creates reports, the size of the data set or the limitations of the report generator will often force you to summarize or filter the data before you can use it for decision making. Most of the tools offer little flexibility in output format (graphs, tables, etc.). Spreadsheets can be appropriate for small data sets or for creating specific reports, but databases are generally the better repositories because they can filter and sort data with specific queries and extracts. All of the ICM products described in this article can output their data in comma-separated values (CSV) or XML format, either of which can easily be imported into a database or spreadsheet.
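As a rough illustration of that workflow, the sketch below loads a small CSV export into an in-memory SQLite database and runs two summary queries: one grouped by file type, and one listing stale data per owner. The column names (`path`, `owner`, `extension`, `size_bytes`, `last_modified`), the sample rows and the function names are all assumptions made for the example; an actual ICM product will export its own layout, so the table definition would need to be adapted.

```python
import csv
import io
import sqlite3

# Hypothetical export layout; real ICM products use their own column names.
SAMPLE_CSV = """path,owner,extension,size_bytes,last_modified
/share/a/report.doc,alice,doc,120000,2006-09-14
/share/a/old.doc,alice,doc,80000,2003-02-01
/share/b/song.mp3,bob,mp3,4000000,2004-06-30
/share/b/clip.mp3,bob,mp3,6000000,2006-10-02
/share/c/data.xls,carol,xls,500000,2002-11-20
"""


def load_report(conn, csv_file):
    """Create the files table and bulk-insert every row of the CSV export."""
    conn.execute(
        "CREATE TABLE files (path TEXT, owner TEXT, extension TEXT, "
        "size_bytes INTEGER, last_modified TEXT)"
    )
    rows = [
        (r["path"], r["owner"], r["extension"],
         int(r["size_bytes"]), r["last_modified"])
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def summarize_by_type(conn):
    """File count and total bytes per extension, largest total first."""
    return conn.execute(
        "SELECT extension, COUNT(*), SUM(size_bytes) FROM files "
        "GROUP BY extension ORDER BY SUM(size_bytes) DESC"
    ).fetchall()


def stale_files_by_owner(conn, cutoff_date):
    """Count and total bytes, per owner, of files last modified before cutoff_date.

    Dates are ISO 8601 strings (YYYY-MM-DD), so plain string comparison
    orders them correctly.
    """
    return conn.execute(
        "SELECT owner, COUNT(*), SUM(size_bytes) FROM files "
        "WHERE last_modified < ? GROUP BY owner ORDER BY owner",
        (cutoff_date,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
loaded = load_report(conn, io.StringIO(SAMPLE_CSV))
by_type = summarize_by_type(conn)
stale = stale_files_by_owner(conn, "2005-01-01")

for extension, count, total in by_type:
    print(f"{extension}: {count} files, {total} bytes")
for owner, count, total in stale:
    print(f"{owner}: {count} stale files, {total} bytes")
```

For a report with millions of lines, the same approach should carry over by pointing `sqlite3.connect` at a file on disk and passing an open file handle instead of the in-memory sample, although a production database server may be a better fit at that scale.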
Reporting on classification
Some offerings have canned reports for file aging, file type, file ownership and so on, at both detailed and summary levels, along with the ability to chart and graph data (see "Refining reports," above). Most have graphical report engines that let you design custom reports by clicking on search patterns or by entering regular or Boolean expressions, as you would with an Internet search engine. Some ICM tools go beyond basic classification and search capabilities and can manage, archive, retain, deduplicate or delete data; the long-term roadmaps of all ICM products include plans to introduce new features in this area. Most of the tools come from small independent vendors, although a few have established strong OEM partnerships or are looking to do so.
The list of data classification tools in this article isn't exhaustive; the products discussed were chosen because they're recognized as market leaders by users and analysts, or because their functionality is unique and compelling. Many new vendors have entered this space in the last six months, and some with existing file-attribute classification tools are adding content capability.
This was first published in November 2006