This article can also be found in the Premium Editorial Download "Storage magazine: The best high-end storage arrays of 2005."
One of these days, your CEO will ask the big question: "How is our data storage infrastructure driven by the business value of our company's information?" Aside from pulling out a few strands of your hair, how would you respond?
Answering accurately would take time and plenty of clarification. However, if your organization is among the few that have installed one of the new Information Classification and Management (ICM) applications, your answer could be as succinct as the question itself: "We have ICM deployed above the storage infrastructure to manage information based on content values. Those content values are determined by business objectives established by our compliance, security and internal IT services teams."
ICM isn't just another disposable acronym; it's a concept and product category that unites business process with storage. Companies in this segment hope to bridge the gap between high-profile processes like compliance, corporate security and IT consolidation, and the storage infrastructure those processes depend on. If ICM succeeds, storage management and information management will become blissfully blurred in a world of automation with full transparency about what information goes where, when and why. In short, ICM could be the missing link between your day-to-day storage reality and the information lifecycle management (ILM) marketing dreams conjured by storage vendors. Startup vendors in the ICM market all share the same key insight: You have
The following criteria should be applied when evaluating Information Classification and Management (ICM) vendors:
Business-solutions focus: A key differentiator among ICM players is their relative focus on business-level applications. Products may focus on compliance, infrastructure controls or security.
Integration capabilities: An ICM offering's ability to integrate with back-end storage software is an important consideration. Given the wide range of potential data-movement software and solutions available, finding the right player requires careful evaluation.
Classification power: Consider a product's breadth and flexibility of classification coverage. Does it classify only unstructured content, or does it integrate with e-mail and database offerings? How easy is it to create new lexicons? Is it a completely extensible environment?
Deployment method: There's some variance in how ICM is deployed. Most products are out-of-band, but others support in-band operations. Additionally, some products use the LAN for indexing on hosts, while others use mass streams like backup jobs for data collection.
Performance and scale: For ICM, performance is measured in the number of files that can be indexed per hour. With regard to scalability, ICM products should be compared by the upper end of the file counts they support and by the ease with which they scale. A relatively small number of ICM devices (no more than four or five) should be capable of covering the unstructured content of a typical Fortune 1000 data center.
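To make the performance criterion concrete, the sizing arithmetic can be sketched in a few lines of Python. The file count, indexing rate and time window below are made-up figures for illustration, not vendor benchmarks, and the sketch assumes throughput scales linearly with the number of devices, which a real deployment may not achieve.

```python
import math


def baseline_hours(total_files: int, files_per_hour: int, devices: int) -> float:
    """Hours needed for the initial index, assuming linear scaling across devices."""
    if files_per_hour <= 0 or devices <= 0:
        raise ValueError("files_per_hour and devices must be positive")
    return total_files / (files_per_hour * devices)


def devices_needed(total_files: int, files_per_hour: int, window_hours: float) -> int:
    """Smallest device count that finishes the initial index within the window."""
    if files_per_hour <= 0 or window_hours <= 0:
        raise ValueError("files_per_hour and window_hours must be positive")
    return math.ceil(total_files / (files_per_hour * window_hours))


if __name__ == "__main__":
    # Hypothetical environment: 100 million files, 500,000 files per hour per device.
    print(baseline_hours(100_000_000, 500_000, devices=4))        # 50.0
    print(devices_needed(100_000_000, 500_000, window_hours=48))  # 5
```

Comparing the second result against the four-to-five device guideline gives a quick sanity check on a vendor's quoted indexing rate.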
What is ICM?
ICM software indexes enterprise information and executes a range of precise actions on that content. Based on policies, ICM can determine access rights to an object, as well as its residency, movement and final disposition within the storage infrastructure.
It's deployed as a standalone technology that interoperates with existing data movement and storage technologies. ICM isn't dependent on top-level applications, and doesn't need to have a dedicated interface to a proprietary data movement technology (such as a volume manager, snapshot tool or backup application). Upon initial installation, ICM software establishes an index by proactively scanning or "crawling" the file environment. After establishing a baseline, the software conducts ongoing, nondisruptive indexing in the background or at specified time intervals. When deployed in front of an enterprise storage solution, ICM can achieve the following goals:
- Ongoing classification of file information based on a range of programmable meta data attributes such as business owners, history, creation, directory and so on. This classification process can be automated based on predetermined policies created by an administrator, which may be applied to production application environments and to secondary or archival environments.
- Ongoing classification of information based on content-related attributes that are extracted from file-level inspection, such as social security numbers, customer names, employees--literally any programmable keyword. As with meta data classification, content-based classification can be automated according to preset policies. The policies can also be applied to production environments and to secondary or archival repositories.
- Creation of classification templates called "lexicons," which become the basis of policies carried out by the ICM app. These can range from simple directives ("Always restrict access to documents authored by Jane Doe wherever they originate") to complex templates filled with specialized jargon and nested logical operations. Typically, the more complex lexicons are architected to address specific business processes such as HIPAA or SEC 17a regulatory compliance.
- Granular file-level controls--searching, retrieving and acting--against any content that has been indexed by the ICM deployment. For example, this may include an administrator searching for a particular document using a Google-like interface, and then restricting access to a given user or user group. But it could also include encryption, deletion or migration of that single file. As with all other functions in the ICM category, these controls may be applied to primary application environments and to secondary or archival environments.
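The four capabilities above can be pictured with a minimal Python sketch: an index record holding meta data and extracted text, a "lexicon" made of rules that match on either, and a lookup that turns the resulting tags into actions. Everything here is hypothetical (the `FileRecord` schema, the `Rule` class, the tag names and the action names); no ICM product's actual data model or API is implied.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRecord:
    """One index entry: file meta data plus text extracted from the file."""
    path: str
    owner: str
    text: str
    tags: set = field(default_factory=set)


@dataclass
class Rule:
    """One lexicon rule. A record must satisfy every condition that is set."""
    tag: str
    owner: Optional[str] = None           # meta data condition
    pattern: Optional[re.Pattern] = None  # content condition

    def matches(self, record: FileRecord) -> bool:
        if self.owner is not None and record.owner != self.owner:
            return False
        if self.pattern is not None and not self.pattern.search(record.text):
            return False
        return True


# Simplified pattern for U.S. social security numbers (illustrative only).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# A tiny lexicon: one meta data rule and one content rule.
LEXICON = [
    Rule(tag="restricted-author", owner="jdoe"),
    Rule(tag="contains-ssn", pattern=SSN),
]

# Hypothetical mapping from classification tags to file-level actions.
ACTIONS = {"restricted-author": "restrict_access", "contains-ssn": "encrypt"}


def classify(record: FileRecord, lexicon: list) -> set:
    """Tag a record with every lexicon rule it matches."""
    record.tags = {rule.tag for rule in lexicon if rule.matches(record)}
    return record.tags


def planned_actions(record: FileRecord) -> list:
    """List the actions implied by a record's tags, in sorted order."""
    return sorted(ACTIONS[tag] for tag in record.tags if tag in ACTIONS)
```

Classifying a record owned by `jdoe` yields the `restricted-author` tag, while a file whose text contains a number shaped like 123-45-6789 is tagged `contains-ssn` and queued for encryption. A production lexicon would add nested logic and domain vocabulary on top of this basic shape.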
Compliance, security and ILM
Over the last three years, all of the ICM vendors independently came to the same conclusion: Despite the advances of storage resource management (SRM) and backup products, those products didn't expand our understanding of what was being stored and its relation to various business processes. "We realized that many of our storage challenges were only solvable if we could get knowledge about the content itself," says Michael Masterson, an information systems architect at a leading life sciences company now evaluating several ICM products. "We needed to know information about each piece of data if we wanted to link it up with business policies for compliance and corporate security."
To address compliance and security requirements, technologists like Masterson can use an ICM product to create detailed policy templates, or lexicons, that use meta data and content attributes to precisely automate what content is accessed by whom, when content migrates and its retention period. Masterson believes this kind of granular control is not only desirable, but inevitable. "Getting semantic control of our data and turning it into information is what this is all about," he says.
The other major area of interest driving ICM is adding long-promised granular controls to the ILM process for unstructured content. Specifically, this means more intelligent content movement capabilities. The poor visibility that most enterprises have regarding their unstructured file information and its usefulness to the organization is nothing short of staggering.
Taneja Group conversations with storage admins routinely reveal that low utilization rates for file storage are often directly related to a lack of knowledge about what's being stored, which leads to inaction based on a fear of deleting important files. This translates into inefficient backup as static content is redundantly protected on tape and disk libraries. By using ICM solutions, enterprises can become much savvier about what they store, where and for how long, and how it will be migrated to an appropriate storage tier over time.
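The idea of moving static content to an appropriate tier over time can be illustrated with a short Python sketch that picks a tier from a file's age. The tier names and the 90-day and 365-day thresholds are assumptions chosen for the example; a real policy would come from the business rules the organization defines, and would likely weigh content classification as well as age.

```python
from datetime import datetime, timedelta

# Hypothetical policy: files younger than each limit stay on the named tier.
TIER_RULES = [
    (timedelta(days=90), "primary"),
    (timedelta(days=365), "nearline"),
]
ARCHIVE_TIER = "archive"  # everything older than the last limit


def choose_tier(last_modified: datetime, now: datetime) -> str:
    """Return the storage tier for a file based on time since last modification."""
    age = now - last_modified
    for limit, tier in TIER_RULES:
        if age < limit:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    now = datetime(2005, 8, 1)
    print(choose_tier(datetime(2005, 7, 1), now))  # primary
    print(choose_tier(datetime(2005, 1, 1), now))  # nearline
    print(choose_tier(datetime(2003, 1, 1), now))  # archive
```

Static files that land on the archive tier are the ones most likely to be consuming repeated backup cycles without changing.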
This was first published in August 2005