This article can also be found in the Premium Editorial Download "Storage magazine: The best high-end storage arrays of 2005."
Meet the players
To date, the ICM category has four announced players: Arkivio Inc., Kazeon Systems Inc., Njini Inc. and StoredIQ Corp. Scentric Inc., another ICM company, is still in stealth mode.
Mountain View, CA-based Arkivio was the first to release an ICM product, and offers a range of meta data management capabilities to classify content and drive its policy engines. Its auto-stor appliance has been focused on the ILM usage model described earlier, although the company also positions it as a compliance solution. Arkivio helps users move content off production storage according to policy. The company recently completed integration with EMC Corp.'s Centera, so auto-stor can serve as a front end to that content-addressed storage (CAS) archive. Arkivio doesn't focus on content-based indexing; instead, it bases its functions on the files' meta data attributes.
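The article doesn't describe Arkivio's policy language. As a hedged illustration of what "functions based on meta data attributes" can mean, here is a minimal sketch in which a policy picks files to move off production without ever opening them. The `FileMeta` record, the idle-time rule and its thresholds are hypothetical assumptions, not the vendor's method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileMeta:
    """Hypothetical meta data record collected for one file (no content)."""
    path: str
    size_bytes: int
    owner: str
    last_access: datetime

def migration_candidates(files, now, max_idle_days=180, min_size_bytes=0):
    """Select files to move off production using meta data attributes only.

    The rule (idle longer than max_idle_days and at least min_size_bytes)
    is an illustrative assumption, not any vendor's actual policy.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [
        f.path
        for f in files
        if f.last_access < cutoff and f.size_bytes >= min_size_bytes
    ]

inventory = [
    FileMeta("/eng/specs/old_design.doc", 2_000_000, "alice", datetime(2004, 11, 3)),
    FileMeta("/eng/specs/current.doc", 1_500_000, "bob", datetime(2005, 7, 28)),
    FileMeta("/fin/q1_report.xls", 800_000, "carol", datetime(2005, 1, 15)),
]
print(migration_candidates(inventory, now=datetime(2005, 8, 1)))
# ['/eng/specs/old_design.doc', '/fin/q1_report.xls']
```

Because the decision uses only access time and size, it is cheap to evaluate across millions of files, but it cannot tell a stale contract from a stale lunch menu; that gap is what content-based indexing addresses.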
Kazeon Systems, also in Mountain View, CA, is a new entrant. The Kazeon Information Server (KIS) was developed over two years by search, database and storage experts. KIS delivers full meta data and content classification, and a fully programmable policy engine. A unique differentiator for Kazeon is its focus on integrating easy-to-use, Google-like search capabilities. With KIS, users will be able to search for any piece of indexed information in the infrastructure and then view, through a single user interface, the full range of actions allowed for that stored object.
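Kazeon's internals aren't described here. As a hedged illustration of "search, then act" through one interface, the following minimal sketch pairs an inverted index (term to objects) with a per-object list of allowed actions. The `SearchIndex` class, the object IDs and the action names are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class SearchIndex:
    """Toy inverted index that also records the actions allowed on each object."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of objects containing it
        self.actions = {}                 # object id -> allowed actions

    def add(self, obj_id, text, allowed_actions):
        for term in tokenize(text):
            self.postings[term].add(obj_id)
        self.actions[obj_id] = list(allowed_actions)

    def search(self, query):
        """Return {object id: allowed actions} for objects matching every term."""
        terms = tokenize(query)
        if not terms:
            return {}
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return {obj_id: self.actions[obj_id] for obj_id in sorted(hits)}

index = SearchIndex()
index.add("nas1:/legal/acme_contract.doc",
          "Acme supply contract renewal terms", ["view", "copy", "legal hold"])
index.add("nas2:/sales/acme_notes.txt",
          "Call notes on Acme renewal pricing", ["view", "migrate", "delete"])
print(index.search("acme renewal"))  # both objects, each with its allowed actions
print(index.search("contract"))      # only the legal document
```

Storing the allowed actions next to the index entry is what lets a single result screen double as a control panel; in a real product those actions would be derived from policy rather than hard-coded.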
Njini, based in Surrey, England, offers an in-band ICM product. It consists of modules, such as njiniENCOUNT, which prevents unnecessary duplication of unstructured data objects. Hierarchical storage management (HSM) and compliance modules are expected in the next six to nine months; all work with the njiniENGINE. Njini believes its position in the data path gives it an advantage over competing products, because it can take policy-based actions on data before the data reaches the storage devices.
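How njiniENCOUNT detects duplicates isn't stated. The sketch below is one common technique, swapped in for illustration: an in-band filter hashes each file's content with SHA-256 as it is written and stores only the first copy. The `DedupingWritePath` class and whole-file granularity are assumptions, not Njini's design.

```python
import hashlib

class DedupingWritePath:
    """Hypothetical in-band filter that sees every write before storage does.

    Duplicates are detected at whole-file granularity by SHA-256 digest.
    """

    def __init__(self):
        self.store = {}    # digest -> the single copy actually written to storage
        self.catalog = {}  # file path -> digest of its content

    def write(self, path, data):
        """Record the path; store the bytes only if this content is new.

        Returns True when the write was a duplicate and no new copy was stored.
        """
        digest = hashlib.sha256(data).hexdigest()
        duplicate = digest in self.store
        if not duplicate:
            self.store[digest] = data
        self.catalog[path] = digest
        return duplicate

    def read(self, path):
        return self.store[self.catalog[path]]

    def stored_bytes(self):
        return sum(len(data) for data in self.store.values())

wp = DedupingWritePath()
forecast = b"Q3 sales forecast, draft 2"
print(wp.write("/home/ann/forecast.xls", forecast))       # False: first copy is stored
print(wp.write("/home/bob/forecast_copy.xls", forecast))  # True: duplicate not stored again
print(wp.stored_bytes() == len(forecast))                 # True
```

The sketch omits reference counting, so overwriting a path leaves its old content in `store`; a production filter would also need to reclaim unreferenced copies.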
StoredIQ 3.0, from Austin, TX-based StoredIQ, provides meta data and content-based indexing, full control over content, and search and query functions. The firm is focused on compliance and corporate security, and has developed regulation-specific lexicons for HIPAA, SEC 17a and Sarbanes-Oxley that automate compliance controls for unstructured content. Given the positive feedback from StoredIQ's early customers in the healthcare industry, preset lexicons for compliance applications could become a common approach across the ICM category.
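A lexicon, as used in this article, combines keywords with logical operations. The following is a minimal sketch of evaluating such a rule against a document's text, assuming a hypothetical nested-tuple rule format. `PHI_LEXICON` is an invented, greatly simplified example and is not StoredIQ's HIPAA lexicon.

```python
import re

def matches(rule, text):
    """Evaluate a lexicon rule against document text.

    A rule is either a keyword/phrase string or a tuple
    ("AND" | "OR" | "NOT", *subrules). Matching is case-insensitive
    and respects word boundaries.
    """
    if isinstance(rule, str):
        pattern = r"\b" + re.escape(rule.lower()) + r"\b"
        return re.search(pattern, text.lower()) is not None
    op, *args = rule
    if op == "AND":
        return all(matches(r, text) for r in args)
    if op == "OR":
        return any(matches(r, text) for r in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator {op!r}")

# Hypothetical health-information lexicon: a clinical term AND an identifier,
# excluding documents marked as training material.
PHI_LEXICON = ("AND",
               ("OR", "patient", "diagnosis", "medical record number"),
               ("OR", "date of birth", "ssn", "social security"),
               ("NOT", "training example"))

print(matches(PHI_LEXICON, "Patient J. Smith, date of birth 1970-02-01"))  # True
print(matches(PHI_LEXICON, "Cafeteria menu for the week"))                 # False
```

Keeping the lexicon as data rather than code is what would make a "preset" lexicon portable: a vendor could ship the rule tree, and a customer could extend it without touching the engine.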
ICM: What it does
Proactive data inventories: Information Classification and Management (ICM) inventories existing data sets by indexing or "crawling" the environment to collect data. This may take place as an activity on the LAN or as a batch process to inspect content during its movement for migrations or backup operations.
Meta data attributes: ICM applications depend on meta data indices to achieve their goals. The ICM product collects all available information about files residing in the data pool.
Content attributes: ICM "cracks" and inspects file-level content, enabling content-based classification and control of the data based on its own attributes.
Lexicon creation: ICM supports the creation of business-value templates of keywords and logical operations, known as lexicons. A lexicon supplies the intelligence for managing a complex business process such as compliance or corporate information security, as well as disk archival management.
Execution and initiation of controls: ICM can act upon content to execute certain controls (e.g., encrypt, restrict, delete and migrate) and to initiate a chain of operations that might be executed by an associated data movement technology such as a volume manager, snapshot application or backup application.
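The capabilities listed above can be chained into a single pass. This is a minimal sketch, assuming plain-text files, a simple keyword-set lexicon with lowercase terms, and a hypothetical `restrict` control; real ICM products crack many file formats and apply far richer rules.

```python
import os
import tempfile

def crawl(root):
    """Proactive inventory: walk the tree and yield meta data for every file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            yield {"path": path, "size": st.st_size, "mtime": st.st_mtime}

def crack(path, limit=1_000_000):
    """Content attributes: read up to `limit` characters of a text file."""
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        return f.read(limit).lower()

def classify_and_control(root, lexicon, action="restrict"):
    """Pair each file whose content contains a lexicon term with a control.

    Returns sorted (path relative to root, action) tuples; executing the
    control is left to whatever tool receives the list.
    """
    decisions = []
    for record in crawl(root):
        text = crack(record["path"])
        if any(term in text for term in lexicon):
            decisions.append((os.path.relpath(record["path"], root), action))
    return sorted(decisions)

def demo():
    """Build a tiny throwaway file tree and classify it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "hr"))
        with open(os.path.join(root, "hr", "review.txt"), "w", encoding="utf-8") as f:
            f.write("Salary review - CONFIDENTIAL")
        with open(os.path.join(root, "menu.txt"), "w", encoding="utf-8") as f:
            f.write("Friday lunch menu")
        return classify_and_control(root, {"confidential"})

print(demo())
```

Returning decisions instead of acting on them mirrors the split described in the last item: ICM decides, and an associated volume manager, snapshot or backup application can carry out the chain of operations.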
For the foreseeable future, ICM implementations will likely be deployed almost exclusively as network-attached devices. To date, most vendors have chosen to deploy "out of band," but there's no architectural requirement for this, and more vendors are expected to emerge as in-band providers later this year. ICM doesn't require agents to be permanently deployed on the servers it manages. That said, some vendors may choose to deploy agents in the future to add application-specific functionality, in a move analogous to the evolution we've seen in CAS archiving, where API or CIFS/NFS interfaces are now available.
The network device hosting ICM software directly accesses all assigned servers and then communicates with a centralized storage movement or management app, as required, to hand off data-movement activities. All ICM players also provide the means for file-level data movement, but they seem to realize that users expect ICM software to integrate with existing data movement software.
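The handoff described above can be sketched as a narrow interface between the ICM device and an existing mover. This is a minimal sketch under stated assumptions: `ActionRequest`, the `DataMover` protocol, `RecordingMover` and the rule that the ICM device performs simple file-level migrations itself are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Protocol

@dataclass(frozen=True)
class ActionRequest:
    server: str       # managed file server the ICM device crawled
    path: str
    action: str       # e.g. "migrate", "snapshot", "backup"
    target: str = ""  # destination tier, if any

class DataMover(Protocol):
    """Hypothetical interface to an existing data movement application."""
    def submit(self, request: ActionRequest) -> str: ...

class RecordingMover:
    """Stand-in mover that records what it was asked to do."""
    def __init__(self):
        self.jobs: List[ActionRequest] = []

    def submit(self, request: ActionRequest) -> str:
        self.jobs.append(request)
        return f"job-{len(self.jobs)}"

def hand_off(decisions, mover: DataMover, can_move_locally=frozenset()):
    """Handle simple file-level moves locally; hand everything else off."""
    local, job_ids = [], []
    for req in decisions:
        if req.action in can_move_locally:
            local.append(req)                  # ICM's own file-level movement
        else:
            job_ids.append(mover.submit(req))  # existing snapshot/backup tool
    return local, job_ids

mover = RecordingMover()
requests = [ActionRequest("fs01", "/proj/sim.dat", "snapshot"),
            ActionRequest("fs02", "/hr/old.doc", "migrate", "sata-archive")]
local, job_ids = hand_off(requests, mover, can_move_locally=frozenset({"migrate"}))
print(job_ids, [r.path for r in local])  # ['job-1'] ['/hr/old.doc']
```

Typing the mover as a structural `Protocol` keeps the ICM side independent of any particular vendor's data movement product, which is the integration users are said to expect.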
ICM devices consume storage, but relatively little, because they store meta data about the production data rather than the data itself. Early indications suggest that storage requirements will be between 10% and 25% of the production data capacity being indexed and managed. ICM storage doesn't have to be high-performance disk; Serial ATA is perfectly acceptable because ICM is a relatively low-IOPS application. To scale, ICM devices can be clustered, and based on scaling metrics from ICM vendors, no more than a few physical devices should be needed to meet the requirements of most large enterprises.
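The 10% to 25% estimate turns into a quick sizing calculation. In this sketch, the 200 TB production figure and the 20 TB of usable SATA capacity per clustered ICM device are illustrative assumptions; only the ratios come from the text above.

```python
import math

def icm_index_capacity_tb(production_tb, low_ratio=0.10, high_ratio=0.25):
    """Range of ICM meta data storage as a fraction of indexed production capacity."""
    return production_tb * low_ratio, production_tb * high_ratio

def icm_nodes_needed(index_tb, tb_per_node):
    """Clustered ICM devices needed if each holds tb_per_node (hypothetical)."""
    return math.ceil(index_tb / tb_per_node)

low, high = icm_index_capacity_tb(200)         # 200 TB of production data
print(low, high)                               # 20.0 50.0
print(icm_nodes_needed(high, tb_per_node=20))  # 3
```

Sizing the cluster from the high end of the range is the conservative choice while the ratio is still an early estimate.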
Alternatives to ICM?
Are there any current alternatives to ICM? The short answer is no. Storage resource management (SRM) products can't do what ICM does: proactive indexing, content-aware inspection, and support for detailed policies and lexicons. Current SRM products can only scan file-level environments and collect file meta data. ICM could, however, eventually be integrated into SRM products.
At this early stage, some may claim that namespaces based on clustered or distributed file systems (e.g., Ibrix Inc., Isilon Systems, PolyServe Inc.) or network file management products (e.g., Acopia Networks, NeoPath Networks, NuView Systems Inc., Rainfinity Inc.) can do some or all of what ICM does. This misconception likely stems from confusion about the benefits of a global namespace. With a global namespace, file-level content is abstracted from its physical device location and can be classified and moved according to business-level decisions. That's simple virtualization: it isn't proactive indexing, it isn't content-aware and it isn't tied to a business process like compliance. Further, a global namespace covers only the servers that belong to it, not an entire infrastructure.
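To make the distinction concrete, here is a minimal sketch of what a global namespace fundamentally maintains: a mapping from a stable logical path to a physical location. The `GlobalNamespace` class and the filer names are hypothetical; real products add distribution, caching and consistency on top of this idea.

```python
class GlobalNamespace:
    """Minimal logical-to-physical file mapping (hypothetical, for illustration)."""

    def __init__(self):
        self.table = {}  # logical path -> (server, physical path)

    def place(self, logical, server, physical):
        self.table[logical] = (server, physical)

    def move(self, logical, new_server, new_physical):
        # The data moves; the logical name clients use stays the same.
        self.table[logical] = (new_server, new_physical)

    def resolve(self, logical):
        return self.table[logical]

ns = GlobalNamespace()
ns.place("/corp/finance/q2.xls", "filer-a", "/vol1/q2.xls")
ns.move("/corp/finance/q2.xls", "filer-b", "/vol7/q2.xls")
print(ns.resolve("/corp/finance/q2.xls"))  # ('filer-b', '/vol7/q2.xls')
```

Nothing in the mapping ever looks inside the file, so any decision about what the file contains has to come from somewhere else, which is the role the article assigns to ICM.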
Nor is there a manual approach to content-aware file inspection or policy creation. People who ask whether ICM can be done manually have typically catalogued file meta data in the past; they are confusing ICM with file auditing, which has been done manually for many years.
ICM is another step in the evolution of storage. Controlling content based on its own attributes and associated meta data marks a transition from a world of opaque data management to one of transparent information management. The implications of this trajectory are clear: Data storage will increasingly be about architecting information policies and data values as much as it is about storage topologies and device management.
It will take another one or two years of development before ICM finds its way into the hands of Tier-1 storage vendors. Because ICM enables enterprise IT to establish such direct control over high-profile processes like compliance and corporate security, ICM is a technology class that appears destined for wide deployment atop enterprise storage infrastructures.
This was first published in August 2005