In previous columns on strategic options for capacity management, I've looked at the shortcomings of traditional storage tiering with hierarchical storage management and of software functions such as deduplication and compression. In my opinion, traditional tiering has failed to transition smartly from centralized data centers to the more distributed IT infrastructures we see today. This leaves us to search for a more meaningful capacity management strategy: data lifecycle management, also sometimes called information lifecycle management -- or ILM.
Data lifecycle management (DLM) or ILM has a bad reputation in the minds of many IT managers because of a late 1990s swirl of hype, led by three top storage vendors. In short, ILM was sold as a product rather than as a process, and many consumers who bought into the hype realized that they had purchased a "data mover" -- one of four components of an ILM strategy -- and were left to cobble together the other three parts with a paucity of software tools.
IBM wrote the book on ILM several decades ago. In its view, managing storage according to the value of business data required four things. First, you needed to have some means of classifying the data assets that were going to be stored. Second, you needed to classify the storage targets where data was to be placed. Then you needed to develop policies -- rules for moving data from one class of storage to another based on data classification, elapsed time, re-reference rates and similar factors. Finally, you needed a data mover that would move the data from one storage device to another based on the rules established in the policy. Without the first three components, which are the heavy lifting of ILM, a data mover alone was of little value.
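To make the four components concrete, here is a minimal sketch in Python of how they might fit together. It is an illustration only, not IBM's design or any vendor's product: the class names, the storage class labels ("fast," "capacity," "archive") and the idle-days threshold are all hypothetical, and the "data mover" only records the moves it would make rather than copying any real data.

```python
from dataclasses import dataclass

# Component 2: the storage classes available as targets (hypothetical labels).
STORAGE_CLASSES = {"fast", "capacity", "archive"}


@dataclass
class FileRecord:
    path: str
    data_class: str         # component 1: the classification assigned to the data
    days_since_access: int  # stands in for elapsed time / re-reference rate
    location: str           # storage class currently holding the file


@dataclass
class Policy:
    # Component 3: a rule for moving one class of data between storage classes.
    data_class: str
    min_idle_days: int
    source: str
    target: str

    def applies_to(self, record: FileRecord) -> bool:
        return (
            record.data_class == self.data_class
            and record.location == self.source
            and record.days_since_access >= self.min_idle_days
        )


def run_data_mover(files, policies):
    """Component 4: apply the first matching policy to each file.

    Returns a list of (path, source, target) moves and updates each
    record's location. No real data is copied in this sketch.
    """
    for policy in policies:
        if policy.source not in STORAGE_CLASSES or policy.target not in STORAGE_CLASSES:
            raise ValueError(f"policy refers to an unknown storage class: {policy}")

    moves = []
    for record in files:
        for policy in policies:
            if policy.applies_to(record):
                moves.append((record.path, policy.source, policy.target))
                record.location = policy.target
                break  # at most one move per file per run
    return moves
```

Note how little of the sketch is the mover itself. Most of it is the classification and the policy definitions, which is the point of the paragraph above: without them, the mover has nothing to act on.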
Despite the oversell of ILM in the previous decade, the concept remains pivotal to bending the storage capacity growth curve once and for all. Improvements have been made that can help realize the benefits of ILM.
Real-world ILM improvements
In terms of data classification, products now exist that you can use to classify data based on what its owner does for a living at the firm. Third-party software products like StorHouse/Trusted Edge from FileTek (recently acquired by SGI) enable a user's files to be grouped by his or her job category (department or workgroup assignment) within Active Directory. The classified data (Joe works in the accounting department, so his data is classified as "accounting department" data, for example) is then subjected to data placement, protection, migration and archiving processes that are appropriate to accounting data.
Similarly, Microsoft has made available access to its File Classification Infrastructure (FCI), the component of Microsoft operating systems that creates metadata references for file properties such as "Hidden," "Read Only," "Encrypted" or "Compressed." Administrators can now create new FCI properties so that every PC in accounting can have a permanently assigned property called "Accounting" that tags all files saved by the user at that workstation. Metadata tags provide, as in the earlier example, a simple method for identifying which data requires which policies.
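The idea common to both approaches, classifying a file by who owns it, can be sketched in a few lines of Python. This is a simplified illustration and does not use the Active Directory, StorHouse/Trusted Edge or FCI interfaces: the dictionary below is a hypothetical stand-in for a directory lookup, and the user names, paths and function names are invented for the example.

```python
# Hypothetical stand-in for a directory service lookup (user -> department).
DEPARTMENT_BY_USER = {
    "joe": "accounting",
    "maria": "engineering",
}


def classify_by_owner(owner, directory=DEPARTMENT_BY_USER, default="unclassified"):
    """Return the data class for a file based on its owner's department."""
    return directory.get(owner.lower(), default)


def tag_files(files):
    """Map each path to a class tag; `files` is an iterable of (path, owner)."""
    return {path: classify_by_owner(owner) for path, owner in files}


if __name__ == "__main__":
    tags = tag_files([("/share/q3-ledger.xlsx", "Joe"), ("/share/notes.txt", "sam")])
    for path, tag in tags.items():
        print(f"{path}: {tag}")
```

Files whose owner is unknown fall into an "unclassified" bucket rather than being guessed at, so they can be reviewed before any placement or archiving policy touches them.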
Just as data must be classified, ILM also requires storage to be classified. Storage arrays that offer fast access, or slower access at a low cost per GB, or special data protection services such as mirroring, replication or continuous data protection, can each be given their own target classification, thereby providing discrete destinations for data writes and data moves.
The problem that many administrators find is that a lot of the storage deployed in the infrastructure is of the same class (say, Tier 1) rather than neatly organized by speeds, feeds and capacities. To parse your existing infrastructure more intelligently, it is well worth considering virtualizing your storage using any of a number of hardware or software approaches. Virtualization enables you to group different storage assets together into "pools," with each pool providing a set of services and hosting conditions that make it appropriate for designation as a storage target. Better still, virtual volumes can be increased in capacity without disruption, facilitating more intelligent and less costly expansion as needs dictate.
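As a rough illustration of what classifying storage targets means in practice, the Python sketch below groups volumes into named pools by the services they offer. It models only the bookkeeping, not any real virtualization product: the attributes (media type, replication), the pool naming scheme and the capacities are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    media: str         # e.g. "ssd" or "hdd" (assumed attribute)
    replicated: bool   # whether a data protection service is applied
    capacity_gb: int


def pool_name(volume):
    """Derive a storage class label from the services a volume offers."""
    speed = "fast" if volume.media == "ssd" else "capacity"
    protection = "protected" if volume.replicated else "basic"
    return f"{speed}-{protection}"


def build_pools(volumes):
    """Group volumes into pools keyed by their storage class label."""
    pools = defaultdict(list)
    for volume in volumes:
        pools[pool_name(volume)].append(volume)
    return dict(pools)


def pool_capacity_gb(pool):
    """Total capacity of a pool; adding a volume to the pool grows this."""
    return sum(volume.capacity_gb for volume in pool)
```

Each pool label then serves as a storage class that a placement or migration policy can name as its source or target.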
With data classes and storage classes established, automating policy-based data movement from one storage class to another is the remaining challenge to data lifecycle management. Numerous file managers and data movers are available on the market, with excellent ones coming from Condrey Corp. (File System Factory), QStar Technologies (QNM), Crossroads Systems (FileStor-HSM) and others.
New products are appearing on the market, such as Spectra Logic's Black Pearl Server, that intercept data objects directly from pre-established object workflows within applications for delivery to appropriate archival object storage. For industries with regimented workflows, such as audio or video post-production in the media and broadcast world, genomic sequencing and medical imaging in health care, and telemetry acquisition in oil and gas exploration, the Black Pearl approach seems to have a lot of merit. Watch this space.