The pieces that will enable an enterprise-wide information lifecycle management solution are starting to appear, but until meta data standards become available, ILM remains a work in progress.
There's been a lot of buzz about information lifecycle management (ILM) and its promise of efficient data and storage management. And while some companies have taken steps toward implementing ILM, few--if any--are doing fully automated enterprise ILM the way it was conceived. Today, ILM implementations are narrowly focused and usually address a specific application like e-mail-related compliance.
"You can't do complete ILM today," says Arun Taneja, founder and consulting analyst at Taneja Group, Hopkinton, MA. "At best, you might find a vertical stovepipe solution." ILM was initially defined as a layered stack of technologies. Some layers, such as the ability to set up storage tiers, have been addressed, while other layers, such as information classification and management (ICM), are only just beginning to be addressed. ICM describes an emerging set of automated tools that combine search, indexing and data movement to help companies categorize their stored data based on its meaning and value to the organization (see "New ICM tools").
Moving data to different types of storage or tiers may prove more difficult than some vendors let on. "If you're using data movement utilities that have microcode dependencies you can have problems, like if you're moving data between an old EMC Symmetrix and a new EMC DMX," says Angelo Castellano, a data replication engineer at Softek Storage Solutions Corp. However, when working at the host level, where ILM data movement tools tend to operate, microcode dependencies don't matter. "The host never experiences the microcode," he says.
This doesn't mean an enterprise can't start to pursue ILM. "ILM is a strategy, not a product," says Brian Babineau, an analyst at Enterprise Strategy Group (ESG), Milford, MA. "There are a number of tactical measures a company can take today to start its ILM strategy." These range from creating data taxonomies to classifying data, to setting up the storage tiers needed for ILM. IT managers also need to line up the enabling technologies that will make automated enterprise ILM possible going forward. This will include technologies that automatically move data between storage tiers and policy engines that will drive the ILM process.
Some of these technologies have been part of the IT landscape for years. Others, like ICM, have only recently emerged. But assembling an ILM technology stack isn't enough; the problem of applying the policies--based on file characteristics--that drive the automated ILM process remains.
Why ILM is so difficult
ILM is more than simply moving data to another type of disk within the array or to a different storage subsystem or tape. "Tiered storage is not ILM, although it is a piece of ILM," says Michael Peterson, program director for the Storage Networking Industry Association's (SNIA's) Data Management Forum (DMF). Rather, "ILM is an information-based management practice that uses information about the data as a central actor for setting policies about what to do with the data," he explains. Those policies can specify performance requirements, retention periods, security needs, service levels and more. Seen in this light, "ILM transcends storage," he says.
In short, ILM describes a management process for moving data based on its value to the organization and its need for protection, availability, speed of access and other services. "ILM is a conceptual state in which data is stored in accordance with its changing business value," says Stephen Foskett, director of strategy services, GlassHouse Technologies Inc., Framingham, MA.
To do this, you need to know the data's value, which isn't as simple as you might think. It requires sophisticated metrics that go beyond aging or frequency of access. "You have to take into account the relevance of the data to the core business and the risk of the data not being available," explains Foskett. Determining the relevance of the data to the business may be the hardest part, and it's a task IT can't do on its own.
Bill Rhyme is manager of support services at a major transportation company. His firm embarked on an ILM initiative six months ago and is still struggling with determining its data's value. "We have the data classified into various categories, but even that doesn't tell us what it is. We're surprised at how much effort even this takes," says Rhyme. The company hopes to have an ILM strategy by the end of the year and to begin implementing pieces of it next year.
For ILM to work across the entire data center, companies need systems that store data based on a unique, meaningful name and meta data stored separately from the data's storage address. That way, data can move to any storage address and the applications can still find it. "Today, we store and access data according to its address and expect it to be there," says Foskett. "If the data is moved, the applications can't find it."
|New ICM tools|
Information classification and management (ICM) represents an emerging class of tools to address information lifecycle management's data classification needs. The tools are rules-driven and use proprietary algorithms to handle such functions as search, discovery, classification and indexing. They use any existing meta data tags attached to the data and classification taxonomies specified by the company. Some tools include a database and also offer data movement capabilities.
Setting data requirements
Another obstacle to implementing an effective ILM storage strategy is the issue of data requirements. "You need to translate the data's [storage] requirements into policies a computer can act upon," explains Foskett. The requirements specify such things as how data should be protected, how accessible it needs to be and for how long. These requirements should be embedded in the meta data that accompanies the data. Compounding the problem, "users don't know what the requirements should be," he adds. Most users want their data to be immediately accessible forever, at the highest levels of performance and security.
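To make the translation concrete, here is a minimal sketch of what "policies a computer can act upon" might look like. The metadata fields (classification, age, retention period) and the tier names are hypothetical illustrations, not any vendor's actual schema:

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    """Hypothetical metadata a classification tool might attach to a file."""
    classification: str   # e.g. "regulated", "confidential", "public"
    age_days: int
    retention_days: int

# Illustrative policy table: business requirements expressed as
# machine-actionable rules, evaluated in order.
POLICIES = [
    # (predicate over the metadata, target storage tier)
    (lambda m: m.classification == "regulated" and m.age_days > 90, "compliance-archive"),
    (lambda m: m.classification == "public" and m.age_days > 365, "nearline"),
    (lambda m: m.age_days > m.retention_days, "delete-candidate"),
]

def target_tier(meta: FileMeta, default: str = "primary") -> str:
    """Return the first tier whose policy predicate matches the metadata."""
    for predicate, tier in POLICIES:
        if predicate(meta):
            return tier
    return default
```

The hard part, as Foskett notes, isn't the rule engine; it's getting users and business units to agree on what belongs in the policy table in the first place.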
"We're just starting to think about setting data requirements," says Rhyme. "We need to identify the data owners and they have to make decisions about the data, [such as] the kind of protection or retention it needs." This will involve people at the highest levels of the company. "Company-wide decisions will have to be made about some of the data, like e-mail. Our vice president recognizes [that the decision] can't be driven by storage," he adds.
Today's ILM enablers, such as tiered storage, hierarchical storage management, e-mail archiving and even content-addressed storage (CAS), don't resolve these issues. "Companies are only starting to address things with tiered storage, classification and indexing; the user needs and requirements part hasn't been addressed at all," says Anne MacFarland, director of enterprise architectures and infrastructure solutions at The Clipper Group Inc., Wellesley, MA.
Despite the obstacles, some companies report that they're doing ILM. "But it is more like hierarchical storage management than real, dynamic tracking and movement of data. And they're pretty much using homegrown tools," says Robert L. Stevenson, managing director, storage practice at TheInfoPro Inc., a New York City-based research firm.
Strong interest in ILM
In TheInfoPro's latest storage study, ILM moved from fourth to first on the firm's Heat Index, which reflects the immediacy of respondents' interest in implementing ILM. Thirty-two percent of respondents implemented ILM pilots or plan to implement pilots this year, nearly double the number from a year ago. Policy-based archiving, closely related to ILM, ranked second on the Heat Index.
The ILM elements most sought after by respondents to the TheInfoPro survey were tiered storage, data migration and policy engines. These responses reinforce the view that it's still very early in the ILM game. Data classification, a key component of automated, enterprise ILM, ranked fifth in the functionality importance ranking and sixth in the Heat Index, leaving this critical ILM requirement at the bottom of the list.
A recent IDC survey suggests more ambivalence toward ILM. "ILM had more awareness and was more of a priority with larger firms," says Laura DuBois, research director, storage software at the Framingham, MA-based research firm. But even large firms weren't flocking to ILM: Slightly less than 40% expressed interest in implementing ILM vs. a little more than 15% of smaller firms. "There is still a lot of skepticism about ILM [and] the ability of the industry to deliver on the vision," she says.
At least some pieces of the ILM vision are falling into place, beginning with tiered storage. According to TheInfoPro's survey, 37% of respondents had already deployed tiered storage, while 26% reported pilot projects underway or plans to deploy tiered storage in the near future. Still, researchers found complaints about compatibility and interoperability among different tiers of storage--even from the same vendor.
CAS is also gaining traction and can be a stepping stone to full ILM. It allows organizations to sever the link between the data and its storage address, which is a key obstacle to ILM. "CAS addresses data by a unique name, not the address, so you can move the data around," says GlassHouse's Foskett. The problem with CAS today, however, is that the applications don't address the data as CAS records, but as files or blocks. Still, adds Foskett, "CAS can be an ILM enabler by breaking the link between what's being asked for and where it resides."
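The idea Foskett describes--addressing data by a unique, content-derived name rather than a storage address--can be sketched in a few lines. The use of SHA-256 here is an illustrative choice; commercial CAS products of the period used their own hashing schemes:

```python
import hashlib

class ContentAddressedStore:
    """Toy content-addressed store: data is retrieved by a name derived
    from its content, not by a storage address, so the physical location
    can change without breaking lookups."""

    def __init__(self):
        self._objects = {}  # stand-in for whatever physical tier holds the data

    def put(self, data: bytes) -> str:
        # The content itself determines the name, so identical data
        # always resolves to the same name, wherever it is stored.
        name = hashlib.sha256(data).hexdigest()
        self._objects[name] = data
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]
```

Because the name never changes when the data moves, an ILM process can migrate the object between tiers behind this interface without the application noticing--exactly the "breaking the link" property Foskett describes.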
Storage virtualization products from numerous vendors will mask the physical location of the data. By separating the data from its stored address, applications can access the data logically even as its physical location changes based on ILM policies. Organizations can also use their existing data replication tools, like products from Softek and Symantec Corp., to move data between heterogeneous storage systems in a tiered storage environment. Data movement tools, however, may not move data to different storage subsystems as seamlessly as advertised.
|What's here, what's missing|
Probably the biggest impediment to a successful ILM solution is classifying the stored data. Specialized data classification tools from vendors such as Abrevity Inc., Kazeon Systems Inc., Njini Inc., Scentric Inc. and StoredIQ Corp. are still quite new and haven't been battle tested by many companies. "You can look at file meta data, but that's very limited--it tells you who created the data and when," says Michael Masterson, an information systems architect at a Fortune 500 manufacturing company in the life sciences industry. To handle stored data appropriately, the company needs much more information. "We need to know if the file contains private data that must be kept confidential, regulated data that must be saved for a certain period of time, proprietary intellectual property or public information that we can make available to anyone," he says.
This can get very complicated. The company uses instruments that generate scientific data based on the Flow Cytometry Standard (FCS). "We need to be able to automatically read the FCS headers and classify data based on what's in there," explains Masterson. The company turned to Abrevity, one of the new breed of ICM tool vendors, to classify stored information so users can find it based on terms in the FCS file header.
"Before Abrevity, we were managing data based on the need for storage space, like dumping in a landfill," says Masterson. "Now we're managing the data based on [its] meaning." Ironically, points out Masterson, today's landfills try to separate the different kinds of trash based on disposal requirements. That's what he's trying to do with his firm's stored data.
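In FCS files, the acquisition metadata Masterson refers to lives in a TEXT segment of delimiter-separated keyword/value pairs. A minimal sketch of pulling classification terms out of such a segment follows; the sample segment and its values are invented for illustration, and real files embed the segment inside a binary layout with escaped delimiters, which this sketch ignores:

```python
def parse_fcs_text_segment(segment: str) -> dict:
    """Parse a simplified FCS TEXT segment: the first character is the
    delimiter, and keywords alternate with values (standard keywords
    begin with '$'). Escaped (doubled) delimiters are not handled."""
    delim = segment[0]
    parts = segment.strip(delim).split(delim)
    return dict(zip(parts[0::2], parts[1::2]))

# Hypothetical TEXT segment for illustration.
segment = "/$CYT/FACSCalibur/$DATE/01-JUL-2006/$SRC/patient-0042/"
keywords = parse_fcs_text_segment(segment)
```

A classification tool that reads these keywords can then index files by instrument, date or sample source--the "terms in the FCS file header" that Masterson's users search on.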
"The classification vendors typically are starting with archiving, legal discovery or something like that," says GlassHouse's Foskett. "These tools will enable ILM, but for now the vendors are staying more narrowly focused."
For example, St. Vincent Health in Indianapolis, part of the Ascension Health Network, uses StoredIQ's data classification product for HIPAA compliance. "We had been relying on users to tell us what the data was and that didn't work," says Karen Johnson, HIPAA security officer for the 16 hospitals in the network's St. Vincent region. The HIPAA compliance team found itself manually trying to classify the data. With well over 100TB of stored data on multiple SANs, that proved to be a nearly impossible task.
After looking at some of the emerging data classification tools, Johnson brought in StoredIQ. "They came in, plugged it in and it started data crawling," says Johnson. "I didn't have to give it keywords or taxonomies or anything." The tool classified the data using linguistics, pattern searching and keywords.
However, data classification requires standard taxonomies applied consistently throughout a company. "In a large company, each department might have its own way of classifying things," says MacFarland at The Clipper Group. Without that consistency, ILM is impossible, even with classification tools (see "What's here, what's missing," at right).
Basing policies on meta data
Another gap results from a disconnect between policy engines and the other ILM pieces. For example, an ICM tool can recognize patient information, but then what does it do?
"You need to link the classification of the data with policies. Data-driven file systems like Microsoft's WinFS may help solve this problem," suggests GlassHouse's Foskett. Today, this connection is often made by manually entering meta data.
Masterson hasn't gotten to the point where he has tried to automate the data movement process at his manufacturing company. "We don't have any rules to move data around yet, because we didn't know what the files meant to the business," he says. "We don't want to move files just because they're old; [we want to move them] based on what they mean." Now that Abrevity provides that information about the data, the firm can begin to automate an ILM process. "We think we can use Abrevity for this," says Masterson. "They have some utilities to connect with policy and data movement." If necessary, he's ready to write scripts to do it.
Meta data--information about the stored data used by policy engines and classification tools--has proven frustrating for storage management in general and ILM in particular. Today, the industry is awash in meta data. Every application, such as enterprise content management or records management, and every management tool generates its own meta data. "We end up writing special interfaces for each vendor's product. It takes just a few vendors and you end up with an interface nightmare," says Jered Floyd, vice president of development and co-founder of Permabit Inc., Cambridge, MA, who also co-chairs SNIA's DMF eXtended Access Method (XAM) Initiative.
XAM couples extensible meta data with the data itself as an object called an XSET, and provides a standardized, generic interface. XAM is just one of the DMF initiatives to address the meta data management issues hindering ILM and other aspects of storage management, such as long-term data archiving. Another initiative, the Storage Management Initiative Specification (SMI-S) for Services, "will automate the services layers by providing a common interface," says SNIA's Peterson. "Otherwise, you just have a bunch of point solutions." The Long-Term Archiving and Compliance Solutions Initiative (LTACSI) uses meta data standards to address the challenge of archiving data for 100 years or longer.
For companies wrestling with enterprise ILM, XAM is the initiative with the most immediate value. XAM addresses the meta data needed to automate policy-based decisions surrounding ILM. It promises to provide a single access method to transparently span multiple storage devices and applications. "XAM is an extensible access method run at the application layer," says Peterson. It allows for the automated movement of data between tiers based on standardized meta data and service-level agreements.
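Conceptually, an XSET pairs the data with its extensible meta data in a single object reached through one generic interface. The sketch below is a stand-in to illustrate the idea; the property names are hypothetical, not the actual XAM field names:

```python
from dataclasses import dataclass, field

@dataclass
class XSet:
    """Conceptual stand-in for a XAM XSET: the data and its extensible
    meta data travel together as one object, accessed through a single
    generic interface rather than a vendor-specific one."""
    data: bytes
    metadata: dict = field(default_factory=dict)

    def set_property(self, key: str, value: str) -> None:
        self.metadata[key] = value

    def get_property(self, key: str) -> str:
        return self.metadata[key]

# Illustrative use: a policy engine can read the retention class from
# the object itself, wherever the object happens to be stored.
record = XSet(data=b"claim form scan")
record.set_property("retention.class", "regulated-7yr")
```

Because the meta data rides with the data instead of living in an application-specific store, any tier, tool or application that speaks the common interface can act on the same policies.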
Without the XAM interface, meta data tends to be tied to a specific application. And "[the user] ends up being locked into a platform or application," says Floyd. XAM, however, allows for the common access and management of the meta data, says Edgar St. Pierre, an EMC manager and co-chairman of the DMF's ILM Technical Liaison Group.
XAM is in the earliest stages of the standards development process. Floyd expects the initial draft of the standard by the end of 2006, with actual products employing the specification to begin appearing in the second half of 2007. Work is just beginning on the ILM initiative, SMI-S 1.3. Don't expect widespread adoption anytime soon.
While vendors work to fill in the gaps in the ILM stack and to connect the pieces, IT has plenty to do on its own. To begin, IT must enlist the company's business units in the process of identifying, assessing and tagging data based on taxonomies so ICM tools will know how to label it. IT and the business units will then need to hammer out a manageable set of policies to drive the ILM process. At this rate, analysts agree, fully automated enterprise ILM is probably 18 to 24 months away.