First, let's clear the air: ILM is not a product--it's a process that manages data from its birth to its disposition. Once your company has an ILM process--then, and only then--can you start to cobble together products that support the process. (See "How to proceed with ILM")
Storage vendors are scrambling to beef up their alliances and product lines to offer an integrated ILM solution. Storage Technology Corp. (StorageTek), for example, forged a partnership with Storability Software to more tightly integrate Storability's Global Storage Manager with StorageTek disk and tape drives. EMC Corp. acquired Legato Systems and Documentum to bolster its ILM offerings--and now the company faces the huge task of weaving the acquired technology into its own solutions. Companies such as IBM Corp. and Hewlett-Packard Co. (HP) find themselves in positions similar to EMC's, with a stable of disparate internal and acquired products that have limited or no integration between them. Veritas Corp. jumped on the ILM bandwagon when it released Data Lifecycle Manager 5.0, a policy-driven data archive engine that works with Veritas' NetBackup and Backup Exec.
To date, no vendor offers a cradle-to-grave ILM product that easily solves all of an enterprise's compliance needs. Chris Van Wagoner, CommVault Systems' vice president of product marketing, says that the level of integration among most vendors' products is one step above brochure level.
Turning data into information
Steven Murphy, Fujitsu Softek's president and CEO, believes that the key to turning an organization's raw data into viable business information starts with gaining visibility into the existing environment in two ways. The first is a picture of the different types of data stored on internal and external storage, including databases, files, e-mail and fixed-content storage. The second is the ability to visualize, monitor and manage the devices in the existing storage networking infrastructure.
Products that support either of these two views typically get classified under the general category of storage resource management (SRM), even though the two types of tools gather and provide very different types of information. Tools that provide details on databases, files, e-mail and fixed content may more appropriately fall under the subcategory of storage reporting. The other tools, which visualize, monitor and manage the storage network devices--such as switches and storage arrays--may be more appropriately classified as storage infrastructure management software.
Of these two approaches, it's the tools that support the storage reporting component that initiate ILM. When they begin to classify and document their data, organizations begin the process of translating raw data into meaningful information.
Classifying the information
Reports on storage utilization enable the classification of the data on servers. The Enterprise Storage Group, Milford, MA, finds that data may be classified in at least four ways: data type, organization, data age and data value. (See "Classifying the data," on this page.) Breaking data out into these different categories helps educate the organization on the nature of the data it owns and provides the facts needed to build business cases for future storage and data automation management technologies. Here's where the tool's ability to recognize different data types becomes paramount.
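The four classification axes above can be sketched as a simple record. This is a minimal illustration, not any vendor's tool: the extension-to-type mapping and the value labels are assumptions an organization would replace with its own scheme.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to (data type, business value);
# a real scheme would be defined by the organization, not hard-coded.
TYPE_VALUE = {
    ".mdf": ("database", "high"),
    ".pst": ("e-mail", "high"),
    ".doc": ("file", "medium"),
    ".tmp": ("file", "low"),
}

@dataclass
class Classification:
    data_type: str   # database, file, e-mail, fixed content...
    owner: str       # organization or department that owns the data
    age_days: float  # time since last modification
    value: str       # business value: high, medium or low

def classify(path: Path, owner: str, now: Optional[float] = None) -> Classification:
    """Classify one file along the four axes: type, organization, age and value."""
    now = time.time() if now is None else now
    data_type, value = TYPE_VALUE.get(path.suffix.lower(), ("file", "low"))
    age_days = (now - path.stat().st_mtime) / 86400
    return Classification(data_type, owner, age_days, value)
```

Even a crude table like this is enough to start the business-case conversations described above, because it turns anonymous capacity numbers into categories a department head can recognize.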
Users must have some fundamental understanding of the application data that they are gathering information on before they even begin to deploy a tool. Without this understanding, don't expect to deploy a tool enterprisewide and gather all of the data you need.
EMC hopes to help users solve one of their bigger issues--classifying unstructured data--through the company's recent acquisition of Documentum. Mark Lewis, EMC's executive vice president of open software, observes that 80% of existing data is unstructured and 90% of newly created data is digital. Yet until users classify their data, they can't take the next steps of protecting it, applying policies, moving it or deleting it.
This data classification step can unveil some important facts and save money at the same time. A recent case study conducted by DeepFile Corp. at Vignette Corp.--both located in Austin, TX--uncovered two important facts. It revealed that more than 50% of the files on Vignette's most expensive storage devices hadn't been accessed in more than a year and that a large amount of storage was being consumed because of file and directory duplication. Armed with this knowledge, Vignette purchased an inexpensive nearline ATA storage array and moved its older files to the lower-cost disk.
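An audit like the one DeepFile ran can be approximated in a few lines: walk a file tree, flag files whose last access is over a year old, and group byte-identical files by content hash. This is only a sketch--hashing every file is far too slow at enterprise scale, where real SRM tools work from metadata scans.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

YEAR = 365 * 24 * 3600  # seconds in a year

def audit(root: str, now: float = None):
    """Return (stale, duplicates): files not accessed in over a year,
    and groups of files whose contents are byte-for-byte identical."""
    now = time.time() if now is None else now
    stale = []
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()          # stat before reading, so atime is untouched
        if now - st.st_atime > YEAR:
            stale.append(path)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
    duplicates = [paths for paths in by_hash.values() if len(paths) > 1]
    return stale, duplicates
```

The output maps directly onto Vignette's remedy: the stale list is what migrates to cheap nearline ATA disk, and the duplicate groups are candidates for outright reclamation.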
Another key feature for a storage reporting tool is its ability to interact with devices in the storage network. For instance, in networked storage environments, being able to do a logical-to-physical mapping to find out which physical storage device the data actually resides upon is extremely helpful. EMC users should take a serious look at EMC's Storage Scope, which reports how each server's files are mapped and laid out. Unfortunately, you have to spend some time configuring the tool, and the tool itself is particular about the environment it's deployed in.
Users also need to understand what sort of data they want to report on and manage. If users just want a more generalized storage utilization reporting tool, almost any SRM tool will do. Yet for those who want more advanced storage reporting capabilities--such as the ability to report on instant messaging files, data in tape silos or even existing paper files--they will likely need different tools to get reports on all these types of data. For instance, Storability's GSM tool will give you detailed reports on the utilization of the tapes in StorageTek's tape libraries. But, if you're using that tool to monitor tape libraries from IBM, don't expect the same level of detail.
Another item users will want to consider is who will control the verification and distribution of the information after it's gathered. Finding out that you have 10TB of unused EMC Symmetrix or HDS Lightning storage may sound great until the financial guys figure out that someone overspent to the tune of $1 million.
Acting on the information
Theresa O'Neil, director of storage strategy at IBM Tivoli, believes that users will look to better manage the data itself by getting rid of their nonessential data and tagging remaining data according to its content. "Organizations contain lots of processes," she says. "While it is important for them to retain data to support these processes, the greater issue becomes: can they retrieve the data needed to support these processes in a specified situation?"
O'Neil also says that while data may remain in existence for years, the back-end media on which it resides will likely change. In response, IBM separates the management of the data from the management of the media: DB2 Content Manager manages the content of the data, while Tivoli Storage Manager manages its placement on the media.
Steve Kenniston, a technology analyst with the Enterprise Storage Group, says ILM can be used to initiate conversations with business unit heads about their storage and recovery needs. "Every department head today believes that IT can recover anything anytime and this is just not true," he says, adding that "ILM is all about people, process and technology, not just technology."
Storage admins should look to capitalize on this new pool of knowledge, since they will now, for the first time, probably have hard facts to justify other storage management tools and technologies. These may include automated provisioning, fabric-based virtualization and the deployment of new protocols such as iSCSI or inexpensive storage such as ATA. So, while choosing a storage reporting tool may end up being more of a tactical than a strategic move, it will enable users to gather the data they need to make the more important, longer-term strategic decisions. One of those decisions will be the eventual deployment of an automated data management (ADM) solution.
Choosing the right ADM provider
Two important tasks will enter into the decision-making process when selecting an ADM provider. The first will be trying to understand and document the different types of data retention requirements within your company that this product will be responsible for managing. (See "Data retention requirements") The other will be picking a product that permits you to set up policies that manage data from its creation to its disposition. Products vary both in their ability to set up policies and carry out prescribed actions--such as deleting or migrating data--in different environments, and in their effectiveness at performing these tasks.
These functions become especially relevant in the different user environments that exist today. For users who only need to keep data for hours or days to support applications such as temporary batch files, probably any tool will work. Yet for some financial, government and human resource applications, data retention needs may span years, if not decades. As a result, you need to spend time determining whether a tool's policy management capabilities are aligned with your environment.
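As a rough illustration of what policy management means here, a retention policy reduces to a table mapping each data class to a retention period and a disposition. The classes and periods below are hypothetical; real numbers come from legal and business requirements, not from IT.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    retain_days: int   # how long the data must be kept
    disposition: str   # what happens when the retention period expires

# Illustrative retention classes only -- actual periods would be dictated
# by regulators and business units.
POLICIES = {
    "temp_batch": Policy(retain_days=1, disposition="delete"),
    "financial":  Policy(retain_days=365 * 7, disposition="archive"),
    "hr_record":  Policy(retain_days=365 * 30, disposition="archive"),
}

def disposition_for(data_class: str, age_days: int) -> str:
    """Decide the action for data of a given class and age."""
    policy = POLICIES[data_class]
    return policy.disposition if age_days >= policy.retain_days else "retain"
```

The point of the table is the spread: one class expires in a day, another outlives several generations of hardware, which is exactly why the tool's policy engine has to match your environment.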
The situation becomes more complicated if you're using different vendors' products or even different products from the same vendor. You will need to ask how the different products hand off data management responsibilities. This becomes especially pertinent when using one vendor's tool to do the day-to-day storage administration and another vendor's tool to do backup and recovery. At some point, the tool doing the day-to-day storage administration will need to hand off the management of the data to the backup product, especially when data is archived and then deleted from primary storage.
Here's where companies that offer a suite of products should be better positioned. Some of the major players such as Computer Associates (CA), EMC, Fujitsu Softek, HP, IBM and Veritas now possess many of the tools needed to succeed at ILM; others, such as AppIQ, CommVault Systems, CreekPath Systems, OuterBay, Princeton Softech and StorageTek, have only some of the needed components. One of the keys for each of these vendors will be how soon and how well they can cleanly integrate the different tools they own and show value to their users.
Another issue that may inhibit the deployment of ADM stems from the complexity of today's storage networks. EMC's Lewis notes that while ILM may be done now on an application-by-application basis, it still revolves primarily around manual processes and remains far too complicated for the enterprise. He says that technologies such as fabric-based virtualization, which EMC classifies as a data delivery service, will enable ADM to succeed in enterprise environments because it helps to solve some of the complexity of large storage networks.
Todd Rief, StorageTek's director of corporate strategy, calls fabric-based virtualization a "huge technology for ILM."
Fabric-based virtualization enables the creation of a fabric-based volume table of contents (VTOC). These VTOCs will play an important role in the future management of enterprise storage, emerging as a type of network-based storage routing table and functioning in much the same way as Cisco's routing tables, which manage today's data networks. Also, because all servers can theoretically get their storage from this virtualization layer, it creates a new programming layer in the network that will enable the development and deployment of tools such as ADM without costing a fortune.
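To make the routing-table analogy concrete, a fabric-based VTOC can be pictured as a lookup table that resolves a logical volume seen by a server to extents on physical arrays, much as a router's table resolves a destination prefix to a next hop. The names and tuple layout below are purely illustrative, not any vendor's actual schema.

```python
# Each entry maps a server's logical volume to one or more physical
# extents: (array, LUN, start GB, size GB). All names are hypothetical.
vtoc = {
    "web01:/vol/data": [("array-A", "lun-7", 0, 500)],
    "db01:/vol/log":   [("array-A", "lun-9", 0, 100),
                        ("array-B", "lun-2", 0, 100)],  # spans two arrays
}

def resolve(logical_volume: str):
    """Return the physical extents backing a logical volume."""
    return vtoc[logical_volume]
```

Because every server's I/O passes through this one table, a data mover that updates the table can migrate data between arrays without touching the servers--which is what makes the layer attractive as a programming point for ADM.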
Will HSM emerge?
One component of ILM, hierarchical storage management (HSM), involves putting the right data on the right level of storage and removing it as it ages. This technology, long available in mainframe OS/390 environments but considered by some to be a failure in the open systems environment, remains a difficult technology to implement. But through the combination of the aforementioned technologies, HSM may start to become a reality for the enterprise.
CommVault Systems' Van Wagoner says the crux of the problem with HSM is establishing the value of the data. For ILM to work, the software must be able to look at something as simple as a Word document and determine whether that document contains your kid's soccer schedule or a vital business contract. One file you may want to delete and the other you may want to archive.
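Van Wagoner's soccer-schedule-versus-contract test can be caricatured with a keyword heuristic. This is a deliberately crude stand-in for the content analysis he describes--the term list is an assumption, and real classification software would use far richer techniques.

```python
import re

# Assumed indicator terms for business-critical documents; a real product
# would use trained classifiers, not a hand-picked word list.
BUSINESS_TERMS = re.compile(
    r"\b(contract|agreement|invoice|liability|indemnif\w+)\b", re.IGNORECASE
)

def hsm_action(text: str) -> str:
    """Return 'archive' if the document looks business-critical, else 'delete'."""
    return "archive" if BUSINESS_TERMS.search(text) else "delete"
```

The gap between this toy and a dependable classifier is precisely the problem Van Wagoner is pointing at: until software can establish value reliably, HSM policies have nothing trustworthy to act on.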
While technologies such as fabric-based virtualization and ADM are still maturing, the generally available storage reporting and SAN infrastructure tools look ready for prime time. So now is the time for organizations to start to deploy these first-line tools and experiment with some of the next-generation technologies. As the momentum for ILM continues to build, users would be well-advised to start the process of ILM by putting the initial building blocks in place.