Managing and protecting all enterprise data


Starting the ILM process

Information life cycle management (ILM) is the buzz right now. But too much emphasis is being put on products and not enough on understanding that ILM is really a process. Before you buy in, here's how to analyze what it means for you.

How to proceed with ILM
Implement a storage reporting tool. Organizations with any type of external storage device should be looking to get a storage resource management (SRM) tool deployed that offers storage reporting capabilities. While these tools primarily give users a view into their environment, they may discover enough unallocated and underutilized storage that the SRM costs are offset by delays in storage purchases.
Protect your data. After you get a look into your environment, use these reports to verify that all of the data in your environment is protected in a method appropriate to the value of the data. If that's not happening, identify unprotected data and start backing it up. There are ample backup and recovery tools on the market to protect your existing investment in data.
Weigh the pros and cons of SAN management tools. Before running out to buy a storage area network (SAN) management tool, consider the cost and time involved to deploy one. New technologies like fabric-based virtualization and standards such as FAIS and SMI-S may limit the value of current product releases. Also, determine what features you want in these products. Do you just want the ability to visualize and report on devices attached to your storage network or do you want to manage them as well? The more you want, the more you will pay and the longer it will take to set up.
Start understanding and testing fabric-based virtualization. This technology promises to change how storage management is done in storage networks. While vendors look to establish a presence in this space and committees sort through what the standards will be, users still have plenty of time to get acquainted with this technology.
Leave ADM on the shelf. With the limited visibility most organizations have into their infrastructure and the current complexity of storage networks, organizations should, for the most part, leave tools that promise enterprise automated data management on the shelf. Organizations that may want to make exceptions include those enterprises that have single-vendor storage array environments or specific applications that can justify point solutions.
Regulations are fueling the information life cycle management (ILM) frenzy. There are more than 10,000 federal regulations in the U.S. that require data retention periods of up to 30 years or more. Think of the cost of complying. And then think of the cost of not complying. No wonder ILM is this year's most bandied-about acronym.

First, let's clear the air: ILM is not a product--it's a process that manages data from its birth to disposition. Once your company has an ILM process, then and only then, can you start to cobble together products that support the process. (See "How to proceed with ILM")

Storage vendors are scrambling to beef up their alliances and product lines to offer an integrated ILM solution. Storage Technology Corp. (StorageTek), for example, forged a partnership with Storability Software to more tightly integrate Storability's Global Storage Manager into StorageTek disk and tape drives. EMC Corp. acquired Legato Systems and Documentum to bolster its ILM offerings--and now the company faces the huge task of weaving together the technology it's acquired into its own solutions. Companies such as IBM Corp. and Hewlett-Packard Co. (HP) find themselves in positions similar to EMC, with a stable of disparate internal and acquired solutions that have limited or no integration between them. Veritas Corp. jumped on the ILM bandwagon when it released its Data Lifecycle Manager 5.0, a policy-driven data archive engine that works with Veritas' NetBackup and Backup Exec.

To date, no vendor offers a cradle-to-grave ILM product that easily solves all of an enterprise's compliance needs. Chris Van Wagoner, CommVault Systems' vice president of product marketing, says that the level of integration among most vendors' products is one step above brochure level.

Turning data into information
Steven Murphy, Fujitsu Softek's president and CEO, believes that turning an organization's raw data into viable business information starts with gaining visibility into the existing environment in two ways. The first view is a picture of the different types of data stored on internal and external storage, including databases, files, e-mail and fixed-content storage. The second view gives you the ability to visualize, monitor and manage the devices in your existing storage networking infrastructure.

Products that support either of these two views typically get classified under the general category of storage resource management (SRM), even though the two types of tools gather and provide very different types of information. Tools that provide details on databases, files, e-mail and fixed content may more appropriately fall under the subcategory of storage reporting. The other tools, which visualize, monitor and manage the storage network devices--such as switches and storage arrays--may be more appropriately classified as storage infrastructure management software.

Of these two approaches, it's the tools that support the storage reporting component that initiate ILM. When they begin to classify and document their data, organizations begin the process of translating raw data into meaningful information.

Classifying the information
Reports on storage utilization enable the classification of the data on servers. The Enterprise Storage Group, Milford, MA, finds that data may be classified in at least four ways: data type, organization, data age and data value. (See "Classifying the data," on this page.) Breaking data out into these different categories helps educate the organization on the nature of the data it owns and provides the facts needed to build business cases for future storage and data automation management technologies. Here's where the tool's ability to recognize different data types becomes paramount.
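As a rough illustration of that kind of breakdown, the sketch below totals capacity by data type, organization and data age. The `FileRecord` fields, the age buckets and the idea that a reporting tool exports records in this shape are all assumptions made for the example. The fourth category, data value, is left out because it usually has to come from the business rather than from file metadata.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePath


@dataclass
class FileRecord:
    path: str        # file location as reported by a scan (hypothetical format)
    owner_org: str   # owning department (hypothetical field)
    size_bytes: int
    age_days: int    # days since the file was last modified


# Hypothetical age buckets: (upper bound in days, label), checked in order.
AGE_BUCKETS = [(90, "0-90 days"), (365, "91-365 days")]


def age_bucket(age_days):
    """Map an age in days to the first bucket whose limit it does not exceed."""
    for limit, label in AGE_BUCKETS:
        if age_days <= limit:
            return label
    return "over 1 year"


def summarize(records):
    """Total bytes per data type (file extension), organization and age bucket."""
    by_type, by_org, by_age = Counter(), Counter(), Counter()
    for r in records:
        ext = PurePath(r.path).suffix.lower() or "(none)"
        by_type[ext] += r.size_bytes
        by_org[r.owner_org] += r.size_bytes
        by_age[age_bucket(r.age_days)] += r.size_bytes
    return {"type": dict(by_type), "organization": dict(by_org), "age": dict(by_age)}
```

Totals like these are the raw material for a business case: they show which departments and which data types occupy the most capacity, and how much of that capacity holds aging data.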

Users must have some fundamental understanding of the application data that they are gathering information on before they even begin to deploy a tool. Without this understanding, don't expect to deploy a tool enterprisewide and gather all of the data you need.

EMC hopes to help users solve one of their bigger issues--classifying unstructured data--through the company's recent acquisition of Documentum. Mark Lewis, EMC's executive vice president of open software, observes that 80% of existing data is unstructured and 90% of newly created data is digital. Yet until users classify their data, they can't take the next steps of protecting it, applying policies, moving it or deleting it.

This data classification step can unveil some important facts and save money at the same time. A recent case study conducted by DeepFile Corp. at Vignette Corp.--both located in Austin, TX--uncovered two important facts. It revealed that more than 50% of the files on Vignette's most expensive storage devices hadn't been accessed in more than a year and that a large amount of storage was being consumed because of file and directory duplication. Armed with this knowledge, Vignette purchased an inexpensive nearline ATA storage array and moved its older files to the lower-cost disk.
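The two findings in that case study, stale files and duplicated files, can be approximated on a single file system with a short script. The sketch below is not how DeepFile's product works. The function name and the 365-day threshold are assumptions, and it relies on file access times, which some systems are configured not to update.

```python
import hashlib
import os
import time
from collections import defaultdict


def find_stale_and_duplicates(root, stale_days=365, now=None):
    """Return (stale_paths, duplicate_groups) for the files under root.

    stale_paths: files whose last access is older than stale_days.
    duplicate_groups: lists of paths whose contents hash identically.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400
    stale = []
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Check the access time before opening the file, because
            # reading the file may itself update that timestamp.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
            # Whole-file read keeps the sketch short; a real scan would
            # hash large files in chunks.
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_digest[digest].append(path)
    duplicates = [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
    return sorted(stale), duplicates
```

Calling `find_stale_and_duplicates("/some/share")` on a hypothetical mount point returns the candidates for migration to cheaper disk and the groups of files that could be consolidated.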

Another key feature for a storage reporting tool is its ability to interact with devices in the storage network. For instance, in networked storage environments, being able to do a logical-to-physical mapping to find out which physical storage device the data actually resides upon is extremely helpful. EMC users should take a serious look at EMC's Storage Scope, which reports how each server's files are mapped and laid out. Unfortunately, you have to spend some time configuring the tool, and the tool itself is particular about the environment it's deployed in.

Users also need to understand what sort of data they want to report on and manage. If users just want a more generalized storage utilization reporting tool, almost any SRM tool will do. Yet those who want more advanced storage reporting capabilities--such as the ability to report on instant messaging files, data in tape silos or even existing paper files--will likely need different tools to get reports on all these types of data. For instance, Storability's GSM tool will give you detailed reports on the utilization of the tapes in StorageTek's tape libraries. But, if you're using that tool to monitor tape libraries from IBM, don't expect the same level of detail.

Another item users will want to consider is who will control the verification and distribution of the information after it's gathered. Finding out that you have 10TB of unused EMC Symmetrix or HDS Lightning storage may sound great until the financial guys figure out that someone overspent to the tune of $1 million.

Data retention requirements
Different industries have different data retention requirements. Users need to make sure that the policies they set up for their information life cycle management (ILM) software retain their data long enough to comply with current requirements. They also need to weigh the pros and cons of keeping data around longer than what is legally required. While mining years- and decades-old data may provide interesting and useful information, it may also expose organizations to unneeded risk if there's no legal requirement to keep it around. Here are some examples of data-retention requirements in different industries:
Health care: The Health Insurance Portability and Accountability Act (HIPAA) regulation requires that health care organizations retain records (electronic and paper) for a minimum of six years. Records must also be retained for two years after a patient's death.
Financial: The Sarbanes-Oxley Act of 2002 mandates that accountants who audit or review financial statements of issuers must retain certain records for a period of five years after the end of the fiscal year in which the audit or review was concluded.

The Securities and Exchange Commission (SEC) requires that financial services firms store all e-mail traffic in its original form for at least three years and that they make those communications "accessible" for the first two years.

Banks and financial institutions in New York now have to keep ATM surveillance tapes for 45 days, instead of 30 days, to comply with New York state's recently strengthened ATM Safety Act.

Government contractors: Government contractors must keep track of their books, documents and accounting practices for one to four years. Regulations vary depending on the size of the company and the type of information. Firms with fewer than 150 employees or with contracts smaller than $150,000 only need to retain records such as employee information for one year, while larger firms need to track items such as paid, canceled and voided checks for up to four years.
Employee records: The Fair Labor Standards Act (FLSA) requires that employee records pertaining to payroll be kept for either two or three years, depending upon what type of payroll and earnings information is in question. The Occupational Safety and Health Administration (OSHA) requires that records about job-related injuries be kept on file for five years. OSHA also requires that records pertaining to medical exams that involve toxic substance and blood-borne pathogen exposure be retained for up to 30 years.
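Retention rules like these can be written down as a simple policy table before any product is chosen. The sketch below encodes a few of the periods quoted above and computes the earliest date a record could be disposed of. The record class names and function names are hypothetical, the caller has to supply the date on which each retention clock starts (for the Sarbanes-Oxley example, the end of the relevant fiscal year), and the actual periods that apply to an organization should be confirmed with its legal advisers.

```python
from datetime import date, timedelta

# Retention periods copied from the examples above; keys are hypothetical names.
RETENTION = {
    "hipaa_health_record": ("years", 6),
    "sox_audit_record": ("years", 5),
    "sec_email": ("years", 3),
    "ny_atm_surveillance": ("days", 45),
    "osha_injury_record": ("years", 5),
}


def add_years(d, years):
    """Add whole calendar years, mapping Feb. 29 to Feb. 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal(record_class, start):
    """Return the first date on which the retention period has elapsed."""
    unit, amount = RETENTION[record_class]
    if unit == "days":
        return start + timedelta(days=amount)
    return add_years(start, amount)


def may_dispose(record_class, start, today):
    """True once today has reached the earliest disposal date."""
    return today >= earliest_disposal(record_class, start)
```

Keeping the table separate from the logic means a change in regulation, such as the move from 30 to 45 days for ATM tapes, becomes a one-line edit.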

Acting on the information
Theresa O'Neil, director of storage strategy at IBM Tivoli, believes that users will look to better manage the data itself by getting rid of their nonessential data and tagging remaining data according to its content. "Organizations contain lots of processes," she says. "While it is important for them to retain data to support these processes, the greater issue becomes: can they retrieve the data needed to support these processes in a specified situation?"

O'Neil also says that while data may remain in existence for years, the media on which it resides on the back end will likely change. In response, IBM separates the management of the data from the management of the media: DB2 Content Manager manages the content of the data, while Tivoli Storage Manager manages the placement of that data on the media.

Steve Kenniston, a technology analyst with the Enterprise Storage Group, says ILM can be used to initiate conversations with business unit heads about their storage and recovery needs. "Every department head today believes that IT can recover anything anytime and this is just not true," he says, adding that "ILM is all about people, process and technology, not just technology."

Storage admins should look to capitalize on this new pool of knowledge, since they will now, for the first time, probably have hard facts to justify other storage management tools and technologies. These may include automated provisioning, fabric-based virtualization and the deployment of new protocols such as iSCSI or inexpensive storage such as ATA. So, while choosing a storage reporting tool may end up being more of a tactical than a strategic move, it will enable users to gather the data they need to make the more important, longer-term strategic decisions. One of those decisions will be the eventual deployment of an automated data management (ADM) solution.

Choosing the right ADM provider
Two important tasks will enter into the decision-making process when selecting an ADM provider. The first will be trying to understand and document the different types of data retention requirements within your company that this product will be responsible for managing. (See "Data retention requirements") The other will be picking a product that permits you to set up policies that manage data from its creation to disposition. Products vary in their ability to set up policies and carry out prescribed actions, such as deleting or migrating data in different environments, and so does their effectiveness at performing these tasks.

These functions become especially relevant in the different user environments that exist today. For users that only need to keep data for hours or days to support applications such as temporary batch files, probably any tool will work. Yet for some financial, government and human resource applications, data retention needs may span years, if not decades. As a result, you need to spend time determining whether a product's policy management abilities are aligned with your environment.

The situation becomes more complicated if you're using different vendors' products or even different products from the same vendor. You will need to ask how the different products hand off data management responsibilities. This becomes especially pertinent when using one vendor's tool to do the day-to-day storage administration and another vendor's tool to do backup and recovery. At some point, the tool doing the day-to-day storage administration will need to hand off the management of the data to the backup product, especially if that data gets deleted and archived.

Here's where companies that offer a suite of products should be better positioned. Some of the major players such as Computer Associates (CA), EMC, Fujitsu Softek, HP, IBM and Veritas now possess many of the tools needed to succeed at ILM; others, such as AppIQ, CommVault Systems, CreekPath Systems, OuterBay, Princeton Softech and StorageTek, have only some of the needed components. One of the keys for each of these vendors will be how soon and how well they can cleanly integrate the different tools they own and show value to their users.

Another issue that may inhibit the deployment of ADM stems from the complexity of today's storage networks. EMC's Lewis notes that while ILM may be done now on an application-by-application basis, it still revolves primarily around manual processes and remains far too complicated for the enterprise. He says that technologies such as fabric-based virtualization, which EMC classifies as a data delivery service, will enable ADM to succeed in enterprise environments because it helps to solve some of the complexity of large storage networks.

Todd Rief, StorageTek's director of corporate strategy, calls fabric-based virtualization a "huge technology for ILM."

Fabric-based virtualization enables the creation of a fabric-based volume table of contents (VTOC). These VTOCs will play an important role in the future management of enterprise storage, emerging as a type of network-based storage routing table and functioning in much the same way as Cisco's routing tables, which manage today's data networks. Also, because all servers can theoretically get their storage from this virtualization layer, it creates a new programming layer in the network that will enable the development and deployment of tools such as ADM without costing a fortune.

Will HSM emerge?
One component of ILM, hierarchical storage management (HSM), involves putting the right data on the right level of storage and removing it as it ages. This technology, long available in mainframe OS/390 environments but considered by some to be a failure in the open systems environment, remains a difficult technology to implement. But through the combination of the aforementioned technologies, HSM may start to become a reality for the enterprise.

CommVault Systems' Van Wagoner says the crux of the problem with HSM is establishing the value of the data. For ILM to work, the software must be able to look at something as simple as a Word document and determine whether that document contains your kid's soccer schedule or a vital business contract. One file you may want to delete and the other you may want to archive.
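The placement logic itself is the easy part, as the minimal sketch below suggests. The tier names, the age thresholds and the `business_value` field are all hypothetical. The hard problem Van Wagoner describes is hidden in that last field: something, whether a person or a classifier, has to decide that the contract is "high" and the soccer schedule is "low" before a rule like this can act.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    age_days: int
    business_value: str  # "high" or "low"; assigned outside this sketch


# Hypothetical disk tiers, most expensive first: (maximum age in days, tier name).
TIERS = [(30, "primary"), (365, "nearline-ata")]


def place(item):
    """Pick a disk tier by age; past the last tier, archive or delete by value."""
    for max_age, tier in TIERS:
        if item.age_days <= max_age:
            return tier
    return "archive" if item.business_value == "high" else "delete"
```

With these thresholds, a 400-day-old contract marked "high" is archived, while a soccer schedule of the same age marked "low" is deleted.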

While technologies such as fabric-based virtualization and ADM are still maturing, the generally available storage reporting and SAN infrastructure tools look ready for prime time. So now is the time for organizations to start to deploy these first-line tools and experiment with some of the next-generation technologies. As the momentum for ILM continues to build, users would be well-advised to start the process of ILM by putting the initial building blocks in place.
