Managing and protecting all enterprise data



Is HSM ready for open-systems storage?

Is HSM ready for open systems, or has it had its day?

The drive to better align data storage policies to business needs is spurring the development of new storage management tools. Two goals are improved efficiency of assets through better utilization and greater effectiveness of people through automation. These goals converge in a renewed effort underway by several vendors to reintroduce a technology that's been around for many years--hierarchical storage management (HSM).

This time-honored mainframe technology (see "When disk wasn't cheap") never caught on in the Windows and Unix worlds. But changes in technology suggest you should take another look at HSM. And if you're thinking of implementing HSM, consider how to integrate HSM into your open systems environments.

A solution in search of a problem?
Several factors have combined to limit the adoption of HSM in open systems, notably:

  • Dramatically falling prices for disk storage
  • Distributed nature of open systems environments
  • Fundamental characteristics of open system apps
Traditionally, the driving force behind HSM was the high cost of disk storage. With costs now falling to less than a penny per megabyte, it may appear at first glance that the problem HSM was designed to solve has disappeared. Certainly over the years, it's been far easier for system administrators to increase storage capacity than to attempt to squeeze efficiency out of every disk drive.

This practice resulted in an increase in overall capacity, while utilization rates fell. Although this was wasteful, as long as the emphasis was on the cost of acquisition rather than the cost of management, it seemed reasonable. However, as administrative costs have begun to outstrip hardware costs, IT managers have renewed their interest in any option that can mitigate these costs.

The decentralized nature of open systems also discouraged the adoption of HSM. In the days of non-centralized, direct-attached storage (DAS), the opportunity to reallocate excess storage didn't exist. It wasn't worth the effort to recoup space from a particular system because there wasn't a way to effectively reassign it. With the onset of storage networks, reallocating storage has become more practical, so this objection no longer applies in many environments.

Nor, in many environments, does the objection based on the more interactive character of open systems. Open-systems applications tend to be highly interactive, whereas mainframes--to a large extent--perform huge quantities of batch processing, and interactive applications don't tolerate recall delays well. If you applied the mainframe approach directly to open systems, users attempting to access a migrated document, for example, would face an hourglass icon for several minutes. The potential increase in help desk calls alone is enough to discourage adoption of HSM. That problem is particularly apparent when tape is the target media for HSM data.

Application-focused HSM/HSM-related products
  • Legato EmailXtender: Supports Exchange and Lotus Domino. Part of a family of HSM and archiving products.
  • Legato DiskXtender Database Edition: Supports Oracle databases. Part of a family of HSM and archiving products.
  • CommVault QiNetix DataMigrator: Integrates with CommVault Galaxy backup software.
  • Educom Exchange Archive Solution (EAS): A full-featured archiving application for Exchange that includes HSM-related features.
  • KVS Enterprise Vault for Exchange: A full-featured archiving application for Exchange that includes HSM-related features.

HSM in today's environment
So, is there a place for HSM? The answer is a qualified yes. Some reasons to consider HSM today are:

  • To reduce costs of storage management
  • To improve backup/restore performance
  • To improve management of large e-mail repositories and other databases
It would be difficult to build an effective business case for HSM based solely on disk drive costs, but you should look at the overall management of storage. In a multitiered storage environment, HSM may help to manage the rate of storage growth at each tier, enabling more of a steady-state operation in an automated fashion. It's feasible in some environments to ensure a consistent targeted storage utilization rate of 80% or more with HSM.
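As a rough illustration of how a utilization target like the 80% figure above could drive migration, the following minimal Python sketch picks the least recently accessed files until projected usage falls to the target. The `FileInfo` type, the `select_for_migration` function and the default threshold are hypothetical names chosen for this example. They don't describe any particular HSM product, and real products typically apply richer policies.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical record describing one file on a storage tier."""
    path: str
    size_bytes: int
    days_since_access: int


def select_for_migration(files, capacity_bytes, target_utilization=0.80):
    """Pick least recently accessed files until usage drops to the target.

    Returns the selected files and the projected bytes still in use.
    """
    used = sum(f.size_bytes for f in files)
    target_bytes = capacity_bytes * target_utilization
    selected = []
    # Longest-idle files first: they are the least likely to be recalled.
    for f in sorted(files, key=lambda f: f.days_since_access, reverse=True):
        if used <= target_bytes:
            break
        selected.append(f)
        used -= f.size_bytes
    return selected, used
```

Run on a schedule against each tier, a selection step like this is what would keep utilization near a steady state instead of letting it climb until someone buys more disk.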

When disk wasn't cheap
With today's steeply falling prices, it's easy to forget that in the early days of computing, resources such as memory and disk were extremely expensive. To maximize the utilization of these valuable and often limited resources, engineers employed some creative techniques.

For instance, virtual memory (VM) was one major development that greatly enhanced the capabilities of computer systems by maximizing utilization of core memory. Unused pages in memory were migrated to slower magnetic disk storage to make room for other pages. If and when the unused pages were needed again, they were recalled into memory and other pages migrated out. The algorithms developed to manage virtual memory have become extremely efficient and reliable, and VM has become a standard function of virtually (no pun intended) every modern operating system.

Because this concept worked so well with memory, it stood to reason that it should also work with disk. The analogy was essentially the same: Magnetic disk was a faster, more expensive medium than magnetic tape. Wouldn't it make sense to migrate infrequently accessed data from the more expensive media to the cheaper one? As you might suspect, the answer to this question was yes, and HSM was born. Just as the first VM systems were mainframes, the same was true for HSM. With HSM, mainframe operators were able to maintain consistently high utilization of their valuable direct-access storage devices (DASD) as data sets were migrated to and from tape, as needed.

Compared with mainframe environments, early Unix and Windows platforms had relatively primitive memory management capabilities, but over time, they adopted some of the more advanced mainframe techniques and developed some of their own, as well. However, the same analogy didn't hold with regard to HSM. Although a number of software vendors have introduced HSM products for open systems, by and large, they haven't been widely embraced.

HSM also dramatically improves backup operations. In traditional backup environments, full backups are performed on a regular basis. Studies have shown that a large percentage of files on file servers are rarely accessed after a few months, yet these files continue to impact the time required to perform backups and the amount of media consumed.

With HSM, these files would be migrated to tape with stubs--or fingerprints--left on primary storage, greatly reducing the size of the primary data stores. They would no longer be constantly backed up, thereby improving backup and recovery times and reducing tape consumption.
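The stub mechanism can be sketched in a few lines of Python. This is only a file-level approximation under stated assumptions. The `.hsmstub` suffix, the JSON stub layout and the function names are invented for illustration, and a directory stands in for the tape tier. Real HSM products usually intercept file access inside the file system so that recall is transparent to applications.

```python
import json
import shutil
from pathlib import Path

STUB_SUFFIX = ".hsmstub"  # hypothetical marker, not a real product convention


def migrate_with_stub(src: Path, archive_dir: Path) -> Path:
    """Move a file to the archive tier and leave a small stub behind."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name  # a real tool would guard against name clashes
    size = src.stat().st_size
    shutil.move(str(src), str(dest))
    stub = src.with_name(src.name + STUB_SUFFIX)
    # The stub records only where the data went and how large it was.
    stub.write_text(json.dumps({"archived_to": str(dest), "size_bytes": size}))
    return stub


def recall(stub: Path) -> Path:
    """Bring a migrated file back to primary storage and remove its stub."""
    info = json.loads(stub.read_text())
    original = stub.with_name(stub.name[: -len(STUB_SUFFIX)])
    shutil.move(info["archived_to"], str(original))
    stub.unlink()
    return original
```

Because only the small stub remains on primary storage, a full backup of that volume copies a few dozen bytes per migrated file rather than the file's contents.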

Similarly, a problem plaguing many environments today is the growth of e-mail and databases. Several vendors offer HSM-related products specifically designed for use with applications such as Microsoft Exchange or Oracle that enable the migration of old messages, attachments and infrequently accessed records to other media (see "Application-focused HSM/HSM-related products"). The promised result is a reduction in the size of the primary repositories.

Additional considerations
Integrating an HSM solution into a storage management framework shouldn't be approached without evaluating the impact on the rest of the organization. You should consider four main questions:

How well do you know your data? There needs to be a solid understanding of the data being managed in order to establish appropriate policies that correctly align with the value of the data at risk. Simply determining that a file hasn't been accessed for a certain period of time isn't sufficient to make it a candidate for migration. A clearly defined data classification methodology with broad support within the organization is one requirement for a successful HSM implementation.
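To make the point concrete, here is a minimal Python sketch of a policy check that treats idle time as necessary but not sufficient. The data classes, thresholds and names (`MIN_IDLE_DAYS`, `NEVER_MIGRATE`, `is_migration_candidate`) are hypothetical placeholders. An actual classification scheme would come from the organization, not from the storage team alone.

```python
from dataclasses import dataclass

# Hypothetical classes and idle thresholds, in days; all values are assumptions.
MIN_IDLE_DAYS = {"scratch": 30, "project": 180, "records": 365}
NEVER_MIGRATE = {"legal-hold"}


@dataclass
class Candidate:
    path: str
    data_class: str
    days_since_access: int


def is_migration_candidate(c: Candidate) -> bool:
    """Apply the data class first, then the idle-time threshold for that class."""
    if c.data_class in NEVER_MIGRATE:
        return False
    threshold = MIN_IDLE_DAYS.get(c.data_class)
    if threshold is None:
        # Unclassified data stays on primary storage until someone classifies it.
        return False
    return c.days_since_access >= threshold
```

The design choice worth noting is the default: data with no agreed classification is never migrated, however long it has been idle.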

What's the impact on users and applications? The impact of delays in accessing data needs to be understood before deploying HSM. Are delays acceptable? Can they be mitigated with near-line storage? Can your applications handle them appropriately?

How does HSM impact backup and other storage operations? It's important to understand what operational changes will be required to accommodate HSM. Also, where will HSM software or agents need to be deployed, and what's the impact?

How can I back out the HSM solution from the environment, if necessary? If HSM turns out to not be the right solution or there's a desire to change vendors, it's important to understand the level of effort and impact to return the environment to its non-HSM state.

With the evolution of storage networks, low-cost disk, and enhanced software offerings, HSM is worth another look. Application-focused HSM solutions, in particular, have the potential to provide some unique benefits. The success of these solutions depends on a clear understanding of requirements, benefits and risks.

Article 13 of 18
This was last published in May 2003
