Automate data migration

Moving seldom-accessed data from primary storage to less-costly storage not only saves money, but can also improve the performance of applications. Hierarchical storage management (HSM) software can help automate the migration of files, but HSM products vary in the way they approach the task. So it's important to identify the requirements of an HSM product before making a choice.

This article can also be found in Storage magazine: Salary survey reveals storage skills are in demand.

Moving seldom-accessed data from primary storage to less costly storage is a good idea. HSM products can do the job, but each one is closely aligned to a specific backup software product and file system.


Sir Isaac Newton's first law--the law of inertia--is as applicable to the world of data management as it is to our planet. Experts estimate that 60% to 80% (or more) of organizational data is never accessed after it's created; files often remain at rest, untouched and unmoved from companies' most expensive storage arrays. Hierarchical storage management (HSM) software can force the migration of these files to more economical resting places while keeping them accessible and retrievable.

HSM software lets users migrate files from primary disk to other types of storage media based on vendor-provided and user-created policies. These migration policies designate which files to migrate, to what storage media and when.
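As a rough illustration of the "which files, what media, when" idea, a migration policy can be modeled as a small record plus a matching rule. This is a minimal sketch, not any vendor's policy format: the `FileInfo` and `MigrationPolicy` names, their fields and the tier labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatch
from typing import List, Optional


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_accessed: datetime


@dataclass
class MigrationPolicy:
    # All field names are hypothetical; real products define their own.
    name: str
    pattern: str         # which files: glob matched against the full path
    min_size_bytes: int  # which files: skip anything smaller than this
    min_idle_days: int   # when: days without access before a file qualifies
    target_tier: str     # what media: label of the destination storage

    def matches(self, f: FileInfo, now: datetime) -> bool:
        idle = now - f.last_accessed
        return (
            fnmatch(f.path, self.pattern)
            and f.size_bytes >= self.min_size_bytes
            and idle >= timedelta(days=self.min_idle_days)
        )


def choose_tier(f: FileInfo, policies: List[MigrationPolicy],
                now: datetime) -> Optional[str]:
    """Return the tier of the first matching policy, or None to leave the file alone."""
    for policy in policies:
        if policy.matches(f, now):
            return policy.target_tier
    return None
```

Policies are evaluated in order, so more specific rules (large, old video files to tape, say) would be listed ahead of a catch-all rule.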

HSM software is based on three architectures:

  • Backup software. Symantec Corp.'s Veritas NetBackup Storage Migrator for Unix and IBM Corp.'s Tivoli Storage Manager (TSM) use their respective backup software catalogs to centrally initiate, manage and track file migrations. IBM TSM has one version for Windows (HSM for Windows) and another version for Unix and Linux (TSM for Space Management), but both use the same TSM backup software catalog.

  • File-system specific. File-system-based products like Silicon Graphics Inc.'s (SGI) InfiniteStorage Data Migration Facility (DMF) require users to first install SGI's Shared Filesystem CXFS on hosts so DMF can take advantage of CXFS' data management APIs.

  • Host agent. Products like CA Inc.'s BrightStor HSM and EMC Corp.'s DiskXtender place their agents on each managed server. For most HSM products, each server tracks the files it migrates on a local database.

Alternative HSM approaches

Within the realm of hierarchical storage management (HSM) software, there are other products that attempt to address specific data migration requirements.

  • Intelligent information management software. Arkivio Inc.'s auto-stor, Overtone Software's ManageTone for File Systems and Scentric Inc.'s Destiny are examples of products that migrate data, but can also build relationships among the different data repositories within a user's infrastructure, such as e-mail and database stores. These products provide insight into the route HSM software will likely follow, as they allow users to analyze the content of multiple data stores within an organization. With these software products, it's possible to see relationships among different data stores or retrieve all references to specific files, e-mails and database entries based on a particular search.

  • Microsoft Windows-only HSM software. Some products, like Hewlett-Packard (HP) Co.'s StorageWorks File Migration Agent (FMA) Software and BridgeHead Software's HT FileStore, take a more restrictive approach to HSM by offering support only for the Windows operating system. FMA primarily moves files to different types of HP storage hardware, including the firm's content-addressed storage and NAS products. HT FileStore supports all vendors' storage products, including tape and optical.

It's time to consider an HSM product if your servers regularly run out of storage space, backups take longer to complete, your storage equipment budget is tight or file servers require increasingly more administrative intervention. In addition, new regulations, such as HIPAA and Sarbanes-Oxley, are forcing companies to retain and manage growing amounts of unstructured data for longer periods of time, which may require the use of HSM software to manage these obligations.

HSM software allows users to perform host-based, file-level data migrations to magnetic, optical and tape media over a variety of network protocols. Each HSM product has different levels of dependencies on backup software, file systems and network protocols. Satisfying these requirements can add significant cost and complexity to the HSM product installation. Other key areas to consider when evaluating HSM products are the types of external media and operating systems supported, as well as how the product controls file migrations and recalls (see "Alternative HSM approaches," above).

HSM software dependencies
All HSM products depend to varying degrees on different parts of your storage infrastructure, and you'll most likely need to modify settings in your backup server software, network and app servers. For example, Veritas NetBackup Storage Migrator for Unix 6.0 requires Veritas NetBackup 6.0 server. You must install the central Storage Migrator management interface on a Veritas NetBackup server and then install the Storage Migrator client on the hosts. Even the Storage Migrator host agent has prerequisites, as it only supports hosts running HP-UX or Solaris with a current version of the Veritas file system (VxFS).

The presence of an advanced file system like VxFS is a requirement for a number of HSM products, as newer versions of Unix file systems include the Data Storage Management API (XDSM), formerly known as DMAPI. XDSM is a standard API that allows the HSM product to interface with a file system using common commands that control access to files and notify apps about the operations performed on files.

Like Veritas NetBackup Storage Migrator, IBM's TSM HSM products--TSM for Space Management and HSM for Windows--take advantage of these XDSM features and support IBM's own General Parallel File System (GPFS) in addition to VxFS. GPFS is a separate product, however, and it needs to be purchased, installed and configured on AIX and Linux systems for the HSM product to work.

SGI's InfiniteStorage DMF is also dependent on an XDSM-enabled file system. But unlike some other HSM products, InfiniteStorage DMF requires users to install the XFS or CXFS file system on host systems prior to installing the DMF HSM software product. While SGI offers its CXFS file system for most major OSes, it's a product that's purchased and licensed separately from its InfiniteStorage DMF software.

A final concern that users need to address with HSM products is how the central server communicates with client servers over the network (see "Best practices for using HSM software," below). Products like IBM's TSM for Space Management and HSM for Windows, and SGI's InfiniteStorage DMF use a proprietary protocol to communicate between server and client. Although this may require users to open additional ports on their network, it lets users secure communications between the server and clients. Products from CA, CommVault Inc. and Symantec use more common protocols like CIFS, FTP or NFS to communicate; this may present security risks in some environments because they don't offer options to secure transmitted data.

Best practices for using HSM software

There are no hard-and-fast rules on how to best use hierarchical storage management (HSM) software, but the following tips address some of the most important considerations for selecting, installing and using an HSM product.



Classify your data. This usually requires the use of a host-based, file-level storage resource management (SRM) tool that identifies and classifies files. Reports will break files down by attributes such as file size, usage patterns, last-accessed or modified date, and owner.


Implement HSM software as part of a technology refresh. Rather than trying to justify HSM software as a standalone purchase, wait until more storage is needed. Then purchase HSM software as an alternative to buying more high-end storage, as it will allow you to migrate files to lower-cost storage systems.


Identify what files you don't want migrated. Database files, libraries and executables are examples of files that may not be accessed very often but need to stay on primary disk. Some HSM software packages, like CA Inc.'s BrightStor HSM, include default policies that prevent these files from being accidentally migrated.


Choose an HSM software package recognized by your backup software. Your backup software needs to recognize stub files created by the HSM software so that when backups occur, the backup software backs up only stub files or inodes and doesn't recall migrated files from tape or optical media.


Determine if you want to manage migrated files centrally or locally. Products such as EMC Corp.'s DiskXtender and CA's BrightStor HSM don't provide a central catalog or database; they manage the location of all file migrations on a local database on each host. This eliminates the requirement to communicate with a central server and create a high-availability configuration for the central server. Conversely, centrally managed HSM products, such as IBM Corp.'s Tivoli Storage Manager for Space Management and HSM for Windows, use a central server that eliminates the need for admins to log on to each server to manage the HSM software. It also lets admins create global policies that can be applied to all managed hosts.


Consider the dependencies of HSM software. Most software products have some prerequisites, including the need for specific backup software or versions of file systems. If not already in place, these can add to the cost and difficulty of the installation and configuration of the HSM software.


Check which operating systems the HSM software supports. The number of OSes supported by HSM software products varies, and most vendors offer different HSM software products for Unix and Windows because of differences in the file systems of these operating systems.
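The first and third tips above (classify your data, and identify files you don't want migrated) can be approximated with a short script before committing to an SRM tool. The sketch below is only an illustration under stated assumptions: the `classify` function, the 180-day idle cutoff and the `EXCLUDED_EXTENSIONS` set are hypothetical choices, and last-access times are only meaningful if the file system records them.

```python
import os
import time
from collections import Counter

# Hypothetical exclusion list: file types that should stay on primary disk
# even if they are rarely accessed (executables, libraries, database files).
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".so", ".dbf"}


def classify(root, now=None, idle_days=180):
    """Walk a directory tree and total up bytes by migration category.

    Returns (report, candidates): report maps "excluded", "idle" and
    "active" to byte counts; candidates lists paths that look migratable.
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400
    report = Counter()
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            ext = os.path.splitext(name)[1].lower()
            if ext in EXCLUDED_EXTENSIONS:
                report["excluded"] += st.st_size
            elif st.st_atime < cutoff:
                report["idle"] += st.st_size
                candidates.append(path)
            else:
                report["active"] += st.st_size
    return report, candidates
```

Running `classify("/some/share")` on a file server share gives a first estimate of how much capacity an HSM product could free; the path here is a placeholder.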

CAS, tape and optical
Most XDSM-compliant file systems don't support the ability to interface with removable media such as tape and optical, or content-addressed storage (CAS) such as EMC's Centera. To interface with these storage devices, some vendors tightly couple their HSM and backup software products. For example, IBM and Symantec have meshed their HSM and backup software to take advantage of the tape and optical APIs developed for their backup software clients and use them to allow their HSM software to support a wider range of removable media. This integration also allows the HSM software to interface with the backup software catalog. By making a call to the backup software catalog, the backup software can recall a file on any tape or optical media and then present that file back to the requesting host.

Other products, like CommVault's QiNetix DataMigrator, let users ease into an HSM implementation that manages tape and optical media without using CommVault's Galaxy backup software. By including tape and optical APIs, and the same database used by Galaxy, QiNetix DataMigrator can migrate data to and recall data from removable tape and optical media.

CA's BrightStor HSM works in a similar manner to CommVault's product; EMC's DiskXtender is also transitioning to this type of architecture. BrightStor HSM borrows its tape and optical interface from CA's BrightStor ARCserve backup software and bundles it into its HSM client agent. CA also provides the ability to communicate with some CAS products like EMC Centera.

EMC faces a different dilemma. Although its DiskXtender product uses the same set of tape and optical APIs it always has, EMC has two sets of APIs that interface with disk and optical media: one from NetWorker (through its acquisition of Legato) and one from DiskXtender. An EMC spokesperson says EMC is working on one set of APIs for both products.

Stub files and inodes
Once the HSM software is configured to work with the different types of media in your storage environment, HSM software products use either stub files or inodes (in a Unix-based OS, an inode is a stored description of an individual file) to track the location of migrated files. Most HSM software products create stub files that are approximately 4KB in size. These files may contain information such as the new physical location of the file, file attributes such as last-accessed date or security permissions, and the file's header information. Some products, like SGI's InfiniteStorage DMF, use inodes created by the host's file system. For DMF, inodes are 256- or 512-byte data structures created for each file by SGI's CXFS file system. DMF then accesses a field reserved for it within each file's inode and sets a flag to indicate if the file has been migrated.

While stub files and inodes have similar purposes, each HSM product varies in how it creates and manipulates them. For instance, when CA's BrightStor HSM migrates a file, rather than using the native file-system copy or move utility, it uses its own copy utility that's installed with its agent software. The utility identifies a file to be migrated, locates its intended destination, copies the file to that location and then creates a stub file on the primary disk that includes the file's new physical location. Before deleting the original file that's still on primary disk, it verifies that the file copy is good.
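The copy, verify, stub and delete sequence described above can be sketched in a few lines. This is an assumption-laden illustration rather than how BrightStor HSM is implemented: `shutil.copy2` stands in for the product's own copy utility, the stub is an ordinary JSON file with made-up field names, and a real product would create the stub at the file-system level so applications can still open the file transparently.

```python
import hashlib
import json
import os
import shutil


def _sha256(path):
    """Hash a file in 1MB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(src, dest_dir):
    """Copy src to dest_dir, verify the copy, then replace src with a stub."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    st = os.stat(src)
    shutil.copy2(src, dest)

    # Verify the copy before touching the original on primary disk.
    if _sha256(src) != _sha256(dest):
        os.remove(dest)
        raise IOError("copy of %s failed verification" % src)

    # Hypothetical stub layout: new location plus a few file attributes.
    stub = {
        "migrated_to": dest,
        "size": st.st_size,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
    }
    tmp = src + ".stub.tmp"
    with open(tmp, "w") as fh:
        json.dump(stub, fh)
    os.replace(tmp, src)  # the original is removed only at this point
    return stub
```

Writing the stub to a temporary name and swapping it in with `os.replace` means the original is never deleted unless a verified copy and a complete stub both exist.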

SGI's InfiniteStorage DMF uses a similar process to migrate the file, but unlike CA's BrightStor HSM, it stores the file's location in DMF's central metadata server and changes the flag setting in the file's inode. When a request for the file is made, the CXFS file system recognizes from the file's inode that the file is no longer on local disk, so it makes a call to the DMF metadata database server for the migrated file's location.
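The inode-flag approach can be pictured as a two-step lookup: check the flag locally, and only consult the central server when the file has been migrated. The sketch below is a toy model under assumed names (`Inode`, `MetadataServer`, `read_file`); it does not reflect DMF's or CXFS' actual data structures or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Inode:
    number: int
    migrated: bool = False  # stands in for the flag field reserved for the HSM


class MetadataServer:
    """Toy central database mapping inode numbers to archive locations."""

    def __init__(self) -> None:
        self._locations: Dict[int, str] = {}

    def record(self, inode_number: int, location: str) -> None:
        self._locations[inode_number] = location

    def lookup(self, inode_number: int) -> str:
        return self._locations[inode_number]


def read_file(inode: Inode,
              local_data: Dict[int, bytes],
              metadata: MetadataServer,
              fetch: Callable[[str], bytes]) -> bytes:
    """Serve a read locally unless the inode flag says the data has moved."""
    if not inode.migrated:
        return local_data[inode.number]
    location = metadata.lookup(inode.number)  # one call to the central server
    return fetch(location)                    # recall from secondary storage
```

The design trade-off is visible even at this scale: unmigrated files never touch the central server, but every recall depends on it being reachable.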

As part of the stub file creation, HSM products like those from CA and EMC let users set the size of the stub file and its contents. The rationale behind this is to control how much of the original file's header information an administrator wants to retain on primary disk. For instance, an administrator who needs to manage a large number of files containing photos or videos may include descriptive information about the photo or video as part of their file content. Rather than migrating the entire file and keeping just the file attributes, the stub file will retain this information, which can be searched along with the file attributes. This expedites searches for specific files, frees space taken up by infrequently accessed file content and prevents massive recalls of file data when searching for just small portions of the file.

Users of EMC's DiskXtender can take advantage of another product option--its high watermark threshold. This feature migrates the file and creates the stub file, but keeps the original file on primary disk until disk capacity reaches a certain threshold or watermark. This lets users keep their data on faster, primary disk until this threshold is reached to facilitate faster file recall. Once the threshold is reached, DiskXtender kicks in and starts to move files, but only after updating each stub file to point to the migrated file's new location. This process of updating stub files and deleting files on primary storage continues until DiskXtender reaches a low watermark or the last file to be migrated.
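The high and low watermark behavior amounts to a simple selection loop. The following is a minimal sketch, assuming files have already been copied to secondary storage and that the least recently accessed files are released first; that ordering, the `PremigratedFile` type and the 90%/70% defaults are illustrative assumptions, not documented DiskXtender behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PremigratedFile:
    path: str
    size_bytes: int
    last_accessed: float  # epoch seconds


def select_for_purge(files: List[PremigratedFile],
                     used_bytes: int,
                     capacity_bytes: int,
                     high: float = 0.90,
                     low: float = 0.70) -> List[PremigratedFile]:
    """Pick already-copied files to remove from primary disk.

    Nothing is purged until usage reaches the high watermark. Once it does,
    files are released oldest-access first until usage falls to the low
    watermark or the candidates run out.
    """
    if used_bytes / capacity_bytes < high:
        return []
    target = low * capacity_bytes
    purge = []
    for f in sorted(files, key=lambda item: item.last_accessed):
        if used_bytes <= target:
            break
        purge.append(f)
        used_bytes -= f.size_bytes
    return purge
```

Keeping a gap between the two watermarks avoids purging a file or two every time usage crosses a single threshold.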

Ties to backup software
HSM software users may encounter a problem if their backup software can't recognize when a file has been migrated. If the backup app doesn't recognize or work with stub files or inodes, already migrated files could be recalled during the backup, creating a huge amount of network traffic and overhead on the server.

Of the three most-used enterprise backup applications, EMC's NetWorker and Symantec's Veritas NetBackup are HSM-aware and know how to handle stub files created by EMC DiskXtender. But IBM TSM users need to work with EMC because TSM isn't aware of stub files created by DiskXtender, and it will attempt to recall and migrate those files.

SGI's InfiniteStorage DMF HSM software integrates with EMC's NetWorker and Atempo Inc.'s Time Navigator backup software to avoid this. During backup, current versions of these two programs check the inode flag associated with each file. If the inode shows that the file has already been migrated, the backup software only needs to back up the file's inode; this dramatically reduces the backup window.
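To see why HSM awareness matters for the backup window, compare what an aware and an unaware backup pass would do with the same file list. This is a simplified model with assumed names (`Entry`, `plan_backup`) and an assumed 4KB stub size; real backup products make this decision through their own catalog or inode checks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    path: str
    logical_size: int      # size of the full file contents, in bytes
    migrated: bool         # what the stub or inode flag would report
    stub_size: int = 4096  # assumed size of the stub or inode record


def plan_backup(entries: List[Entry],
                hsm_aware: bool) -> Tuple[List[Tuple[str, str]], int]:
    """Return (actions, total_bytes) for one backup pass."""
    actions = []
    total = 0
    for e in entries:
        if e.migrated and hsm_aware:
            # Back up only the stub or inode; the data stays where it is.
            actions.append((e.path, "stub-only"))
            total += e.stub_size
        elif e.migrated:
            # An unaware backup reads the file, which triggers a full recall.
            actions.append((e.path, "recall-then-copy"))
            total += e.logical_size
        else:
            actions.append((e.path, "copy"))
            total += e.logical_size
    return actions, total
```

With a large migrated archive, the difference between the two totals is roughly the amount of data an unaware backup would drag back across the network.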

A benefit of using the same product for both backup and HSM is that the two products often share a common catalog, such as with CommVault, Symantec, and IBM TSM for Space Management and HSM for Windows. During backups, their backup software checks the catalog to see if the file has been migrated; if it has, only the stub files created by their HSM product are backed up.

HSM software products provide a potentially powerful alternative to the brute-force option that most organizations use to manage their data. But HSM software packages require tight links to specific backup software products and file systems. The key to a successful HSM acquisition and deployment is to identify an HSM software product's dependencies before you buy it. And if too many prerequisites exist, the time, effort and cost required to fulfill these requirements may erode the benefits it can deliver.

This was first published in November 2006
