This article can also be found in the Premium Editorial Download "Storage magazine: Comparing the top data backup packages."
Faced with the challenges of significant storage and application growth, shortened backup windows and limited IT resources, many organizations are embracing hierarchical storage management (HSM) to archive infrequently accessed data on less expensive storage.
Storing 80 million images a year
GE Medical Systems, Waukesha, WI, is a provider of health care productivity solutions and services and is a leader in medical diagnostic imaging technology. The company has more than 300 offices worldwide and $8 billion in annual revenues. Its application service provider (ASP) group offers remote archiving services for digital diagnostic images, allowing its customers to focus on delivering healthcare, rather than managing large-scale storage systems.
GE Medical Systems provides an archiving solution for Digital Imaging and Communications in Medicine (DICOM)-based medical exams and images. "The sheer volume of medical image data is unmanageable for most of our customers," says Sander Kloet, manager of data center operations for GE Medical Systems' ASP services. "Our customers' storage requirements are growing at over a terabyte per year, and managing that growth is one of the biggest challenges faced by these organizations. The demands on a hospital's IT organization are extreme, and often times they don't have the resident storage expertise."
GE Medical Systems supplies a WAN solution to its customers and provides primary RAID-based storage, long-term archival storage or a combination of both, depending on customer requirements. At the heart of the storage solution is an HSM infrastructure consisting of StorageTek's Application Storage Manager (ASM) software running on a pair of Sun Fire 4800 servers that are clustered with Veritas Cluster Server (VCS). The HSM storage environment consists of a 10TB EMC Symmetrix storage array for primary storage and a 6,000-slot StorageTek PowderHorn library with 9840B tape drives for secondary storage. The DICOM application servers access the shared ASM file system. ASM manages its file system via user-defined policies so that the application sees virtually unlimited capacity. The DICOM application servers maintain the file system relationship of the hospital's patient records and their associated medical exams in a separate database.
For many years, HSM software solutions such as IBM's DFSMShsm or Innovation's FDR/ABR have been used in the mainframe environment to offset the high cost of enterprise-class disk and improve the utilization of tape capacity. HSM, while popular in the mainframe space, has only recently been partially successful in the distributed computing environment.
Factors limiting the use of HSM include the continuing decline in the price of disk, dramatic increases in disk capacity, limited network or storage bandwidth for data migration and recall, and the lack of fast access to secondary or tertiary storage devices (optical or tape). But things may be changing. Storage area networks (SANs), network-attached storage (NAS), Fibre Channel (FC), Gigabit Ethernet and fast-access tape solutions now provide the technological foundation to build a robust HSM solution.
HSM is the automated migration of files and data across a hierarchy of storage devices. Data management policies govern the migration of inactive data from primary disk or NAS to lower-cost storage devices such as nearline tape. The HSM software performs this data migration transparently to the user, and provides fast data retrieval from either online or nearline storage.
Typically, a two-tier HSM strategy is deployed, consisting of high-performance RAID disk or NAS as the primary storage and automated tape as the secondary storage. Optionally, a three-tier strategy would include lower-cost, high-capacity disk as the secondary storage and automated tape as the tertiary storage. Each media type in the storage hierarchy represents a trade-off between cost and data access time. HSM hardware and software solutions are available from a variety of vendors including ADIC, Hewlett-Packard, IBM, Legato, StorageTek, Sun and Veritas.
How does HSM work?
Most commercially available HSM software manages data movement between the tiers of the storage hierarchy. The HSM software virtualizes storage capacity for users and host servers by presenting the physical disk or tape storage capacity as a file system image that appears infinite in size. The software also manages the storage media and its own catalog, which consists of pointers mapping the logical file data to its actual physical location. The policy-driven HSM engine periodically scans the file system directories and identifies files that have met predefined criteria for migration. Once files are identified, the HSM engine will:
- Migrate (copy) the data from primary to secondary or tertiary storage
- Mark the online storage space available for reuse
- Update the file system directory entries to indicate the files have been moved
- Reclaim the online disk space
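The scan-and-migrate cycle listed above can be illustrated with a minimal Python sketch. It is not how any particular HSM product is implemented. It assumes two ordinary directories stand in for the primary and secondary tiers, an in-memory dictionary stands in for the HSM catalog, and last-access time is the only migration criterion. The function names (`find_candidates`, `migrate`, `recall`, `run_policy`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import os
import shutil
import time
from pathlib import Path


def find_candidates(primary: Path, max_idle_seconds: float, now: float) -> list[Path]:
    """Scan the primary tier for files idle longer than the policy allows."""
    return [
        path
        for path in sorted(primary.rglob("*"))
        if path.is_file() and now - path.stat().st_atime > max_idle_seconds
    ]


def migrate(path: Path, primary: Path, secondary: Path, catalog: dict[str, Path]) -> None:
    """Copy one file to the secondary tier and leave an empty stub behind."""
    rel = path.relative_to(primary)
    target = secondary / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, target)    # migrate (copy) the data to the lower tier
    catalog[str(rel)] = target    # catalog pointer: logical name -> physical location
    path.write_bytes(b"")         # truncate to a stub, reclaiming primary space


def recall(path: Path, primary: Path, catalog: dict[str, Path]) -> None:
    """Bring a migrated file back to the primary tier."""
    target = catalog.pop(str(path.relative_to(primary)))
    shutil.copy2(target, path)
    # Mark the file as freshly accessed so the next scan does not re-migrate it.
    os.utime(path, (time.time(), path.stat().st_mtime))


def run_policy(
    primary: Path,
    secondary: Path,
    catalog: dict[str, Path],
    max_idle_seconds: float,
    now: float | None = None,
) -> list[Path]:
    """One pass of the policy engine: scan, then migrate each eligible file."""
    now = time.time() if now is None else now
    migrated = []
    for path in find_candidates(primary, max_idle_seconds, now):
        if str(path.relative_to(primary)) in catalog:
            continue  # already migrated; only the stub remains on primary
        migrate(path, primary, secondary, catalog)
        migrated.append(path)
    return migrated
```

In a real deployment the stub and recall are handled inside the file system, so applications open a migrated file as if it had never moved. This sketch makes the recall explicit to keep the steps visible. It also relies on access times being recorded, which is not the case on file systems mounted with access-time updates disabled.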
This was first published in March 2003