This article can also be found in the Premium Editorial Download "Storage magazine: Comparing the top data backup packages."
Storing 80 million images a year (continued)
Medical images that comprise an exam are written to primary disk in a logical directory structure within ASM. Medical images may typically be added, modified or deleted over a period of four to 12 hours. Based on frequency of access policies that are set within ASM, the associated records for each exam will be migrated to nearline tape. Policies can also be set to ensure patient records are grouped together on the same tape. This grouping facilitates faster retrieval times. Optionally, the images may reside on primary disk until a predefined capacity threshold is reached. Upon migration, the images may be written to one or more different tapes for data availability or off-site vaulting purposes. When a doctor searches for an exam, the DICOM application queries the ASM file system. If the exam--and its associated medical images--resides on the primary disk, it is immediately available for viewing. If the exam isn't on disk, the client experiences a minimal delay, while ASM quickly recalls the exam from tape back to disk. This recall is transparent to the user.
GE Medical Systems chose an HSM solution because of its lower cost of ownership and lighter storage management requirements. Most hospitals generate between 5,000 and 300,000 medical exams per year and are experiencing storage growth of 5TB to 6TB per year. With its HSM solution, GE Medical Systems plans to store more than 80 million medical images and 25TB per year. Given this storage growth and the typical frequency-of-access characteristics of medical exams, it simply wasn't cost-effective to purchase disk arrays to store all of this data. In addition, GE Medical Systems wanted to reduce its storage management burden with HSM.
Kloet says, "Once the HSM system is set up with the right policies defined, there is little ongoing management involved." The ASM-based HSM solution also facilitates automated data replication on multiple tapes, one of GE Medical Systems' customer requirements. "Customers typically want two copies of data for disaster recovery purposes," he says. "The second copy is sent to an off-site vault."
The design and implementation of an HSM solution can be complex, given the different vendors and storage components involved. Kloet's advice is to get all of the vendors communicating during this phase of the project. "You cannot set up the HSM system right out of the manual. You need different types of expertise, including Fibre Channel, SANs, tape systems and clustering. You need all of the parties involved," says Kloet. He also recommends implementing the HSM solution in a test environment, having a good test plan and verifying the system functionality before rolling it out in production.
HSM and databases
Active--or live--archiving is a new approach to HSM for large databases and data warehouses. While HSM software is well suited for images and inactive files, databases require a more robust data management solution to facilitate data movement without impacting database integrity or performance. Active archiving software improves the effectiveness of HSM solutions by removing inactive data from a database and creating a file that may be managed by the HSM solution.
Active archiving software will transparently remove inactive historical data from production databases and save it into an archive. The active archiving process will also save the metadata that describes the tables, columns and relationships used to create the archive.
As with all HSM solutions, active archiving allows administrators to define data management policies based on frequency of data access, data type and data relationships that specify when database information will be archived. For example, an insurance company may want to archive all policies that were created more than three years ago for a selected client. This information will be removed from the production database and saved in an active archive file. The user may also restore a subset or the entire archive file with full referential integrity. Optionally, the user may let an HSM solution migrate the active archive file to tape for long-term storage.
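The insurance example above can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial active archiving product works: the `policies` table, its columns, the JSON archive format and the function names are all hypothetical, and a single table stands in for the related tables a real archive would have to keep consistent.

```python
import json
import sqlite3
from datetime import date


def archive_old_policies(conn, client_id, cutoff):
    """Move one client's policies created before `cutoff` into an archive string.

    The archive carries its own metadata (table and column names) so that it
    can be restored later without consulting the production schema.
    """
    where = "WHERE client_id = ? AND created < ?"
    params = (client_id, cutoff.isoformat())
    cur = conn.execute(f"SELECT id, client_id, created FROM policies {where}", params)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    with conn:  # commit the delete only if it succeeds
        conn.execute(f"DELETE FROM policies {where}", params)
    return json.dumps({"table": "policies", "columns": columns, "rows": rows})


def restore_archive(conn, archive_json):
    """Re-insert every archived row into the table named in the archive."""
    archive = json.loads(archive_json)
    column_list = ", ".join(archive["columns"])
    placeholders = ", ".join("?" for _ in archive["columns"])
    sql = f"INSERT INTO {archive['table']} ({column_list}) VALUES ({placeholders})"
    with conn:
        conn.executemany(sql, archive["rows"])


# Hypothetical production data: dates are stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (id INTEGER PRIMARY KEY, client_id TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO policies VALUES (?, ?, ?)",
    [(1, "acme", "1998-05-01"), (2, "acme", "2002-06-01"), (3, "other", "1997-01-01")],
)

# Archive the selected client's policies that are more than three years old.
archive_file = archive_old_policies(conn, "acme", date(2000, 3, 1))
```

The returned string plays the role of the active archive file: it could be written to disk and left for an HSM system to migrate to tape, or passed to `restore_archive` to put the rows back.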
Examples of active or live archive solutions include Archive for Servers from Princeton, NJ-based Princeton Softech and LiveArchive from OuterBay Technologies, Campbell, CA. "Most database applications were not designed with data archiving," says Jim Lee, VP of product marketing at Princeton Softech. Once a portion of a database is archived, Lee says, Princeton Softech's customers typically experience a 20% to 25% improvement in database performance.
HSM software allows the storage administrator to set management policies for the automated migration of data from one storage device to another. These policies include such things as file size, frequency of file access, retention period, type of media used for migration and disk capacity. Through the setting of high- and low-capacity thresholds--or watermarks--the storage administrator may control online capacity utilization. Once a high watermark is reached, the HSM engine will search for files meeting the policy-based conditions and automatically migrate them until the low threshold is reached. In addition, the administrator may exclude certain files from the migration process, such as system files, to avoid performance problems.
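The watermark logic described above can be sketched as follows. This is a simplified model under stated assumptions, not any vendor's algorithm: the `FileEntry` fields, the single idle-time policy and the oldest-first ordering are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    size_gb: float
    days_since_access: int
    excluded: bool = False  # e.g. system files the administrator protects


def select_for_migration(files, capacity_gb, high_pct, low_pct, min_idle_days):
    """Return the names of files to migrate, least recently accessed first.

    Nothing is migrated until utilization reaches the high watermark. After
    that, eligible files are migrated until utilization drops to the low
    watermark or no eligible files remain.
    """
    used = sum(f.size_gb for f in files)
    if used * 100 / capacity_gb < high_pct:
        return []  # high watermark not reached
    candidates = sorted(
        (f for f in files if not f.excluded and f.days_since_access >= min_idle_days),
        key=lambda f: f.days_since_access,
        reverse=True,
    )
    migrated = []
    for f in candidates:
        if used * 100 / capacity_gb <= low_pct:
            break  # low watermark reached
        migrated.append(f.name)
        used -= f.size_gb
    return migrated


files = [
    FileEntry("a.img", 30, 90),
    FileEntry("b.img", 30, 10),
    FileEntry("c.img", 30, 200),
    FileEntry("system.dat", 5, 400, excluded=True),
]
# 95GB used on a 100GB volume: above the 90% high watermark.
print(select_for_migration(files, 100, high_pct=90, low_pct=50, min_idle_days=30))
# prints ['c.img', 'a.img']
```

Here `b.img` stays on disk because it was accessed recently, and `system.dat` stays because it is excluded, even though it is the least recently used file.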
For example, magnetic resonance imaging (MRI) files from a radiology center may be written to a high-performance disk array, and if they aren't accessed after a defined period of time, the HSM engine automatically migrates the files to a cheaper storage medium such as tape and leaves a stub file on the primary disk. The stub file is a pointer to the actual location of the data on the secondary or tertiary media, and allows the file to appear to be immediately available. If the image is subsequently needed, the HSM software will intercept the request, automatically recall the image from secondary storage and stage it back to primary disk. If the file isn't changed, it is simply released from online storage, based on the policy settings. If the file is changed, the previous copy on secondary storage is marked invalid and the new file is migrated to a new location. Over time, as data is migrated and recalled, the HSM software invokes a process called reclamation, which frees up secondary or tertiary storage by copying the remaining active files from a largely inactive piece of media onto a new piece of media.
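The stub-and-recall cycle can be modeled with two in-memory dictionaries standing in for primary disk and tape. The class, method names and `tape-NNNN` location format below are hypothetical, and reclamation is left out; the sketch only shows how a stub, a transparent recall and an invalidated tape copy relate to one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """Pointer left on primary disk after a file's data moves to tape."""
    location: str


class HsmStore:
    def __init__(self):
        self.primary = {}     # path -> bytes (resident) or Stub (migrated)
        self.secondary = {}   # tape location -> bytes
        self.tape_copy = {}   # path -> location of the current valid tape copy
        self.invalid = set()  # tape locations holding superseded copies
        self._counter = 0

    def write(self, path, data):
        """Create or change a file on primary disk."""
        if path in self.tape_copy:
            # The tape copy no longer matches the file, so mark it invalid.
            self.invalid.add(self.tape_copy.pop(path))
        self.primary[path] = data

    def migrate(self, path):
        """Free primary disk space for `path`, leaving a stub behind."""
        data = self.primary[path]
        if isinstance(data, Stub):
            return data.location  # already migrated
        location = self.tape_copy.get(path)
        if location is None:
            # New or changed file: write it to a new tape location.
            self._counter += 1
            location = f"tape-{self._counter:04d}"
            self.secondary[location] = data
            self.tape_copy[path] = location
        # An unchanged file skips the copy and is simply released from disk.
        self.primary[path] = Stub(location)
        return location

    def read(self, path):
        """Return file data, recalling it from tape if only a stub is resident."""
        data = self.primary[path]
        if isinstance(data, Stub):
            data = self.secondary[data.location]
            self.primary[path] = data  # stage back to primary disk
        return data


store = HsmStore()
store.write("mri/exam1.dcm", b"scan")
first = store.migrate("mri/exam1.dcm")   # data moves to tape, stub stays on disk
image = store.read("mri/exam1.dcm")      # caller sees the data; recall is transparent
```

The caller of `read` never sees the stub, which mirrors the way the article describes a recall as transparent to the user. Calling `write` on a recalled file and then `migrate` again sends the new version to a fresh location and leaves the old one in `invalid`, which is the space a reclamation pass would later recover.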
This was first published in March 2003