Migrate data without mistakes

Data migrations are a fact of life. In many cases, the migration ends up being a tedious process. Automated tools can help ease migration woes. Host-based migration software takes the load off the storage array and can easily bridge the gap when migrating data between heterogeneous storage systems. But array-based migration may be preferred for technology refreshes.


Moving data from one array to another, or from one tier of storage to another, is a tedious process that's slowly becoming more automated.

With storage capacity growing at 50% annually, an unending cycle of technology refreshes, server and storage consolidation, and data classification is driving the need to move data from one tier of storage to another or from one array to another. Data migration has become a way of life for storage administrators.

In a perfect world, you would use automated tools to migrate data. Yet data migration is too often done manually: an administrator takes the system offline, backs it up to tape, installs the new array and recovers the data to its new location. A complicated data migration may include 50 or more steps, and take a night or even a weekend to perform. For businesses operating 24 hours a day, 365 days a year, this is simply too long to have a system down.

"Downtime is costly; it costs me $30,000 an hour. That's not really that large an amount, but not having to take the NetApp filer out of service and plan downtime in off hours is beneficial," says Stephen R. O'Neill, VP of technology at Oversee Domain Services, a division of Oversee.net, in Los Angeles. "My engineers don't have to be up in the middle of the night and do all the things you do to mitigate the impact of maintenance."

You would think that migrating data, which is such a routine and tedious operation, would be easier.

According to a survey conducted by the New York City-based research firm TheInfoPro Inc., problems arising from migration include users suffering from extended or unexpected downtime, technical compatibility issues, data corruption, application performance issues, and missed data or data loss.


A migration checklist
Many migration projects suffer from poor planning, which results in excessive or unexpected downtime. Outlining the steps in advance improves the odds of a successful migration.

  • Identify the personnel involved in the migration--storage administrator, database administrator, application manager and security officer--and solicit their expectations and goals regarding the migration
  • Identify the applications, functions, host servers and storage impacted by the data migration
  • Discover the data that will be migrated
  • Determine when the migration will occur, how long it will take and how long the systems will be down (if necessary)
  • Back up the data on the device that the data will be migrated from
  • Record the configuration of both the source and target arrays involved in the migration
  • If you're using scripts to perform the migration, review them for reliability and accuracy
  • Record OS-level permissions, directory structure and share permissions
  • Verify LUNs and volumes on source and target devices
  • Perform a fire drill of the migration
  • Review all changes to the system configuration
  • Review LUN and volume information for both the source and target
  • Test the rollback or fallback process
  • Verify the results; check for data integrity and any possible corruption
  • Validate the results; test network access, file permissions, directory structure and applications
  • Review the project for issues to correct or improve for the next migration
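
The "verify the results" step can be partly automated. The following is a minimal Python sketch, not a tool named in this article, that compares SHA-256 checksums of every file on the source and target after a file-level copy. The function names are hypothetical, and the sketch assumes both locations are mounted as ordinary directories on one host.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source_root, target_root):
    """Report files that are missing, unexpected or different on the target."""
    source = tree_digests(source_root)
    target = tree_digests(target_root)
    return {
        "missing": sorted(set(source) - set(target)),
        "unexpected": sorted(set(target) - set(source)),
        "changed": sorted(
            name for name in set(source) & set(target)
            if source[name] != target[name]
        ),
    }
```

An empty report on all three keys suggests the file contents match. It says nothing about permissions, shares or application behavior, which still need the validation step in the checklist.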


The majority of data migrations occur when storage equipment comes off lease and is replaced with a new system. Problems migrating data from one array to another or from one filer to the next are compounded by the heterogeneity of the devices and, often, a lack of software to move the data in an organized, automated fashion.

Consider this scenario: Peter Fitch is the infrastructure manager at Rudolph Technologies Inc., a semiconductor services company in Bloomington, MN. He migrated his firm's data from Dell Inc. PowerVault arrays to a Compellent Technologies Inc. Storage Center SAN two years ago. "Back then, we used the old tape backup method," he explains. "We would have had to back up to tape, create a new volume on the tier of storage we wanted to use, restore that data and recreate all the shares as well. Time constraints were a concern; we would have had to do the migration over a weekend if it was a larger, substantial-sized volume of 600GB or 1TB, and we would have had to do that in the off hours and ruin the IT staff's weekend doing so."

Fitch's migration jobs are a lot easier now with Compellent's Data Progression Module (DPM), which moves data according to policies. "We just let it go," he says. "Nobody in the company even knows [the migration] is happening." Fitch says 99.95% of the data migrated by the DPM ends up on Tier 3 storage, leaving Tier 1 for important applications that need to be fresh, such as boot from SAN or live databases.

With Compellent's Thin Import capability, Fitch can move data without writing scripts and without the need for backup software or hardware. "You could use a version of Robocopy and script the move, or a trial version of tape backup software," he says. "We just plug the NetApp filer into the Fibre Channel [FC] connection on the Compellent SAN--which showed up as an external device--and just move the data over." (Robocopy is a command-line directory replication tool that's a standard part of Windows Server 2008.)

Migration plan
While the goal of any data migration is to move data from one device to another, planning needs to take place to ensure that the migration is successful. Variables to consider include how long the migration will take; the amount of downtime (if any) that will be required; and the risk to the business from technical incompatibilities, database downtime or performance degradation. The plan must also define the data to be moved, where it will be moved to and how it's moved.
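
The "how long will it take" question lends itself to a back-of-the-envelope calculation. The Python sketch below is only an illustration: the 100MB/sec sustained throughput and the 25% overhead allowance are assumptions, while the 1TB volume size and the $30,000-an-hour downtime figure come from the users quoted in this article.

```python
def estimate_window_hours(data_gb, throughput_mb_per_s, overhead_factor=1.25):
    """Estimate copy time in hours for a data size and sustained throughput.

    overhead_factor is an assumed allowance for verification and cutover work.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds * overhead_factor / 3600


def downtime_cost(hours_offline, cost_per_hour):
    """Cost of keeping a system offline for the given number of hours."""
    return hours_offline * cost_per_hour


if __name__ == "__main__":
    # 1TB volume at an assumed 100MB/sec, costed at $30,000 per hour offline
    hours = estimate_window_hours(data_gb=1024, throughput_mb_per_s=100)
    print(f"Estimated window: {hours:.1f} hours")
    print(f"Cost if offline throughout: ${downtime_cost(hours, 30_000):,.0f}")
```

Under those assumptions the window comes to roughly 3.6 hours. Real throughput depends on the arrays, the fabric and the file mix, so a fire drill on a sample volume is a better source for the throughput number than a spec sheet.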

O'Neill at Oversee Domain Services uses F5 Networks Inc.'s Acopia ARX Series switch to accomplish his data migrations. "I've used the Acopia ARX switch extensively for data migration, typically volume-to-volume or array-to-array migrations. Usually, I'm managing data at the volume level," says O'Neill. "For instance, if I have four to five filers that have a lot of information on them that I want to move to a different filer or I want to move data off a filer while I do maintenance, I would attach the other filer to the Acopia switch and use the rules engine [within the FreedomFabric Network OS] to move the data over."


Data movers
There are three broad categories of data movers: host-based software, array-based data migration and network appliances.

Host-based software has been the most commonly used tool to migrate data. It's best for application-specific migrations such as platform upgrades from Microsoft Exchange 2003 to Exchange 2007, and for database replication and simple file copying. Host-based software such as Symantec Corp.'s Veritas Volume Manager or Brocade Communications Systems Inc.'s StorageX frees the storage array of processing and eases data migrations between heterogeneous storage. It's also more economical than other tools for small-scale migrations, but can become problematic when many systems are migrated. Other examples of host-based migration software are IBM Corp.'s Softek Transparent Data Migration Facility, which runs on z/OS, Unix, Linux and Windows servers; Quest Software Inc.'s Storage Consolidator for Windows; and Symantec's Veritas Volume Replicator. All of these software packages can be used to migrate volumes, files or blocks of data.

Array-based software is primarily used to migrate data between homogeneous storage devices and to reduce the impact on host computer operations. Users will likely choose array-based software to move data between generations of a vendor's product. Examples include EMC Corp.'s Symmetrix Remote Data Facility, IBM's Peer-to-Peer Remote Copy and Compellent's Storage Center Thin Import capability.

The scope of array-based software has recently changed. Hitachi Data Systems now offers a controller-based virtualization product with its TagmaStore array that supports the migration of data between Hitachi and non-Hitachi vendor arrays. EMC offers Open Replicator for Symmetrix.

The third type of migration tool used is a network appliance like F5 Networks' Acopia ARX Series switch, Brocade's File Management Engine or Sanrad Inc.'s V-Switch. These devices migrate volumes, files or blocks of data depending on their configuration. For example, the Acopia ARX Series switch migrates file-oriented data between NAS devices and file servers, while the Sanrad V-Switch migrates block-oriented data.

With a network-based appliance, performance can be improved by aggregating and balancing the migration load across the filers, says Oversee Domain Services' O'Neill. In addition, "if I want to take a filer out of operation for a firmware upgrade, I can migrate the data off the filer, pull the filer out from the back of the Acopia, do the firmware upgrade, put it back in and migrate the data back. There's no disruption to the application," he says.


Practical considerations
Migration planning turns on three scenarios: migration between homogeneous arrays, between heterogeneous arrays or among different tiers of storage. Migration within a single tier of storage--such as from one primary FC disk to another--can occur in homogeneous storage from a single vendor or in heterogeneous storage from multiple vendors.

Barry Thomas is network administrator at the Graves-Gilbert Clinic in Bowling Green, KY, which migrated to a Compellent SAN in January of this year. Thomas needed to use the most complicated approach, migrating between unlike storage devices.

Thomas chose to migrate the data from an EMC array, a Nexsan Technologies Inc. array and local servers to a Compellent SAN he had purchased. "We didn't do a lot of migrating before; we just way-overallocated storage to take care of that. It's expensive," he says. "The few times we had to do that we took the whole system down, moved from one storage solution to another and brought it back up."

The Graves-Gilbert Clinic had three EMC Clariion CX300 arrays, a Nexsan SATAboy and local storage on its servers, says Thomas. "We chose to give a temporary server the original volume and then give it the new Compellent volume. [We] then copied the data over from the old volume to the new volume using the Thin Import capability. If I gave the server a 100 gig volume that only had 80 gigs of data, then I only really consumed 80 gigs of volume space in the new array," he explains.

"We used scripting on one occasion; during the day, I presented a new volume to the server and used the script to basically shut down all the services, copy data from one volume to the next and [then] send me a message when it was done," says Thomas. "I then came in and changed the volume label and it was good to go."

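A scripted move of the kind Thomas describes (stop the services, copy the volume, send a message) might be sketched in Python as follows. This is not his script. The three callables are hypothetical hooks the caller must supply, because how services are stopped and how messages are sent varies by site, and the sketch assumes Python 3.8 or later for the `dirs_exist_ok` option.

```python
import shutil


def scripted_cutover(source_dir, target_dir, stop_services, start_services, notify):
    """Stop services, copy source_dir into target_dir, then report the outcome.

    stop_services, start_services and notify are caller-supplied callables
    (hypothetical hooks); nothing here talks to a real service manager or mailer.
    """
    stop_services()
    try:
        # target_dir is the newly presented volume, so it may already exist
        shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)
    except Exception as error:
        # Copy failed: resume service on the untouched source volume
        start_services()
        notify(f"Migration FAILED: {error}")
        raise
    # On success, services stay down until an operator swaps the volumes
    notify(f"Migration finished: {source_dir} -> {target_dir}")
```

Leaving the services stopped after a successful copy is a deliberate choice in this sketch: it keeps the source from changing before the administrator relabels the new volume and brings the application back.
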
Eric Nelson, director of information technology and CIO at St. Joseph Healthcare in Bangor, ME, used an appliance-based system to accomplish his data migration between heterogeneous devices. The hospital has 140 servers, eight virtual hosts and 94 virtual machines. It manages 14TB of data with a Sanrad V-Switch cluster, 28TB of data on a Hewlett-Packard (HP) Co. StorageWorks Enterprise Virtual Array (EVA) and 15TB on an EMC Symmetrix. To avoid vendor lock-in and redundant SANs at both of the hospital's sites, Nelson used a Sanrad V-Switch to migrate data from the EMC array to the HP StorageWorks EVA.

"Being that they were dissimilar SANs, I couldn't replicate between the two of them," says Nelson. "The only options vendors were giving me was to buy another one of their SANs and then they would set me up for replication. That's pretty expensive."

According to Nelson, "Sanrad was able to do asynchronous replication between different systems. We moved 13TB from the EMC Symmetrix to the HP EVA 8100. The migration was different depending on the application we used. We migrated the virtual machines with VMware tools. We had some issues with our file servers; for those migrations, we used backup and restore operations. At the same time, we were creating four different Microsoft clusters. We took those systems offline, backed up the data and then restored it to the new systems."


Different tiers
Tiering storage is one of the biggest drivers of data migration. According to a recent survey of Storage magazine readers regarding their migration strategies for tiered storage, nearly half tier their storage. Of these, almost 21% have four or more tiers, approximately 46% have three tiers and 33% have two tiers of storage. Thirty-four percent of those surveyed say that 31% to 50% of their data resides on Tier 1 storage. Not surprisingly, 40% rely on manual methods for moving data between tiers, approximately 20% use automated methods and the remainder use a combination of both (see "Snapshot: Cheers and jeers for tiered storage").

Rudolph Technologies' Fitch regularly performs data migrations between multiple tiers of storage (see "Old data isn't the only candidate for tiering," below). "We have two different tiers of storage: Fibre Channel and Serial ATA drives," says Fitch. "The Compellent Data Progression Module that's part of Storage Center lets us automatically move data between Tier 1 and Tier 3 storage nondisruptively."

Fitch just sets a threshold for the number of days of nonuse for data. "Image files that aren't refreshed frequently are moved down to Tier 3 storage automatically. Only frequently accessed files stay up on our large, expensive Fibre Channel drives," he notes.
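
The idea of a days-of-nonuse threshold can be illustrated at the file level with a short Python sketch. It is not how Compellent's Data Progression Module is implemented; the function name is hypothetical, and it assumes the file system records last-access times reliably (some are mounted with access-time updates disabled).

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def demotion_candidates(root, max_idle_days, now=None):
    """List files under root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_atime < cutoff
    )
```

A policy engine would run a check like this on a schedule and hand the resulting list to whatever actually moves the data to the lower tier.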

Data migrations are a fact of life and automating the tedious process can be well worth the effort. Users have found that host-based software is perhaps the best for application-specific migrations such as database replication. It frees the storage array of processing and can often be used more easily for migrating data between heterogeneous storage.

Array-based migration, such as that used by Compellent Technologies' customers, has also proven to be popular, and legions of customers use EMC's Symmetrix Remote Data Facility to move data between EMC arrays. Many users also rely on their chosen array's data utilities to migrate data from one array to another or to move data between tiers. Array-based migration from one array to another is a popular option not only for technology refreshes, but for heterogeneous storage migration. Finally, the use of a network appliance provides additional flexibility for moving file-, block- or volume-based data among heterogeneous devices.


Old data isn't the only candidate for tiering
Moving data among tiers of storage isn't as easy as it may seem. Many users move data based on its age or length of inactivity. However, there are other criteria for the movement of data that you should consider:

Medical and Patient Information: Moving this type of data depends on the age of the patient. If the data is about a child, it must be accessible for as long as 21 years. If the patient is an adult, data needs to be available, according to differing state laws, for the life of the patient; some states also require availability for a period of time after the adult patient dies. Maintaining this type of information, which may be needed immediately in the event of a medical crisis, is best done on Tier 1 storage for current and recent patients, and on Tier 2 storage for patients seen less recently.

Intellectual Property: Here again, the age of the data isn't a factor for where it should be stored. Many users store intellectual property data that's core to their businesses on Tier 1, which is readily accessible storage.

Electronically Stored Information (ESI): This data, comprising personnel records or any other company information that could be the subject of litigation against the company, is often stored on Tier 1, 2 or 3 devices. If the data becomes relevant to a lawsuit, it must be readily accessible and searchable. The data must be tiered in an appropriate manner so it can be accessed quickly; in some cases, this requires moving some Tier 3 data stored on tape back to Tier 2 or Tier 1 disk storage. Many email applications support the tiering and migration of emails.
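
These criteria can be read as rules that override a purely age-based placement. The Python sketch below is one hedged way to express that. The class labels, the `recently_seen` flag and the `under_legal_hold` flag are assumptions made for illustration; what counts as "recent" or as a legal hold is a policy decision this sidebar does not define.

```python
def choose_tier(record_class, age_tier, recently_seen=False, under_legal_hold=False):
    """Pick a tier from the data's class first, falling back to its age-based tier.

    age_tier is the tier (1, 2 or 3) an age-only policy would have chosen.
    """
    if record_class == "intellectual_property":
        return 1  # core business data stays readily accessible
    if record_class == "medical":
        return 1 if recently_seen else 2
    if record_class == "esi" and under_legal_hold:
        return min(age_tier, 2)  # pull Tier 3 data back to disk
    return age_tier


if __name__ == "__main__":
    print(choose_tier("medical", age_tier=3, recently_seen=True))
    print(choose_tier("esi", age_tier=3, under_legal_hold=True))
    print(choose_tier("general", age_tier=3))
```

The point of the sketch is the ordering: class-based rules are checked before the age-based default, so old data is not demoted merely because it is old.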


