The best way to move data

Don't get mired in sluggish data. There are best practices for migrating data from point A to point B. Here's how to pick the method that fits your company's needs and budget.

This article can also be found in the Premium Editorial Download: Storage magazine: Who owns storage in your organization?:

Data migration checklist
Before starting any type of data migration, follow these steps to ensure a successful migration:
Determine the classes of storage that will be used in your environment.
Identify which data belongs on which class of storage.
Run an analyzer to determine how many files will be migrated and how long it will take.
How many volumes need to be migrated?
How large are the volumes?
Do they need to be grouped? If so, how many need to be migrated together?
How much time is available to complete the migration?
Is the hardware configured correctly?
Are there sufficient resources available in the server (host-based)?
Verify that the high-availability options in place on the source arrays are also in place on the target arrays.
Is sufficient network bandwidth available?
Has all of the needed software been identified, purchased and licensed?
Does all of the data need to be migrated at once, or can it be staggered?
Monitor and adjust performance and migration rate.
Monitor application performance.
Determine the type of application cutover--is it automatic or manual?
Decide how long to keep old data.
Validate the performance on the new storage.
Recover resources from which data was migrated.
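
The analyzer step in the checklist can be approximated with a short script when no commercial tool is at hand. The following is a minimal Python sketch, not any vendor's utility: the function names `survey_tree` and `estimate_hours` and the idea of a single sustained throughput figure are assumptions made for illustration.

```python
import os


def survey_tree(root):
    """Walk `root` and return (file_count, total_bytes) for the files found."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
                file_count += 1
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort.
                continue
    return file_count, total_bytes


def estimate_hours(total_bytes, throughput_mb_per_s):
    """Estimate copy time in hours at a sustained rate (1 MB = 10**6 bytes)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = total_bytes / (throughput_mb_per_s * 1_000_000)
    return seconds / 3600
```

Comparing the estimate against the time available for the migration gives an early signal as to whether the data must be moved in stages. Real throughput varies with file size mix and contention, so the result is best treated as a rough lower bound.
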
Moving data is a constant struggle for most storage departments. Sluggish applications need to move to faster disks for better performance. Rarely accessed data needs to move in the other direction--to less-expensive ATA disks, CD-ROM or tape. If not handled judiciously, data migrations can cause application outages and server reboots, resulting in 2:00 a.m. work sessions.

There are many methods at different price points to move data. Some organizations may be able to simply use the move command that comes with every operating system. Others may need standalone utilities and network- and host-based approaches to get data from point A to point B. What's the best way to move data? To determine which method suits your needs, consider the following:

  • Type of application and data
  • Impact on application performance
  • Storage infrastructure
  • Network throughput
  • CPU and memory consumption
  • Affected users
Once you document your storage environment against this list (see "Data migration checklist"), pick a migration tool. Migration utilities operate at the host, network and array level. Each approach comes with its own set of advantages and disadvantages. For example, host-based commands like move should only be used for files not in use. Third-party, host-based utilities like NSI Software's Double-Take can help users measure and forecast the impact, time and I/O wait times of a data migration before moving the file. Without interrupting the application, EMC Corp.'s Symmetrix Optimizer utility automates load balancing and data placement within the array to improve application performance.

While some applications allow outages, many times it's simply not practical to shut down an application to perform the migration. The utility you choose should be able to monitor the application, increase and decrease the speed of the data migration or even stop the data migration if the application becomes exceptionally busy. This will allow server resources such as CPU and memory to be diverted from the data migration utility to the application.
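
The throttling behavior described above can be pictured as a small control rule that runs once per monitoring interval. This is a hedged Python sketch only: the function name `next_rate`, the use of application CPU percentage as the load signal and every threshold value are assumptions for illustration, not the behavior of any product named in this article.

```python
def next_rate(current_mb_s, app_cpu_pct, *, busy_pct=80.0, idle_pct=50.0,
              stop_pct=95.0, min_mb_s=1.0, max_mb_s=100.0):
    """Return the migration rate (MB/s) for the next interval; 0.0 means pause."""
    if app_cpu_pct >= stop_pct:
        # Application is exceptionally busy: stop migrating for now.
        return 0.0
    if app_cpu_pct >= busy_pct:
        # Application is busy: halve the rate, but keep a minimum trickle.
        return max(min_mb_s, current_mb_s / 2)
    if app_cpu_pct <= idle_pct:
        # Application is quiet: speed up by 25%, capped at the maximum.
        return min(max_mb_s, max(min_mb_s, current_mb_s * 1.25))
    # In between: hold the current rate (or resume at the minimum after a pause).
    return max(min_mb_s, current_mb_s)
```

Halving on load and growing slowly when idle is one common design choice because it gives resources back to the application quickly and reclaims them cautiously. A real utility might watch I/O wait or response time instead of CPU.
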

A specialized migration tool isn't necessarily more effective in moving data. Host-based software such as Veritas Software Corp.'s Volume Replicator is often used to avoid vendor lock-in. However, host-based software can be costly and difficult to manage, depending on the number of servers, operating systems and arrays involved.

Array-based utilities such as EMC's Symmetrix Remote Data Facility (SRDF) and IBM's Peer-to-Peer Remote Copy (PPRC) tie users to a specific vendor's hardware, but can be administered by a smaller, well-trained staff with minimal intervention required at the server level during the migration process.

Click here for a data replication software table (PDF).

Special types of data migrations
A data migration doesn't always equate to moving data from one disk to another. While these may represent the bulk of data migrations, other types of migrations usually require even more forethought and can make a migration from one disk to another seem like child's play. If you attempt the following data migrations, make sure your back-out plan works and that you can recover to your existing environment.
Database migrations. Migrating data from a SQL Server database to an Oracle database or vice versa requires factoring tables, indexes, primary and foreign keys, unique and check constraints, default values and security into the migration plan.
Directory service migrations. Migrating data from Novell's NDS to a Windows ADS or vice versa brings its own set of issues. User accounts and existing domains--along with the corresponding security settings on the objects in the existing environment--must be replicated to the new environment for everything to work. A product such as BindView's bv-Admin Migration Solutions can help facilitate this sort of task.
E-mail migrations. Moving from one e-mail platform to another, or even upgrading the existing e-mail platform, can disrupt the entire enterprise. Address books, calendars and e-mail retention policies, as well as the messages and their message format, all need to be converted and migrated into the new environment.

Network-based tools such as FalconStor Software's IPStor and DataCore Software's SANsymphony offer a vendor-neutral approach from an operating system and storage perspective. Yet these approaches sometimes make administrators uneasy because of the time required to set up these products.

Before a network-based migration commences, an administrator needs to set up zones, allocate LUNs on the new arrays and reboot servers so they can discover the new volumes on the new array. The amount of risk that something will go wrong correlates to how large the networked storage environment is and how well it's maintained and documented. Poorly maintained and documented storage area networks (SANs) may require weeks--if not months--to identify, schedule and verify each server's access to volumes on existing arrays, and also verify their access to volumes on the new arrays after they have been allocated. Migrating data from one database format to another presents a set of additional problems (see "Special types of data migrations").

Host migrations
Host-based replication technologies exist for nearly every major operating system platform, including mainframe, Novell, Windows and most flavors of Unix. The free utilities that ship with the operating systems should only be used with offline applications or files. Look for a third-party solution if you need to maintain application availability while moving data to a new storage location. Products differ in the number of steps required to perform the data migration and how they manage the process.

There are a number of good reasons to use host-based technologies for data migrations such as:

  • Lack of money to purchase replication software
  • Inexperience with array- and network-based solutions
  • Comfort level with existing migration techniques
  • Integration with existing databases or a mix of different vendors' storage arrays
But before selecting a host-based tool, it's important to understand its pros and cons:

Pros:
  • Migrates from anything (internal or external disk) to anything
  • Inexpensive if used with existing OS utilities
  • Can change volume characteristics during migration

Cons:
  • Requires root-level access to each server
  • May need to install software in addition to migration software
  • Need to uninstall software following the migration
Administrators also need to consider if and how the utility handles periods of network latency. Not all data migrations will be from one array to another on a high-speed Fibre Channel (FC) network. As organizations consolidate data centers, data migrations will increasingly occur asynchronously over longer distances. So the utility must monitor not only the performance of the application on the server, but also the speed of the migration. And the migration software must recover from interruptions that may occur during the data migration.
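
To make the recovery requirement concrete, here is a minimal Python sketch of a copy that picks up where it left off after an interruption. It is an illustration under stated assumptions, not how any product mentioned here works: the function name `resumable_copy` is hypothetical, the source file is assumed not to change during the copy, and whatever already exists at the destination is assumed to be a valid prefix of the source.

```python
import os


def resumable_copy(src, dst, chunk_size=1024 * 1024):
    """Copy `src` to `dst`, resuming at the destination's current length.

    Returns the number of bytes copied during this call.
    """
    # Bytes already present at the destination are treated as done.
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    copied = 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

Rerunning the function after a dropped link copies only the remainder. A production tool would also verify the existing prefix, for example with checksums, before trusting it.
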

Tools such as Veritas' StorageCentral and Storage Reporter track data usage and profile storage resources--information essential for a successful data migration. Once the migration begins, Veritas' Storage Foundation analyzes the amount of disk space that will be saved and shows a progress bar as the data is migrated.

Look for products that monitor network traffic during data migrations, and in the event of a network slowdown, store all source changes and transmit them when possible. They should also perform resyncing operations to get the target data back in sync with the source data. And ensure the tools have a central management console from which to manage the migration.
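
The store-and-forward behavior described above can be sketched as an ordered queue of pending changes. The Python below is a simplified illustration, not a description of any vendor's implementation: the class name `ChangeJournal` and the `send` callback, which is assumed to return True only when the target has accepted a change, are hypothetical.

```python
from collections import deque


class ChangeJournal:
    """Hold source changes in order until the target accepts them."""

    def __init__(self, send):
        self._send = send          # callable(change) -> bool (True if delivered)
        self._pending = deque()

    def record(self, change):
        """Queue a change made on the source."""
        self._pending.append(change)

    def flush(self):
        """Send queued changes in order, stopping at the first failure.

        Returns the number of changes delivered in this call.
        """
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # Link is slow or down; keep the rest for later.
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self):
        """Number of changes still waiting to be sent."""
        return len(self._pending)
```

Because a change leaves the queue only after it is delivered, the target receives writes in the order they occurred on the source, which is what allows it to return to a consistent state once the backlog drains.
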

Choose the right data migration approach
The toughest part of any migration may be determining which approach to choose. Here are some guidelines to help you choose the best option.
1 The most practical, cost-effective option may be a simple backup and restore to either tape or disk. Of course, this option will depend on how much data there is to move, how long it will take to complete the backup and restore and whether the application can afford an outage.
2 Host-based data migration software should be the second choice for users who need to keep their applications online while the migration is occurring. Migrations may be done on a host-by-host basis; data can be migrated irrespective of the kind of disk (internal or external); and underlying volume properties can be changed. Look to products such as Softek Replicator or Veritas Volume Replicator for host-based migration.
3 Array-based utilities should be a second choice for shops whose arrays all come from one vendor. These utilities minimize the need to impact the attached servers and the migrations can be configured and completed by administrators trained in the array's tools. Vendors such as EMC, Hewlett-Packard, Hitachi Data Systems and IBM all provide solid products to complete this task.
4 Network appliances should only be used as a last resort because the products are relatively new, virtualization interoperability standards aren't firmly set and vendor lock-in can be costly. While the future of data migration clearly appears to lie with network-based solutions, if you're not committed to moving into a network-based virtualized solution, you should avoid network appliance solutions at this time. If you're ready to make the move, consider solutions from DataCore, FalconStor, HP, IBM or Softek.
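
The guidelines above can be read as a short decision procedure. The Python sketch below encodes one possible reading of them and should not be taken as the article's definitive ranking: the function name `suggest_approach`, its four yes/no inputs and the order in which they are tested are assumptions made for illustration.

```python
def suggest_approach(*, outage_ok, fits_in_window, single_vendor,
                     committed_to_network):
    """Suggest a migration approach from four yes/no answers."""
    if outage_ok and fits_in_window:
        # The application can be offline long enough for the whole job.
        return "backup and restore"
    if single_vendor:
        # All arrays come from one vendor, so native array tools apply.
        return "array-based"
    if committed_to_network:
        # The shop has already decided to adopt network-based virtualization.
        return "network-based"
    # Mixed arrays with applications that must stay online.
    return "host-based"
```

For example, a shop with arrays from several vendors, no outage window and no commitment to virtualization would be pointed to host-based software. Real decisions also weigh budget, staff skills and data volume, which this sketch ignores.
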

Array migrations
For users who want to avoid the pain of installing and configuring software on each server and whose arrays are all from a single vendor, look no further than the utilities natively offered by array vendors. EMC, Hewlett-Packard Co. (HP), Hitachi Data Systems Inc. (HDS) and IBM Corp. enable data migrations between their arrays with minimal intervention on the hosts. These products enable the movement of data between like-arrays while applications are running, regardless of the OS accessing the storage. Yet none of these array-based operations should be confused with point-and-click operations.

Administrators still must do the upfront work. For instance, prior to using EMC's SRDF, administrators must complete a number of tasks, such as verifying that the microcode levels in each array are the same. The SRDF software must be purchased and licensed for both the existing and the new array. The LUN sizes on the new array must be configured to exactly match those on the existing array. Also, as a rule of thumb, almost every array-based approach requires that the source and destination arrays be from the same vendor and of the same product line.
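
The upfront checks listed above lend themselves to a simple preflight script run against an inventory of both arrays. The following Python sketch is illustrative only: the `ArrayInfo` record, its field names and the `preflight` function are hypothetical and do not correspond to any vendor's management interface, and the inventory values are assumed to have been gathered by hand or exported from the vendor's console.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ArrayInfo:
    vendor: str
    product_line: str
    microcode: str
    replication_licensed: bool
    lun_sizes_gb: list = field(default_factory=list)


def preflight(source, target):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if (source.vendor, source.product_line) != (target.vendor, target.product_line):
        problems.append("source and target differ in vendor or product line")
    if source.microcode != target.microcode:
        problems.append("microcode levels differ")
    if not (source.replication_licensed and target.replication_licensed):
        problems.append("replication software not licensed on both arrays")
    # Every source LUN needs a target LUN of exactly the same size.
    missing = Counter(source.lun_sizes_gb) - Counter(target.lun_sizes_gb)
    if missing:
        sizes = sorted(missing.elements())
        problems.append(f"target lacks matching LUNs for sizes (GB): {sizes}")
    return problems
```

The vendor and product-line check reflects the rule of thumb only. Where a vendor supports migration across product lines, that check would need to be relaxed.
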

But there are a growing number of exceptions to this rule. For instance, data migrations may be done between different generations of EMC Symmetrix arrays as long as each generation's microcode is the same. A migration may also be done from an EMC Symmetrix to an EMC Clariion using Clariion's SAN Copy utility. The SAN Copy utility also enables Clariion arrays to pull data from or push data to any vendor's array. HDS is unique among storage vendors in that it offers the ability to migrate data between its Lightning and Thunder models because they both ship with the same family of code, even though the array models differ.

Monitoring and managing the progress of the migration requires the use of a console provided by each array's vendor. For example, an EMC Symmetrix requires use of its ControlCenter product; for EMC's Clariion arrays, its Navisphere Management Suite must be used. Similarly, on HP arrays, the OpenView Storage Operations Manager is used to manage and monitor the progress of its Continuous Access Data Migration software.

EMC's SAN Copy and HP's Continuous Access reflect an emerging trend toward heterogeneous array support. Even though the software runs on a specific vendor's array, it reflects an increasing willingness by traditional hardware companies to migrate data to and from other vendors' arrays.

Network migrations
Despite the maturity and success of host- and array-based solutions, setting them up is time-consuming. So some companies are experimenting with network-based solutions to simplify and expedite the migration process. Setting up a network-based appliance for the first time requires about the same amount of effort as using either an array- or host-based approach. But once the network appliance is firmly entrenched, the pain of future data migrations is eased considerably.

In reality, most users aren't ready to abandon their current storage infrastructure design and move to a network-based solution until standards are firmly established and widely adopted. Also, because virtualization products are relatively new, many companies are hesitant to get locked into a single vendor's solution.

Setting up a server to become a migration appliance is a multistep process. For instance, to use FalconStor's IPStor, the storage administrator needs to designate the hardware to host it. This is usually a general-purpose, off-the-shelf server that supports Linux. Next, a user needs to install and configure the FalconStor software on the server, which is now a migration appliance. Once complete, the appliance needs to be configured to see the existing and new arrays, and then must be enabled to perform the migration. Once the migration is done, the servers and storage infrastructure need to be reconfigured to permit the servers to see the storage on the new arrays. Only after all of the servers can access the new storage can the migration appliance be pulled out. It's a long, tedious process, but it only has to be done once.

Users in homogeneous storage environments should continue to use the utilities provided by their vendors. Users in heterogeneous environments would be well-advised to continue to rely upon host-based methods and give preference to those solutions that offer a central interface to manage the data migrations. In the longer term, all users should be watching and testing the maturity of network-based solutions because these solutions will transform how future migrations will be done. Network migration solutions will decrease a user's dependence upon any single vendor's storage products and will simplify migration management.

This was first published in May 2004
