|Data migration checklist|
There are many methods at different price points to move data. Some organizations may be able to simply use the move command that comes with every operating system. Others may need standalone utilities and network- and host-based approaches to get data from point A to point B. What's the best way to move data? To determine which method suits your needs, consider the following:
- Type of application and data
- Impact on application performance
- Storage infrastructure
- Network throughput
- CPU and memory consumption
- Affected users
While some applications allow outages, many times it's simply not practical to shut down an application to perform the migration. The utility you choose should be able to monitor the application, increase and decrease the speed of the data migration or even stop the data migration if the application becomes exceptionally busy. This will allow server resources such as CPU and memory to be diverted from the data migration utility to the application.
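The throttling behavior described above can be sketched in a few lines. This is only an illustration, not any vendor's algorithm: the function name, the CPU thresholds and the rate limits are all assumptions chosen for the example.

```python
def next_copy_rate(current_mb_s, app_cpu_pct, *, busy_pct=80.0, idle_pct=40.0,
                   min_mb_s=0.0, max_mb_s=200.0, step_mb_s=10.0):
    """Return the migration copy rate (MB/s) to use for the next interval.

    All thresholds and limits are hypothetical defaults for illustration.
    """
    if app_cpu_pct >= busy_pct:
        # Application is exceptionally busy: stop the migration.
        return min_mb_s
    if app_cpu_pct > idle_pct:
        # Application is moderately busy: back off by halving the rate.
        return max(min_mb_s, current_mb_s / 2)
    # Application is quiet: ramp up gradually, up to the ceiling.
    return min(max_mb_s, current_mb_s + step_mb_s)


# Example: feed in a series of observed application CPU readings.
rate = 100.0
for cpu in (10, 60, 90, 10):
    rate = next_copy_rate(rate, cpu)
```

Halving on moderate load and ramping up slowly when the application is quiet is one common back-off pattern; a real utility might also watch memory, I/O latency or queue depth.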
A specialized migration tool isn't necessarily more effective in moving data. Host-based software such as Veritas Software Corp.'s Volume Replicator is often used to avoid vendor lock-in. However, host-based software can be costly and difficult to manage, depending on the number of servers, operating systems and arrays involved.
Array-based utilities such as EMC's Symmetrix Remote Data Facility (SRDF) and IBM's Peer-to-Peer Remote Copy (PPRC) tie users to a specific vendor's hardware, but can be administered by a smaller, well-trained staff with minimal intervention required at the server level during the migration process.
|Special types of data migrations|
Network-based tools such as FalconStor Software's IPStor and DataCore Software's SANsymphony offer a vendor-neutral approach from an operating system and storage perspective. Yet these approaches sometimes make administrators uneasy because of the time required to set up these products.
Before a network-based migration commences, an administrator needs to set up zones, allocate LUNs on the new arrays and reboot servers so they can discover the new volumes on the new array. The risk that something will go wrong grows with the size of the networked storage environment and with how poorly it's maintained and documented. Poorly maintained and documented storage area networks (SANs) may require weeks--if not months--to identify, schedule and verify each server's access to volumes on existing arrays, and then to verify access to volumes on the new arrays after they have been allocated. Migrating data from one database format to another presents a set of additional problems (see "Special types of data migrations").
Host-based replication technologies exist for nearly every major operating system platform, including mainframe, Novell, Windows and most flavors of Unix. The free utilities that ship with the operating systems should only be used with offline applications or files. Look for a third-party solution if you need to maintain application availability while moving data to a new storage location. Products differ in the number of steps required to perform the data migration and how they manage the process.
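For offline files, the "free utility" approach amounts to copying the data and confirming the copy is intact before retiring the source. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the file isn't being modified while it's copied.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, target):
    """Copy an offline file to new storage and verify it by checksum."""
    source, target = Path(source), Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies data plus timestamps/permissions
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch after copying {source} to {target}")
    return target
```

Only after the checksum matches would the source copy be deleted. This does nothing to keep an application available during the move, which is why online data calls for a third-party tool.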
There are a number of good reasons to use host-based technologies for data migrations, such as:
- Lack of money to purchase replication software
- Inexperience with array- and network-based solutions
- Comfort level with existing migration techniques
- Integration with existing databases or a mix of different vendors' storage arrays
Advantages of host-based migration:
- Migrates from anything (internal or external disk) to anything
- Inexpensive if used with existing OS utilities
- Can change volume characteristics during migration
Disadvantages of host-based migration:
- Requires root-level access to each server
- May need to install software in addition to migration software
- Need to uninstall software following the migration
Tools such as Veritas' StorageCentral and Storage Reporter track data usage and profile storage resources--information essential for a successful data migration. Once the migration begins, Veritas' Storage Foundation analyzes the amount of disk space that will be saved and shows a progress bar as the data is migrated.
Look for products that monitor network traffic during data migrations, and in the event of a network slowdown, store all source changes and transmit them when possible. They should also perform resyncing operations to get the target data back in sync with the source data. And ensure the tools have a central management console from which to manage the migration.
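The store-and-resync behavior can be pictured as a small change journal. The following is a minimal sketch, not how any named product works: the class, its methods and the block-level granularity are assumptions made for illustration.

```python
from collections import OrderedDict


class ChangeJournal:
    """Forward source writes to the target, queuing them while the link is slow.

    `send` is a hypothetical callable(block_id, data) that writes to the target.
    """

    def __init__(self, send):
        self._send = send
        self._pending = OrderedDict()  # block_id -> latest data
        self._link_up = True

    def link_down(self):
        """Called when monitoring detects a network slowdown or outage."""
        self._link_up = False

    def write(self, block_id, data):
        if self._link_up:
            self._send(block_id, data)
        else:
            # Keep only the newest version of each changed block.
            self._pending[block_id] = data
            self._pending.move_to_end(block_id)

    def resync(self):
        """Replay queued changes to bring the target back in sync."""
        self._link_up = True
        sent = 0
        while self._pending:
            block_id, data = self._pending.popitem(last=False)
            self._send(block_id, data)
            sent += 1
        return sent
```

Tracking only the latest version of each block keeps the backlog proportional to the amount of changed data rather than the number of writes, at the cost of not preserving the original write order across blocks.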
|Choose the right data migration approach|
For users who want to avoid the pain of installing and configuring software on each server and whose arrays are all from a single vendor, look no further than the utilities natively offered by array vendors. EMC, Hewlett-Packard Co. (HP), Hitachi Data Systems Inc. (HDS) and IBM Corp. enable data migrations between their arrays with minimal intervention on the hosts. These products enable the movement of data between like-arrays while applications are running, regardless of the OS accessing the storage. Yet none of these array-based operations should be confused with point-and-click operations.
Administrators still must do the upfront work. For instance, prior to using EMC's SRDF, administrators must complete a number of tasks, such as verifying that the microcode levels in each array are the same. The SRDF software must be purchased and licensed for both the existing and the new array. The LUN sizes on the new array must be configured to exactly match those on the existing array. Also, as a rule of thumb, almost every array-based approach requires the source and destination arrays to be from the same vendor and of the same product line.
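These upfront checks lend themselves to a simple preflight script. The sketch below is illustrative only: it doesn't query any real array or call any vendor tool, the `ArrayInfo` fields are hypothetical, and it assumes LUNs carry the same names on both arrays.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayInfo:
    """Hand-entered facts about an array (hypothetical structure)."""
    vendor: str
    product_line: str
    microcode: str
    replication_licensed: bool
    lun_sizes_gb: dict = field(default_factory=dict)  # LUN name -> size in GB


def preflight(source, target):
    """Return a list of problems that would block an array-based migration."""
    problems = []
    if (source.vendor, source.product_line) != (target.vendor, target.product_line):
        problems.append("arrays differ in vendor or product line")
    if source.microcode != target.microcode:
        problems.append("microcode levels differ")
    for array, role in ((source, "source"), (target, "target")):
        if not array.replication_licensed:
            problems.append(f"replication software not licensed on {role} array")
    for lun, size in source.lun_sizes_gb.items():
        if target.lun_sizes_gb.get(lun) != size:
            problems.append(f"LUN {lun}: target size does not match source ({size} GB)")
    return problems
```

An empty list means the rule-of-thumb checks pass; the exceptions some vendors now allow would need their own rules.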
But there are a growing number of exceptions to this rule. For instance, data migrations may be done between different generations of EMC Symmetrix arrays as long as each generation's microcode is the same. A migration may also be done from an EMC Symmetrix to an EMC Clariion using Clariion's SAN Copy utility. The SAN Copy utility also enables Clariion arrays to pull or push data to or from any vendor's array. HDS is unique among storage vendors in that it offers the ability to migrate data between its Lightning and Thunder models because they both ship with the same family of code, even though the array models differ.
Monitoring and managing the progress of the migration requires the use of a console provided by each array's vendor. For example, an EMC Symmetrix requires use of its ControlCenter product; for EMC's Clariion arrays, its Navisphere Management Suite must be used. Similarly, on HP arrays, the OpenView Storage Operations Manager is used to manage and monitor the progress of its Continuous Access Data Migration software.
EMC's SAN Copy and HP's Continuous Access reflect an emerging trend toward heterogeneous array support. Even though the software runs on a specific vendor's array, it reflects an increasing willingness by traditional hardware companies to migrate data to and from other vendors' arrays.
Despite the maturity and success of host- and array-based solutions, setting them up is time-consuming. So some companies are experimenting with network-based solutions to simplify and expedite the migration process. Introducing a network-based appliance requires the same amount of effort as using either an array- or host-based approach. But once the network appliance is firmly entrenched, the pain of future data migrations is eased considerably.
In reality, most users aren't ready to abandon their current storage infrastructure design and move to a network-based solution until standards are firmly established and widely adopted. Also, because virtualization products are relatively new, many companies are hesitant to get locked into a single vendor's solution.
Setting up a server to become a migration appliance is a multistep process. For instance, to use FalconStor's IPStor, the storage administrator needs to designate the hardware to host it. This is usually a general-purpose, off-the-shelf server that supports Linux. Next, a user needs to install and configure the FalconStor software on the server, which is now a migration appliance. Once complete, the appliance needs to be configured to see the existing and new arrays, and then must be enabled to perform the migration. Once the migration is done, the servers and storage infrastructure need to be reconfigured to permit the servers to see the storage on the new arrays. Only after all of the servers can access the new storage can the migration appliance be pulled out. It's a long, tedious process, but it only has to be done once.
Users in homogeneous storage environments should continue to use the utilities provided by their vendors. Users in heterogeneous environments would be well-advised to continue to rely on host-based methods and to give preference to solutions that offer a central interface for managing data migrations. In the longer term, all users should watch and test the maturity of network-based solutions because these solutions will transform how future migrations are done. Network migration solutions will decrease a user's dependence on any single vendor's storage products and will simplify migration management.