
Data migration tips

Data moves. Or, it has to be moved when you're refreshing array technology, merging storage resources with an acquired company or shifting data around to more economical tiers. Data migration is a common task, but it's often a difficult one. We describe some technologies and tools to ease the pain of data migrations.

Data migrations can be complicated, time consuming and happen all too frequently. Here's how to simplify the process.

Whatever storage media your data sat on a year or two ago, chances are it's moved since then and will likely move again soon. There are plenty of reasons why that data may have to move: maybe the lease is up on an old Fibre Channel (FC) SAN and you're upgrading to new hardware, you're moving to a new data center or you need to move older files to less expensive storage to keep up with soaring data demands.

Data migration may be a common chore, but that doesn't mean it's easy. Disk (and tape) drives are linked to applications and business processes through servers, routers, switches, and storage and data networks, not to mention access control policies and other layers of security. The more complex your environment, and the more data you're managing, the less likely you'll be able to use simple copy functions built into operating systems or arrays to pull off your required migrations.

Migrating data involves a lot more than just ripping out one storage cabinet and plugging in another. The following tips will help make your data migrations go more smoothly.

  1. Understand your mapping.

Before migrating any data to new storage arrays, be sure you understand how servers are currently mapped to storage so you can re-create those mappings in the new environment. Otherwise, servers may not reboot correctly after the migration.

To avoid unplanned outages, administrators should "understand the true end-to-end relationships among the platforms you're moving across," says Lou Berger, senior director of products and applied technologies at EMC Corp. This is especially important if, for redundancy purposes, your storage infrastructure is a multipathing environment where hosts may boot from alternate arrays if the primary array is down. If administrators fail to check the parameters on the host HBAs to ensure the pathing software is set up correctly, he says, the host may not reboot properly.

Administrators also need to be sure the host will discover storage resources in the proper order after a migration. "Some applications and databases are sensitive to the order in which they discover volumes," says Berger, because an application boot sequence might be on one LUN and its data on another.

Administrators may not even know a server exists until it fails to reboot after a migration, "because oftentimes people install them and forget them," says Ashish Nadkarni, a principal consultant at GlassHouse Technologies Inc., a Framingham, MA-based consulting and services firm. While storage discovery and auditing tools are valuable, he says, none of them can capture 100% of the misconfigurations that can cause a problem.
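One way to make the mapping check concrete is to record each host's view of its volumes before the cutover and compare it with what the host sees afterward. The sketch below is a minimal illustration, not a replacement for a discovery or auditing tool: the host names, volume labels and the `diff_mappings` helper are all hypothetical, and it assumes each volume carries a logical label that survives the move to the new array.

```python
from typing import Dict, List

# Hypothetical inventory format: host name -> volume labels, listed in the
# order the host discovers them. Real data would come from discovery tools.
Mapping = Dict[str, List[str]]

def diff_mappings(before: Mapping, after: Mapping) -> Dict[str, str]:
    """Return a per-host description of anything that changed."""
    problems: Dict[str, str] = {}
    for host, old_volumes in before.items():
        new_volumes = after.get(host)
        if new_volumes is None:
            problems[host] = "host missing from post-migration inventory"
        elif set(old_volumes) != set(new_volumes):
            missing = sorted(set(old_volumes) - set(new_volumes))
            extra = sorted(set(new_volumes) - set(old_volumes))
            problems[host] = f"volumes differ (missing={missing}, unexpected={extra})"
        elif old_volumes != new_volumes:
            problems[host] = "same volumes, different discovery order"
    return problems

before = {
    "db01": ["boot_lun", "data_lun"],
    "app01": ["boot_lun", "logs_lun"],
    "web01": ["boot_lun"],
}
after = {
    "db01": ["data_lun", "boot_lun"],  # discovery order flipped
    "app01": ["boot_lun"],             # one volume was never re-mapped
}                                      # web01 was forgotten entirely

for host, problem in sorted(diff_mappings(before, after).items()):
    print(f"{host}: {problem}")
```

The three sample hosts correspond to the three failure modes described above: a changed discovery order, a volume that was not re-mapped and a server nobody remembered until it was missing.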

  2. Gather metrics.

Jalil Falsafi, director of information technology at electronic components distributor Future Electronics Inc. in Montreal, had to migrate data from IBM Corp. DS4100 and DS4300 entry-level arrays to Hewlett-Packard (HP) Co. StorageWorks XP24000 arrays during intervals of relatively slow network traffic over a period of six weeks. That required an in-depth understanding of the capacity of his SAN and when other functions, such as a database backup, would increase network loads.

"You have to scope how many LUNs, or logical disks, you're going to migrate. You have to know their size; you have to know the speed of your array; you have to know the speed of your switch as well as 'hot spots' when traffic loads are very heavy," says Future Electronics' Falsafi. "You need to take the worst-case scenario into consideration, not the average or the minimum."

Falsafi used monitoring tools available in FalconStor Software Inc.'s IPStor network storage server, as well as host- and array-based utilities, to gather those metrics.

"Migration can have a severe impact on overall system performance," says Chris McCall, product marketing director at LeftHand Networks Inc. (which is being acquired by HP). "It becomes a fairly nasty issue [with questions such as] 'Is my controller performance maxed out already or close to maxed?'" He warns that overloading a storage or data network with migration traffic can reduce the availability or performance of not only the data being migrated, but all of the data on the network.

Measuring network bandwidth needs before performing a migration is a chore that can be easily overlooked, says Greg Schulz, founder and senior analyst at StorageIO Group, Stillwater, MN. "Unless you know for sure, go out and double-check to see what the impact is going to be," he says. Once an administrator is sure how much bandwidth should be allocated to the migration and when it will be available, the bandwidth can be managed with tools such as optimization technologies, replication optimizers and traffic shapers, he adds.
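The difference between planning for the average and planning for the worst case is easy to see with a little arithmetic. The following sketch is only a back-of-the-envelope estimate: the LUN sizes, link throughput and utilization figures are invented for illustration, and the `migration_hours` helper assumes the migration can use whatever share of the link other traffic leaves free.

```python
def migration_hours(total_gb: float, link_gb_per_hour: float, busy_fraction: float) -> float:
    """Hours needed to move total_gb when busy_fraction of the link is already in use."""
    if not 0 <= busy_fraction < 1:
        raise ValueError("busy_fraction must be at least 0 and less than 1")
    available_gb_per_hour = link_gb_per_hour * (1 - busy_fraction)
    return total_gb / available_gb_per_hour

# Hypothetical figures: four LUNs and a path that sustains 1,000 GB per hour.
lun_sizes_gb = [500, 500, 1000, 2000]
total_gb = sum(lun_sizes_gb)
link_gb_per_hour = 1000

# Typical load leaves 75% of the link free; a backup window leaves only 25%.
average_case = migration_hours(total_gb, link_gb_per_hour, busy_fraction=0.25)
worst_case = migration_hours(total_gb, link_gb_per_hour, busy_fraction=0.75)

print(f"average case: {average_case:.1f} hours")
print(f"worst case:   {worst_case:.1f} hours")
```

With these made-up numbers, a job that looks like an evening's work under typical load takes three times as long if it collides with a hot spot, which is the reason to size the migration window around the worst case.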

Migration vs. replication
While "migration" and "replication" are often used interchangeably, their textbook definitions--and the tools required to perform them--are quite different.

Migration means moving data from one platform to another, without leaving the original data in place. It's used when upgrading hardware, moving to a new site, creating a test database, or moving a virtual machine to a new physical server with more processing or network resources.

Replication means creating a second set of data and synchronizing any changes made between the original and the copy so that either set can be used at any time. It's often used for backup and recovery, for continuous data protection (CDP) or in high-availability architectures.

Users may only need their replication tools to support a single vendor's storage arrays. But multivendor support is usually more important for migration tools because the data is often being moved to a different vendor's storage platform.

  3. Downtime isn't so bad.

Some vendors claim they can migrate data without causing any downtime for applications. But some observers, such as Gary Fox, director of national services, data center and storage solutions at Dimension Data, recommend building in some downtime because it's tricky to migrate data and ensure its consistency while doing a migration during regular production hours. If possible, he suggests, do migrations during non-business hours "so you're not under so much pressure" in case something goes wrong.

"I'm kind of old school in this regard," he adds.

Managing migration in virtual environments
Server virtualization can make data migrations more of a challenge, with the prospect of migrating virtual machines (VMs) among physical servers, as well as migrating the data the VMs use. The system images, and the data needed by applications, may also have to be converted into new formats for use in the virtual environment.

Both virtualization vendors and third parties provide tools to perform such functions. VMware Inc.'s Virtual Machine File System (VMFS), for example, gives multiple VMs shared access to pools of clustered storage and, says the company, provides the foundation for live migration of virtual machines and virtual disk files.

VMware's VMotion also allows customers to perform live migrations of multiple VMs among physical servers with no downtime. It also provides management capabilities such as the ability to prioritize migrations to ensure that the most important VMs have the computing and network resources they need. VMware's Storage VMotion allows customers to migrate the data used by VMs among arrays with no downtime, but requires more manual work than is now required with VMotion, says Jon Bock, senior manager of product marketing at VMware. Either VMware, or storage vendors writing to VMware's API, will provide more automation tools for Storage VMotion in the future, he says.

Microsoft Corp. doesn't offer live migration of virtual machines in the initial release of its Hyper-V server virtualization technology, but says that capability will be included in the next release, which isn't expected until next year.

VMware recently announced vStorage, which includes new APIs designed to enable storage vendors to give their storage management tools better visibility into and integration with the VMware virtual environment, and to provide better visibility through the VMware vCenter Server management interface into how VMs are using storage.

VMware recently unveiled updates for Storage VMotion such as the ability to migrate volumes from thick- to thin-provisioned devices, to migrate data from Raw Device Mapping volumes to Virtual Machine Disk Format (VMDK) volumes, and to better integrate storage management into the VMware vCenter Server management interface.

The latest release of Symantec Corp.'s Backup Exec provides support for heterogeneous migration and replication of data in both VMware ESX and Microsoft's Hyper-V environments, says the company. Other third-party offerings include DataCore Software Corp.'s "Transporter" option for new licenses of its SANmelody and SANsymphony software that allows administrators to migrate disk images and workloads among different operating systems, VMs and storage subsystems.

Among other updates to its backup software, Hewlett-Packard Co. recently extended the Zero Downtime Backup and Instant Recovery features of its Data Protector software for VMware VMs, giving customers "zero impact backup of mission critical application data residing on virtual machines," according to the company.

  4. Watch for security leaks.

When migrating data among arrays from various vendors, permissions and security settings can be left behind, making the data vulnerable to theft, corruption or misuse. Even moving data among file systems--say, from NTFS to NFS--can result in a loss of permission and security settings, says GlassHouse Technologies' Nadkarni. "If you're moving ... from Windows to Unix or Unix to Windows, you have to be very, very cautious because more often than not the user permissions are completely destroyed," he says.

The easiest way to avoid security issues is to do a block-level rather than a file-level migration. That way, the migration is performed at "a level below the file system, so the host doesn't even see the difference" in the data, says Nadkarni.

It's possible to maintain security settings in a file-based migration, he notes, if the source and target systems lie within the same authentication or authorization domain in a service such as Microsoft's Active Directory. Some file-based migration tools also have the intelligence required to maintain such security settings, he notes.

Digging into the details of how a file copy utility works is important, says StorageIO's Schulz. "What does it copy? How does it copy? Does it simply copy the file, or copy the file as well as all other attributes, metadata and associated information? Those could be the real gotchas if you haven't brought along all of the extra permissions and access information. Dig into the documentation, talk to the vendor or service provider, and understand what type of data is being moved, and how it is to be moved."
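Schulz's questions can be tested directly on a scratch file before any production data moves. The sketch below uses two functions from Python's standard library as stand-ins for "a copy utility": `shutil.copyfile` copies contents only, while `shutil.copy2` also tries to carry over permission bits and timestamps. The `metadata_differences` helper and the file names are hypothetical, and the example assumes a POSIX system. Note that even `shutil.copy2` does not preserve file ownership or ACLs, so a real check would need to compare those as well.

```python
import os
import shutil
import stat
import tempfile

def metadata_differences(source: str, target: str) -> list:
    """List which of the checked attributes differ between two files."""
    src, dst = os.stat(source), os.stat(target)
    differences = []
    if stat.S_IMODE(src.st_mode) != stat.S_IMODE(dst.st_mode):
        differences.append("mode")
    if int(src.st_mtime) != int(dst.st_mtime):
        differences.append("mtime")
    return differences

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "report.txt")
with open(source, "w") as handle:
    handle.write("quarterly numbers\n")
os.chmod(source, 0o700)                            # owner-only access
os.utime(source, (1_000_000_000, 1_000_000_000))   # an old timestamp

contents_only = os.path.join(workdir, "contents_only.txt")
with_metadata = os.path.join(workdir, "with_metadata.txt")
shutil.copyfile(source, contents_only)   # file data only
shutil.copy2(source, with_metadata)      # file data plus mode and timestamps

lost_by_copyfile = metadata_differences(source, contents_only)
lost_by_copy2 = metadata_differences(source, with_metadata)
shutil.rmtree(workdir)

print("copyfile lost:", lost_by_copyfile)
print("copy2 lost:   ", lost_by_copy2)
```

Both copies hold identical data, yet only one kept the restrictive permissions, which is the kind of silent difference that should be caught on a test file rather than after cutover.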

  5. Virtualize carefully.

Host-based storage virtualization, which is available from a number of vendors, is a fairly reliable way to accomplish cross-vendor migrations. Future Electronics' Falsafi says the host-based virtualization provided by the FalconStor software made the actual migration painless. "We zoned the XP with a Fibre Channel switch so [it] came up as another set of hard disks to the IPStor. We created a mirrored LUN on the HP StorageWorks XP24000 array and did synchronization. Once the primary array and the backup LUNs were synchronized ... all we did was flip the switch from the primary to the backup, and the backup became the primary," he says.

But not all virtualization is created alike. Some virtualization appliances can add to the work administrators have to do, or cause application outages while administrators update drivers or the volume managers used to manage the storage, says Nadkarni. For example, he says, a virtualization appliance can cause problems by changing the SCSI Inquiry String used to identify a specific array. If the appliance changes the inquiry string, the volume manager used to manage the storage must be reconfigured to recognize the new string, he says, or applications that depend on that volume may not run properly. Storage admins should ask virtualization vendors whether their products are "completely transparent," says Nadkarni, or whether their installation will require changes to servers or other components that could cause application outages.

Nadkarni also suggests staying away from virtualization appliances that require an array or entire storage network to be taken out of service to virtualize (or unvirtualize) storage resources. Some appliances "may require you to take an outage to reconfigure your network or to take an outage on the entire storage array, to insert the appliance," he says. They can also require the administrator "to change things on the host" such as drivers, multipathing software or volume managers.

  6. Thin provisioning.

Thin provisioning helps preserve storage space by only taking up space on a disk when data is actually written to it, not when the volume is first set aside for use by an application or user. This eliminates waste when the application or user doesn't wind up needing the disk space. However, many data migration tools write "from block zero through to the very last block" of a volume on the target system regardless of which blocks are actually being used, nullifying the benefits of the thin provisioning a user had applied on the source array, says Sean Derrington, director of storage management and high availability at Symantec Corp.

File-system utilities or host-based volume managers "that are intelligent enough to figure out if the block is being accessed or not" before deciding to write to it can help circumvent this problem, says GlassHouse Technologies' Nadkarni. Block-level migration techniques that are good for preserving the security around data aren't good for preserving thin provisioning, he says, "because they write to the entire volume."
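The idea behind a thin-aware copy can be shown in a few lines. This sketch is an illustration of the general technique, not how any vendor's tool works: it treats a block that contains only zeros as unused and skips over it instead of writing it. The block size and the `thin_aware_copy` helper are assumptions, temporary files stand in for volumes, and whether skipped regions actually consume no space depends on the target file system or array.

```python
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size

def thin_aware_copy(source, target, block_size: int = BLOCK_SIZE) -> int:
    """Copy source to target, skipping all-zero blocks. Returns blocks written."""
    zero_block = bytes(block_size)
    written = 0
    offset = 0
    while True:
        block = source.read(block_size)
        if not block:
            break
        if block != zero_block[:len(block)]:
            target.seek(offset)      # jump over any skipped blocks
            target.write(block)
            written += 1
        offset += len(block)
    target.truncate(offset)          # keep the logical size even if the tail was empty
    return written

with tempfile.TemporaryFile() as source, tempfile.TemporaryFile() as target:
    # A hypothetical 8-block volume with data in only blocks 0 and 5.
    source.truncate(8 * BLOCK_SIZE)
    source.seek(0)
    source.write(b"boot")
    source.seek(5 * BLOCK_SIZE)
    source.write(b"data")
    source.seek(0)

    blocks_written = thin_aware_copy(source, target)

    source.seek(0)
    target.seek(0)
    identical = source.read() == target.read()

print(f"blocks written: {blocks_written} of 8, contents identical: {identical}")
```

A tool that writes "from block zero through to the very last block" would have written all eight blocks here; the thin-aware version produces the same logical contents while touching only the blocks that hold data.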

Migration toolkit
Migration can be done on the host or on the network, at either the block or file level, or on the array itself at the block level. Users can choose from hundreds of tools ranging from simple utilities supplied with storage arrays (most useful for migrating data among the same vendor's arrays) to open-source software or complex suites that could cost thousands of dollars.

Host-based software tools are often effective at migrating data without downtime. Some support only Windows file systems, while others support multiple operating systems at either the file or block level. Among the host-based file-level tools is the open-source rsync, which synchronizes files across Unix systems. Many operating systems already include host-based, block-level migration tools. Among the network-based, file-level migration tools are virtualization appliances such as EMC Corp.'s Rainfinity. Network-based, block-level migration tools include Brocade's Data Migration Manager, an application that runs on Brocade's DCX Backbone high-end switch and can migrate as many as 128 LUNs in parallel at speeds of up to 5TB per hour, according to the vendor.

Among the relatively few players in the array-based block-level migration tools is Hitachi Data Systems' Universal Replicator software, which can migrate data among Hitachi arrays and those from other vendors.

Many vendors use file systems to mask the complexity of moving data among multiple platforms. Among them is Ibrix Inc.'s Ibrix Fusion FileMigrator, which adds data tiering capabilities to its Ibrix Fusion 4.2 file system. FileMigrator, says the company, allows IT administrators to set policies and move data according to usage patterns. FileMigrator "addresses a huge pain point" by performing data migration "as a background process under the covers based on policies," says Terri McClure, an analyst at Enterprise Strategy Group, Milford, MA.

  7. The devil is in the (software) details.

Something as simple as different patch levels applied to software in the old and new environments can cause server crashes after a migration. Nadkarni says migrating among storage arrays also requires uninstalling the previous vendor's software from servers and installing the new vendor's. Not only does this require time, but it could cause instability if components left behind by the incomplete uninstall of older software conflict with other applications.
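A simple inventory comparison can surface both kinds of problem before they cause a crash. In this sketch the component names, version strings and the `patch_mismatches` helper are all hypothetical; in practice the inventories would be exported from whatever package or configuration management tools the servers already use.

```python
def patch_mismatches(source_env: dict, target_env: dict) -> dict:
    """Map each component whose version differs to its (source, target) versions.

    A value of None means the component is absent from that environment.
    """
    mismatches = {}
    for component in sorted(set(source_env) | set(target_env)):
        old, new = source_env.get(component), target_env.get(component)
        if old != new:
            mismatches[component] = (old, new)
    return mismatches

# Hypothetical software inventories for one server before and after the move.
source_env = {"hba_driver": "4.2.1", "multipath": "5.0", "old_array_agent": "2.3"}
target_env = {"hba_driver": "4.2.3", "multipath": "5.0", "new_array_agent": "1.0",
              "old_array_agent": "2.3"}  # leftover from an incomplete uninstall

differences = patch_mismatches(source_env, target_env)
leftovers = [name for name in target_env if name.startswith("old_")]

print(differences)
print(leftovers)
```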

  8. Build in enough learning time.

If there's a common theme to these tips, it's that storage migration is complex and full of "gotchas" that can compromise application uptime, reliability or security. "The key to a successful data migration is not having any unknowns in your environment," says Nadkarni. "The more unknowns," he adds, "the bigger the risk." Storage administrators often underestimate the time required to learn their new storage environment and what it takes to migrate data to it successfully.

Besides the technical challenges involved in each data migration, it's also important to clearly understand the business objectives for the migration, says Terri McClure, an analyst at Enterprise Strategy Group in Milford, MA. For example, what's the ROI of the data migration? Is the aim to migrate seldom-used data to less expensive media to reduce disk and power costs, to decrease the data's RTO or both? If so, it may be possible to create automated storage policies to avoid an endless round of manual migrations, she says.
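At its simplest, an automated storage policy of the kind McClure describes is a rule that picks out data by usage. The sketch below is a bare-bones illustration under several assumptions: it uses a file's last-modified time as a stand-in for "seldom used" (access times are often unreliable because many systems update them lazily or not at all), the 180-day threshold is arbitrary, and the `tiering_candidates` helper only lists files rather than moving them.

```python
import os
import tempfile
import time
from typing import List, Optional

SECONDS_PER_DAY = 86_400

def tiering_candidates(root: str, min_idle_days: float,
                       now: Optional[float] = None) -> List[str]:
    """Return paths under root that have not been modified for min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    candidates = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            if os.stat(path).st_mtime < cutoff:
                candidates.append(path)
    return sorted(candidates)

# Demonstration on a scratch directory with one fresh file and one stale file.
with tempfile.TemporaryDirectory() as root:
    fresh = os.path.join(root, "current.log")
    stale = os.path.join(root, "archive_2007.log")
    for path in (fresh, stale):
        with open(path, "w") as handle:
            handle.write("example\n")
    a_year_ago = time.time() - 365 * SECONDS_PER_DAY
    os.utime(stale, (a_year_ago, a_year_ago))

    idle_files = [os.path.basename(p)
                  for p in tiering_candidates(root, min_idle_days=180)]

print(idle_files)
```

Run on a schedule and paired with a mover, a rule like this replaces a recurring manual migration with a background process, provided the business objective behind the threshold has been agreed on first.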

"To do anything successfully and seamlessly you have to do a lot of preparation, thorough preparation," says Future Electronics' Falsafi. "That means analysis, data gathering, trend analysis. For me, it's very vital you get this information and know exactly how your systems behave before you do anything. The cost of an unsuccessful data migration--interrupted business operations, and a loss of revenue and credibility--far outweighs the additional amount of time it may take to thoroughly understand your source or target environments."
