Regardless of what type of enterprise data storage media you house your data in, data migration can be a complicated process. According to Bloor Research, an independent IT research and analyst firm in Europe, more than half of all data migration projects experience cost overruns or schedule delays. This occurs despite companies spending approximately $5 billion a year on data migration.
I know of one midsized Boston-area non-profit organization (with a handful of offices spread across the country) that made several data migration mistakes as it attempted to move its library to a new, comprehensive, open-source library management system. The organization wanted to leave behind an old, inflexible proprietary system and was assured that data migration was straightforward, even easy.
Data migration, the librarians discovered, is never as easy as the IT experts promise. It turns out they made common mistakes, starting with not planning for enough time to do the data migration in the first place. Instead, they rushed to move data, which led them to make careless mistakes, such as using the wrong delimiter to separate data elements. This was easy to rectify, but each mistake added a couple of days to the migration project.
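A delimiter mistake of this kind can often be caught before any data is loaded. The following is a minimal sketch, not the librarians' actual procedure: it assumes the export is delimited text with a known number of fields per record, and the function name, sample records and pipe delimiter are all hypothetical.

```python
import csv
import io


def rows_with_wrong_field_count(text, delimiter, expected_fields):
    """Return 1-based row numbers whose field count differs from expected_fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [n for n, row in enumerate(reader, start=1) if len(row) != expected_fields]


# Hypothetical export: three fields per record, pipe-delimited.
# Note the commas inside the author field.
sample = "1001|Moby-Dick|Melville, Herman\n1002|Emma|Austen, Jane\n"

print(rows_with_wrong_field_count(sample, "|", 3))  # []
print(rows_with_wrong_field_count(sample, ",", 3))  # [1, 2]
```

Parsing with the wrong delimiter flags every row here, because the commas inside the author field split each record into two fields instead of three.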
Other mistakes were more serious. For example, they failed to fully think through the different ways they could use their data in the new system. They also wanted to cull duplicate records before the migration but ran out of time before completing the task. This caused them to migrate some duplicate data.
However, the organization did do a few things correctly. After the first batch of data was migrated, for example, they tested it before doing any more.
"Most IT managers consider data migration a routine chore and give it scant attention," said Mark Teter, chief technology officer at systems integrator Advanced Systems Group Inc. An organization can simply copy the files from one volume to another or use data replication tools like EMC Corp.'s Symmetrix Remote Data Facility (SRDF), IBM Softek's Transparent Data Migration Facility (TDMF) or even a host's native volume manager. And there are a variety of extract-transform-load (ETL) tools that can help simplify the data migration process.
Why data migrations fail
But, Teter noted, data migrations can fail for a number of reasons:
- The copy process fails.
- The server crashes.
- The target storage device crashes or becomes unreachable.
- A minor data center issue occurs (array failure).
- A major data center issue occurs (complete systems failure).
- The data is bad from the start or gets corrupted during migration.
Teter's biggest worry with data migration is the risk to data integrity. "Organizations have to strive to ensure data integrity throughout the entire migration. Otherwise, the organization cannot be confident the data is current, accurate or complete," he said. In the worst cases, the organization can't even identify with certainty the last valid copy of its data.
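One common way to check integrity after a file-level copy is to compare checksums of the source and target. The sketch below is an illustration under stated assumptions, not a method Teter describes: it assumes both copies are reachable as directory trees, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, target_dir):
    """Return relative paths that are missing on the target or differ from the source."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list from `find_mismatches` means every source file has a byte-identical counterpart on the target. The check does not report extra files that exist only on the target.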
Data migration: Five best practices
To protect data during the migration, you should follow these five best practices:
- Understand, select and locate data to migrate at the outset. Know what data you are migrating, where it resides, what form it's in and the form it will need to take when it arrives at its destination.
- Extract, clean, transform and deduplicate the data. All data has problems. Use data migration as an opportunity to clean up the data.
- Move the data in a systematic way by enforcing data migration policies. For example, restrict data migrations to overnight hours, when network usage is low and other traffic won't interfere with your project.
- Test and validate. Test the migrated data to ensure it's accurate and in the expected format. Without testing and validating the migrated data, you can't be confident in its integrity.
- Audit and document the process. Regulatory compliance requires you to document each stage of the data migration process and to preserve a clear audit trail of who did what to which data and when.
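The deduplicate, validate and audit steps above can be sketched in a few lines. This is only an illustration: the record layout, the rule that title plus author identifies a duplicate, and every function name are assumptions made for the example, not part of any particular migration tool.

```python
import json
from datetime import datetime, timezone


def normalize_key(record):
    # Assumption: two records with the same title and author are duplicates.
    return (record["title"].strip().lower(), record["author"].strip().lower())


def deduplicate(records):
    """Keep the first record for each key; return (unique, dropped)."""
    seen = set()
    unique, dropped = [], []
    for record in records:
        key = normalize_key(record)
        if key in seen:
            dropped.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, dropped


def validate(migrated, expected_count, required_fields):
    """Return a list of problems; an empty list means the batch passed."""
    problems = []
    if len(migrated) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(migrated)}")
    for i, record in enumerate(migrated):
        for field in required_fields:
            if not str(record.get(field, "")).strip():
                problems.append(f"record {i}: missing {field}")
    return problems


def audit_entry(actor, action, detail):
    """Build one JSON audit line recording who did what, and when (UTC)."""
    return json.dumps({
        "when": datetime.now(timezone.utc).isoformat(),
        "who": actor,
        "action": action,
        "detail": detail,
    })


# Hypothetical batch of library records.
records = [
    {"id": "1001", "title": "Emma", "author": "Austen, Jane"},
    {"id": "1002", "title": " emma ", "author": "AUSTEN, JANE"},
    {"id": "1003", "title": "Moby-Dick", "author": "Melville, Herman"},
]
unique, dropped = deduplicate(records)
problems = validate(unique, expected_count=2, required_fields=["id", "title", "author"])
print(len(unique), len(dropped), problems)  # 2 1 []
print(audit_entry("migration-script", "deduplicate", f"dropped {len(dropped)} duplicate record(s)"))
```

Running the checks on each batch, rather than once at the end, matches what the library did when it tested its first batch before moving more data.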
So, how well did the non-profit library do following Teter's steps? As for the first step, they knew where the data was but failed to think through how they would want it when it arrived at its new destination. They also failed to clean their data completely, which only became apparent when they tested the data. Because they didn't complete the first two steps correctly, they were set up to fail.
Luckily, a solid testing procedure helped them recognize trouble early. None of the mistakes proved fatal; they only slowed down the overall migration and increased the cost.
BIO: Alan Radding is a frequent contributor to SearchStorage.com.