Big files create big backup issues
This article can also be found in the Premium Editorial Download "Storage magazine: How to plan for a disaster before a software upgrade."
Consistency and timeliness
Some backup systems handle large data sets better than others, but all will have difficulty when faced with a single file system that holds millions of files or hundreds of gigabytes of data. Point-in-time consistency across the files in a large file system isn't always required, but it can be critical: if a backup takes eight hours while the application keeps running, files the application changes during that window may end up inconsistent with files already copied to the backup. The prime solution to the consistency problem is to cheat the constraints of time by creating a snapshot copy of the data to be backed up. Leveraging technology included in many storage arrays and OSes, a snapshot-based backup can freeze the data set at a point in time and copy it to tape at its leisure. This ensures that the entire set of files to be backed up is consistent as of a single moment. But snapshot technology isn't a native component of a backup application; the particular type used must be supported by the backup system, or custom scripting is required.
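To make the idea concrete, here is a minimal toy model of a copy-on-write snapshot, one common way to freeze a data set. It is only a sketch under stated assumptions: the class CowVolume and the function backup_from_snapshot are hypothetical names invented for illustration, and real arrays and OSes implement snapshots in their own ways.

```python
class CowVolume:
    """Toy block volume with a single copy-on-write snapshot (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one entry per block
        self.preserved = None       # block index -> content at snapshot time

    def create_snapshot(self):
        # Taking the snapshot copies nothing; it only starts tracking changes.
        self.preserved = {}

    def write(self, index, data):
        # Before the first overwrite of a block, save its snapshot-time content.
        if self.preserved is not None and index not in self.preserved:
            self.preserved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the preserved copy, unchanged ones from live data.
        if self.preserved is None:
            raise RuntimeError("no snapshot exists")
        return self.preserved.get(index, self.blocks[index])

    def delete_snapshot(self):
        self.preserved = None


def backup_from_snapshot(volume, writes_during_backup):
    """Copy every block from the snapshot while the application keeps writing.

    writes_during_backup maps a backup position to a list of (index, data)
    writes that the application performs just before that block is copied.
    """
    volume.create_snapshot()
    copied = []
    for index in range(len(volume.blocks)):
        for target, data in writes_during_backup.get(index, []):
            volume.write(target, data)  # application activity mid-backup
        copied.append(volume.read_snapshot(index))
    volume.delete_snapshot()
    return copied


if __name__ == "__main__":
    volume = CowVolume(["a0", "b0", "c0"])
    # The application rewrites blocks 0 and 2 while block 1 is being copied.
    backup = backup_from_snapshot(volume, {1: [(0, "a1"), (2, "c1")]})
    print(backup)         # every block as of the snapshot moment
    print(volume.blocks)  # live data, including the mid-backup writes
```

Without the snapshot, the same run would have copied the old version of block 0 and the new version of block 2, which is exactly the mixed state the backup is trying to avoid.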
The problem with big backups
No matter how you slice it, backing up big file systems is a problem:
Backup applications need a few moments to examine each file and determine whether it should be backed up, and another moment to store a record of each backup in the database. Multiply these moments by a few million, and they add up quickly.
Massive files generally can't be backed up in parallel, and traditional backup approaches copy them in their entirety even if just a few bytes have changed.
Even if you can wait for the backup to complete, the backup copy might not be consistent with the latest copy of the file.
Data is backed up so that it can be restored, but many methods for speeding backups make recovery time unacceptably long.
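The per-file overhead described in the first item can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the function name is hypothetical, and the file count and the 10 milliseconds per file are assumed values chosen for illustration, not measurements from any backup product.

```python
def per_file_overhead_hours(file_count, seconds_per_file):
    """Hours spent purely on per-file work (examining files, writing catalog records)."""
    return file_count * seconds_per_file / 3600.0


if __name__ == "__main__":
    # Assumed inputs: 5 million files at 10 ms of bookkeeping each.
    hours = per_file_overhead_hours(5_000_000, 0.010)
    print(f"{hours:.1f} hours of overhead before any data is moved")
```

Under these assumptions the bookkeeping alone takes well over half a day, regardless of how fast the tape drive or network is.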
A massive number of files
Sheer numbers can overwhelm any backup product (see "The problem with big backups," above). Sean O'Mahoney, manager of client/server information systems at Norton Healthcare in Louisville, KY, saw his Meditech Electronic Medical Record (EMR) file server grow to contain more than 25 million files in 1.3 million directories. "It took almost five hours just for Windows to count the files," explains O'Mahoney, "but we have trimmed the backup time for this half-terabyte LUN to around three hours." The fix was a simple one: Ignore the files and dump raw disk blocks to tape. Although it lacks an index of files, this solution fits fine because all of those files are part of a single massive app.
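A block-level dump of this kind can be sketched in a few lines. This is a minimal illustration, not the tool Norton Healthcare used (the article doesn't name it): the function name, the paths and the block size are all assumptions, and on a real system the source would be a raw device or LUN that requires administrative privileges to read, ideally from a quiesced or snapshotted volume.

```python
def dump_raw_blocks(source_path, target_path, block_size=1024 * 1024):
    """Copy a source to a target in fixed-size blocks, ignoring any file structure.

    No per-file examination or catalog entry is involved: the work scales with
    the amount of data, not with the number of files stored in it.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            block = source.read(block_size)
            if not block:  # an empty read means the end of the source
                break
            target.write(block)
            copied += len(block)
    return copied
```

The trade-off is the one noted above: because the copy never looks at individual files, there is no file index to restore from, so the image is generally restored as a whole.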
This was first published in May 2008