Five things that mess up your backups

Data backups are still job No. 1--and problem No. 1--for most storage managers. In this article, backup guru W. Curtis Preston describes the five most prevalent backup system problems and explains what you can do to prevent or remedy them.

This article can also be found in the Premium Editorial Download: Storage magazine: Salaries rise, but storage jobs get tougher.

Backup is still the greatest pain point for storage managers. The following five vexing backup problems can become less onerous if you use these simple procedures to improve your backup performance and reliability.

  1. Unhappy tape drives
Unhappy tape drives cause more backup and restore issues than any other problem. The most common thing to fail in a backup environment is a tape or tape drive, and tape errors frequently masquerade as other problems. (For example, one backup software product often reports a drive failure as a network timeout.) And because most environments achieve less than half of the available throughput of their drives, corporate IT buys more and more drives to meet the throughput demands of the backup system.

Modern tape drives are designed to operate at their advertised speeds, and operating them at lower speeds is what causes them to fail more often; there's a minimum speed at which the tape must move past the head to achieve a good signal-to-noise ratio. Even variable speed tape drives have a minimum speed at which they can write data. LTO-4, for example, has a minimum native transfer rate of 23MB/sec. And while few users experience the 2:1 compression ratio advertised by drive manufacturers, whatever compression rate they do experience must be multiplied by the minimum transfer rate of the drive. For example, data that experiences a 1.5:1 compression ratio being sent to a tape drive with a minimum speed of 23MB/sec makes that drive's minimum transfer rate 34.5MB/sec (23 x 1.5).
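The arithmetic above is worth keeping handy when sizing backup streams. Here's a minimal sketch of the calculation, using the LTO-4 figures from the text (the drive model and numbers are just the example given, not a general spec):

```python
def effective_min_rate(native_min_mb_s: float, compression_ratio: float) -> float:
    """Minimum data rate the backup source must sustain to keep a tape
    drive streaming: the drive's native minimum speed multiplied by the
    compression ratio actually observed on the data."""
    return native_min_mb_s * compression_ratio

# LTO-4 example from the text: 23 MB/sec native minimum, 1.5:1 compression.
rate = effective_min_rate(23, 1.5)
print(f"Feed the drive at least {rate} MB/sec")  # 34.5
```

If your backup streams can't sustain that rate, the drive shoe-shines and fails sooner, which is exactly the failure mode described above.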

Depending on which backup software you use, the first solution is to increase the speed of backups that go directly to tape: use LAN-free backups, use multiplexing, and don't bring additional tape drives into play until the drives already in use are receiving enough throughput. The second (and simpler) solution is to stop using tape as your primary target for backups and instead back up directly to disk. Using disk as an intermediary staging device usually gets the initial backup done much faster, and the subsequent local (LAN-free) movement of data from disk to tape can then go much faster. These methods will keep the tape drives much happier: they'll fail less often, and you can reduce the number of tape drives you need to buy to get the job done.

Virtual server backup tips
There are a lot of questions buzzing around VMware backups, but there aren't a lot of problems. Most people can back up their virtual machines (VMs) as if they were physical machines, and everything works just fine. Most major backup packages have changed their pricing so that you only pay for one license for the VMware server, regardless of how many guests you're backing up.

The big challenge some storage environments face is resource contention, especially if they're doing a lot of full backups. The first thing you can do to solve this problem is to better stagger the full and differential backups across the week and month to minimize the number of backups that could occur at any one time. You should also check out the ability of your backup software to limit the number of concurrent backups on the VMware host. Finally, you should investigate your backup software's ability to do incremental forever inside the VM using features like Synthetic Full Backups from CommVault, Saveset Consolidation from EMC Corp.'s NetWorker, Progressive Incrementals from IBM Corp.'s Tivoli Storage Manager and Synthetic Backups from Symantec Corp.'s Veritas NetBackup.
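The staggering idea above can be sketched in a few lines: assign each guest's weekly full to a weekday round-robin so no single night carries all the full-backup load. The host names are hypothetical, and real backup software would apply its own load-based placement:

```python
from itertools import cycle

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def stagger_fulls(hosts):
    """Assign each host's weekly full backup to a weekday, round-robin,
    so the fulls are spread evenly across the week."""
    schedule = {}
    day = cycle(DAYS)
    for host in sorted(hosts):
        schedule[host] = next(day)
    return schedule

# Hypothetical VM guests: 14 guests land as exactly 2 fulls per night.
print(stagger_fulls([f"vm{i:02d}" for i in range(1, 15)]))
```

The same round-robin idea extends to monthly fulls by cycling over days of the month instead of weekdays.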

If, after using these techniques, you still have resource-contention issues inside the virtual server when you're backing up its guests, you should consider more advanced methods such as VMware Consolidated Backup (VCB), esXpress from PHD Technologies Inc., esxRanger from Vizioncore Inc. or using a snapshot-based filer that's VMware-aware.

  2. Missing data
The second big problem in today's backup systems is the data that backups miss. This isn't data you tried to back up and failed to; it's data the backup system never attempted to back up because it simply wasn't told to. Missed backups don't generate error messages, but they can (at some point) cause an RPE--a resume processing event. If this problem isn't addressed, you can be sure that someday someone will ask you to restore something that has never been backed up.

Consider the following two real-life stories: One day, a backup administrator was asked to restore a database on server hpdbsvk. According to the firm's naming convention, this meant HP-UX database server "k." The backup administrator also knew that because servers were named in alphabetical order, there were database servers hpdbsva through hpdbsvk--but he was only backing up hpdbsva through hpdbsvj. Immediately, he knew he had a problem: hpdbsvk had never been backed up. Yet while the data was never restored, the administrator didn't lose his job and didn't even get in trouble. How is that possible?

Real-life story No. 2: One day an administrator was asked to restore some code sitting in /tmp on an HP-UX system. The file system had disappeared upon reboot because it was a RAM file system. The customer requesting the data was furious when he found out that the backup system didn't back up /tmp. Again, the administrator didn't lose their job or get in trouble. Why not?

In both cases, the reason the backup administrator didn't lose their job was the same: documentation. Back in the days before the Web, the backup system in question used a paper-based request form users had to fill out if they wanted a system backed up. The form included a line that read "Do not consider this request accepted until you receive a copy of it in your in-box signed by someone on the backup team."

In the case of the customer who requested a restore from hpdbsvk and started fuming because it wasn't being backed up, the backup administrator asked to see the form with his signature on it. The customer didn't have the form, so the issue became what I like to call a "YP not MP"--Your Problem, not My Problem--as far as the backup administrator was concerned. As for the /tmp situation, it was excluded from backups, and the exclusion had been approved by upper management and well-advertised. (After all, the "T" in tmp stands for temporary, so why would you back up temporary things?)

Applying the paper backup request system to today's Web-based world is simple. Create a backup system request Web page that notifies the user who requested the backup that the backup is being performed. If you're using a data protection management tool, the user who requests the backup can even be notified every time the backup succeeds or fails. How's that for customer service? The Web page should also list standard backup configurations, including things like what gets backed up (or not backed up) by default.
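The core of that Web-based workflow is simply tracking each request and its sign-off, so "do we back this up?" always has an auditable answer. Here's a minimal sketch (the class, field names and sample values are all illustrative, not any product's API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BackupRequest:
    """A backup request is not considered accepted until someone on the
    backup team has signed off -- the Web equivalent of the signed paper
    form described above."""
    requester: str
    server: str
    approved_by: Optional[str] = None

    @property
    def accepted(self) -> bool:
        return self.approved_by is not None

req = BackupRequest(requester="jdoe", server="hpdbsvk")
print(req.accepted)   # False: no sign-off yet, so no backup obligation
req.approved_by = "backup-team/alice"
print(req.accepted)   # True
```

When someone insists a server should have been backed up, the absence of an accepted request makes it a "YP not MP" conversation, exactly as in the stories above.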

It's also important to use your backup software's ability to automatically discover and back up all file systems or databases on a given machine. If your backup software has this feature, use it; don't attempt to manually list all file systems. You're just asking for trouble--and an RPE--when you discover that you forgot to add the F: drive on a particular server. If your backup app doesn't have this feature, get a new one.
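To illustrate what such auto-discovery does under the hood on a Unix-like system, here's a hedged sketch that parses /proc/mounts-style text and skips pseudo and memory-backed file systems (recall the /tmp RAM-disk story above). The sample mount table and exclusion list are assumptions for the example; real backup software maintains much richer exclusion logic:

```python
def discover_filesystems(mount_table: str,
                         exclude_types=("tmpfs", "proc", "sysfs", "devtmpfs")):
    """Parse /proc/mounts-style text and return the mount points worth
    backing up, skipping pseudo and RAM-backed file systems."""
    mounts = []
    for line in mount_table.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        device, mountpoint, fstype = parts[:3]
        if fstype not in exclude_types:
            mounts.append(mountpoint)
    return mounts

sample = """\
/dev/sda1 / ext4 rw 0 0
tmpfs /tmp tmpfs rw 0 0
/dev/sdb1 /data ext4 rw 0 0
"""
print(discover_filesystems(sample))  # ['/', '/data']
```

The point isn't the parsing; it's that discovery runs every night, so a newly added /data file system is picked up without anyone remembering to edit a list.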

  3. Unnoticed trends
Backup administrators spend most of their time looking at last night's backups. They want to know what failed last night and the escalation procedure for that server. Can they rerun the backup? If so, what do they do if the rerun backup continues into prime business hours? Must they notify someone?

As a result, backup administrators often don't notice if a given server, file system or database doesn't successfully back up for multiple days. Some environments where I've performed backup assessments have had servers that have gone several days--even as much as a month--without a successful full or incremental backup; and the larger the environment, the greater the problem. At one customer's site where they back up 10,000 systems, more than 1,000 systems went four days or more without a successful backup of any kind.

Servers that go several days without a backup are obviously at greater risk than others. If a backup administrator was aware of such a trend, they might do a number of things, such as cancel less important backups so that the server that hasn't backed up for several days can be given more resources. At a minimum, the storage admin may set the priorities on the backup system so that a server that hasn't backed up for several days is more important than other servers.
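The check itself is simple once you have per-server job history: find every server whose most recent successful backup of any kind is older than your tolerance. This sketch assumes a hypothetical history dictionary; in practice the data would come from your backup catalog or a data protection management tool:

```python
from datetime import date, timedelta

def stale_servers(last_success: dict, today: date, max_age_days: int = 3):
    """Return servers whose most recent successful backup (of any kind)
    is older than max_age_days -- the trend that nightly failure
    reports tend to hide."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(s for s, last in last_success.items() if last < cutoff)

# Hypothetical job history: server -> date of last successful backup.
history = {
    "web01": date(2008, 11, 10),
    "db01":  date(2008, 11, 3),   # a week without a backup
    "file1": date(2008, 10, 12),  # a month without a backup
}
print(stale_servers(history, today=date(2008, 11, 10)))  # ['db01', 'file1']
```

Run nightly, a report like this surfaces the month-stale servers described above long before a restore request does.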

Here are some examples of other trends that are important to detect:

    • Servers backing up significantly more data than they used to back up

    • Tape libraries/disk devices approaching capacity

    • Tape and disk system throughput numbers

Most backup products don't provide, in their base product, the tools necessary to see this kind of information. The solution is relatively simple, but not inexpensive: Buy a data protection management tool. There's a reason a whole industry has grown up around such tools; it's difficult to properly manage a backup system without one.
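As a taste of what such tools automate, the first trend in the list above (servers backing up significantly more data than they used to) reduces to comparing each server's latest backup size against its recent average. The threshold and sample sizes below are arbitrary illustrations:

```python
def growth_alerts(sizes_gb: dict, threshold: float = 1.5):
    """Flag servers whose latest backup is more than `threshold` times
    their recent average size -- the 'backing up much more data than
    they used to' trend."""
    alerts = []
    for server, history in sizes_gb.items():
        *older, latest = history
        avg = sum(older) / len(older)
        if latest > threshold * avg:
            alerts.append(server)
    return alerts

# Hypothetical nightly backup sizes in GB, oldest first.
sizes = {"mail01": [100, 102, 98, 250], "web01": [20, 21, 19, 22]}
print(growth_alerts(sizes))  # ['mail01']
```

The same pattern, applied to library capacity and throughput numbers, covers the other two bullets; a commercial data protection management tool simply does this continuously and across every device.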

What role does deduplication play?
Without deduplication, the use of disk in the backup system is relegated to storing only one or two nights' worth of backups in a process known as disk staging, as backups are staged to disk before they go to tape. This helps backups but doesn't help restores, as most restores will still come from tape.

Dedupe allows you to store several weeks or months of backups on the same disk that was previously storing only one or two days' worth of backup. Keeping more data on disk allows for much faster restores for all data, not just the backups made in the last few days.

Deduplication can also help you get data offsite without shipping tapes. Because the dedupe system stores only the new, unique blocks every night, backups can be replicated offsite, allowing you to have onsite and offsite backups without touching a tape.
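The "only the new, unique blocks" idea can be shown in miniature: chunk each night's backup into blocks, hash them, and store only hashes you haven't seen. This is a deliberately simplified sketch -- real dedupe systems use variable-length chunking and store the actual block data, not just hashes:

```python
import hashlib

def dedupe_ratio(backups, block_size=4096):
    """Chunk each backup image into fixed-size blocks and keep only
    unique block hashes, as a deduplicating disk target would.
    Returns (logical_blocks, unique_blocks)."""
    store = set()
    logical = 0
    for data in backups:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            store.add(hashlib.sha256(block).hexdigest())
            logical += 1
    return logical, len(store)

# Two nightly "fulls" that share most of their content: only the
# changed block consumes new space on the second night.
night1 = b"A" * 8192 + b"B" * 4096
night2 = b"A" * 8192 + b"C" * 4096
print(dedupe_ratio([night1, night2]))  # (6, 3)
```

Because only those unique blocks need to cross the wire, the same mechanism is what makes nightly offsite replication practical without tapes.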

  4. Overuse of custom scripts
Customization comes in a variety of flavors and can be a good thing. It can make your backup system do something it wasn't originally designed to do, allowing you to work around limitations. But customizing your backup process can also create extra work and make things much more complex.

Backup administrators good at shell or batch scripting can create programs that help them automate certain tasks. One customer I visited had 150 custom scripts written around their backup system. The problem with this kind of customization is that it's hard to maintain and even harder to pass on to the next backup administrator. Administrators who create too many scripts may find themselves stuck as "the backup person" because no one wants to take on and maintain all of those custom scripts.

Another way customization manifests itself is in unique backup configurations. Instead of having a standard backup configuration for everyone, some environments create custom backup configurations for each customer that requests one. For example, "For this server, we're going to back up only the F: drive and we'll do it only on Thursday nights from 3:00 am to 4:00 am." Besides making things much more complex, this kind of customization also goes against the way most backup software is designed. Backup software is designed to share resources and automatically send things to the right resource as it becomes available and as priorities dictate. Unique backup configurations drastically reduce the overall utilization of all resources by not allowing the backup software to do its job.

Overcoming this problem is relatively simple: Create standard backup configurations and stick with them. The following is an example of a standard for file-system backups:

    • All systems back up all drives

    • *Temporary Internet Files*, C:\Temp and *.mp3 files are always excluded

    • All systems receive a full once a month

    • All systems receive a differential/cumulative incremental/level 1 once a week

    • All systems receive an incremental once a day

    • Fulls and differentials will be distributed across the week/month as dictated by the system load

    • All backups occur between 6:00 pm and 6:00 am

Deviations from this standard must be justified by business reasons and approved by a business unit manager who will receive a chargeback for the extra cost involved in such customizations.
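The standard above is simple enough to express directly, which is part of its appeal: every server gets its nightly level from one shared policy rather than a custom script. Here's a hedged sketch; the specific days chosen are placeholders, since the standard says fulls and differentials are spread across the month and week by system load:

```python
from datetime import date

def backup_level(day: date, full_day_of_month: int = 1, diff_weekday: int = 5):
    """Pick tonight's backup level under the standard policy:
    monthly full, weekly differential, nightly incremental.
    diff_weekday uses Python's convention (5 = Saturday)."""
    if day.day == full_day_of_month:
        return "full"
    if day.weekday() == diff_weekday:
        return "differential"
    return "incremental"

print(backup_level(date(2008, 11, 1)))   # 'full' (1st of the month)
print(backup_level(date(2008, 11, 8)))   # 'differential' (a Saturday)
print(backup_level(date(2008, 11, 12)))  # 'incremental'
```

One function like this, parameterized per server group for load balancing, replaces an entire drawer of per-customer scheduling scripts.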

Regarding custom scripts, the best thing to do is to consult the forums and mailing lists for the backup software you're using to find out if anyone has discovered another way to meet your requirement without custom scripting. Software updates often fix such problems found in earlier versions, but people continue to use their old ways because it's what they know.

Finally, if the software you're using can't be made to do what you want it to do without all of those custom scripts, perhaps it's not the right backup software for you and another backup application would do what you need it to do out of the box. Although changing backup software packages should be considered a last resort, it may actually be the best thing in some cases.

  5. Unencrypted data
News reports of lost or stolen tapes have become more frequent, and most states now require public notification of such a loss. Where personal data is concerned, however, there's a moral obligation to keep it safe that goes beyond the risk of public exposure. By one estimate, someone steals a person's identity every 79 seconds, then opens an account in that name and goes on a buying spree. And a Gartner Group study reveals that one in 50 people has suffered from some type of identity theft. Given how common this crime is and the huge impact it has on its victims (you could be the next one), do you want it to be your backup tape that helps an identity thief?

There are two solutions to this problem. First and foremost, encrypt your backups. There are a number of ways to encrypt data, such as using backup software encryption and encryption engines built into fabric switches, tape libraries and disk drives. The second solution is to not ship tapes offsite but to use a disk-based deduplication backup system that replicates your backups offsite. If you still want to make tapes, make them at your offsite location.

In my opinion, anyone in management who refuses to fund the security of backups should be relieved of their duties, and very well could be if things go wrong. Make sure that person isn't you. If your company is shipping unencrypted backup tapes with personal information on them, you should immediately notify your superiors in writing of the seriousness of this problem and request a project to solve it. Document your request and the response, especially if it's a negative one. Continue to make yourself a pain until they solve the problem or give you another job; you don't want the job of enabling identity thieves.

In sum, while some of these solutions may be simpler than others, a lot of what you can do to make your backups better comes down to understanding the limitations of what you're using and knowing how to document and improve your backup processes. Sometimes it pays to spend money on specialized backup tools that provide a clearer view of your backup environment.

This was first published in November 2008


