This article can also be found in the Premium Editorial Download "Storage magazine: Tips for unifying storage management."
Today, the most common backup design is a tape-based system that's been enhanced with disk. But those who are willing to rethink things from scratch are examining other ways to improve their backup environment, such as replication, object-based storage, real-time protection of data, protection of data in its native format and systems that perform incremental backups forever.
Cutting the tie to tape
Many of the challenges with most backups stem from reliance on tape. Granted, tape drives are faster and more reliable than ever before, but tape is still a sequential-access medium that offers access times in seconds, instead of the millisecond access times disk delivers. Tape is also an open system easily infiltrated by contaminants, unlike disk drives that are sealed at the factory. A tape drive can reliably write millions of bytes per second at a relatively low cost. However, due to its sequential operation, tape will always be slower to access and less reliable than random-access disk.
One of the greatest advantages of tape over disk is the ease with which tapes can be sent off site, which is
For example, let's assume that the process of creating and identifying the tapes to go off site has been completely automated. Even then, the process of physically moving a tape from a library into a container, to the off-site facility and back again is very labor-intensive. While this work is routine for the most part, someone's got to do it and that labor is expensive. I've seen companies where a dozen people's sole responsibility was to manage such a backup process. It's also important to mention that at each step of this process, there is a chance for human error.
It's a fact that in many environments, backup is neither automated nor effortless. Many people spend many hours a day ensuring that their backups are complete. This effort is required for many reasons. The first is that the process of performing nightly incremental and occasional full backups requires a lot of processing power and network bandwidth, and is usually directed at a target that's not perfectly reliable (tape). Every part of the process is capable of screwing up the backup.
If everything works and all the backups are completed, they should be copied, instead of just sending the originals off site as many companies do. But most environments have spent so much time and effort making sure that the backups are completed that there's little if any energy, time or capacity left in their system to make copies. Yes, many backup software products now allow you to create both the original and the copy simultaneously. But according to an informal survey of my clients, few companies are taking advantage of this important functionality. Therefore, most companies are sending their originals off site.
This means that they must wait for a tape to come back from off site every time they do a restore. While this is acceptable for low-priority systems, it's completely unacceptable for a high-priority critical application. But this is the status quo at many companies. That is, of course, until the first time they try to do a major restore and it goes horribly wrong. This happened at a company I was talking to last month. What should have taken hours took days, and the CIO is now looking for a new job.
Even when restores are successful, it's always a bad day when you have to restore something large. Unless you've adopted some of the technologies discussed later in this article, it often means hours of downtime, and it's rare that the system is restored up to the point of failure. There's almost always a gap of time that isn't restored. That gap translates into lost work for everyone using the system and is a huge loss of money in a large company.
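To put a rough number on that gap, here is a minimal Python sketch. It is only an illustration: the function names, the timestamps, the user count and the hourly rate are all hypothetical, and it assumes every user was working for the whole window, so the result is best read as an upper bound rather than a measured cost.

```python
from datetime import datetime, timedelta


def unrecovered_gap(last_backup: datetime, failure: datetime) -> timedelta:
    """Window of work written after the last good backup and lost in the restore."""
    if failure < last_backup:
        raise ValueError("failure precedes the last backup")
    return failure - last_backup


def lost_work_cost(gap: timedelta, users: int, hourly_rate: float) -> float:
    """Upper-bound cost of redoing the work in the gap (all inputs hypothetical)."""
    gap_hours = gap.total_seconds() / 3600
    return gap_hours * users * hourly_rate


# Hypothetical scenario: the nightly backup finished at 02:00
# and the system failed at 15:30 the same day.
last_backup = datetime(2004, 2, 10, 2, 0)
failure = datetime(2004, 2, 10, 15, 30)

gap = unrecovered_gap(last_backup, failure)
cost = lost_work_cost(gap, users=500, hourly_rate=40.0)

print(f"Unrecovered gap: {gap}")
print(f"Upper-bound cost of lost work: ${cost:,.0f}")
```

With these made-up inputs the gap is 13.5 hours. The point is simply that the cost grows with both the time since the last good backup and the number of people who depend on the system.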
And consider how backups affect applications. Many companies have grown used to slow access or no access to applications during backup and recovery time. While this may have been fine in days past, with today's 24x7 global operations this is no longer acceptable. Your work force expects uninterrupted access to all applications at all times.
This was first published in February 2004