This article can also be found in the Premium Editorial Download "Storage magazine: Managing data storage for remote employees."
What a backup and restore management plan (BRMP) should contain
Staffing requirements. The BRMP should spell out staffing needs. What's required depends on several factors including backup schedules, backup windows and service level agreements with customers. Many large enterprises will require some form of coverage on a 24X7 basis. A 24X7 coverage model requires a minimum of seven to 10 people to adequately staff all shifts throughout the week, including management support.
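As a rough illustration of how a 24X7 headcount can be estimated, the sketch below divides the weekly hours to be covered by the hours one person can realistically work. The 40-hour week, the 80% availability factor (leave, training, sickness), the single seat per shift and the single manager are all assumptions made for this example, not figures from the plan described here.

```python
import math

def min_staff_for_coverage(seats_per_shift: int = 1,
                           hours_per_person_per_week: float = 40.0,
                           availability: float = 0.8,
                           managers: int = 1) -> int:
    """Estimate headcount for round-the-clock coverage.

    All default values are illustrative assumptions.
    """
    # 24 hours x 7 days = 168 hours per week for each staffed seat
    hours_to_cover = 24 * 7 * seats_per_shift
    # Hours a person actually spends on shift after leave, training, etc.
    effective_hours = hours_per_person_per_week * availability
    operators = math.ceil(hours_to_cover / effective_hours)
    return operators + managers

print(min_staff_for_coverage())
```

Under these assumptions, staffing more than one seat per shift or lowering the availability factor pushes the estimate up.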
Motivating staff isn't part of the BRMP, but it is a critical area of concern. Typically, the backup and restore function is delegated to a junior-level system or backup administrator. As part of the management plan, a training program should be implemented to ensure the people responsible for successful backups are up to speed on the latest enhancements and functions of the systems and backup software. Consider implementing a mentoring program in which senior-level personnel work closely with their less experienced co-workers to coach them and help with their career development. Another motivational initiative would be a bonus structure based on the percentage of successful backups or on the elimination of unplanned outages.
Define operational procedures. The management plan must contain the procedures for monitoring the backup infrastructure, for ensuring successful backup and recovery job completion, for complying with the change management process and for testing the restore process.
What should be monitored? All components of the backup infrastructure must be monitored so that any problems that surface can be identified and resolved quickly. These components include backup and restore job status, backup servers and clients, automated libraries, LANs, network-attached storage (NAS), storage area networks (SANs), backup networks and the storage itself. Most of these components can be monitored via in-band and out-of-band communication methods such as real-time backup and restore activity monitors, error and event logs and SNMP traps aggregated up to enterprise-level frameworks.
What's an appropriate monitoring frequency? Unfortunately, many organizations monitor their backup job status only once a day, usually in the morning. The disadvantage of this approach is that by then the backup window has usually closed and failed backup jobs can't be restarted. If a restore is required, the enterprise could lose a full day's worth of changes, resulting in lost revenue and productivity. To ensure the highest backup and restore success rates, the backup software should be monitored during the entire backup window.
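The difference between checking during the backup window and checking the next morning can be sketched in a few lines. The `BackupJob` record, its status values and the sample jobs are hypothetical; a real backup product would expose job status through its own activity monitor or logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class BackupJob:
    name: str
    status: str                   # "running", "succeeded" or "failed" (hypothetical values)
    expected_duration: timedelta  # how long a full rerun is expected to take

def restartable_failures(jobs, now, window_end):
    """Return failed jobs that could still finish if restarted right now."""
    remaining = window_end - now  # negative once the window has closed
    return [j for j in jobs
            if j.status == "failed" and j.expected_duration <= remaining]

# Assumed backup window ending at 6:00 a.m.
window_end = datetime(2002, 9, 2, 6, 0)
jobs = [
    BackupJob("mail-server", "failed", timedelta(hours=2)),
    BackupJob("file-server", "succeeded", timedelta(hours=3)),
    BackupJob("database", "failed", timedelta(hours=5)),
]

# Checked at 2:00 a.m., inside the window
at_2am = restartable_failures(jobs, datetime(2002, 9, 2, 2, 0), window_end)
# Checked at 8:00 a.m., after the window has closed
at_8am = restartable_failures(jobs, datetime(2002, 9, 2, 8, 0), window_end)

print([j.name for j in at_2am])
print([j.name for j in at_8am])
```

In this example the 2:00 a.m. check still leaves time to rerun the mail server job, while the 8:00 a.m. check finds nothing that can be restarted.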
What actions should be taken in the event of a backup or restore failure? What information must be captured to facilitate root cause analysis? When should the backup be restarted? These operational processes must be documented and tested. The process should also contain a technical and business escalation procedure defining whom to contact at the appropriate time.
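One way to document an escalation procedure is as a small table of tiers, each with a time threshold and a contact. The sketch below assumes three tiers; the role names and the 30-minute and two-hour thresholds are invented for illustration and would come from the organization's own plan.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class EscalationTier:
    after: timedelta  # time elapsed since the failure was detected
    contact: str      # hypothetical role name
    kind: str         # "technical" or "business"

# Illustrative plan; thresholds and roles are assumptions
ESCALATION_PLAN = [
    EscalationTier(timedelta(minutes=0), "on-call backup administrator", "technical"),
    EscalationTier(timedelta(minutes=30), "senior storage engineer", "technical"),
    EscalationTier(timedelta(hours=2), "IT operations manager", "business"),
]

def contacts_to_notify(elapsed, plan=ESCALATION_PLAN):
    """Return every contact whose threshold has been reached."""
    return [tier.contact for tier in plan if elapsed >= tier.after]

print(contacts_to_notify(timedelta(minutes=45)))
```

Keeping the plan as data rather than burying it in a script makes it easier to review and test alongside the rest of the documented procedures.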
How are changes to the environment recorded? A solid backup and restore management plan will outline a change management procedure requiring signature authority from all stakeholders. This process should be invoked for changes such as adding or removing backup clients, upgrading backup servers, adding additional capacity to storage subsystems, reconfiguring backup networks and capturing software/microcode revisions.
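The sign-off requirement can be modeled as a change record that stays unauthorized until every listed stakeholder has approved it. This is a minimal sketch; the `ChangeRequest` class, the stakeholder names and the sample change are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeRequest:
    description: str
    required_approvers: frozenset                 # every stakeholder who must sign
    approvals: set = field(default_factory=set)   # signatures collected so far

    def approve(self, stakeholder):
        """Record a signature, rejecting anyone not on the stakeholder list."""
        if stakeholder not in self.required_approvers:
            raise ValueError(f"{stakeholder} is not a listed stakeholder")
        self.approvals.add(stakeholder)

    def missing_signatures(self):
        return sorted(self.required_approvers - self.approvals)

    def is_authorized(self):
        # Authorized only when no signature is outstanding
        return not self.missing_signatures()

change = ChangeRequest(
    "Add backup client web-07",
    frozenset({"backup admin", "network team", "application owner"}),
)
change.approve("backup admin")
change.approve("network team")

print(change.missing_signatures())
print(change.is_authorized())
```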
Are the restore procedures documented? How often is the restore process tested? It's imperative to have an up-to-date disaster recovery plan. An effective plan captures the actions to take for multiple levels of incidents ranging from a server crash to a full-blown disaster declaration. Test the plan at least every six months.
To minimize the effects of unplanned downtime and to maximize data availability and recoverability, smart IT organizations must create, implement and maintain a backup and restore management plan (BRMP). A BRMP provides a framework for understanding the backup environment, a vehicle for documenting the standard procedures to be followed for backup and restore operations and a repository for the corporate best practices and backup policy definitions that have been implemented. Here's how to create a BRMP, along with tips for best practices.
With a myriad of storage components to manage, such as servers, disk and tape storage subsystems, storage area networks (SANs) and network-attached storage (NAS) topologies, successful backup and restore management can be a daunting task for even the most seasoned storage professionals. Every day, administrators wage a war against data corruption, virus attacks, network problems and a host of other incidents in a valiant effort to keep their mission-critical systems up and running.
Additionally, enterprise organizations face an array of other storage challenges, such as squeezing more data into shortened backup windows while meeting demanding service level agreements and performing ongoing backup infrastructure capacity planning. The 24X7 data access requirements of database and Web-based applications are forcing many organizations to rethink their traditional backup and restore strategies. Not only must these applications be backed up while online, but in most cases, they must be restored in less than half the time it takes to back them up.
Today's complex environments demand highly skilled IT professionals to ensure the backup solution is working as designed. Unfortunately, managing the backup and recovery environment is a job no one really wants. It can be a thankless job with high expectations for success and no tolerance for failure. A general perception among administrators is that no one has ever been promoted for ensuring successful backups. And, sorry to say, the opposite is all too true: Jobs have been lost as the result of unsuccessful backups.
Without proper backup schedules and retention policies, backup media can't be used efficiently, resulting in increased costs for data cartridges, automated libraries and off-site storage. Lack of media management policies can also result in lost or damaged backup media, impacting data availability and recoverability.
The following seven steps can help you create a BRMP.
Step 1: Understand the backup environment
Before a successful BRMP can be created, it's important to conduct a thorough assessment and inventory of the existing backup environment, including backup servers and clients, automated libraries, backup media and storage networking components. At a minimum, the following questions should be answered:
- Is the current infrastructure designed for backup and recovery? Most backup solutions are designed to move a fixed amount of data to backup media within a given backup window. While this is certainly an important consideration, the primary emphasis in solution design should be on ensuring that business-critical applications can be restored quickly in the event of a disaster.
- Which systems are mission-critical? What are the availability requirements? What's the cost of downtime?
- What are the backup software and licensing requirements? Have enough licenses been purchased to satisfy the requirements?
- What are the database or application backup requirements? Is there a requirement for hot backup?
Step 2: Perform capacity planning
Once the assessment and inventory are completed and the backup infrastructure is understood and documented, the next step is to perform capacity planning. The purpose of capacity planning is to identify the sources of storage growth and perform a gap analysis to determine the differences between current infrastructure capabilities and expected requirements. Important questions to answer at this stage include:
- What is the expected storage growth over the next six months and in one to three years?
- What are the anticipated increases in the number and types of backup clients?
- Will the current backup architecture and infrastructure scale to meet this growth?
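A gap analysis of this kind can be sketched as a projection compared against available capacity. The example below assumes compound annual growth, which is only one possible growth model, and uses made-up figures (10 TB today, 50% annual growth, 20 TB of backup capacity) purely for illustration.

```python
def projected_capacity_tb(current_tb, annual_growth_rate, years):
    """Project storage to be protected, assuming compound annual growth."""
    return current_tb * (1 + annual_growth_rate) ** years

def capacity_gap_tb(current_tb, annual_growth_rate, years, available_tb):
    """Positive result means a shortfall; zero or negative means headroom."""
    projected = projected_capacity_tb(current_tb, annual_growth_rate, years)
    return projected - available_tb

# Planning horizons taken from the questions above: six months, one year, three years
for years in (0.5, 1, 3):
    gap = capacity_gap_tb(current_tb=10.0, annual_growth_rate=0.5,
                          years=years, available_tb=20.0)
    label = "shortfall" if gap > 0 else "headroom"
    print(f"{years} yr: {abs(gap):.2f} TB {label}")
```

With these sample numbers the infrastructure has headroom at six months and one year but falls short at three years, which is the kind of result that would feed the scaling question in the last bullet.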
This was first published in September 2002