This article can also be found in the Premium Editorial Download "Storage magazine: Tools for successful data migrations."
Evaluate your daily/weekly/monthly/as-needed tasks. Document them and make sure they're performed and reported on schedule.
Keep in mind that time flies. Before you know it, a year will have gone by and a complete annual cycle will have passed. It may seem tedious at first, but eventually you'll come to realize the benefits of a more optimized environment.
3. Review backup logs daily. A review of backup application error and activity logs is a key daily task--but one that's often easier said than done. Log analysis can be time-consuming, but it pays valuable dividends and is essential to reliable backup.
Backup problems tend to manifest themselves in a cascading effect. One event results in a series of subsequent problems that don't have an immediate, obvious linkage. For example: A backup job doesn't kick off because a required tape drive was never released from an earlier job. This prior job was backing up an application server executing an unscheduled batch process, consuming system resources and causing the backup to finish late. The system administrator responsible never informed the backup administrator to reschedule the backup.
It takes considerable skill and detective work to determine whether you're observing a root cause or a symptom of some other problem. You must also establish good communications and working relationships with system administrators, DBAs, network administrators and others to troubleshoot effectively.
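The daily triage described above can be partly scripted. The sketch below assumes a hypothetical plain-text log format ("timestamp level job=name message"); every real backup application has its own log layout and reporting tools, so the pattern would need to be adapted.

```python
import re
from collections import defaultdict

# Hypothetical log line format: "2005-11-01 02:14:07 ERROR job=ora_prod message..."
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) job=(\S+) (.*)$")

def triage(log_lines):
    """Group log entries by job so errors can be read in context.

    Returns {job: [(timestamp, level, message), ...]} for every job that
    logged at least one ERROR -- a starting point for root-cause work,
    not a substitute for it.
    """
    by_job = defaultdict(list)
    jobs_with_errors = set()
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        ts, level, job, msg = m.groups()
        by_job[job].append((ts, level, msg))
        if level == "ERROR":
            jobs_with_errors.add(job)
    return {job: by_job[job] for job in jobs_with_errors}
```

Grouping by job keeps related events together, which is exactly what makes cascading failures like the tape-drive example easier to untangle.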
The backup operations lifecycle
Daily: Validate backup activities
Weekly: Validate backup system
Monthly: Validate backup process
Quarterly/Annually: Validate backup solution
The catalog should be treated like any other critical application database. It should be mirrored, or at least RAID-protected, and you should verify successful multiple-copy backup of the database or catalog on a scheduled basis.
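One way to script the scheduled verification: assuming each catalog backup copy lands as a file in a known directory (the layout is illustrative, not any product's actual behavior), check that every copy location holds a sufficiently recent copy.

```python
import os
import time

def verify_catalog_copies(copy_dirs, max_age_hours=24):
    """Return the copy locations that lack a fresh catalog backup.

    copy_dirs: directories where catalog backup copies are expected
    (hypothetical layout -- adapt to the backup product in use).
    A location fails if no file in it was modified within max_age_hours.
    """
    cutoff = time.time() - max_age_hours * 3600
    stale = []
    for d in copy_dirs:
        try:
            newest = max(
                (os.path.getmtime(os.path.join(d, f)) for f in os.listdir(d)),
                default=0,
            )
        except OSError:
            newest = 0  # unreadable or missing directory counts as a failure
        if newest < cutoff:
            stale.append(d)
    return stale
```

Run from cron or a scheduler, an empty result means every copy location checked out; anything returned warrants immediate attention, since a lost catalog can make all other backups unusable.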
5. Identify and resolve backup window failures daily. Backup window failures are successful backups that exceed the expected backup window. Because the backup job itself completes, no errors are reported in the error log, so this problem is often overlooked. In addition to affecting production environments and creating user dissatisfaction, jobs that approach or exceed the backup window may be warning signs of impending capacity limits or performance bottlenecks. Recognizing and addressing these issues as early as possible can prevent outright failures later.
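Because these jobs report success, they have to be caught by comparing finish times against the window rather than by reading the error log. A minimal sketch, assuming job records exported from the backup application's reporting (the record layout here is invented for illustration):

```python
from datetime import datetime, timedelta

def window_overruns(jobs, window_end, warn_margin=timedelta(minutes=30)):
    """Flag jobs that finished successfully but past (or near) the window.

    jobs: list of dicts with 'name', 'status' and 'finished' (datetime) --
    an assumed shape; real data would come from the backup app's reports.
    window_end: datetime when the backup window closes.
    Returns (overran, near_limit) lists of job names.
    """
    overran, near_limit = [], []
    for job in jobs:
        if job["status"] != "success":
            continue  # failed jobs already appear in the error log
        if job["finished"] > window_end:
            overran.append(job["name"])
        elif job["finished"] > window_end - warn_margin:
            near_limit.append(job["name"])
    return overran, near_limit
```

The near-limit list is the early-warning signal the tip describes: jobs creeping toward the window edge flag capacity or performance trouble before it becomes an outage.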
6. Locate and back up orphan systems and volumes. Your backup software invariably provides you with some level of reporting information about daily backup success. If you depend on this as the authoritative source on backup, then you're likely still at risk.
The backup application reports only on the servers it knows about. Large environments often have orphan systems--systems that have been brought into production but not incorporated into the backup plan. This can happen for a variety of reasons, but it's often the result of a business unit purchasing a system outside of IT's purview. The system may have been backed up independently at one time, but over time has fallen through the cracks. Usually these systems are discovered after it's too late: Data loss occurs and a restore request comes to IT for a system it knows nothing about.
Addressing this problem can be challenging and time-consuming. It entails regularly discovering and mapping new network addresses to nodes, filtering out unrelated addresses (additional NICs, network devices, printers and the like), identifying the locations and owners of these nodes and establishing policies for managing the addition of storage volumes. Regular reporting to system and application owners of exactly what's being backed up and what's not being backed up (by choice) is also critical.
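The core of the orphan check is a simple set difference between what the network scan finds and what the backup application knows about. A sketch, with the host inventories shown as plain lists (in practice they would come from discovery tools and the backup app's client list):

```python
def find_orphans(discovered_hosts, backup_clients, ignore=()):
    """Return discovered hosts that no backup policy covers.

    discovered_hosts: hostnames/addresses found by a network scan.
    backup_clients: clients the backup application knows about.
    ignore: addresses deliberately filtered out -- printers, network
    devices, secondary NICs, or hosts excluded by explicit policy.
    """
    return sorted(set(discovered_hosts) - set(backup_clients) - set(ignore))
```

Anything this returns is either an orphan to bring under the backup plan or a candidate for the documented ignore list--which doubles as the "not backed up by choice" report the tip calls for.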
7. Centralize and automate backup management as much as possible. A key to successful data protection is consistency. This doesn't mean that all data must be treated in the same manner. What it does mean is that all data of equivalent value and importance to the organization should be managed in a similar fashion. The orphan problem is an excellent example of an inconsistency that can result from non-centralized backup administration.
In many environments, backup operations for Unix and Windows servers are run independently. This organizational alignment may pre-date networked storage--but it's questionable whether the old arrangement still makes sense. Besides being inefficient, it implies that a different set of policies and procedures should be applied to data based on its operating platform. Would any line-of-business owner value data by that measure?
This was first published in November 2005