This article can also be found in the Premium Editorial Download "Storage magazine: Expanding SANs: How to scale today's storage networks."
Retention and recall
The raw capacity requirements for backup are considerably different from those for primary data. To begin with, there's much more backup data in an environment. Depending on retention policies, there can be anywhere from five to 50 times more backup data than primary data. A critical question when developing the capacity equation is how quickly this data needs to be recovered.
Backup policies make a difference
This graph illustrates the effect of optimizing backup operations and policies. The company would outgrow the tape library capacity by February 2004 without any change. Implementing changes in processes extends that out to December 2004, and the addition of a second library further extends the viability of the system until past August 2005.
Because tape is a removable medium, its capacity is nearly limitless--in theory. From a practical standpoint, however, limits have to be set on data retention and restore time. The key factors to consider are the size of the tape library, the media required and the costs associated with transporting and storing tape media.
To recover data in a timely manner, a tape library should, at a minimum, have the capacity to retain the current version of all data. That means it should be able to hold the most recent full backups and related incremental volumes. In many environments, the requirement is considerably greater, often targeting an onsite goal of 30 days or more. To calculate this, consider how many backup cycles (full backups plus associated incrementals) are required to meet retention and recovery requirements, along with the amount of data associated with each. It's also wise to factor in a realistic estimate of average tape utilization--something less than 100%.
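The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a formula from the article: the function names, the parameter names and the sample figures are all hypothetical, and it assumes every incremental in a cycle is the same size.

```python
import math


def required_library_capacity_tb(full_tb, incremental_tb, incrementals_per_cycle,
                                 cycles_retained, tape_utilization):
    """Raw onsite capacity (TB) needed to hold the retained backup cycles.

    Assumption: one cycle = one full backup plus a fixed number of
    equally sized incrementals. tape_utilization is the average fraction
    of each cartridge that actually holds data (greater than 0, at most 1).
    """
    if not 0 < tape_utilization <= 1:
        raise ValueError("tape_utilization must be in the range (0, 1]")
    cycle_tb = full_tb + incremental_tb * incrementals_per_cycle
    # Dividing by utilization inflates the figure to account for
    # partially filled tapes.
    return cycle_tb * cycles_retained / tape_utilization


def cartridges_needed(capacity_tb, cartridge_tb):
    """Whole cartridges needed to provide capacity_tb of raw capacity."""
    return math.ceil(capacity_tb / cartridge_tb)


if __name__ == "__main__":
    # Hypothetical environment: 10 TB fulls, six 1 TB incrementals per cycle,
    # four cycles kept onsite, tapes 80% full on average.
    raw_tb = required_library_capacity_tb(10, 1, 6, 4, 0.8)
    print(f"Raw library capacity required: {raw_tb:.1f} TB")
```

Lowering the utilization estimate or raising the number of retained cycles shows immediately how sensitive the library size is to each factor.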
Given this current capacity, you should then determine the current utilization rate and apply the same data growth projections used in calculating bandwidth to determine projected volume retention capacity requirements (see "Backup policies make a difference").
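One way to apply a growth projection to library capacity is to compound the current retained volume month by month until it exceeds what the library can hold. The sketch below assumes a constant monthly growth rate, which is a simplification; the function name, parameters and sample figures are hypothetical and are not taken from the article's graph.

```python
def months_until_full(current_tb, capacity_tb, monthly_growth_rate,
                      horizon_months=120):
    """Return the first month (0 = today) in which projected retained data
    exceeds library capacity, or None if that does not happen within
    horizon_months.

    Assumption: data grows by the same percentage every month.
    """
    required_tb = current_tb
    for month in range(horizon_months + 1):
        if required_tb > capacity_tb:
            return month
        required_tb *= 1 + monthly_growth_rate
    return None


if __name__ == "__main__":
    # Hypothetical library: 50 TB retained today, 100 TB usable capacity,
    # data growing 10% per month.
    print(months_until_full(50, 100, 0.10))
```

Running the same function with a larger capacity figure, for example after adding a second library, gives the revised date at which the system is outgrown.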
Up to this point, we have considered capacity planning based upon business as usual, making no modifications to the existing environment. Assuming an accurate growth forecast, consider this the high-dollar budget number and use it as a baseline for the capacity planning process. Our experience has been that in almost every environment there is room for improvement, including operational changes that increase tape utilization and policy changes that dramatically reduce the quantity of backup data that must be maintained in a library.
This is the stage in the capacity modeling process to begin to play "what if." Look for inefficiencies in the environment and explore options to improve them. Then rework the capacity numbers and translate them into budget dollars. "Forecasting your tape drive needs" shows an example of forecasting tape library capacity, including some "what if" options.
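A "what if" exercise of this kind can be modeled as a set of named scenarios that are each reduced to a capacity figure and a cost. The sketch below is illustrative only: the Scenario class, the flat cost-per-terabyte assumption and every number in the example are hypothetical, and a real budget would also include drives, slots, licenses and offsite storage.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    cycle_tb: float          # data in one backup cycle (full + incrementals)
    cycles_retained: int     # cycles kept in the library
    tape_utilization: float  # average fraction of each tape that is used

    def required_tb(self) -> float:
        """Raw library capacity this scenario needs."""
        return self.cycle_tb * self.cycles_retained / self.tape_utilization


def compare(scenarios, cost_per_tb):
    """Return (name, required TB, cost) rows, cheapest first.

    Assumption: cost scales linearly with raw capacity.
    """
    rows = [(s.name, s.required_tb(), s.required_tb() * cost_per_tb)
            for s in scenarios]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    options = [
        Scenario("business as usual", 16, 4, 0.5),
        Scenario("better tape utilization", 16, 4, 0.8),
        Scenario("shorter onsite retention", 16, 2, 0.5),
    ]
    for name, tb, cost in compare(options, cost_per_tb=100):
        print(f"{name}: {tb:.0f} TB, ${cost:,.0f}")
```

Keeping the business-as-usual scenario in the list preserves the baseline against which each operational or policy change is measured.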
What if your backup architecture is not the traditional network architecture, or you are considering moving to a LAN-free or disk-based design? The need for capacity planning becomes even more critical because the costs are typically greater. A LAN-free environment usually requires more expensive software licenses as well as higher costs for storage area network (SAN) infrastructure. It can also increase resource contention for tape devices if it is not planned correctly. A disk-based design eliminates the tape drive resource contention, but still retains network bandwidth considerations. It also leaves less room for error: in a pinch, you can buy additional terabytes of tape media and find a place to store them far more easily than you can purchase and integrate an equivalent amount of storage in the data center.
Developing a capacity planning model for backup is an important step in improving backup efficiency. By maintaining and publishing capacity trending numbers, an IT manager can begin to socialize the impact of enterprise data growth on backup.
This was first published in November 2003