Published: 12 Sep 2002
One of the best ways to generate a quick return on an investment in storage area network (SAN) technology is to implement LAN-free backup. While LAN-free backup solves a number of problems through brute force performance, you still need to sweat out the operational details. To get the most from your LAN-free backup SAN, it's important to understand all the performance capabilities and constraints and solve them in parallel. In this article, we look at the steps needed for a meaningful and realistic schedule for LAN-free backup.
|Model LAN-free backup schedule|
It shows how schedules can be created for the model SAN - which could then be optimized to achieve the desired performance, redundancy, scalability and cost goals.
Building a SAN for LAN-free backup begins with knowing how much backup traffic each server generates. Of course, the amount typically varies according to the day of the week and the number of backup operations that are run. For example, backups running on weekends are usually full system backups; backups on other days are either incremental or differential. Analyzing several weeks' worth of backup logs will give you a reasonable handle on day-to-day backup workloads.
Tape device performance
The next step for a smooth LAN-free backup is knowing the maximum transfer rate of the system's tape devices. This is normally calculated as the device's streaming speed with a 2:1 compression ratio. The gap between streaming and non-streaming speeds can be significant with helical scan tape technology. This isn't a problem for helical technologies with transfer rates of 6MB/s or less because the streaming speed is slower than the system's transfer rate. Higher speed helical tape technologies can only realize their performance potential if they are paired with systems having equal or better transfer rates. In general, linear tape technologies provide the most consistent performance for LAN-free backup and are recommended over helical scan technologies.
Data compression increases the transfer rate of a tape drive and its capacity. The performance boost from compression depends on the ability of the system to transfer data to the device fast enough. Unfortunately, system backup transfer rates are difficult to determine and depend on many variables, including I/O activities driven by other applications. The best backup performance is achieved by running cold backup when no other applications are running in the system.
For planning purposes, you should assume Intel-based systems (Windows and Linux) can support a maximum cold backup transfer rate of 10MB/s; non-Intel Unix systems can support cold backup transfer rates up to 25MB/s. This means that an Intel-based system probably can't take full advantage of compression on tape drives with non-compressed transfer rates of 10MB/s. When planning LAN-free backup for Intel systems, you shouldn't calculate backup data transfer rates of more than 10MB/s, even if the tape device can provide performance at twice that level. Remember, the actual backup transfer rates for Intel systems may be slower than 10MB/s and there's no good way to estimate until you test it on real hardware.
Before configuring the number of tape drives you need for a LAN-free backup SAN, you should test the backup transfer rates of your systems using a single tape drive connected to a SAN storage router or a high-speed SCSI bus, SCSI-2 or higher. When Intel and Unix InfiniBand systems are released, it's safe to assume the cold backup transfer rates will be much higher than tape, making it easier to predict maximum backup transfer rates.
Now it's time to consider the pros and cons of connecting the tape drives by Fibre Channel (FC) interfaces or directing the backup through storage routers. Native FC interfaces have obvious ease-of-installation advantages compared to connecting SCSI tape equipment through storage routers. However, the advantages of multiplexing backup data paths through fewer switch ports and the ability to control access to backup devices through LUN mapping and LUN masking in storage routers may outweigh the additional installation efforts.
|How to model your SAN for LAN-free backup|
Modeling backup on a storage area network (SAN) is an iterative process, involving the four layers of servers, switches, routers and backup devices. This diagram shows a rough starting point for a network with 50 servers; the numbers would be different for every environment. Refining the model requires specific analysis of how data flows would stress downstream components.
It's important to understand that the SAN model you create is a work-in-progress that will probably change as you start applying your backup schedules to the resources in the model. For instance, you may need to move some tape devices from one bus to another in a router or move them to another storage router. Likewise, you may decide to change the switch configuration in order to achieve more flexibility in backup paths.
There's nothing that says your SAN model needs to be a single multiswitch fabric SAN. The model you create may consist of several physical SANs of different sizes. The model used in this article could be implemented as a single SAN or as many as eight SANs, depending on whether any of the switches are connected. In the model discussed here, it wasn't necessary to include interswitch links as a resource for scheduling because none of the potential interswitch links would carry backup traffic.
High-speed storage routers are capable of supporting seven 10MB/s tape devices running concurrently through a single FC port. High-speed storage routers with two SAN ports can support an aggregate backup transfer rate of 140MB/s, which is approximately equivalent to 500GB/hr. Compared to traditional network backup performance, this is an enormously high number.
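The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the conversion assumes decimal units (1GB = 1,000MB).

```python
def aggregate_rate_mb_s(drives_per_port, drive_rate_mb_s, ports):
    """Aggregate router throughput when every drive streams at once."""
    return drives_per_port * drive_rate_mb_s * ports


def mb_s_to_gb_hr(rate_mb_s):
    """Convert MB/s to GB/hr, assuming 1GB = 1,000MB."""
    return rate_mb_s * 3600 / 1000


# Seven 10MB/s drives per port, two SAN ports (figures from the article).
total = aggregate_rate_mb_s(drives_per_port=7, drive_rate_mb_s=10, ports=2)
print(total, "MB/s is about", mb_s_to_gb_hr(total), "GB/hr")
```

The result, 504GB/hr, is the basis for the rounded 500GB/hr figure.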
FC SANs are capable of supporting backup transfer rates of approximately 75MB/s over any supported link. An important design consideration is configuring the backup streams that will run over the individual SAN links. For instance, if there are 40 backup jobs running over four links through four storage router ports, you may want to try to distribute the backup load evenly, over time, across the four links. This creates the shortest overall backup process. Keep in mind that servers capable of using multiple SAN backup links simultaneously probably need to be Unix systems with powerful I/O capabilities.
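One simple way to approximate an even spread is a greedy heuristic: sort the jobs from largest to smallest and always hand the next job to the link with the least data assigned so far. The sketch below is an assumption about how you might do this, not a method prescribed here; the job names and sizes are hypothetical, and a real plan also has to respect backup windows.

```python
import heapq


def balance_jobs(job_sizes_gb, link_count):
    """Greedy longest-first assignment of backup jobs to SAN links.

    job_sizes_gb maps a job name to its estimated size in GB.
    Returns a dict of link index -> list of job names.
    """
    # Min-heap of (assigned GB, link index): the lightest link pops first.
    heap = [(0.0, link) for link in range(link_count)]
    heapq.heapify(heap)
    assignment = {link: [] for link in range(link_count)}
    largest_first = sorted(job_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for job, size in largest_first:
        load, link = heapq.heappop(heap)
        assignment[link].append(job)
        heapq.heappush(heap, (load + size, link))
    return assignment


# Hypothetical jobs spread over two links.
jobs = {"db1": 40, "mail1": 30, "web1": 20, "fs1": 10}
print(balance_jobs(jobs, link_count=2))
```

The heuristic doesn't guarantee the best possible split, but it gives a reasonable first pass for the model.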
Creating the model SAN
After you have some idea of the amount of backup traffic you are working with, it's time to construct an initial SAN model. The idea isn't necessarily to get it right, but to create a hypothetical system that you can use as a tool as you develop your plans.
The key pieces of the model SAN are the number of servers, switch ports, router ports and tape devices you will use. One of the key outputs of the model SAN is an estimate for the cost. Netreon's SAN Designer is a software product that can help you build a model SAN complete with a parts list that you can use to start setting a detailed budget. In the long run, a tool like SAN Designer saves you a lot of time as you tweak your SANs to fit changing requirements.
Once the model SAN is created, you can start putting together an operations schedule for your LAN-free backup system. Building the backup operations schedule is probably the most difficult part of the project because it usually forces you to rethink some of your expectations about how the system will work.
The first step is to identify all the backup targets that will be part of the LAN-free backup system. A backup target can be a complete server or storage-volume/database-partition that can be managed as a discrete entity by your backup system. These targets are probably already identified in your current backup systems.
Now calculate the backup transfer rate (BTR) for each target. The BTR is determined primarily by the slower of the system backup transfer rate or the tape drive's transfer rate. Obviously, you first have to have some idea of the type of tape drive you'll use for each target. If you are using tape drives that support transfer rates of 10MB/s or greater, you may want to use a conservative transfer rate of 8MB/s for Intel systems.
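As a rough sketch, the BTR rule can be written as a one-line minimum. The function name and parameters are hypothetical; the system ceilings are the planning figures given for cold backup (10MB/s for Intel, 25MB/s for non-Intel Unix), and the tape rate assumes the usual 2:1 compression. Measured rates from your own tests should replace these defaults.

```python
# Planning ceilings for cold backup, in MB/s (figures from the article).
SYSTEM_CEILING_MB_S = {"intel": 10.0, "unix": 25.0}


def planning_btr(platform, tape_native_mb_s, compression_ratio=2.0, system_mb_s=None):
    """Return the planning BTR: the slower of system rate and tape rate.

    Pass system_mb_s to override the ceiling with a measured or more
    conservative value (for example, 8MB/s for an Intel system).
    """
    if system_mb_s is None:
        system_mb_s = SYSTEM_CEILING_MB_S[platform]
    tape_mb_s = tape_native_mb_s * compression_ratio
    return min(system_mb_s, tape_mb_s)


print(planning_btr("intel", 10.0))                   # system-bound
print(planning_btr("unix", 10.0))                    # tape-bound
print(planning_btr("intel", 10.0, system_mb_s=8.0))  # conservative Intel figure
```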
Determining the backup window
As you may know, the backup window is the period, bounded by a start time and an end time, in which a backup operation for a single target is expected to complete. Setting realistic backup windows is critical for success for all backup systems, including LAN-free backup. Often backup windows are determined by the access characteristics of the primary applications running on a system. For instance, if an Internet server is expected to be used 24 hours a day, you might find the hours of least use are between 2 a.m. and 5 a.m., thus determining a three-hour backup window between 2 a.m. and 5 a.m.
Obviously, determining backup windows this way can be a problem if you have many backup targets, all with the same user access characteristics. The brute force, cost-is-not-an-issue approach is to build a large SAN with multiple storage routers and tape devices and run as many parallel backup streams as necessary. But cost probably is an issue and you shouldn't expect to back up more targets than the performance of your LAN-free backup system will allow. In that case, there's no choice but to shift the start times of some of your targets' backup windows.
An effective concept is to think of backup windows as time containers, similar to buckets that have set capacities. This allows you to think about managing LAN-free backup systems in terms of maintaining time capacity in the system. In the long run, this is probably the most effective way of maintaining a healthy and reliable backup system.
Be careful to avoid setting impossible backup windows. Backup windows should be determined conservatively using your estimated backup workload and target transfer rates so the backup operation can complete within the time prescribed by the window. If the backup window is too small, you have two options: find a way to subdivide the backup target and do the work in parallel - which may not be possible - or increase the backup window to something that's realistic. This isn't merely an exercise in making backup schedules work out; it has a direct impact on the recoverability of your systems. You're better off recalibrating your expectations for backup windows than living with chronically incomplete backup operations that constantly compromise your ability to recover.
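A quick feasibility check makes the point concrete. The sketch below assumes decimal units (1GB = 1,000MB) and a steady transfer rate for the whole job; the function names and sample workloads are hypothetical.

```python
def backup_hours(data_gb, btr_mb_s):
    """Estimated run time in hours for data_gb at a steady btr_mb_s."""
    return data_gb * 1000 / btr_mb_s / 3600


def fits_window(data_gb, btr_mb_s, window_hours):
    """True if the estimated run time fits inside the backup window."""
    return backup_hours(data_gb, btr_mb_s) <= window_hours


# A three-hour window at a conservative 8MB/s:
print(fits_window(80, 8, 3))    # 80GB needs roughly 2.8 hours
print(fits_window(100, 8, 3))   # 100GB needs roughly 3.5 hours
```

When the check fails, the choices are the two described above: split the target or widen the window.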
The master backup operations schedule
Next on your list is the creation of a master operations schedule. Consider this a work in progress: Building any operations schedule is an iterative process - expect to make many adjustments as you fine-tune it. Backup windows for all targets should be laid out chronologically by start time and end time - similar to a television broadcast schedule - so you can easily see the workload as a function of the clock. If you don't have a method for doing this, create a master backup schedule where the top row is a 24-hour clock starting and ending at 8:00 p.m., with 30-minute increments. Then assign a row in the schedule to each backup target and block out the backup window for the target. This schedule will have a lot of blank space because each row only contains a single backup window.
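If you'd rather build the grid in code than in a spreadsheet, the layout reduces to mapping clock times onto 48 half-hour slots that start at 8:00 p.m. The following is a minimal sketch; the helper names are invented, and it assumes a window doesn't run past the 8:00 p.m. boundary.

```python
SLOT_MINUTES = 30
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES   # 48 half-hour columns
DAY_START_MINUTES = 20 * 60               # the schedule day begins at 8:00 p.m.


def slot_index(hhmm):
    """Map a 24-hour 'HH:MM' time to its column in the schedule."""
    hours, minutes = map(int, hhmm.split(":"))
    offset = (hours * 60 + minutes - DAY_START_MINUTES) % (24 * 60)
    return offset // SLOT_MINUTES


def window_slots(start, end):
    """Columns covered by a window; the end time itself is excluded."""
    first = slot_index(start)
    last = slot_index(end) or SLOTS_PER_DAY  # an 8:00 p.m. end closes the day
    if last < first:
        raise ValueError("window crosses the 8:00 p.m. schedule boundary")
    return list(range(first, last))


def render_row(start, end):
    """One target's row: '#' inside the backup window, '.' elsewhere."""
    covered = set(window_slots(start, end))
    return "".join("#" if i in covered else "." for i in range(SLOTS_PER_DAY))


# A 2 a.m. to 5 a.m. window occupies six of the 48 columns.
print(render_row("02:00", "05:00"))
```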
You'll want to make four derivative schedules from the master that analyze bottlenecks in other resources in the LAN-free backup system. The idea is to apply the SAN model you previously developed and then test its capacities against the master backup operations schedule.
The four derivative resource schedules you'll create are for SAN links, router ports, SCSI buses and tape drives. First, make a schedule where the rows are SAN links. Assign each target's backup window to a link, then combine the estimated transfer rates of the overlapping windows to show the instantaneous load on any link in the system. If the load exceeds 75% of the link speed on any FC link, expect performance degradation due to resource contention. You'll need to add more links, change the link assignment or reschedule the backup windows.
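Summing the load on each link is mechanical once the windows are expressed as schedule slots. The sketch below assumes each window is a tuple of (resource, first slot, end slot, rate in MB/s) with the end slot excluded; the link names and rates are hypothetical. The 75MB/s limit corresponds to the approximate FC link rate cited earlier.

```python
from collections import defaultdict


def resource_load(windows):
    """Sum the transfer rates on each resource for every schedule slot."""
    load = defaultdict(lambda: defaultdict(float))
    for resource, first_slot, end_slot, rate_mb_s in windows:
        for slot in range(first_slot, end_slot):
            load[resource][slot] += rate_mb_s
    return load


def overloaded(load, limit_mb_s):
    """List (resource, slot, total) wherever the limit is exceeded."""
    return sorted(
        (resource, slot, total)
        for resource, slots in load.items()
        for slot, total in slots.items()
        if total > limit_mb_s
    )


# Hypothetical windows assigned to two SAN links.
windows = [
    ("link-1", 12, 18, 25.0),
    ("link-1", 14, 20, 25.0),
    ("link-1", 16, 18, 25.0),
    ("link-1", 16, 17, 10.0),
    ("link-2", 12, 18, 10.0),
]
print(overloaded(resource_load(windows), limit_mb_s=75.0))
```

The same two functions work for any resource that has a throughput ceiling; only the resource names and the limit change.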
Next make a router port schedule, where the rows of the schedule represent storage router SAN ports. Assign each target's backup window to a storage router port that can be connected to the SAN link identified in the SAN link schedule. If the load exceeds 70MB/s through any router port, consider making adjustments in the hardware configuration or the timing of your backup windows.
Each backup stream will be processed by the storage router and assigned to a SCSI bus. As in the previous schedules, make sure the SCSI bus can make a connection to the storage router SAN port. In general, this won't be a problem, but it's a detail that needs verification.
Traffic maximums on the SCSI bus can be conservatively estimated at 70% of the rated bus speed. If the backup load exceeds that, then you'll probably want to make hardware or schedule adjustments.
The last resource schedule is for tape drives. The point of this schedule is to see if there are any conflicts over any individual tape drives. Tape drives can be assigned to different backup targets at different times, as long as there's a good way to ensure the proper tapes can be loaded.
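Checking for tape drive conflicts is an interval-overlap test: two targets conflict if they share a drive and their windows intersect. Here is a minimal sketch, assuming each assignment is a tuple of (target, drive, first slot, end slot) with the end slot excluded; the target and drive names are hypothetical.

```python
from itertools import combinations


def drive_conflicts(assignments):
    """Return (drive, target_a, target_b) for each overlapping pair."""
    conflicts = []
    for a, b in combinations(assignments, 2):
        same_drive = a[1] == b[1]
        # Half-open intervals overlap when each starts before the other ends.
        overlap = a[2] < b[3] and b[2] < a[3]
        if same_drive and overlap:
            conflicts.append((a[1], a[0], b[0]))
    return conflicts


assignments = [
    ("db1", "tape-1", 12, 18),
    ("web1", "tape-1", 16, 20),
    ("mail1", "tape-1", 18, 22),
    ("fs1", "tape-2", 12, 18),
]
print(drive_conflicts(assignments))
```

Back-to-back windows, such as db1 and mail1 above, don't count as a conflict in this sketch. In practice you'd leave time between them for the tape change.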
It's preferable for different targets on the same server to have backup windows that don't overlap - especially if high-speed tape devices are being used. This prevents the system's I/O capabilities from being a surprise bottleneck. Looking for these types of system bottlenecks doesn't require a separate analysis schedule as they can be determined by analyzing the master operations schedule.
As adjustments are made in any of the resource schedules, they need to be reflected in all the others. Each update to a specific resource schedule should cause a recalculation of the workload for that resource to ensure that the bottleneck is being removed from the system and not merely transferred to another resource.
After working through the detailed resource schedules, you can create a finalized LAN-free backup schedule and SAN design. One of the benefits of working through resource schedules is that it determines how many resources you'll need to incorporate in your LAN-free backup system. You'll also have some reasonable ideas of which resources you are likely to need as your SAN expands.
Of course, the final schedule and SAN design will change over time as the system environment changes. You should monitor daily operations on a regular basis to identify problems that may be occurring.
Beyond hardware and media errors, the main sign of impending problems with your LAN-free backup system is a shrinking gap between your backup windows and the actual backup times. It helps to track the headroom in each backup window: the time left over between the end of the actual backup and the end of the window.
As the amount of data increases, the backup windows will be exceeded, creating overload situations and bottlenecks in the SAN. When that happens, it may trigger another iterative round of schedule juggling or additional resource purchases to allow the backup performance to stay in compliance with your backup goals.
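Headroom is easy to track if your backup logs record actual run times. The sketch below expresses headroom as a fraction of the window and flags targets that fall below a threshold. The 15% threshold and the sample figures are arbitrary assumptions for illustration, not recommendations.

```python
def headroom_fraction(window_hours, actual_hours):
    """Unused share of the backup window; negative means an overrun."""
    return (window_hours - actual_hours) / window_hours


def low_headroom(targets, threshold=0.15):
    """Names of targets whose headroom has dropped below the threshold.

    targets maps a target name to (window_hours, actual_hours).
    """
    return sorted(
        name
        for name, (window, actual) in targets.items()
        if headroom_fraction(window, actual) < threshold
    )


# Hypothetical figures taken from a week of backup logs.
targets = {"db1": (3.0, 2.4), "web1": (3.0, 2.8), "mail1": (2.0, 2.3)}
print(low_headroom(targets))
```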
Moves and changes
Resource overloads can occur suddenly if applications are relocated from one server to another. For instance, an application that's moved from one system to another could change the amount of data in the receiving system's file system, creating a sudden change in the workload. This workload change could cause the backup target's actual backup time to exceed the time allowed by the backup window.
Similarly, the installation of new applications can also cause significant changes to the backup system. Generally, new installations won't be as immediately noticeable as a moved application because it normally takes some time for the new application's data to grow.
LAN-free backup can solve numerous backup problems currently in your computing environment. As you plan your LAN-free backup solution, you need to understand the various constraints to backup performance, which can include system I/O performance in addition to tape and network performance.
To make LAN-free backup work, you need to assign realistic backup windows to each backup target and to manage the scheduling of these time containers within a master operations schedule. The master schedule is used to create several derivative resource schedules, which can be quickly viewed and analyzed to determine if backup has gone out of compliance and whether changes in the environment are likely to result in system overloading and performance degradation.