Tape libraries on the SAN: sharing isn't always good

If your latest project is choosing a new tape library for your SAN, you have a lot of decisions to make--the most important being how to architect your backup.

If you've been evaluating enterprise class tape libraries, you've no doubt been inundated with product glossies from different vendors touting the many connectivity options available when connecting their particular library into your storage area network (SAN). Some of those options amount to different strokes for different folks, but you can untangle the various connection options for your SAN-enabled tape library and make an informed decision.

Data center logistics
The first requirement you need to understand is distance. Where will your tape library be in relation to the backup servers and the hosts it will back up?

Due to the initial and expandable size of most enterprise class libraries, invest some time determining the optimal location of the incoming library. Not only do you need to account for its initial dimensions, but you should also consider the time frame in which your overall data growth would require you to add an extension cabinet to your initial tape library investment (see "Implementing enterprise class tape libraries").

Useless redundancy
Vendors providing native Fibre Channel (FC) tape drives often provide both A and B ports for connectivity into separate storage area networks (SANs) for redundancy. Although this functionality exists, the ability to exploit it isn't readily available in the field. Perhaps in the near future, as tape drive speeds increase--making them more viable solutions for high profile hierarchical storage management applications--vendors in the volume management space will enhance their applications to allow a physical tape drive to be referenced over an alternate path once the primary path has failed.

Locating an expandable enterprise class tape library in a data center that's already congested with equipment could require you to relocate the entire backup and recovery environment or move production application servers to accommodate expansion cabinets. Imagine walking into an application owner's office and suggesting they bring down their production servers and storage to move them to a smaller section of the data center so you can move your behemoth tape library. Not pretty.

Before Fibre Channel (FC), that wasn't likely to happen because smaller libraries had to be close to backup servers due to SCSI's distance limitations. We were likely to look at the backup and recovery hardware as a complete environment, instead of single pieces of equipment to be managed separately. However, since the allowable distances inherent in FC's architecture are greater than SCSI's, IT organizations have been tempted to locate their tape libraries away from their backup servers simply because they could. It was easier to just plop the library down in the next available section of raised floor.

So if your design calls for the tape library and backup servers to be in the same data center, try to locate your tape library adjacent to your backup servers, with enough space allotted to accommodate the total projected cabinet expansion. That will save you both the downtime and the support cost often associated with moving equipment.

However, if a SAN design requires that the backup servers be located in data center A along with the application hosts that they're protecting, and the tape library is located in data center B for maximum protection, you'll want to ensure that the allotted space in data center B will support additional cabinet space as well.

Many IT groups with stringent business continuance objectives implement this second scenario for campus environments. The backup data is located in a different building from the application hosts, protecting it from disasters. By placing the backup server in the same location as the application hosts, the hosts are able to deliver their data to the backup server and ultimately to the FC-attached tape library faster than if the backup server were located a few more IP hops away at data center B.

Tape drive connection
The criticality of your production applications drives some decisions on connecting the tape library to the SAN, as well as calculating how many servers will be necessary to drive data to and from tape. For example, if you're backing up infrequently updated data, then it may not be the end of the world if the storage node--to use Legato terminology--responsible for driving the data to the library fails during the backup window.

At the other end of the spectrum, you may be supporting thousands of updates a minute. In that case, the failure of a storage node--or media server in Veritas speak, data mover in EMC lingo--in conjunction with data loss on the associated application host would be more costly. Having more than one storage node provides the application host with more protection, and most backup and recovery software vendors support the ability to define more than one storage node in case of a failure.
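As a rough illustration of defining more than one storage node for a host, the sketch below picks the first node in a preference list that is currently up. The node names and the `pick_storage_node` helper are hypothetical; real backup products express this through their own configuration, not through code like this.

```python
def pick_storage_node(preferred_order, is_up):
    """Return the first storage node in preference order that is up.

    preferred_order: node names, most preferred first (hypothetical names).
    is_up: dict mapping node name to True/False health status.
    Returns None when no listed node is available.
    """
    for node in preferred_order:
        if is_up.get(node, False):
            return node
    return None


# Hypothetical host with a primary and a secondary storage node.
nodes_for_host = ["storage_node_1", "storage_node_2"]

normal = pick_storage_node(
    nodes_for_host, {"storage_node_1": True, "storage_node_2": True}
)
failover = pick_storage_node(
    nodes_for_host, {"storage_node_1": False, "storage_node_2": True}
)
print(normal)    # storage_node_1
print(failover)  # storage_node_2
```

With only one node in the list, a failure during the backup window leaves the host with nowhere to send its data.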

Now that you've decided whether one or multiple storage nodes are required, you can start to answer some of your connectivity questions.

Ask yourself what your organization's attitude has been toward managing the storage application where the library is intended. Be honest about your staff's attitude and ability. For instance, if there aren't any procedures or practices in place for adding servers to the backup schedule, then good capacity planning is likely nonexistent. The skill and effort displayed by your staff play a role in deciding how to provision the tape drives in your library. Here's why.

There are basically two approaches to mapping available drives to servers: a shared pool or an allocated pool (see "SAN tape libraries," this page). If your tape library includes 12 tape drives that must be available to three storage nodes, you could make all of the tape drives a physically shared pool available to all of the storage nodes yielding 36 logical instances of the 12 physical drives. That means your master backup server has to manage who gets the physical drives without assigning the same drive to different storage nodes.

Alternatively, you could allocate some number of drives to each of the three storage nodes depending on their current and projected workloads. In this connection scenario, the storage nodes won't have a defined path to all 12 tape drives. Instead, they'll only be able to see those drives allotted to them via fabric zoning.
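To make the two mappings concrete, here is a minimal sketch that models zoning as a table of which storage node can see which drive. The drive and node labels are made up, and the even 4/4/4 split is only an assumed example of an allocation; it is a model of the arithmetic, not a fabric configuration.

```python
# Hypothetical labels; 12 physical drives and 3 storage nodes as in the text.
DRIVES = [f"drive{i:02d}" for i in range(1, 13)]
NODES = ["node_a", "node_b", "node_c"]


def shared_zoning(nodes, drives):
    """Shared pool: every node is zoned to every drive."""
    return {node: list(drives) for node in nodes}


def allocated_zoning(nodes, drives, counts):
    """Allocated pool: each node is zoned only to its own slice of drives."""
    if len(counts) != len(nodes) or sum(counts) != len(drives):
        raise ValueError("counts must cover every node and every drive")
    zoning = {}
    start = 0
    for node, count in zip(nodes, counts):
        zoning[node] = drives[start:start + count]
        start += count
    return zoning


def logical_instances(zoning):
    """Total number of node-to-drive paths the backup software must track."""
    return sum(len(visible) for visible in zoning.values())


shared = shared_zoning(NODES, DRIVES)
allocated = allocated_zoning(NODES, DRIVES, [4, 4, 4])  # assumed split

print(logical_instances(shared))     # 36
print(logical_instances(allocated))  # 12
```

In the shared case the master server has 36 paths to arbitrate; in the allocated case each physical drive has exactly one owner, so there is nothing to arbitrate between nodes.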

The shared pool approach will definitely exercise your storage application's ability to recover from device delays because there's no direct way to throttle any one storage node's ability to request tape drives. You could control the number of tape requests issued by the storage node by massaging the schedule. But that assumes some person or group is actively managing the schedule, which in this scenario isn't the case. Therefore, depending on the time that a particular storage node was scheduled for work, that node could potentially request and reserve all 12 tape drives for as long as it needs to complete its work. During that time, the two remaining storage nodes would have to wait until the first storage node has completed enough work to release its reservation on a tape drive, possibly causing backup failures.
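The starvation effect can be sketched with a toy first-come, first-served model of a shared pool. The `reserve_shared` function and the request sizes are assumptions for illustration; actual backup software has its own reservation and retry logic.

```python
def reserve_shared(total_drives, requests):
    """Grant drives in arrival order until the shared pool is empty.

    requests: list of (node_name, drives_wanted) tuples in the order the
    master server receives them; each node is assumed to appear once.
    Returns (granted, waiting): drives each node got, and drives it still
    needs and must wait for.
    """
    free = total_drives
    granted = {}
    waiting = {}
    for node, wanted in requests:
        got = min(wanted, free)
        free -= got
        granted[node] = got
        waiting[node] = wanted - got
    return granted, waiting


# node_a's jobs start first and ask for every drive in the library.
granted, waiting = reserve_shared(
    12, [("node_a", 12), ("node_b", 6), ("node_c", 2)]
)
print(granted)  # {'node_a': 12, 'node_b': 0, 'node_c': 0}
print(waiting)  # {'node_a': 0, 'node_b': 6, 'node_c': 2}
```

Nothing in this model limits what the first node can take, which is exactly the situation when nobody is managing the schedule.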

The allocation approach calls for the pre-assignment of physical tape drives to each of the three storage nodes in whatever combination meets the expectations of the storage nodes. For example, suppose you had two dedicated backup storage nodes or media servers to drive backup data for 100 application hosts. In addition, you had one storage node or media server that doubles as a high-availability Oracle server with a significant amount of data. The server dumps logs to tape on an hourly basis.

In this scenario, you might want to allocate five tape drives to each of the dedicated storage nodes, and two to the storage node doubling as an Oracle server. This way, the Oracle server will never have to wait for a tape drive during the busiest times of the other two storage nodes.

That approach requires more emphasis on capacity planning because once you have provisioned your tape drives in the desired configuration, a dedicated storage node that's using all five of its tape drives can't automatically request a sixth, even if there's a tape drive available from another storage node's pool. Of course, you can change your fabric zoning configuration to shift drives from one storage node to the other, but not in real time. I've implemented and managed the results of these approaches, and they work, but only for the right scenarios.
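The trade-off can be sketched with the 5/5/2 split from the example above. The node names and the `grant` helper are hypothetical; the point is only that a node's ceiling is its own allocation, regardless of what sits idle elsewhere.

```python
# Allocation from the example: two dedicated nodes and one Oracle node.
ALLOCATION = {"backup_node_1": 5, "backup_node_2": 5, "oracle_node": 2}


def grant(allocation, in_use, node, wanted):
    """Return how many of the wanted drives the node can get right now.

    A node can never exceed its own zoned allocation, even if drives
    zoned to other nodes are idle.
    """
    free_for_node = allocation[node] - in_use.get(node, 0)
    return max(0, min(wanted, free_for_node))


# backup_node_1 is saturated; the other two nodes are mostly idle.
in_use = {"backup_node_1": 5, "backup_node_2": 1, "oracle_node": 0}

sixth_drive = grant(ALLOCATION, in_use, "backup_node_1", 1)
oracle_drives = grant(ALLOCATION, in_use, "oracle_node", 2)
idle_elsewhere = sum(ALLOCATION.values()) - sum(in_use.values())

print(sixth_drive)     # 0
print(oracle_drives)   # 2
print(idle_elsewhere)  # 6
```

Six drives are idle in the library, yet the saturated node gets none of them, while the Oracle node gets both of its drives immediately.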

If the operations group or help desk is responsible for managing the backup schedule for a modest number of application hosts--say, up to 100--then the best approach is to use some of the money saved by not having your engineering group support this environment to purchase an additional tape drive or two. Then go with the shared pool approach, making your tape drives visible to each of your storage nodes or media servers. Ongoing capacity planning won't be necessary, unlike with the allocated approach. And because the application server count isn't inordinately high, overscheduling backup jobs during the backup window is less likely. That isn't to say care shouldn't be given to the backup schedule--you still don't want to schedule too many backup streams to any one of the storage nodes or media servers during any one time frame.

For organizations supporting more than 100 application hosts--especially if they're already having problems completing their backups before the end of the backup window--capacity planning should be a high priority. A high number of application hosts, and an increasingly high amount of utilized storage associated with these hosts, are early indicators that capacity planning is justified regarding how you connect the tape drives in your tape library to the backup servers on your SAN.

Often, just-get-it-done administrators stuff hundreds of backup clients into the schedule wherever they fit, and assign them to a storage node at random, without any regard for its current load or the resource requirements of its neighboring storage node. In an environment where all of the tape drives are visible to all of the backup servers, an uncontrolled mount storm is likely to occur and cause resource shortages in your tape pool.

However, in an environment where the 12 drives are divided in some combination and specifically provisioned to a server, a similar mount storm won't yield unpredictable results. Each of the servers will only have access to the tape drives that they were provisioned. The same number of mount requests will still be generated, because that depends on the number of simultaneous streams the tape drive can support.

But allocating drives yields some predictability, because you know how much data to back up and how many tape drives you'll have to perform those backups. With a shared pool, you know how much data is going to be backed up, but the number of tape drives the storage node has available could change from day to day depending on what work has been scheduled for the other storage nodes at the same time.
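A back-of-the-envelope calculation shows why a fixed drive count makes the backup window predictable. The 30MB/s per-drive rate and the 5,000GB nightly volume below are assumed figures, not measurements, and the model assumes every drive streams at full speed for the whole job.

```python
def backup_hours(data_gb, drives, mb_per_sec_per_drive):
    """Estimate hours to write data_gb to tape across the given drives.

    Idealized: ignores mount time, tape changes and drives that fail
    to stream. Uses 1GB = 1024MB.
    """
    if drives <= 0:
        raise ValueError("at least one drive is required")
    total_mb = data_gb * 1024
    seconds = total_mb / (drives * mb_per_sec_per_drive)
    return seconds / 3600


NIGHTLY_GB = 5000   # assumed nightly volume for one storage node
DRIVE_RATE = 30     # assumed sustained MB/s per drive

# Allocated pool: the node always has its five drives.
allocated_hours = backup_hours(NIGHTLY_GB, 5, DRIVE_RATE)

# Shared pool on a busy night: only two drives happen to be free.
shared_bad_night = backup_hours(NIGHTLY_GB, 2, DRIVE_RATE)

print(round(allocated_hours, 1))   # 9.5
print(round(shared_bad_night, 1))  # 23.7
```

The data volume is the same in both cases; only the number of drives the node can count on changes, and with it whether the job fits in the window.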

If you're thinking the allocated approach appears to go against the benefit of sharing tape drives in a SAN, you're right. Not all installations will be able to take advantage of the real-time tape drive sharing capabilities of their backup and recovery software. Instead, because of the number of backup clients and the volume of associated data backed up daily, these installations will benefit from the speed, scalability and management enhancements of a SAN-attached tape library. Certainly, there are monetary reasons why you should share tape drives, but if you're supporting hundreds of backup clients in your environment, with individual business units having access to the backup schedule and able to add and remove backup clients, then you must institute predictability somewhere in your plan.

Robotic arm connection
The robotic arm is controlled and accessed by the master server in your backup or hierarchical storage management (HSM) environment. The storage nodes or data movers make mount requests to the master server and the master server then issues SCSI commands to the library's robotic arm to select and load the requested tape. How you make the robotic arm visible on the SAN depends on the logistics of your data center(s) and your business continuance objectives.

If your SAN design requires that your tape library be located within the SCSI distance limitations of your master backup server, then there's no real benefit to having the robotic arm bridged into the SAN with an FC/SCSI bridge. In fact, there could be a downside to making the robotic arm visible to the master server via the SAN instead of directly attaching it via a SCSI cable. This is especially true if the master server is also doubling as a storage node and is responsible for moving large amounts of data. In this scenario, make sure that the data stream resulting from scheduled backups isn't impeding the command, data and status information units being directed to and from the robotic arm. This possibility is due to the hardware interrupt management routines involved in streaming data to the tape drive when accounting for the length of time it takes the status of a SCSI mount command to return to the initiator (master server).

However, if your design and business recovery objectives require that the robotic arm be accessed over a distance, consider installing a second FC host bus adapter (HBA) in the master server--possibly on a separate bus--to access the robotic arm. With 1Gb/s HBAs selling on eBay for $200 and prices per switch port declining, this configuration isn't as much of a luxury expense as it used to be.

The ability to access the robotic arm over FC does have its benefits, but these benefits are related to distance with regard to disaster recovery or building logistics. To that end, be sure to certify and maintain firmware revisions within your interconnect devices, as well as any device drivers and SCSI tape patches on the master server.

If you're anticipating deploying an enterprise class tape library onto the SAN, ask lots of questions. Make sure your vendor(s) explain in detail the pros and cons of the design methods used to integrate the hardware into the SAN, emphasizing the cons. Often, we focus on the benefits of implementing new hardware without looking at the other side of the coin. Having an independent integrator or tenacious employee on your side may help protect your interests by adding a different perspective.
