SCSI or Fibre Channel (FC)? DLT or LTO? LAN or LAN-free? How many slots do I need? Are two tape drives enough? Which features are really important?
[Sidebar: Data path options from disk to tape]
A fundamental question for tape library selection is, "What size library do I need?" Unfortunately, this question is often translated to, "What size library can I afford?" The answers to these two questions aren't always in sync. While the budget is an overarching factor, an incorrectly sized library can actually increase costs in the long run.
Proper sizing depends on several factors; the most important are:
- How much data will I need to back up?
- How quickly is my data growing?
- How granular are my recovery requirements? For example, do I need online recovery of every file version from the last 30 days, or is the end-of-week version sufficient?
- How long do I need to keep my backups in the library?
If you don't know the answers to these questions, it's extremely unlikely that you'll select the right tape library. You're likely to end up like the financial organization that thought it had only 500GB of data, but really had six times that much (about 3TB). Needless to say, the tape library it purchased was far too small and had to be upgraded in less than a year.
Here's a hint to get you started with sizing: You probably have a lot more data than you think. Do you know about all of the servers in your environment? Do you know the backup requirements for new projects and applications? How many of these projects are likely to be rolled out in the next two years? Do you know how fast your data is growing? Are there pockets within your organization that are currently outside your responsibility, but that you're likely to inherit? All of these can increase the sizing requirements of your library.
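The growth questions above can be turned into a rough projection. The following is a minimal sketch, assuming simple compound annual growth; the function name, the `inherited_gb` parameter and the sample figures are illustrative assumptions, not numbers from any real environment.

```python
def projected_data_gb(current_gb, annual_growth_rate, years, inherited_gb=0.0):
    """Project how much primary data the library must protect in `years` years.

    annual_growth_rate is a fraction (0.5 means 50% per year).
    inherited_gb is data you expect to take over from other groups; this
    sketch assumes it is added up front and grows at the same rate.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return (current_gb + inherited_gb) * (1.0 + annual_growth_rate) ** years


if __name__ == "__main__":
    # Hypothetical: 3TB today, 50% annual growth, two-year planning horizon.
    print(projected_data_gb(3000, 0.50, 2))
```

With these assumed inputs, 3TB of data more than doubles over the two-year horizon, which is why sizing a library to today's data alone tends to fall short.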
Here's another hint: Data on tape is greater than data on disk. Practices such as daily full backups with long retention periods, or requirements to retain many incremental copies online, can result in multiple copies of data being written to tape, which pushes tape capacity requirements to several times that of disk. If these policies aren't well understood, tape consumption can get out of control. A division of a large financial institution had over 50TB of data on tape, dramatically overflowing the capacity of its libraries. An analysis showed it actually had less than 4TB of primary data on disk. The division was able to avoid the expense of purchasing a new library simply by adjusting its backup policies.
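To see how retention policy multiplies tape consumption, a rough estimate can be sketched as follows. This is a simplified model under stated assumptions: every full backup copies all primary data, every incremental copies a fixed fraction of it, and compression is ignored. The function names and sample policies are hypothetical and do not describe the institution mentioned above.

```python
import math


def tape_footprint_gb(primary_gb, fulls_retained, incrementals_retained=0,
                      daily_change_rate=0.0):
    """Estimate the data held on tape under a given retention policy.

    fulls_retained:        number of full backups kept at any one time
    incrementals_retained: number of incremental backups kept at any one time
    daily_change_rate:     fraction of primary data in each incremental
    """
    full_gb = primary_gb * fulls_retained
    incremental_gb = primary_gb * daily_change_rate * incrementals_retained
    return full_gb + incremental_gb


def slots_needed(footprint_gb, cartridge_gb):
    """Round up to whole cartridges; cartridge_gb is the usable capacity."""
    if cartridge_gb <= 0:
        raise ValueError("cartridge_gb must be positive")
    return math.ceil(footprint_gb / cartridge_gb)


if __name__ == "__main__":
    primary = 4000  # GB of primary data (assumed)
    daily_fulls = tape_footprint_gb(primary, fulls_retained=14)
    weekly_fulls = tape_footprint_gb(primary, fulls_retained=4,
                                     incrementals_retained=24,
                                     daily_change_rate=0.05)
    print(daily_fulls, weekly_fulls)
    print(slots_needed(daily_fulls, cartridge_gb=100))
```

In this model, keeping two weeks of daily fulls puts 14 times the primary data on tape, while four weekly fulls plus incrementals needs far less. That gap is the kind of saving a policy adjustment can deliver.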
Do I have the bandwidth?
Another major consideration is bandwidth. The first bandwidth-related question usually asked is, "How many tape drives do I need?" The drill is to estimate the amount of data to be backed up per night, divide it by the product of a realistic sustained drive throughput and the length of the backup window, and round up to the next whole drive.
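The drive-count calculation described above can be sketched in a few lines. The function name, the units (decimal GB and MB/s) and the sample figures are assumptions chosen for illustration; substitute throughput you have actually measured rather than a drive's rated speed.

```python
import math


def drives_needed(nightly_gb, drive_mb_per_s, window_hours):
    """Number of tape drives needed to finish a nightly backup in the window.

    nightly_gb:     data to back up per night, in GB
    drive_mb_per_s: realistic sustained throughput of one drive, in MB/s
    window_hours:   length of the backup window, in hours
    """
    if drive_mb_per_s <= 0 or window_hours <= 0:
        raise ValueError("throughput and window must be positive")
    # GB one drive can write during the whole window (1GB = 1,000MB here).
    gb_per_drive = drive_mb_per_s * 3600 * window_hours / 1000.0
    return math.ceil(nightly_gb / gb_per_drive)


if __name__ == "__main__":
    # Hypothetical: 2TB per night, 15MB/s sustained per drive, 8-hour window.
    print(drives_needed(2000, 15, 8))
```

In this hypothetical case each drive moves 432GB in the window, so 2TB requires five drives. The result assumes every drive is kept busy for the entire window, so treat it as a lower bound.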
However, tape drives are only one end of the data pipeline, and there are a number of potential bottlenecks along the way that must be considered. A follow-up question should be, "What data rate is required to sustain this performance?" To answer this, look at the data path from the original location of the data on disk, through servers, networks and I/O interfaces to tape, and then make a determination of whether this can be sustained (see the sidebar "Data path options from disk to tape").
Otherwise, it's easy to end up like the data services company that upgraded its old 8mm tape library to a new high-performance LTO tape library and was shocked to discover that backup times had actually increased. In this case, the backup server couldn't send data to the high-performance LTO drives fast enough to keep the tape streaming. When the drives started their stop-reposition-start repetition, backup performance nearly ground to a halt. The server's I/O limitations had previously been masked by the older, slower drives.
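One way to reason about the data path is to treat it as a chain whose throughput is set by its slowest stage, then compare that figure with the rate the drive needs to keep streaming. The sketch below assumes you have measured or estimated a rate for each stage; the stage names, the rates and the `drive_min_stream_mb_s` threshold are all hypothetical.

```python
def bottleneck(stage_rates_mb_s):
    """Return (stage_name, rate) for the slowest stage in the data path."""
    if not stage_rates_mb_s:
        raise ValueError("at least one stage is required")
    return min(stage_rates_mb_s.items(), key=lambda item: item[1])


def can_stream(stage_rates_mb_s, drive_min_stream_mb_s):
    """True if the slowest stage can still feed the drive fast enough."""
    _, rate = bottleneck(stage_rates_mb_s)
    return rate >= drive_min_stream_mb_s


if __name__ == "__main__":
    # Hypothetical sustained rates in MB/s for each hop from disk to tape.
    path = {
        "disk read": 40,
        "backup server I/O": 8,
        "network": 12,
        "tape interface": 40,
    }
    print(bottleneck(path))
    print(can_stream(path, drive_min_stream_mb_s=15))
```

Here the backup server is the limiting stage, so a faster drive would not help and may run worse if it cannot stay in streaming mode.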
Another related concern is connectivity. How does an FC storage network affect tape library decisions? Do FC libraries offer higher bandwidth? Should SCSI libraries still be considered? FC certainly has benefits over SCSI, including greater allowable distance between servers and libraries and more flexible configuration. But don't make the mistake of selecting FC for speed. There's currently no tape drive available that can saturate an UltraSCSI bus. Whether a drive has an FC or a SCSI interface, its performance won't differ if it's properly configured. In fact, many FC tape libraries use SCSI drives internally and route them through a SCSI-to-FC gateway to provide FC ports externally.
A major advantage of FC libraries is their ability to share tape drives among servers. Dynamically sharing drives in a LAN-free environment increases drive utilization and reduces LAN data traffic. Be aware, though, that a storage area network (SAN)-based shared tape environment requires a greater investment in hardware, software and services to implement. Find out all you can about the combination of libraries, switches and arrays you plan on using, including firmware levels. Then expect to test exhaustively to confirm that your combination is reliable and performs acceptably.
So, is there a reason to consider a SCSI-based library in this day of FC? If the flexibility and distance advantages of FC aren't required, the total investment required to purchase a SCSI-based library can be significantly less. This can make a big difference in budget-constrained environments.
Understanding your data and answering these key questions will help you make better choices. Your answers will allow you to overcome this first big hurdle in implementing a centralized backup environment.
This was first published in September 2002