Published: 15 Sep 2003
Supporting data-driven scientific research is always a challenge for storage managers. That's something that Timothy Belfield has discovered first-hand.
Belfield is a senior technical analyst with the Donald Danforth Plant Science Center (DDPSC) in St. Louis. His team of four technical engineers has built and maintained a storage strategy to support the ever-increasing needs of some 200 researchers studying health-related plants, plant nutrition, disease resistance, novel bio-based products and tropical agricultural biotechnology.
Given the rapid growth in biotechnology research in recent years, data is being generated with increasing frequency and volume. What began as a modest amount of data at DDPSC's founding in 2001 has quickly blossomed into a full terabyte of data that's continuing to expand.
Data is stored on a storage area network (SAN) consisting of a 1.5TB Hewlett-Packard Co. (HP) StorageWorks Enterprise Modular Array 1200 system, three HP SAN switches, four HP HSG80 controllers, three HP MSL5026SL tape libraries, HP StorageWorks Modular Data Routers and 25 ProLiant DL380 servers. The environment includes a hodgepodge of operating systems and it supports business applications such as SQL Server 2000 and Exchange Server 2000.
Like most companies, DDPSC relies on tape to back up its critical data. But with data changing and growing so frequently, Belfield says, tape is struggling to keep up and has pushed the organization's backup window out to nearly four days. This is a constant source of frustration for the team, who would ideally like to see full backups completed in less than a weekend.
It's becoming increasingly clear that reaching their goal may never be possible using tape, and that is driving Belfield to consider alternatives.
"The amount of data we're backing up is growing, growing, growing, and our backup window is growing daily," he says. "For us, tape is the best now, but we never know exactly what we should be looking for in speed. If we can't get the backup window to come down, we'd have to look at disk. IDE disks would be a little slower than SCSI, but a lot faster than tape."
Although its primacy as a backup medium was unchallenged for decades, the plummeting cost of disk drives and the sheer bulk of data now being backed up in the average business have highlighted tape's major weakness: It's a slow backup medium. Each generation of tape technology provides faster performance, but those tapes still work the same way--by physically moving the tape past a read/write head with either linear or helical data streams.
This makes tape use a heavily mechanical process that still depends on getting a constant stream of information from the servers it's backing up. In today's high-performance network-attached storage (NAS) and SAN environments, ensuring this exclusive access can be downright difficult. It's an issue that quickly leads to tape problems as networks become flooded with backup data and tape performance slows, thanks to undue numbers of tape rewinds. "You're constantly fighting with tape and the drive itself," says Belfield. And those problems just refer to backup--not restores.
Disk to disk to tape
With the advent of less-expensive ATA disk, Gartner Inc. is now advising users to use cheaper disk technologies as a cache buffer between primary disk and tape. Nick Allen, Gartner's vice president and research director, says: "With multiterabyte tape on the horizon, tape should be used primarily for archiving." (See "Backup and restore recommendations")
Cost comparison: 10TB of capacity
The relatively recent boost in tape capacities and vendors' road maps for their tape product families suggest tape will continue to cost less than enterprise disks by roughly a factor of 50 or more for the foreseeable future. Even ATA disk-based subsystems will remain roughly 10 times more expensive than modern tape. These estimates assume that the tape environment is completely automated and that the media are 100% utilized.
Source: Gartner Inc.
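The ratios in the sidebar can be turned into a quick comparison. The sketch below is illustrative only: it treats tape as a baseline of one cost unit per terabyte, takes the "factor of 50" for enterprise disk as a lower bound, and uses hypothetical names (`RELATIVE_COST`, `relative_cost`) that do not come from Gartner.

```python
# Cost multipliers relative to automated tape, as quoted in the sidebar.
# The enterprise-disk figure is "50 or more", so 50 is a lower bound.
RELATIVE_COST = {
    "tape": 1,
    "ata_disk": 10,
    "enterprise_disk": 50,
}


def relative_cost(medium: str, capacity_tb: float, tape_cost_per_tb: float = 1.0) -> float:
    """Cost of storing capacity_tb on a medium, in units of tape cost per TB."""
    return RELATIVE_COST[medium] * capacity_tb * tape_cost_per_tb


if __name__ == "__main__":
    for medium in RELATIVE_COST:
        print(medium, relative_cost(medium, 10))
```

For 10TB this works out to 10 units for tape, 100 for ATA disk and at least 500 for enterprise disk.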
In other words, cheap disks aren't going to replace tape, just tape's role in the backup and restore process. Quantum Corp., whose DLTtape range of tapes now fits from 80GB to 320GB of compressed data on a cartridge, last year enumerated a storage road map, taking its first major step with the recent release of SDLT 600, which pushes per-cartridge capacity to 600GB with transfer rates of 64MB/s. By 2006, Quantum expects to extend its linear Super DLT technology to squeeze 2.4TB onto a tape.
For its part, the HP/IBM/Seagate-backed LTO Ultrium technology anticipates an 18- to 24-month generational cycle, with the format's capacity due to expand from the 400GB of today's Ultrium-2 to approximately 800GB in the next generation, and 1.6TB after that.
The other major high-end tape solution--Sony's helical-scan Advanced Intelligent Tape (AIT)--is currently lagging its competitors at around 260GB capacity, but it is getting a facelift. In February, OEMs received the first shipments of first-generation SAIT-1 drives, which compress 1.3TB on a half-inch tape and promise up to 10.4TB before the format hits its limitations around 2010. Sony proudly proclaims that OEMs will build SAIT-1 libraries in configurations of up to 1,000 cartridges capable of storing 1.3 petabytes of data (see "Tape road maps").
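The road map figures are easy to sanity-check. The sketch below assumes capacity doubles with each generation, which matches the LTO numbers quoted above but is only an assumption for SAIT, and it uses decimal units (1TB = 1,000GB). The function names are hypothetical.

```python
def doublings_needed(start_gb: int, target_gb: int) -> int:
    """Count the capacity doublings needed to grow from start_gb to target_gb."""
    generations = 0
    capacity = start_gb
    while capacity < target_gb:
        capacity *= 2
        generations += 1
    return generations


def library_capacity_tb(cartridges: int, cartridge_gb: int) -> float:
    """Total library capacity in TB for a given cartridge count and size."""
    return cartridges * cartridge_gb / 1000


if __name__ == "__main__":
    print(doublings_needed(1300, 10400))    # SAIT: 1.3TB to 10.4TB
    print(doublings_needed(400, 1600))      # LTO: 400GB to 1.6TB
    print(library_capacity_tb(1000, 1300))  # 1,000 SAIT-1 cartridges
```

Under that assumption, SAIT reaches 10.4TB three generations after SAIT-1, LTO reaches 1.6TB in two, and a 1,000-cartridge SAIT-1 library holds 1,300TB, or the 1.3 petabytes Sony cites.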
The tape-SAN challenge
As tapes hold more data, customers are finding that tape presents a new set of problems when it's introduced into increasingly high-speed SAN environments. Underlying the problem is the fact that tape, with its roots as a one-to-one server-attached device, doesn't generally play well with devices in the SAN. It expects a continuous stream of data, and if that stream is interrupted, it will issue a SCSI interrupt, stop the tape, rewind and continue again.
This interruption can cause major problems for a Fibre Channel arbitrated loop (FC-AL) SAN--even when FC-attached tape libraries are being used--because the break will invoke a loop initialization primitive (LIP) call that suspends the entire SAN until attached devices are detected and port addresses reallocated. This could also be a problem if slow servers mean the fast-moving tapes get ahead of themselves and have to pause and rewind to compensate for the interruption in data flow. Although FC-AL SANs have become less common with the shift to SAN switching, tape-disk contention must still be addressed to eliminate potential hiccups.
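The cost of an interrupted stream can be approximated with a simple model. The sketch below is a simplification, not a description of any particular drive: it assumes the drive writes from a fixed-size buffer, stops and repositions whenever the buffer runs dry, and restarts only when the buffer is full again. The function name and every parameter value are hypothetical.

```python
def streaming_profile(drive_mb_s: float, supply_mb_s: float,
                      buffer_mb: float, reposition_s: float) -> tuple[float, float]:
    """Return (effective MB/s, stop/start cycles per hour) for a simplified drive."""
    if supply_mb_s <= 0 or drive_mb_s <= 0:
        raise ValueError("rates must be positive")
    if supply_mb_s >= drive_mb_s:
        # The host keeps up, so the tape streams without stopping.
        return float(drive_mb_s), 0.0

    # The drive empties the buffer faster than the host refills it.
    write_s = buffer_mb / (drive_mb_s - supply_mb_s)
    written_mb = drive_mb_s * write_s

    # The drive then sits idle until it has repositioned AND the buffer is full.
    idle_s = max(reposition_s, buffer_mb / supply_mb_s)

    cycle_s = write_s + idle_s
    return written_mb / cycle_s, 3600 / cycle_s


if __name__ == "__main__":
    rate, stops = streaming_profile(drive_mb_s=30, supply_mb_s=10,
                                    buffer_mb=64, reposition_s=3)
    print(f"{rate:.1f} MB/s, {stops:.0f} stops per hour")
```

With these made-up figures, a 30MB/s drive fed at 10MB/s still writes about 10MB/s overall, but it stops and restarts roughly 375 times an hour. Once repositioning takes longer than the buffer takes to refill, the effective rate in this model drops below what the host supplies.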
Another potential source of problems is from the configuration of tape within the SAN: Sharing a host bus adapter (HBA) between tape and servers can throttle back performance when contention for the HBA becomes an issue. Spend the extra dollars to get a second HBA just for tape, then consider a third, redundant HBA to make sure the servers have enough bandwidth. Many customers use SAN zoning to link tape and server resources.
"Vendors will claim you can configure an HBA for both disk and tape, but usually you get conflicting parameters and more often than not, you'll have problems," says John Verdonik, director of professional services with storage integrator CNT, in Minneapolis. "Contention issues become apparent when you're connected to a SAN. People have expectations of throughput, but in reality we have to choose to get that throughput."
Backup and restore recommendations
Nick Allen, vice president and research director, Gartner Inc., says you should:
The disk-to-tape shuffle
Whether they're too fast or too slow, tapes present a challenge for users migrating their data onto high-speed SANs. Over the past year, however, they've gotten a new option with the rise of storage arrays designed specifically for online backup, which feature scads of slow IDE disks and often support tape control languages so they can appear to the network as nothing but a fast tape drive. Some market leaders are Network Appliance's NearStore R100 (from 12TB to 96TB) and R150 (offering 12TB or 24TB each), Quantum's DX30 (3TB of RAID5 storage) and StorageTek's BladeStore (from 4TB to 160TB).
Deciding how much disk to add to existing tape drives requires intuition and experimentation (see "ATA disk outperforms tape"). It's important to consider, for example, how much data needs to be online and available all the time, or how much simply needs to be put onto tape and archived for legal reasons. Trying to keep everything on disk will soon prove frustratingly difficult, but putting it there with the intention to eventually move it to tape can provide much-needed flexibility.
In most cases, such disk arrays are being positioned not as replacements for tape but as a sort of staging area to be interjected between the production environment and tape. Using conventional snapshot techniques, data can be quickly duplicated and then backed up at the user's convenience: "We could make a copy to disk," says DDPSC's Belfield, "then back up that copy [to tape] and, if it took a week, it wouldn't be an issue because we'd still have a copy of it."
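A rough calculation shows why staging changes the picture. The sketch below assumes that DDPSC's nearly four-day window covers roughly the one terabyte mentioned earlier, uses decimal units (1GB = 1,000MB), and invents a 20MB/s rate for the copy to staging disk. The function names are hypothetical.

```python
def transfer_hours(data_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move data_gb at a sustained rate of rate_mb_s."""
    return data_gb * 1000 / rate_mb_s / 3600


def implied_rate_mb_s(data_gb: float, hours: float) -> float:
    """Sustained rate implied by moving data_gb in the given number of hours."""
    return data_gb * 1000 / (hours * 3600)


if __name__ == "__main__":
    # About 1TB in about four days (96 hours): the effective rate to tape.
    print(f"{implied_rate_mb_s(1000, 96):.1f} MB/s")
    # The same 1TB copied to staging disk at an assumed 20MB/s.
    print(f"{transfer_hours(1000, 20):.1f} hours")
```

On those assumptions the tape backup is averaging under 3MB/s end to end, while the copy to disk would finish in about 14 hours. The production systems are tied up only for the disk copy, and the tape job can then run against the copy for as long as it needs.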
It's also possible to replicate the slow-fast duality of a tape-disk solution using tape exclusively. StorageTek, for one, balances speed and capacity between its 9840, which stores 20GB per cartridge and offers four-second load times and 12-second average seeks, and its 9940, which offers 60GB or 200GB uncompressed per cartridge but has an 18-second load time and 51-second seeks.
If you need to support hierarchical storage management (HSM), you could easily install both to provide a tape-only infrastructure in which data is moved to fast tape first, then transferred onto the slower cartridges for archiving. Disparities between the performance of installed tape drives become irrelevant, however, when a near-line disk is added to the mix.
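The trade-off between the two drives can be made concrete using the figures quoted above. The sketch below takes the 200GB option for the 9940 and treats load time plus average seek as a rough time-to-data, which is a simplification. The `TapeTier` class and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeTier:
    name: str
    cartridge_gb: int
    load_s: int
    avg_seek_s: int

    def time_to_data_s(self) -> int:
        """Rough time to first byte: load the cartridge, then seek."""
        return self.load_s + self.avg_seek_s

    def cartridges_for(self, data_gb: int) -> int:
        """Whole cartridges needed to hold data_gb uncompressed."""
        return math.ceil(data_gb / self.cartridge_gb)


FAST = TapeTier("9840", cartridge_gb=20, load_s=4, avg_seek_s=12)
DEEP = TapeTier("9940", cartridge_gb=200, load_s=18, avg_seek_s=51)

if __name__ == "__main__":
    for tier in (FAST, DEEP):
        print(tier.name, tier.time_to_data_s(), tier.cartridges_for(1000))
```

For a terabyte of data, the 9840 reaches data in about 16 seconds but needs 50 cartridges, while the 9940 takes about 69 seconds and needs only five.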
The approach you choose depends on your performance needs and backup windows. "[With tape] you still have the benefits of a much lower-cost storage medium than a disk system," says Pam Baker, advisory systems integration engineer with StorageTek, in Louisville, CO. "Yet near-line storage has added another option for customers to choose from; it can provide a fast-access buffer that provides enhanced functionality in the overall backup architecture."
ATA disk outperforms tape
Try to use less-costly disk technologies as a cache buffer between primary disk and tape.
Source: Gartner Inc.
Disk alone won't a comprehensive backup strategy make, however. Effectively making use of near-line backup requires a tool such as StorageTek's EchoView, a journaling storage monitor that provides a continuous backup allowing for restores to any point in time. Such capabilities will rapidly work their way into enterprise storage management solutions as near-line disk becomes even more pervasive among customers.
Ultimately, the performance and accessibility of disk are likely to earn it a role in even the most tape-intensive environment. That's what happened at Data Base File Tech (DBFT) Group of British Columbia, Canada. The managed backup service provider recently installed a Quantum DX30 to provide better performance (compared with tape) when backing up and restoring as many as 20 customer servers at the same time.
"We wanted to run remote, but tapes require someone to be onsite all the time," says Maurice Auger, director of operations with DBFT. "Tapes do not have the throughput required to do that unless you have great libraries with caching systems on them. But disks are designed to look like tapes, and the software we're using, which runs best on tape libraries, ran almost flawlessly the first time. Overall, we're seeing in the order of a 4x speed improvement, and this gives the customer high availability of their data any time they want it. If we'd backed up to tape, we would probably have to get involved [in the restore]--but this way we don't."
Whatever customers' motivations for embracing disk, analysts are widely convinced that they will increasingly flock to it while retaining tape as an offline archival method. Speaking last month at Gartner's "PlanetStorage 2003" conference in Las Vegas, Gartner vice president Bob Passmore proclaimed that increasingly efficient incremental backup techniques would combine with improved snapshots, mirroring and disaster recovery features to make disk the primary medium for data restores by 2008.
By then, however, current growth figures suggest that data will have grown to many times its current size. Tape will always be priced lower than disk (see "Cost comparison: 10TB of capacity"), and it can be transported to vaults.
Says Andrew Senior, president of tape solutions provider Avax International, of Erin, Ontario: "Tape is a nice, cheap way of keeping large amounts of data stored off-site. If you don't have sufficient historical copies, then you're dead. Tape is going to hang around for a long time."