Published: 10 Mar 2003
Ten years ago, the Hobart Corporation, a commercial food equipment and service provider in Troy, OH, started to move applications off its mainframes to open systems platforms. We also needed to be able to back up and recover those systems. In 1993, the reliability and well-established procedures of the mainframe operations made it the obvious choice for open systems backup.
Like many businesses at that time, IT consolidated numerous departmental LANs into an enterprise network and took responsibility for its file and print servers. As that responsibility increased, so did our backup and recovery workload.
Fast forward to 2000: The open system backups now occupied 1,660 mainframe tapes and kept eight tape drives spinning full time. The backup tapes contained a total of 2TB of data. Just recycling the obsolete onsite tapes required dozens of operator mounts every day. The large tape inventory and high volume of mounts ate up a significant amount of operator time. The constant filing and retrieval also made lost tapes inevitable. We had a growing problem.
On the mainframe, the eight drives occupied two refrigerator-sized IBM tape drive units and an equally large controller unit. The seven-year-old drives weren't aging gracefully: Tape errors were becoming the norm rather than the exception, which interfered with the backup processing.
In addition, our management wanted us to install off-site disaster recovery, which under the present system was impossible. Why? On the mainframe, the group of backup tapes that should have been copied for storage off-site was larger than the number of available scratch tapes. This kept us from making use of the ADSTAR Distributed Storage Manager's (ADSM) Disaster Recovery Manager feature, which inventories tapes off site and in transit.
Advantages of Cron scripting
The Unix environment permits more flexibility in scheduling than Windows or the mainframe, so you can use Cron scripts to schedule jobs.
That way, a script can check the logs to be sure that one scheduled procedure has ended before the next one runs. Each script also writes its own log to record the start and end of the process, which helps to fine-tune the process. Errors are also logged, including any process that takes longer than expected. By using loops, the script can stop waiting after a specified time, then kick off critical tasks.
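The wait-then-run pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not our production scripts: the function names, the `START`/`END`/`ERROR` log markers and the log file layout are all hypothetical, and Python stands in for whatever shell the real Cron jobs use.

```python
import time
from pathlib import Path


def wait_for_marker(log_path, marker, timeout_s, poll_s=1.0):
    """Return True once `marker` appears in the log, False if the wait times out."""
    deadline = time.monotonic() + timeout_s
    path = Path(log_path)
    while True:
        if path.exists() and marker in path.read_text():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def run_step(name, log_path, action):
    """Run `action`, recording its start and end (or its error) in the step's own log."""
    with open(log_path, "a") as log:
        log.write(f"{time.ctime()} START {name}\n")
        try:
            action()
        except Exception as exc:
            log.write(f"{time.ctime()} ERROR {name}: {exc}\n")
            raise
        log.write(f"{time.ctime()} END {name}\n")


def run_after(prev_log, prev_name, name, log_path, action, timeout_s, poll_s=1.0):
    """Wait for the previous step to finish, then run this one.

    If the previous step overruns the timeout, log the overrun and run anyway,
    so a critical task is never blocked indefinitely.
    """
    finished = wait_for_marker(prev_log, f"END {prev_name}", timeout_s, poll_s)
    if not finished:
        with open(log_path, "a") as log:
            log.write(
                f"{time.ctime()} ERROR {prev_name} not finished after "
                f"{timeout_s}s; starting {name} anyway\n"
            )
    run_step(name, log_path, action)
    return finished
```

A call such as `run_after("migrate.log", "migrate", "offsite-copy", "copy.log", make_copies, timeout_s=3600)` would wait up to an hour for the migration log to show its end marker before starting the copy step; `make_copies` is a placeholder for the real work.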
A Cron script can also automatically answer prompts that would normally require an operator's reply. With the tape robot, this permits TSM to automatically recycle old off-site tapes that the operator has placed in the I/O station. Onsite or off-site, tapes are recycled when more than half of the data on them is obsolete.
Our overnight backups end by 4 a.m. Using two scripts, by 6 a.m., all of the disk pools have been migrated, off-site copies have been made, the TSM database has been backed up, a recovery plan has been created, the RS/6000-S7A's operating system has been backed up, and the off-site vault retrieve and inventory lists are queued to the printer. At 9 a.m., the operator:
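The "more than half obsolete" rule is easy to state as code. The sketch below only illustrates the threshold test; the `Tape` record and its field names are hypothetical and do not reflect how TSM stores or reports this information.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    label: str          # hypothetical volume label
    written_gb: float   # total data written to the tape
    obsolete_gb: float  # portion of that data that is no longer needed


def needs_recycling(tape, threshold=0.5):
    """True when more than `threshold` of the data on the tape is obsolete."""
    if tape.written_gb <= 0:
        return False  # an empty tape has nothing to reclaim
    return tape.obsolete_gb / tape.written_gb > threshold


def tapes_to_recycle(tapes, threshold=0.5):
    """Labels of the tapes that have crossed the recycling threshold."""
    return [tape.label for tape in tapes if needs_recycling(tape, threshold)]
```

Note that a tape with exactly half of its data obsolete is not selected, because the rule says "more than half".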
- Removes the operating system backup tape (labeled by day of the week) from the S7A's 8mm drive
- Inserts the next day's operating system backup tape from a storage box on the S7A
- Removes any LTO tapes from the library I/O station
- Prints a single page with the vault retrieve and inventory lists
- Takes all the removed tapes to the vault
- Pulls the LTO tapes to be retrieved, as well as the oldest operating system backup tape
- Checks the resulting inventory of the vault
- Puts the retrieved LTO tapes in the library I/O station
- Puts the old operating system tape in the box on the S7A
At 3 p.m., a Cron-activated script checks in the tapes from the library I/O station and loads them back into the library for reuse. Operators don't need to log into Unix. The operators report any discrepancies in the inventory. There is a script to search the logs for errors each morning. Problems are becoming rare.
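A morning log check of this kind might look like the following. It is a minimal sketch under assumptions: the `*.log` file naming and the `ERROR`/`FAILED` keywords are hypothetical, and a real script would match whatever message format its own logs actually use.

```python
import re
from pathlib import Path

# Hypothetical keywords; adjust to the message format the logs really contain.
ERROR_PATTERN = re.compile(r"\b(ERROR|FAILED)\b")


def find_errors(log_dir, pattern=ERROR_PATTERN):
    """Return (file name, line number, text) for every matching log line."""
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Run from Cron each morning, the returned list could be printed or mailed so that an empty report confirms a clean night.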
Searching for a better way
We began investigating the costs of moving backups from the mainframe to a Unix platform with newer tape technologies. Our first proposal recommended that a new robotic tape library be purchased for the existing IBM RS/6000-S7A server to support all of the open systems servers. In round numbers, this proposal included an additional expenditure of $16,600 for the current year and savings of $28,000 every year thereafter.
Because the RS/6000-S7A was only lightly accessed by a handful of people, it was definitely underutilized. The server almost never used more than two of its eight processors, and its 8GB of memory would be more than adequate to handle additional processing. Most open system backup processing is done at night, so the existing interactive Oracle users wouldn't be affected. And moving the backups to the RS/6000-S7A would free mainframe cycles for crucial batch work.
The eight mainframe drives had a total throughput capacity of 192GB/hour. The proprietary tape library--which was initially proposed--would have just two drives for throughput of 216GB/hour. Management requested a full review of all open system backup hardware, software and procedures to ensure the best long-term solution.
The review starts
When the review started, we were using Veritas' Backup Exec, Computer Associates' ARCserve software and an Exabyte 8mm tape autoloader, in addition to the mainframe backups, to back up the operating systems on Netware and Microsoft Windows servers. By contrast, most enterprise backup solutions are designed to protect only user data and assume that the client machines have a working operating system and backup agent installed. While our current configuration was taken into account, all tape technologies and backup software products were to be considered fair game.
It was imperative that the project save money. Any change had to ensure that the technology chosen would continue to be enhanced in the future to accommodate expansion of our open systems environment and would have continued vendor support. Potential vendors' financial stability and their ability to provide local, timely service were also weighed in both the hardware and software evaluations.
The platform used for the backup server had to be robust enough to handle a 25GB nightly load from 40 clients. The backup software had to support NetWare, AIX, Windows NT and Windows 2000. We also required clients for any operating system that we might support in the near future, such as Linux or AS/400. We further required agents--which greatly reduce the nightly load by only backing up changes--for DB2, Oracle, MS-SQL, and Lotus Domino.
And the winners are
We investigated a large number of software vendors, paying particular attention to those with which we had existing relationships and to industry leaders for stability. The product list included SyncSort Backup Express, Veritas NetBackup and Tantia Network Storage Manager. Only two vendors--Veritas and IBM--met all our criteria and our requirements for reliable support and financial stability. Of those two, based on cost, we chose IBM's Tivoli Storage Manager (TSM), the successor to ADSM.
Several alternatives to the existing drives were considered, most of which were either technically insufficient or were highly expensive overkill. For example, StorageTek's L180 library offered more than enough capacity, but was out of our price range. Upgrading our mainframe's existing IBM 3494 robotic library with its small 2GB cartridges also proved to be too costly.
Of all the tape formats, we standardized on LTO mainly because it is a non-proprietary technology. Although there were some incompatibilities in the early implementations of the LTO specifications, we believed that market pressure would make vendors resolve the problems.
Two LTO drives would have provided only a slight increase in throughput over the eight IBM 3490 mainframe drives. However, three LTO drives provide a 68% throughput increase over the old configuration and a 25% edge over the initially proposed Exabyte 8mm AME drives. Additional drives may yet be required as the open systems server farm grows. The same problem would have to be addressed regardless of the backup server platform. By using a smaller number of newer drives, our hardware maintenance costs were cut by more than half, from $18,000 to $7,230 annually.
The increased tape capacity is one of the project's greatest benefits. The results have been dramatic. The LTO tapes have a compressed capacity of 200GB compared to our mainframe tapes' 2GB storage. While each new LTO tape costs about 15 times the price of its mainframe counterpart, the hundred-fold increase in capacity meant a dramatically lower media cost, from $4 per gigabyte to 60 cents per gigabyte. The number of onsite tapes went from 1,660 manually mounted to 19 in the robot. Only 16 tapes are needed for off-site disaster recovery copies.
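The media-cost figure follows directly from the two ratios quoted above. As a quick check, using only the numbers in this article (the function and parameter names are ours, for illustration):

```python
def cost_per_gb(old_cost_per_gb, price_ratio, capacity_ratio):
    """Scale the old cost per gigabyte by the price increase and capacity increase."""
    return old_cost_per_gb * price_ratio / capacity_ratio


# An LTO tape costs about 15 times as much but holds 200GB instead of 2GB.
new_cost = cost_per_gb(4.00, price_ratio=15, capacity_ratio=200 / 2)
print(f"${new_cost:.2f} per GB")  # prints "$0.60 per GB"
```

Paying 15 times as much for 100 times the capacity leaves 15 percent of the original $4 per gigabyte, which is the 60 cents quoted.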
One of the most important benefits of reducing the number of tapes is the reduction in labor required for the library maintenance and off-site backup. The operators only take and retrieve two to four tapes to and from the off-site vault each day. The effort needed to keep track of the tape cartridge inventory onsite was eliminated. The added work to track the off-site inventory is minimal. This also reduced the probability that cartridges will be misplaced.
After reviewing other LTO units, we purchased an IBM 3583 LTO library, which is essentially identical to the ADIC Scalar 100. IBM and ADIC cooperate in the manufacture of these units. The purchase was part of a larger arrangement encompassing a mainframe upgrade.
Our library came with 42 tape slots and a 12-slot I/O station for removing and recycling off-site tapes. The 8.2TB capacity of the LTO library will provide ample storage to triple our current open system production disk base. The LTO library only takes four sq. ft. of floor space and can operate in normal office temperatures and humidity. Replacing the mainframe drives freed 20 sq. ft. of valuable raised-floor real estate.
How we chose our libraries
We had three objectives: save money, enhance support in our open systems environment and choose a stable vendor with good service. The product we used had to meet five criteria:
- It had to support the platforms we run.
- Its backup server platform had to be robust enough to handle our 25GB nightly load from 40 clients.
- Its client platforms had to include all of our current operating systems (NetWare, AIX, Windows NT and Windows 2000).
- It had to offer clients for any OS we were likely to support in the near future, such as Linux or AS/400.
- It had to supply the database agents we required (DB2, Oracle, MS-SQL and Lotus Domino).
Backup software synergies
The backup software costs were also reduced under Unix. An upfront purchase cost of $35,000 netted a reduction in maintenance from $19,000 to $5,800 per year. Staying with the same product line allowed us to retain experience and maintain continuity. The TSM backup server's administrative interface is extremely uniform across platforms. TSM Windows server administration used in training classes is essentially identical to the administration under the z/OS and AIX operating systems. On our clients, the only configuration change was to the name of the backup server.
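For readers who want to see how the software purchase compares with the maintenance saving, a simple, undiscounted payback calculation using the figures above looks like this. The function name is ours, and the calculation deliberately ignores interest, labor and hardware savings.

```python
def payback_years(upfront_cost, old_annual_cost, new_annual_cost):
    """Years until the yearly saving covers the upfront cost (undiscounted)."""
    annual_saving = old_annual_cost - new_annual_cost
    if annual_saving <= 0:
        raise ValueError("no annual saving, so the purchase never pays back")
    return upfront_cost / annual_saving


software_saving = 19_000 - 5_800  # maintenance saving in dollars per year
software_payback = payback_years(35_000, 19_000, 5_800)
print(f"Software maintenance saving: ${software_saving:,} per year")
print(f"Payback on the $35,000 purchase: about {software_payback:.1f} years")
```

On these numbers the maintenance saving is $13,200 a year, so the software purchase pays for itself in a little over two and a half years from maintenance alone.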
The hardware and software installation and initial client tests were completed in one month. All of the remaining clients were converted the following month.
Most enterprise backup products provide scheduling. In addition to scheduling backups, the scheduler controls tape recycling, making off-site copies and flushing disk cache. Using scheduled commands can also provide simple monitoring, such as an hourly count of active sessions.
By implementing TSM's hierarchical storage structure, we took advantage of available disk space on the Unix box to cache smaller backups and migrate them en masse to tape. This allows more backup sessions to run concurrently and reduces the number of tape mounts.
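The effect of caching on disk and migrating in bulk can be shown with a toy model. This is a conceptual sketch only, not TSM's implementation: the class, the 80 percent migration threshold and the size cutoff for sending large backups straight to tape are all assumptions made for illustration.

```python
class DiskPool:
    """Toy model: small backups land on disk and move to tape in one batch."""

    def __init__(self, capacity_gb, migrate_at=0.8, direct_over_gb=None):
        self.capacity_gb = capacity_gb
        self.migrate_at = migrate_at          # fraction of capacity that triggers migration
        self.direct_over_gb = direct_over_gb  # backups larger than this bypass the disk pool
        self.cached = []                      # (client, size_gb) waiting on disk
        self.on_tape = []                     # (client, size_gb) already written to tape
        self.tape_mounts = 0

    def used_gb(self):
        return sum(size for _, size in self.cached)

    def store(self, client, size_gb):
        """Accept one client backup, migrating first if the pool would pass its threshold."""
        if self.direct_over_gb is not None and size_gb > self.direct_over_gb:
            self.tape_mounts += 1             # a large backup gets its own tape mount
            self.on_tape.append((client, size_gb))
            return
        if self.cached and self.used_gb() + size_gb > self.capacity_gb * self.migrate_at:
            self.migrate()
        self.cached.append((client, size_gb))

    def migrate(self):
        """Write everything cached on disk to tape with a single mount."""
        if not self.cached:
            return
        self.tape_mounts += 1
        self.on_tape.extend(self.cached)
        self.cached = []
```

In this model, five 3GB client backups stored in a 20GB pool and then migrated once reach tape with a single mount, where writing each one directly would have needed five.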
Using an enterprise-class product can also provide disaster recovery management. This should at least provide an off-site inventory and track tapes in transit to and from the off-site location. More sophisticated implementations can contain a database of hardware system descriptions for disaster recovery and can generate detailed instructions for recovery at a new location.
Additionally, most enterprise backup products offer retention options, such as classifying files by drive, by directory, by extension or by using generic characters.
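As an illustration of how such classification rules behave, the sketch below matches file paths against wildcard patterns and returns a retention period. The rules, the retention periods and the first-match-wins ordering are hypothetical examples, and the pattern syntax is Python's `fnmatch`, not any backup product's own rule syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    ("*.tmp", 0),          # by extension: do not keep temporary files
    ("d:/*", 30),          # by drive: keep everything on D: for 30 days
    ("*/payroll/*", 365),  # by directory: keep payroll data for a year
]
DEFAULT_DAYS = 90


def retention_days(path, rules=RULES, default=DEFAULT_DAYS):
    """Return the retention period, in days, for the first rule that matches."""
    normalized = path.replace("\\", "/").lower()
    for pattern, days in rules:
        if fnmatchcase(normalized, pattern.lower()):
            return days
    return default
```

Because the rules are checked in order, a temporary file on drive D: is caught by the `*.tmp` rule before the drive rule is considered.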
Since we switched to the open system Unix solution, our daily backups have become almost routine. The automated Cron scripts and tape robotics have helped cut errors tremendously. In fact, problems are becoming rare.