This article can also be found in the Premium Editorial Download "Storage magazine: Backup overhaul: From a mainframe to an open-systems environment."
Fixing mainframe Tivoli Storage Manager backup problems
Phase I: Assessment
L.L.Bean's direction with open systems is to move to Linux, but the mainframe is there to stay. The new backup and recovery approach will encompass all company data, open systems and the mainframe.
"The assessment phase was key to the whole project," says Rideout. It involved evaluating the existing backup and recovery infrastructure with a focus on mainframe backup operations and the open-systems TSM environment. "We spent several months just gathering data," she says.
This entailed interviewing mainframe and open-systems people, using the TSM reporting tool to capture data at different time periods and analyzing mainframe SMF records for various time slices. The team was surprised by what it learned. "One problem we had in the summer was running multiple TSM instances on the mainframe to back up portions of the open-systems environment," says Rideout. That created contention for CPU. If it was a problem in August, it would be far worse when business peaks in November and December. "CPU becomes a hot commodity then and there's a lot of contention," she adds.
The team found other troubling issues. For example, all file and application servers were treated the same in terms of recovery time objectives (RTOs). The company was backing up and retaining data for the same amount of time whether it came from a production system or a development system. The team also discovered that TSM backups averaged only 8MB/sec, whereas the native speed of the IBM TotalStorage 3590 enterprise tape drive should be 12MB/sec. Restoring data was equally slow.
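To put the throughput gap in perspective, the short Python sketch below works through the arithmetic. The 8MB/sec and 12MB/sec figures come from the assessment described above. The 500GB workload size is a hypothetical value chosen only for illustration, and the function names are not part of any TSM tooling. The calculation also assumes a single drive streaming at a sustained rate, with no allowance for compression or tape mounts.

```python
# Throughput figures reported in the assessment (MB/sec).
OBSERVED_MB_PER_SEC = 8.0   # average TSM backup rate
NATIVE_MB_PER_SEC = 12.0    # native speed cited for the 3590 drive

# Hypothetical workload size, used only to make the arithmetic concrete.
EXAMPLE_SIZE_GB = 500.0


def utilization(observed: float, native: float) -> float:
    """Fraction of the drive's native speed actually achieved."""
    return observed / native


def hours_to_stream(size_gb: float, mb_per_sec: float) -> float:
    """Hours to stream size_gb at a sustained rate (1GB taken as 1,000MB)."""
    seconds = size_gb * 1000 / mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    share = utilization(OBSERVED_MB_PER_SEC, NATIVE_MB_PER_SEC)
    slow = hours_to_stream(EXAMPLE_SIZE_GB, OBSERVED_MB_PER_SEC)
    fast = hours_to_stream(EXAMPLE_SIZE_GB, NATIVE_MB_PER_SEC)
    print(f"Share of native speed achieved: {share:.0%}")
    print(f"Hours at observed rate: {slow:.1f}")
    print(f"Hours at native rate:   {fast:.1f}")
```

Whatever workload size is plugged in, a backup running at 8MB/sec takes 1.5 times as long as one running at the drive's native 12MB/sec, and the same ratio applies to restores at that rate.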
Although the team generally liked how EMC's SRDF performed in synchronously replicating mainframe data, it had concerns about SRDF's tendency to propagate mistakes in the data and its inability to provide logically consistent data. In a rolling disaster such as a security breach or code error, SRDF would replicate bad data to the remote site.
This was first published in April 2007