Storage Magazine

Recovering from the WTC: a personal account
by Darryl Brooks
Issue: Jun 2002
It was shortly after the World Trade Center towers collapsed that I stood in the lobby of an undisclosed location, peering across the Hudson River at what used to be the World Trade Center complex. In its place were glaring lights and huge tractor trailers that were aiding in the search and recovery effort for victims. I was part of a different type of search and recovery effort being led by Legato Professional Services. Our mission was to establish a backup and recovery infrastructure that would allow one of the world's largest brokerage firms to find and recover its data while continuing to meet SEC regulations regarding data protection. During this engagement, I witnessed the trials and tribulations this firm experienced during what must have been one of the most strenuous times in U.S. history. Here's my report.

The firm's data center was located in the WTC complex, and its local computers and storage were destroyed by the dust and debris that fell around the WTC towers. The live data wasn't the only loss: because the off-site vendor hadn't yet arrived that morning, the previous night's backup tapes were still in the building, which was now considered a crime scene.

For those applications that were deemed mission-critical before the attack, data had been mirrored using a Hitachi SAN. Thus, those applications were up and running within hours of the collapse. For the many other applications, however, we had the difficult task of making the necessary data available to each business unit looking to restore its data.

Supporting infrastructure
Initially, we needed to get duplicate hardware and software to reconstruct the destroyed production environment so recovery could begin. The supporting vendors (Legato, Hitachi, StorageTek, and Sun) were all great in providing the necessary pieces of equipment. After procuring the hardware to rebuild the backup and recovery infrastructure, the process began by recovering the firm's six Legato NetWorker servers: four for recoveries and two for continued backups. These servers were Sun Enterprise 6500s with 8GB of memory, four CPUs and direct-attached StorageTek L700 libraries using DLT tapes.

Because the firm used DLT drives, we experienced severe performance problems loading and unloading the hundreds of tapes used during the recovery. DLT drives are great once the tape has been loaded into the drive, but they cause problems when an unusually large number of tape mounts is required, as in an enterprise-wide disaster. This problem was exacerbated by code in the binary command responsible for loading and unloading the tapes, which needed to run atomically. As a result, the multiple requests for loads and unloads during the many recoveries caused several delays. These delays aren't exclusive to DLT tape drives - any tape drive that is designed for increased capacity instead of speed would yield the same dismal results.
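To see why an atomic load/unload command hurts during mass recoveries, consider a rough back-of-the-envelope model. The sketch below is purely illustrative: the function names and the 90-second figure are assumptions for the sake of the arithmetic, not measurements from this engagement or behavior taken from NetWorker itself. It compares when each queued mount request would finish if operations must run strictly one at a time versus if several could proceed at once.

```python
def serialized_finish_times(num_requests, seconds_per_op):
    """Finish time of each request when load/unload operations
    run strictly one at a time (the atomic case)."""
    return [(i + 1) * seconds_per_op for i in range(num_requests)]


def concurrent_finish_times(num_requests, seconds_per_op, num_workers):
    """Finish time of each request if up to num_workers operations
    could run at once (hypothetical comparison case)."""
    return [((i // num_workers) + 1) * seconds_per_op
            for i in range(num_requests)]


if __name__ == "__main__":
    # Assumed numbers: 8 queued mount requests, 90 seconds per operation.
    print(serialized_finish_times(8, 90))
    print(concurrent_finish_times(8, 90, 4))
```

Under these assumed numbers, the last of eight serialized requests waits eight full operation times, while the same queue spread across four concurrent operations finishes in two.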

Additional problems rapidly surfaced. In our recovery operation, the Legato NetWorker servers were responsible not only for mounting tapes and updating the indexes, but also for moving data between the tape drive and the recovering client system. A disaster of this magnitude couldn't have been imagined, and getting the recovery systems up and running as soon as possible was at the forefront of everyone's mind. Even so, the deployed design negatively impacted the established service level agreements between the recovery team and the business units that they supported.

A better solution would have been to station storage nodes between the recovering clients and the NetWorker server. In such a configuration, the storage nodes would have been responsible for the movement of the data, freeing the NetWorker server of all of the hardware interrupts associated with opening and closing the tape drive and the NIC interface. Thus, the loading and unloading of tapes would have proceeded more smoothly.

Nonetheless, recovery of the six Legato NetWorker servers was completed without incident. Each took approximately five to six hours once the hardware was set up properly.

Lack of consistent IP connectivity
Most of the requested recoveries were completed without incident. Many of the ones that did fail, however, failed because of the lack of consistent IP connectivity and name resolution. For one reason or another, client systems were randomly dropping off the network. Luckily, the firm's day-to-day management practices included a set of nicely written scripts to test the functionality of the managed client before executing a backup.
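The firm's scripts aren't reproduced here, but the kind of pre-flight check described above can be sketched in a few lines. The following is a minimal illustration, not the firm's actual tooling: the function name `check_client`, the `ClientCheck` record, and the host name and port in the usage lines are all hypothetical. It tests the two failure modes mentioned, name resolution and IP connectivity, before any backup or recovery is attempted.

```python
import socket
from typing import NamedTuple


class ClientCheck(NamedTuple):
    host: str
    resolved: bool
    reachable: bool
    detail: str


def check_client(host, port, timeout=3.0,
                 resolve=socket.gethostbyname,
                 connect=socket.create_connection):
    """Check that a client name resolves and accepts a TCP connection.

    resolve and connect are injectable so the check can be exercised
    without a live network; by default they use the standard library.
    """
    try:
        address = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ClientCheck(host, False, False,
                           f"name resolution failed: {exc}")
    try:
        conn = connect((address, port), timeout)
    except OSError as exc:
        return ClientCheck(host, True, False, f"connect failed: {exc}")
    conn.close()
    return ClientCheck(host, True, True, f"ok ({address}:{port})")


if __name__ == "__main__":
    # Hypothetical client name and port, for illustration only.
    result = check_client("client01.example.com", 7937)
    print(result)
```

Running a check like this against every client just before a scheduled job separates "the client is gone from the network" from "the backup software failed", which is the distinction that mattered when systems were dropping off at random.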