This article can also be found in the Premium Editorial Download "Storage magazine: Expanding SANs: How to scale today's storage networks."
One of the main objectives of an organization developing a disaster recovery (DR) plan is to make the plan specific, yet simple enough so that a storage or system administrator who isn't familiar with the organization's day-to-day processes can walk into the recovery site and restore operating system and application data without the organization's IT staff being available for consultation.
That notion was put to an extreme test after Sept. 11, 2001. Consider Cantor Fitzgerald, a financial services provider that lost more than 700 employees in the World Trade Center (WTC) disaster. Many of those employees were IT personnel. For any organization put in that position, the resulting question is: "Can your applications survive this kind of human loss during a disaster?" I recently gained some insight into that question, and others, during a DR experience with Sungard Recovery Services in Philadelphia on the second anniversary of the WTC disaster.
Sanology Inc. was called by a partner consulting firm two days before the DR test at Sungard to provide a resource for one of their clients and to fill in for a departing consultant who had been working on the project for months. The DR test was going to span the anniversary date of Sept. 11, and I was wondering how this large insurance company would fare in the recovery of its data on such a momentous date, considering that terrorists have a history of reminding their enemies of significant dates with further attacks on
Being a curious creature, I took the assignment and discovered that, despite my concern about the significance of the Sept. 11 date, the Sungard recovery site was not exactly jumping that day with IT organizations exercising their DR plans. This was interesting because we are a computer- and network-dependent nation that needs fault-tolerant resources to power our applications should our infrastructure be attacked again on a Sept. 11 anniversary.
Before arriving at Sungard, I had the opportunity to speak with some of the administrators involved in the DR exercise from the beginning. They provided me with such pertinent information as the recovery objective date, operating system types and levels, as well as the hardware and software configurations of the Legato Systems backup and recovery environment. Armed with nothing more than a diagram of the provided information and a contact name, I showed up ready to go.
The Legato recovery environment had already been set up by the insurance company's core staff and was composed of a StorageTek L700 library using eight 9940 tape drives provisioned over a storage area network (SAN) and discovered by two Sun enterprise-class servers acting as NetWorker servers. StorageTek's tried-and-true ACSLS software allowed the two NetWorker servers to share the robotic arm of the library and permit DMZ and non-DMZ applications to be recovered simultaneously.
After certifying the recovery environment and being introduced to the principals at the insurance company, I started fielding recovery requests without having intimate knowledge of the applications I was recovering.
While troubleshooting tape drive failures, I realized that the core staff didn't know how the tape drives were connected to the Legato servers. Apparently, when communicating its hardware needs to Sungard, the insurance company didn't specify the method (i.e., a SAN or direct-attached storage) by which that connection was to be made, only that it would need eight 9940 tape drives. But as long as there are no problems with connectivity or the storage device itself, there isn't a real need to know how the storage device is connected to the server. You only need to confirm that the quantity, type and connection speed of the requested storage have in fact been provisioned.
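A check of that kind can be sketched in a few lines of Python. This is an illustrative sketch only: the `StorageRequest` record, its field names, the `check_provisioning` helper and the 2.0 Gb/s link speed are assumptions made for the example, not anything Sungard or the insurance company used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageRequest:
    """Hypothetical record of what was requested from, or provisioned by, a DR site."""
    device_type: str
    count: int
    link_speed_gbps: float
    attachment: Optional[str] = None  # e.g. "san" or "das"; None means unspecified


def check_provisioning(requested: StorageRequest, provisioned: StorageRequest) -> List[str]:
    """Return human-readable findings; an empty list means nothing to report."""
    findings = []
    if provisioned.device_type != requested.device_type:
        findings.append(f"type: wanted {requested.device_type}, got {provisioned.device_type}")
    if provisioned.count < requested.count:
        findings.append(f"count: wanted {requested.count}, got {provisioned.count}")
    if provisioned.link_speed_gbps < requested.link_speed_gbps:
        findings.append(
            f"speed: wanted {requested.link_speed_gbps} Gb/s, got {provisioned.link_speed_gbps} Gb/s"
        )
    if requested.attachment is None:
        # The request left the connection method open, so record what the site chose.
        findings.append(f"attachment not specified in request; site used {provisioned.attachment}")
    elif provisioned.attachment != requested.attachment:
        findings.append(f"attachment: wanted {requested.attachment}, got {provisioned.attachment}")
    return findings


# Eight 9940 drives as in the article; the 2.0 Gb/s figure is a placeholder.
requested = StorageRequest("9940", 8, 2.0)
provisioned = StorageRequest("9940", 8, 2.0, attachment="san")
for finding in check_provisioning(requested, provisioned):
    print(finding)
```

In this run the only finding is that the request never named the attachment method, which is the gap that surfaced during troubleshooting.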
Sungard's approach to provisioning storage over the SAN for DR exercises could benefit your recovery efforts by reducing the time it takes to ready your environment. Imagine multiple large SANs composed of servers and storage, managed by a drag-and-drop application that lets administrators simply drop storage resources onto a server icon while zoning and LUN masking information is updated in the background. Automating a process like this could have your production SAN environment duplicated and primed for recovery faster than ever before.
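To make the background bookkeeping concrete, here is a minimal Python sketch of what one such drag-and-drop action might update. It is a toy model, not Sungard's tooling or any vendor's API: the `Fabric` class, its method names, the zone naming scheme and the port names are all hypothetical, and a real fabric would identify ports by WWPN and push these changes to switches and arrays.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Fabric:
    """Toy model of a SAN fabric; zones and LUN masks are plain dictionaries."""
    # zone name -> member port names
    zones: Dict[str, Set[str]] = field(default_factory=dict)
    # storage port -> {LUN id -> host ports allowed to see that LUN}
    lun_masks: Dict[str, Dict[int, Set[str]]] = field(default_factory=dict)

    def assign(self, host_port: str, storage_port: str, luns: List[int]) -> str:
        """Model one drag and drop: zone the host with the storage port, then unmask the LUNs."""
        zone_name = f"z_{host_port}_{storage_port}"
        self.zones.setdefault(zone_name, set()).update({host_port, storage_port})
        masks = self.lun_masks.setdefault(storage_port, {})
        for lun in luns:
            masks.setdefault(lun, set()).add(host_port)
        return zone_name

    def visible_luns(self, host_port: str, storage_port: str) -> List[int]:
        """A host sees a LUN only if it is zoned with the storage port and unmasked for it."""
        zoned = any({host_port, storage_port} <= members for members in self.zones.values())
        if not zoned:
            return []
        masks = self.lun_masks.get(storage_port, {})
        return sorted(lun for lun, hosts in masks.items() if host_port in hosts)


fabric = Fabric()
fabric.assign("host_a", "array_1", [0, 1])
fabric.assign("host_b", "array_1", [2])
print(fabric.visible_luns("host_a", "array_1"))
```

Keeping zoning and masking behind a single `assign` call mirrors the idea in the paragraph above: the administrator expresses one intent and both tables are updated together, so a host is never zoned to storage it has no masked LUNs on, or the reverse.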
This was first published in November 2003