Is it really a disaster?
The first step in an effective disaster recovery plan is defining what constitutes a real disaster.
What happens when you say "disaster recovery" in a crowded room? Everyone thinks of something different. That's because the term, like so many others used in the IT world every day, lacks precision. At the very least, we need to clearly define what we mean by disaster recovery.
Thankfully, not all disasters are created equal. I once stepped on the power cord for the storage array I was managing. It popped out of the socket, crashing the array and taking down a data warehouse in the middle of the day. You can bet I called that a disaster! But the mainframe staff didn't even notice.
Because not all disasters are equally important, it can be hard to decide just what type you're dealing with when an outage occurs. So the first thing a company has to do is decide what constitutes a disaster. In general, if an outage is localized in scope and time (like my fancy footwork), we call the response operational recovery. This includes outages that affect just a few systems. But if the number of affected systems and the recovery timeframe are sufficiently large, the outage constitutes a real disaster.
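The rule of thumb above can be written down as a simple decision function. This is a minimal sketch only: the thresholds `MAX_SYSTEMS_FOR_OPERATIONAL` and `MAX_HOURS_FOR_OPERATIONAL` are hypothetical placeholders that each company would have to set for itself. The article also leaves open how to treat an outage that is small in one dimension but large in the other, so this version assumes an outage counts as operational only when it is limited in both scope and time.

```python
from dataclasses import dataclass

# Hypothetical thresholds; each company has to choose its own values.
MAX_SYSTEMS_FOR_OPERATIONAL = 3
MAX_HOURS_FOR_OPERATIONAL = 4.0


@dataclass
class Outage:
    affected_systems: int      # how many systems are down
    est_recovery_hours: float  # estimated time to restore them in place


def classify(outage: Outage) -> str:
    """Label an outage by the type of response it calls for.

    Assumption: an outage is 'localized' only if it is small in BOTH
    scope (systems affected) and time (estimated recovery hours).
    """
    localized = (
        outage.affected_systems <= MAX_SYSTEMS_FOR_OPERATIONAL
        and outage.est_recovery_hours <= MAX_HOURS_FOR_OPERATIONAL
    )
    return "operational recovery" if localized else "disaster recovery"


if __name__ == "__main__":
    # One storage array knocked offline by a kicked power cord.
    print(classify(Outage(affected_systems=1, est_recovery_hours=1.0)))
    # A site-wide power failure affecting dozens of systems for days.
    print(classify(Outage(affected_systems=40, est_recovery_hours=72.0)))
```

With these made-up thresholds, the power-cord incident lands in operational recovery and the site-wide failure in disaster recovery. The useful part is not the numbers but the fact that the criteria are written down before an outage happens.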
It's critical to differentiate between operational recovery and disaster recovery because the tools and techniques used in each situation can differ significantly. Many IT systems are designed …
One of the most effective operational recovery techniques is to take frequent disk-based snapshots of the running environment. Many operational outages stem from data loss or corruption, which a high-availability system architecture wouldn't prevent: redundant components keep the system running, but they keep serving the same bad data. With disk snapshots, you can quickly revert to an earlier version of the system state. Continuous data protection (CDP) is a newer method for dealing with operational outages that allows a much finer-grained choice of recovery point.
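The practical difference between periodic snapshots and CDP comes down to which points in time you can roll back to. The sketch below is a simplified model, not the behavior of any particular product. The four-hour snapshot schedule, the one-minute CDP interval and the helper `latest_recovery_point` are all hypothetical; true CDP typically journals every write, so the one-minute spacing is only a coarse stand-in.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_recovery_point(points: Sequence[datetime],
                          corruption_time: datetime) -> Optional[datetime]:
    """Return the newest recovery point taken before corruption_time, or None."""
    clean = [p for p in points if p < corruption_time]
    return max(clean) if clean else None


start = datetime(2006, 12, 1, 0, 0)

# Hypothetical schedule: a disk snapshot every 4 hours for one day.
snapshots = [start + timedelta(hours=4 * i) for i in range(6)]

# CDP modeled coarsely as a recoverable point every minute for one day.
cdp_journal = [start + timedelta(minutes=i) for i in range(24 * 60)]

# Moment at which the (hypothetical) corruption hit the data.
corruption = start + timedelta(hours=10, minutes=30)

if __name__ == "__main__":
    for name, points in [("snapshots", snapshots), ("CDP", cdp_journal)]:
        point = latest_recovery_point(points, corruption)
        print(f"{name}: roll back to {point:%H:%M}, losing {corruption - point}")
```

Running it prints `snapshots: roll back to 08:00, losing 2:30:00` and `CDP: roll back to 10:29, losing 0:01:00`. Under these assumptions, snapshots throw away two and a half hours of work, while the CDP journal loses about a minute.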
When enough systems are affected by an outage, it may be time to declare a disaster. We've recently seen a number of full-scale disasters caused by weather, power failures and terrorism. In these cases, the affected companies determined that recovering the systems in place would take too long, so they began recovery operations at remote locations. Remoteness poses the greatest challenge for disaster recovery: How can a company ensure that a complete copy of its critical data will be available for recovery, given the high cost of wide-area network bandwidth? Most disaster recovery products, from replication to wide-area file services, are built to address this key technical challenge.
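A back-of-the-envelope calculation shows why bandwidth dominates the problem. The function below is an illustrative sketch using decimal units (1 GB = 8,000 megabits). The data volumes, link speed and `efficiency` factor in the usage lines are hypothetical. Real throughput also depends on latency, protocol overhead and any compression or deduplication the replication product applies.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb gigabytes over a link_mbps WAN link.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means the full nominal rate).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8_000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical site: 10 TB of critical data, 200 GB changed per day,
    # replicated over a 100 Mbit/s WAN link running at 80% efficiency.
    initial = transfer_hours(10_000, 100, efficiency=0.8)
    daily = transfer_hours(200, 100, efficiency=0.8)
    print(f"Initial full copy: {initial:.0f} hours ({initial / 24:.1f} days)")
    print(f"Daily changes:     {daily:.1f} hours per day")
```

With these assumed numbers, seeding the remote site takes about 278 hours (roughly 11.6 days), and keeping up with daily changes ties up the link for about 5.6 hours a day. A larger data set or a slower link quickly makes a complete, current remote copy impractical without the kind of products the article mentions.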
This was first published in December 2006