Better data availability for the masses

A few basic rules of thumb for better data availability. Lessons learned from the terrorist attack on the WTC.

Commentary by Steve Duplessie

Unfortunately, it took a heinous act to bring this topic to the forefront of our IT consciousness, but it's still a topic that needs covering. Fortunately, you'll soon hear stories about how things worked perfectly -- in the face of disaster, hot sites came up and business kept running. There will be some other stories of ultimate failure, but those will hopefully be few and far between.

New York is the economic capital of the world. As such, it has always been the primary place where DR (disaster recovery) and BC (business continuance) have been more than fancy terms. In NY, they have been the beacons of technology, allowing for the ability to withstand massive failure and continue operations.

There were hundreds of businesses located in the World Trade Center that continue to operate -- even though there is no more World Trade Center. This article speaks to everyone about ways to improve overall availability.

We've already heard a lot of stories where large shops running EMC's Symmetrix Remote Data Facility (SRDF) were put through the ultimate test -- and came up exactly as planned. There are dozens of Sungard, Comdisco and IBM disaster recovery clients that were back in business within a day of the disaster. Many more experienced only minutes of downtime. (We hope that this horrific event will give pause to those mid-tier companies out in Kansas.)

Some general rules-of-thumb for DR and BC planning
For most users, the concept of DR has been equated to massive cost structures. I'm here to tell you that every organization can improve its ability to deal with a major outage, without breaking the bank.

The first thing (really the only thing) to understand when looking to recover from an outage is recovery itself -- meaning all of this is only worthwhile if you can actually recover good data. That gets me to backup -- the first line of typical defense. We deal with local data loss by restoring from tape. We hope the stuff on the tape is valid.

Here are a few basic rules that can help you to better achieve data availability:

  • Rule Number 1: Make sure your backups work. For that, the Enterprise Storage Group (ESG) likes a small player like Bocada. Bocada makes a cross-platform backup reporting tool. Tape backup is good for both local restores and off-site disaster recovery. (If you back up garbage, you'll restore garbage.)
  • Rule Number 2: Implement local clustering for high availability. We've always been huge fans of this strategy at ESG, because it mitigates the need to recover from tape at all. Legato and Veritas are still the Unix kings here. NSI and Availant are up and coming. Most large shops cluster some applications in one fashion or another, but the mass market has still not caught on. Mainframers, it seems, have always been ready to deal with events such as this.
  • Rule Number 3: Think about replication schemas. The real key is to be able to recover from any local outage instantly. This is where replication comes into play. EMC has shipped over 15,000 SRDF licenses to the big guys. These are the folks who can afford the large up-front equipment, software, and telecom costs. What most people don't know is that you can get into a multi-site data replication schema for very low cost. NSI has shipped over 20,000 licenses of its DoubleTake product -- and no one has ever heard of them. Users can pick up the ability to replicate critical data off-site (anything-to-anything, anywhere, anytime) for only about $2,500 per server -- and use whatever connectivity they have -- whether it's on a VLAN or the Internet. Is this 100 percent perfect? No. Is it significantly better than what 99 percent of the shops we talk to have? Yep. Legato has products via its Octopus acquisition. Veritas has stuff. Cost should not be the reason you do nothing.
  • Rule Number 4: Consider restore speeds. Most outages are simple pilot errors. Someone deleted something they shouldn't have, and that forces a restore. It's common sense that if the restore comes from disk, it is much faster than tape. There are products about to be unleashed on the market that promise faster and more accurate restore -- from folks such as Avamar, Atempo, and Neartek. We've recently been pushing people to look at disk-to-disk backup as primary, and tape as secondary, simply to enhance the speed at which you can restore. iSCSI is set to play a role in this arena.
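
To make Rule Number 1 concrete, here is a minimal sketch of one way to check that a test restore actually matches the data it was taken from. It is not how Bocada or any other product mentioned here works. It assumes you have restored a backup into a scratch directory and that the source files have not changed since the backup ran. The function names `file_digest` and `verify_restore` are made up for this illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> dict:
    """Compare every file under source_dir with its restored copy.

    Hypothetical helper: assumes source_dir is unchanged since the backup.
    Returns relative paths that are missing from the restore or whose
    contents differ from the source.
    """
    missing, corrupt = [], []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(restored):
            corrupt.append(rel.as_posix())
    return {"missing": missing, "corrupt": corrupt}
```

Run something like this against a periodic test restore, and treat any non-empty `missing` or `corrupt` list as a failed backup. It only proves the restored bytes match the source; it cannot tell you whether the source itself was garbage to begin with.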

You can take either the "no-cost" or "low-cost" approach to DR
I'm not advocating that you break the bank in order to build a better infrastructure. I'm simply saying that there are probably no-cost ways to improve (such as evaluating your overall processes), as well as low-cost ways. I'm also still saying that if you can afford to be totally bulletproof, you should be.

The point here is that DR or Business Continuance is not only for the mainframe shops. It is for you. You already run multiple servers. You already have multiple locations, most likely. All you need is a little thought and a little insurance money in order to dramatically improve your ability to deal with an outage -- whether that outage results from a simple accidental deletion or a major catastrophe.

Other advice?
Here are a few other suggestions to consider:

  • If you don't have a second site, find a sister company or a supplier, then replicate to each other.
  • Use a storage service provider (SSP) company to provide you with a DR site at a very low cost.
  • Whatever you do, do something to improve your systems.

Make it better. Our economy -- and the economy of the world -- is based on business. You are the business. We count on you. So, be there for us.

Additional Resources:
* For more information about any of the companies mentioned in this commentary, run a searchStorage TargetSearch.

* Check out this recent searchStorage Q&A interview with NSI Software's CEO about customer demand in the wake of terrorist attacks.

* How many terabytes can one person manage? Check out this searchStorage Administrator tip for some answers.

About the author: Steve Duplessie is the founder and senior analyst of Milford, Mass.-based Enterprise Storage Group, one of the storage industry's key independent authorities on the enterprise-class mass storage market. His role as advisor to the A-list of storage vendors gives him unique insight into the breadth of storage solutions available. Prior to ESG, Duplessie founded Invincible Technologies Corp. (ITC), a high-end NAS and clustering vendor, which he sold in June 1998. Before that, Duplessie held senior management positions with other storage companies, including Clearpoint Research and EMC, where he began his career.
