WALTHAM, Mass. -- Users from two Boston-area medical centers attending the Storage Decisions Disaster Recovery Seminar with John Toigo said they've been through complex data classification efforts and are in the final stages of setting up replication to secondary data centers. But, even after careful planning, the users said, complex application interdependencies can still sometimes throw a monkey wrench into disaster recovery.
"Three years ago at a corporate retreat, IT directors did classification on all our data and decided the recovery time objectives and recovery point objectives for each," Passe said. The classes are quadruple-A, triple-A, double-A and A, with quadruple-A the highest priority. Passe said his department finally got budgetary approval this year for a secondary data center, for which it has already purchased two CX3-80 arrays and laid down a dark-fiber link for data replication.
In the years since the data classification was done, however, Passe said the classifications, as well as the RTOs and RPOs associated with each, have had to be adjusted to reflect real-world prices for things like replication. "Triple-A applications, for example, went from less than five minutes data loss and twelve hours to resuming production to 36 hours of data loss and back-end service in 24 hours."
Complicating the issue further, according to Passe, some applications that have been assigned to two different "tiers" are actually inseparable. "For example, payroll, a mission-critical application," he said. "It's done in our PeopleSoft database, and we can't just extract payroll-related data from the HR and financial data that's also in that database. If payroll has to be replicated, it all goes."
Meanwhile, the secondary site is one-third the size of the primary data center. "Within a year we may run into the reality of how much space is really needed," Passe said.
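The effect Passe describes can be put in a few lines of code: a shared database ends up replicated at the highest tier of any application that lives in it. What follows is a minimal sketch, not the medical center's actual tooling. The tier names come from the classification scheme described above; the application names, datastore names and tier assignments are hypothetical.

```python
# Tier names follow the scheme in the article; rank 0 is the highest priority.
TIER_RANK = {"AAAA": 0, "AAA": 1, "AA": 2, "A": 3}


def effective_tiers(apps):
    """Map each datastore to the highest-priority tier among the apps sharing it.

    `apps` is a list of (app_name, tier, datastore) tuples.
    """
    result = {}
    for _name, tier, datastore in apps:
        current = result.get(datastore)
        # A lower rank number means a higher priority.
        if current is None or TIER_RANK[tier] < TIER_RANK[current]:
            result[datastore] = tier
    return result


# Hypothetical assignments, for illustration only.
apps = [
    ("payroll", "AAAA", "peoplesoft_db"),
    ("hr", "AA", "peoplesoft_db"),
    ("financials", "AAA", "peoplesoft_db"),
    ("intranet_wiki", "A", "wiki_db"),
]

print(effective_tiers(apps))
```

In this made-up example the lower-tier HR and financial data inherit payroll's tier because they cannot be separated from it, which is the capacity pressure Passe is pointing to.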
"I'm starting to get questions like, 'if we have replication, why do we need backup?'" Passe said. "It's getting cloudy, especially since so many of these tools do multiple things." Further complicating matters, Passe said, is the fact that he doesn't have the time or money to do a bake-off between products. "With any of these products, I really want to know what actually works -- sometimes features can be tacked on in multi-use products," he said.
Nervousness about new products, according to Passe, means he'll probably be keeping array-based replication tools in use, even if they're redundant -- and despite the further complication that introduces into the cost/protection equation. "We have to have a plan B," he said. "I won't shift entirely to a new tool until I'm convinced it will work reliably."
Passe said his organization is also trying to be careful to pick a tool that can be versatile if disaster recovery plans change. "Right now, we have our own DR facility with a dark fiber link, but what happens if we decide to switch to a hosted facility outside [route] 495 that we access over IP?" he said. Right now, RecoverPoint looks like the best fit for that situation, but it doesn't replicate system data, something IPStor does. "That's an interesting angle," Passe said. "But, IPStor doesn't have the market share RecoverPoint does."
MIT Medical Center: interlinked databases a concern
According to Alison Grice Knott, manager of information security and integration for MIT Medical Center, her department has also tackled data classification, drawn up a "tiered" disaster recovery plan and established a replication scheme. The second of two Compellent SANs holding the storage for replication has yet to be moved to the secondary facility across campus, however; until that facility is finished, MIT Medical is replicating locally between the SANs for redundancy. A fiber link between the two buildings has also been established.
Meanwhile, the department's file shares are pre-classified according to job function: each physician, specialist and support staff employee is given a standard file directory on the department's shared storage; employees also have personal file directories on the shared storage, and keep no documents on their workstations, Knott said. End users at the medical center are generally good about keeping up with data classification schemes, but Knott said they sometimes need reminders, and the IT staff periodically has to "clean up" the file directory. "It's a never-ending process," she said. "We're constantly brainstorming ways to make it easier."
The file shares are sent using IBM's Tivoli to a central backup server belonging to MIT's main data center (otherwise, the medical center functions as a separate institution within MIT). "There we don't mind if there's a delay in recovery -- normally if a user file is lost, the user doesn't realize it right away," according to Knott. After 45 days, users understand their files are no longer available on the Tivoli system, she said.
Meanwhile, the medical center will be using the Compellent StorageCenter SAN's snapshots and replication to send data from mission-critical practice management and electronic medical records (EMR) databases. The databases are based on SQL, and Compellent's snapshots work with the hot backup options within SQL to make sure the databases are quiesced before a snapshot is taken.
However, Knott said, her concern lies in the databases themselves and the complex transactions that take place between them, not to mention the number of separate interfaces that carve up access to the databases among departments like the pharmacy, lab and radiology.
"At any given moment, a snapshot of one database might be out of sync with the other," she said. For example, a patient's prescriptions might be updated by a physician on the practice management system but not on the EMR system when a snapshot is taken. "There could be important information on its way from a local system to the central database that hasn't reached it yet, and we have no way of knowing that before we take a snapshot," Knott said.
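The timing problem Knott describes can be illustrated with a small model. This is a sketch under stated assumptions, not a description of how Compellent or the medical center's databases behave: the `Update` record, its fields and the update IDs are all hypothetical. It also assumes delivery times are known after the fact, which is precisely the information Knott says is unavailable when a snapshot is taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    update_id: str
    committed_at: float  # when the originating system committed the change
    delivered_at: float  # when the other database applied it


def in_flight_at(updates, snapshot_time):
    """Return IDs of updates committed on one system but not yet applied on the other."""
    return [
        u.update_id
        for u in updates
        if u.committed_at <= snapshot_time < u.delivered_at
    ]


# Hypothetical timeline in arbitrary time units.
updates = [
    Update("rx-1", committed_at=1.0, delivered_at=2.0),
    Update("rx-2", committed_at=9.5, delivered_at=10.5),
    Update("rx-3", committed_at=11.0, delivered_at=12.0),
]

print(in_flight_at(updates, snapshot_time=10.0))
```

Any update this model reports for a given snapshot time would be present in one database's snapshot and absent from the other's, so the two copies would disagree after a restore.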
The concern doesn't lie in the quality of the replicated data -- "it's a good copy if you have a disk failure or corruption… But, if everything is wiped out, I am concerned about having to rebuild entire systems from those snapshot copies." Knott said that for now, the medical center will be testing snapshots once the campus replication is set up, and studying the issue. "There may be some methods available to address this, but it's hard for us to devote time and resources to new systems."
Another take: bypassing the problem with SaaS
But, what do you do if you're a small business without the resources for a secondary facility? According to Robert Chadwick, IT manager for J. Calnan and Associates, Inc., a construction company located in Quincy, Mass., his company cut costs on primary storage by using Buffalo's TeraStation NAS product, which has 2 TB capacity and is priced at $999.00. Chadwick said that for disaster recovery, the company is looking into server clustering or long-distance mirroring for its Exchange environment, but for another mission-critical project management application, the company is using a Web-based program, AutoDesk Inc.'s ConstructWare, which also maintains a large portion of the company's document data. "It's made our DR job much easier," Chadwick said. "If we got into a disaster scenario, once we had email up and running, users could access it through Outlook Webmail, and also access their project documents on the Web."
Toigo: "DR needs to go away"
Toigo took his usual candid approach to discussing disaster recovery at the seminar, particularly when it comes to the role of the storage vendors. "DR as a separate discipline needs to go away," he said. "It should be a set of standards by which we build recoverability into our systems."
Until that time, Toigo had a few more tips for users: "Forget planning for specific scenarios," he said. "Focus on a meltdown situation, and build your DR plan so your systems can be recovered incrementally in less disastrous situations."
Toigo also had particular words of advice for those using offsite tapes and hot sites to rebuild in the event of disaster. "Pay attention to your contract" with a hot-site vendor, Toigo warned users, since some contracts specify only technical facilities, and don't guarantee the use of a specific facility. "So, if your primary hot site is overfilled, you might be shifted to another location, which breaks your affinity with the people who helped you through your tests and setup at your original location," Toigo said.
In the event a location switch of more than 50 miles takes place, Toigo warned, half-jokingly, "Don't put tape cartridges in the cargo hold of an airplane -- the environmentals are all wrong. But, a standard tape case is also not going to fit in the overhead compartment or under the seat in front of you, so plan to buy a couple of seats next to you for your data."