State-of-the-art disaster recovery
MasterCard International, the global financial processing company, has been following a formal business continuity program since 1990. Over the last 15 years, it has continually evolved its program as needs and technology capabilities have changed, reports Randy Till, vice president of business continuity. Today the company operates two primary data centers and multiple secondary data centers. Its critical debit card operation is run concurrently at both primary facilities, a coprocessing strategy that provides 100% redundancy.


In the event of a problem at one facility, the other continues processing with no discernible impact on the operation. For other key applications, MasterCard replicates data from one primary location to the other as frequently as every hour. Eventually, most of its top critical applications will be protected through coprocessing; those systems will have to be rewritten to take advantage of the latest coprocessing technologies.

The company relies on business impact analyses to determine how to protect each system, and has established three tiers of protection. Tier 1 addresses the most critical systems. But even within tier 1, MasterCard has defined four levels of protection based on recovery point objectives. These subtiers call for immediate, zero-to-four-hour, four-to-24-hour, and one-to-three-day recovery. Most problems, Till points out, are fully resolved within 72 hours, and the organization resumes processing on its primary systems. The other tiers address longer-term outages that might require additional or different equipment.
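The four tier-1 sub-levels above can be sketched as a simple lookup. This is an illustrative model only, not MasterCard's actual tooling; the names and the function are invented for this example.

```python
# Illustrative mapping of the four tier-1 sub-levels described in the
# article to recovery point objectives, expressed as the maximum
# acceptable data-loss window in hours. (Names are hypothetical.)
TIER1_RPO_HOURS = {
    "immediate": 0,   # coprocessed systems; no discernible interruption
    "0-4h": 4,
    "4-24h": 24,
    "1-3d": 72,       # 72 hours, within which most problems are resolved
}

def subtier_for_rpo(rpo_hours: float) -> str:
    """Return the most lenient sub-tier whose window still covers the RPO."""
    for name, limit in TIER1_RPO_HOURS.items():
        if rpo_hours <= limit:
            return name
    raise ValueError("RPO exceeds tier-1 coverage; assign a lower tier")
```

A system with a two-hour RPO, for instance, would land in the zero-to-four-hour sub-tier.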

The company tests its business continuity capabilities in April and October. Every tier-1 system is tested at least once each year. A test may include as many as 40 tier-1 systems. MasterCard strives for end-to-end testing that tests not only the tier-1 system, but all the other systems it depends on. Tests typically run almost a week, and involve 50 to 70 people. MasterCard is using EMC TimeFinder software for local storage replication and EMC Symmetrix Remote Data Facility software for remote replication.

-Alan Radding

Virtualizing storage arrays
For a large southern U.S. manufacturer, DR recently became a primary issue because of regulatory compliance. Before the regulations, the manufacturer thought of DR protection as nothing more than backing up to tape and shipping the tape offsite. But before it could implement a better DR system, it had to solve a difficult problem between its VMware servers and IBM Corp. ESS (Shark) storage systems.

To get around the Shark's inability to dynamically allocate storage to a logical unit number (LUN), the manufacturer carved its Sharks into 4GB LUNs. It then used the server's volume manager to aggregate LUNs as needed for applications. The workaround was successful until it was moved to a VMware-based environment.

The problem arose when the storage requirements on the VMware servers increased to 2TB. VMware is limited to 128 LUNs, which is more than sufficient in most circumstances. But the Shark workaround was now a roadblock: 128 LUNs multiplied by 4GB per LUN equals a maximum of 512GB--barely one-quarter of the new requirement. The manufacturer could have carved the Shark up again, but that would have meant migrating all the data off the Shark, re-initializing and reformatting it, and then migrating the data back--a process that would have been disruptive and time-consuming. Another option would have been to rip out the VMware environment and go back to one server image per platform, but that would have incurred dramatic disruptions and high costs.
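The capacity ceiling above can be worked out explicitly from the figures in the article:

```python
# The VMware/Shark roadblock, as arithmetic.
MAX_LUNS = 128        # VMware's per-host LUN limit cited in the article
LUN_SIZE_GB = 4       # the size the Sharks had been carved into
REQUIRED_GB = 2048    # the new 2TB requirement

ceiling_gb = MAX_LUNS * LUN_SIZE_GB     # 512GB addressable at most
shortfall = REQUIRED_GB / ceiling_gb    # the requirement is 4x the ceiling
```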

The company thought SAN fabric-based virtualization might solve its primary problem and, as a side benefit, provide a cost-effective DR solution as well. It evaluated solutions from DataCore Software Corp., FalconStor Software Inc., IBM and Troika Networks Inc. It settled on Troika's Accelera and SAN Volume Suite (SVS), which includes Troika VMware multipathing fabric agent, StoreAge Storage Virtualization Manager (SVM), multiMirror, multiCopy and remote mirroring over TCP/IP. The SVS is primarily deployed in active-active pairs with each one connected to its own Fibre Channel (FC) switch or director.

The VMware multipathing fabric agent, part of SVS, resides on the Troika Accelera, not the VMware server. The StoreAge volume management, replication, snapshot, mirroring and data migration tools reside on the SVM appliance. The SVM appliance provides the virtualized LUN map to the fabric agent, which directs each VMware server's access to the physical LUNs. Replication over distance is done using TCP/IP from the SVM appliance.
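The indirection the SVM appliance provides can be pictured as a table from virtual LUNs to physical extents, which the fabric agent consults on each server's behalf. This is a conceptual sketch only; all identifiers here are invented, and the real SVS data structures are not published.

```python
# Hypothetical sketch of the virtualized LUN map: the SVM appliance
# hands the fabric agent a mapping from each virtual LUN to the
# physical LUNs backing it, and the agent resolves server I/O
# through that map. (All names are illustrative.)
virtual_lun_map = {
    "vlun-0": ["shark1:lun-17", "shark1:lun-18"],
    "vlun-1": ["shark2:lun-03", "shark2:lun-04"],
}

def resolve(vlun: str) -> list[str]:
    """Fabric agent's job, in miniature: translate a virtual LUN
    to the physical LUNs it is built from."""
    return virtual_lun_map[vlun]
```

The point of the design is that servers only ever see the virtual side of this table, so physical storage can be reorganized or replicated without touching the hosts.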

The company experienced only one significant setback, which occurred when it was implementing the high-availability (HA) option. The administrator pulled a series of FC cables to prompt a failover. Apparently, the sequence of cable pulls revealed a bug in the failover code, which Troika then patched.

Another problem was attributed to operator error. The company had implemented StoreAge server agents on a number of server platforms, including Windows and Novell NetWare, before they were available as "fabric" agents. When new Novell NetWare servers were added, the agents weren't loaded on them. This let the Novell servers connect directly to physical LUNs they shouldn't have had access to (instead of the virtual LUNs), and data was corrupted. The error was quickly discovered and fixed, although it took much longer to repair the corrupted data. As a result, the company is looking to migrate all of its server agents to fabric-based agents to prevent similar errors. Implementation costs were approximately $120,000 MSRP for the HA configuration; ongoing OpEx is approximately $18,000.

Virtualization appliance eases replication
A manufacturer uses Troika Accelera and SAN Volume Suite to virtualize storage for its VMware servers and to replicate data remotely.

The Troika SVS system solves the VMware/Shark dilemma by presenting 1TB virtual LUNs to the VMware servers. The company expects payback in as little as 12 to 18 months based on the savings vs. current costs for storage provisioning and offsite tape storage. Those estimates don't include the savings from disk-based DR. The Troika SVS will reduce disk-based replication costs from approximately $90,000 per TB to $10,000 per TB by allowing the replicated data to reside on a lower-cost disk system such as IBM's DS4000, Nexsan Technologies' ATAboy or EMC's Clariion.
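The replication savings implied by those figures can be checked directly. This is back-of-the-envelope arithmetic from the numbers in the article, not the company's actual cost model.

```python
# Rough savings arithmetic from the figures cited above.
cost_per_tb_before = 90_000   # disk-based replication, original setup
cost_per_tb_after = 10_000    # after moving replicas to lower-cost disk

savings_per_tb = cost_per_tb_before - cost_per_tb_after  # $80,000 per TB

# Against the ~$120,000 HA implementation cost, replicating even the
# 2TB environment described earlier would recover the outlay:
capex = 120_000
savings_on_2tb = 2 * savings_per_tb   # $160,000, exceeding the capex
```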

This was first published in May 2005
