Managing and protecting all enterprise data


Too many SAN islands

One of the main challenges to growing SANs is the proliferation of independent SAN islands. We look at how and why a multinational financial services company consolidated many islands into larger ones, but stopped short of a single, unified SAN.

Four years ago, when business unit managers at the Royal Bank of Canada (RBC) Financial Group in Toronto began to understand the advantages of networked storage, storage area network (SAN) deployments at the $15.5-billion multinational financial services company gained momentum faster than Lance Armstrong accelerating down the Alps.

Putting its new gear to the test
In 1999, officials at Royal Bank of Canada (RBC) learned the hard way that compatibility testing is essential. The company set out to deploy its first storage area network (SAN), a tape SAN. Without first checking its tape vendor's compatibility matrix, RBC chose to use what Harold Durnford, the manager of systems-managed storage, now calls an "off-brand host bus adapter." But when Durnford's team attempted to put the shiny new SAN into production, it started experiencing throughput problems and frequent crashes.

"We'd have a failure every couple of days," says Durnford. "It got to the point that we'd have scopes and tracers running constantly. The only solution turned out to be to change the actual host bus adapter (HBA). Later, we discovered the device we'd chosen wasn't on the tape vendor's compatibility list. From then on, we tried to make sure someone else had tested the equipment first and that we did our own tests as well."

RBC set up its own SAN compatibility testing lab at a 70,000-square-foot data center in Ontario, approximately 90 miles northwest of the company's main data center in Toronto. The facility, with between 30TB and 40TB of storage at any given time, also serves as RBC's disaster recovery site. That gives SAN equipment testers at the site access to a complete set of Windows and Unix servers configured identically to production servers. Vendors usually provide storage devices and HBAs for testing on a 90-day trial basis.

Although it isn't a full-scale data center operation, the testing lab consumes the equivalent of three full-time employees, says Durnford. Currently, the lab is testing new StorageTek 9940B tape drives.

The first step in compatibility testing at RBC, according to Durnford, is to check equipment vendors' own compatibility testing results. RBC officials have learned to expect trouble when vendors say equipment "should" be compatible. But even when compatibility is guaranteed, RBC tests just to be sure.

The company has a standard menu of tests. They include:

  • End-to-end connection: Can the single host see all the disk? Are the sizes correct? Can it see the other host disk?
  • Cluster/failover: Does a clustering configuration (AIX or Sun, for example) see all of its own and other servers' disk?
  • Multipathing: Are all paths accessible?
  • HBA firmware and drivers: Can HBA and firmware at least connect and coexist?
  • Throughput: Do the HBAs deliver the rated throughput--1Gb switch or 2Gb switch--end-to-end?
  • Shared ports: What hosts (HBAs and drivers) can still function when sharing storage ports on a switch?
  • Software and firmware upgrades: Does all of the above still work after software is upgraded?
Testing has revealed that the most persistent problems have cropped up in multipathing software, says Durnford, particularly when RBC has attempted to run storage devices from more than one vendor on the same server. Often, multipathing software used by the different storage devices conflicts. In fact, testing has shown that it's better simply not to mix storage on a server, says Durnford.

RBC also has learned to retest whenever changes in firmware or software occur. For example, Durnford's team retested each cluster when, earlier this year, it upgraded from 1Gb to 2Gb switches.
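
The test menu and the retest-on-change habit described above could be tracked with something as simple as the sketch below. The test names paraphrase the article's list; the class, its fields and the rule that every test must pass before a configuration counts as certified are assumptions made for illustration, not RBC's actual tooling.

```python
from dataclasses import dataclass, field

# Test names paraphrase the article's menu. Everything else in this sketch
# (class name, fields, the all-tests-must-pass rule) is hypothetical.
TEST_MENU = (
    "end_to_end_connection",
    "cluster_failover",
    "multipathing",
    "hba_firmware_and_drivers",
    "throughput",
    "shared_ports",
    "software_and_firmware_upgrades",
)


@dataclass
class Configuration:
    """One tested combination of HBA, firmware level and switch speed."""
    hba_model: str
    firmware: str
    switch_speed_gb: int
    results: dict = field(default_factory=dict)  # test name -> passed?

    def record(self, test: str, passed: bool) -> None:
        """Store the outcome of one test from the menu."""
        if test not in TEST_MENU:
            raise ValueError(f"unknown test: {test}")
        self.results[test] = passed

    def certified(self) -> bool:
        """True only when every test on the menu has been run and passed."""
        return all(self.results.get(test) is True for test in TEST_MENU)

    def change(self, **updates) -> None:
        """Apply a firmware, HBA or switch change and discard old results,
        so the configuration must be retested before it is certified again."""
        for name, value in updates.items():
            if name == "results" or not hasattr(self, name):
                raise AttributeError(name)
            setattr(self, name, value)
        self.results.clear()
```

Under this sketch, a configuration that passed every test at 1Gb loses its certified status the moment `change(switch_speed_gb=2)` is called, which is the point of the retest rule.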

Before the move to SANs, RBC's banks, brokerage business, insurance companies and other units had been saddled with a growing mess of direct-attached disk systems running on a wide range of servers--from Novell NetWare to IBM AS/400s. As the number of direct-attached drives--many of them IBM 7133 SSA serial disk systems--increased, RBC's far-flung businesses were having more and more trouble keeping up. When an application required more disk, provisioning took up to eight weeks. And because of the limits on SCSI bus cable lengths, many data centers were simply running out of room for more direct-attached storage (DAS).

So when RBC's central technical support division gave SANs the green light, after testing SAN firmware interoperability and issuing best-practice recommendations, RBC's business units began switching to Dell, EMC and IBM SANs as soon as leases on older servers expired.

"We'd tell them, 'By the way, when your leases expire, you're ordering your next server with Fibre Channel cards. This is the firmware, the brand and literally, the part number to order.' And they did it," says Harold Durnford, RBC's manager of systems-managed storage who oversaw much of the switch to SANs.

SAN island proliferation
RBC's businesses complied with Durnford's recommendations perhaps just a bit too enthusiastically. It wasn't long before SAN islands were popping up all over the place. And with no central IT authority to dictate when, where and under what conditions a new SAN should be deployed, RBC found itself recreating many of its pre-SAN management problems, only this time because of SAN proliferation.

While RBC managed to restrict its Unix environment to two 256-port dual fabric SANs, its Windows environment was a different story. Rapidly growing and highly distributed applications such as Exchange e-mail were increasing total Windows-related storage by upwards of 100% per year. By early 2003, Durnford says, RBC found itself with upwards of 30 Windows-based SANs--many using 8- or 16-port gigabit switches--spread throughout the 30 countries in which RBC operates.

"On the Windows side, the proliferation of islands came about because the solution we were using would only provide for two terabytes per SAN," says Durnford. "Each application server might want one tenth of that, so after we got 10 servers using the SAN, with server No. 11, I'd have to start another SAN island." The result was that RBC was hiring more people to manage SAN islands and taking even longer to provision storage, says Durnford.

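Durnford's arithmetic can be worked through in a few lines. The sketch below assumes, as his quote suggests, a 2TB ceiling per SAN and roughly one tenth of that per application server; treating 2TB as 2,000GB and the function name itself are choices made here for illustration.

```python
def islands_needed(servers: int,
                   san_capacity_gb: int = 2000,
                   per_server_gb: int = 200) -> int:
    """Number of SAN islands required when each island has a fixed capacity.

    Defaults follow the quote: about 2TB per SAN and one tenth of that per
    server. Integer gigabytes are used to avoid floating-point rounding.
    """
    if servers < 0:
        raise ValueError("servers must not be negative")
    if per_server_gb <= 0 or san_capacity_gb < per_server_gb:
        raise ValueError("each island must be able to hold at least one server")
    servers_per_island = san_capacity_gb // per_server_gb
    # Ceiling division: a partly filled island still counts as an island.
    return -(-servers // servers_per_island)


for count in (10, 11, 25):
    print(count, "servers ->", islands_needed(count), "island(s)")
```

Ten servers fit on one island, but the eleventh forces a second, which is exactly the step change Durnford describes.
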
Now RBC is moving to end--or at least slow down--the SAN island proliferation. Durnford has put in place what amounts to a two-phase plan. In Phase 1, RBC is using inter-switch links (ISLs) and larger, 2Gb switches to consolidate existing Windows SAN islands, while continuing to manage SANs supporting the company's different operating environments--Windows, Unix, mainframe and IBM iSeries--separately. In Phase 2, the company plans to use a common storage resource management (SRM) software tool to manage all of its SANs and may even build centralized storage utilities that remote servers could access using LAN-free ISLs running over dark fiber connections.

"Our initial exploitation of SANs was purely connectivity and volume purchase-driven," says Durnford, adding: "Now we've got to consolidate onto fewer SANs that we can view and manage more simply."

RBC is certainly not alone in attempting to stem SAN island proliferation by consolidating onto fewer, larger fabrics. According to estimates by Gartner Inc., SANs with a port count of 320 or greater will grow at a rate of 40% over the five-year period ending in 2007. That compares to a growth rate of 12% for SANs of all sizes over the same period.

Simplify storage provisioning
Like RBC, many organizations consolidating storage onto larger SANs are hoping they can simplify storage provisioning.

"Next to backup, provisioning is the number one issue that IT is dealing with. At most companies, it's in the dark ages," says Arun Taneja, founder and consulting analyst of the Taneja Group in Hopkinton, MA. "Usually, you've got the systems administrator, database administrator and the storage administrator trying to figure out why an application isn't running fast enough. By the time they figure out they need more storage and go through the process of buying and deploying it, what should have taken a few minutes, takes weeks. The move from direct-attached storage to SANs was supposed to fix that, and it has made things easier," says Taneja, adding, "but not easy enough."

In Phase 1 of RBC's SAN consolidation push, a first step was to establish standard SAN deployment, management and procurement processes and encourage the company's lines of business to use them.

Eighteen months ago, Durnford's group created a storage steering committee, which included IT and business representatives from each of the company's businesses.

The purpose of the committee, says Durnford, was to share best practices for SAN management and consolidation: when consolidating SANs, how and in what sequence to hook up the switches, how to document the changes, and how to back up data during the change and restore it afterward. "We wanted to make sure we shared our experiences so that they could financially benefit from them, avoiding costly mistakes," Durnford says.

And authority over storage still remained with the lines of business. But with business executives and IT managers on the steering committee, Durnford was then able to generate business buy-in for storage consolidation.

Interoperability testing
Among the key best practices that Durnford's group emphasized was the need to thoroughly test and verify the interoperability of SAN elements--including switches, directors and arrays--before putting consolidated SANs into production. When the technical support division started evaluating SANs in 1999, Durnford's group tested interoperability of various devices at different release levels from different vendors. They soon learned to anticipate compatibility problems, says Durnford, even in cases where vendors had certified interoperability between their devices. In a consolidated SAN, interoperability testing is particularly important.

"It doesn't matter what the vendor tells you, oftentimes when you hook up two switches that are not of agreeable [release] levels, you'd get what providers like to call 'unexpected results,'" says Durnford. "In fact, oftentimes when we heard from the vendor that it should work, it was a clue to us that we needed to test it, certify it and lock down the firmware."

Centralizing information about what devices and firmware releases actually work with each other and disseminating it to the rest of the organization, says Durnford, accelerates SAN consolidation. The technical support division also developed standard change management procedures that should be used when deploying and consolidating SANs.

Just-in-time purchasing
As part of Phase 1 of its plan, RBC standardized its SAN hardware vendors and then centralized storage procurement with Durnford's group. On the Windows side, for example, RBC will use Dell Computer Corp., EMC and IBM gear. On the Unix side, it will be EMC and IBM. Boiling down vendors to just this group will greatly reduce compatibility problems as more SAN consolidation projects take place. And centralizing procurement not only lets RBC make sure all groups buy only from the approved list, but also allows for much speedier provisioning.

RBC has begun what Durnford calls "just-in-time" provisioning. What this means is that vendors ship switches and disk arrays to RBC just before they are really needed. Durnford's group then configures them and tests them in advance. Only once a SAN has reached an agreed-upon threshold of consumed storage space is the additional disk deployed. And only then does the company actually pay for it. Using this approach, Durnford says, RBC has been able to cut the time it takes to provision new SAN storage from up to eight weeks to, in some cases, as little as 24 hours. (For a look at a different SAN consolidation project also intended to simplify provisioning, see "AXA Group's major consolidation push.")
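
The threshold rule behind just-in-time provisioning can be sketched as follows. The article doesn't say what RBC's agreed-upon threshold is, so the 80% figure, the class name and its fields are all hypothetical; the sketch only shows the logic of holding pre-tested disk in reserve until utilization crosses a line.

```python
from dataclasses import dataclass


@dataclass
class SanPool:
    """Capacity bookkeeping for one SAN (all figures in terabytes)."""
    deployed_tb: float       # disk in production and paid for
    consumed_tb: float       # disk actually in use
    staged_tb: float         # shipped, configured and tested, not yet paid for
    threshold: float = 0.80  # hypothetical trigger; the article gives no figure

    def utilization(self) -> float:
        """Fraction of deployed disk that is currently consumed."""
        return self.consumed_tb / self.deployed_tb

    def maybe_deploy(self, increment_tb: float) -> float:
        """Move staged disk into production once the threshold is reached.

        Returns the number of terabytes deployed, which is also the amount
        that becomes billable. Returns 0.0 when nothing is deployed.
        """
        if self.utilization() < self.threshold or self.staged_tb <= 0:
            return 0.0
        added = min(increment_tb, self.staged_tb)
        self.staged_tb -= added
        self.deployed_tb += added
        return added
```

Because the staged disk has already been configured and tested, crossing the threshold triggers only a deployment step, which is consistent with provisioning times measured in hours rather than weeks.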

Using these procedures and standards, RBC businesses have begun to consolidate their existing SANs. In one case, six older Windows-based SANs in three different sites--totaling 11TB--are being consolidated into three SANs. In another, multiple SANs supporting 50 Exchange servers deployed across two sites are being consolidated along with the application servers.

RBC is also consolidating the mainframe side, mainly just to reduce software license charges. Last year, the company consolidated the North Carolina data center supporting its United States-based Centura banking business into an existing Toronto data center. Another consolidation project is under way; this time, RBC's West Coast data center is merging with its Toronto site.

For now, says Durnford, RBC plans to continue consolidating SANs within each operating environment without creating central SANs that can support servers running different operating systems. Currently, says Durnford, the company is shooting for creating standard SAN fabrics of 500 ports. Although there are few technical reasons to avoid creating consolidated heterogeneous SANs, RBC is wise to focus first on consolidating within operating environments, says Taneja.

"There is LUN masking software for switches and HBAs that allow you to consolidate storage for different operating systems onto one SAN fabric, but organizations should probably think twice before attempting that," says Taneja.

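In concept, the LUN masking Taneja refers to is a per-host list of the logical units a server is allowed to see, so that a Windows host and a Unix host on the same fabric never touch each other's disk. The sketch below illustrates only that idea. It is not any vendor's interface, and the class, method names and worldwide port names are made up.

```python
from typing import Dict, FrozenSet, Set


class LunMask:
    """Toy access table mapping a host port (by WWPN) to its permitted LUNs."""

    def __init__(self) -> None:
        self._allowed: Dict[str, Set[int]] = {}

    def grant(self, wwpn: str, lun: int) -> None:
        """Allow the host port identified by wwpn to see the given LUN."""
        self._allowed.setdefault(wwpn, set()).add(lun)

    def visible_luns(self, wwpn: str) -> FrozenSet[int]:
        """LUNs presented to this host port; unknown ports see nothing."""
        return frozenset(self._allowed.get(wwpn, set()))

    def can_access(self, wwpn: str, lun: int) -> bool:
        """True if the LUN is unmasked for this host port."""
        return lun in self._allowed.get(wwpn, set())


# Hypothetical port names for one Windows host and one Unix host.
WINDOWS_HOST = "10:00:00:00:00:00:00:01"
UNIX_HOST = "10:00:00:00:00:00:00:02"

mask = LunMask()
mask.grant(WINDOWS_HOST, 0)
mask.grant(WINDOWS_HOST, 1)
mask.grant(UNIX_HOST, 2)
```

The caution in Taneja's quote still applies: a table like this is easy to write and easy to get wrong, and a single bad entry exposes one operating system's volumes to another.
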
RBC's two-phase consolidation project
Phase 1:
  • Tested and confirmed interoperability of storage area network (SAN) elements, shared results with business units
  • Centralized SAN procurement and standardized on hardware
  • Began to move to just-in-time utility provisioning of new SAN hardware
  • Established standard SAN deployment practices such as documentation
  • Used inter-switch links and 2Gb switches to create fewer, larger SANs for Windows systems
  • Continued for the time being to separately manage SANs attached to different operating systems, such as Windows, Unix and mainframe
Phase 2:
  • Deploying standard storage resource management software, possibly going with EMC ControlCenter
  • Evaluating whether to create central, shared SAN utilities using LAN-free links running over dark fiber

Phase 2 kicks in
In Phase 2 of its SAN consolidation efforts--now in the evaluation stage--RBC will begin to address cross-platform management and create storage utilities that can be accessed remotely. Before consolidating SAN storage from multiple operating systems, however, RBC officials want to find a single storage software tool they can use to manage all fabrics, regardless of the kind of servers or storage arrays they contain.

Currently, RBC is testing EMC's ControlCenter software package, which is being refashioned as a storage management suite capable of managing SAN elements from various vendors. If ControlCenter actually proves to be a capable heterogeneous storage management tool, it will replace the variety of hardware-specific storage management tools that RBC now uses, giving Durnford's group the ability to manage any SAN from a single console.

So far, says Durnford, RBC has concluded that, while ControlCenter doesn't provide the richness of information that a storage management tool native to a specific piece of hardware can, it may do enough.

"It looks like [ControlCenter] will provide a scaled-down view, the basic nuts and bolts stuff you would expect," says Durnford. "But some of the value add--details such as how the internal arrays of an IBM Shark are laid out, for example--I don't think we're going to see that in the product. The question is--is that information important enough, or can you go with a more generic view?" RBC is still evaluating that question.

Also under review is the idea of creating large storage utilities that could be managed centrally and accessed remotely. RBC is trying out the idea first on tape backup, providing remote access from a tape SAN to its Toronto-based Capital Markets business unit using dark fiber to support a LAN-free ISL. Capital Markets will use the link to tap into tape backup located in an RBC site about one mile away. If that works, says Durnford, the company will use the same approach to provide remote tape backup to a business unit 30 miles north of Toronto. And if that works, RBC will look at using dark fiber and ISLs to provide remote SAN disk storage.

"It depends on the proof of concept," says Durnford. "Tape is not very sensitive when it comes to things like latency, and disk is. We'll look at it, and based on what we see, we may be able to go back to one of the business units and say, 'If you need additional disk storage, why not use the switches and connections we already have in place for tape? The disk storage doesn't really need to be right next to your server. It can be over in our building.'"

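Durnford's point about latency can be given rough numbers. Light travels through optical fiber at about two-thirds of its speed in a vacuum, so distance alone adds a floor to every round trip. The sketch below estimates that floor for the one-mile and 30-mile links mentioned above. The refractive index of 1.47 is a typical value assumed here, and the estimate ignores switch, protocol and device delays as well as the fact that fiber routes are usually longer than the straight-line distance.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # typical for single-mode fiber; an assumption
KM_PER_MILE = 1.609344


def round_trip_delay_us(distance_miles: float) -> float:
    """Minimum round-trip propagation delay over fiber, in microseconds."""
    if distance_miles < 0:
        raise ValueError("distance must not be negative")
    distance_km = distance_miles * KM_PER_MILE
    one_way_s = distance_km * FIBER_REFRACTIVE_INDEX / SPEED_OF_LIGHT_KM_PER_S
    return 2 * one_way_s * 1_000_000


for miles in (1, 30):
    print(f"{miles} mile(s): about {round_trip_delay_us(miles):.0f} microseconds")
```

On these assumptions the one-mile link adds on the order of 16 microseconds per round trip and the 30-mile link a little under half a millisecond. A streaming tape backup barely notices that, while a disk workload issuing many small synchronous I/Os pays the penalty on every operation, which is why the proof of concept starts with tape.
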
The ability to build storage utilities that can be accessed remotely would take SAN consolidation to an entirely new level. But it won't happen overnight. Not only must the dark fiber connections and ISLs first be tested, but such large SANs will also require fast, 4Gb switches that are just now under development. And at RBC and many companies, it will require business unit managers to accept the idea that they can give up direct, local management of stored data without losing control of other risks.

AXA Group's major consolidation push
RBC isn't the only large financial institution driving toward storage area network (SAN) consolidation. $85-billion AXA Group, with corporate headquarters in New York City, has also launched a major push to merge SAN islands and direct-attached storage (DAS) into fewer, larger and more manageable SANs.

But while both RBC and AXA see just-in-time storage provisioning as a major benefit, they're going about consolidation in different ways. RBC is taking a phased approach, and AXA is mounting a frontal assault, simultaneously consolidating SANs in all six of its major countries of operation and pulling together storage for servers running Windows and a variety of Unix flavors.

Over the next 12 to 18 months, AXA plans to consolidate 15 SAN islands in six different countries to six SANs--one in each country. Coupled with a major server consolidation effort, the SAN consolidation project will enable AXA to cut overall storage needs--currently about 250TB in all--by between 40% and 50% by improving disk utilization, says Ron Roberts, global program manager for server/storage consolidation. By consolidating SANs and managing them in a consistent way around the globe, AXA will install capacity before it's needed, but pay only as it is used.
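
To put Roberts' figures in absolute terms, the short calculation below applies the 40% and 50% reductions to the roughly 250TB AXA has today. The function name is made up, and the result is only as precise as the article's rounded numbers.

```python
def remaining_capacity_tb(current_tb: float, cut_percent: float) -> float:
    """Storage still needed after cutting current_tb by cut_percent."""
    if not 0 <= cut_percent <= 100:
        raise ValueError("cut_percent must be between 0 and 100")
    return current_tb * (100 - cut_percent) / 100


CURRENT_TB = 250  # approximate figure cited by AXA

low_estimate = remaining_capacity_tb(CURRENT_TB, 50)
high_estimate = remaining_capacity_tb(CURRENT_TB, 40)
print(f"Projected need: {low_estimate:.0f}TB to {high_estimate:.0f}TB")
```

That works out to somewhere between 125TB and 150TB once the consolidation is complete, spread across six SANs instead of 15.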

But why the decision to consolidate SANs simultaneously around the globe rather than in a less-risky phased fashion? Doing so will allow AXA to consolidate more quickly and gain significant benefits sooner. "It is ambitious," says Roberts. "But in order to allow us to achieve the greatest savings in the shortest period of time, it's being parallelized."

AXA's recent acquisition history--including its 1992 purchase of The Equitable Companies--contributed to the SAN island proliferation, Roberts says. Until 2001, different units of the company had their own IT organizations and made their own decisions on storage. So AXA ended up with a wide variety of storage platforms and operating systems and a proliferation of servers and SANs. Besides the 15 SANs worldwide, AXA has about 5,000 Unix and Windows servers. The company is consolidating them and expects to end up with fewer than 1,000.

Groundwork for the consolidation efforts was laid in 2001, Roberts says, when AXA centralized control over its IT infrastructure into AXA Technology Services. Under CEO Leon Billis, the unit set out to consolidate and standardize IT infrastructure and to move to a just-in-time utility model for procurement under which AXA would neither purchase nor lease equipment but simply pay for capacity. Earlier this year, AXA signed a $1 billion "Infrastructure on Demand" deal with IBM Global Services.

On the storage side, the company has standardized on Cisco switches for its SAN fabric and on Hitachi Data Systems Lightning and IBM Shark storage devices. Some AXA country operations will use HDS devices, others IBM. Native IBM and HDS storage management software tools will be used. Overall, AXA will end up with 50% HDS storage and 50% IBM.

While AXA is attempting to consolidate SANs in its six operating countries at the same time, the company does plan to leverage best practices and lessons learned as much as possible between countries. As at RBC, AXA's IT organization has been performing compatibility testing and will share knowledge about what works with what.

In addition, AXA is running unique pilots in some countries. The results of those pilots will be shared with other countries. Three countries, for example, are currently testing a variety of application workloads to determine how best to architect both the SANs and consolidated servers. Results of those tests will be shared with all AXA's country operations via weekly meetings, workshops and collaborative Web sites, Roberts says.
