Managing and protecting all enterprise data


The networked storage project: Getting started

If you're just getting started, check out how West Virginia University built its first SAN to handle e-mail.

What we learned
Manage your resources. More often than not, the difference between a successful project and a failure comes down to project management. It's the one tool often overlooked in the designer's arsenal. Experience has shown that successful projects are managed projects. Therefore, from day one, we assigned a project manager who was responsible for project coordination, inter- and intradepartmental communications and expense control. He used Microsoft Project 2002 to chart our milestones and critical paths on a PERT chart. This was an important asset in keeping on schedule, ensuring that everyone stayed informed and making sure that the SAN was implemented properly and economically.

Watch the bottom line. Heavy duty storage systems are attractive, but so is the tendency to add on additional features, capability and thus cost. You can avoid this by having a clearly defined budget for the project and then sticking with it. Project managers who complete large projects on time and within budget usually live another day to manage other projects.

Training and more training. Fibre Channel (FC) has LAN-like characteristics. It has switches like a LAN, but don't be fooled--FC isn't a LAN. It requires specialized knowledge that network administrators typically don't have. That means training, so be sure to include training as part of your SAN project budget. Dell initially provided some FC training, but the hours included in their proposal were a bit on the anemic side. We asked Dell to increase both the duration and number of people who could be trained on FC at their headquarters in Austin, TX. They agreed and did so.

Know what you're getting. FC isn't new, but it was new to us. We wanted to learn as much as possible and the best way to do that is to ask questions. For that reason, we weren't the least bit hesitant to fire off a series of questions to our vendor about the FC switches and the SAN.

Here are a few examples of our questions:
  1. Does the FC design use a 1Gb/s or 2Gb/s switch?
  2. Does the switch allow for nondisruptive code updates?
  3. Does your SAN design use inter-switch links?
  4. How do you provide redundancy?
  5. Are you using redundant HBAs in the servers? Are your switches dual-homed?
With the right questions, good answers and both of your eyes open, your FC installation can be carefully planned, well-implemented and bring needed storage capacity to your enterprise.
"A storage area network (SAN) for e-mail?" I immediately recognized the questioning look on my colleague's face. It wasn't so long ago that I felt the same way myself. Common wisdom says that SANs are intended to house large amounts of data, typically terabytes--massive storage demands not generally associated with e-mail systems. Consequently, it was unusual to use a SAN for an e-mail system.

But today things are different--users routinely send multimegabyte file attachments with their e-mails. In addition, they expect their e-mails to be retrievable--attachments and all--for something close to forever. This means that the modest storage needs of yesterday's mail systems have all but vanished. Today's e-mail systems need robust storage capacity, and that usually means a SAN.

Besides an enormous amount of storage, a SAN promised another benefit for West Virginia University's (WVU) Novell GroupWise e-mail system--fault tolerance. Last year, I would have said that e-mail systems weren't as mission critical as, say, our Oracle database system or our enterprise file servers. However, a series of e-mail system outages in late fall of 2001 proved differently. A backlash of angry complaints filtered all the way up to the president's office. When the CIO made GroupWise stabilization my highest priority, it became abundantly clear that e-mail had become mission critical to the daily operation of our university.

Selecting the vendor
Through an informal request for information (RFI) process, we selected three vendors--each already had an established presence on campus. They were: Compaq (now HPQ), Dell/EMC and IBM. All were asked to tell us their SAN story. These vendors made our short list because, besides being known entities, the procurement process is much easier and faster with vendors who have existing state contracts. Our first meeting was an introductory session focusing on the high points of each vendor's SAN offering. Follow-up meetings with each vendor probed into deeper technical detail.

We assembled a selection team consisting of members from the GroupWise services and LAN services units to rank the three vendors. At first, we planned to have the winning vendor provide the SAN and use Dell PowerEdge servers for the GroupWise systems. Dell has a partnership with WVU and has traditionally been the server vendor of choice for the campus. However, it quickly became evident that if we were to have, say, Compaq as our SAN vendor and Dell as our server vendor, we would wind up placing Compaq Host Bus Adapters (HBAs) into Dell servers. While there wasn't anything technically wrong with that, we wanted to avoid any likelihood of vendor finger-pointing if something didn't work as advertised. As a result, we decided to go with a single vendor. Whoever we selected to provide our SAN would also provide our servers.

To iSCSI or not
Next, we had to decide which storage technology would be best for our environment. While we could use traditional direct-attached storage (DAS), using iSCSI and network-attached storage (NAS) was attractive, especially since our Network Operations staff already knows IP, and both iSCSI and NAS ride over IP. There was also iSCSI's throughput advantage. Fibre Channel (FC) operates at rates up to 2Gb/s--clearly a decent speed, but far less than the potential iSCSI rate over 10GbE. We thought that would make iSCSI ideal for adding storage.

When we investigated more closely, we learned that while it certainly has great potential, iSCSI standards aren't expected to be in place before the end of the year. Since WVU's network operations are highly standards-based, we decided that it wouldn't be sensible to include a pre-standard version of iSCSI.

NAS was an attractive option for off-site mirroring. We could use it over our campus backbone network instead of tying up dedicated fiber. Still, we were uneasy about potential latency and bandwidth issues over IP.

We were also concerned with system stability. We wanted a storage technology that had been around long enough to make reliability and support a no-brainer for ourselves and the selected vendor. At the end of the day, we decided to go with FC and incorporate NAS and iSCSI later.

Now that the storage model had been determined, Compaq, Dell/EMC and IBM returned for another informal RFI session. During those sessions, we took a careful look at each vendor's SAN proposal. All three vendors supplied what we felt to be a credible solution for our storage needs. The decision came down to whichever vendor offered the lowest-priced, most robust solution.

We looked at the cost figures, trying to match apples to apples. IBM's quote of $285,000 was the highest of the three vendors, partly because its solution required 20 additional drives. Compaq's quote of $265,000 was marginally higher than Dell's. However, Compaq's solution used single FC cards and only one FC switch. The Dell/EMC proposal was in the same ballpark as Compaq's, but included redundant FC switches and dual-homed HBAs. For that reason, we decided to go with the Dell/EMC solution. Contractual issues with Dell prevent me from discussing their exact price. However, the entire GroupWise project had a budget of $350,000, which includes the SAN, two FC switches, the GroupWise servers and consulting fees. The SAN is 1TB; we expect to add another 500GB to 700GB by the summer of 2003.

Second phase-design considerations
We felt that we had adequate experience with GroupWise to do our own system design, but the SAN design was best left to those who knew FC and SAN arrays better than we did. So the SAN design was parceled out to Dell/EMC, while the GroupWise system design was kept in house (See "SAN design"). We are using EMC's ControlCenter Navisphere to monitor, provision and report on the storage systems from a Web browser.

West Virginia University's GroupWise SAN design

West Virginia University's GroupWise 1TB e-mail SAN consists of 5 clustered post office servers--all running NetWare 6. Each post office and the SAN are dual-attached to the redundant FC switches.

For resiliency, we wanted to cluster our GroupWise Post Office Agents (POAs). In GroupWise, the POA is responsible for delivering messages to each user's mailbox. If the POA is down, all users on that Post Office are offline. The POA is also the place where user mail is held, including those storage-eating attachments. Consequently, not only was the POA critical, it also required the most storage of anything in the GroupWise system. That made it a perfect candidate for the SAN. The other elements in the GroupWise system such as the mail domains, the GroupWise Internet Access Agent (GWIA) and Web Access agents were given local storage and weren't included in the SAN.

Our next decision was whether to deploy Windows 2000 Advanced Server or NetWare as the underlying OS. GroupWise runs under either OS, so we didn't have an application issue, but we did have a SAN issue.

Although SANs are OS-agnostic, different operating systems manage clustering in different ways, and that affects the SAN design. For example, Microsoft Windows 2000 Cluster Services uses a Quorum file to track the status of the cluster and the applications participating in the cluster. Usually, a 1GB or 2GB partition is assigned to one RAID array for each Quorum file. The more server clusters included in the design, the more Quorum files are required. Obviously, that impacts the amount of raw data storage available on the SAN.
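The arithmetic is simple, but it's worth spelling out. Here's a minimal sketch of the impact; the capacities and cluster counts are illustrative examples, not our actual figures:

```python
# Rough illustration of how Windows 2000 quorum partitions eat into
# raw SAN capacity. All figures below are hypothetical examples.

def usable_capacity_gb(raw_gb, clusters, quorum_gb=1):
    """Raw storage left after carving out one quorum partition per cluster."""
    return raw_gb - clusters * quorum_gb

# A 1,000GB SAN hosting five server clusters, each with a 1GB quorum
# partition, leaves 995GB for data:
print(usable_capacity_gb(1000, 5))
```

The loss is small in absolute terms, but it scales with the number of clusters and with 2GB quorum partitions, which is why it factors into the raw-storage-per-dollar comparison below.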

On the other hand, NetWare doesn't use a Quorum-like file, which yields slightly more raw storage per dollar. However, unlike Microsoft's Windows 2000, Novell's NetWare is relatively new to the clustering scene. That made it difficult for us to reach consensus on which OS to deploy. After weighing our options, the selection team had a slight preference for NetWare, and we made the call for NetWare 6.x as our OS of choice.

Next, we had to determine the SAN configuration. Of course, there are several different levels of RAID that can be deployed in a SAN. Each RAID level has a different drive distribution pattern that affects both performance and redundancy.

Initially we had planned on using RAID-5, but we received a strong recommendation from a storage engineer to use RAID-10. We were told that there would be a performance advantage to RAID-10 over RAID-5. That was the upside. The downside was that the RAID-10 drive configuration would require us to increase the number of drives for the same amount of storage, and that would increase the cost. We weren't against this if it would be beneficial and wasn't overkill for an e-mail system. So we did more research.
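A rough sketch shows why RAID-10 drives up the drive count for the same usable capacity; the drive size and capacity here are illustrative assumptions, not our actual configuration:

```python
# Hypothetical drive-count comparison for equal usable capacity.
# Assumes a single RAID-5 group with one parity drive; real arrays
# often split storage across several smaller groups.
import math

def drives_needed(usable_gb, drive_gb, level):
    data_drives = math.ceil(usable_gb / drive_gb)
    if level == "raid5":
        return data_drives + 1   # one extra drive holds parity
    if level == "raid10":
        return data_drives * 2   # every data drive is mirrored
    raise ValueError(level)

# 1TB usable on 73GB drives (illustrative sizes):
print(drives_needed(1000, 73, "raid5"))   # 15 drives
print(drives_needed(1000, 73, "raid10"))  # 28 drives
```

Nearly doubling the spindle count is what made us pause and do more research before accepting the recommendation.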

More Information
Dell published an article on RAID level selection. The article is informative and doesn't talk about specific products. It provides a good explanation of the SAN design decision-making process.
Dell also published an article "Assessing the Reliability of RAID Systems" by Abraham Long, Jr.
An interesting, but somewhat biased, comparison of RAID 5 and RAID 10 is also available online.
A list of books and articles on storage networking is available on the Storage Networking Industry Association (SNIA) Web site.

What we discovered is that RAID-5 is optimized for disk reads and takes a performance hit primarily on writes, because RAID-5 uses parity calculations to assure data integrity. Each small write requires an extra read and write cycle to get the parity data onto the disks. Since RAID-10 consists of multiple sets of mirrored drives, it doesn't have this overhead. Therefore, a write-intensive application would perform more poorly on RAID-5 than it would on RAID-10.

We measured read/write averages over a 24-hour period on three of the busiest Post Offices in our current GroupWise system. Under normal operation, there were 200 to 400 reads per second. Writes averaged only one per second. Based on the read and light write averages, we felt that we could tolerate a slight write performance hit. Therefore, we went with RAID-5.
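Plugging those measurements into textbook write-penalty figures shows why the hit was tolerable. A minimal sketch; the four-I/O RAID-5 small-write penalty is the standard read-modify-write count, not a vendor-specific number:

```python
# Back-end disk I/O implied by RAID write penalties, using our
# measured GroupWise load. Penalty factors are the textbook values:
# a RAID-5 small write costs 4 back-end I/Os (read data, read parity,
# write data, write parity); a RAID-10 write costs 2 (one per mirror).

WRITE_PENALTY = {"raid5": 4, "raid10": 2}

def backend_iops(reads_per_s, writes_per_s, level):
    # Reads cost one back-end I/O; writes are multiplied by the penalty.
    return reads_per_s + writes_per_s * WRITE_PENALTY[level]

# Peak measured load: 400 reads/s, 1 write/s.
print(backend_iops(400, 1, "raid5"))   # 404 back-end I/Os per second
print(backend_iops(400, 1, "raid10"))  # 402 back-end I/Os per second
```

With writes at one per second, the RAID-5 penalty amounts to a handful of extra back-end I/Os against a read load in the hundreds, which is why the cheaper RAID level won out for our workload.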

Third phase-the implementation
While Dell and EMC were putting our order together and building equipment, we got busy with a myriad of pre-installation details. The design for the GroupWise system had to be finalized and run by Novell Consulting. We needed the proper power connectors installed by our physical plant for the SAN and the 20 GroupWise servers, and then we had to implement backup power and add network connections. With the Information Systems department, we had to map out floor space for two additional racks in an already overcrowded data center. But we were underway and becoming more excited about adding a robust, fault-tolerant storage solution for our e-mail system.

Web Bonus:
Online resources from "Chat Q&A- NAS: 2002 and beyond," by Randy Kerns.
