The networked storage project: moving ahead


This article can also be found in the Premium Editorial Download "Storage magazine: Best storage products of the year 2002."


Last month, I wrote about the rationale behind implementing a storage area network (SAN) for an e-mail system (see "The networked storage project: getting started"). In the not-so-distant past, it might have been unusual to see a SAN used for an e-mail system, but today things are different. Users routinely include multimegabyte file attachments with their e-mails. In addition, users expect their old e-mails to be available - attachments and all - for something close to forever. All of this means that the modest storage requirements of yesterday's e-mail systems are history. Modern e-mail systems need robust storage capacity, and typically that means a SAN.
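To put rough numbers on that growth, a back-of-the-envelope capacity estimate looks something like the following. Every figure here is an illustrative assumption, not a number from our deployment:

```python
# Rough mailbox capacity estimate; all inputs are illustrative
# assumptions, not figures from the WVU GroupWise deployment.
users = 25_000                # assumed user population
avg_mailbox_mb = 200          # assumed long-term retention per user, attachments included
growth_factor = 1.5           # assumed headroom for attachment growth

raw_gb = users * avg_mailbox_mb * growth_factor / 1024
print(f"Estimated raw capacity: {round(raw_gb)} GB")
```

With even these modest per-user assumptions, the estimate lands in the multiple-terabyte range - well past what direct-attached storage handled comfortably in that era.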

In part one, I also discussed how our team selected the SAN vendor (Dell/EMC), determined the operating system (NetWare 6.0) and worked through the myriad of considerations that went into the SAN design. In this installment, you'll find out about the various implementation issues we encountered as WVU's Network Services team took the GroupWise SAN system from a design concept to a fully functioning production system.

SAN training
After completing our vendor selection, determining the design and ordering the SAN, it was time to send our staff members to Dell's training facility in Austin, TX. Four of our staff flew out for a week of SAN training, and the reviews were generally good. Every person we sent felt Dell had provided first-rate SAN training. In fact, if there was any gripe at all - aside from the sweltering August weather outside Dell's training facility - it was that the Dell training lab was exclusively Microsoft Windows server-based. We were there to learn about the SAN, but our servers were entirely Novell-based and there wasn't a NetWare server in sight.

Project Summary
After all was said and done and the system was up and humming, we held a debriefing where we detailed what went right with the GroupWise 6 system design and its implementation and what went wrong. Happily, there were many more things that went right than things that went wrong. Here's a summary of our debriefing:
What we did right
  1. Assigned a project manager (responsible for all phases of design and implementation) early in the design phase
  2. Created a PERT chart with milestones and adhered to it closely
  3. Developed budget, regularly monitored it and kept expenses within budget
  4. Encouraged staff to participate in design and implementation decisions
  5. Carefully thought out vendor selection
  6. Carefully thought out OS decision
  7. Well thought out design
  8. Had design reviewed by two independent outside consultants
  9. Exhibited a high degree of collaboration and communication
    • Within GroupWise services unit
    • Between units in network services
    • To other IT departments (customer support and IS)
    • To vendors (Dell, Novell)
  10. Implemented installation sign-off procedure
  11. Designed 30-day burn-in and test period
  12. Developed contingency plans in advance and determined how to implement them
What we could have done better
  1. Improved communications during initial phase of the project (corrected in progress)
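The PERT chart in item 2 earned its keep by exposing the critical path - the dependency chain that sets the minimum project length, and the place where slippage hurts most. As a minimal sketch of that calculation (the tasks, durations and dependencies below are illustrative, not our actual chart):

```python
# Minimal critical-path calculation over a tiny, hypothetical task graph.
# Durations (in days) and prerequisites are illustrative only.
from functools import lru_cache

tasks = {
    "vendor_selection": (10, []),
    "design":           (15, ["vendor_selection"]),
    "training":         (5,  ["vendor_selection"]),
    "installation":     (7,  ["design", "training"]),
    "burn_in":          (30, ["installation"]),
}

@lru_cache(maxsize=None)
def earliest_finish(task):
    """Earliest day this task can finish, given its prerequisites."""
    duration, prereqs = tasks[task]
    return duration + max((earliest_finish(p) for p in prereqs), default=0)

project_length = max(earliest_finish(t) for t in tasks)
print(f"Minimum project length: {project_length} days")
```

In this toy graph, training runs in parallel with design, so shaving days off training does nothing; only the vendor-selection/design/installation/burn-in chain moves the finish date.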

After training, our staff returned to begin the installation of the 21 servers and the Dell/EMC FC4700 SAN system. Rather than roll our own, we contracted Dell to provide installation services. This added to the cost, and our staff could have handled the job, but we felt that having professionals handle the racking, cabling and staging offered several advantages: it freed up our staff to prepare for the system implementation, and it gave us the opportunity to look over the installers' shoulders and learn the new system's hardware configuration.

As Dell's installation crew was wrapping up their work, we started the installation sign-off procedure. This is a comprehensive white-glove inspection of all aspects of the installation, including power-on tests, cabling, server mounting and all hardware.

White gloves on, we started going over the system. The installation was spread over two free-standing 6' x 19" racks. One rack held the 21 Dell servers, while the second housed the Dell/EMC SAN and Fibre Channel (FC) switches. Fiber optic cables ran from the server host bus adapters (HBAs) to the SAN through a pair of cutouts located on the top of the racks. Upon examination, we noticed that both holes had frighteningly sharp edges. Our concern was that over time, the sharp metal edges would wear through the protective cladding on the fiber optic cables, affecting the connections to the SAN.

We showed this to Dell's installation team, but they were reluctant to correct it, so it was left to our team to come up with a solution. We ran the fiber optic cables inside a length of ribbed rubber hose - the kind you can find at any auto parts store. It wasn't elegant, but it was functional.

Cable retention arms were used to run the cabling - copper and fiber optic - to the servers. The arms move in and out to allow access to the rear panel of the servers. However, several of the cable retention arms in our installation seemed to have a mind of their own: instead of moving strictly horizontally, they moved both horizontally and vertically - and that wasn't a feature they should have had. Unfortunately, no one on our team noticed this until the installation had been completed, and Dell's installation team wasn't Johnny-on-the-spot to point it out. To be fair, though, Dell took full responsibility for the problem once it was pointed out and sent an engineer on site the following week, who determined that the parts had a manufacturing defect.

The company immediately shipped out new cable management arms and had installers return to replace them. While we felt that this was a no-hassle method for correcting the problem, we wished the installers had called this out to us while they were racking the servers.

Our design used two different Dell PowerEdge server models, the 2650 and the 1650. The 2650s handle the more rigorous demands of the system, such as the post office agents (POAs). The 1650s do their work in the lighter parts of the system, such as the Web agents. By some design quirk, the hard drives in the PowerEdge 2650s aren't plug-compatible with the same-size drives in the PowerEdge 1650s. We had expected otherwise; having completely swappable drives would have made life a whole lot easier for us. As it is, we'll now have to keep spare sets of both drives on hand and remember what goes where in the rack.

Testing one, two, three
Modern SAN management technology is cool. Java-based Web management interfaces that provide access to the SAN from any desktop are terrific management features. Still, when all else fails in the management tree, there's nothing like being able to fall back on a good old-fashioned command line interface (CLI) with a serial communications application such as HyperACCESS. Fortunately, Dell and EMC's designers had wisely included a serial management port in the FC4700 SAN. Determined to test this feature, we grabbed a Windows 2000 laptop and headed for the data center. To our surprise, we found that we couldn't communicate with the CLI. No matter what we tried, the SAN just wouldn't talk to us. Finally, we called Dell support, who talked us through the serial port setup. Eventually, we were able to set parameters that the SAN liked. Several members of our team commented, "Serial communications used to be so easy."
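When a device's serial parameters are undocumented, a systematic sweep of the conventional settings beats random guessing in a terminal emulator. A minimal sketch of enumerating the candidates - the parameter lists below are common terminal-emulator defaults, not anything from FC4700 documentation:

```python
# Enumerate common serial-console settings to try, in order, when a
# device's port parameters are undocumented. These are conventional
# defaults, not values from any Dell/EMC manual.
from itertools import product

bauds     = [9600, 19200, 38400, 57600, 115200]
data_bits = [8, 7]
parities  = ["N", "E", "O"]   # none, even, odd
stop_bits = [1, 2]

candidates = [f"{b}-{d}-{p}-{s}"
              for b, d, p, s in product(bauds, data_bits, parities, stop_bits)]

print(len(candidates), "combinations to try")
print("First guess:", candidates[0])   # 9600-8-N-1, the usual default
```

Working down such a list - trying each combination until the console echoes sensibly - turns "the SAN just wouldn't talk to us" into a bounded, if tedious, search.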

Other problems surfaced during the test period: we encountered difficulty with cluster failover. When we downed one of the clustered servers, the other was unable to access the SAN. At first, we were completely stumped. Fortunately, our eagle-eyed GroupWise system administrator, Gene Hendrickson, discovered that the management port on one of the FC switches hadn't been connected during installation. He plugged it in, restarted the switch and the clustering immediately started working. It helps to plug things in.
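One lesson we took away: a scripted pre-flight check of every management port would have caught the unplugged cable before failover testing began. A hedged sketch, assuming the switches answer on a TCP management port such as telnet - the hostnames are placeholders, not our actual switch addresses:

```python
# Hypothetical pre-flight check: confirm each switch management port
# answers on the network before running cluster failover tests.
# Hostnames below are placeholders, not real WVU switch addresses.
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

switches = ["fc-switch-a.example.edu", "fc-switch-b.example.edu"]
for host in switches:
    status = "up" if port_is_open(host, 23, timeout=1.0) else "UNREACHABLE"
    print(f"{host}: {status}")
```

Run against the real switch addresses before each test window, an "UNREACHABLE" line would have pointed straight at the disconnected management port.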

We weren't sure how we missed this during the installation sign-off procedure. However, this actually turned out to be a good thing. The installation team gained valuable troubleshooting experience with the SAN and the detective work reinforced what was learned in Dell's SAN classes.

This was first published in January 2003
