The networked storage project: moving ahead

This article can also be found in the Premium Editorial Download "Storage magazine: Best storage products of the year 2002."

An e-mail-based SAN has different performance metrics than, say, a file and print server. While both are I/O-centric, an e-mail SAN typically has many more small files to deal with. To determine performance, we used several metrics. First, we blasted e-mails at the system from a test program we developed and measured how quickly it processed them. Here are the test results we recorded:
100,000 messages were sent between two accounts on two separate post offices. Fifty-seven of the 100,000 messages had one million lines of text. The system handled them all without any lost messages.
The system routed 36,000 e-mails in 10 minutes.
Even with 10,000+ messages in the mailbox, the client started very quickly.
We didn't receive the alert that there were "too many messages to be viewed." This is an error that came up often with the previous system whenever we had more than 4,096 messages in a folder.
Deleting 10,000 messages took less than five minutes. Deleting a large number of messages in the previous system would lock up the client for over an hour in some cases.
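As a rough sanity check, the test figures above can be converted into per-second rates. The short Python sketch below does only that arithmetic. The function name is ours for illustration and is not part of the test program, and the deletion rate is a lower bound because the result is reported only as "less than five minutes."

```python
def per_second(count, minutes):
    """Average events per second for `count` events over `minutes` minutes."""
    return count / (minutes * 60)


# Figures reported in the test results above.
routing_rate = per_second(36_000, 10)      # messages routed per second
delete_rate_floor = per_second(10_000, 5)  # lower bound: deletion took < 5 min

print(f"Routing: {routing_rate:.0f} messages/s")
print(f"Deletion: more than {delete_rate_floor:.1f} messages/s")
```

These are averages over the whole test window and say nothing about peaks or stalls within it.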
Next, we ran benchmarks on the SAN itself with the following results:
     Read I/Os: 21,225/s
     Write I/Os: 21,223/s
     Read Performance: 41.6MB/s
     Write Performance: 41.5MB/s
     Average read response time: 0.04 sec
     Average write response time: 0.06 sec
We were curious to see how the measured read/write performance of the SAN compared to Dell's specifications. The documentation we received at Dell's SAN class indicated we should expect to see 30MB/s to 35MB/s. Since our measured 41.5MB/s was higher, we felt that Dell was perhaps being a bit conservative in its documentation. Either way, it was an impressive performance figure that exceeded our expectations.
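Two quick derivations help put these numbers in context: the average transfer size per I/O (throughput divided by I/O rate) and how far the measured throughput sits above Dell's documented range. The Python sketch below is our own back-of-the-envelope calculation, not part of the benchmark tool. It assumes decimal megabytes (1MB = 1,000,000 bytes), which the benchmark output does not specify, and the helper names are hypothetical.

```python
# Benchmark figures quoted above.
READ_IOPS, WRITE_IOPS = 21_225, 21_223      # I/Os per second
READ_MBPS, WRITE_MBPS = 41.6, 41.5          # MB/s
SPEC_LOW_MBPS, SPEC_HIGH_MBPS = 30.0, 35.0  # Dell's documented range

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def avg_io_bytes(mb_per_s, ios_per_s):
    """Average transfer size per I/O, in bytes."""
    return mb_per_s * BYTES_PER_MB / ios_per_s


def percent_above(measured, reference):
    """How far `measured` exceeds `reference`, as a percentage."""
    return (measured / reference - 1) * 100


read_io = avg_io_bytes(READ_MBPS, READ_IOPS)
write_io = avg_io_bytes(WRITE_MBPS, WRITE_IOPS)
over_high = percent_above(WRITE_MBPS, SPEC_HIGH_MBPS)
over_low = percent_above(WRITE_MBPS, SPEC_LOW_MBPS)

print(f"Average read I/O:  {read_io:,.0f} bytes")
print(f"Average write I/O: {write_io:,.0f} bytes")
print(f"41.5MB/s is {over_high:.1f}% above 35MB/s and {over_low:.1f}% above 30MB/s")
```

Under this assumption the average transfer works out to roughly 2KB per I/O (slightly more if the tool reports binary megabytes), which suggests the benchmark was dominated by small transfers, in line with the many-small-files profile of an e-mail workload.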

Going live
Statistically, most faults in computer systems and hard drives appear within the first 30 days of operation. Therefore, it was important to get the SAN up and running a few weeks before the go-live date. This allowed us sufficient time to burn in the system. Obviously, if the system was going to smoke, it was better for that to happen during preproduction testing.

Getting the system up early in test mode provided another advantage: it gave the team time to play with the system before it went live. Deliberately creating outages to see how the system reacts would be totally forbidden on a production system, but it's fair game during preproduction testing. While this allowed the team to test the system and learn, it also produced some confusion about the state of the SAN, as reflected in a message sent by Ed Norman, one of the project team leaders: "A large number of faults are being generated on the SAN. This began on Friday, Aug. 16, [and continued ten days later]. If this is being done on purpose, may I suggest we stop sending these alerts to Dell, because they are assuming these are real failures. If they are real failures, we clearly have problems with our hardware."

The eleventh hour
Every project has its eleventh-hour glitch. Ours came just 30 hours before system cut-over, after we installed NetWare Support Pack 2 (SP2) on the servers. There were several good reasons for upgrading to SP2 before cut-over. Installing it beforehand meant not having to bring the servers down to install it afterward. The support pack also contained fixes for SAN clustering - something we obviously wanted to have in our system. Additionally, we wanted to be prepared for any possible support issues that might arise during the cut-over. We knew that if we ran into any glitches, Novell support's first question would be, "Did you install SP2?"

Vendors such as Microsoft and Novell create support packs to add features and fix problems. But nearly every support pack seems to find a way to break something that was previously working. SP2 for NetWare was no exception. After installing it, we found it took over four minutes to load a post office. Additionally, when a client connected to a post office agent, its IP address was displayed in the management system as 0.0.0.0.

The latter issue prevented us from seeing which clients were connected to the system - mostly an annoyance. The former was a greater concern, because it created the potential for an outage of up to five minutes if we had to fail over clustered servers.

At first, Novell support was slow to respond to our requests, as the notes of our project leader, Milton Christ, pointed out: "We have been playing a waiting game with Novell [support] ... we lost an afternoon's work due to the issues with SP2. [At this late date] management doesn't want to hear 'I don't know what Novell is doing.'"

Fortunately, our Novell advocate stepped in. She raised the incident to high severity and contacted Novell's support manager to get coordination in place between Novell's OS and GroupWise support groups. In short order, Novell's OS group gave us a field test NetWare Loadable Module (NLM) that fixed the slow load issue for the post office agents (POAs). However, it didn't correct the 0.0.0.0 showing in the client IP address field. That fix, we were told, would come later. Since we had a few POAs with no users on them, we could use these POAs to test Novell's final fix without causing interruptions to the production POAs.

Up and running
We went live on Oct. 5, 2002 - a full day ahead of schedule. The migration from the old system to the new SAN system went amazingly smoothly.

After every project, we hold a debriefing meeting to reexamine our design, installation and deployment procedures (see "Project summary"). In looking at the past five months, there wasn't much that we wished we had done differently. In retrospect, we could have orchestrated the hardware installation a little better and communicated our design criteria to all team members more rapidly than we did. But on the whole, it was a job well done by all members of the team. The design and vendor choices we made were soundly based and have begun to prove themselves in real-life operation.

Were we to do it over again, we would without hesitation incorporate a SAN as a core element in our e-mail system. Maybe in the future we'll include NAS and iSCSI as well, but for now we're happy with our system - and so are our users. Out of nearly 6,000 GroupWise users on campus, our Help Desk received fewer than 250 support calls related to system or client issues during the first week of operation - clearly a success.

This was first published in January 2003
