Feature

Pushing storage to another level


This article can also be found in the Premium Editorial Download "Storage magazine: Overview of top tape backup options for midrange systems and networking segments."


For that reason, he says, "we're very interested in EMC's Centera, which we understand is not HSM but is storage-optimized for high reliability, low cost and permanent storage."

For now, the center backs up to DLT tape and is about to purchase its first SuperDLT library, Peterson says. Why tape? "We've lost enough RAID 5 disk sets that we're very paranoid about backups. So all our data is backed up to tape, a full backup every two weeks and incremental every evening."
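One consequence of a "full every two weeks, incremental every evening" schedule is that a restore needs the most recent full backup plus every incremental taken since. The sketch below lists that restore chain for a given day. It is illustrative only and rests on assumptions not stated in the article: each incremental captures changes since the previous night's backup, and on full-backup nights the full run replaces the incremental. The function name `restore_chain` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: a full backup runs every 14 days, starting on first_full.
FULL_INTERVAL_DAYS = 14


def restore_chain(first_full: date, target: date) -> list[tuple[date, str]]:
    """Return the backup sets (date, kind) needed to restore data as of target."""
    if target < first_full:
        raise ValueError("target predates the first full backup")
    days_since = (target - first_full).days
    # Most recent full backup on or before the target day.
    last_full = first_full + timedelta(days=days_since - days_since % FULL_INTERVAL_DAYS)
    chain = [(last_full, "full")]
    # Every nightly incremental after that full backup, up to and including target.
    d = last_full + timedelta(days=1)
    while d <= target:
        chain.append((d, "incremental"))
        d += timedelta(days=1)
    return chain


# Day before the next full: the worst case, one full set plus 13 incrementals.
print(len(restore_chain(date(2002, 7, 1), date(2002, 7, 14))))
```

Under these assumptions the worst-case restore touches 14 backup sets, which is one reason shops running this kind of schedule care about the reliability of every tape in the chain.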

Peterson's biggest challenge is finding out where bits and pieces of information are stored. "It's like Unix files," he says. "The pathname is almost as important as the actual file that gets pointed to. Here, if you pick out a trace file, it's of no use to you unless you know exactly where it was stored and which project it's associated with." Each project may have 30,000 or more items associated with it, so tracking down a particular file can be daunting.

Ideally, he says, he'd like to "describe the files on the system almost more from a database perspective" rather than in a traditional storage file system format. He says the center's file store has grown so large that "I wish we could have simplified things a few years ago when we had the chance."

Instead, he says, "We push the technology so hard that some scientists have thrown two or three million files in one directory. It's hell on the system to do that, because Unix file systems aren't architected to do that. Our search tools just aren't meant for that type of scale." The solution? "We modified them to make a database call rather than have them look for a file and open up a directory."
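The article does not describe the center's actual database or schema, but the general idea of answering "where is this file?" with an indexed query instead of a directory scan can be sketched with Python's built-in `sqlite3` module. The table layout, the `project` column (reflecting Peterson's point that a trace file is useless without its path and project), and the function names below are all hypothetical.

```python
import posixpath
import sqlite3


def build_catalog(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per file, keyed by full path, tagged with its project.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path    TEXT PRIMARY KEY,
               name    TEXT NOT NULL,
               project TEXT NOT NULL)"""
    )
    # Index on the bare file name so lookups never have to list a huge directory.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_files_name ON files(name)")


def register(conn: sqlite3.Connection, path: str, project: str) -> None:
    # Record a file in the catalog when it is written (replacing any stale entry).
    name = posixpath.basename(path)
    conn.execute(
        "INSERT OR REPLACE INTO files (path, name, project) VALUES (?, ?, ?)",
        (path, name, project),
    )


def locate(conn: sqlite3.Connection, name: str) -> list[tuple[str, str]]:
    # One indexed query replaces opening and scanning directories.
    cur = conn.execute(
        "SELECT path, project FROM files WHERE name = ? ORDER BY path", (name,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_catalog(conn)
register(conn, "/seq/projA/run1/trace_0001.scf", "projA")
register(conn, "/seq/projB/run7/trace_0001.scf", "projB")
print(locate(conn, "trace_0001.scf"))
```

The design choice is the one the quote implies: the catalog, not the directory tree, becomes the authoritative answer to "where is it and which project does it belong to," so a directory holding millions of entries no longer has to be read to find one file.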

The center generates 20GB to 30GB of gene-sequencing data each day. Because of the way scientists work, much of this data needs to stay on disk for long periods. Sometimes scientists need to retrieve data that was generated at the beginning of a project, Peterson says, "and we support that."

The center has traditionally been a Compaq StorageWorks shop running TruCluster and a NAS/SAN hybrid. The system is configured to present a single system image, Peterson says, adding that four servers are logically connected on the back end via a Compaq Fibre Channel SAN. For applications that need high-speed access to specific data sets, he says, the team sometimes moves disks off the SAN and onto different hosts.

Although Peterson says the Compaq gear works fine, the center bought a Network Appliance filer for NAS. The primary motivation was to "decrease the amount of time it takes to deploy and reconfigure storage. We think the amount of time the systems administration team has to spend on managing a terabyte of data will drop significantly," he says, adding that the NetApp filer will make it more efficient to support the center's Windows users.

When the center first built its server environment, it placed a premium on flexibility, because the specific systems requirements of the then-new Human Genome Project were still unknown. Now, more than 20TB later, the center understands its workload's scalability requirements much better. At this point, scalability takes a back seat to manageability, speed of deployment and integrated solutions.

"It's most efficient to support Windows over the Windows file-sharing protocols," including the Common Internet File System (CIFS) and Server Message Block (SMB), Peterson says. "To do that on the TruCluster system, you're faced with significant tasks. It's not difficult, just time-consuming, because you've got to implement Samba or Advanced Server, the two things that support Windows within the Unix environment."

In comparison, the NetApp filer includes CIFS and SMB support out of the box. The center has bought a filer that can support up to 15.5TB.

What it ultimately comes down to is how conservative one wants to be in choosing technology. "We're very risk-averse here, given what we do and our requirements for availability," Peterson says. In fact, the center resisted going with NAS two years back, because "We felt it was not sufficiently mature. Whether that was a good thing, I can't really say."

Lesson learned? The bottom line, says Aberdeen's Hill, "is that it's not always quantity that creates complexity; it's the number of objects that have to be managed. It is much more difficult to manage an elephant than it is a Chihuahua. But it's more difficult to manage a herd of mongrels than it is one elephant."

This was first published in July 2002
