Storage for high-performance computing


This article can also be found in the Premium Editorial Download "Storage magazine: CDP 2.0: Finding success with the latest continuous data protection tools."

Ranger's storage
Ranger's storage differs from conventional SANs in the way it's provisioned, managed and backed up. The type of storage used in commercial organizations isn't appropriate for HPC in clusters the size of Ranger. Commercial systems and their SANs are oriented toward processing transactions, while HPC systems are built to maximize bandwidth and quickly process large quantities of unstructured data and files.

Until a few years ago, HPC relied on massive symmetric multiprocessing computers, but it has since moved to scale-out architectures built from large numbers of x86 or other commodity processors with DAS or NAS. The storage in these systems is aggregated under a small number of file systems such as Sun's open-source Lustre file system, Hewlett-Packard Co.'s StorageWorks Scalable File Share or Quantum Corp.'s StorNext.

Of the aggregated Thumper storage servers, six file servers with 144TB of disk space are allocated to user /home directories, 12 file servers with 288TB of disk space make up the work file system and 50 file servers with 1.2PB of disk space are reserved for scratch space. The remaining four Thumper servers are used as a "sandbox," says TACC's Minyard, to test file-system upgrades and new software versions.
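The allocation figures above can be checked with a little arithmetic. The sketch below is only an illustration: it takes 1PB as 1,000TB, and the variable names are made up for this example. It suggests that each production file server contributes roughly the same amount of disk space.

```python
# File-server allocation as reported in the article: (servers, capacity in TB).
allocation = {
    "home": (6, 144),
    "work": (12, 288),
    "scratch": (50, 1200),  # 1.2PB, assuming 1PB = 1,000TB
}
sandbox_servers = 4  # test servers; the article gives no capacity for these


def tb_per_server(servers, capacity_tb):
    """Average disk space contributed by each file server in a file system."""
    return capacity_tb / servers


per_server = {name: tb_per_server(*v) for name, v in allocation.items()}
total_servers = sum(s for s, _ in allocation.values()) + sandbox_servers
production_tb = sum(c for _, c in allocation.values())

for name, tb in per_server.items():
    print(f"{name}: {tb:.0f}TB per server")
print(f"total Thumper servers: {total_servers}")
print(f"production capacity: {production_tb}TB")
```

Under that assumption, all three file systems work out to 24TB per server, and the four sandbox machines bring the Thumper count to 72.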

In addition, the Lustre file system provides striping capability, in which data is divided and spread across several disks to increase performance (see "HPC: A study in tiered storage," below).
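As a rough illustration of the striping idea, the sketch below deals fixed-size chunks of a byte string round-robin across several targets and then reassembles them. It is not Lustre's implementation; the function names, the stripe size and the number of targets are hypothetical and chosen only for this example.

```python
def stripe(data, stripe_size, num_targets):
    """Split data into fixed-size chunks and deal them round-robin to targets."""
    targets = [[] for _ in range(num_targets)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        targets[chunk_index % num_targets].append(data[offset:offset + stripe_size])
    return targets


def unstripe(targets):
    """Reassemble the data by reading chunks back in round-robin order."""
    out = []
    longest = max(len(t) for t in targets)
    for row in range(longest):
        for target in targets:
            if row < len(target):
                out.append(target[row])
    return b"".join(out)


data = b"ABCDEFGHIJ"
layout = stripe(data, stripe_size=2, num_targets=3)
print(layout)
print(unstripe(layout) == data)
```

Because consecutive chunks sit on different targets, a large sequential read can be serviced by several disks at once, which is where the performance gain comes from.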

HPC: A study in tiered storage
Bruce Allen, director of the Max Planck Institute for Gravitational Physics in Hannover, Germany, has implemented a three-tiered storage model to support his high-performance computing (HPC) environment. Each tier is connected to a 10Gb/sec Ethernet network.

"The most reliable level consists of Sun [Microsystems Inc.] Thumpers [servers]," says Allen. "There's less reliable storage made up of Linux SuperMicro boxes that have Eureka 16-disk Serial ATA RAID controllers. The least reliable storage is the [internal] storage on the compute nodes themselves."

In Allen's network, 1,342 compute nodes operate in concert with the storage. The network is organized and managed by Sun's ZFS file system, and each type of storage in the network is provisioned according to its reliability.

Allen's tier-one storage consists of 12 Sun Thumpers with 19TB of usable capacity. "What we typically store on the Thumpers is the users' /home directories, which we regard as the most valuable data," he notes. "We use the snapshot feature of ZFS to back up very fast."

Allen chose the Thumpers over other storage arrays based on the features promised with Sun's ZFS file system. "We liked the fact that with ZFS you can do snapshots very efficiently, use variable-sized striping and it incorporates block-level checksums in all the file-system data structures for guaranteed consistency," he explains. "And we liked the way the file system and the OS deal with bad blocks on the disk."
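The value of block-level checksums can be shown with a small sketch. This is not how ZFS is implemented (ZFS keeps checksums in its own on-disk structures); it only illustrates the general technique of recording a checksum per block and comparing it on read. The block size and function names are assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, for illustration only


def write_blocks(data):
    """Split data into blocks and record a SHA-256 checksum for each one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    checksums = [hashlib.sha256(b).hexdigest() for b in blocks]
    return blocks, checksums


def find_corrupt_blocks(blocks, checksums):
    """Return the indexes of blocks that no longer match their checksum."""
    return [i for i, (block, expected) in enumerate(zip(blocks, checksums))
            if hashlib.sha256(block).hexdigest() != expected]


blocks, checksums = write_blocks(bytes(range(256)) * 64)  # 16,384 bytes, 4 blocks
# Simulate a bad block by flipping the bits of one byte in block 2.
blocks[2] = bytes([blocks[2][0] ^ 0xFF]) + blocks[2][1:]
print(find_corrupt_blocks(blocks, checksums))
```

A file system that can pinpoint the damaged block in this way can repair it from a redundant copy rather than silently returning bad data.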

The next tier of storage for Allen is the SuperMicro storage servers. While Allen transfers some of the backup data from the Thumper boxes to the Linux boxes, "we typically use the Linux boxes for storing more experimental data; in most cases, we can get that data from tape archives located at Caltech. In some sense, that data is more expendable and less valuable than the /home directory data."

Finally, Allen has another 650TB of storage distributed across the compute nodes. "We typically mirror experimental data that's being accessed a lot across the compute nodes," he says. "Right now, we have a 40GB data set that's being accessed quite a lot; we have a copy of that data set on every single cluster node and programs access it locally. That gives us huge bandwidth because every node in parallel is reading off of the local disk."
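A back-of-the-envelope calculation shows what that replication costs and what it buys. The node count, the 650TB total and the 40GB data set come from the article; the per-node disk read rate is a purely hypothetical figure chosen for illustration, and decimal units (1TB = 1,000GB) are assumed.

```python
NODES = 1342            # compute nodes, from the article
DISTRIBUTED_TB = 650    # storage spread across the compute nodes
DATASET_GB = 40         # data set copied to every node

# ASSUMPTION: hypothetical local-disk read rate; the article gives no figure.
ASSUMED_LOCAL_READ_MB_S = 50

per_node_gb = DISTRIBUTED_TB * 1000 / NODES
replicated_tb = DATASET_GB * NODES / 1000
share_of_capacity = replicated_tb / DISTRIBUTED_TB
aggregate_mb_s = NODES * ASSUMED_LOCAL_READ_MB_S

print(f"average local storage per node: {per_node_gb:.0f}GB")
print(f"space consumed by all copies: {replicated_tb:.2f}TB "
      f"({share_of_capacity:.1%} of the distributed capacity)")
print(f"aggregate read rate at the assumed per-node speed: {aggregate_mb_s:,}MB/sec")
```

The trade-off is plain: the copies consume tens of terabytes in total, but every node reads from its own disk, so aggregate bandwidth grows with the number of nodes instead of being capped by a shared file server.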

However, Allen's happiness with the Sun storage system and ZFS is dampened by performance problems. "One of the things we haven't been so pleased with on the Sun storage side is that the most I/O we've been able to get out of the Thumpers is a couple of hundred megabytes per second," he says. "That's surprising because the local file system seems to be capable of 500MB/sec to 600MB/sec. We typically export that data to the cluster nodes by NFS. So far, we haven't even gotten close to saturating our wire. With NetBurst, we can get about 700MB/sec reading and writing, not necessarily to the storage device."
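To put those numbers in context, the sketch below compares each rate with the raw capacity of a 10Gb/sec link. It assumes decimal units, ignores Ethernet and protocol overhead, and takes "a couple of hundred megabytes per second" to mean roughly 200MB/sec; all of these are simplifications for illustration.

```python
LINK_GBIT_S = 10                       # 10Gb/sec Ethernet, from the article
wire_mb_s = LINK_GBIT_S * 1000 / 8     # raw link rate in MB/sec, no overhead

rates_mb_s = {
    "NFS export (assumed ~200MB/sec)": 200,
    "local file system, low end": 500,
    "local file system, high end": 600,
    "network benchmark": 700,
}

utilization = {name: rate / wire_mb_s for name, rate in rates_mb_s.items()}

print(f"raw wire rate: {wire_mb_s:.0f}MB/sec")
for name, fraction in utilization.items():
    print(f"{name}: {fraction:.0%} of the wire")
```

Under these assumptions the NFS export uses well under a quarter of the link, which is consistent with Allen's remark that the wire is nowhere near saturated.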

Allen says a recent patch from Sun is expected to dramatically improve NFS performance.

This was first published in October 2008
