
Grid computing helps allocate scientific data

The University of Houston uses data-intensive grid computing to coordinate weather, ozone and seismic data from across the U.S.

When ozone alerts get issued or energy companies discover untapped oil reserves, chances are that the University of Houston's High Performance Computing (HPC) group had something to do with it.

Using grid computing technologies, university scientists develop and employ sophisticated applications for geosciences research. Examples of HPC applications include daily air quality models and algorithms for high-resolution images of seismic data.

The university has received recognition for its work in the geosciences and is part of the High Performance Computing Across Texas (HiPCAT) consortium. But the HPC has also had to deal with challenges. Foremost was enhancing the grid environment to improve how the system submits and executes computing jobs.

To enable the grid project to achieve its mission, the HPC combined Sun Microsystems hardware and software. Two Sun Fire 6800 Midframe servers and 13 Sun Fire V880 servers have been deployed, providing a computational cluster with a total of 104 CPUs. Another Sun product, Sun ONE Grid Engine software, serves as the resource management tool, allocating available computing power across the grid.

Before using Sun products, the HPC environment was powered by an IBM SP2 machine, which has since been decommissioned.

Not only must the university's computing group ensure the computing grid is efficient, but it also must accommodate local storage needs. For its own on-campus data storage, the HPC has 2.5 terabytes (TB) of capacity using Sun StorEdge T3 array technology for workgroups. According to Sun, this product has a single RAID controller drive-tray unit that uses Fibre Channel switches to access an array of nine disk drives.

One grid computing project at the HPC models air quality in the Houston-Galveston area daily, using information downloaded from the National Weather Service. This information, along with a chemistry model and other external data, is then used to run several different executables, including a fine-grain model for local and regional forecasts and larger computational models for the rest of the U.S.

These models simulate potentially harmful atmospheric conditions, such as high ozone levels, thus enabling environmental officials to evaluate possible remedies. A run of each individual air quality model generates about a half-terabyte of data each day. "We have four different codes running, each of which is fairly substantial," says Barbara Chapman, who oversees the process for the school.

The university sometimes collaborates with another HiPCAT member, Texas A&M University in College Station, for production runs of data used in building the air quality models. That means using the grid to transfer data between machines in Houston and College Station. "The way we're running it now is appropriate in the sense that we can run code, but it's not producing results in a timely fashion. If there's a suspicion that an air quality problem might occur on a given day, then I have to speed up this whole process," says Chapman, "because we don't have the computational resources on our [Houston] campus to do that."

Data bottlenecks get more severe when geophysicists begin running the applications that are used for the imaging and processing of seismic data. Even a small run of these data-intensive applications could consume as much as 6 TB of data. "It would consume huge amounts of storage just to keep track of their input data sets," Chapman says.

Although Grid Engine is useful for allocating resources across the grid, it is not designed to produce additional storage capacity. And with only 2.5 TB on campus, the HPC is not operating in an optimal situation, says Tony Curtis, its systems administrator. He says system resources on campus have been maxed out for "as far back as I can remember."

With a state budget crisis looming, the university has had to tighten its belt and live with less-than-desirable circumstances. "I have the feeling that on some of the bigger runs, the data is being overwritten," Curtis says.

Eventually, Curtis says, the center wants to boost its storage capacity to 10 TB.
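To put the storage figures quoted above side by side, here is a rough back-of-the-envelope sketch in Python. It is not anything the HPC group runs. It assumes that the "four different codes" Chapman mentions each produce about half a terabyte per day and that no output is deleted or moved off campus between runs; neither assumption is confirmed in the reporting. The function names are hypothetical.

```python
# Rough storage arithmetic using the figures quoted in the article.
# Assumptions (not confirmed by the article): all four air quality codes
# each write ~0.5 TB per day, and nothing is deleted or moved off campus.

CURRENT_CAPACITY_TB = 2.5      # on-campus capacity today
TARGET_CAPACITY_TB = 10.0      # capacity the center eventually wants
MODEL_OUTPUT_TB_PER_DAY = 0.5  # "about a half-terabyte" per model per day
MODEL_COUNT = 4                # "four different codes running"
SEISMIC_RUN_TB = 6.0           # upper figure quoted for a small seismic run


def daily_output_tb(model_count: int, per_model_tb: float) -> float:
    """Total data written per day by all models combined."""
    return model_count * per_model_tb


def days_until_full(capacity_tb: float, output_tb_per_day: float) -> float:
    """Days of output an empty store can hold before it fills."""
    if output_tb_per_day <= 0:
        raise ValueError("output_tb_per_day must be positive")
    return capacity_tb / output_tb_per_day


def fits(capacity_tb: float, dataset_tb: float) -> bool:
    """True if a data set of the given size fits in the given capacity."""
    return dataset_tb <= capacity_tb


if __name__ == "__main__":
    daily = daily_output_tb(MODEL_COUNT, MODEL_OUTPUT_TB_PER_DAY)
    capacities = (("current", CURRENT_CAPACITY_TB), ("target", TARGET_CAPACITY_TB))
    for label, capacity in capacities:
        days = days_until_full(capacity, daily)
        seismic_ok = fits(capacity, SEISMIC_RUN_TB)
        print(f"{label} ({capacity} TB): {days:.2f} days of model output, "
              f"6 TB seismic run fits: {seismic_ok}")
```

Under those assumptions, the air quality models alone would fill the current 2.5 TB in a little over a day, and a 6 TB seismic run would not fit at all, which is consistent with Curtis's suspicion that data from bigger runs is being overwritten. The 10 TB target would hold the seismic run but only a few days of model output.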


