Even rocket scientists have to tighten their belts, especially when it comes to supporting astrophysics equipment that sucks up most of the budget for new projects. So when a NASA project had no money for more storage arrays, the systems engineer in charge turned to software that creates a shared storage system.
As part of NASA's Infrared Processing and Analysis Center, the Spitzer Science Center on the California Institute of Technology (Caltech) campus in Pasadena uses 1.2 PB of Nexsan storage arrays to support data gathered from space. According to senior systems engineer Eugean Hacopians, the Center has anywhere from 10 to 14 multimillion-dollar telescope projects going at any time.
But budgets for new projects are often under $1 million annually, and that's the case for a new four-year project called the Palomar Transient Factory. The project, which has been in the test and development phase for three months and is expected to go live in November, aims to scan the skies nightly for undiscovered objects. A 200-inch Hale Telescope at Palomar Observatory in San Diego County will send up to 2,400 image files per day to the Center, where an application based on the open source PostgreSQL database stores and processes them.
With most of the project's budget going to the equipment that generates and transmits the images, no money is left for more storage arrays. Instead, Hacopians will do without RAID arrays, Fibre Channel switches and HBAs by using software from Seanodes Inc. The company's Exanodes software, which builds a shared storage system from a cluster's own hard disks, will let him repurpose existing Linux servers to act as both processors and storage arrays.
Hacopians estimates the Seanodes approach costs less than half as much as another Nexsan system and its accompanying hardware would. The Spitzer Center has set up a 10 TB Seanodes cluster consisting of two Intel and three AMD Linux servers: "whatever we had already, whatever we could afford," Hacopians said. "The beauty of it is that it doesn't matter what hardware we use at all." He considered do-it-yourself clustering products from other vendors, such as LeftHand Networks, but they did not support all hardware.
Seanodes also offers flexibility when it comes to networking. "You can set up the cluster to be like a Ferrari, with 10 Gigabit Ethernet or InfiniBand connections, or you can do what we've done so far and set up something more like a Honda, with Gigabit Ethernet," Hacopians said. Even so, because multiple server nodes process data in parallel, that "Honda" handles data at up to 300 MBps.
During testing, one of the server nodes had to be taken down and used elsewhere temporarily, and Hacopians' team expected that bringing it back into the Seanodes cluster would be an arduous chore. But when the server was put back into the cluster, Hacopians was surprised to see the node bring up its log files, identify its partition and find the data still sitting on its drives, all in just a few minutes. "That's about as bulletproof as you can get for the price point and flexibility," he said.
Still, this configuration probably won't last; unlike a traditional enterprise storage system, it will likely need more than one redesign. Over the next four years, NASA expects the project's PostgreSQL database to grow to 20 TB, or 42 billion rows. The archive storage will probably exceed 400 TB, and the Spitzer Center is already pushing the limit of internal storage on the existing server nodes.
"If we expand more, we are going to have to put external storage behind the Seanodes cluster," Hacopians said. "But Seanodes will still save us by letting us use JBODs behind each cluster node rather than more expensive RAID arrays."
The current plan is to expand by putting Sun Microsystems Thumper direct-attached storage arrays behind each Seanodes cluster node. "Right now each of our server nodes has dual quad-core CPUs, with 3.2 GHz per core," Hacopians said. "That's some serious computational power. If you end up with 14 of them, you need storage that can feed the beast at the optimum level. Thumper's servers would allow us to generate that amount of data throughput."