When Dan Werthimer talks about storage for the enterprise, he's not referring to the latest Star Trek movie. It only sounds that way.
As director of the Search for Extraterrestrial Intelligence (SETI), a scientific research project at the University of California, Berkeley, Werthimer oversees the collection and storage of radio-wave emissions from hundreds of billions of stars. The program has harnessed the spare computing power of nearly four million volunteers, creating what is believed to be the largest distributed computing environment in existence. This massive logical computer network has a simple goal: analyze radio-wave patterns for evidence of intelligent life elsewhere in the universe.
"A big problem at SETI is that we don't know what frequency E.T. might transmit on, so we want to record as much of the radio spectrum as possible," says Werthimer, only slightly tongue in cheek. The data is recorded at the 305-meter radio telescope in Arecibo, Puerto Rico, under an agreement with the observatory. Arecibo is the largest single-dish radio telescope in the world.
SETI's biggest storage demand is on site at Arecibo. The telescope's data is written directly onto digital linear tape (DLT) by a data recorder equipped with a single tape drive. Telescope operators, who staff the observatory around the clock, change these tapes for SETI twice a day. SETI collects about 50G Bytes of new data each day for analysis, "but we'd like to do more. Right now we're only analyzing a small part of the radio spectrum," says Werthimer.
So far, SETI has analyzed about 50T Bytes of data recorded at Arecibo. SETI maintains "a few terabytes" of disk storage on site in Berkeley for large databases and for referencing interesting or unusual radio signals.
The nonprofit organization has begun installing a more powerful data recorder at Arecibo, built around two hardware products donated by Hewlett-Packard. The first is HP's Netserver LH 6000r server, which runs the Linux operating system. Linux was chosen because its open architecture lets SETI interface with the millions of diverse computing devices in its enterprise, which helps minimize compatibility conflicts between platforms.
Second, HP provided a bank of four SureStore DLT 8000 tape drives. The SureStore system features a disk buffer that keeps recording data even while DLT tapes are being swapped, which guards against data loss when tape recording is interrupted. Each DLT 8000 tape stores up to 40G Bytes of uncompressed data (about 80G Bytes with 2:1 compression), and each drive holds nine tapes, so the four drives together provide 1.44T Bytes of storage.
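The 1.44T Byte figure follows directly from the per-tape and per-drive numbers. The short check below multiplies them out, assuming decimal units (1T Byte = 1,000G Bytes) and that all nine tapes in each of the four drives are loaded; the function name is illustrative only.

```python
# Back-of-the-envelope check of the tape-bank capacity quoted above.
# Assumes decimal units (1T Byte = 1,000G Bytes) and a fully loaded bank.

GB_PER_TAPE = 40       # per DLT 8000 cartridge, as quoted
TAPES_PER_DRIVE = 9    # as quoted
DRIVES = 4             # the bank of four SureStore drives


def bank_capacity_gb(gb_per_tape: int, tapes_per_drive: int, drives: int) -> int:
    """Total capacity of the tape bank in G Bytes (hypothetical helper)."""
    return gb_per_tape * tapes_per_drive * drives


total_gb = bank_capacity_gb(GB_PER_TAPE, TAPES_PER_DRIVE, DRIVES)
print(f"{total_gb} G Bytes = {total_gb / 1000} T Bytes")  # 1440 G Bytes = 1.44 T Bytes
```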
Once SETI's new system is fully operational, probably within six months, it could handle as many as 800G Bytes a day, a 16-fold increase over the 50G Bytes collected today. In other words, the tapes would hold far more data without increasing the tape-swapping burden on telescope operators. "We need it to tend to itself for many hours at a time," says network manager Jeff Cobb.
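To see why a 16-fold jump need not mean more tape changes, one can estimate how long the full tape bank lasts at the target rate. This is a sketch, not SETI's operating schedule: it assumes the 1.44T Byte bank is fully loaded, data arrives at a steady 800G Bytes a day, and the helper name is hypothetical.

```python
# Rough estimate of unattended recording time at the article's target rate.
# Assumptions (not from SETI): steady input rate, full 1.44T Byte bank, decimal units.

BANK_CAPACITY_GB = 1440    # four drives x nine tapes x 40G Bytes
CURRENT_GB_PER_DAY = 50
TARGET_GB_PER_DAY = 800


def hours_until_full(capacity_gb: float, gb_per_day: float) -> float:
    """Hours of recording before every loaded tape is full (hypothetical helper)."""
    return capacity_gb / gb_per_day * 24


growth = TARGET_GB_PER_DAY / CURRENT_GB_PER_DAY
print(f"growth: {growth:.0f}x")  # growth: 16x
print(f"unattended: {hours_until_full(BANK_CAPACITY_GB, TARGET_GB_PER_DAY):.1f} h")  # unattended: 43.2 h
```

Under these assumptions the bank would run for well over a day between changes, compared with today's twice-daily swaps of a single drive.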
Changing tapes causes downtime, which in turn could lead to the loss of vital data. Once data is lost, SETI researchers cannot recover it without repeating the observation. A system that offered high reliability and low interrupt latency was therefore a priority. "We wanted a machine that could give us a lot of bandwidth because we're going to be taking in data at a rate of about 10M Bytes/sec or higher," says Cobb. "We also wanted a machine that offered us more than one independent PCI bus."
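Cobb's 10M Bytes/sec figure lines up with the 800G Bytes a day quoted earlier. The conversion below assumes decimal units and continuous recording around the clock; the function name is illustrative.

```python
# Convert a sustained input rate into a daily data volume.
# Assumes decimal units (1G Byte = 1,000M Bytes) and recording 24 hours a day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_gb(rate_mb_per_s: float) -> float:
    """G Bytes recorded per day at a sustained rate (hypothetical helper)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000


print(daily_volume_gb(10))  # 864.0
```

At a steady 10M Bytes/sec, that is roughly 864G Bytes a day, slightly above the 800G Byte target.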
The bus issue was critical to ensuring a reliable data path. Data from the telescope first moves onto the PCI bus, then to disk, from which it is retrieved and written to tape. "So there are a lot of bytes moving around," says Werthimer.
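The disk stage in that path is what lets recording continue while a tape is swapped. The toy simulation below illustrates the idea and is not SETI's recorder software. A deque stands in for the disk spool, each "tick" the telescope side adds one block, and the tape side drains up to two blocks per tick except during a swap. The tick length, block counts, drain rate, and function name are all hypothetical.

```python
from collections import deque


def simulate(ticks, swap_ticks, buffer_blocks, drain_per_tick=2):
    """Toy model of a disk spool between telescope capture and tape.

    Each tick, capture appends one block to the spool (or drops it if the
    spool is full). The tape side then drains up to drain_per_tick blocks,
    except on ticks in swap_ticks, when a tape change blocks all writes.
    Returns (blocks_on_tape, blocks_dropped, blocks_left_in_spool).
    """
    spool = deque()
    on_tape = dropped = 0
    for t in range(ticks):
        # Capture side: data keeps arriving whether or not the tape is ready.
        if len(spool) < buffer_blocks:
            spool.append(t)
        else:
            dropped += 1  # spool full: this block is lost
        # Tape side: drain the spool unless a swap is in progress.
        if t not in swap_ticks:
            for _ in range(min(drain_per_tick, len(spool))):
                spool.popleft()
                on_tape += 1
    return on_tape, dropped, len(spool)


# A five-tick tape swap with a spool large enough to absorb it loses nothing.
print(simulate(20, set(range(5, 10)), buffer_blocks=8))  # (20, 0, 0)
# The same swap with an undersized spool drops blocks.
print(simulate(20, set(range(5, 10)), buffer_blocks=3))  # (17, 3, 0)
```

Two design points fall out of the model: the buffer must hold at least a full swap's worth of incoming data, and the tape side must drain faster than data arrives so the backlog clears once the new tape is loaded.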
Cobb and Werthimer also liked the remote diagnostic and management capabilities of the HP hardware. Says Cobb: "It had to be robust. Once we install a system, it's going to be somewhere far away and we had to be able to trust that it would work."