Ernest Bowman-Cisneros, manager, LROC Science Operations Center at SESE, said his team needs to store petabytes of data in a single volume so it can handle the constant flow of images from the moon. His implementation is the type of "big data" storage EMC executives have been talking about since completing the $2.25 billion Isilon acquisition last December.
SESE switched to Isilon last year as the LRO began sending data back, and after its previous storage setup choked while the team was digitizing images from the Apollo space program.
SESE runs two 11-node clusters of Isilon NL-Series storage with just under 700 TB of capacity on each cluster. The primary cluster holds active data, while a secondary cluster at a separate site serves as a redundant copy. SESE uses Isilon SyncIQ software to synchronize data between the two clusters.
Bowman-Cisneros said SESE has published nearly 100 TB of data from the lunar camera in the first year of the project, and he expects it to generate approximately 170 TB a year from now on – 120 TB of published data plus about 50 TB of other data from the spacecraft. The LRO is expected to remain in orbit between four and nine years.
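Those figures make it easy to see why multi-petabyte growth was a requirement. The rough sketch below is not SESE's own capacity model. It assumes the first-year total was simply the "nearly 100 TB" of published data, that growth is linear at the quoted rates, and that a cluster holds about 700 TB; the function name and constants are illustrative only.

```python
# Back-of-the-envelope capacity projection from the figures quoted above.
# Assumptions (not from SESE): year one totals ~100 TB, growth is linear,
# and one cluster holds ~700 TB.

FIRST_YEAR_TB = 100          # "nearly 100 TB" published in the first year
PUBLISHED_PER_YEAR_TB = 120  # expected published data per later year
OTHER_PER_YEAR_TB = 50       # other spacecraft data per later year
CLUSTER_CAPACITY_TB = 700    # "just under 700 TB" per cluster


def projected_total_tb(mission_years: int) -> int:
    """Estimated cumulative data (TB) after the given number of mission years."""
    annual_tb = PUBLISHED_PER_YEAR_TB + OTHER_PER_YEAR_TB  # 170 TB a year
    return FIRST_YEAR_TB + annual_tb * (mission_years - 1)


for years in (4, 9):  # the expected range for LRO's time in orbit
    total = projected_total_tb(years)
    verdict = "fits within" if total <= CLUSTER_CAPACITY_TB else "exceeds"
    print(f"{years} years: ~{total} TB, {verdict} one {CLUSTER_CAPACITY_TB} TB cluster")
```

Under these assumptions, a four-year mission would stay within a single cluster's current capacity, while a nine-year mission would need roughly twice that, which is consistent with the team's insistence on a volume that can grow past a petabyte.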
"We don't need fast I/O performance, but we do need a large storage solution," Bowman-Cisneros said. "One of our biggest storage requirements was that we would be able to grow to multi-petabyte volumes. If we couldn't grow to those large volumes, we would have small pods and we'd have to have a sufficiently high-speed solution to move data from one volume to the next. We couldn't do that with our budget."
Bowman-Cisneros said before the LRO project, his group began storing data from the Apollo Scan Project undertaken by NASA's Johnson Space Center (JSC) in 2006 to digitize photographs taken from Apollo missions.
For the Apollo Scan Project, SESE originally used NetApp storage running redundant Red Hat Global File System (GFS) nodes set up by ASU's high-performance computing (HPC) group. That project gave Bowman-Cisneros' team time to work out any storage problems before the LRO cameras began sending back data in 2010.
Six months into the Apollo project, the size of the clusters grew too large for SESE's GFS heads and caused system crashes, Bowman-Cisneros said. One crash lasted a week.
"According to Red Hat, we were running one of the largest back-end clusters to GFS, and we hit the limit of that implementation," he said. "The size of the system and the way the system was being accessed freaked out the GFS heads. At one point, we had multiple heads, but after this [one-week] outage we had to revert to a single node so we could continue operations and not encounter this problem again."
He said the LRO team never lost data but the crashes caused inconvenient delays.
"That one node tended to lock up if it got very busy, and we'd have to restart it, which caused minor delays in data processing," he said. "If the load got very high on it, it actually took out the file system, lost state with back-end filers, and the storage solution went away and we'd have to wait for it to re-start. So while the solution was working, it suffered from this performance problem and we also needed more storage for the next iteration of processing."
So after four years with his original storage system, Bowman-Cisneros decided it was time for an upgrade. He said he talked to approximately six NAS vendors, and only Isilon and IBM said they could fulfill the LRO requirements. Isilon pitched its NL-Series and IBM proposed SONAS based on its General Parallel File System (GPFS).
Isilon quickly sent out a six-node test system early last December. Bowman-Cisneros said it "passed with flying colors" and LRO has had Isilon in full production for approximately a month.
"Isilon was very aggressive getting us equipment and making sure it was set up correctly," he said. "We had a full implementation of their hardware set up in short order to test all our requirements and some of the issues we were having with the old solution."
He said the only problem he's had with Isilon was a software patch that caused some trouble with his secondary node but didn't affect day-to-day performance. Tech support quickly resolved the issue, and he was able to apply the patch to both systems without further problems.
Bowman-Cisneros was testing Isilon's system at the same time EMC was negotiating its acquisition of the clustered NAS vendor.
"It wasn't until after we signed on the dotted line that we found out about EMC," Bowman-Cisneros said. "By the end of December, we had completed our testing and decided to go with their system. At this point, [the acquisition has] been inconsequential to us. My only concern is that EMC will continue to develop and support the model I have. "