Much is made in the enterprise data storage industry about the performance of disk systems over tape drives, but the managers of one data center that has reached the far limits of capacity say otherwise. Budget and performance demands forced them to build access protocols and data management tools for disk systems from scratch. High-end commercial tape drives, on the other hand, have largely met their requirements as one of the largest data-producing facilities in the world.
The facility is the Large Hadron Collider (LHC), operated by CERN, the world's largest physics laboratory, in Switzerland. The collider, which will be used for new, highly data-intensive experiments beginning in May 2008, is a tube large enough in diameter to drive a small car through. It accelerates particles around an underground ring about 17 miles (27 km) in circumference, bringing them together at four set collision points in order to smash them apart. At those points, 12-story-high caverns full of electronic detection equipment collect raw data on the collisions.
With the new project being launched next year, scientists hope they can use the collider to discover new subatomic particles, which in turn could help to explain fundamental mysteries of the universe.
When fully operational, these new experiments will produce 15 petabytes (PB) of raw data annually. During each collision, the system produces high-resolution images in the hope of capturing evidence of elusive new particles. The raw data is then pared down to about 1 PB of refined event summary data annually.
This data is stored on a massive farm of network attached storage (NAS) servers, currently 500 in all, though that number will increase to 800 by the end of this year. The servers are whitebox x86 machines running Red Hat Enterprise Linux with the XFS filesystem. Some older systems with smaller capacity are included in the NAS farm, but each of the servers currently being delivered to CERN comes with twenty 500 GB SATA disks, or 10 terabytes (TB) of total storage. Currently, the farm has 3.5 PB of capacity, and once more servers are added next year that number will nearly double.
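The capacity figures above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and that every one of the 300 additional servers is the 10 TB model now being delivered, neither of which the article states. The variable names are illustrative.

```python
# Rough check of the NAS farm capacity figures quoted in the article.
# Assumptions (not from the article): decimal units, and all newly added
# servers are the twenty-disk, 500 GB-per-disk model.
GB_PER_TB = 1000
TB_PER_PB = 1000

disks_per_server = 20
disk_size_gb = 500
server_capacity_tb = disks_per_server * disk_size_gb / GB_PER_TB

current_servers = 500
planned_servers = 800
added_servers = planned_servers - current_servers

# Capacity contributed by the new servers alone, in petabytes.
added_capacity_pb = added_servers * server_capacity_tb / TB_PER_PB

current_capacity_pb = 3.5
projected_capacity_pb = current_capacity_pb + added_capacity_pb
growth_factor = projected_capacity_pb / current_capacity_pb

print(f"Per-server capacity: {server_capacity_tb:.0f} TB")
print(f"Added capacity: {added_capacity_pb:.1f} PB")
print(f"Projected capacity: {projected_capacity_pb:.1f} PB")
print(f"Growth factor: {growth_factor:.2f}x")
```

Under these assumptions the farm grows from 3.5 PB to 6.5 PB, which is consistent with the "nearly double" figure. The current 3.5 PB is less than 500 servers at 10 TB each would give, which fits the article's note that some older servers have smaller capacity.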
A home-grown program called CASTOR, short for CERN Advanced STORage manager, places data on the NAS servers, grants access to clients, migrates data to tape libraries and manages it.
According to Tony Cass, the leader of the Fabric and Infrastructure Operations group at the CERN facility, data storage for previous projects at the collider was handled by a mainframe and Unix systems. CERN moved to open systems several years ago, when it could no longer afford the mainframe CPU capacity needed to support its data.
The organization looked into a number of commercially available filesystems and file management products, including GPFS, Lustre and Sun Microsystems Inc.'s SAM-FS, which performs similar functions to CASTOR, but never found one that could meet its scalability and performance requirements. In addition to requiring hundreds of gigabytes per second throughput, the system must also allow concurrent access to the grid from researchers around the world, including grid partner sites in other countries that are hosting copies of some of the analysis data. The organization has also found that standard filesystem protocols, like CIFS and NFS, aren't up to the task, so it's using specialized community-developed communication protocols specific to high-energy physics research for client access to the system.
While the disk technologies and software being used will be exotic to most enterprise storage managers, with the possible exception of those at Google Inc., the tape libraries used for long-term storage should be familiar. (Cass said that data centers like those run by Google, which use farms of commodity PCs, are the closest commercial comparison to the data center at CERN.) CERN has 160 tape drives backing its system in all: 50 IBM 3592 and TS1120 drives spread across two IBM 3584 libraries, and another 50 T10000 drives spread across two Sun/StorageTek SL8500 silos. Another 60 drives are older Sun/StorageTek 9940s.
The proprietary tape drives go against the organization's standards favoring inexpensive commodity products, Cass admitted. But he said CERN had found not only that the proprietary drives had performance advantages over LTO, which CERN evaluated first, but also that the ability to repurpose media would more than cover the higher cost of the libraries.
It's a capability that has long been supported by IBM and StorageTek, according to IDC tape analyst Robert Amatruda, but it often gets "lost in the discussion" around tape. Proprietary media cartridges can be reformatted at the density of a new generation. For example, today's 500 GB cartridge can be reformatted to a 1 TB cartridge if there is a capacity refresh for the format.
"It takes a long time to do, so many users don't do it," Amatruda said.
However, Cass said that at about $200,000 for a petabyte's worth of cartridges and the anticipated capacity of 15 PB, the ability to recycle the cartridges will save CERN millions per year.
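The savings claim follows from simple multiplication, sketched below. The cost and volume figures come from the article. The cartridge counts are purely illustrative: they borrow the analyst's 500 GB to 1 TB example and apply it to CERN's 15 PB, which is an assumption and not a statement of the cartridge sizes CERN actually uses. The function name `cartridges_needed` is hypothetical, and decimal units are assumed.

```python
# Back-of-the-envelope media cost, using figures quoted in the article.
COST_PER_PB_USD = 200_000   # approximate cost of a petabyte's worth of cartridges
ANNUAL_RAW_DATA_PB = 15     # anticipated raw data per year
GB_PER_PB = 1_000_000       # decimal units (assumption)

# Cost of buying fresh cartridges for one year's data.
annual_media_cost_usd = COST_PER_PB_USD * ANNUAL_RAW_DATA_PB


def cartridges_needed(data_pb: int, cartridge_gb: int) -> int:
    """Return how many cartridges of the given size hold data_pb petabytes."""
    data_gb = data_pb * GB_PER_PB
    # Integer ceiling division, so a partly filled cartridge still counts.
    return -(-data_gb // cartridge_gb)


# Illustrative only: the analyst's 500 GB -> 1 TB reformat example.
before_reformat = cartridges_needed(ANNUAL_RAW_DATA_PB, 500)
after_reformat = cartridges_needed(ANNUAL_RAW_DATA_PB, 1000)

print(f"Media cost for one year of data: ${annual_media_cost_usd:,}")
print(f"Cartridges at 500 GB: {before_reformat:,}")
print(f"Cartridges at 1 TB: {after_reformat:,}")
```

At the quoted price, a year of raw data represents about $3 million in cartridges. In the illustrative case, reformatting at double the density halves the number of cartridges needed for the same volume.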
Though the tapes, which are constantly being written, reformatted, copied, transferred and read by the CASTOR system, have not been a bottleneck, Cass said there are some enhancements he'd like to see to boost their performance further. For example, because of the high mount rate inside CERN's silos, he'd like the tape robots to be able to "pre-fetch" cartridges for the appropriate drives, the better to feed them faster.
Cass also said he has in no way dismissed the possibility of going with a commercial disk system down the road. "If you look out in a few years, other enterprises are coming up with similar [capacity and performance] demands. Five years down the line, I would expect commercial products to have caught up with us. After all, five years ago you didn't have the transfer rates or capacities you're seeing with these high-end tape drives now."