Researchers wrangle petabytes of data storage with NAS, tape

By Beth Pariseau, News Writer
02 Aug 2007 | SearchStorage.com

Much is made in the enterprise data storage industry about the performance of disk systems over tape drives, but the managers of one data center that has reached the far limits of capacity say otherwise. Budget and performance demands forced them to build access protocols and data management tools for disk systems from scratch. High-end commercial tape drives, on the other hand, have largely met their requirements as one of the largest data producing facilities in the world.

The facility is the Large Hadron Collider (LHC), owned by CERN, the world's largest physics laboratory, in Switzerland. The collider, which will be used for new, highly data-intensive experiments beginning in May 2008, is a tube large enough in diameter to drive a small car through. It accelerates particles around an underground ring roughly 17 miles (27 km) in circumference, bringing them together at four set collision points in order to smash them apart. Even further down, 12-story-high caverns full of electronic detection equipment collect raw data on the collisions.

With the new project being launched next year, scientists hope they can use the collider to discover new subatomic particles, which in turn could help to explain fundamental mysteries of the universe.

"One particular theoretical particle that we're looking for is called the Higgs boson particle," said Francois Grey, head of IT at CERN. "It's the missing piece in a model known as the Standard Model that provides a coherent picture of our universe."

When fully operational, these new experiments will produce 15 petabytes (PB) of raw data annually. During each collision, the system produces high-resolution images in the hopes of capturing evidence of the elusive particle. From that, the data is pared down to about 1 PB of refined event summary data annually.
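As a rough illustration of what those annual volumes imply, the sketch below converts them into an average daily volume and a reduction ratio. It assumes decimal units (1 PB = 1,000 TB) and, purely for simplicity, that data arrives evenly across a 365-day year; the article does not describe the actual running schedule, and the function name is made up for this example.

```python
# Annual volumes quoted in the article.
RAW_PB_PER_YEAR = 15      # raw data
SUMMARY_PB_PER_YEAR = 1   # refined event summary data

TB_PER_PB = 1_000         # decimal units (assumption)


def average_tb_per_day(pb_per_year: float, days_per_year: int = 365) -> float:
    """Average daily volume in TB, assuming data is spread evenly over the year."""
    return pb_per_year * TB_PER_PB / days_per_year


raw_tb_per_day = average_tb_per_day(RAW_PB_PER_YEAR)
reduction_ratio = RAW_PB_PER_YEAR / SUMMARY_PB_PER_YEAR

print(f"Average raw data volume: {raw_tb_per_day:.1f} TB/day")
print(f"Raw-to-summary reduction: {reduction_ratio:.0f}:1")
```

Under those assumptions the raw stream averages a little over 41 TB per day, and only about one-fifteenth of it survives as summary data.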

This data is stored on a massive farm of network attached storage (NAS) servers, currently 500 in all, though that number will increase to 800 by the end of this year. The servers are based on Red Hat Enterprise Linux running XFS and whitebox x86 hardware. Some older systems with smaller capacity are included in the NAS farm, but each of the servers currently being delivered to CERN comes with twenty 500 GB SATA disks, or 10 terabytes (TB) of total storage. Currently, the farm has 3.5 PB of capacity, and once more servers are added next year, that number will nearly double.
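The capacity figures above can be checked with a few lines of arithmetic. The sketch below works in raw decimal capacity and ignores RAID and filesystem overhead, which the article does not quantify; the helper name is hypothetical.

```python
import math

# Figures quoted in the article.
DISKS_PER_SERVER = 20
DISK_GB = 500
FARM_SERVERS = 500
FARM_PB = 3.5

# Raw capacity of one newly delivered server, in TB (decimal units).
server_tb = DISKS_PER_SERVER * DISK_GB / 1000

# Average capacity per server across the current farm, in TB.
average_server_tb = FARM_PB * 1000 / FARM_SERVERS


def servers_needed(capacity_pb: float, per_server_tb: float) -> int:
    """Servers of a given size needed to hold capacity_pb (raw, no overhead)."""
    return math.ceil(capacity_pb * 1000 / per_server_tb)


print(f"New server capacity: {server_tb:.0f} TB")
print(f"Average across the farm: {average_server_tb:.0f} TB per server")
print(f"10 TB servers equivalent to 3.5 PB: {servers_needed(FARM_PB, server_tb)}")
```

The farm-wide average comes out below the 10 TB of a new server, which is consistent with the older, smaller systems still in the farm.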

A home-grown program called CASTOR, which stands for CERN Advanced STORage manager, places data on the NAS servers, controls client access to it, migrates it to tape libraries and manages it throughout.

According to Tony Cass, leader of the Fabric and Infrastructure Operations group at CERN, data storage for previous projects at the collider was handled by a mainframe and Unix systems. CERN moved to open systems several years ago, when it could no longer afford the mainframe CPU capacity needed to support its data.

The organization looked into a number of commercially available filesystems and file management products, including GPFS, Lustre and Sun Microsystems Inc.'s SAM-FS, which performs similar functions to CASTOR, but never found one that could meet its scalability and performance requirements. In addition to requiring hundreds of gigabytes per second throughput, the system must also allow concurrent access to the grid from researchers around the world, including grid partner sites in other countries that are hosting copies of some of the analysis data. The organization has also found that standard filesystem protocols, like CIFS and NFS, aren't up to the task, so it's using specialized community-developed communication protocols specific to high-energy physics research for client access to the system.

The disk technologies and software in use will be exotic to most enterprise storage managers, with the possible exception of those at Google Inc. Cass said that data centers like Google's, which use farms of commodity PCs, are the closest commercial comparison to the data center at CERN. The tape libraries used for long-term storage, however, should be familiar. CERN has 160 tape drives backing its system in all: fifty 3592 and TS1120 drives spread across two IBM 3584 libraries, fifty T10000 drives spread across two Sun/StorageTek SL8500 silos and 60 older Sun/StorageTek 9940 drives.

The proprietary tape drives go against the organization's standards favoring inexpensive, commodity products, Cass admitted. But he said CERN had found not only that the proprietary drives had performance advantages over LTO, which CERN evaluated first, but also that the ability to repurpose media would more than cover the higher cost of the libraries.

It's a capability that has long been supported by IBM and StorageTek, according to IDC tape analyst Robert Amatruda, but it often gets "lost in the discussion" around tape. Proprietary media cartridges can be reformatted at the density of a new generation. For example, today's 500 GB cartridge can be reformatted to a 1 TB cartridge if there is a capacity refresh for the format.

"It takes a long time to do, so many users don't do it," Amatruda said.

However, Cass said that at about $200,000 for a petabyte's worth of cartridges and the anticipated capacity of 15 PB, the ability to recycle the cartridges will save CERN millions per year.
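A back-of-the-envelope version of that calculation is sketched below. It uses only the figures quoted in the article plus the 500 GB to 1 TB reformatting example; decimal units and the helper names are assumptions, and real savings would depend on how many cartridges are actually recycled in a given year.

```python
import math

# Figures quoted in the article.
COST_PER_PB_USD = 200_000
ANTICIPATED_PB = 15

GB_PER_PB = 1_000_000  # decimal units (assumption)


def media_cost_usd(capacity_pb: float) -> float:
    """Cost of buying new cartridges for capacity_pb at the quoted price."""
    return capacity_pb * COST_PER_PB_USD


def cartridges_needed(capacity_pb: float, cartridge_gb: int) -> int:
    """Number of cartridges of a given size needed to hold capacity_pb."""
    return math.ceil(capacity_pb * GB_PER_PB / cartridge_gb)


new_media_cost = media_cost_usd(ANTICIPATED_PB)
at_500_gb = cartridges_needed(ANTICIPATED_PB, 500)
at_1_tb = cartridges_needed(ANTICIPATED_PB, 1000)

print(f"New media for {ANTICIPATED_PB} PB: ${new_media_cost:,.0f}")
print(f"Cartridges at 500 GB each: {at_500_gb:,}")
print(f"Cartridges at 1 TB each after reformatting: {at_1_tb:,}")
```

Buying fresh media for 15 PB costs $3 million at the quoted price, so every density refresh that lets existing cartridges hold twice as much avoids a purchase on that scale.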

Though the tapes, which are constantly being written, reformatted, copied, transferred and read by the CASTOR system, have not been a bottleneck, Cass said there are some enhancements he'd like to see to boost their performance further. For example, because of the high mount rate inside CERN's silos, he'd like the tape robots to be able to "pre-fetch" cartridges for the appropriate drives so the drives can be fed faster.

Cass also said he has in no way dismissed the possibility of going with a commercial disk system down the road. "If you look out in a few years, other enterprises are coming up with similar [capacity and performance] demands. Five years down the line, I would expect commercial products to have caught up with us. After all, five years ago you didn't have the transfer rates or capacities you're seeing with these high-end tape drives now."


