You think you have high storage capacity needs? Well, this week the National Center for Supercomputing Applications (NCSA) added 20 PB (that's right, petabytes) of tape capacity for online data for its Blue Waters supercomputer. And that should last about a year.
The NCSA uses four SpectraLogic 17-frame T-Finity tape libraries with IBM TS1140 tape drives for all active archiving on the Blue Waters supercomputer, which went into operation a year ago. That setup can store up to 380 PB, which should be enough for Blue Waters' expected five-year lifespan. Michelle Butler, NCSA senior technical program manager, said the volume of archived Blue Waters data is expected to grow by about 60% to 70% per year.
The SpectraLogic libraries are connected to a Cray-branded Seagate high-performance computing disk system with 35 PB of raw capacity, 25 PB of it usable.
“We need to be able to stay ahead of our users,” Butler said. “We are continuously growing, but we have stored a little less than 20 PB of data in our first year.”
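Butler's figures can be sanity-checked with a quick compounding projection. The sketch below rests on two assumptions the article does not state: that first-year intake was roughly 20 PB (from Butler's "a little less than 20 PB") and that the 60% to 70% growth applies to the amount written each year rather than to the archive's total size. The function name is hypothetical.

```python
def cumulative_archive_pb(first_year_pb, growth, years):
    """Total PB written after `years`, ASSUMING yearly intake grows by `growth`."""
    total, intake = 0.0, first_year_pb
    for _ in range(years):
        total += intake          # add this year's writes to the archive
        intake *= 1 + growth     # next year's intake is 60-70% larger (assumed)
    return total


if __name__ == "__main__":
    for rate in (0.60, 0.70):
        projected = cumulative_archive_pb(20, rate, 5)
        print(f"{rate:.0%} growth: {projected:.1f} PB over five years")
```

Under those assumptions, five years works out to roughly 316 PB at 60% growth and 377 PB at 70%, just under the 380 PB the libraries can hold. If the growth rate instead applies to the total archive, the five-year figure would be much smaller (about 20 PB × 1.7^4 ≈ 167 PB at 70%).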
Blue Waters, based at the University of Illinois at Urbana-Champaign, supports a variety of research. Applications include weather prediction and analyzing how the cosmos developed after the Big Bang. Butler said 36 teams of 10 to 20 researchers each use the system, with between 100 and 200 users online at any time. The supercomputer includes 28 systems dedicated to online data movement and 50 for nearline data movement, each using one or two 40-Gigabit Ethernet cards.
Butler said NCSA chose tape for archiving because hundreds of petabytes of disk would be too costly. Capacity can also be added to the tape libraries without disrupting operations.
Writing that much data to tape efficiently and concurrently did require the Blue Waters team to develop a RAIT (Redundant Array of Inexpensive Tapes) utility for IBM HPSS (High Performance Storage System), its hierarchical storage management software.
RAIT writes data in nine-wide stripes and can tolerate the loss of two tapes without losing access to data.
“With RAIT, we can stripe data and still protect it,” she said. “We needed to stripe data and write data extremely fast to the tape drive. We couldn’t single-stream files, that’s too slow. But we didn’t want to lose users’ data if a tape or drive fails. With a seven-wide stripe of data, that would be 28 TB of data if we drop a tape. Now we can do seven-wide stripes of data and two-wide stripes of parity. We can lose two tapes and still continue to retrieve users’ data.”
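The article does not say how NCSA's RAIT utility computes its parity. As a rough illustration of how seven data stripes plus two parity stripes can survive the loss of any two tapes, here is a minimal Python sketch using RAID-6-style P+Q parity over GF(2^8). The field polynomial (0x11D), the generator 2, and the function names (`encode`, `recover`, `gf_invert`) are assumptions for illustration, not the HPSS implementation; each "tape" is simply a `bytes` object.

```python
import itertools

DATA_TAPES, PARITY_TAPES = 7, 2  # seven-wide data stripe, two-wide parity

# ASSUMPTION: GF(2^8) with primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D),
# a common choice for RAID-6 style parity; not taken from HPSS.
EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]  # doubled table avoids a modulo in gf_mul


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be nonzero


def encoding_rows():
    """Row i holds the coefficients that produce tape i from the 7 data bytes."""
    rows = [[int(c == r) for c in range(DATA_TAPES)] for r in range(DATA_TAPES)]
    rows.append([1] * DATA_TAPES)                     # P parity: plain XOR
    rows.append([EXP[c] for c in range(DATA_TAPES)])  # Q parity: weights 2^c
    return rows


def encode(shards):
    """Return 9 tapes: the 7 data shards followed by P and Q parity."""
    if len(shards) != DATA_TAPES or len({len(s) for s in shards}) != 1:
        raise ValueError("need 7 equal-length data shards")
    parity = []
    for row in encoding_rows()[DATA_TAPES:]:
        out = bytearray(len(shards[0]))
        for coef, shard in zip(row, shards):
            for k, byte in enumerate(shard):
                out[k] ^= gf_mul(coef, byte)
        parity.append(bytes(out))
    return [bytes(s) for s in shards] + parity


def gf_invert(matrix):
    """Gauss-Jordan inversion of a square matrix over GF(2^8)."""
    n = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [v ^ gf_mul(f, w) for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def recover(tapes):
    """tapes: 9 entries, None for a lost tape. Rebuilds the 7 data shards."""
    survivors = [i for i, t in enumerate(tapes) if t is not None]
    if len(survivors) < DATA_TAPES:
        raise ValueError("more than two tapes lost; stripe unrecoverable")
    use = survivors[:DATA_TAPES]
    rows = encoding_rows()
    decode = gf_invert([rows[i] for i in use])  # inverse of surviving rows
    n = len(tapes[use[0]])
    data = [bytearray(n) for _ in range(DATA_TAPES)]
    for k in range(n):
        column = [tapes[i][k] for i in use]
        for r in range(DATA_TAPES):
            acc = 0
            for c in range(DATA_TAPES):
                acc ^= gf_mul(decode[r][c], column[c])
            data[r][k] = acc
    return [bytes(d) for d in data]


if __name__ == "__main__":
    data = [bytes(range(i * 16, i * 16 + 8)) for i in range(DATA_TAPES)]
    stripe = encode(data)
    for lost in itertools.combinations(range(DATA_TAPES + PARITY_TAPES), 2):
        damaged = [None if i in lost else t for i, t in enumerate(stripe)]
        assert recover(damaged) == data
    print("every two-tape loss in the stripe was recovered")
```

Any two erasures are recoverable here because the surviving rows always form an invertible matrix: the P row has all-ones coefficients and the Q row has distinct nonzero weights. The trade-off is that 7 of every 9 tapes carry user data, about 78% of raw tape capacity. Without parity, losing one tape of a seven-wide stripe would damage the whole stripe; Butler's 28 TB figure is consistent with seven cartridges at the TS1140's 4 TB native capacity.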