Trying to make sense of these seemingly random events created by Mother Nature is the function of the Boulder, CO-based National Center for Atmospheric Research (NCAR). Funded through the National Science Foundation, NCAR conducts research and provides facilities -- including data sets and computer models -- to improve the understanding of the skies, oceans, ice packs and landmasses of the world. As you can imagine, it's a job that requires massive computing power and an enormous storage system. The data and projections obtained and created by NCAR are used by scientists, academic institutions and numerous government agencies, including NASA, NOAA, DOE and the EPA.
The demands of coping with a subject as vast as the environment have caused NCAR a few storage-related issues over the years. That burden has led the center to develop an innovative high-capacity archival system -- one designed to accommodate huge amounts of critical, irreproducible and ever-increasing data.
In NCAR's case, the amount of data generated is driven by one simple factor -- the speed at which its two IBM supercomputers, the 10th and 33rd largest in the world, can create it. With the current processing power, the data growth rate sits at around 1.5 terabytes (TB) a day. Although it's taken 16 years to reach the 1 petabyte (PB) level, NCAR expects to break through the 2 PB barrier as early as next year. The increase in yield is directly related to the processing increases of
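The growth figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope projection: it assumes decimal units (1 PB = 1,000 TB) and a constant growth rate, neither of which the article states, and the function name is made up for illustration.

```python
# Back-of-the-envelope projection of archive growth.
# Figures from the article: ~1.5 TB/day growth, archive at roughly 1 PB.
# Assumptions (not from the article): decimal units and a constant rate.

TB_PER_PB = 1000  # assumption: 1 PB = 1,000 TB


def days_to_reach(target_pb, current_pb, rate_tb_per_day):
    """Days needed to grow from current_pb to target_pb at a constant daily rate."""
    if rate_tb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    remaining_tb = (target_pb - current_pb) * TB_PER_PB
    # If the target has already been reached, no further time is needed.
    return max(remaining_tb, 0) / rate_tb_per_day


days = days_to_reach(target_pb=2.0, current_pb=1.0, rate_tb_per_day=1.5)
print(f"{days:.0f} days (~{days / 365:.1f} years) to go from 1 PB to 2 PB")
```

At a flat 1.5 TB a day, the second petabyte would take roughly 667 days, or a little under two years. Reaching 2 PB "as early as next year" would therefore seem to require the daily rate to keep climbing, which fits the point that data growth tracks processing power.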
In light of the sheer volume of data stored by the NCAR mass storage system (MSS), the only currently viable medium is tape. At present, the backbone of the NCAR MSS is a tape-based archive with only a small amount of disk space for files. The tape hardware includes five PowderHorn 9310 tape libraries, 20 StorageTek 9840A silo-attached (library-attached) tape drives, 28 StorageTek 9940A silo-attached tape drives and four StorageTek 9840A manual-mount tape drives, among other equipment.
One of the principal developers of the NCAR MSS is Gene Harano, manager of the high-performance systems section for the scientific computing division. For the past 16 years, Harano has been instrumental in the development and management of the MSS. Although Harano sees a point where there will be more disk capacity for the MSS, he still feels that tape will have its place. "Although the costs associated with disks are coming down, there is a long way to go before they are comparable to tape cost capacities."
According to Harano, the MSS software was initially deployed on an IBM mainframe under MVS, but it is now being migrated to open systems platforms, and some of the MSS functionality has already been moved. Direct tape access from 25 heterogeneous Unix-based MSS host platforms is accomplished with a combination of a high-performance parallel interface (HiPPI) fabric and StorageTek channel protocol bridges that attach directly to the control units of the storage devices.
Like others in his situation, Harano realizes that you can only keep throwing capacity at a storage problem for so long. Ultimately, he says, you need to find ways to manage and control the data you have. Harano's team feels that the current software offerings for data management and classification fall short of a suitable solution. As a result, the team is developing an in-house system that will allow it to manage its data more effectively. "We have a single user that owns over a million files," says Harano. "There can be no doubt that he does not know what they all are, and even more likely that he doesn't need them all."
Having seen the NCAR MSS grow over the last 16 years, Harano is able to offer one overriding piece of advice for those who are in the process of building and developing their own storage networks. "The amount of data being created in all environments is growing at an alarming rate. Never underestimate the size of the problem. There is always more data than you think there will be."
This was first published in April 2003