The Environmental Molecular Sciences Laboratory upgraded its supercomputer this year to deliver the performance its data-intensive research requires. That jump in performance in turn required a new storage system that could keep up with the faster reads and writes and the greater capacity demands generated by the supercomputer.
The Environmental Molecular Sciences Laboratory (EMSL) implemented DataDirect Networks' (DDN) Storage Fusion Architecture (SFA) 12K storage platform and ExaScaler file storage appliance to deliver 2.7 PB of usable capacity. That is more than a tenfold increase over its previous supercomputer storage system, even though the number of racks dropped from 40 to 3.
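The capacity and footprint figures can be checked with a little arithmetic. The sketch below uses the 240 TB capacity reported for the previous system and assumes decimal units (1 PB = 1,000 TB); the function names are illustrative only.

```python
# Figures quoted in the article. Decimal units (1 PB = 1,000 TB) are an assumption.
OLD_CAPACITY_TB = 240    # previous storage system
NEW_CAPACITY_TB = 2700   # 2.7 PB usable on the new system
OLD_RACKS = 40
NEW_RACKS = 3


def capacity_ratio(old_tb, new_tb):
    """How many times larger the new capacity is than the old one."""
    return new_tb / old_tb


def tb_per_rack(capacity_tb, racks):
    """Average capacity per rack, ignoring any non-storage equipment in the racks."""
    return capacity_tb / racks


growth = capacity_ratio(OLD_CAPACITY_TB, NEW_CAPACITY_TB)
old_density = tb_per_rack(OLD_CAPACITY_TB, OLD_RACKS)
new_density = tb_per_rack(NEW_CAPACITY_TB, NEW_RACKS)
density_gain = new_density / old_density

print(f"Capacity growth:  {growth:.2f}x")
print(f"Old density:      {old_density:.0f} TB per rack")
print(f"New density:      {new_density:.0f} TB per rack")
print(f"Density increase: {density_gain:.0f}x")
```

Under these assumptions the capacity grows by a little over 11 times while the average capacity per rack rises far more sharply, which is why the footprint could shrink.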
EMSL, a U.S. Department of Energy (DOE) facility in Richland, Washington, is used by more than 300 scientists to collaborate on biological, energy and environmental research. The scientists come from academia, private industry and other national labs.
The lab has seen a surge in unstructured data, much of it due to the addition of computational structural genomics that supports the DOE's climate modeling and highly granular chemistry modeling.
"As we try to grow our supercomputing abilities, we produce more and more data, driving the need for additional storage. In addition, we are beginning to get more work on climate science, which has large data needs," said Gary Skouson, a senior high-performance computing (HPC) engineer at EMSL.
In January, the DOE-funded lab completed a planned five-year technology refresh by installing its newest supercomputer, known as Cascade, which handles up to 3.4 quadrillion calculations per second. The ability to keep pace with that processing speed was a key factor in choosing the new HPC storage cluster. EMSL's previous storage system provided 240 TB of capacity and delivered 30 GBps of peak input/output performance, but the aging system frequently became a bottleneck.
EMSL needed storage that enables teams of researchers to write data to the file system and run computational analyses simultaneously without hogging bandwidth or contending for resources. It sought clustered NAS with a file system attached to it.
"Our new target needed to be over 2 PB to match the growth trend. The focus was on future data growth, as well," said Evan Felix, a colleague of Skouson's who is also a senior HPC engineer.
After receiving input from its scientists on projected data needs, EMSL specified minimum performance metrics for file open-and-close rates. Bidding vendors needed to provide a system rated for open rates of 8,000 IOPS and closing rates of 40,000 IOPS.
"The biggest [criterion] was being able to do 60 GBps read or write, along with high metadata operations. We have large files that need to open in a reasonable amount of time, and we have thousands of files that need to open simultaneously," Felix said.
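To give a feel for what those minimums mean in practice, here is a rough back-of-the-envelope sketch. The rates come from the bid specification described above; the workload (100,000 files, 6,000 GB of data) is hypothetical, and the model assumes a single job gets the full aggregate rate and that opens and closes do not overlap.

```python
# Minimum rates from the bid specification quoted in the article.
OPEN_RATE_PER_S = 8_000      # file opens per second
CLOSE_RATE_PER_S = 40_000    # file closes per second
BANDWIDTH_GB_PER_S = 60      # sustained read or write throughput


def metadata_seconds(file_count):
    """Seconds to open and then close file_count files at the minimum rates."""
    return file_count / OPEN_RATE_PER_S + file_count / CLOSE_RATE_PER_S


def transfer_seconds(total_gb):
    """Seconds to stream total_gb of data at the minimum bandwidth."""
    return total_gb / BANDWIDTH_GB_PER_S


# Hypothetical workload, chosen only for illustration.
files = 100_000
data_gb = 6_000

meta_time = metadata_seconds(files)
data_time = transfer_seconds(data_gb)

print(f"Open/close overhead: {meta_time:.1f} s")
print(f"Data transfer time:  {data_time:.1f} s")
```

For a job like this one, metadata operations add a noticeable fraction on top of the raw transfer time, which helps explain why the lab specified open and close rates alongside bandwidth.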
EMSL hired Atipa Technologies to solicit proposals from vendors and evaluate their performance claims. After sifting through vendor proposals, Atipa recommended the DDN SFA 12K platform, which is intended for HPC environments with heavy reliance on big data analytics.
The new storage is answering researchers' needs for quick access to complex data sets, enabling multiple computational analyses without hindering performance and opening doors for expanded research, Skouson said.
DDN's family of SFA devices features its high-speed Storage Fusion Fabric, which is designed to provide a high number of concurrent connections to drives, parallelizing I/O across multiple channels. DDN's storage architecture also enables applications and file systems to be embedded within storage controllers, which reduces the number of components and boosts application performance.
As part of its $17 million computer upgrade, EMSL installed three SFA12K-40 storage arrays, as well as DDN's ExaScaler parallel file system to support the open source Lustre file system, which holds more than 56 million files. The 20 Lustre file servers are managed by DDN's Lustre software stack and connect to the 1,400 nodes on the cluster over InfiniBand.
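Parallel file systems such as Lustre get their throughput by spreading a file's data across multiple storage targets in fixed-size stripes, so many servers can serve one file at once. The sketch below shows generic round-robin striping; it is not Lustre's or DDN's actual code, and the stripe size and stripe count used in the example are hypothetical values, not EMSL's configuration.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (target index, offset inside that target's object).

    Generic round-robin (RAID-0 style) striping; an illustration only.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    stripe_number = offset // stripe_size          # which stripe of the file this byte is in
    target = stripe_number % stripe_count          # stripes rotate across the targets
    full_rounds = stripe_number // stripe_count    # completed passes over all targets
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return target, object_offset


MIB = 1024 * 1024

# Hypothetical layout: 1 MiB stripes over 4 storage targets.
for file_offset in (0, 1 * MIB, 4 * MIB + 10, 5 * MIB + 512):
    target, object_offset = locate(file_offset, stripe_size=MIB, stripe_count=4)
    print(f"file offset {file_offset:>8} -> target {target}, object offset {object_offset}")
```

Because consecutive stripes land on different targets, a large sequential read or write keeps all of them busy at the same time instead of queuing on one.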
The SFA12K-40 array can scale to 1,680 drives in two racks, including a mix of solid-state drives and SATA and SAS hard drives. EMSL uses only fixed disk in its HPC environment.
EMSL's decision was made easier by the fact that it already used two DDN storage arrays to support a large data archive. Felix said the newly installed SFA12K-40s provide peak performance of up to 75 GBps for read and write operations.
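A quick comparison puts that peak figure next to the other numbers reported here: the 60 GBps requirement, the 30 GBps peak of the previous system, the three arrays and the 20 Lustre file servers. The per-array and per-server values assume the load is spread evenly, which is a simplification rather than a measured result.

```python
# Throughput figures quoted in the article, in GBps.
PEAK_GBPS = 75        # reported peak of the new SFA12K-40 arrays
REQUIRED_GBPS = 60    # minimum read or write rate in the requirements
OLD_PEAK_GBPS = 30    # peak of the previous storage system

ARRAYS = 3
FILE_SERVERS = 20

headroom = (PEAK_GBPS - REQUIRED_GBPS) / REQUIRED_GBPS   # margin over the requirement
speedup = PEAK_GBPS / OLD_PEAK_GBPS                      # gain over the old system

# Even distribution across arrays and servers is an assumption.
per_array = PEAK_GBPS / ARRAYS
per_server = PEAK_GBPS / FILE_SERVERS

print(f"Headroom over requirement: {headroom:.0%}")
print(f"Speedup over old system:   {speedup:.1f}x")
print(f"Per array (even split):    {per_array:.1f} GBps")
print(f"Per server (even split):   {per_server:.2f} GBps")
```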
Reducing the number of racks to three also meets EMSL's goal of shrinking its hardware footprint and controlling power and cooling costs. "The number of components that can fail is way lower and it's a lot simpler to manage," Felix said.