Bioinformatics lab uses DDN GridScaler storage for Ebola research

Virginia Bioinformatics Institute said it chose the SFA10K platform because it embeds the GPFS file system and provides scalable storage.

Fast storage can sometimes make a difference in life-or-death situations.

In 2014, the Virginia Bioinformatics Institute (VBI) used DataDirect Networks' (DDN) Storage Fusion Architecture (SFA) GridScaler SFA10K appliances to quickly generate computer models simulating the spread of Ebola and containment strategies in West Africa.

The Ebola outbreak has claimed more than 10,000 lives, according to the World Health Organization. As the Ebola crisis was raging in Liberia, the U.S. Department of Defense requested computer models from VBI that represented how the disease could spread via human interaction. Computational epidemiologists accessed the DDN GridScaler storage to analyze huge data sets and rapidly build agent-based outbreak models.

U.S. officials used that information to decide where to place mobile emergency field hospitals in Liberia. The defense department request arrived on a Friday, and VBI's research team turned the data around before U.S. supply planes left for Liberia the following Monday.

"Having the DDN storage in place was critical in us being able to deliver the results in the timeframe needed," said Keith Bissett, a simulation development scientist at VBI's Network Dynamics and Simulation Science Laboratory.

GPFS-based storage designed for big data analysis

VBI is a research center affiliated with Virginia Tech in Blacksburg, Virginia. It specializes in bioinformatics and genomics research related to the development of vaccines and therapeutics for treating infectious diseases.

VBI's high-performance computing (HPC) cluster, nicknamed Shadowfax, includes 2,500 processor cores running in Dell servers that access a two-node configuration of DDN's SFA10K appliances with nearly 1 PB of storage. One SFA10K node is devoted to home file system storage and a second to scratch file storage. A third SFA10K appliance at the Virginia Tech campus supports other research projects.

DDN's GridScaler architecture embeds IBM's General Parallel File System (GPFS) within the storage controllers in an effort to reduce latency. VBI's SFA10K storage cluster is connected via quad data rate (QDR) InfiniBand.

VBI also uses SGI's Data Migration Facility to create a virtualized storage fabric that supports active archiving, and it pairs DDN's SFA10K controllers with 7000 Series disk shelves.

Growth in unstructured data required scalable high-performance storage

VBI officials said they implemented DDN GridScaler to keep pace with the increased volume of unstructured data. Since 2005, VBI's data has grown from 300 TB to its present volume of 1 PB, fueled largely by genomics research, said Kevin Shinpaugh, VBI's director of IT and high-performance computing.

"Having a parallel file system being served up over InfiniBand has made a big difference for us. Availability of data is the value we bring to customers, so reliability and data integrity are very important," Shinpaugh said.

VBI backs up its data with IBM Tivoli Storage Manager (TSM) to a 10,000-slot Oracle StorageTek SL8500 tape library for archiving and disaster recovery.

Its work on Ebola is the latest illustration of improved storage performance, Bissett said. Before installing the DDN cluster, VBI could create simulations for a city of 10 million in several hours. With the cluster, it can simulate the entire U.S. population of more than 300 million in about an hour, he said.

Shinpaugh and Bissett said VBI is exploring the addition of DDN's Web Object Scaler (WOS) platform for object storage as a tool to boost global collaboration with other researchers.