Zack Ramjan was given a blank slate to remake storage in 2014 when he joined Van Andel Institute as a research computing architect. His goal was to consolidate storage and reduce IT infrastructure costs, while protecting "irreplaceable" instrument and research data.
"At the time, there was no compute and storage infrastructure here that was massively parallel. We had basic business storage and little islands of storage that people put in for big data or imaging, but it was a very inefficient use of resources," Ramjan said.
Under Ramjan, storage at Van Andel Institute (VAI) in Grand Rapids, Mich., evolved from scattered silos to 2 PB of highly dense shared capacity, aided by DataDirect Networks (DDN) GRIDScaler GS7K storage arrays and Web Object Scaler (WOS) object storage. Ramjan built his current setup after determining VAI required a parallel file system for storage.
Two DDN GRIDScaler GS7K disk-based arrays currently handle performance and scalability at the 400-person research organization. The GS7K is based on IBM Spectrum Scale file storage, formerly known as the IBM General Parallel File System. VAI also chose DDN WOS for active archiving and collaborative research.
Scientists at the organization use high-tech instrumentation to explore genetic and molecular structure in an effort to pinpoint how healthy cells become cancerous. A growing area of genomics research involves epigenetics, a data-intensive discipline that examines how chemical modifications to DNA and its associated proteins alter gene activity without changing the underlying sequence and, thus, influence a cell's protein production.
Ramjan's team inherited a mishmash of aging storage, including 60 TB of EMC Isilon NAS that had already maxed out. A 100 TB Nexenta ZFS-based NAS still has service time remaining on the contract. Older Dell Compellent -- now Dell EMC SC Series -- arrays used for corporate back-office storage remain for now.
"Institutionally, we knew we needed to consolidate our [research] storage to be able to handle the most computationally intensive problems we would encounter," Ramjan said.
A software developer by trade, Ramjan moved to VAI from the University of Southern California (USC) Cancer Center, where he helped select and implement DDN GRIDScaler storage.
From his experience at USC, Ramjan concluded that a single monolithic file system wouldn't cut it at VAI.
"Our workloads have many, many compute nodes hitting storage, all needing throughput at the same time," he said. "A parallel file system fits the bill, and we were able to narrow things down pretty quickly."
Ramjan said the incumbent, Dell EMC Isilon, was "crazy expensive" and was eliminated from consideration almost immediately. He narrowed his choice to Red Hat's GlusterFS file system or IBM Spectrum Scale.
GRIDScaler presents files and objects for cloud archiving, collaboration
DDN specializes in high-performance computing storage for life sciences and other data-intensive research. In addition to DDN GRIDScaler appliances running Spectrum Scale, the vendor sells a line of branded Lustre-based storage arrays under its EXAScaler ES Series.
Ramjan said he picked DDN GRIDScaler because the Spectrum Scale file system requires less specialized management expertise and resources than Gluster. Gluster would have required moving to InfiniBand networking, which Ramjan worried would stymie the ability to adopt private cloud storage. VAI's GRIDScaler arrays are connected by 40 Gigabit Ethernet.
"We wanted to keep it simple," he said. "Although I had no explicit request for it, I anticipated that our users might be interested in going to a private cloud. We wanted to plan ahead for what might be coming down the line."
DDN GRIDScaler presents files and objects together in a single federated namespace. VAI uses WOS for affordably tiering cold data from high-speed primary storage. DDN systems also include a native OpenStack driver that helped streamline VAI's OpenStack cloud integration.
"The OpenStack part was in our design from the get-go. The goal was to support both clustering and cloud computing using a parallel file system. DDN GRIDScaler with Spectrum Scale fit that requirement," Ramjan said.
Synchronous mirrors replicate data between the two DDN GRIDScaler arrays to support disaster recovery.
VAI started its DDN storage with 1.3 PB, increasing it to 2.1 PB earlier this year. Ramjan said he expects to scale storage to approximately 3 PB before the end of 2017.
As for the smattering of other storage, Ramjan said he sees no alternate uses that suit the organization's budget or research mission.
"We're scrapping all of them," he said. "I'll be taking the Isilon to the recycling bin soon. There is only going to be one parallel file system for research."