
Johns Hopkins selects Caringo CAS software for data archiving

With its data archives growing by terabytes a week, the research center at Johns Hopkins University uses Caringo CAStor clusters to manage archived and active data.

A research center at Johns Hopkins University turned to Caringo Inc.'s CAStor content-addressed storage (CAS) software to provide data archiving and also to manage its sensitive and rapidly expanding genotyping data.
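The defining idea behind content-addressed storage is that each object is retrieved by an address derived from a hash of its own contents, so identical data is stored once and stored data can always be verified against its address. The sketch below illustrates that general concept only; it is not Caringo's implementation, and the `store`/`retrieve` functions and the in-memory dict are illustrative assumptions.

```python
import hashlib

def store(archive: dict, data: bytes) -> str:
    """Store a blob under an address derived from its content.

    In content-addressed storage, the retrieval key is a hash of the
    data itself: identical content maps to the same address, and any
    change to the content produces a different address.
    """
    address = hashlib.sha256(data).hexdigest()
    archive[address] = data
    return address

def retrieve(archive: dict, address: str) -> bytes:
    data = archive[address]
    # Integrity check: the content must still hash to its address.
    assert hashlib.sha256(data).hexdigest() == address
    return data

# Usage: archive a scan file's bytes, get back a fixed-length address.
archive = {}
addr = store(archive, b"genotyping scan data")
assert retrieve(archive, addr) == b"genotyping scan data"
```

Because the address depends only on the bytes stored, writing the same file twice yields the same address, which is what makes CAS systems naturally deduplicating and tamper-evident.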

The Center of Inherited Disease Research (CIDR) provides genotyping and statistical genetics services for investigators trying to identify genes that contribute to human disease. The work of CIDR is, to put it bluntly, a data hog. As part of its research, CIDR might scan up to 12 DNA samples on one slide, according to Lee Watkins, Jr., the Center's director of Bioinformatics. One sample can produce files ranging from 2 GB to 4 GB.

CIDR uses CAStor to archive the data and delete it from the Windows file share. With data from tens of thousands of DNA samples in its system, the archive builds up fast. The Baltimore-based CIDR often generates terabytes of data a week, sometimes hitting a terabyte in one day. The Center used high-capacity PetaBox systems from Capricorn Technologies to store the data, but last summer the 50-person research team realized they needed help managing it all.

"We knew we needed to have an archiving strategy," Watkins said. "Keeping up with all the data became unmanageable. People wanted to recover files by project, keeping track of which files go with each slide scanned."

But perhaps the hardest part was finding technology that wouldn't deplete the budget. "We're well-funded, but we can't go out and buy a system from EMC or Hitachi to do this," Watkins said. "We said, 'There has to be somebody who has written software that can keep track of this.'"

CIDR became aware of Caringo through Capricorn. Caringo gave CIDR a free trial period to test CAStor. CAStor passed the test and CIDR became a paying customer last November. The Center started with a 30 TB CAStor cluster and is now up to a 99.9 TB cluster with 80 TB used, and it is still growing.

To keep up with its data growth, the Center is installing a high-density Rackable Systems array for more capacity and will install CAStor clusters on that as well. This new setup is scheduled to go live in August.

At first, CAStor had trouble keeping up with the data that the Center was throwing at the clusters. "It wasn't 100% robust," Watkins said. "There were cases where a disk wouldn't fail but it would stop performing and act weird, give us little hiccups now and then. They wrote a fix a few months ago, and we haven't had that problem."

Derek Gascon, Caringo marketing vice president, said, "They wanted to have disk capacity freed up much quicker, so we put together a new version for them that includes a faster turnaround in releasing disk capacity." That fix is now included in the general release of the product.

According to Watkins, no relief from data growth is in sight. "Our plan is to keep data online for a year," he said. "We haven't gotten to that point yet where we've released projects, so we can't predict our high water mark. But we suspect it will be between 300 and 400 TB."

CIDR keeps its data on tape for long-term archiving, but uses CAStor for active data. "We've had to recover a lot of stuff we didn't think we would have to recover, and it's there," Watkins said. "What we were doing before was not scalable, and we couldn't keep track of everything. We had to do everything on a separate storage device," he said. "Now it's simple as simple can be. You need more storage, add another storage device, boot up from a NetBoot server and you're done."

Watkins said CAStor has also helped provide disaster recovery, surviving various mechanical failures and even a flood in the lab where the clusters were temporarily installed while CIDR was expanding its server room. "We've had random disk failures, and power failures where all the nodes went down and we had to power it back up," he said. "We never had a problem with that, which is amazing to me."
