Object storage has been an ideal fit for the big data needs of the Integrated Imaging Program at Albert Einstein College of Medicine, although the technology wasn't even on the New York institution's radar a little more than a year ago.
Shailesh Shenoy, the program's director of engineering and operations, knew he had a problem to solve. The Integrated Imaging Program (IIP) microscopes were generating approximately 1 TB of specimen images per week, and he had limited resources and scant IT personnel to deal with data storage.
"I couldn't keep up with maintaining LUNs [logical unit numbers] … with RAID sets and scaling our existing systems," said Shenoy, whose core expertise is biomedical research, not data storage.
Shenoy first reached out to major vendors such as EMC and Dell in hopes of finding a turnkey storage system to replace the IIP's VTrak Ex10 Series from Promise Technology Inc. A colleague's suggestion subsequently led him to DataDirect Networks Inc. (DDN), and he never looked any further than DDN's Web Object Scaler (WOS).
The college's need for a long-term data repository presented a classic use case for the new generation of object storage products. The imaging program already had about 30 TB of immutable images, and there was no end in sight to the growth of unstructured data. Shenoy wanted a storage system that would be easy to scale, manage and maintain, that would replicate data between two sites, and that would allow data sharing with institutional collaborators.
"The ability to scale easily is important to us because I don't want to ever deal with having to reformat our arrays or do any of the stuff that many typical systems force us to do," Shenoy said. "Here, I don't have to worry about a file system and LUNs. With the object storage, basically I can just keep growing by adding more drives, and that is very appealing in an operation like ours where I cannot count on having dedicated storage experts."
The college replaced its file-based storage with two DDN WOS 6000 appliances capable of storing 180 TB each. One is located on the north side of the campus, and the other is on the south side for data protection purposes. The DDN WOS handles replication between the two appliances, each of which has two nodes. So far, the appliances store approximately 100 TB of images.
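The two-site arrangement can be pictured with a toy in-memory model. This is only an illustrative sketch: the `Zone` and `ReplicatedStore` classes and their methods are hypothetical names invented here, not DDN's WOSLib or REST API, and the article does not describe how or when WOS actually copies data between appliances.

```python
import uuid


class Zone:
    """Hypothetical stand-in for one storage site (e.g. the north-campus appliance)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, bytes] = {}


class ReplicatedStore:
    """Toy model: every write to the primary zone is also copied to each replica."""

    def __init__(self, primary: Zone, replicas: list[Zone]) -> None:
        self.primary = primary
        self.replicas = replicas

    def put(self, data: bytes) -> str:
        # The store, not the caller, chooses the object ID.
        oid = uuid.uuid4().hex
        self.primary.objects[oid] = data
        # Replication is part of the write path; the application does nothing extra.
        for zone in self.replicas:
            zone.objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        # Try the primary first, then fall back to any replica that has the object.
        for zone in [self.primary, *self.replicas]:
            if oid in zone.objects:
                return zone.objects[oid]
        raise KeyError(oid)


north = Zone("north")
south = Zone("south")
store = ReplicatedStore(north, [south])
oid = store.put(b"specimen image bytes")

# Simulate losing the north-campus copy; the object is still readable from the south.
del north.objects[oid]
recovered = store.get(oid)
```

The point of the sketch is that the caller only ever calls `put` and `get`; keeping the second copy in step is the store's job, not the application's.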
"I don't have to worry about any additional software programs to make sure the replication is working," Shenoy said. "With the WOS, it just happens on its own. I write an object to Zone A, and then it automatically replicates to Zone B."
Another advantage is that object storage assigns a unique identifier with associated metadata to each image or piece of data. Shenoy said the system not only ensures immutability, but eliminates the need to maintain a directory structure with access control lists.
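A small sketch may help show what an identifier plus metadata looks like in place of a directory tree. Everything here is an assumption made for illustration: `StoredObject`, `FlatObjectStore`, the random IDs and the metadata keys are hypothetical and do not reflect how WOS generates identifiers or stores metadata.

```python
import uuid
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class StoredObject:
    """One immutable object: an ID, the raw bytes, and descriptive metadata."""

    oid: str
    data: bytes
    metadata: Mapping[str, str]


class FlatObjectStore:
    """Hypothetical flat namespace: objects are found by ID, not by path."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: Mapping[str, str]) -> str:
        oid = uuid.uuid4().hex
        # Wrap a private copy of the metadata in a read-only view.
        frozen_meta = MappingProxyType(dict(metadata))
        self._objects[oid] = StoredObject(oid, bytes(data), frozen_meta)
        return oid

    def get(self, oid: str) -> StoredObject:
        return self._objects[oid]


store = FlatObjectStore()
oid = store.put(b"raw pixels", {"experiment": "example-experiment", "microscope": "scope-1"})
image = store.get(oid)

# Objects cannot be edited in place...
try:
    image.data = b"tampered"
    modified = True
except FrozenInstanceError:
    modified = False

# ...so a "new version" is simply a new object with its own ID.
new_oid = store.put(b"reprocessed pixels", dict(image.metadata))
```

There are no folders or access control lists in this model: an application that holds the ID can fetch the object, and the metadata travels with it.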
The main challenge in deploying object storage was writing to DDN's HTTP/REST and WOS Library (WOSLib) application programming interfaces (APIs) in order to allow the program's image-capturing and data analysis applications to use the WOS system. Shenoy said he spent four or five hours remodeling the application workflow and doing the necessary programming to allow the core set of microscopes to connect to WOS. He hasn't had time to remodel the workflow for the rest of the organization's applications.
"For people who know little or nothing about storage, they may assume they just plug this in and then set some permissions," Shenoy said. "That's not the way this works. They really have to understand how object storage works and the fact that they would have to formulate a way for all of their systems to address the storage directly."
DDN offers the WOS Access file system gateway to allow customers to use familiar file-based interfaces -- NFS and CIFS (the latter now known as Server Message Block, or SMB) -- to access DDN WOS object storage. But there is a performance impact, which David Martin, DDN's WOS product line manager, said tends to be greater for small files than for large files.
"We tried the [file system] gateway, and it was a little slow, and it's klugey," Shenoy said. "Basically, what you're trying to create with a gateway is an environment that to the user is not object-oriented. If that's really the goal, then you don't need object-oriented storage. I guess it's useful for people who want some sort of transition, but fundamentally, I don't think it makes sense."
Two knocks on object storage systems have been performance and proprietary APIs, but neither has been a problem for Shenoy. He said WOS performance has been good, and DDN's proprietary WOSLib and HTTP/REST APIs don't concern him. Some of IIP's collaborators also use WOS.
"It's not an issue for me as long as DDN continues to support standards," Shenoy said. "As long as I have the object ID number, whatever API they want to create to access it, that'll come. It's all firmware."
The college's end users no longer click on a network file share mounted to their desktop computers to dig through folders and locate images by their file names. They now open a Java client that connects to the open source OMERO microscope image management system, which maintains a database of all the specimen images that researchers can organize by experiment name, date and/or thumbnail description.
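Conceptually, a catalog of this kind replaces folder browsing with a metadata query that resolves to object IDs. The following is a minimal sketch using an in-memory SQLite database; the table, the column names, the `find_oids` helper and the sample rows are all made up for illustration and are not OMERO's schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "oid TEXT PRIMARY KEY, experiment TEXT, acquired TEXT, description TEXT)"
)

# Made-up sample rows: each one maps searchable metadata to an object ID.
rows = [
    ("oid-001", "experiment-a", "2013-01-10", "control"),
    ("oid-002", "experiment-a", "2013-02-15", "treated"),
    ("oid-003", "experiment-b", "2013-02-20", "control"),
]
conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?)", rows)


def find_oids(db: sqlite3.Connection, experiment: str, since: str) -> list[str]:
    """Return object IDs for one experiment acquired on or after `since` (ISO date)."""
    cursor = db.execute(
        "SELECT oid FROM images WHERE experiment = ? AND acquired >= ? ORDER BY acquired",
        (experiment, since),
    )
    return [row[0] for row in cursor]


all_a = find_oids(conn, "experiment-a", "2013-01-01")
recent_a = find_oids(conn, "experiment-a", "2013-02-01")
```

The researcher searches by experiment and date; only the catalog needs to know which object ID to request from the storage system.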
Shenoy said he is piloting the OMERO software and examining the open source Integrated Rule-Oriented Data System (iRODS) to provide a level of abstraction between the DDN WOS storage and the researchers.
"Users really don't know that the objects are stored in WOS. It's a file to them that they just click on and see," he said.
Moving forward, Shenoy said he is looking into hooking additional applications into the DDN object storage now that many of his colleagues have become interested in it. He laughed at his initial qualms over remodeling the application workflow to address the object storage through APIs.
"Ultimately, the benefits are what swayed me to overcome my hesitation to do something that I considered so different," he said. "And now I consider it to be just a no-brainer."