Cornell University and e-commerce site Shopzilla are among the early adopters of primary storage data reduction, which they use to consolidate their storage and keep up with data growth. Both use compression appliances from startups: Ithaca, N.Y.-based Cornell runs Ocarina Networks ECOsystem appliances, and Shopzilla has Storwize Inc.'s STN-6000 device.
Cornell looks to consolidate storage, appeal to internal clients
Cornell University's Center for Advanced Computing (CAC) began testing Ocarina Networks' ECOsystem appliances in January and found that new algorithms geared specifically to the files used in life sciences applications delivered 50% compression. Cornell now deploys Ocarina with its S2A9700 disk arrays from DataDirect Networks Inc.
"The main thing we were concerned about was the data explosion we're experiencing," said David Lifka, director of CAC. Astronomy applications that store data at CAC generate up to 1 TB per day and life sciences apps contribute up to 100 GB per day.
CAC is also trying to create economies of scale for the entire campus by getting other departments in the university to store research data on its storage. "We're hoping researchers will put their data on centralized storage devices as opposed to spending excessive amounts of money on terabyte USB drives that are deployed in silos," Lifka said. "Siloed technologies cost the university money. They reduce scalability and cost more to maintain."
With Ocarina Networks, the effective cost of a terabyte is approximately $500, a price point that's appealing to CAC's clients. At the rate researchers are signing on, Lifka said, CAC may have to add another 100 TB to its DataDirect arrays this summer.
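The article does not say how the $500 figure is derived. As a rough illustration only, assume "effective cost" means the raw cost of a terabyte spread over the extra logical capacity that compression frees up. Under that assumption, a 50% reduction halves the cost of each logical terabyte. The $1,000 raw price below is a hypothetical input, not a figure reported by CAC, and the sketch ignores appliance and licensing costs.

```python
def effective_cost_per_tb(raw_cost_per_tb: float, reduction: float) -> float:
    """Cost per logical TB when data shrinks by `reduction` (0.5 = 50%).

    Assumption (not from the article): effective cost = raw cost divided by
    the capacity multiplier 1 / (1 - reduction). Appliance costs are ignored.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    capacity_multiplier = 1 / (1 - reduction)
    return raw_cost_per_tb / capacity_multiplier


def logical_capacity_tb(raw_tb: float, reduction: float) -> float:
    """Logical data that fits on `raw_tb` of disk at the given reduction."""
    return raw_tb / (1 - reduction)


# Hypothetical raw price of $1,000/TB at a 50% reduction.
print(effective_cost_per_tb(1000, 0.5))  # 500.0
# Under the same 50% assumption, 100 TB of new disk would hold ~200 TB of data.
print(logical_capacity_tb(100, 0.5))     # 200.0
```

In practice the achieved reduction varies by file type, so the 50% CAC saw on life sciences files would not necessarily hold for astronomy data.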
If that happens, the plan is to use 2 TB drives so that the new capacity won't consume as much floor space in the data center. "We can leverage new technologies without replacing all of our older gear," Lifka said.
Lifka said he's had to do some tuning of the file systems stored centrally on the DataDirect Networks arrays, but "that's true of any common storage system." As for Ocarina, he's happy with its performance on NFS files, but "Windows clients are important, especially in life sciences, and right now they're not able to take full advantage of the compression capability."
Shopzilla: Storwize compresses nearline data
Shopzilla storage manager Robert Laureano said Storwize's STN-6000 enterprise appliance has been in production at the e-commerce site's data center for approximately a year. It compresses files on the company's nearline network-attached storage (NAS): a BlueArc Corp. array and an OnStor Inc. gateway. (OnStor's Bobcat NAS gateway fronts IBM's XIV Storage System in Shopzilla's data center.)
"We send about 50 terabytes per day through the devices and get about a 50% reduction on Sybase and Oracle files," Laureano said. Storwize claims it can perform fast enough to front tier 1 primary storage, but Laureano said "we just haven't had a need for that right now with our tier 1 data warehouse boxes."
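To put Laureano's figures in concrete terms: if all 50 TB passing through the appliances each day were writes, a 50% reduction would mean roughly 25 TB less data landing on the nearline arrays daily. The sketch below only does that arithmetic. It assumes the reduction applies uniformly to everything ingested, whereas Laureano cites it specifically for Sybase and Oracle files, and "through the devices" may include read traffic as well.

```python
def daily_storage(ingest_tb: float, reduction: float) -> tuple[float, float]:
    """Return (TB written to disk, TB saved) per day for a reduction ratio.

    Assumes the reduction applies uniformly to all ingested data.
    """
    written = ingest_tb * (1 - reduction)
    saved = ingest_tb - written
    return written, saved


written, saved = daily_storage(50, 0.5)
print(f"written: {written} TB/day, saved: {saved} TB/day")
# written: 25.0 TB/day, saved: 25.0 TB/day
```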
As for the nearline boxes, Laureano said that in some cases throughput to the NAS devices is actually faster with Storwize in front of them. He estimates that writes are about 10% faster. "Reads are also a little bit faster as there's less info pulled from disk," he said.
Laureano said he first encountered Storwize at a storage networking conference in 2007 and soon brought the product in as a proof of concept.
Laureano has learned some lessons as the deployment progressed. "The way we use it is a little bit complicated," he said. "We use VLAN [virtual LAN] tagging, which is supported, but it's a little bit tricky." Access control lists for each NFS file system had to be ported into a bridge file system running on the Storwize box. "We had to make sure to also add VLAN info into our Storwize bridge and then add the bridge IP info to our NFS ACLs so it would remain transparent to the client," Laureano said.
Laureano said he'd like to see more reporting features with the Storwize appliances. "Something that just generates an email with the status of how much data it's processing and how much space we're saving – something more proactive that will let me know what's going on with the box instead of requiring a query," he said.