This article can also be found in the Premium Editorial Download "Storage magazine: Overview of top tape backup options for midrange systems and networking segments."
Sharing is a nice thing to do
Companies in the R&D space are - like the rest of corporate America - struggling to learn how to make all their storage resources available across the board. They're coming up with different ways to approach the problem.
CardioNow, based in Encinitas, CA, provides cardiologists with the ability to share cardiograms with colleagues across the globe. Each cardiogram ranges from 70MB to 130MB, depending on how the input device is set up and how detailed the cardiogram gets.
At the core of CardioNow's setup is a DataDirect Networks Silicon Storage Appliance, a high-speed RAID system. The system, now storing 1.5TB of information, is set up at a data center managed by a hosting vendor in San Diego, CA. After a cardiogram is performed, the test travels over a T1-based virtual private network (VPN) to the DataDirect box. After logging on with a password, users and guests can access the cardiogram via the Internet and download the compressed image for viewing in a browser.
At Anadarko Petroleum, Houston, TX, the information is so proprietary that the company's goal has been to show it to as few people as possible. Also, most of the people who need to view the data work in one location, the main office. Nevertheless, the company is starting to look into working with a data service provider to be able to easily share information around the globe. The project, says Ken Nadolny, manager of exploration and production systems at Anadarko, is in the very early stages.
John Reynders, vice president of information systems at Celera Genomics, Rockville, MD, says his goal is to share more information among business units via a global SAN based in the corporate data center. "We're discovering that different parts of the business pull data together in different fashions, so we might have different data from multiple projects exposed all at once." He says that he's accepted, as inevitable, the notion that consolidation brings with it some amount of replicated data. "But if it's consolidated, then the overhead is amortized over many more servers" than in the traditional storage model.
One of the chief goals of the Cambridge, MA-based Genome Center at the Whitehead Institute "is to get rapid access to data, regardless of where it's stored," says Michael Zody, manager of sequencing informatics at the center. For some of the biggest research algorithms, the center is still working off direct-attached storage.
The center would really like to reach an environment in which hosts attach to a SAN and then share data at the file level. But doing so would require different protocols for file-level access in a SAN - and "that's what NAS does," says K.M. Peterson, manager of computer operations for the center. He calls this the holy grail of storage systems: to be able to plug something in and access files with the efficiency of a SAN. "But we're a long way from that," he says.
Indeed, if anything separates those who work in storage-intensive research and development (R&D) environments from less storage-hungry companies, it's the way R&D shops approach storage rather than the kind of gear they use. These high-need R&D centers look at storage as an enterprise-level architecture instead of in the traditional application-by-application, piecemeal way.
Enterprise-level means building storage architectures that can be shared among multiple users in a distributed network. For now, these shops are clearly in the vanguard and in the minority (see "Sharing is a nice thing to do" sidebar). And many of the largest shops - although certainly not all - use hierarchical storage management (HSM) to help with their backup and archival requirements, and to keep their disk capacity clear for the highest-priority data.
The R&D gang does something else, too. They segment their storage needs by the kinds of information they have, regardless of the type of application the information is used for, and then select the most appropriate technologies to support those different needs. So it's not unusual to find, say, both a network-attached storage (NAS) and a storage area network (SAN) setup in these shops.
Michael Peterson, president of Strategic Research Corp., a consultancy in Santa Barbara, CA, says this kind of thinking represents a real shift in the industry. He calls this model application-intelligent storage, in which storage systems become applications themselves by managing bandwidth, resource allocation, scalability, security and other things behind the scenes.
Application-intelligent storage also calls for optimizing storage for specific types of applications.
Although Peterson likes to call this smart storage - which takes the next step beyond enterprise storage and is shared and virtualized more than ever before - he believes it's still essentially stupid, because the storage software can't really understand what type of data it's storing or make appropriate decisions based on that knowledge.
One company that has started to implement much of this application-intelligent model - although they're not calling it that - is Anadarko Petroleum Corp., Houston, TX. Anadarko is one of the world's largest independent oil and gas exploration and production companies. In the past two years, Anadarko has begun to install various types of storage technology for different kinds of applications, and now has a NAS for its seismic data and a SAN for its relational database.
"We found that the NAS and the SAN are optimized for different things," says Joan Dunn, manager of enterprise computing at Anadarko. The company has 110TB of primarily seismic data stored in its Network Appliance Filer servers.
Ken Nadolny, manager of exploration and production systems at Anadarko, says seismic data is akin to a "sonogram of a pregnancy, except it's a sonogram of the earth. But the quality is about as good as the medical kind." Essentially, seismologists use sound waves to transmit energy through the earth, analyze the way those sound waves look after they come back out and hopefully discover if there are any oil reserves hidden beneath. For each square mile explored, one can easily record 300GB of data, according to Nadolny. The company is currently doing exploration in the Gulf of Mexico - all 615,000 square miles of it.
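To put Nadolny's figures in perspective, here is a rough back-of-the-envelope sketch of what recording the whole Gulf would mean. It assumes decimal units (1TB = 1,000GB) and, unrealistically, that every one of the 615,000 square miles is surveyed exactly once at the 300GB rate; the function name is illustrative only and not anything Anadarko uses.

```python
def survey_volume_gb(square_miles, gb_per_square_mile):
    """Raw data volume for a survey area, in GB (simple multiplication)."""
    return square_miles * gb_per_square_mile


GB_PER_TB = 1_000  # assumption: decimal units

# Figures quoted in the article: 300GB per square mile, 615,000 square miles.
total_gb = survey_volume_gb(615_000, 300)
total_tb = total_gb / GB_PER_TB

print(f"{total_tb:,.0f} TB")  # prints "184,500 TB"
```

Under those assumptions the total comes to roughly 184,500TB, well over a thousand times the 110TB the company keeps on its filers today, which is why surveys are stored selectively and capacity planning matters.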
Currently, Anadarko has approximately 15% of its data stored on standalone servers, 4% on an EMC-based SAN and approximately 81% of its information on the NAS, Dunn says, adding that their storage requirements generally double every six to 12 months. "That's why the flexibility and expandability of storage is important," she says. "We believe the combination we have today will serve our needs for the foreseeable future."
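Dunn's doubling estimate can be turned into a simple projection. The sketch below is only an illustration under stated assumptions: it takes the 110TB held on the filers as the starting point, assumes smooth exponential growth, and uses a hypothetical 24-month planning horizon that does not come from Anadarko.

```python
def projected_capacity_tb(current_tb, months_ahead, doubling_months):
    """Capacity after months_ahead if storage doubles every doubling_months."""
    return current_tb * 2 ** (months_ahead / doubling_months)


CURRENT_TB = 110      # NAS figure quoted in the article
HORIZON_MONTHS = 24   # hypothetical planning horizon

fast = projected_capacity_tb(CURRENT_TB, HORIZON_MONTHS, doubling_months=6)
slow = projected_capacity_tb(CURRENT_TB, HORIZON_MONTHS, doubling_months=12)

print(f"Doubling every 6 months:  {fast:,.0f} TB")   # 1,760 TB
print(f"Doubling every 12 months: {slow:,.0f} TB")   # 440 TB
```

The fourfold gap between the two ends of the range after just two years shows why the expandability Dunn describes counts for more than any single capacity number.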
It's not unusual for the company to decide it needs to explore a huge amount of property in a hurry, she says. "We have to be prepared to support - on real short notice - the storage of new data." Looking toward the future, Dunn says they would like to pursue HSM "to apply intelligence to manage files and usage, to minimize cost and improve performance."
Although in an entirely different industry, the Celera Genomics Group, Rockville, MD, struggles with many of the same storage-related issues as Anadarko. Celera has two major businesses: one that provides information about the human genome to researchers around the globe, and another that uses genetic information to help identify possible therapies for different cancer-related diseases.
John Reynders, vice president of information systems at Celera, says the company is in the process of moving from SAN islands to a global storage network that will facilitate sharing. Right now the firm is a straight Compaq shop, using Compaq's AlphaServer ES40 boxes and StorageWorks arrays and SANs. Altogether, Celera has approximately 100TB of spinning disk and another 150TB of tertiary storage managed by UniTree HSM software.
Although Celera hasn't selected which SAN it will use, the firm did some testing of NAS vs. SAN. "We found better consolidation, and better connection of all servers at a higher connection speed with the SAN approach vs. the NAS," Reynders says. Celera's fiscal year starts in July, and plans are to roll out the new SAN now.
Reynders uses two other techniques that assist him in forecasting the firm's storage needs. One is activity-based costing (ABC), an accepted means within the financial community of figuring out moneymakers or losers, and where the break-even point may be. The only difference is Reynders does this for storage. "We can fairly accurately predict what each business requires, per project, for storage," he says.
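In its simplest form, activity-based costing for storage means charging a shared storage bill back to projects in proportion to what each consumes. The sketch below shows that idea only; the article does not describe Celera's actual model, and the project names, usage figures and cost are all hypothetical.

```python
def allocate_storage_cost(total_cost, usage_tb_by_project):
    """Split a shared storage cost across projects in proportion to TB used."""
    total_tb = sum(usage_tb_by_project.values())
    if total_tb == 0:
        raise ValueError("no storage usage recorded")
    return {
        project: total_cost * tb / total_tb
        for project, tb in usage_tb_by_project.items()
    }


# Hypothetical inputs, for illustration only.
usage = {"project_a": 60, "project_b": 30, "project_c": 10}
costs = allocate_storage_cost(100_000, usage)

for project, cost in costs.items():
    print(f"{project}: ${cost:,.0f}")
```

A real model would likely weight disk and tertiary storage differently and fold in backup and administration costs, but proportional allocation is the usual starting point.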
Reynders also monitors which databases are used and helps decide what information needs to be safely archived because it hasn't been accessed in a while. "We don't know much about what's being asked," he says, "but we know what classes of data are changing more rapidly than other classes. So we know what to leave in cache, and what to move." One particular genome, for instance, may need to stay active, while another can safely be archived.
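The decision Reynders describes, keeping active data classes on disk and moving idle ones to the archive, can be sketched as a simple age-based rule. This is a minimal illustration, not Celera's or UniTree's actual policy; the class names, the 90-day threshold and the `plan_tiers` helper are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DataClass:
    name: str
    days_since_last_access: int


def plan_tiers(classes, archive_after_days=90):
    """Return (keep_on_disk, move_to_archive) lists of class names."""
    keep, archive = [], []
    for c in classes:
        if c.days_since_last_access > archive_after_days:
            archive.append(c.name)   # idle longer than the threshold
        else:
            keep.append(c.name)      # still active, leave on disk
    return keep, archive


# Hypothetical data classes, for illustration only.
classes = [DataClass("genome_a", 3), DataClass("genome_b", 200)]
keep, archive = plan_tiers(classes)

print("keep on disk:", keep)
print("archive:", archive)
```

Production HSM software typically weighs file size, change rate and recall cost as well, so a single age threshold should be read as the simplest possible version of the idea.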
Another leading-edge research organization, the Center for Genome Research at the Whitehead Institute, Cambridge, MA, hasn't implemented HSM for its 21TB of data. "But I would kill for an HSM system that works," says K.M. Peterson, manager of computer systems operations for the center. "We've talked to several vendors that claimed they could solve our problem, but the volume of data transferring [we must do] to tertiary storage is just too large."
This was first published in July 2002