Managing data with an object storage system
A comprehensive collection of articles, videos and more, hand-picked by our editors
The insatiable need for file-based primary data storage is propelling three technologies -- scale-out network-attached storage (NAS), object-based storage and the cloud as a NAS tier -- to the forefront as potential lifelines for IT shops overwhelmed by unstructured data.
Scale-out NAS systems can boost capacity, performance and availability through the addition of storage nodes -- x86 servers equipped with a specialized operating system and their own storage. The most scalable of these clustered systems can manage petabytes of data across more than 100 nodes, yet they're accessed and managed as a single system through the use of a distributed file system or global namespace.
Object-based storage systems are another promising alternative to traditional NAS. Object storage forgoes traditional file systems, which have capacity and management shortcomings. Instead, these systems assign a unique identifier, or digital fingerprint, to each file plus its metadata. Because data is addressed by identifier rather than by location, the physical placement of an object becomes immaterial, which enables massive scalability.
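The "digital fingerprint" is typically a cryptographic hash of the object's contents. A minimal sketch of the idea in Python -- the class and method names here are illustrative only, not any vendor's actual API:

```python
import hashlib

class ObjectStore:
    """Toy content-addressed object store: objects are located by the
    hash of their contents, not by a path in a file system hierarchy."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # The SHA-256 digest of the contents serves as the object's
        # unique identifier -- its digital fingerprint.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"quarterly report", {"owner": "finance", "type": "doc"})
data, meta = store.get(oid)  # the ID alone suffices to retrieve the object
```

Because the identifier is derived from the content rather than a directory path, the same lookup works no matter which node or data center actually holds the bytes.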
Using the cloud as a NAS tier is another option for IT shops coping with a flood of unstructured data. In particular, a lot of attention is gravitating toward a new wave of file-based gateway appliances that move data to a cloud service provider. These can be hardware or virtual appliances, and they can solve security and data access issues that make IT shops hesitant to use the public cloud.
Here’s what you need to know about these three NAS technologies as you plot out your file storage:
A traditional scale-up NAS box has a fixed amount of CPU, cache and drive slots. When it fills up, the customer needs to buy another device. Scale-out NAS systems appeal to organizations with huge files because of their potential for seemingly limitless expansion while still being managed as a single storage resource.
Also known as clustered NAS, scale-out NAS originally took aim at applications requiring high throughput and high bandwidth, such as those in media and entertainment, high-performance computing, bioinformatics, and oil and gas.
But these scale-out systems often weren’t tuned to perform well with the typical enterprise application, where EMC Corp. and NetApp Inc. held sway with their traditional NAS devices.
Terri McClure, a senior analyst at Enterprise Strategy Group (ESG) in Milford, Mass., said scale-out NAS tended to excel in environments with a smaller number of unusually large files rather than the large number of small files the typical enterprise has. That made these systems a good choice for applications such as video streaming. But as scale-out vendors tune their systems to perform better with more I/O-intensive enterprise applications, they're starting to show up in more enterprise IT shops.
Scale-out NAS got a major shot in the arm late last year when EMC acquired Isilon Systems. Isilon offers three options: its S-Series aimed at I/O-intensive smaller files, its X-Series for smaller numbers of large files and its NL-Series for bulk high-capacity, low-performance storage.
Isilon’s 72000X has a maximum capacity of 10.4 PB in a single file system from a 144-node cluster. The company’s solid-state drive (SSD)-equipped S200 has a lower maximum capacity at 2 PB, but offers 85 Gbps of aggregate throughput and 1.2 million NFS IOPS in a single file system/volume from a 144-node cluster.
Isilon claims its system, built around a distributed file system, was designed from the ground up for scale-out storage, whereas systems that rely on a global namespace require an additional software layer to deliver scale-out NAS.
But Jeff Boles, a senior analyst and director, validation services at Hopkinton, Mass.-based Taneja Group, said the nuances of the architecture matter less to end users than the ease with which the system scales and whether multiple storage nodes can be managed as a single storage system.
“Scale out is still very new and innovative and proprietary,” Boles said. “Because it’s not as simple of an operation as building a controller head on an array, you’re not going to see a convergence of technologies around one best architecture.”
In addition to Isilon’s offering, other scale-out products include BlueArc Corp.’s Mercury and Titan Series servers (which Hitachi Data Systems resells as the Hitachi NAS Platform), Dell Inc.’s PowerVault NX3500 with a clustered file system acquired from Exanet, Hewlett-Packard (HP) Co.’s X9000 family (based on technology acquired from Ibrix) and IBM’s SONAS. NetApp has a cluster-mode version of its Data Ontap 8 operating system (but not a clustered file system), while Quantum Corp.’s StorNext and Symantec Corp.’s FileStore are clustered file systems that run on hardware appliances.
Greg Schulz, founder and senior advisor at StorageIO Group in Stillwater, Minn., said some scale-out NAS products increase the number of nodes for parallel performance or large sequential streaming, while others optimize for concurrent access of multiple small random file or page views. Some focus on data storage capacity, and others emphasize clustered file systems or clustered nodes, he said.
More scale-out options are on the way. Dell, for instance, plans to use Exanet technology to add scale-out capabilities to its EqualLogic and Compellent SAN systems, according to Scott Sinclair, senior manager of Dell enterprise storage.
NetApp’s Brendon Howe, vice president and general manager of the NAS business unit, added via email that the company’s next-generation Ontap 8 Cluster-Mode is designed as a scale-out version of its unified architecture that extends to enterprise applications and virtualized data centers.
“We find that segmenting the scale-out discussion to just ‘NAS’ isn't that meaningful to customers,” Howe said.
Randy Kerns, a senior strategist at Evaluator Group Inc. in Broomfield, Colo., said although there are situations where scale-out NAS makes sense, there are also plenty of use cases where customers will prefer simpler traditional NAS.
“It may boil down to there’s a place for both,” Kerns said. “I think scale-out NAS and traditional NAS will both be around a long time.”
Object-based storage is hardly new. EMC pushed it into the forefront in 2002 with its Centera line in an attempt to stake out a new market known as content-addressable storage (CAS). But performance issues generally relegated the use of CAS products to archives of information that rarely if ever changed, such as medical images.
A new wave of object storage makes use of such protocols as Representational State Transfer (REST), and is gaining a second look for near-line and primary data storage -- especially in the cloud.
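REST-style object APIs address each object with a URL and use standard HTTP verbs to store and retrieve it. A hedged sketch of how such requests are typically shaped, using Python's standard library -- the endpoint and bucket name are made up for illustration, and real services each define their own URL scheme and authentication:

```python
import urllib.request

# Hypothetical endpoint; substitute a real object service's URL scheme.
BASE = "https://objects.example.com/mybucket"

def build_put(key: str, data: bytes) -> urllib.request.Request:
    """PUT stores (or overwrites) the object at its key."""
    return urllib.request.Request(
        f"{BASE}/{key}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )

def build_get(key: str) -> urllib.request.Request:
    """GET retrieves the object by the same key."""
    return urllib.request.Request(f"{BASE}/{key}", method="GET")

put_req = build_put("reports/q1.doc", b"...")
get_req = build_get("reports/q1.doc")
```

The appeal for primary file storage is that this flat, URL-per-object model needs no shared file-system state, so any front-end node can serve any request.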
“There’s no technical barrier that says you can’t use object [storage] for primary storage,” said Andrew Reichman, a principal analyst at Cambridge, Mass.-based Forrester Research Inc. “Some primary storage is not that performance sensitive, especially with files.”
EMC now promotes Atmos for that purpose. Other object offerings include Caringo Inc.’s CAStor, DataDirect Networks Web Object Scaler (WOS), Dell’s DX Object Storage (which uses Caringo’s technology), NetApp’s (formerly Bycast) StorageGrid, and products from startups such as Amplidata, Cleversafe Inc., Mezeo Software and Scality.
“In the long run, we could see object [as] a replacement for file storage -- just a better way to do file storage,” Reichman said.
Object storage is attractive to cloud storage providers because of its massive scalability and shared tenancy features, especially in comparison to ordinary file- or block-based storage.
“You have so much metadata for each chunk of data, you can lock it down more easily and move it around based on policies and change the redundancy based on policies,” Reichman said, explaining the draw for cloud providers.
Using the public cloud as a NAS tier for primary storage is a much tougher sell for most IT shops than for backups or archives. But one of the emerging technologies that could start to make that prospect more palatable is the gateway that acts as a hybrid cloud storage appliance.
The appliances supply an on-premises cache that can provide access to the most active or frequently accessed data, so latency or network or cloud outages won’t prevent users from getting needed files. Algorithms determine which data to store in the cache.
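Those cache algorithms are vendor-specific, but a common baseline policy is least-recently-used (LRU) eviction: keep the most recently touched files on premises and let cold data age out to the cloud tier. A simplified sketch, with the class name and capacity chosen purely for illustration:

```python
from collections import OrderedDict

class GatewayCache:
    """Toy LRU cache for a cloud gateway: hot files stay local; the
    least recently used file is evicted (in a real appliance, its
    authoritative copy already lives in the cloud tier)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._files = OrderedDict()  # filename -> data, coldest first

    def access(self, name: str, data: bytes = None):
        if name in self._files:
            self._files.move_to_end(name)  # mark as most recently used
        elif data is not None:
            self._files[name] = data
            if len(self._files) > self.capacity:
                # Evict the least recently used file to make room.
                self._files.popitem(last=False)
        return self._files.get(name)

cache = GatewayCache(capacity=2)
cache.access("a.txt", b"A")
cache.access("b.txt", b"B")
cache.access("a.txt")        # touching a.txt keeps it hot
cache.access("c.txt", b"C")  # evicts b.txt, the coldest entry
```

A cache miss (here, `None`) is where a real gateway would fetch the file back from the cloud, which is exactly the latency the on-premises cache is meant to hide for the working set.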
Many of the appliances also offer data reduction technologies such as deduplication or compression to reduce bandwidth consumption and lower the fees associated with transferring data to and from the cloud. They also encrypt the data before sending it off-premises and offer extra features such as snapshots to lighten the load on backup systems.
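Deduplication in this setting typically works by hashing fixed-size chunks of data and transferring only the chunks the cloud hasn't already seen. A rough sketch of that idea -- the tiny chunk size is for readability only; real appliances use kilobyte-scale chunks:

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real systems use KB-sized chunks

def upload(data: bytes, already_stored: set) -> list:
    """Return only the chunks that need to be sent; duplicates are skipped."""
    to_send = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in already_stored:
            already_stored.add(digest)
            to_send.append(chunk)
    return to_send

stored = set()
first = upload(b"AAAABBBBAAAA", stored)  # "AAAA" appears twice; sent once
second = upload(b"AAAACCCC", stored)     # "AAAA" already stored; only "CCCC" sent
```

Every skipped chunk is bandwidth that never crosses the WAN, which is what lowers the transfer fees the article mentions.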
Several startups currently rule the roost in the NAS hybrid cloud space and typically partner with prominent cloud storage providers. They include Ctera Networks Ltd., Nasuni Corp. and StorSimple Inc. Nasuni makes a software-based virtual NAS appliance that installs on a virtual machine (VM).
Another option is Nirvanix Inc.’s CloudNAS product, which can transform Linux or Windows servers into a virtual NAS gateway to the company’s Storage Delivery Network (SDN) encrypted off-site storage. Nirvanix uses standard protocols such as NFS, CIFS and FTP for access to its service.
Rick Villars, vice president of storage systems and executive strategies at Framingham, Mass.-based IDC, predicted that major NAS vendors such as EMC or NetApp will eventually provide the protocol support for a cloud tier in addition to their SSDs and SATA and SAS drives.
“We think that day is coming. It may not be this year. It may be parts of next year,” Villars said, acknowledging the business model challenges for the NAS vendors. “That’s the last step. That hasn’t happened yet, but there’s certainly no reason why they can’t. It would require some software. It would require some links. But you could absolutely add that function in.”