From Web-based video to scientific and medical images, "unstructured" data is fast becoming the primary data type of the 21st century. As data patterns change, the NAS systems that store files and other unstructured data are also evolving.
Market research firm IDC projects a 61.7% compound annual growth rate (CAGR) for unstructured data in traditional data centers from 2008 to 2012 vs. a CAGR of 21.8% for transactional data.
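To put those two growth rates side by side, the short sketch below compounds each one over the forecast period. It assumes four annual compounding periods between 2008 and 2012 and a normalized starting volume of 1.0; neither assumption comes from IDC, and the function name is purely illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a quantity is after compounding."""
    return (1.0 + cagr) ** years


YEARS = 4  # assumption: 2008 -> 2012 treated as four annual periods

unstructured = growth_multiple(0.617, YEARS)   # 61.7% CAGR
transactional = growth_multiple(0.218, YEARS)  # 21.8% CAGR

print(f"Unstructured data:  {unstructured:.2f}x")
print(f"Transactional data: {transactional:.2f}x")
```

Under those assumptions, unstructured data would grow to roughly 6.8 times its 2008 volume, while transactional data would grow to about 2.2 times.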
Another storage industry analyst firm, Milford, Mass.-based Enterprise Strategy Group, identified this growth as the primary problem CIOs and storage managers must contend with in 2010. "Unstructured data growth shows no signs of abating," wrote senior analyst Terri McClure in a research paper titled "Unstructured data in 2010: Trends to Watch." "It will make up the bulk of data growth in the data center in 2010, driving IT to take a long hard look at unified storage platforms, scale-out NAS, and cloud storage services to alleviate the strain."
High-end NAS: Performance and data reduction in primary NAS storage
Traditional NAS systems are limited in the amount of file system space they can address. They are also built around a "head," or controller, whose performance ceiling is dictated by the type and number of processors and the amount of cache in each system.
As unstructured data has grown, several approaches to overcoming these limitations of traditional NAS have evolved. One is to add acceleration hardware in front of NAS systems, either as additional cache or as NAND flash capacity used as cache.
Weta Digital, the New Zealand company that created digital effects for the blockbuster movie "Avatar," said the film's detailed animations required more horsepower than one clustered NAS system could provide on its own. To support the project, which included new breakthroughs in animating the faces of 3-D characters, Weta Digital set up a combination of BlueArc Corp.'s Titan clustered NAS arrays and NetApp Inc.'s FlexCache. FlexCache is designed to support applications like Weta Digital's render farm. It adapts to changing usage patterns by automatically replicating "hot" data through local caching volumes.
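The general idea of keeping "hot" files in a local cache, rather than copying them by hand, can be sketched as a simple read-through cache. This is a toy illustration and not NetApp's implementation: the class name, the dictionary standing in for the origin file server, the file paths and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through cache with least-recently-used (LRU) eviction."""

    def __init__(self, origin, capacity):
        self.origin = origin        # dict standing in for the origin file server
        self.capacity = capacity    # maximum number of files held locally
        self.cache = OrderedDict()  # oldest entry first, newest last
        self.origin_reads = 0       # how often the origin had to be contacted

    def read(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # hit: mark as most recently used
            return self.cache[path]
        data = self.origin[path]            # miss: fetch from the origin
        self.origin_reads += 1
        self.cache[path] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used file
        return data


# Hypothetical texture files on the origin server.
origin = {"/tex/skin.tif": b"skin", "/tex/leaf.tif": b"leaf", "/tex/rock.tif": b"rock"}
cache = ReadThroughCache(origin, capacity=2)

for _ in range(100):
    cache.read("/tex/skin.tif")  # a "hot" file read by many render jobs

print(cache.origin_reads)  # 1
```

Only the first read reaches the origin server; the other 99 are served locally, which is the effect a caching layer aims for when many render nodes request the same files.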
"We were using regular file servers for these texture files beforehand, but that required us to manually manage the replication. We'd have to have a copy of these texture files on lots of different file servers," said Paul Ryan, Weta Digital's chief technology officer (CTO).
Another approach to improve NAS performance is data reduction in primary NAS and nearline NAS systems, so the system has to store and serve less data to applications. Products in this space include Storwize Inc.'s STN series and Ocarina Networks Inc.'s ECOsystem appliances.
"Before we implemented Storwize, we had almost completely run out of storage space on the unstructured filers," wrote Daniel Gill, infrastructure test analyst at Allianz Insurance in Britain, in an email to SearchStorage.com. The company is rolling out Storwize's STN compression appliance with NetApp Inc.'s FAS3020c and FAS3050 in production, as well as a FAS6030c and FAS3050 at the firm's disaster recovery (DR) site. "By now, we have compressed 4.29 TB of existing data down to 1.73 TB. … We have also saved this storage on the three other backup environments [and] still have a further 6 TB of existing data remaining to compress," Gill added. "We predict we could defer purchasing disk for up to three years."
The low end: SMB NAS products get serious
Today, even low-end NAS systems can scale out; scale up to a midmarket or enterprise-class NAS architecture; and offer snapshots, replication and other advanced data protection features. Commodity hard disk drives (now at 2 TB) are so large, commodity processors so fast and data management features so robust that it's sometimes difficult to draw a clear line between systems meant for commercial use and those meant for "prosumers" and sold in retail stores.
"The lines are blurring," said Rich DePas, principal for an IT consulting business called Data Coordination and a network systems manager for a heavy-duty axle manufacturer located in the Midwest that he asked not be named because of policy forbidding him from mentioning it in the press. DePas said he started off using a NetGear ReadyNAS 1100 at his job with the axle manufacturer to support a new server virtualization project in Europe. "After that, I needed to replace aging equipment and servers in the demo office for my other business."
For that, he brought in a ReadyNAS 2100. At home, he has NetGear's ReadyNAS NV+ for personal storage.
"What I like about ReadyNAS is that there isn't a distinction between the SMB server and the [NV+] -- it's the same interface," he said.
To scale out or not to scale out?
Another approach to overcoming performance and scalability limitations of traditional NAS is to join file server nodes under a global namespace. Using this approach, also known as scale-out NAS or clustered NAS, capacity can be added to the pool of nodes without requiring data migration, and performance and capacity can be scaled independently.
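As a rough illustration of why a global namespace avoids data migration, the toy model below tracks which node holds each file and lets new nodes join the pool. It is a sketch under simplifying assumptions, not a description of any vendor's product: the class, node names, paths and the "most free space" placement rule are all hypothetical.

```python
class GlobalNamespace:
    """Toy model: one path space spread across several storage nodes."""

    def __init__(self):
        self.free_tb = {}   # node name -> free capacity in TB
        self.location = {}  # file path -> node name

    def add_node(self, name, capacity_tb):
        # Adding a node only grows the pool; existing files stay where they are.
        self.free_tb[name] = capacity_tb

    def write(self, path, size_tb):
        if not self.free_tb:
            raise RuntimeError("no nodes in the pool")
        # Assumed placement rule: pick the node with the most free space.
        node = max(self.free_tb, key=self.free_tb.get)
        if self.free_tb[node] < size_tb:
            raise RuntimeError("pool is full; add another node")
        self.free_tb[node] -= size_tb
        self.location[path] = node
        return node

    def total_free_tb(self):
        return sum(self.free_tb.values())


ns = GlobalNamespace()
ns.add_node("node-a", 10.0)
ns.write("/projects/a.dat", 8.0)  # lands on node-a

ns.add_node("node-b", 10.0)       # capacity added later
ns.write("/projects/b.dat", 8.0)  # lands on node-b

print(ns.location)
```

Clients keep using the same paths throughout; the first file never moves when the second node is added, and the new capacity is simply available for later writes.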
Scale-out is becoming an expected feature of NAS systems and has the potential to change competition in the storage industry. For now, though, the market is still somewhat divided between the management simplicity and scalability of scale-out NAS systems and the advanced software features that have been developed for traditional NAS systems over time.
"It used to be with NAS you'd look at a couple of different parameters of scaling -- compute, network or throughput and capacity," said Nathan Day, CTO at SoftLayer Technologies, a cloud computing and storage service provider that uses scale-out NAS systems from Isilon Systems Inc. to support its multi-petabyte storage environment.
"With traditional NAS systems, if you exceeded any one of those parameters, you'd have to do a forklift upgrade to the next model. With Isilon as the back-end system, you can scale compute, network and capacity resources in different increments rather than ripping and replacing."
Still, another service provider is sticking with traditional NAS systems from NetApp and is content to wait for the company's Ontap 8.1 release, which is supposed to unite NetApp's software data protection features with the scale-out technology it acquired with Spinnaker Networks in 2003.
"Scale-out is something we've seen a lot of talk about -- we think it's definitely the future from a manageability perspective, but we haven't seen a huge overriding need for it in our business," said Jay Gagne, CTO at Razor Technology, a service provider with approximately 50 TB under management after using NetApp's primary storage data deduplication features. Gagne said features like data deduplication and space-efficient snapshots are more important in his environment than the ability to scale out right now, and they stave off capacity crunches by reducing the amount of data on the company's filers.
File virtualization fades
File virtualization products are software-based tools, sometimes delivered as appliances, that were originally marketed as a means of achieving scale-out manageability of multiple, separate, traditional NAS systems by layering a third party's global namespace over existing NAS nodes. While there are still some products being used in that capacity, particularly F5 Networks Inc.'s ARX switch and AutoVirt Inc.'s AutoVirt, other file virtualization vendors such as NeoPath Networks and Attune Networks have either found a new niche in data migration or have taken their products off the market entirely.
EMC refocused its Rainfinity file virtualization product as a data migration add-on for tiered storage and archiving with its Celerra multiprotocol platform, and storage networking vendor Brocade Communications Systems Inc. discontinued StorageX, the file virtualization platform it gained through its 2006 acquisition of NuView Systems Inc.
"It's tough to go buy an expensive NAS system with intelligence built in, and then lay file virtualization from someone else over it," said Andrew Reichman, a senior analyst at Cambridge, Mass.-based Forrester Research. "Some financial services customers did seem to buy, but file virtualization seems to have plateaued as cluster vendors get more ability to set up different zones with various disk and performance densities as part of the NAS system."
When NAS isn't just NAS: Multiprotocol or unified storage
The concept of unified block and file storage in one system isn't new -- it's at least as old as NetApp's Write Anywhere File Layout (WAFL). More recently, other players have followed into multiprotocol or unified storage, especially in the midmarket. EMC repositioned its Celerra line from NAS to multiprotocol storage by adding iSCSI capability for block storage, and SAN vendors such as Dell EqualLogic and Compellent added Windows Storage Server-based NAS gateways to their block-based storage systems.
"Below a certain company size, it doesn't make sense to have to manage two different systems just because of different access methods," Forrester Research's Reichman said. "As you get into hundreds of terabytes, you're still probably going to have multiple boxes."
Open source storage finds a niche in scale-out NAS
The chief power behind open source NAS is Sun Microsystems' Zettabyte File System (ZFS), a 128-bit file system whose code has been made a part of OpenSolaris and been opened for community development. Sun has been acquired by the much more proprietary Oracle Corp., which has said it intends to market ZFS-based storage products.
In the meantime, with the code open-sourced, other companies are free to develop and manipulate it to create customized NAS systems. These companies include storage vendors like Nexenta Systems Inc., which markets a scale-out version of ZFS with inline data deduplication for primary storage and added support for virtual servers.
End users are also free to modify the code to suit their use case. "Nexenta allowed us to offer a centralized email setup and redundant storage for customers more cheaply and effectively than proprietary competitors," said Jeremy Miller, head of virtual private server operations at service provider Site5. "It also allowed us to choose our storage on the back end; choose from multiple storage access types, including iSCSI, NFS and CIFS; and customize the code ourselves by editing scripts."
While it's unclear whether Oracle will try to claw back ZFS from the open-source community, Nexenta's CEO said that horse has left the barn.
Unstructured data encompasses not only primary data organized into files, but also all the files associated with corporate knowledge and data archives maintained for compliance and storage efficiency purposes. As this data continues to balloon, with ever-longer retention regulations being applied in multiple vertical markets, companies already squeezed by last year's recession are looking for ways to offload this persistent data. Cloud NAS -- delivered over the Internet by a service provider -- is evolving to meet these needs.
Other IT organizations, like the National Sept. 11 Memorial & Museum, also need a cost-effective way to share growing numbers of files among various users and business partners. The Sept. 11 Memorial organization has some 10 TB of multimedia data to work with in designing its exhibits so far, said Sean Anderson, director of IT, and expects another 30 TB by the end of this year. "We need to share the content with consultants and designers, and might need to accommodate donated catalogs [of media] overnight," Anderson said. "Nirvanix CloudNAS can scale quickly to offer offsite storage for those files and also lets us share them more easily with partners."
Despite a year of hype around cloud data storage in 2009, enterprise data storage pros have so far been reluctant to migrate much data to service provider data centers.
Most recently, the concept of the "hybrid cloud" -- one that leaves some data cached locally at the user's data center while storing the bulk of it in the cloud -- has been gaining more steam, with the launch of startups like Avere Inc. and Nasuni Corp. that specialize in advanced caching for NAS systems. "I don't buy the current cloud conversation," said Jeff Boles, a senior analyst and director, validation services at Hopkinton, Mass.-based Taneja Group. "But I think we're going to see much more intelligent options for local caching and the evolution of the cloud platform through gateway opportunities."
Object-based storage: The next NAS frontier?
Object storage isn't a new concept in the NAS world, but some new products are bypassing traditional file system interfaces as industry debate emerges about the best way to cope with unstructured data.
Traditional hierarchical file systems organize data into "trees" consisting of directories, folders, subfolders and files. A problem arises, however, when a traditional file system, which has a theoretical limit on the number of files it can address in a single directory and tracks only simple metadata, runs into massive repositories of similar files. Object storage takes a different approach: with an object ID replacing a file name, more extensive metadata can accompany an object, and detailed policies can be applied to objects for more efficient and automated management.
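The contrast with a directory tree can be sketched in a few lines. The example below is a minimal, in-memory illustration rather than any product's interface: the class, its methods and the metadata fields (`modality`, `retention_years`) are hypothetical. Each item is addressed by a generated object ID in a flat space, and a policy selects objects by their metadata instead of by where they sit in a folder hierarchy.

```python
import uuid


class ObjectStore:
    """Toy flat object store: no directories, just IDs and metadata."""

    def __init__(self):
        self.objects = {}  # object ID -> (data, metadata dict)

    def put(self, data, **metadata):
        object_id = uuid.uuid4().hex  # generated ID takes the place of a file name
        self.objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self.objects[object_id][0]

    def select(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self.objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
scan_id = store.put(b"mri-scan-1", modality="MRI", retention_years=7)
store.put(b"xray-scan-1", modality="X-ray", retention_years=7)
store.put(b"mri-scan-2", modality="MRI", retention_years=2)

# A policy engine could act on this selection, e.g. move it to an archive tier.
long_term_mri = store.select(modality="MRI", retention_years=7)
print(long_term_mri == [scan_id])  # True
```

Because selection works on metadata, a retention or tiering policy can be applied to millions of similar objects without anyone having to organize them into folders first.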