You may already be sold on the concept of scale-out NAS, but scale-out systems vary widely and you’ll have plenty of decisions to make before buying one.
Scale-out network-attached storage (NAS) has arrived. If you said it’s been around for a while, you’d also be right -- sort of. People have long built NAS around clustered file systems, but the result was far from being real NAS. NAS implies simplicity, and those home-cooked systems never quite fulfilled that requirement.
Isilon, now a part of EMC, is probably most responsible for making scale-out NAS a reality. Isilon came to market approximately a decade ago and struggled to educate us on the virtues of scale-out architectures. It was an uphill climb as NetApp and EMC, lacking such an architecture, played down the need for one. But Isilon prevailed.
NetApp recognized the potential of scale-out and bought Spinnaker Networks in 2003. It took a while to get everything integrated, but NetApp is now fully in the scale-out NAS game. Dell is, too, after acquiring Exanet, and offers PowerVault or EqualLogic storage behind Exanet software. Hewlett-Packard (HP) also went the acquisition route and picked up Ibrix, which it uses in front of LeftHand Networks or 3PAR storage. IBM used its own General Parallel File System (GPFS) as the basis of its Scale Out Network Attached Storage (SONAS). Hitachi recently bought BlueArc, which, for all practical purposes, also had a scale-out NAS offering. Smaller players, including DataDirect Networks (DDN) and Panasas, are also in the market. Scale Computing, focused on the small- and medium-sized business/small- and medium-sized enterprise (SMB/SME) market, has a scale-out product that uses IBM’s GPFS on top of its own scale-out block offering.
Every major storage player now has a scale-out NAS product and they’re enthusiastically behind the architecture. But not all scale-out NAS systems are the same. Here are some things to consider if you’re shopping for one.
Just as one vendor’s block array differs from another’s, scale-out NAS products vary from vendor to vendor. Differences include:
- Scalability, how capacity is added, scaling capacity vs. performance, new node assimilation and data redistribution
- Minimum configuration
- Number and types of nodes, amount of storage with each node
- Throughput-centric vs. IOPS-centric vs. balanced
- System manageability, ability to partition system
- Single file throughput, single file system throughput
- Performance impact of losing a node, number of nodes that can be lost without losing data
- How data is protected internally, rebuild times when the system is vulnerable and how systems are backed up
It would take too long (and too much space) to go into detail on each of these factors, but your next strategic purchase of NAS will likely be a scale-out system so you should be prepared. Each vendor will claim its systems are infinitely scalable, and they’ll all be wrong. Ignore the theoretical limits and just focus on what the system’s practical limits are.
One of the most important considerations is whether the applications you run on the scale-out system are throughput-centric or IOPS-centric. When Isilon first appeared, it was targeted squarely at the media and entertainment market, where applications store and access very large audio and video files. Those applications are throughput-centric, so IOPS aren’t that important. But if you’re dealing with a large number of small files and requests from thousands of users, it’s all about IOPS.
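The distinction comes down to simple arithmetic: throughput is roughly the request rate multiplied by the size of each request. Here is a minimal Python sketch of that relationship. The function name and both workload profiles are hypothetical figures chosen for illustration, not measurements from any vendor’s system.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Approximate throughput as request rate times request size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


# Hypothetical media workload: few requests, each moving a large chunk of a video file.
media = throughput_mb_per_s(iops=500, io_size_kb=1024)

# Hypothetical small-file workload: many requests, each moving very little data.
small_files = throughput_mb_per_s(iops=50_000, io_size_kb=4)

print(f"media:       {media:.0f} MB/s at 500 IOPS")
print(f"small files: {small_files:.1f} MB/s at 50,000 IOPS")
```

In this sketch the media workload moves more than twice as much data with one-hundredth of the requests, which is why a system tuned for one profile can disappoint on the other.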
All scale-out NAS systems have a global namespace, but there are differences under the covers. NetApp, for instance, aggregates smaller namespaces into a global one, whereas EMC Isilon creates a single namespace with its OneFS file system. This distinction may not be relevant for all IT buyers, but you should be aware of the difference.
Be particularly careful about configuration starting points. If an individual node is very powerful and you need three nodes minimum to start, then the starting price may be out of range. But if the node is too small for your applications, you may need too many to build a reasonable system when you consider power, heat, space and cost.
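The trade-off between large and small nodes can be roughed out before talking to a vendor. The Python sketch below is a back-of-the-envelope sizing aid, not a vendor tool. The node types, capacities, prices, power figures and three-node minimum are all hypothetical assumptions, and it sizes on capacity only.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    usable_tb: float  # usable capacity per node (assumed)
    price: float      # price per node, in any currency (assumed)
    watts: float      # power draw per node (assumed)
    min_nodes: int    # smallest cluster the vendor will sell (assumed)


def size_cluster(node: NodeType, required_tb: float) -> dict:
    """Return node count, total price and total power for a capacity target."""
    needed = math.ceil(required_tb / node.usable_tb)
    count = max(node.min_nodes, needed)  # never below the minimum configuration
    return {
        "node": node.name,
        "count": count,
        "price": count * node.price,
        "watts": count * node.watts,
    }


# Two hypothetical node types, each with a three-node minimum.
big = NodeType("big", usable_tb=100, price=80_000, watts=800, min_nodes=3)
small = NodeType("small", usable_tb=10, price=12_000, watts=300, min_nodes=3)

for required_tb in (50, 600):
    for node in (big, small):
        print(f"{required_tb} TB ->", size_cluster(node, required_tb))
```

With these made-up numbers, the 50 TB case forces three large nodes where five small ones would cost far less, while the 600 TB case needs 60 small nodes that cost more and draw several times the power of six large ones. Real sizing would also have to account for performance, data protection overhead and rack space.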
It’s crucial that the system you choose be able to handle the number of files you’ll want to store over time. If the number is in the billions (as with a Web 2.0 application or an archival system), you need to be very particular about the system you buy. Very few systems can deal with such high numbers today. That’s why so many public clouds are built on object-based designs rather than clustered file systems.
Finally, the consensus is that scale-out is the way to go, so you might as well accept that premise. But you’ll still have to figure out which system is right for your environment. Hopefully, these tips will put you on track to ask all the right questions.
BIO: Arun Taneja is founder and president of Taneja Group, an analyst and consulting group focused on storage and storage-centric server technologies.