With file storage sprouting up like weeds, data storage shops are grappling with managing multiple disparate NAS systems. But you can fight NAS sprawl with a number of technologies.
At the end of 2008, Framingham, Mass.-based research firm IDC reported that for the first time ever, more data was stored on network-attached storage (NAS) systems or filers than on storage-area network (SAN) storage. In addition, IDC's more recent forecasts predict an acceleration of this trend. Not only is the number of files growing, but so is their size.
All of this translates into more installed NAS systems. Adding more NAS systems is an understandable reaction to file growth as network-attached storage systems are typically self-contained and preconfigured for rapid installation, and are easy to implement, operate, manage and use. But most traditional NAS systems are also silos, so they contribute to NAS sprawl. The consequences of NAS sprawl can be summed up by the often-repeated adage, "I loved my first NAS filer, I really liked my second, but by my tenth I was pulling my hair out."
Five ways NAS sprawl causes problems
NAS sprawl generally creates five major IT challenges (these are the biggies; there are others as well). All of them are complicated by the limited number of tasks a data storage administrator can complete in a given time, and they're all pretty difficult.
1. System management. Even though NAS management is far simpler than SAN storage management, it still requires some care, feeding and time.
2. Managing client and application access to data. Each NAS system must be mounted on every server and workstation that requires access. Mounts disrupt applications, so they require scheduled downtime for the server applications. With more NAS systems you have more mounts, and that adds up to more scheduled downtime.
3. File location. Policies for file placement must be set based on performance, accessibility, age, access frequency, storage cost, availability, data protection and so forth. Policy setting is the easy part, but actually moving the files to the appropriate NAS system is a time-consuming manual data migration process. And it's an ongoing one. When the migration is done, the originating application must be re-pointed at the correct NAS system; this isn't such a big deal with a couple of NAS systems, but it's compounded as NAS systems are added.
4. NAS load balancing. Load balancing is required to get better utilization or to meet applications' performance requirements. Because load balancing is also a manual process to set up and manage, it becomes a major time sink even if you have identically configured NAS boxes.
5. Protecting, replicating and/or backing up files. Different NAS systems have different methods for snapshots, continuous data protection (CDP), mirroring and replication. Some are well integrated with common backup software and with platforms such as Windows Volume Shadow Copy Service (VSS), VMware or Citrix Systems Inc.'s XenServer, but others aren't. So there are more tasks requiring more time, training and experience. Even identical NAS systems still require separate touch points for each data protection setup, operation and management task.
These challenges get more difficult, take more time and make it more likely that errors will occur as NAS sprawl grows.
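To put rough numbers on the mount problem, here's a minimal back-of-the-envelope sketch. It assumes the worst case described above, where every client statically mounts every NAS system; the function names and the figure of 200 clients are made up for illustration.

```python
def static_mounts(clients: int, filers: int) -> int:
    """Mounts to maintain when every client mounts every NAS system."""
    return clients * filers


def added_mounts(clients: int, new_filers: int = 1) -> int:
    """Extra mounts (and mount operations) created by adding NAS systems."""
    return clients * new_filers


if __name__ == "__main__":
    # Hypothetical shop with 200 servers and workstations.
    for filers in (1, 2, 10):
        print(f"{filers:>2} filers -> {static_mounts(200, filers)} mounts")
```

In this simplified model, each additional filer means another round of mounts on all 200 clients, which is why the pain grows with every box.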
Technologies that can help with NAS sprawl
The industry took note and recognized this sobering situation. The result is the current availability of four technologies designed to solve some or all of these challenges, albeit in completely different ways. They include: operating system built-ins such as Microsoft's Distributed File System (DFS) for CIFS as well as Linux/Unix automounters for NFS; file virtualization systems; clustered NAS systems; and private cloud and grid storage. A brief analysis of each of these technologies illustrates what they do and don't do to meet the aforementioned challenges.
1. Operating system built-ins
Microsoft Distributed File System (DFS) is part of Microsoft's Windows Server 2003 and 2008 operating systems; DFS was developed for the small- and medium-sized business (SMB) Windows-only (CIFS) market. DFS Namespaces enables multiple file servers' shared folders to be grouped into one or more logical namespaces. Users see the namespace as a single shared folder and are automatically connected to available shared folders in the same Active Directory Domain Services site. This sidesteps the need for LAN or WAN routing. DFS Replication can automatically synchronize folders between local file servers or remote NAS systems on a wide-area network.
Pros
- Easy integration with Windows environments
- Familiar to Windows administrators
- No additional licensing costs
- Low upfront total costs
- Solves user access, mount/unmount, load balancing, data migration as well as some data protection challenges of multiple CIFS NAS systems
Cons
- Requires a relatively high level of Windows expertise
- Loose file synchronization among different servers, especially when geographically dispersed; a user at a remote location may access a file before it's updated
- Only works with CIFS (not NFS)
- Limited scalability; not architected to scale to large numbers of file servers
- Doesn't provide file-level granularity
- Doesn't work with non-Windows-based NAS systems
- Poor storage utilization because of large numbers of duplicate files
- Can require additional hardware infrastructure to meet performance requirements
- Doesn't solve issue of managing multiple NAS or filer systems; doesn't address data migration and many of the data protection challenges
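The namespace idea behind DFS can be illustrated with a small sketch: one logical path maps to several shared-folder targets, and the client is steered to an available target in its own site when there is one. This is a simplified model for illustration only, not Microsoft's DFS API; the server names, site names and the `resolve` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderTarget:
    unc_path: str        # physical share behind the logical folder
    site: str            # site the file server lives in
    online: bool = True  # whether the target is currently reachable


# Hypothetical namespace: one logical folder, two physical targets.
NAMESPACE = {
    r"\\corp\public\projects": [
        FolderTarget(r"\\nas-bos-01\projects", site="boston"),
        FolderTarget(r"\\nas-lon-01\projects", site="london"),
    ],
}


def resolve(namespace_path: str, client_site: str) -> str:
    """Return the share a client should use for a logical namespace path."""
    targets = [t for t in NAMESPACE[namespace_path] if t.online]
    if not targets:
        raise LookupError(f"no available target for {namespace_path}")
    # Prefer a target in the client's own site; otherwise take any live one.
    same_site = [t for t in targets if t.site == client_site]
    return (same_site or targets)[0].unc_path


if __name__ == "__main__":
    print(resolve(r"\\corp\public\projects", "london"))
```

Users only ever see the logical path; which physical share answers is decided at lookup time.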
Linux/Unix automounters are intended for NFS users. Automounters mount and unmount directories from other systems on the network as they're needed. They get their mounting instructions from centralized maps, which can be flat files, NIS maps or sections of an LDAP directory. Automounters are far easier to use than managing multiple static NFS mounts. Automounter advantages are readily apparent when there's a service failure. If a remote file server becomes unavailable, an automounter will simply time out and unmount the directory without alarming users. With static NFS, mounts will hang until the file server is back up and running again.
Pros
- Easy integration with Linux and Unix environments
- No additional licensing costs
- Low upfront total costs
- Eliminates server hangs when the mounted service fails
- Works in conjunction with most NFS NAS systems
- Solves many of the user and mount issues with multiple NFS NAS systems
Cons
- Requires significant Linux or Unix expertise
- Not easy to set up
- Only works with NFS
- Lacks built-in replication capabilities
- Doesn't work with Windows (CIFS)-based NAS systems
- No file-level granularity
- Doesn't solve many of the management problems with multiple NAS systems
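The automounter behavior described above (look up a centralized map, mount on first access, unmount after an idle period) can be modeled in a few lines. This is a toy simulation rather than real autofs code: the `AutoMounter` class, the map contents and the timeout value are all assumptions, and no actual mounts are performed.

```python
class AutoMounter:
    """Toy model of on-demand mounting driven by a centralized map."""

    def __init__(self, mount_map, idle_timeout=300.0):
        self.mount_map = mount_map        # key -> "server:/export"
        self.idle_timeout = idle_timeout  # seconds of inactivity allowed
        self.mounted = {}                 # key -> time of last access

    def access(self, key, now):
        """Mount on first use (or refresh the timer) and return the source."""
        if key not in self.mount_map:
            raise KeyError(f"no map entry for {key!r}")
        self.mounted[key] = now
        return self.mount_map[key]

    def expire_idle(self, now):
        """Unmount anything idle for at least idle_timeout seconds."""
        idle = [k for k, last in self.mounted.items()
                if now - last >= self.idle_timeout]
        for key in idle:
            del self.mounted[key]
        return idle


if __name__ == "__main__":
    am = AutoMounter({"home": "filer1:/export/home"}, idle_timeout=60)
    print(am.access("home", now=0))
    print(am.expire_idle(now=61))
```

Because the map is centralized, pointing "home" at a different filer means changing one map entry rather than touching every client.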
2. File virtualization systems
File virtualization systems separate the physical location of a file from the representation of that file. File virtualization systems essentially eliminate the requirement for a user or application to know exactly where their files are stored, as they see only a single global namespace (GNS). Depending on how it's implemented, file virtualization allows transparent file access, load balancing, data storage tiering, file migration, and even snapshots and replication for multiple homogeneous or heterogeneous NAS systems.
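A global namespace boils down to a lookup table from the path users see to the place the file actually lives. The sketch below is a minimal model under that assumption; `GlobalNamespace`, the filer names and the paths are hypothetical, and a real product would also copy the data and handle open files before switching the pointer.

```python
class GlobalNamespace:
    """Maps logical paths to (filer, physical path) locations."""

    def __init__(self):
        self._locations = {}

    def publish(self, logical_path, filer, physical_path):
        """Expose a file under a logical path."""
        self._locations[logical_path] = (filer, physical_path)

    def resolve(self, logical_path):
        """Return where the file currently lives."""
        return self._locations[logical_path]

    def migrate(self, logical_path, new_filer, new_physical_path):
        """Repoint a logical path after the data has been moved."""
        previous = self._locations[logical_path]  # KeyError if unknown
        self._locations[logical_path] = (new_filer, new_physical_path)
        return previous


gns = GlobalNamespace()
gns.publish("/corp/eng/design.dwg", "filer-old", "/vol3/design.dwg")
gns.migrate("/corp/eng/design.dwg", "filer-new", "/vol0/eng/design.dwg")
print(gns.resolve("/corp/eng/design.dwg"))
```

The logical path never changes, so applications don't need to be re-pointed after a migration.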
File virtualization implementations can usually leverage Microsoft's DFS and/or Linux/Unix automounters by acting as a management layer. This allows them to automatically update the DFS Namespace to include NAS filers and file servers, while also providing common management for multiple dissimilar NAS systems. F5 Networks Inc.'s ARX file virtualization appliance also provides available disk space monitoring, while others (Avere Systems Inc.'s FXT Series and EMC Corp.'s Celerra NS with FAST) provide storage tiering.
No additional software is required to leverage DFS Namespace and Linux/Unix automounters. If the file virtualization technology fails, the file maps for Windows and mounts for Linux/Unix remain intact, allowing users and applications access to their files. Not all the file virtualization systems work with DFS or automounters, and some that do don't necessarily require them.
There are two types of file virtualization products: shared path and split path.
Shared-path file virtualization systems share the control and data path, which means that all connections to the NAS and all data to/from the NAS flow through the virtualization system. Shared-path file virtualization systems are full proxies that touch every file and every packet in the path before it's written or read.
Pros
- Allows files to be migrated in real-time even when in use; the file virtualization system updates the global namespace with the new physical location of the file
- Easy to operate
- Protects current investment
- Transparent retirement of older NAS or file systems
- Individual file-level granularity
- Heterogeneous NAS and/or file server support; eliminates NAS system lock-in
- Definable policies using file metadata such as file type, creation date or when last accessed
Cons
- Added latency to pass through file virtualization system can be a bottleneck affecting response times and IOPS
- Single point of failure; a dead-box failure cuts off all access to the NAS and/or file systems
- Scalability is limited by the throughput of the shared-path file virtualization system
Split-path file virtualization systems separate the control and data paths, so the NAS connections and all data to/from the NAS don't pass through the file virtualization system. Split-path file virtualization is typically deployed as an x86 appliance connected to the LAN switch. They manage the namespace to direct files to the appropriate NAS or file system without intercepting any packets.
Pros
- Nondisruptive implementation for applications/users
- Highly scalable
- File virtualization system failure won't cut off access to data
- Protects current investment in NAS and file systems
- Relatively easy file migration
- If it uses Microsoft DFS for the namespace, DFS will always have the most recent namespace configuration allowing users and applications to access their files
- Heterogeneous NAS support
- Easy to operate
Cons
- Usually requires agents on application servers and workstations for transparent file migration; agents must be managed and maintained
- Tends to be Windows (CIFS) focused with limited NFS support
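The difference between the two designs comes down to whether file data flows through the virtualization layer. The following sketch contrasts them under simplifying assumptions; the class names, the `bytes_relayed` counter and the sample file are invented for illustration and don't correspond to any vendor's product.

```python
class Filer:
    """Stand-in for one NAS system, holding file contents by physical path."""

    def __init__(self, files):
        self.files = files

    def read(self, physical_path):
        return self.files[physical_path]


FILERS = {"filer-a": Filer({"/vol1/report.doc": b"quarterly numbers"})}
NAMESPACE = {"/corp/finance/report.doc": ("filer-a", "/vol1/report.doc")}


class SharedPathProxy:
    """Shared path: lookup and data both pass through the proxy."""

    def __init__(self, namespace, filers):
        self.namespace = namespace
        self.filers = filers
        self.bytes_relayed = 0

    def read(self, logical_path):
        filer_name, physical_path = self.namespace[logical_path]
        data = self.filers[filer_name].read(physical_path)
        self.bytes_relayed += len(data)  # every byte crosses the proxy
        return data


class SplitPathDirector:
    """Split path: answers lookups only and never touches file data."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.bytes_relayed = 0  # stays at zero by design

    def locate(self, logical_path):
        return self.namespace[logical_path]


def client_read_split(director, filers, logical_path):
    """Client asks the director where the file is, then reads it directly."""
    filer_name, physical_path = director.locate(logical_path)
    return filers[filer_name].read(physical_path)
```

In this model the proxy's throughput caps the whole system, while the director only handles lookups; that mirrors the scalability and single-point-of-failure trade-offs listed above.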
Shared-path and split-path systems are typically mutually exclusive. EMC's Rainfinity is an exception: it's primarily a split-path system, but it's configured as shared path when moving files. That eliminates the need for split-path agents for file migrations while avoiding the shared-path scalability, performance and single-point-of-failure issues.
Shared-path systems include Avere Systems' FXT Series and F5 Networks' ARX series, as well as EMC's Rainfinity when performing data migration. Split-path options include AutoVirt Inc.'s AutoVirt 3.0 and EMC's Rainfinity.
File virtualization systems have continued to evolve, solving more network-attached storage sprawl issues. Avere Systems' FXT automates NAS storage tiering by hosting the most active files requiring the highest performance on its system of solid-state disk and 15K rpm SAS drives. Using policies, it automatically moves files to heterogeneous back-end NAS systems based on access frequency, performance, age, etc. EMC's Rainfinity provides similar functionality within its Celerra NS NAS systems. FAST (fully automated storage tiering) on Celerra NS uses the Rainfinity engine for transparent file movement (it currently doesn't support heterogeneous systems). F5 Networks' ARX uniquely solves NAS sprawl data protection by managing snapshots and replication for distributed heterogeneous NAS systems.
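Policy-driven tiering of the kind described here can be pictured as a simple rule applied to file metadata. The sketch below is a generic illustration, not how any of the named products work internally; the tier names, thresholds and `choose_tier` function are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    days_since_access: int
    reads_per_day: float


def choose_tier(f: FileStats,
                hot_reads_per_day: float = 50.0,
                cold_after_days: int = 90) -> str:
    """Pick a storage tier from access frequency and age (assumed policy)."""
    if f.reads_per_day >= hot_reads_per_day:
        return "ssd"            # most active files get the fastest media
    if f.days_since_access >= cold_after_days:
        return "backend-nas"    # stale files move to cheaper back-end NAS
    return "sas"                # everything else stays on the middle tier


if __name__ == "__main__":
    sample = [
        FileStats("/db/index.dat", days_since_access=0, reads_per_day=400),
        FileStats("/home/al/notes.txt", days_since_access=12, reads_per_day=0.2),
        FileStats("/archive/2006.tar", days_since_access=700, reads_per_day=0),
    ]
    for f in sample:
        print(f.path, "->", choose_tier(f))
```

The point of automation is that a rule like this runs continuously, rather than an administrator migrating files by hand.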
3. Clustered NAS systems
Clustered NAS systems use a distributed file system running concurrently on multiple NAS nodes. Data and metadata can be striped across both the cluster and underpinning block (direct-attached storage [DAS] or SAN) storage subsystems. Clustering also provides access to all files from any of the clustered nodes regardless of the physical location of the file. The number and location of the nodes are transparent to the users and applications accessing them.
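Striping across a cluster can be sketched as dealing fixed-size chunks of a file out to the nodes in turn. The round-robin layout below is one simple possibility chosen for illustration; actual clustered file systems use their own placement schemes, and the function name and parameters are hypothetical.

```python
def stripe_layout(file_size: int, stripe_size: int, nodes: list) -> list:
    """Return (stripe index, node) pairs for a round-robin striped file."""
    if stripe_size <= 0 or not nodes:
        raise ValueError("need a positive stripe size and at least one node")
    stripe_count = -(-file_size // stripe_size)  # ceiling division
    return [(i, nodes[i % len(nodes)]) for i in range(stripe_count)]


if __name__ == "__main__":
    # A 10 MB file in 4 MB stripes over a hypothetical three-node cluster.
    for index, node in stripe_layout(10, 4, ["node1", "node2", "node3"]):
        print(f"stripe {index} -> {node}")
```

Because every node can work out (or look up) where each stripe lives, a client can reach the whole file through any node.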
Although clustering appears similar to file virtualization, the key difference is that all system nodes must be from the same vendor and often configured similarly. Some exceptions to this include BlueArc Corp.'s Titan and Mercury series, and NetApp's Ontap GX.
Clustered NAS systems typically provide transparent replication and fault tolerance, so that if one or more nodes fail, the system continues functioning without any data loss. Clustered NAS systems are distinguished by their large file systems that can scale to hundreds of terabytes (or more) of addressable capacity.
Clustered NAS systems include BlueArc's Titan and Mercury series, EMC's Celerra NS-960 with Multi-Path File System (MPFS), Exanet Inc.'s ExaStore, Hewlett-Packard (HP) Co.'s Ibrix Fusion and StorageWorks Scalable NAS (previously known as PolyServe), Hitachi Data Systems' HNAS and 3200 series, IBM Corp.'s Scale-out File Services (SoFS), Isilon Systems Inc.'s IQ, NetApp's Ontap GX, Panasas Inc.'s ActiveStor and Scale Computing's SN Series.
Pros
- Linearly scale to many nodes and high capacities, with millions to billions of managed file objects; aggregate throughput and IOPS scale independently of one another
- Easy to grow
- Pay-as-you-go architecture
- Built-in fault tolerance
- Centralized management
- Easy data protection
- Simple file access
Cons
- Rip-and-replace solution; can't reuse current NAS systems
- No support for heterogeneous NAS systems
- No ability to migrate files from current NAS systems to the clustered NAS system
- Higher hardware and license costs, but may be offset by significantly lower management costs
Clustered NAS does a very good job of resolving most network-attached storage sprawl challenges. It eliminates or at least mitigates the multisystem management issue, depending on the scale of the environment. User and application access is simplified with load balancing built in, and data protection and replication are also part of the architecture. Clustered NAS does fall a little short on storage tiering; it makes tiering easier, but doesn't automate the process (with the exception of EMC's Celerra NS-960 with FAST using Rainfinity).
4. Private cloud or grid storage systems
Private cloud or grid storage systems are somewhat similar to clustered NAS systems, but grid storage provides peer-to-peer clustering that enables it to provide single-image files over geographically dispersed, long-distance and cross-domain operations.
Geographic location "awareness" adds another dimension to NAS sprawl management by centralizing control, management and access for distributed environments. Based on access performance and/or data protection policies, files are replicated and moved to the geographic location that best meets the policy. Whether you have only a few remote or branch offices or hundreds, grid or private cloud storage can make a lot of sense.
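Geographic awareness can be reduced to two questions: which existing copy is closest to the user, and does policy call for a copy nearer still? The sketch below models both with a made-up latency table; the site names, latency figures, threshold and function names are assumptions for illustration and don't reflect any product's policy engine.

```python
# Hypothetical round-trip latencies in milliseconds: (client site, replica site).
LATENCY_MS = {
    ("paris", "paris"): 1,
    ("paris", "boston"): 90,
    ("paris", "tokyo"): 250,
}


def best_replica(client_site, replica_sites, latency_ms):
    """Choose the replica site with the lowest latency to the client."""
    return min(replica_sites, key=lambda site: latency_ms[(client_site, site)])


def needs_local_copy(client_site, replica_sites, latency_ms, max_latency_ms):
    """True if even the best replica breaks the access-latency policy."""
    best = best_replica(client_site, replica_sites, latency_ms)
    return latency_ms[(client_site, best)] > max_latency_ms


if __name__ == "__main__":
    sites = ["boston", "tokyo"]
    print(best_replica("paris", sites, LATENCY_MS))
    print(needs_local_copy("paris", sites, LATENCY_MS, max_latency_ms=50))
```

Under a 50 ms policy, the Paris office in this example would be a candidate for its own replica.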
There are currently two commercially available private cloud storage systems: Bycast Inc.'s StorageGRID and EMC Atmos. The Bycast StorageGRID runs on x86 nodes that sit in front of standard DAS or SAN storage, so it can use already installed block storage. EMC Atmos also runs on x86 nodes but can only use its own JBOD storage. Bycast's product is a bit more mature with hundreds of installations and OEM deals with HP and IBM.
Pros
- Same pros as clustered NAS
- Same or lower cost than clustered NAS
- Management of geographically dispersed locations
- Distributed geographically aware access with centralized management, protection and replication of all files
- Geographically aware, policy-based file replication and movement
- DAS and SAN investment protection or use of very low-cost storage
Cons
- Limited number of vendors with mature technology
- No automated storage tiering at this time
- Startup costs can be more than other technologies (but long-term costs will likely be less)
Summarizing NAS sprawl solutions
File storage growth is bordering on out of control, with many companies struggling to get a handle on their network-attached storage systems. This NAS sprawl creates serious management problems that can tax overworked IT staffs and jeopardize users' access to corporate data. But the four different technologies described above are available today and can resolve many of the issues and challenges created by NAS sprawl.
Take a pragmatic approach, and implement the least amount of new technology that best meets current and forecasted requirements. That will help minimize risk, lessen the strain on CapEx and OpEx budgets, and can make a world of difference with NAS management.
BIO: Marc Staimer is president of Dragon Slayer Consulting.