According to IDC, new file storage is expected to grow by about 160.35 exabytes between 2009 and 2014. That’s approximately 300% more than the growth of every other data type combined, including database and email, over the same period. That kind of file growth has a number of negative ramifications, not the least of which is the difficulty of protecting it all. Traditional data backup approaches are no longer practical because of the sheer mass of file storage.
In many cases, IT professionals don’t create file systems larger than 2 TB because they don’t want backup data sets to be too big. This means that if you have one petabyte of NAS storage, you’ll have at least 500 file systems you have to back up. There are companies with thousands of file systems out there; over time, that kind of situation will become more and more commonplace.
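The arithmetic behind that figure can be sketched in a few lines. This is only an illustration: the function name `file_systems_needed` is made up for the example, and it assumes decimal units (1 PB = 1,000 TB) and that every file system is filled to the 2 TB cap.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units; use 1024 for binary units


def file_systems_needed(total_tb: float, max_fs_tb: float) -> int:
    """Minimum number of file systems needed to hold total_tb
    when no single file system may exceed max_fs_tb."""
    return math.ceil(total_tb / max_fs_tb)


# One petabyte of NAS storage carved into 2 TB file systems
print(file_systems_needed(1 * TB_PER_PB, 2))  # 500
```

In practice the count is higher, since file systems are rarely filled to the cap, which is why the figure is "at least" 500.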
Although the market tends to hype and value large file systems, they’re difficult to protect. If you have a file system that’s 100 TB, then backing up the entire file system becomes extremely impractical. This is also true of object-based storage systems that have a flat name space. Vendors that provide these products often recommend you replicate to disk rather than back up. However, replication doesn’t provide an easy way to recover data. The challenge is that most storage-based replication solutions are block based, so you really don’t have any efficient method to recover data at the file level. And even if some of those systems provided file-level replication, they would still have no recovery application for users to find and retrieve the files they’re looking for. As file storage grows, the needles in our ever-growing haystack become harder and harder to find.
Block-based replication has never been, and never will be, an adequate replacement for backup for a number of reasons. Storage-based solutions are vendor specific and therefore don’t provide a universal method for data protection. Additionally, these solutions are typically confined to single storage systems; they’re stovepiped. If you have 100 NAS systems, it will be a nightmare to manage remote mirroring for all of them. This approach is also costly: it’s usually a paid-for option, it increases maintenance charges, and it replicates data onto the same vendor’s storage, which isn’t necessarily a low-cost solution. Perhaps most importantly, recovering specific files is difficult, if not impossible. Remote mirroring is not well suited for granular recoveries; it’s better suited for recovering entire systems.
A better and smarter approach is an intelligent file-level replication solution with the following capabilities:
- The ability to replicate data to and from any file system
- The ability to replicate at any granularity: entire systems, individual file systems, directories and sub-directories, or individual files
- The ability to search and recover data so users can efficiently find what they’re looking for
- The ability to scan the file systems for any changed or new files, and to replicate only those to the target system
- The ability to scale to petabyte environments, with high-performance discovery, replication and search
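To make the "scan for changed or new files and replicate only those" capability concrete, here is a minimal sketch. It is not how any particular product works; the function names `find_changed_files` and `replicate_changes` are hypothetical, and it assumes that a file's modification time is a good enough signal of change. A real solution would also have to handle deletions, renames, permissions and petabyte-scale scanning.

```python
import os
import shutil
from pathlib import Path


def find_changed_files(source_root, last_scan_time):
    """Yield files under source_root modified after last_scan_time
    (seconds since the epoch)."""
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime > last_scan_time:
                    yield path
            except OSError:
                # File vanished or is unreadable; skip it in this sketch
                continue


def replicate_changes(source_root, target_root, last_scan_time):
    """Copy only new or changed files to target_root, preserving the
    directory layout. Returns the relative paths that were copied."""
    source_root = Path(source_root)
    target_root = Path(target_root)
    copied = []
    for src in find_changed_files(source_root, last_scan_time):
        rel = src.relative_to(source_root)
        dst = target_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

Because the target is an ordinary file system with the same directory layout, recovering a single file is just a copy in the other direction, which is the granularity block-based mirroring lacks.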
A software company called Digital Reef Inc. is doing all of the above. However, it’s also important to find a lower-cost, easy-to-manage storage tier to replicate this data to. There are a number of scale-out file storage systems that fit this requirement, including HP Ibrix and IBM SONAS. EMC Isilon isn’t really a lower-cost solution, but there are configurations where it would certainly be more attractive price-wise than tier 1 NAS. Dell Exanet should be available as an option for this tier as well. There also seems to be an uptick in interest in the Symantec file system, which sounds good on paper. There are a number of open-source file systems, including Gluster and Hadoop, and we can’t forget ZFS; it’s not scale-out, but you can put Gluster in front of it to provide that capability. However, whenever you’re using an open-source file system, the user should typically expect to do some handholding.
The return on investment would be significant. In some cases, you could even stop backing up your file systems altogether. Consider the impact that eliminating file backups would have on your infrastructure and resources. You can also reduce your reliance on storage-based mirroring and minimize the cost and management of these solutions. Reserve remote mirroring technology for mission-critical files and leverage file replication to a lower-cost, extensible storage tier for everything else.
The world has changed and yet we’re still using the same tools to manage our file data. That’s neither practical nor sustainable. Not unless you have an unlimited budget, endless floor space and a deep pool of skilled people who don’t mind doing mundane work while putting out fires all the time.
BIO: Tony Asaro is senior analyst and founder of Voices of IT.