Is it time to rethink the traditional file system?

Large amounts of unstructured data and technologies like cloud have some asking if the traditional file system is up to the task.

Let's start at the beginning: there are many different types of file systems. Some are specialized for particular storage media, like the ISO 9660 file system for optical discs. Others provide access to storage devices connected over a network, such as NFS and SMB (Server Message Block), of which CIFS is an older dialect. Still others translate or map one file data and metadata schema to an entirely different one, or take part in laying out a file across physical media.

Bottom line: there is more to the traditional file system than one might assume, especially given the somewhat disparaging references to "unstructured data" often applied to user files. Truth be told, there is a lot of structure in the traditional file system, and a lot of complexity.

Recent developments in IT have some storage managers and data center decision makers questioning whether the file system in its current form is still the best mechanism for storing data. Indeed, the advent of cloud computing, virtualization and "long block" files -- such as video, medical imaging and human genome data -- has started to raise questions about how well file systems will hold up in the not-too-distant future.

Why? For starters, cloud computing tends to flatten the directory structure, driving a lot of very small files into a single namespace so that graphical interfaces and similar content load faster. In some cases, a single page may have millions of file components, driving the need for faster file system operations to feed data to CPUs and GPUs.

And, of course, when millions or billions of files are deployed to storage -- now typically measured in terabytes or petabytes -- searching for a single file through nested directories and subdirectories, and across scattered locations on media tracked by inodes and extents, can be a much slower task.
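To make the lookup cost concrete, here is a minimal sketch that models a directory tree as nested dictionaries and a flat namespace as a single dictionary. The function names and sample paths are hypothetical, and the model ignores caching, on-disk layout and everything else a real file system does; it only counts how many lookups each scheme needs to reach one file.

```python
def resolve_nested(root, path):
    """Walk a nested-dict 'directory tree' one path component at a time."""
    node = root
    lookups = 0
    for part in path.strip("/").split("/"):
        node = node[part]  # one directory lookup per path component
        lookups += 1
    return node, lookups


def resolve_flat(namespace, key):
    """Fetch from a flat namespace: the whole name is a single key."""
    return namespace[key], 1


# Hypothetical sample data: the same file stored under both schemes.
tree = {"projects": {"2015": {"q3": {"report.doc": b"quarterly numbers"}}}}
flat = {"projects/2015/q3/report.doc": b"quarterly numbers"}

nested_data, nested_lookups = resolve_nested(tree, "/projects/2015/q3/report.doc")
flat_data, flat_lookups = resolve_flat(flat, "projects/2015/q3/report.doc")
print(nested_lookups, flat_lookups)
```

In this toy model the nested lookup touches one directory per path component, while the flat lookup is a single keyed fetch. Real systems narrow that gap with directory caches, so treat the counts as an illustration of the idea, not a benchmark.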

Another big issue is the self-destructive nature of the traditional file system. Most file systems still overwrite the last valid copy of a file every time you save a new version, rather than keeping a journal of revisions. This was a deliberate design choice that reflected an era, nearly 30 years ago, when the price per GB of disk space was prohibitive. While some file systems offer more options for saving files, including revision journaling (versioning), they are not in mainstream use.
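The difference between overwriting and keeping revisions can be sketched in a few lines. The two in-memory classes below are hypothetical illustrations, not the behavior of any particular file system: one replaces the previous copy on every save, the other appends each save to a per-file history.

```python
class OverwritingStore:
    """Each save replaces the previous copy; older content is lost."""

    def __init__(self):
        self._files = {}

    def save(self, name, data):
        self._files[name] = data

    def read(self, name):
        return self._files[name]


class VersionedStore:
    """Each save is appended to a per-file history of revisions."""

    def __init__(self):
        self._versions = {}

    def save(self, name, data):
        history = self._versions.setdefault(name, [])
        history.append(data)
        return len(history)  # 1-based version number of this save

    def read(self, name, version=None):
        history = self._versions[name]
        # Default to the latest revision; otherwise return the one requested.
        return history[-1] if version is None else history[version - 1]


plain = OverwritingStore()
plain.save("budget.xls", b"draft")
plain.save("budget.xls", b"final")  # the draft is now unrecoverable

kept = VersionedStore()
kept.save("budget.xls", b"draft")
kept.save("budget.xls", b"final")   # the draft is still available as version 1
```

The trade-off is visible in the data structures: the versioned store never loses a valid copy, but its space consumption grows with every save.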

Overwriting in place, rather than journaling revisions, was supposed to provide a hedge against wasted storage space, but reality has proven quite different. Difficulties understanding the contents or business context of files created by end users have led to extreme waste of space on storage media. According to one recent study by the Data Management Institute, up to 70% of the capacity of every hard disk in use is occupied by files that are inert and never referenced, orphaned, duplicated or contraband.

Technologies such as deduplication have been shown to improve storage capacity efficiency, especially by reducing the waste created by unintentional file duplication and "white space." Many file systems are now being enhanced to include deduplication as part of the storage process at the file system level, but initial efforts have come at the cost of file system performance.
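As a rough illustration of how deduplication reclaims space, the sketch below stores each unique piece of content once, keyed by its SHA-256 digest, and lets file names point at digests. The `DedupStore` class is a hypothetical whole-file example; production systems usually deduplicate at the block or chunk level and handle details this sketch omits.

```python
import hashlib


class DedupStore:
    """Whole-file deduplication: identical content is stored only once."""

    def __init__(self):
        self.blocks = {}  # content digest -> bytes
        self.files = {}   # file name -> content digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()  # hashing costs CPU on every write
        self.blocks.setdefault(digest, data)       # store content only if it is new
        self.files[name] = digest

    def get(self, name):
        return self.blocks[self.files[name]]

    def logical_bytes(self):
        """Bytes the users believe they have stored."""
        return sum(len(self.blocks[d]) for d in self.files.values())

    def physical_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
store.put("a.doc", b"same bytes")
store.put("copy of a.doc", b"same bytes")  # duplicate content, new name
store.put("b.doc", b"different")
print(store.logical_bytes(), store.physical_bytes())
```

The hash computed on every write is one place the performance cost shows up: the space saving is paid for with extra CPU work and an extra index lookup in the write path.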

In short, there are many reasons to reconsider file systems in their current form. An alternative is to use a different frame of reference for storing data. One approach currently enjoying some traction was pioneered by companies such as Tintri Inc. Tintri substitutes a virtual machine for the traditional directory, subdirectory, volume or LUN framework of a file system, catering to the current popularity of hypervisor-based computing.

Another approach, championed by Caringo Inc. and a host of other object storage companies, is to substitute an object-oriented storage framework for a traditional file system. This approach is especially well suited to cloud computing, where it can help improve on the speeds, feeds and space utilization of flat directory systems with myriad file objects. It is also well suited to long block files, such as video and medical imaging, that do not change very often and may be enhanced with extra descriptive metadata to help searches for -- and within -- object contents.
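The object approach can be sketched as a flat collection of objects, each addressed by an ID and carrying its own descriptive metadata. The `ObjectStore` class and the metadata fields below are hypothetical and do not represent the API of Caringo or any other vendor; the sketch only shows how a search can run against metadata instead of a directory path.

```python
import uuid


class ObjectStore:
    """Flat store: objects are found by ID or by metadata, not by path."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata dict)

    def put(self, data, **metadata):
        object_id = uuid.uuid4().hex  # opaque identifier, no directory implied
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


objects = ObjectStore()
scan_id = objects.put(b"...image bytes...", modality="MRI", year=2015)
video_id = objects.put(b"...video bytes...", kind="video", year=2015)

mri_scans = objects.find(modality="MRI")  # search by description, not location
from_2015 = objects.find(year=2015)
```

Because the metadata travels with the object, the store can answer questions such as "all MRI scans" without anyone having decided in advance which folder those scans belong in.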
