Not long ago, storing files was easy. On Windows, you'd use NTFS. On Linux, Ext4 was the best approach for enterprise...
data. A lot has changed as different file systems have evolved. Multivolume file systems are common, and parallel and distributed file systems make it easy to extend local file storage to the cloud.
To make it easier to understand what's happening in the world of storage, let's examine three different file systems: multivolume file systems, distributed file systems and hybrid cloud storage. We'll also take a look at a new development: decentralized cloud object storage.
Next-generation local file systems
We've been discussing file system improvement in the Linux OS for about a decade. Linux has been searching for a file system offering features similar to those in the ZFS file system, which originated in Unix. Those features include copy-on-write, easy resizing and increased data integrity features.
The Btrfs file system, which Oracle designed for use in Linux, appeared to provide those features. Btrfs' development began in 2007, and it has been considered stable since 2014. But now that Red Hat, the most important corporate player in the Linux market, has decided not to continue supporting Btrfs, its future is less certain. However, for modern data center workloads tightly integrated with public or private clouds, next-generation local file systems aren't the most important development to consider.
Distributed file systems
The main drawback of a local file system is that it's local. Computing increasingly happens in a distributed environment. Distributed file systems enable data storage in a distributed way. NFS and other client-server-based distributed file systems have been around for decades, but nowadays, distributed file systems are also available in clouds.
The value of cloud-oriented distributed file systems is that they allow a file to be split into multiple chunks that can be replicated over multiple servers. That enables redundancy, while making it easier to provide access to the file at the location where it's needed. Also, distributing the chunks over multiple servers enables parallel access to different parts of the file, which avoids locking problems in a multiclient environment.
Multiple distributed file systems include object storage approaches, such as Ceph and Swift. There is also cloud-based storage, such as Google and Hadoop file systems. In distributed file systems, hundreds of storage nodes can be involved, which makes the file system less sensitive to outages caused by hardware failures.
Hybrid cloud storage
As more data centers become tightly integrated with cloud, it makes sense to integrate local storage with storage in either private or public clouds. Traditional storage vendors, such as Dell EMC, IBM and NetApp, as well as public cloud platforms, such as AWS and Azure, offer these sorts of hybrid approaches. Hybrid storage is a market still dominated by cloud and storage vendors. Because of the nature of hybrid storage, there are no important open source projects.
There are many reasons why companies start looking at hybrid storage when considering different file systems. One important motivator is that hybrid cloud storage offers scalability without requiring a large investment in an extended storage infrastructure. Among the different file systems, cloud storage stands out because you only pay for it when it's really needed. Once it's no longer needed, users can fall back to the local storage.
Connecting local storage to cloud storage requires a protocol to move data around. ISCSI, the S3 API and NAS protocols, such as NFS, are the common ones. Another common element is the gateway that makes the connection between the local data and the public cloud.
Decentralized cloud object storage
Many servers in cloud have too much storage capacity, and that's the foundation of decentralized cloud object storage. With this type of storage, client-side encryption is used to distribute data over multiple anonymous devices on the internet.
Various little chunks of the encrypted data are stored on many computers, but the owners of the computers can't do anything with the data. Storj is implementing this type of storage. Its network should be available in early 2019.