The Tahoe Least-Authority File System, or Tahoe-LAFS, is an open source cloud storage option designed to address common security and reliability concerns with storing data in public clouds. Just as a RAID array stripes data across multiple disks, Tahoe-LAFS stripes data across multiple cloud storage providers. Security is improved because data is encrypted before it leaves the client and each cloud storage provider stores only fragments of it. Tahoe-LAFS also enhances reliability because data is stored with sufficient redundancy to guard against the failure of one or more providers.
Data storage redundancy is achieved through a technique known as erasure coding. Erasure coding lets you specify how many drives (or, in this case, cloud providers) can fail without impacting the functionality of the file system.
Erasure coding uses the variables K and N. K refers to the number of providers required to be functional at any given time, while N is the total number of providers used. Hence, recovery goals can be expressed as K of N. Put into practice, each of your N cloud providers will store a volume of data that is equal to the total size of your data set divided by K.
To further illustrate this concept, let's examine the default Tahoe parameters in which K=3 and N=10. These values, which can be changed, specify that 10 different cloud service providers are being used, and that up to seven of them can fail at any given time. Conversely, three providers must remain online for the file system to remain functional.
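The K-of-N rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Tahoe-LAFS code; the function names max_failures and is_available are hypothetical and chosen for this example.

```python
def max_failures(k: int, n: int) -> int:
    """Return how many of the n providers can fail while data stays readable.

    k is the number of providers that must remain online; n is the total.
    (Hypothetical helper for illustration; not part of Tahoe-LAFS.)
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return n - k


def is_available(k: int, providers_online: int) -> bool:
    """Data is recoverable as long as at least k providers are online."""
    return providers_online >= k


# Default Tahoe-LAFS parameters from the text: K=3, N=10.
print(max_failures(3, 10))   # 7
print(is_available(3, 3))    # True: exactly K providers are still online
print(is_available(3, 2))    # False: fewer than K providers remain
```

With the defaults, losing an eighth provider is the point at which the data becomes unreadable.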
Now suppose you needed to store 1 TB (1,024 GB) of data in the cloud using the default Tahoe-LAFS parameters. Each of the 10 cloud providers will need to store enough data to insulate against the failure of any seven providers. The volume of data that must be stored with each provider is the total size of the data set (1,024 GB) divided by K (3). In this case, each of the 10 cloud providers would have to store approximately 341.3 GB of data.
It is important to consider what this level of reliability does to your storage costs. Cloud storage providers charge based on the volume of data being stored (some also charge for input/output). Using the example above, the redundancy requirements would more than triple the total volume of data being stored in the cloud: roughly 3,413 GB spread across 10 providers instead of 1,024 GB stored on a single provider, an expansion factor of N divided by K, or about 3.3.
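To see how different K and N choices affect the bill, the arithmetic in the example above can be wrapped in a small helper. This is a minimal sketch that assumes each provider stores exactly the data set size divided by K and ignores any metadata overhead; the function name storage_footprint is hypothetical.

```python
def storage_footprint(data_gb: float, k: int, n: int) -> tuple[float, float, float]:
    """Return (GB per provider, total GB stored, expansion factor) for K-of-N.

    Assumes each of the n providers holds data_gb / k, as described in the
    text, and ignores metadata overhead. (Hypothetical helper for illustration.)
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    per_provider = data_gb / k
    total = per_provider * n
    expansion = n / k
    return per_provider, total, expansion


# 1 TB (1,024 GB) with the default parameters K=3, N=10.
per_provider, total, expansion = storage_footprint(1024, k=3, n=10)
print(f"{per_provider:.1f} GB per provider")   # 341.3 GB per provider
print(f"{total:.1f} GB in total")              # 3413.3 GB in total
print(f"{expansion:.2f}x expansion")           # 3.33x expansion
```

Raising K toward N lowers the expansion factor, and therefore the cost, but also reduces the number of provider failures the file system can tolerate.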