How does Tahoe-LAFS store data in the cloud?

The open source Tahoe-LAFS file system stores fragments of data across multiple cloud storage providers to improve both the security and the reliability of cloud storage.

The Tahoe Least-Authority File System, or Tahoe-LAFS, is an open source cloud storage option designed to address common security and reliability concerns with storing data in public clouds. Just as a RAID array stripes data across multiple disks, Tahoe-LAFS stripes data across multiple cloud storage providers. Security is improved because individual cloud storage providers store only data fragments. Tahoe-LAFS also enhances reliability because data is stored with sufficient redundancy to guard against the failure of one or more providers.

Data storage redundancy is achieved through a technique known as erasure coding. Erasure coding encodes data into redundant pieces so that the original can be rebuilt from a subset of them. This makes it possible to specify how many drives (or, in this case, cloud providers) can fail without affecting the functionality of the file system.
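To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: two data shares plus one XOR parity share, so any two of the three shares are enough to rebuild the data. This is a toy chosen for readability, not the coding scheme Tahoe-LAFS itself uses, and the function names `encode_2_of_3` and `decode_2_of_3` are hypothetical.

```python
def encode_2_of_3(data: bytes) -> list[bytes]:
    """Split data into two halves plus one XOR parity share."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; the caller keeps the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode_2_of_3(shares: dict[int, bytes], length: int) -> bytes:
    """Rebuild the data from any two shares, keyed by share index 0, 1 or 2."""
    if len(shares) < 2:
        raise ValueError("need at least 2 of the 3 shares")
    if 0 in shares and 1 in shares:
        a, b = shares[0], shares[1]
    elif 0 in shares:
        # Share 1 is missing: second half = first half XOR parity
        a = shares[0]
        b = bytes(x ^ y for x, y in zip(a, shares[2]))
    else:
        # Share 0 is missing: first half = second half XOR parity
        b = shares[1]
        a = bytes(x ^ y for x, y in zip(b, shares[2]))
    return (a + b)[:length]


original = b"tahoe"
shares = encode_2_of_3(original)

# Simulate the loss of each provider in turn; the data always survives.
for lost in range(3):
    surviving = {i: s for i, s in enumerate(shares) if i != lost}
    assert decode_2_of_3(surviving, len(original)) == original
```

Note that each share is about half the size of the original data, which is the same "data set divided by the number of required shares" relationship described for larger configurations.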

Erasure coding uses the variables K and N. K is the minimum number of providers that must be available at any given time to recover the data, while N is the total number of providers used. Hence, recovery goals can be expressed as K of N. In practice, each of your N cloud providers will store a volume of data equal to the total size of your data set divided by K.

To further illustrate this concept, let's examine the default Tahoe parameters in which K=3 and N=10. These values, which can be changed, specify that 10 different cloud service providers are being used, and that up to seven of them can fail at any given time. Conversely, at least three providers must remain online for the file system to remain functional.
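The relationship between K, N and the number of tolerable failures is simple arithmetic, sketched below. The helper names `max_failures` and `is_recoverable` are hypothetical and are not part of Tahoe-LAFS.

```python
def max_failures(k: int, n: int) -> int:
    """Number of providers that can fail while the data stays recoverable."""
    if not 1 <= k <= n:
        raise ValueError("K must be between 1 and N")
    return n - k


def is_recoverable(available: int, k: int) -> bool:
    """True if enough providers are still online to rebuild the data."""
    return available >= k


# Default parameters described above: K=3, N=10
print(max_failures(3, 10))    # 7
print(is_recoverable(3, 3))   # True: exactly K providers remain
print(is_recoverable(2, 3))   # False: one provider too few
```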

Now suppose you need to store 1 TB (1,024 GB) of data in the cloud using the default Tahoe-LAFS parameters. Each of the 10 cloud providers must store enough data to insulate against the failure of any seven providers. The volume of data that must be stored with each provider is the total size of the data set (1,024 GB) divided by K (3). In this case, each of the 10 cloud providers would have to store approximately 341.3 GB of data.
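The same calculation can be expressed as a short sketch. The function name `per_provider_gb` is hypothetical, and the calculation ignores any per-share overhead a real system might add.

```python
def per_provider_gb(total_gb: float, k: int) -> float:
    """Data each provider must hold: the data set size divided by K."""
    if k < 1:
        raise ValueError("K must be at least 1")
    return total_gb / k


# 1 TB (1,024 GB) with the default K=3
share_gb = per_provider_gb(1024, 3)
print(round(share_gb, 1))  # 341.3
```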

It is important to consider what this level of reliability does to your storage costs. Cloud storage providers charge based on the volume of data being stored (some also charge for input/output). Using the example above, the redundancy requirements would more than triple the total volume of data being stored in the cloud: approximately 3,413 GB spread across 10 providers instead of 1,024 GB stored with a single provider, an expansion factor of N/K, or about 3.3.
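A rough way to estimate the cost impact is to compute the expansion factor N/K and apply it to the data set size. In the sketch below, the function names and the price of $0.02 per GB per month are purely illustrative assumptions, not any provider's actual rate.

```python
def expansion_factor(k: int, n: int) -> float:
    """How many times larger the stored volume is than the original data."""
    if not 1 <= k <= n:
        raise ValueError("K must be between 1 and N")
    return n / k


def total_stored_gb(total_gb: float, k: int, n: int) -> float:
    """Total volume held across all N providers."""
    return total_gb * expansion_factor(k, n)


PRICE_PER_GB_MONTH = 0.02  # hypothetical flat rate, for illustration only

stored = total_stored_gb(1024, 3, 10)
print(round(expansion_factor(3, 10), 2))       # 3.33
print(round(stored, 1))                        # 3413.3
print(round(stored * PRICE_PER_GB_MONTH, 2))   # monthly cost at the assumed rate
```

Raising K relative to N lowers the expansion factor, and therefore the cost, but it also reduces the number of provider failures the data can survive.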

