One of the most common decisions when storing cloud data is whether to use object storage or file storage. Comparisons tend to focus on technical differences rather than conceptual ones, yet the technical differences alone rarely explain why object storage has become the mass storage of choice for cloud service providers.
File storage is organized much like the way humans organize physical files in a filing cabinet. Files are systematically placed into folders and organized by naming conventions based on characteristics such as extensions, categories or applications. File systems are presented as a hierarchy of directories, subdirectories and files. The file system stores each file with a limited amount of metadata, such as file name, creation date, creator, file type, most recent change and last access. A familiar example is the C: drive in a desktop or laptop or the shared D: drive in a departmental file server. Locating a file is done either manually or programmatically by working through the hierarchy. This works exceptionally well when the number of files is relatively small or the location of the files is known. As the number of files grows into the billions, working through the hierarchy becomes far more cumbersome.
Object storage organizes unstructured data in a way that is not intuitive to humans. There is no hierarchy -- it's all about the individual objects. Files are stored as objects in different locations, and each object has a unique identifier and a considerable amount of metadata. The amount of metadata is variable -- and significantly greater than file metadata -- ranging in size from kilobytes to gigabytes. In addition to the metadata typically found in file systems, object metadata frequently includes a summary of the content in the file, key words, key points, comments, locations of associated objects, data protection policies, security, access, geographic locations and more. That enhanced metadata enables object storage to protect, manage, manipulate and retain objects at a much finer level of granularity. One example of this is the ability of object storage to use erasure codes.
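To make the idea of an object concrete, here is a minimal Python sketch of a single object record: a unique identifier, the data payload and a free-form metadata dictionary. The `StoredObject` class, its field names and the `retention_days` metadata key are assumptions made for illustration, not the schema of any particular object store.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical object record: payload + metadata + unique identifier."""
    data: bytes
    metadata: dict = field(default_factory=dict)
    # A random UUID stands in for the identifier a real store would assign.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def retention_days(obj: StoredObject, default: int = 30) -> int:
    """Read a per-object retention policy from metadata (assumed key name)."""
    return obj.metadata.get("retention_days", default)


report = StoredObject(
    data=b"Q3 revenue summary",
    metadata={
        "content_type": "text/plain",
        "keywords": ["finance", "q3"],
        "retention_days": 2555,      # policy travels with the object itself
        "region": "eu-west",
    },
)
scratch = StoredObject(data=b"temporary export")   # no policy metadata set

print(retention_days(report))    # 2555
print(retention_days(scratch))   # 30
```

Because the policy is carried in each object's own metadata, two objects in the same store can be handled differently without any folder-level setting.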
Object storage benefit: Better data protection
Object storage is well suited to the erasure coding method of data protection. Erasure codes break a file into multiple objects and spread them across different storage drives, nodes and even geographic locations. Tracking those pieces requires a lot of metadata. Erasure codes provide a higher level of data resilience than RAID, and at a much lower cost. However, erasure codes add latency that slows performance.
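As a simplified illustration of the principle, the Python sketch below splits a file into k data fragments plus one XOR parity fragment and rebuilds a single lost fragment from the survivors. This is single-parity coding, which tolerates only one loss; production object stores generally use stronger codes, such as Reed-Solomon, that survive several simultaneous losses. The function names and fragment layout are assumptions made for the example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(frag_len * k, b"\0")    # zero-pad the final fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity], len(data)


def decode(frags, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    frags = list(frags)
    if missing:
        # XOR of all surviving fragments equals the missing one.
        survivors = [f for f in frags if f is not None]
        frags[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(frags[:-1])[:original_len]   # drop parity and padding


original = b"object storage spreads fragments across nodes"
fragments, size = encode(original, k=4)
fragments[2] = None                  # simulate a failed drive or node
restored = decode(fragments, size)
assert restored == original
```

With k = 4, the five fragments occupy roughly 1.25 times the original size, whereas keeping a full second copy would take 2 times. The extra XOR work on every write and on every degraded read is one source of the added latency.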
Object storage vs. file storage access
Locating data represented by multiple objects is done via the object's unique identifier and its metadata. There is no hierarchy to scan or crawl. It is analogous to handing a valet a ticket for a car. There is nothing on the ticket as to where the car is located, and yet the car is retrieved. Finer granularity means many functions can be provided on a per-object basis. When data protection, replication, search, mining, moving and managing are more granular -- as they are in object stores -- they also become faster and more efficient. This is increasingly noticeable as the number of files grows into the billions or trillions.
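The valet-ticket analogy can be sketched in a few lines of Python. The `FlatObjectStore` class below is a hypothetical in-memory stand-in, not any product's API: `put` hands back an identifier (the ticket), and `get` retrieves the data with that identifier alone, with no directory path involved. The metadata keys are invented for the example.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store with a flat namespace (illustrative only)."""

    def __init__(self):
        self._objects = {}   # object_id -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        object_id = uuid.uuid4().hex       # the "valet ticket"
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        # One keyed lookup; nothing is scanned or crawled.
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
ticket = store.put(b"sensor readings", source="plant-7", kind="telemetry")
store.put(b"holiday photo", kind="image")

assert store.get(ticket) == b"sensor readings"
assert store.find(kind="telemetry") == [ticket]
```

The `find` method here is a plain linear scan to keep the sketch short; a real object store would be expected to index its metadata rather than examine every object.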
One technical difference in object storage vs. file storage is access. Object storage access is primarily via its RESTful API. Although many object stores have a built-in NAS front end or physical NAS gateway, they cannot match the response-time performance of native NAS systems. Both file and object storage often have a block iSCSI gateway function. Once again, the NAS variation is usually -- although not always -- more performant than the object storage equivalent. To take full advantage of an object store, the application or server file system must utilize the RESTful API. File systems, on the other hand, are accessed through standard Network File System, Server Message Block or Hadoop Distributed File System protocols, as well as IBM Spectrum Scale (formerly GPFS), Lustre or Panasas.
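To show what REST-style access looks like from an application's point of view, the following Python sketch starts a toy HTTP server in the same process and then stores and fetches an object with plain PUT and GET requests. It uses only the standard library. The `/bucket/report.txt` path and the in-memory `OBJECTS` dictionary are assumptions for the example; a real object store's API adds authentication, metadata headers and much more.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}   # request path -> bytes; stands in for the storage back end


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass


def call(port: int, method: str, path: str, body: bytes = None):
    """Send one HTTP request and return (status, response body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, path, body=body)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), ObjectHandler)   # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_port

put_status, _ = call(port, "PUT", "/bucket/report.txt", b"hello object")
get_status, fetched = call(port, "GET", "/bucket/report.txt")
missing_status, _ = call(port, "GET", "/bucket/nope.txt")

server.shutdown()
server.server_close()

print(put_status, get_status, fetched, missing_status)
```

The application never opens a file handle or mounts a share; every operation is a self-contained HTTP request addressed to the object's key.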
The performance drawback of object storage systems
File storage tends to provide better performance than object storage, though there are exceptions. The additional metadata and erasure coding increase read and write latency. However, the performance drawback of object storage is offset by several main advantages it has over file storage:
- Object storage scales to higher capacities.
- Object storage is more efficient on search, mining, data protection and analytics.
- Object storage has a much lower total cost of ownership than file storage. This is due to storage and data protection efficiencies; the use of commodity, off-the-shelf, white-box hardware; and significantly lower licensing and support costs.
Object storage vs. file storage in the cloud
Boiling this down for cloud appropriateness, file storage is generally a very good cloud fit for colocated mission-critical or high-performance applications, whereas object storage is a better fit for:
- Big data
- Active or cold archiving
- Social media
- File sharing