SearchCloudStorage senior writer Carol Sliwa spoke with Anders Tjernlund, the chief operating officer and co-founder of SwiftStack Inc., a core contributor to the open source OpenStack Swift object storage project. SwiftStack employs the project/technical lead for Swift and many of the core team members. The company also sells a storage controller and support for Swift.
How does data placement work in the current version of OpenStack Swift compared to the original version?
Regions, zones, servers and drives form a hierarchy for data placement. Regions are used only when a cluster is distributed over geographic sites. A zone is a distinct failure domain: a set of hardware that depends on something that can fail as a unit, such as a power source or a networking segment. If every node in a cluster shares the same power source and networking, there is no reason to configure more than one zone because Swift automatically places data in unique locations in the cluster. If the same cluster is distributed over two separate rooms in the data center, and each room has its own power source, two zones should be configured.
OpenStack Swift places three copies of every object across the cluster in locations that are as unique as possible: first by region, then zone, then server, then drive. For instance, in a cluster with three zones, Swift will place one copy of each object in each zone. In a cluster with one zone and three nodes, Swift will place one copy on each node. In a cluster with one node, Swift will place the copies on separate drives, and so on. Swift does this automatically, and there is no requirement to configure zones unless the user has clear failure boundaries, such as a networking segment or power source.
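The unique-as-possible ordering described above can be illustrated with a small greedy sketch. This is not Swift's actual ring-building code; the `Drive` type and the `place_replicas` function are hypothetical names introduced only for illustration, and the sketch assumes every drive is equally eligible (no weights or capacity balancing).

```python
from typing import List, NamedTuple, Sequence, Tuple


class Drive(NamedTuple):
    """Hypothetical description of one drive's position in the hierarchy."""
    region: str
    zone: str
    server: str
    name: str


def place_replicas(drives: Sequence[Drive], replicas: int = 3) -> List[Drive]:
    """Greedily pick drives that share as little of the hierarchy as possible.

    Illustrative only: each new replica goes to the drive that overlaps least
    with the drives already chosen, comparing region first, then zone, then
    server. A drive is never reused for a second replica.
    """
    chosen: List[Drive] = []
    for _ in range(replicas):
        candidates = [d for d in drives if d not in chosen]
        if not candidates:
            break  # fewer drives than requested replicas

        def overlap(d: Drive) -> Tuple[int, int, int]:
            # Count already-chosen drives in the same region, zone and server.
            same_region = sum(c.region == d.region for c in chosen)
            same_zone = sum(
                (c.region, c.zone) == (d.region, d.zone) for c in chosen
            )
            same_server = sum(
                (c.region, c.zone, c.server) == (d.region, d.zone, d.server)
                for c in chosen
            )
            return (same_region, same_zone, same_server)

        # Tuples compare left to right, so region overlap matters most.
        chosen.append(min(candidates, key=overlap))
    return chosen


# Example: one region, three zones, one server per zone, two drives each.
three_zones = [
    Drive("r1", f"z{z}", f"s{z}", f"d{d}") for z in range(3) for d in range(2)
]
for drive in place_replicas(three_zones):
    print(drive)
```

With a single zone and three servers, the same function spreads the replicas across servers; with a single server, it falls back to separate drives, mirroring the cases in the paragraph above.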
Prior to version 1.5 of Swift, zones had to be configured for data placement. The default setting was three zones, but the recommended number was five because if the user set up only three zones and one zone became unavailable, a write could fail. The recommendation for five zones became irrelevant with version 1.5's unique-as-possible data placement, which relegates zones to failure-domain status.
Swift is a two-tier storage system consisting of a proxy tier, which handles all incoming requests, and an object storage tier, where the actual data is stored. In addition, consistency processes run to ensure the integrity of the data. Swift uses a data structure called a ring to map the URL for an object to the particular location in the cluster where the object is stored. The ring also maps handoff locations, which are used in the event of a hardware failure. Because Swift is a distributed storage system, the ring is deployed to every node in the cluster, both proxies and object servers.
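To make the ring idea concrete, here is a toy sketch of a hash-based lookup from an object path to storage locations. It is a simplification under stated assumptions, not Swift's implementation: the function names, the round-robin table, the partition count and the example path are all hypothetical, and real rings also account for device weights and failure domains when assigning partitions.

```python
import hashlib
import struct
from typing import List, Sequence, Tuple

PART_POWER = 4  # assumed value for illustration: 2**4 = 16 partitions


def partition_for(path: str, part_power: int = PART_POWER) -> int:
    """Hash the object path and keep the top `part_power` bits."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - part_power)


def build_table(
    devices: Sequence[str], replicas: int, part_power: int = PART_POWER
) -> List[List[str]]:
    """Build a replica-by-partition table using simple round-robin.

    table[r][p] is the device holding replica r of partition p. Round-robin
    is a stand-in for real placement logic.
    """
    parts = 2 ** part_power
    return [
        [devices[(p + r) % len(devices)] for p in range(parts)]
        for r in range(replicas)
    ]


def locate(
    path: str,
    table: List[List[str]],
    devices: Sequence[str],
    part_power: int = PART_POWER,
) -> Tuple[int, List[str], List[str]]:
    """Return the partition, its primary devices and candidate handoffs."""
    part = partition_for(path, part_power)
    primaries = [row[part] for row in table]
    # Handoffs: any other device, tried if a primary is unavailable.
    handoffs = [d for d in devices if d not in primaries]
    return part, primaries, handoffs


# Hypothetical cluster: three nodes with two disks each.
devices = [f"node{n}/disk{d}" for n in range(3) for d in range(2)]
table = build_table(devices, replicas=3)
part, primaries, handoffs = locate("/AUTH_demo/photos/cat.jpg", table, devices)
print("partition:", part)
print("primaries:", primaries)
print("handoffs:", handoffs)
```

Because the lookup depends only on the path and the table, every proxy and object server holding the same table computes the same locations without consulting a central directory, which is why the ring is copied to every node.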