If you're architecting a cloud environment, you'll probably have to decide whether to use object storage. Object storage is often the best choice for enterprises that need to store millions, and potentially billions, of files. Once you decide to use object storage, you still have to choose among the many object storage systems available today.
One of the first decisions is how you will implement the object storage system. There are two prevalent options:
- Choose a vendor who offers a software-only implementation of object storage. In this model, software is downloaded and installed on servers provided by the organization itself. The compelling advantage to this approach is cost savings. The organization can use its existing server relationship to access inexpensive commodity servers. The commodity, server-class hard disk drives (HDDs) installed in those servers can also be used. Object storage software is then installed on the servers, which are networked together, forming an object storage cluster.
This approach provides an organization maximum flexibility in how it responds to growth. It can order more of the same servers or, as its hardware relationship changes, implement new servers from a new vendor. The same advantage also applies to storage. An organization can quickly move to higher capacity HDDs or higher performance flash drives without waiting for a vendor to certify the new options.
The downside to the software-only approach is the time it takes to realize the value. A software-only object storage system requires more decision making, careful design and a longer testing cycle. This time expenditure repeats itself any time the organization needs to change server or storage hardware.
- Select a vendor with a more turnkey approach to the market. These vendors provide all the storage hardware and software as a single line item. The goal is to speed the time to value, allowing an organization to quickly implement the object storage system and bring it into production. While the user gives up some flexibility as a result of this approach, for many, the gains in productivity are worth it.
Replication vs. erasure coding
The type of data protection to use is another factor. Since traditional RAID isn't feasible in very large storage systems that use high-capacity HDDs, there are typically two choices: replication or erasure coding. Replication simply makes multiple copies of various objects (files) throughout the storage cluster, while the object storage software ensures that no two copies are on the same node or the same rack. It also monitors the number of copies of each object and compares it to policies set for the various types of data. If an object's number of copies falls below the set policy, additional copies are created.
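The replication checks described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function names, the `placements` and `node_to_rack` structures, and the node and rack labels are all hypothetical. It flags objects whose copy count has fallen below policy and picks repair targets that share neither a node nor a rack with an existing copy.

```python
def find_under_replicated(placements, policy_copies):
    """Return {object_id: missing_copies} for objects below the copy policy.

    placements maps object_id -> list of nodes holding a copy (hypothetical layout).
    """
    return {
        obj: policy_copies - len(nodes)
        for obj, nodes in placements.items()
        if len(nodes) < policy_copies
    }


def pick_repair_targets(current_nodes, node_to_rack, missing):
    """Pick up to `missing` nodes that reuse no node or rack already holding a copy."""
    used_nodes = set(current_nodes)
    used_racks = {node_to_rack[n] for n in current_nodes}
    targets = []
    for node in sorted(node_to_rack):
        if len(targets) == missing:
            break
        rack = node_to_rack[node]
        if node in used_nodes or rack in used_racks:
            continue  # would put two copies on the same node or rack
        targets.append(node)
        used_nodes.add(node)
        used_racks.add(rack)
    return targets


# Hypothetical cluster: five nodes spread over four racks.
node_to_rack = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3", "n5": "r4"}
placements = {"obj-a": ["n1", "n3", "n4"], "obj-b": ["n1"]}

for obj, missing in find_under_replicated(placements, policy_copies=3).items():
    print(obj, "needs copies on", pick_repair_targets(placements[obj], node_to_rack, missing))
```

A real system would also weigh free capacity and load when choosing targets; the sketch only enforces the node and rack separation rule.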
The downside to replication is the capacity overhead created per protected copy. For example, a typical policy is to maintain three copies of data so data can survive the failure of any two components. This means the capacity requirement is tripled. For a 100 TB environment, this strategy may not be an issue. But for a 5 petabyte (PB) environment that now has to deal with a 15 PB storage requirement, this decision may be career-limiting.
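The arithmetic behind that overhead is simple enough to sketch. The helper names below are hypothetical; the figures are the ones used in the example above.

```python
def raw_capacity_for_replication(usable_capacity, copies):
    """Raw capacity needed to hold `copies` full copies of the data."""
    return usable_capacity * copies


def tolerated_failures(copies):
    """Number of copies that can be lost while one copy still survives."""
    return copies - 1


# The article's two scenarios under a three-copy policy.
print(raw_capacity_for_replication(100, 3), "TB raw for 100 TB of data")
print(raw_capacity_for_replication(5, 3), "PB raw for 5 PB of data")
print("failures tolerated:", tolerated_failures(3))
```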
Erasure coding is a parity-based data protection scheme, loosely similar to RAID. Unlike RAID, it works at the sub-object level: it generates parity per object instead of per volume. This results in better overhead efficiency than RAID, and there's no need to read entire volumes to do a rebuild. In the 5 PB example above, erasure coding can provide a triple level of redundancy with a capacity overhead of approximately 30%, requiring only about 6.5 PB of total capacity instead of 15 PB.
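To see where a figure like 30% can come from, consider a layout that splits each object into data fragments and adds parity fragments. The sketch below assumes a 10+3 layout (10 data fragments, 3 parity fragments); the article does not name a specific scheme, so that layout and the function names are illustrative assumptions only.

```python
def erasure_coded_capacity(usable_capacity, data_fragments, parity_fragments):
    """Raw capacity needed when every object is stored as data + parity fragments."""
    total_fragments = data_fragments + parity_fragments
    return usable_capacity * total_fragments / data_fragments


def overhead_ratio(data_fragments, parity_fragments):
    """Extra capacity relative to the data itself (0.3 means 30% overhead)."""
    return parity_fragments / data_fragments


# Assumed 10+3 layout applied to the 5 PB example.
print("overhead:", overhead_ratio(10, 3))
print("raw PB:", erasure_coded_capacity(5, 10, 3))

# Three-way replication expressed the same way: 1 data copy plus 2 extra copies.
print("replication raw PB:", erasure_coded_capacity(5, 1, 2))
```

Under this assumed layout, any three fragments of an object can be lost and the object can still be rebuilt from the remaining ten.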
Beyond object storage connectivity
A final consideration is whether the cloud environment needs to read or write data through other protocols, such as Amazon's Simple Storage Service (S3), OpenStack Swift, NFS, SMB (CIFS) or even iSCSI. Many object storage systems support a wide variety of these protocols. In most cases, native object access provides the best performance and data control. Which protocol is most appropriate depends largely on whether the cloud user has legacy applications that need to access the object store, or needs to interface with other clouds like Amazon S3 or Rackspace (OpenStack Swift).
Another factor in your connectivity decision may be whether you plan to run a modern application that uses one of these newer protocols. For example, Hadoop can access data through its native Hadoop Distributed File System (HDFS) or through Amazon S3. If the organization is also looking to stand up an outsourced analytics service, the ability to support Hadoop on the same storage architecture as the rest of the environment may be a critical decision point.
Each object storage system has a unique set of features that may interest the organization. At the same time, too many capabilities may make for a more complex environment. It's important that users assess their object storage system needs to select the most appropriate product.