Enterprises have relied on RAID for years because it has proven to be an effective and reliable tool for storing corporate data. RAID combines the benefits of block storage with advanced data technologies to boost performance for critical workloads while protecting against data loss and corruption.
But RAID has a problem: Because of today's data volumes, it takes much longer to rebuild a failed drive than it did when RAID first appeared. The more data that needs to be stored, the less effective RAID becomes and the greater the risks to that data. As a result, many organizations are looking for an alternative to RAID. Object storage is becoming a popular choice, since it can handle massive amounts of data more easily and cheaply. As object storage matures and becomes more widely implemented, people may begin to wonder whether RAID -- and, by extension, block storage -- has a future in the enterprise.
Getting to know RAID
RAID is a tried-and-true storage technology that maximizes performance and provides a way to recover data if a physical drive fails or data becomes corrupted. RAID groups an array's physical disks together and presents them as a single logical drive. The process of storing the data relies on three important technologies: striping, mirroring and parity, usually in some combination with each other.
Striping splits the data evenly across multiple drives to balance the workload and improve performance. Mirroring writes the same data simultaneously to two or more drives to ensure redundancy. Parity stores calculated redundancy information alongside the data so that lost or corrupted data can be detected and reconstructed, which supports fault tolerance and data correction.
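To make the parity idea concrete, here is a minimal sketch of how a single stripe can be protected with XOR parity, the operation commonly used in parity-based RAID levels. The chunk values, the three-data-drive layout and the helper name `xor_blocks` are illustrative assumptions, not taken from any particular RAID product.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# One stripe: three data chunks, each written to a different drive (striping).
data_chunks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity chunk is the XOR of all data chunks and lives on a fourth drive.
parity = xor_blocks(data_chunks)

# Simulate losing the drive that held the second chunk.
surviving = [data_chunks[0], data_chunks[2], parity]

# XOR-ing everything that survived yields the missing chunk.
rebuilt = xor_blocks(surviving)
print(rebuilt == data_chunks[1])  # True
```

The same property explains why a rebuild has to read every surviving drive in the array: each missing chunk is recomputed from all the others in its stripe.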
RAID implementations are classified by level. RAID 0, for example, stripes the data across multiple drives but provides no mirroring or parity. RAID 1 mirrors the data without providing parity or striping, and RAID 5 -- the most common RAID implementation -- uses striping and parity, but not mirroring. RAID levels can also be combined, such as RAID 10 (RAID 1+0), which uses RAID 1 and RAID 0 to deliver both performance and data protection.
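The trade-off between these levels shows up in how much usable capacity each one leaves. The following is a rough sketch under stated assumptions: all drives are the same size, RAID 1 keeps one full copy on every drive, RAID 5 gives up one drive's worth of space to parity, and RAID 10 uses two-way mirrors. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(level, n_drives, drive_tb):
    """Estimate usable capacity for a few common RAID levels (equal-size drives)."""
    if level == "0":
        # Pure striping: every drive contributes capacity, nothing is redundant.
        return n_drives * drive_tb
    if level == "1":
        # Pure mirroring: every drive holds the same data.
        return drive_tb
    if level == "5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # Striping with parity: one drive's worth of space holds parity.
        return (n_drives - 1) * drive_tb
    if level == "10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        # Striped two-way mirrors (assumed): half the raw capacity is usable.
        return (n_drives // 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")

for level in ("0", "1", "5", "10"):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 4)} TB usable from 4 x 4 TB")
```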
With RAID, a server has more spindles to use for writing and reading data, making it possible to achieve much faster throughputs than with individual drives. At the same time, the additional drives can increase availability and resiliency through parity or mirroring.
Although RAID plays a significant role in the data center, it was not designed for today's storage volumes. When a disk in an array fails, the data is in a vulnerable state until the disk has been replaced and its contents rebuilt, which can sometimes take days, depending on the amount of data. During that time, another disk might fail or be found to contain bad sectors or unreadable data -- the larger the data volumes, the greater the risk of losing that data.
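A back-of-the-envelope calculation shows why rebuild windows grow with capacity. This sketch assumes the rebuild proceeds at a constant sustained rate and must cover the whole drive; the drive sizes and rates below are hypothetical figures chosen only for illustration, and real rebuilds also compete with production I/O.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Hours needed to rewrite a full drive at a constant rebuild rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    seconds = (drive_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600

# Hypothetical small drive rebuilt at 100 MB/s: roughly 2.8 hours.
print(f"{rebuild_hours(1, 100):.1f} hours")

# Hypothetical large drive rebuilt at a throttled 50 MB/s: roughly 88.9 hours.
print(f"{rebuild_hours(16, 50):.1f} hours")
```

Under these assumptions, the larger drive leaves the array exposed for more than three and a half days.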
For this reason, erasure coding is becoming a common alternative to RAID. Erasure coding breaks the data down into fragments, then expands and encodes them with redundant data pieces so that the original data can be reconstructed even if some fragments are lost. When compared to RAID, erasure coding can reduce the time and overhead that come with reconstructing data.
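The arithmetic behind an erasure-coded layout can be sketched as follows. The parameter names `k` (data fragments) and `m` (redundant fragments) are assumptions introduced here, and the sketch assumes a scheme in which any `k` of the `k + m` fragments are enough to rebuild the data, as with Reed-Solomon-style codes. Plain replication is included only as a point of comparison.

```python
def erasure_profile(k, m):
    """Describe a k + m erasure-coded layout (k data and m redundant fragments)."""
    return {
        "fragments": k + m,
        "raw_per_usable": (k + m) / k,   # raw bytes stored per byte of user data
        "tolerated_losses": m,           # fragments that can be lost without data loss
    }

def replication_profile(copies):
    """Describe plain replication with the given number of full copies."""
    return {
        "fragments": copies,
        "raw_per_usable": float(copies),
        "tolerated_losses": copies - 1,
    }

print("10 + 4 erasure coding:", erasure_profile(10, 4))
print("3-way replication:   ", replication_profile(3))
```

In this hypothetical comparison, the 10 + 4 layout stores 1.4 bytes per byte of user data and survives four lost fragments, while three full copies store 3 bytes per byte and survive two lost copies.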
General industry trends also play a role in the move away from RAID. Hyperscale computing, for example, uses redundant servers to protect data; SSDs incorporate features such as wear leveling and error correction code; and some SSD vendors now add their own data protection capabilities, such as NetApp's Helix, a distributed replication algorithm that comes with the SolidFire all-flash arrays.
Then, of course, there's object storage, which is changing the nature of data storage.
RAID's storage building blocks
The move to object storage is not only about RAID, however, but also about block storage, the data architecture on which RAID is built. Block storage breaks files into individual data blocks that are each assigned a unique address. The smaller data structures make it possible for a storage management system to store data blocks in the most efficient way possible.
To access the storage blocks, a server uses a communication protocol, such as Fibre Channel, Fibre Channel over Ethernet or iSCSI. The blocks themselves contain no metadata. It is up to the storage management system to determine how storage should be allocated and where data should be stored.
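A toy model can illustrate the division of labor described above. In this sketch, the `BlockDevice` class, the four-byte block size and the helper functions are all hypothetical; the point is only that the device stores anonymous fixed-size blocks by address, while a separate layer (standing in for the storage management system) remembers which addresses make up a file.

```python
BLOCK_SIZE = 4  # unrealistically small, to keep the demo readable

class BlockDevice:
    """Stores fixed-size blocks by numeric address; knows nothing about files."""

    def __init__(self, n_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * n_blocks

    def write_block(self, address, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self.blocks[address] = data

    def read_block(self, address):
        return self.blocks[address]

def store_file(device, free_addresses, payload):
    """Split payload into blocks, write each to a free address, return the addresses."""
    addresses = []
    for offset in range(0, len(payload), BLOCK_SIZE):
        chunk = payload[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        address = free_addresses.pop(0)
        device.write_block(address, chunk)
        addresses.append(address)
    return addresses

def load_file(device, addresses, length):
    """Reassemble a file from its block addresses and trim the padding."""
    return b"".join(device.read_block(a) for a in addresses)[:length]

device = BlockDevice(n_blocks=8)
payload = b"hello world"
# The blocks need not be contiguous; the management layer picks the addresses.
addresses = store_file(device, [5, 2, 7, 0], payload)
print(addresses)                                      # [5, 2, 7]
print(load_file(device, addresses, len(payload)))     # b'hello world'
```

Nothing on the device itself records the file name, its length or which blocks belong together; lose the address list and the blocks are just bytes.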
Block storage is used primarily in SAN configurations, which often include RAID arrays. Block storage is a widely implemented and understood technology and is well-suited to applications requiring high throughput and low latency. Block storage addresses many of the limitations of file storage, a more basic storage technology that uses metadata and directories to organize files. Although file storage is simple and easy to deploy, its hierarchical nature adds overhead that continues to increase as more files and directories are included.
Block storage is more flexible and performs better than file storage, but it is also more complex and costly to implement and maintain -- issues that can be exacerbated by a RAID implementation. In addition, block storage includes no metadata, which means it cannot be searched or used for certain types of advanced analytics. Block storage also cannot scale to the degree necessary to meet the demands of larger data volumes. Plus, as distances between storage and applications increase, so too does the system's latency.
Despite these limitations, applications that require fast I/O continue to rely on block storage. For example, database management systems commonly use block storage to support their transactional workloads, and email servers and virtualization software often rely on block storage to meet their fluctuating workloads. Not surprisingly, these systems often utilize RAID to boost performance and protect data.
Enter object storage
Object storage approaches data storage much differently from block storage. Rather than dividing files into raw blocks, object storage keeps them and their metadata together, along with extended metadata that can be customized to meet application requirements. The data and metadata are stored as individual objects that share a common address space (storage pool) without the need to navigate volumes or file hierarchies.
A unique ID is assigned to each object when it is created. The object can be stored on a local server or in a cloud-based data center halfway around the world. An application that wants to access an object need only provide the object's ID, regardless of its location. The application connects with the object via an HTTP-based REST API, using basic calls, such as GET, PUT or DELETE.
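The access model can be sketched with a small in-memory stand-in. The `ObjectStore` class below is hypothetical and does not speak HTTP; its `put`, `get` and `delete` methods simply mirror the REST verbs mentioned above, and the use of a random UUID as the object ID is an assumption made for the example.

```python
import uuid

class ObjectStore:
    """In-memory stand-in for an object store: a flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        """Store data together with its metadata and return the new object's ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": data, "metadata": dict(metadata or {})}
        return object_id

    def get(self, object_id):
        """Return (data, metadata) for the given ID; no path or volume is needed."""
        obj = self._objects[object_id]
        return obj["data"], obj["metadata"]

    def delete(self, object_id):
        """Remove the object with the given ID."""
        del self._objects[object_id]

store = ObjectStore()
scan_id = store.put(b"...image bytes...", {"patient": "12345", "modality": "MRI"})
data, metadata = store.get(scan_id)
print(metadata["modality"])  # MRI
store.delete(scan_id)
```

The caller never names a volume, directory or block address; the ID is the only handle, and the custom metadata travels with the data.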
When used with erasure coding for data protection, object storage offers a far simpler and more flexible alternative to RAID and block storage. A distributed data pool makes it possible to store large amounts of unstructured data that span geographic boundaries. Objects can be replicated to multiple drives, and drives can be added when and where they're needed. Scaling object storage is merely a matter of adding nodes to the storage clusters, wherever they reside, offering the potential for nearly limitless scalability.
Object storage also has the advantage of including both the data and metadata. The metadata can be customized with application-specific attributes, leading to more advanced, large-scale analytics than can be achieved with block storage. Object storage is also a more cost-effective alternative to RAID and block storage, which are traditionally expensive to implement and maintain.
Because the objects in object storage share a common address space, without the complexities and overhead that come with block storage and RAID, managing storage is also much easier. Plus, object storage has the advantage when it comes to protecting data from drive failure or data corruption. Objects can be easily replicated to as many secondary systems as necessary without incurring additional overhead. Object storage can also use erasure coding to protect the stored data, which comes with lower overhead than RAID.
Despite these benefits, object storage is not the answer for all enterprise workloads. Block storage, with or without RAID, wins out when it comes to performance, especially for applications that require a high degree of random access I/O, such as databases and virtual desktops. With object storage, if an object's data needs to be updated, the entire object must be rewritten, which can impact performance, especially if the data changes frequently.
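The cost of small updates can be illustrated with simple arithmetic. This sketch is a deliberate simplification: it assumes a block device rewrites only the blocks an update touches, that an object store rewrites the whole object, and it ignores caching, parity updates and network effects. The function names and the sizes used are hypothetical.

```python
def block_bytes_rewritten(offset, length, block_size):
    """Bytes written when only the blocks touched by the update are rewritten."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size

def object_bytes_rewritten(object_size):
    """Bytes written when the entire object must be rewritten for any change."""
    return object_size

GIB = 2**30

# Change 100 bytes near the start of a 1 GiB file or object, with 4 KiB blocks.
print(block_bytes_rewritten(offset=4000, length=100, block_size=4096))  # 8192
print(object_bytes_rewritten(GIB))                                      # 1073741824
```

Here the update straddles two blocks, so the block device rewrites 8 KiB, while the object store rewrites the full gibibyte.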
The inclusion of metadata can also add to computational overhead that leads to additional latency. Organizations should also be aware that moving from block storage to object storage means having to update their applications to access the objects through the API.
Object storage, block storage and RAID
Block storage is not going away anytime soon, nor is RAID, nor is the SAN. Organizations running critical business applications, financial systems, database management systems, virtual desktop infrastructures and other high-performing applications have invested extensively in these systems and understand how they work.
They also understand that their data volumes are growing exponentially and that new storage models need to be considered. Object storage addresses many of the limitations of these older technologies and is well-suited to scenarios concerned more with the amounts of data than with performance, such as backups, archives and big data storage. RAID and block storage are not equipped to handle such massive data volumes, and object storage is generally cheaper and easier to manage, while providing much more flexibility.
Whether object storage and erasure coding will ultimately be responsible for RAID's downfall is difficult to say. While they provide alternatives to RAID, the complexities and limitations that come with RAID might be enough to bring about its demise, without the help of object storage and erasure coding. Block storage and SANs could still be around for a long time to come, even if RAID gets phased out.
Object storage is still a young technology that continues to evolve and mature. Perhaps it will eventually become so performant and reliable that, a decade from now, few will even remember block storage -- let alone RAID. Until then, organizations will have to balance the best of both technologies.