Managing and protecting all enterprise data


Do RAID levels still matter?

Most new storage arrays automatically distribute data across a number of spindles, which eliminates the manual task of selecting RAID levels. You can still select RAID levels manually, but you'll need to balance availability, risk and cost.

Most new arrays stripe data across their spindles automatically to increase performance and better use disk capacity. With capabilities like that, RAID could become a thing of the past.

The 20th anniversary of the invention of RAID by David Patterson, Garth Gibson and Randy Katz of the University of California at Berkeley is less than a year away. Their revolutionary paper, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," changed the way server-class computers stored data. Soon after RAID burst upon the scene, storage administrators had to wrestle with the pivotal RAID question: "How shall I place data on my hard disks to optimize capacity, performance and data protection?" But that question is becoming less relevant because most new storage arrays automatically distribute data across a number of spindles, eliminating the manual task of selecting RAID levels.

Most Storage readers don't require an introduction to the concepts of RAID. But the rules of the game are changing. As recently as five years ago, storage administrators were constantly challenged by die-hard application administrators to control data placement on the disk array at a very granular level. Not content with merely specifying "table spaces on RAID 5 and logs on RAID 0/1," some database administrators asked for particular data stripe placement on the platter itself so that, for example, the highest-usage online transaction processing (OLTP) instances could get better performance by being written to volumes occupying the outermost disk cylinders.

First and foremost, RAID was invented for increased storage performance. In essence, RAID is a form of parallel I/O processing that spreads the workload over a number of disk devices, summing their performance in an attempt to help storage keep up with the rest of the system. RAID does indeed achieve this goal, especially when used in conjunction with another powerful performance-enhancement mechanism: caching.

Caches keep getting larger, improving performance as they grow. Only when your application seeks data outside the cache (a cache "miss") does the selection of RAID level affect performance. With new cache algorithms and proper tuning, cache misses can be kept to a minimum. With 90%-plus cache hits, RAID level selection will have a relatively minor impact. This hasn't gone unnoticed by storage array vendors.
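To put rough numbers on that, here's a back-of-the-envelope sketch of average I/O latency as a function of cache hit rate. The 0.1 ms cache and 5 ms disk figures are illustrative assumptions, not measurements from any particular array:

```python
# Illustrative latencies only: ~0.1 ms for a cache hit, ~5 ms for a
# disk access behind the cache. Real arrays will differ.
def effective_latency_ms(hit_rate, cache_ms=0.1, disk_ms=5.0):
    """Average I/O latency given a cache hit rate between 0.0 and 1.0."""
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

# At a 90% hit rate, only 10% of I/Os ever reach the RAID layer.
print(effective_latency_ms(0.90))  # 0.59 ms average
print(effective_latency_ms(0.99))  # 0.149 ms average
```

Because the cache absorbs most I/Os, the latency of the RAID layer behind it is weighted by only the miss fraction, which is why the choice of RAID level matters less as hit rates climb.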

Storage administrators have always been confronted with the task of matching the application storage workload to the storage configuration. For many enterprise applications, this goes beyond the simple "logs on 0/1, tables on 5" to complex volume management involving both hardware and software RAID. Some array vendors actually recommend such techniques as software RAID 0 striping across their array controllers. But the storage workload profile of an application can change over time, requiring constant re-tuning to maintain performance.

Many storage administrators, managing hundreds of terabytes of data, are beginning to realize that twiddling knobs to achieve better performance isn't as much fun as it used to be when 2TB was considered a very big array. Today, the hottest trend in storage arrays is data stripe abstraction. Blocks of data are written to many disks in the array according to a pattern determined by the intelligence in the array; the RAID level, as it were, is fixed and not configurable. In the most advanced arrays, the data block layout pattern can change dynamically as the application workload profile changes.

Balance cost, data protection and performance
If you still want to manually select RAID levels, remember that performance increases with the number of spindles across which your data is distributed, but large RAID groups increase the risk of data loss if a disk fails. Using small RAID groups, or increasing the number of parity disks, decreases the available capacity for the same investment in cost, power and rack space. Balancing availability, risks and costs is certainly a business decision. Here are some guidelines, with the caveat that "your mileage may vary," especially among array vendors.

RAID 0/1, with a small stripe size and meta volumes spread out over a large number of spindles, offers the highest read and write performance, all other things being equal. The primary reason for this is that there are no parity calculations involved when writing or rebuilding. This is also the most expensive approach; plan on a usable capacity of half the raw storage amount. If you have a "thrashy" application (lots of random reads and writes), such as a journaling file system, email server, OLTP database or CAD application, this still may be your best bet.

RAID 5 offers read performance almost on par with RAID 0/1, but write performance isn't as good. Avoid the temptation to make RAID groups too large. Expect usable capacities in the range of 67% to 94% of the raw storage amount. For most applications, RAID 5 is the best approach.

If you're using big drives for the lowest possible cost per gigabyte, consider using RAID 6 (also known as RAID with double parity). This will help protect against the greater risk of two drives failing in a RAID group, while allowing a usable capacity of up to 88% of the raw storage amount. This is a good RAID level for data warehousing, archiving, backup to disk or any application where large capacity is paramount.
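The usable-capacity figures above follow from simple arithmetic: each RAID group gives up its redundancy disks. A minimal sketch, counting a mirror as "one redundancy disk per pair" purely for the arithmetic:

```python
def usable_fraction(disks_per_group, redundancy_disks):
    """Usable capacity fraction of one RAID group:
    (data disks) / (total disks)."""
    return (disks_per_group - redundancy_disks) / disks_per_group

# RAID 0/1 (mirroring): half the raw capacity, regardless of group size.
print(usable_fraction(2, 1))    # 0.5 -> 50% usable
# RAID 5: one parity disk per group. A 3-disk group yields about 67%
# usable; a 16-disk group about 94% -- the range cited above.
print(usable_fraction(3, 1))    # ~0.67
print(usable_fraction(16, 1))   # 0.9375
# RAID 6: two parity disks. A 16-disk group yields about 88% usable.
print(usable_fraction(16, 2))   # 0.875
```

The same arithmetic shows the trade-off in reverse: shrinking a RAID 5 group from 16 disks to eight drops usable capacity from about 94% to 87.5%, the price paid for a smaller failure domain and faster rebuilds.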

Let the array decide
"At NetApp, we have the view that selecting RAID levels for specific applications has become rather anachronistic," says David Dale, industry evangelist at Network Appliance (NetApp) Inc. "The best modern arrays offer automated RAID with architectural mitigation of performance or capacity trade-offs."

The idea of "carving a LUN" and not knowing which physical spindles the data will end up on may be unnerving to some, but it's the wave of the future. Technologies like storage virtualization, thin provisioning, index copying/hardware continuous data protection (CDP) and dynamic volume sizing, to name just a few, mandate the automation of physical disk layout.

New, distributed disk or grid storage technologies are taking on some of RAID's data protection duties. "There are many ways to create redundant copies of data without using RAID," writes John Spiers, founder and CTO at LeftHand Networks. "LeftHand's volume replication can be configured to withstand multiple drive failures, array level failures and complete site failures without losing data, and all without the use of traditional RAID algorithms ... the days of traditional RAID systems may be coming to an end."

Other storage array companies are also heralding the end of RAID as we know it. "While most vendors assign a single tier or RAID level to a volume, Compellent assigns these parameters on a block basis," observes Bob Fine, Compellent's director of product marketing. "Both data classification and data movement are automatically tuned by the array."

For very highly specialized application workload profiles, such as CASE and video editing, manual storage configuration might still be worth considering. But for mainstream applications, like Exchange, SQL Server, Oracle and ERP, storage vendors offer very appropriate automation solutions.


Balancing requirements
In the final analysis, business decisions need to be made before technical factors can be considered. What, for example, are the requirements of the application and the operating system? How will the data be protected? What are the infrastructure requirements? How much will it cost?

Picking a storage technology is all about balancing parameters. Choosing a RAID level, or a product with data stripe abstraction, is no different. There are a wide variety of options, with no one selection "best" for all circumstances. As a storage administrator, you want happy "internal customers" with a minimum of work and stress. This may mean giving up some control to the array for disk management.
