Published: 17 Nov 2002
I'm often asked what the value of storage virtualization is and what the best ways to use it are. Without a doubt, in a well-thought-out implementation, virtual storage provides many cost-saving benefits. In fact, there's practically no way to avoid it because RAID virtualization is so common in disk subsystems. But unplanned virtualization can spread data over many disks in an array, and in the process, unwittingly introduce major performance problems.
Planning for virtualization
The three main benefits of virtualization are redundancy for data, increased flexibility in using storage address space and improved performance for applications that tend to have I/O bottlenecks.
Virtualization, of course, can be done in multiple layers in the storage area network (SAN): the volume manager, HBA RAID controller, network device (appliance) and subsystem controller. The performance-improving suggestions I make in this article should be applied at the virtualization layer closest to the disk drives.
Redundant data protection in the form of mirroring (RAID 1) or RAID 0+1 is widely available and should be incorporated into all system storage. Cost advantages of RAID 5 are not viewed as significant enough to overcome write performance penalties and degraded-mode operations. The choice between RAID 1 and 0+1 depends on the capacity, scaling and performance requirements of the application. If in doubt, use RAID 1 to simplify configuration and troubleshooting.
RAID 0+1 arrays can afford to have multiple disk drives fail, as long as they are not part of the same mirror-stripe. This is a big advantage over RAID 5 and RAID 1, where the loss of more than one disk drive results in a loss of data.
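The failure-tolerance argument can be illustrated with a short sketch. It assumes a layout the article does not spell out: the array is built from two-disk mirrored pairs that are striped together, so data is lost only when both members of some pair fail. The function name `survives` and the eight-disk example are hypothetical, not taken from any particular product.

```python
def survives(mirror_pairs, failed):
    """Return True if every mirrored pair still has at least one working disk.

    Assumption (not from the article): the array is a stripe across
    two-disk mirrored pairs, so losing both disks of any pair loses data.
    """
    failed = set(failed)
    return all(not (set(pair) <= failed) for pair in mirror_pairs)


# Hypothetical 8-disk array: four mirrored pairs striped together.
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]

print(survives(pairs, {1, 3, 5}))  # True: three failures, all in different pairs
print(survives(pairs, {1, 2}))     # False: both members of one pair are gone
```

Under this assumption, a plain RAID 1 mirror is simply the one-pair case, which is why a second failure is always fatal there.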
Storage address spaces
In addition to mirroring and striping, virtualization can also subdivide and concatenate storage address spaces. From the perspective of address space manipulation, virtualization can make all disk storage work like putty that can be merged in an endless variety of ways. However, taking this approach to storage is unlikely to result in optimal designs.
It's important to keep in mind that disk drives are electromechanical devices whose performance is constrained by the rotation speed of the media and the time it takes to move the read/write heads over the media. The performance and cost differences between 5400 RPM ATA disk drives and 10,000 RPM SCSI disk drives can be enormous. As a best virtualization practice, make sure that the drives that form an array have similar specifications, so that the slowest drive doesn't become a bottleneck.
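To put a rough number on the rotation-speed difference, the sketch below computes average rotational latency, which is the time for half a revolution. It deliberately ignores seek time, transfer time and caching, and the idea that a request spanning every member of an array waits for the slowest drive is a simplifying assumption for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


for rpm in (5400, 10_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
# 5400 RPM: 5.56 ms
# 10000 RPM: 3.00 ms

# Simplifying assumption: a request that touches every member of a mixed
# array finishes only when the slowest member has responded.
mixed_array_rpms = [10_000, 10_000, 10_000, 5400]
print(max(avg_rotational_latency_ms(r) for r in mixed_array_rpms))
```

In this simplified model, one 5400 RPM drive in an otherwise 10,000 RPM array sets the pace for the whole array.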
Additionally, you need to know how disk subdivisions are allocated to applications and systems. For example, consider a scenario where 14 different file systems and/or databases are using the storage resources of 16 disk drives that are managed by a storage virtualization product. There are many ways this storage could be allocated, but one simple "cloud" method is to assign units of storage uniformly on a first-come, first-served, round-robin basis. The "Array distribution diagram 1" (Allocating disks: there's the wrong way ...) shows a collection of disks as you might find within a disk subsystem where all 16 disks have been subdivided into five equal extents (partitions). As each application comes online, an array of extents is allocated to it using either RAID 1 or RAID 0+1.
According to the first diagram, array 1 runs on disks 1 through 4, array 2 runs on disks 5 through 8, array 3 runs on disks 9 through 12 and so on. The other dimension to analyze is the distribution of applications (represented by arrays) on each disk. For instance, disk 12 has arrays 3, 9 and 13 and disk 13 has arrays 4, 9 and 14.
Allocating disk extents
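A first-come, first-served, round-robin allocator of the kind described here might look like the following sketch. The array widths are hypothetical (every array is assumed to be four disks wide), so the output is not a reconstruction of the article's diagram; the function name and parameters are illustrative only.

```python
from collections import defaultdict


def round_robin_allocate(num_disks, extents_per_disk, array_widths):
    """Assign each array one extent on `width` consecutive disks, wrapping around.

    Arrays are served in arrival order with no regard for workload.
    Returns {disk: [array ids]} with disks numbered from 1.
    """
    layout = defaultdict(list)
    next_disk = 0
    for array_id, width in enumerate(array_widths, start=1):
        for _ in range(width):
            disk = next_disk % num_disks + 1
            if len(layout[disk]) >= extents_per_disk:
                raise ValueError(f"disk {disk} has no free extent for array {array_id}")
            layout[disk].append(array_id)
            next_disk += 1
    return dict(layout)


# Hypothetical workload: 14 arrays, each four disks wide, on 16 disks
# with five extents per disk.
layout = round_robin_allocate(16, 5, [4] * 14)
print(layout[1])   # [1, 5, 9, 13]
print(layout[12])  # [3, 7, 11]
```

The allocator knows nothing about which arrays are busy, which is exactly the weakness the article goes on to describe.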
Each application array will have a different workload. Again, to simplify things, let's assume that applications 3, 6, 9 and 12 are the heaviest workloads in terms of I/O activity. Analyzing the distribution of these heavy-workload applications across disks, the shaded extents indicate that disks 5, 6, 9, 10, 11 and 12 all have two I/O-heavy applications on them competing for resources in the drive. It's likely that these applications could experience performance degradation due to I/O bottlenecks that would be difficult to replicate, troubleshoot and identify.
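This kind of analysis is easy to automate once the extent map is known. The sketch below flags disks that host more than one heavy array. The four-disk layout is a made-up fragment for illustration, not the article's diagram, and `contended_disks` is a hypothetical helper name.

```python
def contended_disks(layout, heavy):
    """Return {disk: heavy arrays on it} for disks hosting more than one heavy array."""
    heavy = set(heavy)
    result = {}
    for disk, arrays in sorted(layout.items()):
        hot = [a for a in arrays if a in heavy]
        if len(hot) > 1:
            result[disk] = hot
    return result


# Hypothetical layout fragment: disk number -> arrays with an extent on it.
layout = {
    1: [1, 5, 10],
    2: [1, 6, 10],
    3: [2, 6, 9],
    4: [2, 3, 9],
}
print(contended_disks(layout, heavy={3, 6, 9, 12}))
# {3: [6, 9], 4: [3, 9]}
```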
If you take a slightly different approach to allocating disk extents, where you try to mix the type of workload across member disks, you could try to reserve space on each disk for heavy applications. Diagram 2 ("...and the right way") shows the exact same set of disks, extents and arrays, but where the "bottom" extents are reserved for high I/O applications. Notice that the high I/O applications are distributed across all the disks, and none have more than one. This is obviously a lucky situation where everything happened to work out nice and square, but the fact remains that structuring disk allocations to workloads is a good idea and should probably be part of a virtualization best practice.
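The reservation idea can be sketched as a placement step that runs before any general allocation: one extent per disk is set aside, and each heavy array gets its own disjoint run of disks within that reserved extent. The widths below (four disks per heavy array) are hypothetical and chosen so the example works out as evenly as the article's second diagram; real layouts will rarely be this tidy.

```python
def place_heavy_arrays(num_disks, heavy_widths):
    """Give each heavy array its own disjoint run of disks in the reserved extent.

    heavy_widths maps array id -> number of member disks.
    Returns {disk: array id}; raises if the heavy arrays need more disks than exist.
    """
    if sum(heavy_widths.values()) > num_disks:
        raise ValueError("not enough disks to keep heavy arrays apart")
    reserved = {}
    disk = 1
    for array_id, width in heavy_widths.items():
        for _ in range(width):
            reserved[disk] = array_id
            disk += 1
    return reserved


# Hypothetical: the four heavy arrays are each four disks wide.
reserved = place_heavy_arrays(16, {3: 4, 6: 4, 9: 4, 12: 4})
print(reserved[1], reserved[5], reserved[9], reserved[13])  # 3 6 9 12
```

Because the runs are disjoint, no disk ever carries more than one heavy array; the remaining extents can then be handed out to lighter workloads by any method.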
It's worth pointing out that limiting RAID choices to 1 and 0+1 makes the whole process of mixing applications across disks much easier. The reader is encouraged to try their hand at allocating odd-number member arrays such as RAID 5 to see the additional complexity involved. Not only does the layout become more complex, but the real problem is the RAID 5 read/modify/write penalty, which could hurt the performance of all the applications storing data on drives in the array.
Virtual storage performance can be optimized if disks are all matched and RAID 1 or RAID 0+1 is used. In transaction processing environments, RAID 0+1 has the major advantage of spreading out the I/Os over a number of disk drives. One note of caution: While it sounds simple enough, there's an art to tuning arrays for database performance that goes beyond the scope of this article, and it's probably better left to experienced DBAs with a detailed understanding of their applications.
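A back-of-the-envelope estimate shows why the read/modify/write penalty matters on shared drives. The sketch assumes the commonly cited costs for a small write (two disk I/Os for a mirrored write, four for RAID 5: read old data, read old parity, write new data, write new parity) and ignores controller caching and full-stripe writes. The workload figures are hypothetical.

```python
# Back-end disk I/Os generated by one small host write (no caching assumed).
# Mirroring: write both copies.
# RAID 5: read old data, read old parity, write new data, write new parity.
WRITE_PENALTY = {"raid1": 2, "raid0+1": 2, "raid5": 4}


def backend_iops(host_iops, write_fraction, level):
    """Estimate disk I/Os per second generated by a given host workload."""
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction * WRITE_PENALTY[level]
    return reads + writes


# Hypothetical workload: 1,000 host I/Os per second, half of them writes.
for level in ("raid0+1", "raid5"):
    print(level, backend_iops(1000, 0.5, level))
# raid0+1 1500.0
# raid5 2500.0
```

The extra back-end I/Os land on drives that other applications share, which is how one RAID 5 array can slow its neighbors.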