In Part 1 of this series, we introduced the idea that implementing availability requires taking a layered approach,...
and in Part 2 we discussed the first layer of the Availability Index, Good System Administrative Practices. In Part 3, we looked at backups, the last line of defense between you and total loss of your data. In Figure 1, we review that layering once again.
To no one's great surprise, and keeping right in line with Moore's Law, disk capacities continue to grow at a remarkable rate. And dutifully, consumers of disks upgrade their storage and buy larger and larger disks as they become available. After all, if they all share the same form factor, isn't one 36GB disk better than four 9GB disks? Surely from a real estate perspective three fewer disks are better than three more. And one disk is almost always going to be cheaper than four. But what about availability?
Without getting into the mathematics of it, let us assume that all disks, whether 9GB or 36GB, have the same likelihood of failure. User A has four 9GB disks, while User B has one 36GB disk. Each user stores the same 28GB of data, split evenly among four applications. User A's data is spread evenly across his four disks (7GB and one application per disk), while all of User B's data sits on his single 36GB disk. If User A loses a disk, he loses just 25% of his data, affecting just one application. If User B loses a disk, he loses 100% of his data, affecting all four of the applications.
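The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration of the example above, under its own assumption that data and applications are spread evenly across the disks; the function name `single_failure_impact` is made up for this sketch.

```python
def single_failure_impact(num_disks, total_data_gb, num_apps):
    """Return (GB affected, fraction of data affected, apps affected)
    when exactly one disk fails.

    Assumes data and applications are spread evenly across the disks,
    as in the User A / User B example.
    """
    data_per_disk = total_data_gb / num_disks
    apps_per_disk = num_apps / num_disks
    return data_per_disk, data_per_disk / total_data_gb, apps_per_disk


# User A: four 9GB disks; User B: one 36GB disk. Both hold 28GB for four apps.
user_a = single_failure_impact(num_disks=4, total_data_gb=28, num_apps=4)
user_b = single_failure_impact(num_disks=1, total_data_gb=28, num_apps=4)

print(user_a)  # (7.0, 0.25, 1.0)
print(user_b)  # (28.0, 1.0, 4.0)
```

The chance of any one disk failing is the same in both cases by assumption; what changes is how much goes with it.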
If the disks are mirrored (a most reasonable assumption), then instead of actually being lost, that much data becomes unmirrored and vulnerable to a second failure.
If the disks are mirrored, and hot spares are in place (where disks are held in reserve until they automatically take over for the failed disks), then the difference is in the amount of data that must be mirrored to the replacement disk. User B will have to copy (mirror) four times as much data, taking four times as long (and leaving the system vulnerable to a second failure during that period), and causing a performance impact to the system (or disk array) that lasts four times longer.
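To put rough numbers on that window of vulnerability, here is a minimal Python sketch. The copy rate is purely an assumption chosen for illustration, not a measured figure for any disk or array, and the helper name `resync_hours` is hypothetical. Whatever rate you plug in, the ratio between the two users stays the same.

```python
def resync_hours(data_gb, copy_rate_mb_per_s):
    """Hours needed to copy data_gb onto a hot spare at a steady rate."""
    seconds = (data_gb * 1024) / copy_rate_mb_per_s
    return seconds / 3600


# Hypothetical sustained copy rate; substitute a figure from your own system.
ASSUMED_RATE_MB_S = 20

user_a_hours = resync_hours(7, ASSUMED_RATE_MB_S)   # one of four 9GB disks
user_b_hours = resync_hours(28, ASSUMED_RATE_MB_S)  # the single 36GB disk
ratio = user_b_hours / user_a_hours

print(f"User A is exposed for {user_a_hours * 60:.1f} minutes")
print(f"User B is exposed for {user_b_hours * 60:.1f} minutes")
print(f"User B's exposure is {ratio:.0f} times longer")
```

If the mirroring software copies the entire disk rather than just the data in use, the sizes become 9GB and 36GB, and the factor of four is unchanged.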
A single disk that supports four separate applications is far more likely to see performance problems than if the applications are spread between multiple disks. Four disks have four separate heads, each of which operates separately. As more traffic comes in, there is more hardware to service it. A single disk head can be a performance bottleneck on a busy system.
With multiple disks, you also have the option of spreading the disks across multiple disk controllers if you are so inclined, which can further improve performance. While it is possible to put a single disk on multiple controllers, any benefit would be limited (or lost) due to the single disk head.
If each of the four applications has its own disk, then a full disk will only affect that application. If four applications share a single disk, then any of the applications could misbehave, and cause the others to fail due to a lack of disk space.
Obviously, if you are using a smart disk array with LUNs, or a logical volume manager, the advice changes somewhat, depending on how volumes or LUNs are arranged across the physical disks. Nevertheless, when larger disks fail, they take more data with them, the impact on your system is larger, and the recovery time is longer.
There is one other intuitively obvious point to make. Larger disks pack the data closer together, requiring smaller disk heads, and smaller physical clearances between the equipment. That makes it seem as though the disks are more delicate, and more prone to failure. However, I have never seen any hard evidence supporting that position, so I offer it as a caution, rather than as something provable.
So, please keep these thoughts in mind as you upgrade your disks from those tiny 9GB disks to the lumberjack-sized 36GB disks. And next year, when those tiny 36GB disks just don't do the job anymore, and your disk vendor suggests the new whizzy 144GB disks...
Copyright 2002, Evan Marcus
Evan Marcus is data-availability maven for Veritas Software.