Each day there seems to be another storage choice, ranging from small appliance-type devices with a few drives to large unified storage systems designed to handle both block and file data. The class of storage on which your database resides is the first consideration for data placement. The three storage classes we'll consider are JBOD, Intelligent Controller, and Integrated Cached Disk Arrays.
JBOD (just a bunch of disks). Provides the least data protection and performance, but at the lowest cost. To achieve any kind of performance from a database placed on this type of storage, you must consider physical placement on the drives and use the operating system or volume manager software to realize performance gains. JBOD can be a suitable low-cost home for infrequently referenced data, such as archives or backup copies.
Intelligent Controller. A step above JBOD in performance and availability, although a subsystem with an integrated controller does more to offload host resources than to actually improve overall database performance. Mirroring and striping can be performed at the controller level with this type of subsystem.
Integrated Cached Disk Array (ICDA). As the most sophisticated of the storage classes, it would be easy to simply place all database objects on the ICDA. While that is a perfectly acceptable solution, it is also the most expensive. One way to mitigate the cost of an ICDA is to mix physical drive sizes (larger, slower drives for archives) and protection schemes (hardware-based mirroring for redo logs).
There are pros and cons for each class of storage subsystem. The goal in selecting a storage class is to mix and match system types to gain the best price/performance ratio while giving the greatest number of options for data placement.
Controllers and channels
At a minimum, we'll need two controllers within our storage subsystem--the more the better. Today, some high-end systems have as many as 16 independent front-end controllers. A high number of controllers allows a high level of parallel I/O through the storage subsystem. Additionally, when we scale, we would like to scale as near linearly as possible. Adding channels and controllers in a parallel fashion accomplishes this.
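The scaling behavior described above can be sketched numerically. This is a hypothetical illustration, not vendor math: aggregate front-end bandwidth grows roughly linearly with the number of independent controllers until some other component (back-end drives, internal bus, host HBAs) becomes the bottleneck. The per-controller and back-end figures below are assumed for the example.

```python
def aggregate_bandwidth_mb_s(controllers, per_controller_mb_s=200,
                             backend_limit_mb_s=1600):
    """Estimate usable subsystem bandwidth with N parallel controllers.

    Scaling is near linear until the assumed back-end limit is reached;
    both figures here are illustrative, not product specifications.
    """
    return min(controllers * per_controller_mb_s, backend_limit_mb_s)

for n in (2, 4, 8, 16):
    print(n, aggregate_bandwidth_mb_s(n))
```

With these assumed numbers, doubling from two to four controllers doubles usable bandwidth, but going from 8 to 16 gains nothing because the back end is already saturated--which is why channels and controllers must be added in balance.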
Another consideration is the debate over Internet Protocol (IP) and Fibre Channel (FC) connectivity. While iSCSI and other IP-based protocols are making great strides, only a small number of latency-tolerant database applications currently use IP-based channel devices. iSCSI is used primarily for storage network connectivity, which will certainly change as 2Gb and 10Gb IP infrastructures evolve.
Performance and availability
Performance gains come primarily from spreading the datafiles across as many physical drives as possible. This means concatenating or striping across all the drives on separate controllers to gain as much parallel access as possible, while avoiding striping back onto the same drive, which creates hot spots of high access density.
A logical volume manager residing on the host server not only eases management, but also increases performance. Striping at the host allows parallel access from the host bus adapter (HBA) all the way down to the physical drives--or cache slots--being striped across. Starting the parallel access at this level yields greater performance. In our example, I've chosen Veritas Volume Manager, but several other software products for various platforms accomplish the same task.
Availability should always be a primary concern when deciding where to place your data. DBAs will tell you that they can't afford to lose any data. While this may be true, there are ways to mitigate the risk of losing data while balancing the high price of ICDAs. The first step is to use hardware or software to apply RAID protection. While software RAID is fine, it's preferable to use hardware RAID to reduce channel overhead, especially when mirroring.
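The capacity cost of the protection schemes mentioned above is worth making concrete. This is a rough sketch, not vendor sizing math, and the drive count and 73GB drive size are assumptions: RAID 1 mirroring gives up half the raw capacity, while RAID 5 gives up one drive's worth of parity per group--the trade being that mirroring avoids the parity write penalty, which is why it suits write-heavy objects like redo logs.

```python
def usable_gb(n_drives, drive_gb, raid_level):
    """Usable capacity of a drive group under two common RAID levels."""
    if raid_level == 1:      # mirroring: every drive has a twin
        return (n_drives // 2) * drive_gb
    if raid_level == 5:      # one drive's worth of capacity goes to parity
        return (n_drives - 1) * drive_gb
    raise ValueError("RAID level not covered in this sketch")

print(usable_gb(8, 73, 1))   # mirrored: 292 GB usable
print(usable_gb(8, 73, 5))   # parity:   511 GB usable
```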
Mixing and matching
The last storage consideration is making good use of the storage systems you already own. While it would be nice to buy the latest large-cache, highly available subsystem every time you launch a new database application, that's usually not realistic.
It's more likely that you'll have to make the most of existing storage, freeing space on more expensive systems by moving less performance- and availability-critical data components to less costly storage gear.
In our example we use one Integrated Cached Disk Array (EMC/HDS-class) and one Intelligent Controller Array (LSI/Clariion-class). Although we're using EMC and LSI in the example, the important distinction is the class of storage. The high-end class includes EMC Symmetrix, HDS Lightning, and IBM Shark, while the controller-based class includes LSI, HDS Thunder, NetApp and others. Indexes, redo logs and high-performance tables are most appropriate for the cached array. We'll place the archives and low-performance tablespaces on the controller-based subsystem (see "Logically dividing database storage," which shows the subsystem placement for the database objects).
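The placement just described can be summarized as a simple lookup. This is only a sketch of the article's example; the tier labels and object names are illustrative, not product or schema configuration.

```python
# Tiered placement from the example: hot, protection-critical objects on
# the cached array (ICDA); colder data on the controller-based array.
PLACEMENT = {
    "indexes":              "cached_array",      # EMC/HDS-class ICDA
    "redo_logs":            "cached_array",
    "high_perf_tables":     "cached_array",
    "archives":             "controller_array",  # LSI/Clariion-class
    "low_perf_tablespaces": "controller_array",
}

def tier_for(db_object):
    """Return the storage tier assigned to a database object class."""
    return PLACEMENT[db_object]

print(tier_for("redo_logs"))
print(tier_for("archives"))
```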
This was first published in February 2003