This tip is brought to you by SearchStorage.com. Click to search through more storage tips.
By submitting your email address, you agree to receive emails regarding relevant topic offers from TechTarget and its partners. You can withdraw your consent at any time. Contact TechTarget at 275 Grove Street, Newton, MA.
Performance is a major issue for databases. In large databases it permeates everything from the design of the database to the design of the storage the database runs on. Because of the extraordinary demands an active database puts on computing resources, it is worthwhile to try to optimize everything.
But optimization assumes a target for which you can optimize, and, unfortunately, many databases offer no such target. Or, more correctly, there are multiple targets because several kinds of operations are being performed against the same database (reads, writes, transactions, reports, etc.), all with relatively high priority. Each of these operations has different requirements and all too often the requirements have contradictory needs. And, of course, usage patterns on many databases aren't stable over time. A database may start out being transaction heavy and gradually acquire more and more use for generating reports. Or the average size of objects in the database may change over time. All these things make it hard to optimize the storage arrangements for the database.
One possible solution to database optimization is to store the information in different databases, such as a production database and a data warehouse, and optimize each for the operations to be performed on them.
A more radical solution is to choose high-performing storage architectures and use them in spite of the special characteristics. That idea got its first big push several years ago when IBM ran a series of tests on a DB2 database running on an ESS (a.k.a. "Shark") storage system using different storage strategies. IBM found that it got the best performance when it spread the data over as many RAID arrays as possible, intermixing table, index and tempspace across all the disks to avoid unbalanced I/O loads. (You can find this mentioned in an IBM white paper on the Red Brick data warehouse).
One such approach is SAME -- or Stripe And Mirror Everything. SAME is the brainchild of Juan Loaiza, vice president of Oracle's Systems Technology Group. As the name implies, the SAME approach consists of striping and mirroring all the data of all types across all the available drives. The result, Loaiza says, is to maximize individual disk performance as well as the I/O bandwidth of the storage system, and equalizing the load on all the disks.
Striping and Mirroring, which spreads data across multiple spindles and makes a mirror copy for data protection, is widely recognized as one of the fastest storage schemes available for databases on hard disks. Not only is it fast on reads and writes, it provides nearly instantaneous recovery in the case of a failure. SAME uses the features in modern storage-management software and controllers to optimize the performance from the system.
SAME is controversial because it is counter-intuitive. It seems obvious that careful placement of files and tuning of storage parameters can outperform a simple stripe and mirror. SAME's advocates admit this is true -- under certain conditions. Those conditions include an application carefully written to avoid disk contention, never needing the full bandwidth of all the disks and that no disk ever becomes a hot spot. In the real world those conditions are unlikely to hold, at least for any length of time, so it makes more sense to adopt an approach that gives you 90 percent of maximum performance on all the major parameters almost all the time rather than laboriously tuning everything.
There is more to SAME than this. For example the SAME approach involves more than just striping and mirroring. For example, SAME puts the stripes containing 'hot', or frequently used, data on the outside area of each disk where the sectors are the densest and performance is highest. 'Colder' data goes on partitions striped across the inner parts of the disks. Similarly, SAME gets some of its advantages from features built into Oracle databases, such as automatically treating sequential and random operations differently.
Of course, one reason SAME works so well on databases is that storage is usually not the major determinant of database performance. While truly inept decisions on storage block sizes, RAID levels and such can severely degrade database performance, the major bottlenecks are usually found in the design of the database itself. The wrong block size, for example, has a minimal effect on performance compared to something like poor table design or inept SQL coding.
Not everyone is sold on SAME, even for Oracle databases. The critics point out that the SAME methodology is based on a number of assumptions about the nature of the databases and their storage problems. Those assumptions don't hold true in many cases, they say, and as a result, SAME produces less-than-optimum results in many cases. For one such criticism see the white paper titled "Proactive Oracle Storage Management -- A Method To Predictable Performance Management" from Quest Software.
Rick Cook has been writing about mass storage since the days when the term meant an 80K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last twenty years he has been a freelance writer specializing in storage and other computer issues.