This tip is brought to you by SearchStorage.com.
Typically the world of databases is
Take the database at YP.Net, a leading online yellow pages company. The company's 4 terabyte database contains more than 16 million records of businesses in the United States and Canada and is accessed more than 3 million times a day by users looking up businesses, with each lookup usually involving a few fairly small records (typically less than 100,000). This kind of user load is typical of a transactional database. However, in a conventional transactional database, the activity consists of a mixture of reads and writes because the records are changed frequently. With a 'read-only transactional database' like YP.Net's, records are updated infrequently. Most of the write activity consists of loading new businesses into the database at predetermined intervals.
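To put the access figure in perspective, the short Python sketch below converts the article's 3 million lookups a day into an average per-second rate. The function name is made up for illustration, and the result is only a daily average; a bursty workload will peak well above it.

```python
# Figure taken from the article: more than 3 million lookups a day.
LOOKUPS_PER_DAY = 3_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_lookups_per_second(lookups_per_day: int) -> float:
    """Return the mean lookup rate, assuming load is spread evenly over a day."""
    return lookups_per_day / SECONDS_PER_DAY


avg = average_lookups_per_second(LOOKUPS_PER_DAY)
print(f"{avg:.1f} lookups per second on average")
```

Real traffic is not spread evenly, so the storage has to be sized for the bursts rather than for this average.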
"You've got to think in terms of penalties," says YP.Net's Chief Technology Officer John Raven in explaining how he optimized the company's database for high performance.
The secret to designing a storage architecture, according to Raven, is to match the storage to the particular task at hand. Since speed of access is critical to YP.Net's business, the storage architecture has to be optimized for it. Write speed is much less critical. As a result, Raven says, the company deliberately sacrificed write speed for faster read speed.
For example, YP.Net uses NAS storage rather than a SAN. The file-oriented nature of NAS storage produces faster reads of the high volume of small records in a bursty environment than would a block-oriented SAN. The other advantage, Raven says, is that he doesn't need people who are familiar with Fibre Channel as well as Ethernet.
To further increase read speed, the database is 'striped wide' -- with records spread across multiple controllers -- rather than 'striped deep' as in a conventional transactional database. A striped wide architecture has parts of each record on disk arrays managed by several controllers. A striped deep architecture keeps the entire record in the array managed by a single controller (if possible). Striping wide speeds up reads at the cost of slowing down writes, Raven explains, since a commit across multiple controllers is considerably slower than a commit involving a single controller.
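The difference between the two layouts can be sketched in a few lines of Python. This is a toy model, not YP.Net's implementation: the function names, the round-robin placement rule and the idea of fixed-size "chunks" are all assumptions made for illustration.

```python
from typing import Dict, List


def place_wide(record_id: int, num_chunks: int, num_controllers: int) -> Dict[int, List[int]]:
    """Striped wide: spread a record's chunks round-robin across controllers."""
    placement: Dict[int, List[int]] = {}
    for chunk in range(num_chunks):
        controller = (record_id + chunk) % num_controllers
        placement.setdefault(controller, []).append(chunk)
    return placement


def place_deep(record_id: int, num_chunks: int, num_controllers: int) -> Dict[int, List[int]]:
    """Striped deep: keep every chunk of a record behind a single controller."""
    controller = record_id % num_controllers
    return {controller: list(range(num_chunks))}


# Hypothetical record 7, split into 4 chunks, on a system with 4 controllers.
wide = place_wide(7, num_chunks=4, num_controllers=4)
deep = place_deep(7, num_chunks=4, num_controllers=4)

# The number of keys is the number of controllers a read can use in parallel,
# and also the number of controllers a write has to commit across.
print("wide touches", len(wide), "controllers:", wide)
print("deep touches", len(deep), "controller:", deep)
```

In the wide layout each controller serves one chunk, so a read can proceed on all four at once, while a write has to wait for all four to commit. In the deep layout both the read and the commit involve a single controller.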
Disk selection is another important factor in optimizing performance. Since the database consists of many small records, YP.Net prefers spindles to platters -- a lot of relatively low-capacity hard disks rather than fewer, higher-capacity disks. Higher-capacity disks might give better performance with large files, but all other things being equal, more spindles mean faster access, especially with smaller records. The only problem, Raven says, is that it is becoming hard to find 18 GB drives with high rotational speeds because manufacturers are moving to much higher capacity drives.
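A rough calculation shows why spindle count matters. The Python sketch below uses the 4 terabyte database size and 18 GB drive size from the article; the 146 GB comparison drive and the per-drive random-read rate are assumptions chosen only for illustration, and the model ignores RAID overhead, caching and controller limits.

```python
import math

DATABASE_GB = 4_000      # 4 terabytes from the article, in decimal gigabytes
SMALL_DRIVE_GB = 18      # drive size named in the article
LARGE_DRIVE_GB = 146     # hypothetical larger drive, for comparison only
IOPS_PER_SPINDLE = 150   # assumed random reads per second per drive


def spindles_needed(total_gb: int, drive_gb: int) -> int:
    """Minimum number of drives required to hold total_gb of data."""
    return math.ceil(total_gb / drive_gb)


def aggregate_iops(spindles: int, iops_per_spindle: int = IOPS_PER_SPINDLE) -> int:
    """Idealized total random-read rate if every spindle works in parallel."""
    return spindles * iops_per_spindle


for drive_gb in (SMALL_DRIVE_GB, LARGE_DRIVE_GB):
    n = spindles_needed(DATABASE_GB, drive_gb)
    print(f"{drive_gb} GB drives: {n} spindles, about {aggregate_iops(n)} random reads/s")
```

Under these assumptions the same data sits on roughly eight times as many spindles when 18 GB drives are used, so roughly eight times as many small reads can be in flight at once.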
Of course, storage isn't the only consideration in designing an architecture. Raven pointed out that in designing a highly optimized system, every layer, from the application down to the hardware, has to be chosen and tuned to support the ultimate goal.
For more information:
Tip: NAS pays dividends for bank conglomerate
Tip: What's the best network storage for databases?
Tip: Next-generation NAS
About the author: Rick Cook has been writing about mass storage since the days when the term meant an 80K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last twenty years he has been a freelance writer specializing in storage and other computer issues.
This was first published in March 2004