What are key bottlenecks for data warehousing/business intelligence applications?
Data warehousing and business intelligence (decision support) applications tend to be based on very large databases, sometimes terabytes in size. Disk I/O is almost always the biggest bottleneck for these applications. Ad hoc queries or cube queries against these databases force the server to wade through millions of rows of table data looking for matches for the query. As a result, the workload on the back-end disk subsystem is mostly read access.
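To put a rough number on why disk I/O dominates, here is a minimal back-of-the-envelope sketch in Python. It assumes a query has to read an entire table sequentially at a sustained rate, with no cache hits and no index to narrow the scan. The function name and the throughput figures in the loop are illustrative assumptions, not measurements from any particular array.

```python
def full_scan_seconds(table_bytes: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to read table_bytes at a sustained rate in MB/s.

    Assumes 1 MB = 10**6 bytes and a purely sequential read with no
    cache hits, so this is a best-case estimate for a full scan.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return table_bytes / (throughput_mb_per_s * 1_000_000)


TB = 10**12  # one terabyte, decimal

# Hypothetical sustained throughput figures, chosen only for illustration.
for mb_s in (100, 200, 400, 800):
    secs = full_scan_seconds(1 * TB, mb_s)
    print(f"{mb_s:>4} MB/s: {secs / 60:.1f} minutes to scan 1 TB")
```

Under these assumptions, doubling the sustained throughput halves the scan time, which is why the rest of this answer concentrates on moving more MB per second out of the array.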
These reads may at times be sequential but, depending on the query type, can also be random. You need a disk subsystem that has a large amount of front-end cache, uses an efficient read-ahead cache algorithm, does random-read pre-fetching very well and has multiple connection ports. Your database server should use multiple HBAs connected to multiple ports on the storage array. For best performance, spread the I/O load across as many physical spindles inside the array as possible. Use a fast file system on the host and create LUNs for the database that are spread across the HBAs. Data warehousing is one of the applications that benefits from 2Gbit connections: since you will be trying to pull huge amounts of data from the disks into system memory, high throughput is what you're looking for. You want to be able to pull as many MB per second out of your array as possible. Use 15K RPM spindles that have a large on-board cache (the good ones have a 4MB cache on board).
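The sizing advice above can be sketched as a simple calculation: the read throughput the host can see is capped by whichever is smaller, the combined bandwidth of the HBA links or the combined rate of the spindles behind them. The sketch below assumes roughly 200 MB/s of payload per 2Gbit Fibre Channel link in one direction (the nominal figure) and takes the per-spindle rate as an input, since it varies widely with drive model and access pattern. The function names and the 10 MB/s per-spindle figure are hypothetical, and array cache hits are ignored.

```python
import math

# Nominal payload rate of one 2Gbit Fibre Channel link, one direction.
FC_2GBIT_MB_PER_S = 200


def links_needed(target_mb_per_s: float,
                 link_mb_per_s: float = FC_2GBIT_MB_PER_S) -> int:
    """Smallest number of HBA links whose combined rate meets the target."""
    return math.ceil(target_mb_per_s / link_mb_per_s)


def deliverable_mb_per_s(hba_links: int, spindles: int,
                         per_spindle_mb_per_s: float,
                         link_mb_per_s: float = FC_2GBIT_MB_PER_S) -> float:
    """Throughput ceiling: the lesser of link bandwidth and spindle bandwidth.

    Ignores array cache hits, so it models reads that go to disk.
    """
    return min(hba_links * link_mb_per_s, spindles * per_spindle_mb_per_s)


# Hypothetical array: 48 spindles at an assumed 10 MB/s each for a mixed
# sequential and random read workload.
print(deliverable_mb_per_s(hba_links=2, spindles=48, per_spindle_mb_per_s=10))
print(deliverable_mb_per_s(hba_links=4, spindles=48, per_spindle_mb_per_s=10))
print(links_needed(480))
```

With these made-up numbers, two links cap the host at 400 MB/s even though the spindles could supply 480 MB/s, while four links leave the spindles as the limit. Adding HBAs only helps until the disks become the ceiling, and adding spindles only helps until the links do.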
Last but not least, use a server that runs a 64-bit operating system on 64-bit processors and has plenty of system cache and an extremely fast internal architecture, with multiple PCI-X buses if possible.
Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.
Related Q&A from Christopher Poelker
RAID can allow for better storage performance and higher availability, and there are many different RAID types. Read a comparison of RAID levels, as ...
SAN expert Chris Poelker discusses how to change the size of a LUN in a Microsoft cluster server environment.
SAN expert Chris Poelker compares connecting a SAN with wavelength cabling and dark fiber and discusses the pros and cons of each.