Best practices in data warehouse storage infrastructures

By Stephen J. Bigelow, Features Writer
16 Jun 2006 | SearchStorage.com

Today's business isn't just about putting products on the market. It's about getting the right product to the right consumer at the right time. But such precision requires much more than a solid product and keen business sense -- it takes business intelligence to cull vast reservoirs of seemingly disparate data and discern the trends and patterns that human expertise simply cannot see.

Data warehouses are the foundation of business intelligence (BI), storing the huge databases, consumer records and other data resources that data warehousing applications draw from. Storage therefore plays a pivotal role in any data warehousing effort. "Think of 'information' as a needle in a haystack," says Arun Taneja, consulting analyst and founder of the Taneja Group. "Data warehousing collects a lot of 'hay' and looks for that 'needle'." Consequently, storage systems must provide ample capacity and performance. Let's take a look at the characteristics and considerations of data warehouse storage.

Data warehouse storage characteristics

Generally speaking, a data warehouse can easily be assembled using the same drive types and storage arrays that service other aspects of the organization. The common objectives of high reliability, data integrity (e.g., RAID) and good storage performance should always be considered, but data warehouse workload patterns generally favor fast sequential reads, rather than the random I/O often encountered with file systems and database queries. Sequential read performance allows storage to efficiently stream vast amounts of information to the BI applications.

In terms of disk choice, analysts note that disks should be selected to achieve a reasonable cost/performance tradeoff. High-end Fibre Channel (FC) disks running at 15,000 rpm can offer significant performance that may be ideal for busy BI platforms that have only seconds to process information, such as finding relevant products for returning e-commerce site visitors. Still, these disks are expensive and their capacity is limited, so more of them are needed to reach a given capacity, which forces an even larger storage investment.

But high-end disks are not always necessary or appropriate. "Data warehouses are not update intensive," says Greg Schulz, founder and senior analyst at the Storage I/O Group. "Other than adding data to the warehouse, there are not a lot of transactions taking place. In some cases, a data warehouse is a step right before archiving." This means slower and less-expensive 10,000 rpm FC drives can be employed in a dedicated storage area network (SAN). The use of nearline SATA drives has also become very appealing for many data warehouse systems. In fact, DATAllegro Inc. supplies dedicated data warehousing appliances based on enterprise-class SATA drives. DATAllegro's C-series appliance will soon be incorporating 500 GB 7,200 rpm Caviar RE drives.
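
One way to see what the spindle speeds above mean in practice is to compare average rotational latency, the time a drive waits for half a revolution before the requested sector passes under the head. The short Python sketch below is an illustration only, not a sizing tool: it covers just the rotational component and ignores seek time, transfer rate and caching, and the function name is made up for this example.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return 0.5 * ms_per_revolution


# Spindle speeds mentioned in the article.
for rpm in (15_000, 10_000, 7_200):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
# prints:
#  15000 rpm: 2.00 ms
#  10000 rpm: 3.00 ms
#   7200 rpm: 4.17 ms
```

This delay is paid on every random access, but a long sequential read pays it roughly once and then streams data. That is part of why the slower, cheaper drives can remain a reasonable fit for the sequential-read workloads described earlier.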

Hardware vs. software

So is it better to use a dedicated data warehouse appliance with its own internal storage or to select a data warehousing application that runs on your existing storage infrastructure? Either can be the right answer, depending on your business objectives. The tradeoff is often in efficiency. For example, data warehouse implementations based on established database software like Oracle, IBM's DB2, Microsoft's SQL Server or NCR's Teradata offer flexibility and can often tackle a wide variety of business problems -- though sometimes at the expense of efficiency.

Conversely, a dedicated appliance may be tailored to handle specific or more complex business problems faster than software-only products. "If it [BI] becomes a larger problem, then optimized solutions like a Netezza Corp. or DATAllegro will always find a way to beat the regular, standard, run-of-the-mill solutions," Taneja says. The trick, Taneja says, is not achieving the fastest possible response time but getting the most benefit from each query. For example, getting a response two minutes sooner does not benefit the organization unless it translates into additional profit.

When considering an approach to data warehousing, it's also important to look ahead to changing business needs. There is certainly nothing wrong with Oracle or SQL Server handling data warehouse tasks on a SAN, but data warehouses are not static entities, and long-term scalability is always an issue. "The business always wants to put more data in, more users, more queries and growing at 60%-100% per annum in terms of data," says Stuart Frost, CEO at appliance provider DATAllegro. "Those [software-based] solutions very quickly get overwhelmed by demands from the business."
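
To put the 60%-100% annual growth range Frost cites in perspective, the rough Python sketch below compounds a starting capacity year over year. The 10 TB starting size and the five-year horizon are illustrative assumptions, not figures from the article, and the function name is made up for this example.

```python
def project_capacity(initial_tb, annual_growth, years):
    """Return capacity in TB at year 0 through `years`, compounding once per year."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical 10 TB warehouse projected over five years.
low = project_capacity(10.0, 0.60, 5)   # 60% growth per annum
high = project_capacity(10.0, 1.00, 5)  # 100% growth per annum

print(f"After 5 years at  60%: {low[-1]:.1f} TB")   # about 104.9 TB
print(f"After 5 years at 100%: {high[-1]:.1f} TB")  # 320.0 TB
```

Under these assumptions the warehouse grows roughly tenfold at the low end of the range and 32-fold at the high end, which shows how quickly a system sized for today's data can be outgrown.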

Cliff Longman, chief technology officer at software provider Kalido Inc., concedes that dedicated appliances can potentially provide some performance benefits for data warehousing tasks but warns that they, too, can be adversely affected by changing business needs. "You should really weigh the business risk and benefits against the technical cost and opportunity for potentially improved performance," Longman says.



