Data warehouses are the foundation of business intelligence (BI), storing the large databases, consumer records and other data resources that data warehousing applications draw from. Storage therefore plays a pivotal role in any data warehousing effort. "Think of 'information' as a needle in a haystack," says Arun Taneja, consulting analyst and founder of the Taneja Group. "Data warehousing collects a lot of 'hay' and looks for that 'needle'." Consequently, storage systems must provide ample capacity and performance. Let's take a look at the characteristics and considerations of data warehouse storage.
Data warehouse storage characteristics
Generally speaking, a data warehouse can easily be assembled using the same drive types and storage arrays that serve other parts of the organization. The usual objectives of high reliability, data integrity (e.g., RAID) and good storage performance still apply, but data warehouse workloads generally favor fast sequential reads over the random I/O typical of file systems and transactional database queries. Strong sequential read performance lets storage efficiently stream vast amounts of information to BI applications.
But high-end disks are not always necessary or appropriate. "Data warehouses are not update intensive," says Greg Schulz, founder and senior analyst at the Storage I/O Group. "Other than adding data to the warehouse, there are not a lot of transactions taking place. In some cases, a data warehouse is a step right before archiving." This means slower, less expensive 10,000 rpm Fibre Channel (FC) drives can be used in a dedicated storage area network (SAN). Nearline SATA drives have also become very appealing for many data warehouse systems. In fact, DATAllegro Inc. supplies dedicated data warehousing appliances based on enterprise-class SATA drives; its C-series appliance will soon incorporate 500 GB, 7,200 rpm Caviar RE drives.
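To make the sequential-versus-random distinction concrete, here is a rough back-of-the-envelope model in Python. It is a sketch, not a benchmark: the seek time, transfer rate, block size and data volume are hypothetical round numbers, not specifications for any real drive, and the model simplifies by giving both spindle speeds the same transfer rate. Only the rotational latency follows directly from spindle speed (on average, half a revolution).

```python
# Back-of-the-envelope disk scan model. All drive parameters used below are
# hypothetical, illustrative values, not vendor specifications.

def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency is half of one revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def scan_time_s(total_mb: float, block_kb: float, rpm: float,
                seek_ms: float, transfer_mb_s: float, sequential: bool) -> float:
    """Estimate the seconds needed to read total_mb from a single drive."""
    transfer_s = total_mb / transfer_mb_s  # time spent actually moving data
    positioning_ms = seek_ms + avg_rotational_latency_ms(rpm)
    if sequential:
        # One positioning delay, then the data streams off the platter.
        return positioning_ms / 1000.0 + transfer_s
    # Random I/O: every block pays a seek plus rotational latency.
    blocks = total_mb * 1024 / block_kb
    return blocks * positioning_ms / 1000.0 + transfer_s


if __name__ == "__main__":
    # Hypothetical workload: about 10 GB read in 8 KB blocks,
    # 8.5 ms average seek, 60 MB/s sustained transfer rate.
    for rpm in (7_200, 10_000):
        seq = scan_time_s(10_000, 8, rpm, 8.5, 60, sequential=True)
        rnd = scan_time_s(10_000, 8, rpm, 8.5, 60, sequential=False)
        print(f"{rpm:>6} rpm  sequential: {seq:8.1f} s  random: {rnd:9.1f} s")
```

Under these made-up numbers, the sequential scan finishes in under three minutes on either drive, while the random pattern takes more than four hours; the access pattern matters far more than the spindle speed. That helps explain why slower, cheaper drives can be adequate for scan-heavy warehouse workloads. Real drives also vary in transfer rate by spindle speed and track position, which this model ignores.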
Hardware vs. software
So is it better to use a dedicated data warehouse appliance with its own internal storage, or to run a data warehousing application on your existing storage infrastructure? Both approaches have their place, depending on your business objectives, and the tradeoff often comes down to flexibility versus efficiency. For example, data warehouse implementations based on established database software such as Oracle, IBM's DB2, Microsoft's SQL Server or NCR's Teradata offer flexibility and can tackle a wide variety of business problems, though sometimes at the expense of efficiency.
Conversely, a dedicated appliance may be tailored to handle specific or more complex business problems faster than software-only products. "If it [BI] becomes a larger problem, then optimized solutions like a Netezza Corp. or DATAllegro will always find a way to beat the regular, standard, run-of-the-mill solutions," Taneja says. The trick, he adds, is not achieving the fastest possible response time but getting the most benefit from each query: a response that arrives two minutes sooner does not benefit the organization unless it translates into additional profit.
When considering an approach to data warehousing, it's also important to anticipate changing business needs. There is certainly nothing wrong with Oracle or SQL Server handling data warehouse tasks on a SAN, but data warehouses are not static entities, and long-term scalability is always an issue. "The business always wants to put more data in, more users, more queries and growing at 60%-100% per annum in terms of data," says Stuart Frost, CEO at appliance provider DATAllegro. "Those [software-based] solutions very quickly get overwhelmed by demands from the business."
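Frost's figure of 60%-100% annual data growth compounds quickly. The short Python sketch below projects warehouse capacity over five years at both ends of that range; the 10 TB starting size is a hypothetical example, not a number from the article, and the model assumes a constant growth rate.

```python
def projected_capacity_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: capacity after `years` at a constant annual growth rate."""
    return initial_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    start_tb = 10.0  # hypothetical starting warehouse size, in terabytes
    for rate in (0.60, 1.00):
        sizes = [projected_capacity_tb(start_tb, rate, y) for y in range(6)]
        print(f"{rate:.0%} growth: " + ", ".join(f"{s:,.0f} TB" for s in sizes))
```

At 60% a year, the warehouse grows roughly tenfold in five years (1.6^5 ≈ 10.5); at 100% a year, it grows 32-fold. Either way, a platform sized for today's data can be outgrown within a few budget cycles.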
Cliff Longman, chief technology officer at software provider Kalido Inc., concedes that dedicated appliances can provide some performance benefits for data warehousing tasks, but warns that appliances can also be adversely affected by changing business needs. "You should really weigh the business risk and benefits against the technical cost and opportunity for potentially improved performance," Longman says.