Best practices in data warehouse storage infrastructures

Stephen J. Bigelow, WinIT

Today's business isn't just about putting products on the market. It's about getting the right product to the right consumer at the right time. But such precision requires much more than a solid product and keen business sense -- it takes business intelligence to cull vast reservoirs of seemingly disparate data and discern the trends and patterns that human expertise simply cannot see.

Data warehouses are the foundation of business intelligence, storing the huge databases, consumer records, and other data resources that data warehousing applications will draw from. Ultimately, storage plays a pivotal role in any data warehousing effort. "Think of 'information' as a needle in a haystack," says Arun Taneja, consulting analyst and founder of the Taneja Group. "Data warehousing collects a lot of 'hay' and looks for that 'needle'." Consequently, storage systems must provide ample capacity and performance. Let's take a look at the characteristics and considerations of data warehouse storage.

Data warehouse storage characteristics

Generally speaking, a data warehouse can easily be assembled using the same drive types and storage arrays that service other aspects of the organization. The common objectives of high reliability, data integrity (e.g., RAID) and good storage performance should always be considered, but data warehouse workload patterns generally favor fast sequential reads, rather than the random I/O often encountered with file systems and database queries. Sequential read performance allows storage to efficiently stream vast amounts of information to the BI applications.
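To make the value of sequential read throughput concrete, here is a rough back-of-the-envelope sketch of how long a full scan might take. The function name, the dataset size, the per-drive throughput and the drive count are all hypothetical figures chosen for illustration, and the calculation assumes throughput scales linearly across drives, which real arrays only approximate.

```python
def full_scan_seconds(dataset_gb, per_drive_mb_s, drives):
    """Estimate seconds needed to stream a dataset sequentially.

    Assumes (optimistically) that aggregate throughput is simply
    per-drive throughput multiplied by the number of drives.
    """
    if dataset_gb < 0 or per_drive_mb_s <= 0 or drives <= 0:
        raise ValueError("sizes and rates must be positive")
    total_mb = dataset_gb * 1000  # decimal units: 1 GB = 1000 MB
    aggregate_mb_s = per_drive_mb_s * drives
    return total_mb / aggregate_mb_s


# Hypothetical example: 2 TB of data, 16 drives at 80 MB/s each.
seconds = full_scan_seconds(dataset_gb=2000, per_drive_mb_s=80, drives=16)
print(f"Estimated full scan: {seconds / 60:.1f} minutes")
```

Under those assumed figures the scan takes roughly 26 minutes; doubling either the drive count or the per-drive streaming rate would halve it, which is why sustained sequential throughput matters more here than random I/O performance.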

In terms of disk choice, analysts note that disks should be selected to achieve a reasonable cost/performance tradeoff. High-end Fibre Channel (FC) disks running at 15,000 rpm can offer significant performance that may be ideal for busy BI platforms that have only seconds to process information, such as finding relevant products for returning e-commerce site visitors. Still, the disks are expensive and their capacity is limited, forcing an even larger storage investment.

But high-end disks are not always necessary or appropriate. "Data warehouses are not update intensive," says Greg Schulz, founder and senior analyst at the Storage I/O Group. "Other than adding data to the warehouse, there are not a lot of transactions taking place. In some cases, a data warehouse is a step right before archiving." This means slower and less-expensive 10,000 rpm FC drives can be employed in a dedicated storage area network (SAN). The use of nearline SATA drives has also become very appealing for many data warehouse systems. In fact, DATAllegro Inc. supplies dedicated data warehousing appliances based on enterprise-class SATA drives. DATAllegro's C-series appliance will soon be incorporating 500 GB 7,200 rpm Caviar RE drives.

Hardware vs. software

So is it better to use a dedicated data warehouse appliance with its own internal storage or select a data warehousing application to run on your existing storage infrastructure? The answer is "both," depending on your business objectives. The tradeoff is often in efficiency. For example, data warehouse implementations based on established database software like Oracle, IBM's DB2, Microsoft's SQL Server, or NCR's Teradata offer flexibility and can often tackle a wide variety of business problems -- though sometimes at the expense of efficiency.

Conversely, a dedicated appliance may be tailored to handle specific or more complex business problems faster than software-only products. "If it [BI] becomes a larger problem, then optimized solutions like a Netezza Corp. or DATAllegro will always find a way to beat the regular, standard, run-of-the-mill solutions," Taneja says. The trick, Taneja says, is not achieving the fastest possible response time but getting the most benefit from each query. For example, getting a response two minutes sooner does not benefit the organization unless it translates into additional profit.

When considering an approach to data warehousing, it's also important to look ahead to changing business needs. There is certainly nothing wrong with Oracle or SQL Server handling data warehouse tasks on a SAN, but data warehouses are not static entities, and long-term scalability is always an issue. "The business always wants to put more data in, more users, more queries and growing at 60%-100% per annum in terms of data," says Stuart Frost, CEO at appliance provider DATAllegro. "Those [software-based] solutions very quickly get overwhelmed by demands from the business."
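The 60%-100% annual growth range Frost cites compounds quickly, and a short calculation shows how fast. The sketch below is only illustrative: the function name and the 10 TB starting size are assumptions, not figures from any vendor or analyst quoted here.

```python
def project_capacity(initial_tb, annual_growth, years):
    """Return projected data volume (TB) for year 0 through `years`,
    compounding once per year at `annual_growth` (0.60 means 60%)."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical 10 TB warehouse, projected three years out at both
# ends of the quoted 60%-100% growth range.
low = project_capacity(10.0, 0.60, 3)
high = project_capacity(10.0, 1.00, 3)

for year, (lo_tb, hi_tb) in enumerate(zip(low, high)):
    print(f"Year {year}: {lo_tb:.1f} TB to {hi_tb:.1f} TB")
```

With these assumed inputs, the warehouse grows to somewhere between about 41 TB and 80 TB within three years, which is four to eight times the original footprint.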

Cliff Longman, chief technology officer at software provider Kalido Inc., concedes that dedicated appliances can potentially provide some performance benefits for data warehousing tasks but warns that they can also be adversely affected by changing business needs. "You should really weigh the business risk and benefits against the technical cost and opportunity for potentially improved performance," Longman says.
