Q

Understanding big data analytics and data warehousing

Understanding big data analytics, and how it differs from data warehousing, depends on time to information, content complexity and cost.

This Content Component encountered an error

What's the best way to distinguish big data analytics from data warehousing?

My answer will likely differ from those who like the "Three Vs" approach -- volume, velocity and variety -- to big data analytics. I think of three distinguishing characteristics as well, just not the same ones. Distributed computing architectures became the platforms for alternative ways to do analytics versus a traditional data warehouse for three reasons:

Time to information. While an immediate response to a query isn't an absolute requirement, a response time of five seconds or fewer is often desired and can be delivered by a distributed computing cluster running Hadoop MapReduce, for example. Traditional data warehouses are burdened by the perception that a batch process runs overnight to produce results that are available to decision makers the following morning. Another concept popular with new analytics platforms is the ability to do reiterative queries -- i.e., run a query, get results, then run a second query against those results and/or converge with others, for example. Data warehouses aren't known for ad hoc querying. However, data warehousing vendors are catching up in these areas.

Complex content. Again, the perception is that a traditional data warehouse runs analytics processes against structured database data -- a small subset of the kinds of data business executives want to get their analytical arms around. The new analytics platforms can process unstructured data generated, for example, on the Web and via mobile devices, as well as structured data. However, traditional data warehouse vendors are catching up here as well.

Cost. Once you add up the costs for hardware and software, a traditional data warehouse is relatively expensive compared to a 60-node Hadoop cluster built on commodity server racks and open source software.

Hadoop and platforms like it came about specifically because a traditional data warehouse couldn't, for a relatively low cost, scale to handle large, complex data sets and deliver sub-five-second response times.

This was first published in November 2012
This Content Component encountered an error

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchSolidStateStorage

SearchVirtualStorage

SearchCloudStorage

SearchDisasterRecovery

SearchDataBackup

Close