
Are SSDs necessary to perform analytics on big data efficiently?

Storage expert Phil Goodwin explains how SSDs can speed up analytics, but solid-state technology is only needed in certain types of environments.

Are solid-state drives necessary to perform analytics on big data efficiently?

To begin with, let me describe the three different kinds of solid-state drive (SSD) deployments. The first is server-side cache, where an SSD is installed directly in the server. The second is storage-side cache, which I would characterize as Tier 0, where SSD occupies a dedicated layer in the array and works with automated storage tiering. And the third is an all-SSD storage array.

Now, to answer the question directly: Are SSDs necessary to perform analytics on big data efficiently? The answer is no, but it depends on whether your environment is CPU-bound or I/O-bound. In analytics, there are two important components: processing and I/O. If you're CPU-bound -- that is, the bottleneck is on the processing side -- then more I/O isn't going to buy you much; you really need a faster processor. On the other hand, if you're reading huge amounts of data -- pulling in large data sets through sequential reads, or reading the same data recursively -- then you could well be I/O-bound, and SSD is certainly going to help you perform big data analytics efficiently.
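As a rough first check on which side of that line a job falls, you can compare the CPU time it consumes to the wall-clock time it takes. Below is a minimal Python sketch of that idea; profile_job is a hypothetical helper, the job it wraps is a placeholder, and the 0.8 threshold is an arbitrary rule of thumb (assuming a single-threaded job), not a standard metric.

import time

def profile_job(job, *args):
    # Compare CPU time consumed to wall-clock time elapsed to get a
    # rough read on whether a single-threaded job is CPU- or I/O-bound.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    job(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used else 0.0
    # Near 1.0: the CPU was busy almost the whole time (CPU-bound).
    # Well below 1.0: the process mostly waited, typically on I/O.
    kind = "CPU-bound" if ratio > 0.8 else "likely I/O-bound"
    print("CPU %.2fs / wall %.2fs (ratio %.2f) -> %s"
          % (cpu_used, wall_used, ratio, kind))

If the ratio comes out low, faster storage can shrink the waiting time; if it sits near 1.0, an SSD won't move the needle and you need more or faster processors.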

So if an environment is I/O-bound, the question becomes: Which SSD deployment is best? In many cases, if you're reading the same data over and over again, you're going to be better off with server-side cache or the Tier 0 approach. On the other hand, if you're reading huge amounts of data sequentially rather than recursively, you may actually be better off with an all-SSD storage array, where you get high performance across the entire data set rather than just the cached portion, as the sketch below illustrates.
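To see why the access pattern matters, here is a minimal Python sketch of the least-recently-used behavior that a server-side or Tier 0 cache depends on; ReadCache and backend_read are hypothetical stand-ins for illustration, not any vendor's implementation.

from collections import OrderedDict

class ReadCache:
    # Tiny LRU read cache standing in for a server-side SSD cache.
    def __init__(self, capacity, backend_read):
        self.capacity = capacity
        self.backend_read = backend_read  # slow path: disk or array read
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.backend_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data

Reading a small hot set over and over drives the hit rate toward 100%, so the cache pays for itself. A one-pass sequential scan never revisits a block, the hit rate stays at zero, and only an all-SSD array speeds up every read.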

About the author:
Phil Goodwin is a storage consultant and frequent TechTarget contributor.

This was last published in February 2014




Join the conversation

1 comment


SSDs are a natural fit for random workloads with a dataset size of <64k. Traditional mechanical disks may often be a better fit for DW/BI workloads with careful choice of RAID types. But check out the Microsoft white paper "Performance Evaluation of XtremIO," where an EMC XtremIO all-flash (SSD) array exceeds expectations in a SQL Server 2012 BI environment: http://msdn.microsoft.com/en-us/library/dn495615.aspx (...I am an EMC employee, but I actually test this stuff for real... http://www.linkedin.com/profile/view?id=43664640&trk)
