Understanding stripped-down hyperscale storage for big data use cases

Stripped-down hyperscale storage provides rapid, efficient expansion to handle big data use cases such as Web serving and database applications.

The term hyperscale storage is coming into vogue to describe systems capable of rapid, efficient expansion to handle massive quantities of data from Web-serving, database, data analysis, high-performance computing and other especially busy applications.

Technical specifications aren't the only things that differentiate hyperscale storage from traditional enterprise systems. The distinctions also extend to the mindset of the buyer or end user, such as Facebook and Google, according to Dave Vellante, chief analyst and co-founder of Wikibon, a community-focused research firm based in Marlborough, Mass.

In this podcast interview with TechTarget senior writer Carol Sliwa, Vellante explored use cases and real-world examples, adoption trends and lessons that a typical IT shop can take away from the practitioners of hyperscale storage.

How do you define hyperscale storage, and what are the main use cases?

Dave Vellante: I just try to keep it simple. Hyper, to me, is excessive, and scale implies growth. So, hyperscale is a system capable of growing from really small to very large in a very quick and efficient manner.

As far as the main use cases, I think it's predominantly Web and Web serving, but increasingly, it's a lot of database applications. In particular, ad serving is very popular, and anybody who has a massive storage repository, like service-provider storage, [might take the hyperscale approach]. You also see high-performance computing (HPC) practitioners moving into that hyperscale space. There are a lot of analytics and big data use cases. These are bleeding into traditional enterprises, like financial services for fraud detection. And then there's always government, like three-letter agencies, doing big-data types of work.

From a technical standpoint, in what ways does hyperscale storage differ from traditional enterprise data storage?

Vellante: The differences are both in the attributes and the technical characteristics of the system, but also [in] the mindset of the buyer and the practitioner. The first difference I would cite is the scale itself. We're talking petabytes versus terabytes. The hyperscale guys are typically serving millions or sometimes hundreds of millions of users with a single or a very small set of apps, and it's just the opposite in the enterprise, [where] it's hundreds or thousands of apps to hundreds or thousands of users.

I guess the other point is [that] hyperscale storage, and hyperscale in general, tends to be stripped down. It's de-frilled, versus beefed-up, purpose-built configurations. An example is redundant power supplies. Oftentimes, the hyperscale guys don't even bother with redundant power supplies. They just wait until it breaks and throw it out.

The other attribute of hyperscale is that it's very much software-led. We hear a lot about software-defined these days, and the hyperscale guys kind of invented that concept. When you think about a [traditional] storage product, software function is very much tied to the hardware. It's very much embedded in there.

I'd also say the hyperscale crowd is focused on super-high degrees of automation and eliminating any human involvement, whereas the traditional enterprise is trying to make IT easier to manage but largely by humans. The mindset is different in the hyperscale world. Hyperscale folks will spend a lot of time and effort through engineering to try to save money or, in many cases, make money on, for instance, a specialized app, whereas the enterprise guys will spend money on packaged solutions that will save them time across the application portfolio and [serve as] more of a general manager.

I'll give you an example. Facebook might engineer a software-based solution that allows it to upload pictures really fast for that one application, whereas a traditional enterprise shop might buy an off-the-shelf storage system to meet the needs of many applications across the portfolio.

Which products or types of products qualify as hyperscale storage?

Vellante: Products that scale out and are very simple. Commodity-based hardware can fit into a rack and help minimize the cost of managing that rack over its life by essentially eliminating humans. For example, when a device in that rack dies, [hyperscale users] switch it through software to another device, let the failed device die and eventually the rack die, and then throw it in the wood chipper.

The traditional IT guys are going to try to get as much of a return on that asset as possible, so they'll replace failed drives and power supplies, and they'll buy more sophisticated and expensive systems to meet those goals.

The types of products that fit into the model I just mentioned are very much de-frilled. A lot of times, they're just hard drives, or you can think of it as, for instance, a Fusion-io card. What makes it hyperscale is the software that is layered on top of it that can manage the entire operation, the entire infrastructure, as opposed to the situation where you're layering in component software -- for example, storage tiering software as a component that's added on to an existing infrastructure as sort of a one-off component. What makes it hyperscale is that software-led mindset and approach.
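
The fail-in-place approach Vellante describes, where software steers work away from a dead device instead of a person repairing it, can be illustrated with a rough Python sketch. Everything in it is an assumption made for illustration and does not come from the interview or from any real product: the `Device` and `Rack` classes, the byte-sum placement rule and the 50% retirement threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    healthy: bool = True


@dataclass
class Rack:
    devices: list
    retire_below: float = 0.5  # hypothetical threshold, not from the interview

    def healthy_devices(self):
        return [d for d in self.devices if d.healthy]

    def mark_failed(self, name):
        # No repair is scheduled: the device is simply left dead in place.
        for d in self.devices:
            if d.name == name:
                d.healthy = False

    def place(self, key):
        # Route each object to a surviving device; failed ones are skipped.
        alive = self.healthy_devices()
        if not alive:
            raise RuntimeError("rack has no healthy devices left")
        return alive[sum(key.encode()) % len(alive)].name

    def should_retire(self):
        # Once too few devices survive, the whole rack is decommissioned.
        return len(self.healthy_devices()) / len(self.devices) < self.retire_below


rack = Rack([Device(f"disk{i}") for i in range(4)])
rack.mark_failed("disk1")
target = rack.place("photo-123")  # never lands on the failed disk1
retire_now = rack.should_retire()  # still False with 3 of 4 devices alive
```

The sketch covers only the routing decision. A real system would also need replicas of whatever was stored on the failed device, and would likely use a placement scheme that moves less data when the set of healthy devices changes.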

How would you characterize the adoption level of hyperscale storage today versus what you expect it will be five years from now?

Vellante: The adoption is concentrated today among the Web giants, such as Facebook, Google, Amazon, Microsoft and, I think, Apple to a certain extent. So, the adoption is mixed. But if you look at Facebook's Open Compute Project, it's a classic hyperscale example of a standard architecture: no frills, software-led, Seagate disk drives and off-the-shelf chips. I think that's what's driving the model today. They buy a lot, but it's up and down and very concentrated on a few Web giants. It's hitting critical mass.

It's also very bifurcated. Some customers will buy hyperscale solutions, if you will, and others won't. For example, Google will actually build its own flash drives and put power supplies in systems.

And there's no question that a lot of traditional IT buyers are looking at hyperscale trends. As I said, it's bleeding into the IT world, the traditional world, and starting to [see adoption], especially in financial services and cloud service providers that want to compete, for example, with Amazon. I think in five years, you'll see many more classic hyperscale examples, not only in the Web but from traditional IT adopting hyperscale-like techniques. My prediction is that traditional IT suppliers will begin to adopt techniques and sell software-only function that will run on de-frilled, commodity, scale-out hardware, and eventually these two worlds will collide.

What sorts of lessons can the average IT shop take away from users of hyperscale storage?

Vellante: The first thing CIOs can take away from hyperscale is that IT actually does matter. Remember when Nick Carr wrote his now-famous book Does IT Matter? The premise was that IT can't deliver a sustainable, competitive advantage. Around that time we saw huge value creation coming from technology companies that were users of technology, like Google, Amazon and PayPal. These are technology consumers that are clearly using IT as a competitive weapon. And then following them were companies like Facebook, Twitter and LinkedIn, all technology practitioners that are using hyperscale techniques.

What CIOs can learn is that these techniques can dramatically drive costs down. We've seen situations where people are applying object stores in hyperscale, cutting costs relative to traditional storage by 75%. The point is that the next big wave is going to be data as a competitive advantage. That's where CIOs want to put their resources. They don't want to spend a lot of money and time worrying about nondifferentiated heavy lifting of infrastructure. Infrastructure in this new world, from a value-creation standpoint, will be largely irrelevant. What will be relevant is the way in which, for example, you leverage data, use data as the source of competitive advantage and monetize data. In my view anyway, traditional IT organizations want to spend less time and money managing IT infrastructure and more time creating value around data.
