With all the talk about cloud and big data, it’s hard to tell which comes first, but it just might be a cloud foundation that enables big data applications.
The seemingly inexorable march toward cloud storage has run into a detour called “big data.” It’s not that the cloud drifted away or disappeared, but the buzz about big data has captured the attention of tape, storage and database vendors, and just about everyone else in the data storage industry. Obviously, if you sell gear, you like the idea of “big data” because it means users will need more gear to handle all that big data. But what is big data?
In the long run, the term big data may be too squishy to amount to much more than a passing marketing term. But when you step back and view the big picture, “big data” is all about what you do with digital information, and that concern underlies nearly every IT change unfolding now.
Disruptive trends in IT have always revolved around new ways to interact with information. The new capabilities they unlock are usually so important that we’re willing to make huge tradeoffs to get them. We gave up solid management and availability in the shift to distributed computing, and relinquished reliability and speed in exchange for Internet-based operations. Such major shakeups have allowed us to do more with information, and to do it more easily. Big data is about taking the integration and use of information to an entirely new level. It may range from new high-performance analytics across large quantities of structured or unstructured data, to new layers of content analysis, to making better use of general metadata on massive amounts of currently stored and often little-used data. It may also involve bringing together digital information from many different data sets.
The problem is that we’re looking at big data as the driver of other technologies. At EMC World, there was some notable signage that showed cloud on top of big data. The way I see it, cloud doesn’t sit on top of big data -- it’s more a case of the other way around. More importantly, the success of cloud will be highly dependent on whether it delivers the right capabilities to unlock big data. And we’re hardly there yet.
In the last few years, wherever infrastructure technologies -- networks, storage, protection, servers, applications -- intersect, the industry has made great strides in simplifying the infrastructure and making it more efficient. But the next major challenge is information, and we have a long way to go. Consider this: How is your infrastructure enabling your IT programmers and your business to do new things with information, and to use it more effectively and efficiently than ever before?
Some vendors’ responses to that question are a little dubious. They usually involve big stacks of technology from big vendors that look like the big engine options revved up for the information speedway. That’s not to say those solutions won’t work, but every big vendor has a stack, and they all need that stack to be competitive with the other big guys. But for users, even though the big engine looks slick, it may not provide the agility for your business to maneuver its way around the hairpin turns of big data.
It isn’t good enough to build big sets of infrastructure and then have a team of technologists figure out what to do with the information that lives on that infrastructure. The cloud is about creating linkages between information and infrastructure so the infrastructure can be harnessed to turn information to new purposes. Vendors can help by providing the enabling underpinnings for applications and business users. And if the vendors that currently dominate the storage market want to dominate this new space too, they’d better move fast, because lower-profile players such as the startups Cloudera and Eucalyptus Systems, the OpenStack consortium and others are making some noise with their innovative technologies. In addition, a potentially formidable competitor (VMware) lurks in the wings. VMware is wielding a number of secret weapons, such as Cloud Foundry and SpringSource, that may someday become the building blocks for the right scale-out and orchestration technologies in this next generation of IT evolution.
There’s a lot of value in capacity-optimizing big data sets, storing them cost-effectively, and delivering tools and technologies that keep the physical infrastructure matched to the scaling demands of big data. But making big sets of data into “big data” means turning those digital bits into something far more useful to the business. This isn’t to say the widgets that support big data, such as capacity optimization, virtualization and automation, don’t have value; but to deliver on “big data” they need to come together into something that’s more than the sum of the parts. When it comes to big data, you have to consider not only how well a vendor’s converged infrastructure enables efficiency across the many different domains of IT, but also how well its vision paints a picture of an infrastructure that changes how your business creates and uses information.
BIO: Jeff Boles is a senior analyst at Taneja Group.