Big data tutorial: Everything you need to know

Learn about big data technologies and architecture, vendor developments in the big data arena and what CIOs say are their biggest challenges when implementing big data in their storage environments.

"Big data" is here to stay. Even technologists who might have previously disparaged it as a buzzphrase will now acknowledge that the term and its accompanying technologies are evolving into real-world enterprise offerings and data center strategies. This big data tutorial is designed to get data storage managers up to speed on the conversations shaping the decisions many IT managers are making about big data. An increasing number of data sources -- such as social media -- and a growing number of media-rich data types -- such as X-rays and video -- are fueling the challenges associated with big data at companies that might never have thought of themselves as big data customers. The divide between analytics and storage in the world of big data is narrowing as data storage managers find themselves tasked with designing and managing big data infrastructures. View the content in our big data tutorial to learn more about these high-transaction environments, new scale-out technologies, rising I/O demands and the latest news on Hadoop.

Big data technologies

Among the big data technologies you'll need to know are the Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes, and the MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers the results and resolves them into a single value. Also gaining attention is the Apache Hive data warehousing component, which offers a query language called HiveQL that automatically translates SQL-like queries into MapReduce jobs. Finally, learn how Microsoft is trying to get in on the Hadoop action with its own SQL Server-Hadoop connectors. Check out the links below to get more in-depth information about each of these technologies for big data.
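To make the Map and Reduce roles described above more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: it runs in a single process and only imitates the flow of a MapReduce job using the classic word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, imitating the step between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: resolve all values collected for one key into a single value."""
    return key, sum(values)


def word_count(documents):
    """Run the whole illustrative pipeline over a list of text splits."""
    intermediate = []
    # On a real cluster each split would be handled on a different node;
    # here the loop simply processes the splits one after another.
    for document in documents:
        intermediate.extend(map_phase(document))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big storage"]))
```

In a real Hadoop deployment, the framework handles splitting the input, scheduling Map tasks across nodes and grouping the intermediate keys; a tool such as Hive generates comparable jobs from a SQL-like query so that they don't have to be written by hand.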

John Webster on big data architecture

John Webster, a senior partner at Boulder, Colo.-based Evaluator Group, offers a thorough explanation of how to manage big data storage environments and Apache Hadoop technology, and provides readers with alternatives to direct-attached storage (DAS) in Hadoop storage. This four-part video presentation begins with a high-level discussion of big data architecture and closes with a technical explanation of the Hadoop Distributed File System (HDFS) and NameNode in Hadoop architectures.

Big data developments

How do you know what to focus on when it comes to big data? In this section of our tutorial on big data, we've selected the most crucial big data developments coming from the vendor sphere so data storage managers can stay up-to-date on issues surrounding the technology.

CIOs: Big data challenges

CIOs have their own big data challenges: Many feel they should take the lead in identifying those patterns that might drive better business decisions. To do this, they'll need to add to their company's big data skill set by hiring data scientists, mathematicians and information architects. Convincing the business that big data governance is a concern for the executive suite and even the boardroom is another important issue. Find out more about the challenges CIOs face with big data in this special collection of news and analysis from

Big data video

Whether it's live footage from the EMC World showroom floor or analysis from a variety of technology experts, our big data video section offers a behind-the-scenes glimpse into what users and analysts think of big data.