Essential Guide


Big data tutorial: Everything you need to know

Learn about big data technologies and architecture, vendor developments in the big data arena and what CIOs say are their biggest challenges when implementing big data in their storage environments.


Big data is here to stay. Even technologists who once dismissed it as a buzzword now acknowledge that the term and its accompanying technologies are evolving into real-world enterprise offerings and data center strategies. This big data tutorial is designed to bring data storage managers up to speed on the conversations shaping many IT managers' decisions about big data technology.

A growing number of data sources, such as social media, and of media-rich data types, such as X-rays and video, are fueling big data challenges at companies that might never have thought of themselves as big data customers. The divide between analytics and storage is narrowing as data storage managers find themselves tasked with designing and managing big data infrastructures. In addition, big data sets that include company-sensitive and personal data carry unique security and compliance requirements that managers must meet.

View the content in our big data storage tutorial to learn more about these high-transaction environments, new scale-out technologies, rising I/O demands and the latest news on Hadoop.

1. Essential tools

Big data technologies

Among the big data technologies you'll need to know is the Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes. Another is the MapReduce software framework, which consists of a Map function that processes pieces of the input in parallel on different nodes and a Reduce function that gathers the intermediate results for each key and combines them into a final value. Also gaining attention is the Apache Hive data warehousing component, which automatically translates queries written in its SQL-like language, HiveQL, into MapReduce jobs. Beyond Hive, more vendors are trying to get in on the Hadoop action with their own SQL-Hadoop connectors and query engines. Check out the links below for more in-depth information about each of these big data technologies and how newer technologies are addressing some of the big data challenges enterprises once faced.
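To make the Map and Reduce roles concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only simulates the programming model: in a real Hadoop cluster the framework splits the input across nodes and performs the shuffle step itself. The function names below are hypothetical illustrations, not Hadoop's Java API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (in Hadoop, the framework does this)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: List[str]) -> Dict[str, int]:
    # On a cluster, each record (input split) could be mapped on a different node;
    # here everything runs sequentially in one process.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is here to stay", "Hadoop runs MapReduce on big data"]
    print(word_count(docs))
```

Running the script prints one count per lowercase word; "big" and "data" each appear twice across the two sample records. Note that the Reduce step produces one value per key rather than one value for the whole data set.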


A comparison of Hadoop and MapReduce

Hadoop and MapReduce are two technologies that are independent of each other -- but when it comes to storing and analyzing big data sets, they work well together.


Using Hadoop to get a jump on big data analytics issues

Hadoop can be a cost-effective way to perform analytics on big data, but only if you know how to use it well.


Sidestepping Hadoop cluster performance issues

Even as Hadoop matures, sluggish performance remains a concern. Here are some workarounds for when your big data environment gets bogged down.


New services add innovation to Hadoop's storage infrastructure

Improvements to the Hadoop Distributed File System (HDFS), along with proprietary alternatives, are beginning to address some of the framework's shortcomings.


Catching up with the Apache Software Foundation's Hadoop and Hive

Hadoop may be one of the most notable big data technologies, but Apache Hive, with its SQL-like query language, HiveQL, makes working with that technology easier.
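Why does a SQL-like language fit Hadoop so naturally? A GROUP BY aggregate maps cleanly onto the MapReduce model: the grouping column becomes the map output key, and the aggregate is computed per key in the reduce step. The Python sketch below illustrates that idea for a COUNT(*) query. It is a simplified conceptual model, not the actual execution plan Hive generates, and the table name, columns and rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows from a table named "page_views" with columns (user_id, country).
ROWS: List[Tuple[str, str]] = [
    ("u1", "US"),
    ("u2", "DE"),
    ("u3", "US"),
    ("u4", "FR"),
    ("u5", "US"),
]

# The SQL-like query being emulated (illustrative only):
#   SELECT country, COUNT(*) FROM page_views GROUP BY country;


def map_row(row: Tuple[str, str]) -> Tuple[str, int]:
    """Map: the GROUP BY column becomes the key; each row contributes a count of 1."""
    _user_id, country = row
    return country, 1


def run_group_by_count(rows: List[Tuple[str, str]]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map + shuffle: route every (key, value) pair to its key's group.
    for key, value in map(map_row, rows):
        groups[key].append(value)
    # Reduce: COUNT(*) is the sum of the 1s emitted for each key.
    return {key: sum(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    print(run_group_by_count(ROWS))  # {'DE': 1, 'FR': 1, 'US': 3}
```

Other aggregates follow the same pattern: SUM would emit the column value instead of 1, and MAX would keep the largest value per key in the reduce step.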


How to choose the right SQL-on-Hadoop engine

Many organizations find Hadoop much easier to use with SQL-style querying, and numerous engines now make that possible.


Hadoop helps process unstructured big data in warehouse architectures

The Hadoop framework started as a way to manage big data, but it is increasingly being used in data warehousing as well.


Big data technologies work to relieve stress of performance, capacity

Vendors such as RainStor, GridIron and Quantum produce technologies that aim to address some of the common problems associated with storing big data.

2. Expert video

John Webster on big data architecture

John Webster, a senior partner at Evaluator Group in Boulder, Colo., offers a thorough explanation of how to manage big data storage environments and Apache Hadoop technology, and presents alternatives to direct-attached storage (DAS) for Hadoop. This four-part video presentation begins with a high-level discussion of big data architecture and closes with a technical explanation of the Hadoop Distributed File System (HDFS) and the NameNode in Hadoop architectures.


Explaining the science of big data management

Storage platforms for big data are becoming increasingly important because they must not only provide large amounts of capacity but also support business-critical analytics.


Hadoop compute clusters and data storage

Hadoop is the most talked-about big data framework, though commercial versions are available that aim to address some of its shortcomings.


Considerations for choosing Hadoop storage

Using DAS isn't the only option for performing analytics on big data. Here's how to evaluate alternatives to the direct-attached model.


The role of HDFS and NameNode in Apache Hadoop architectures

Hadoop's file system takes a "cheap and deep" approach: it lets clusters of low-cost commodity hardware scale to petabytes of capacity.
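One way to picture the NameNode's role is as the keeper of a block map. HDFS splits each file into large fixed-size blocks and stores several replicas of each block on different DataNodes, so the failure of one inexpensive machine does not lose data. The Python sketch below builds such a map, assuming the Hadoop 2.x-and-later default block size of 128 MB and the default replication factor of 3. The round-robin placement and the DataNode names are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
import math
from itertools import cycle
from typing import Dict, List

BLOCK_SIZE_MB = 128  # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3      # HDFS default replication factor


def plan_blocks(file_size_mb: int, datanodes: List[str],
                block_size_mb: int = BLOCK_SIZE_MB,
                replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Return a NameNode-style block map: block index -> DataNodes holding a replica.

    Simplification: replicas are assigned round-robin. Real HDFS uses a
    rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    nodes = cycle(datanodes)
    # Consecutive draws from the cycle are distinct as long as replication <= len(datanodes).
    return {block: [next(nodes) for _ in range(replication)] for block in range(num_blocks)}


def raw_storage_mb(file_size_mb: int, replication: int = REPLICATION) -> int:
    """Disk space consumed across the cluster: every byte is stored `replication` times."""
    return file_size_mb * replication


if __name__ == "__main__":
    block_map = plan_blocks(500, ["dn1", "dn2", "dn3", "dn4"])
    for block, replicas in block_map.items():
        print(block, replicas)
    print("raw MB:", raw_storage_mb(500))
```

Under these assumptions, a 500 MB file becomes four blocks (three full 128 MB blocks plus a partial one) and consumes 1,500 MB of raw disk. That threefold overhead is one reason the "cheap and deep" model depends on low-cost disks.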

3. Market updates

Big data vendor trends

How do you know what to focus on when it comes to big data? In this section of our big data tutorial, we've selected the most crucial trends coming from the vendor sphere so data storage managers can stay up to date on popular storage options in the industry.


Data lake architecture eases big data management

Data lake architectures are becoming more popular in big data environments because they scale well and can house many types of data sets in one place.


Vendors offer Hadoop connectors as part of their big data management strategies

Hadoop connector software acts as a link between databases and Hadoop clusters, and vendors such as IBM and Microsoft are making connectors part of their big data strategies.
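At its simplest, moving data from a database toward Hadoop means streaming query results into a file format that Hadoop jobs and Hive tables can read, such as tab-delimited text. The Python sketch below shows only that export step, using an in-memory SQLite database as a stand-in source. It is a hypothetical illustration, not any vendor's connector, and copying the resulting file into HDFS is left out.

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_query_as_tsv(conn: sqlite3.Connection, query: str, out: TextIO) -> int:
    """Stream the rows of a SQL query into tab-separated text; return the row count.

    Tab-delimited text is a common input format for Hadoop jobs and Hive tables.
    Copying the output into HDFS is outside the scope of this sketch.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for row in conn.execute(query):  # the cursor yields rows one at a time
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical source table standing in for an operational database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "acme", 19.5), (2, "globex", 7.0)])
    buffer = io.StringIO()
    rows = export_query_as_tsv(conn, "SELECT id, customer, total FROM orders ORDER BY id", buffer)
    print(rows, repr(buffer.getvalue()))
```

Iterating over the cursor rather than calling `fetchall()` keeps memory use flat, which matters when the source table is large.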


HGST launches helium drive-based Active Archive

The vendor's object storage archive platform is aimed at big data environments where data is accessed infrequently but large amounts of capacity are essential.


Hyperscale storage systems attract big data use cases

Hyperscale storage, made popular by Web giants such as Facebook, is increasingly being adopted for use cases such as Web serving and database applications.


Object storage attracts big data in M&E

The media and entertainment industry often works with media-rich big data, and object storage vendors say their products are a good fit.

4. CIO commentary

CIOs: Big data challenges

CIOs have their own big data challenges. Many feel they should take the lead in identifying the patterns in data that might drive better business decisions. To do this, they'll need to expand their company's big data skill set by hiring data scientists, mathematicians and information architects. Evaluating the business value that big data and analytics can provide is difficult, and building an architecture with adequate capacity and performance is a delicate task. Convincing the business that big data governance deserves attention from the executive suite, and even the boardroom, is another important hurdle. Find out more about the challenges CIOs face with big data in this special collection of news and analysis from


Firms must overcome big data, analytics bias to make better business decisions

Data experts say the human tendency to hoard data and to get stuck in old ways of doing things often prevents businesses from making innovative decisions around big data.


Break down data silos for better analysis of big data

Siloed data has hampered effective analysis for years, and in big data environments it becomes even more troublesome.


CIOs beware: Big data is not little data on steroids

According to one CIO expert, thinking about big data as a larger version of small data sets can be detrimental to computing and analysis.


How to gain business value from big data environments

By moving away from thinking of big data in terms of size, CIOs can get a better grip on the business value of storing and analyzing data sets.


Experts address big data pitfalls, factors for success

According to a number of analysts, certain best practices can determine whether businesses succeed or fail in the world of big data.


Jumping-off point for CIOs and big data starts with architecture

One of the biggest decisions CIOs will have to make when it comes to big data is whether to build or buy their architecture.


Deciding whether to buy or build big data architecture

Architectures for big data environments are not one-size-fits-all. That means CIOs need to determine what functionality they require before selecting vendors.


Big data analytics drive business value

Big data analytics can use real-time data to provide insight into business processes and trends, and can ultimately deliver significant value to enterprises. But performing analytics that produce actionable results requires skilled data analysts and an infrastructure that can handle constant processing. To help you make sure you're extracting the most valuable results from your analytics project, these big data experts explain what to watch out for.


Big data analytics: A big challenge with big results

Some experts say it's best to start small when first tackling big data analytics.


Analytics projects hinge on predictive models

Effective analytics models are those that produce results usable by business units -- but that doesn't necessarily mean analyzing the entire data set.
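One common way to avoid analyzing an entire data set is to work from a uniform random sample. Reservoir sampling (Algorithm R) draws such a sample in a single pass over a stream of unknown length while holding only the sample in memory. The Python sketch below is a textbook version of that technique; the parameter names and the demo data are illustrative assumptions, not taken from the linked article.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Draw a uniform random sample of k items from a stream in one pass (Algorithm R).

    Memory use is O(k) no matter how long the stream is, so the full data set
    never has to fit in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: item i is kept with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1, 1_000_001), k=10_000, rng=random.Random(42))
    # The sample mean should land close to the true mean of 500,000.5.
    print(sum(sample) / len(sample))
```

Passing a seeded `random.Random` makes a sample reproducible, which helps when a model built on the sample has to be audited or rerun.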


To make use of big data, beware of bad analytics

A Harvard professor recounts cases in which off-the-shelf analytics approaches generated wrong or unusable results -- the hallmark of bad analytics.


Steps to a successful big data analytics project

Performing analytics that are useful to business processes means starting out with a clearly defined goal in mind.


Quiz: Best practices for performing big data analytics

Do you know it all when it comes to big data analytics? Take the quiz to find out.


Big data security and compliance issues

Information stored in big data environments is often extremely sensitive. An organization might be storing customer data, financial records or information integral to business processes -- all information that could do significant damage to the organization if compromised. Security is sometimes built into database offerings, but it is often less comprehensive in the open source big data technologies that are growing in popularity, so third-party tools or additional security measures may be needed. In addition, many enterprises analyze big data sets that have specific privacy and governance requirements, so additional steps may be necessary for compliance. The following links present some common big data security and compliance issues and tips for proactively preventing a breach.


Big data security issues

Expert Matthew Pascucci offers tips for building security into a big data environment during the architecture implementation phase.


Tackling big data security issues head-on

Because big data environments often contain sensitive data, companies should explore third-party security systems and encryption techniques.
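Python's standard library has no general-purpose API for encrypting stored data, so this sketch swaps in a related but different protective technique: keyed hashing (HMAC-SHA256) to pseudonymize identifiers before a data set is analyzed. Unlike encryption, the resulting tokens cannot be decrypted; because the same input and key always produce the same token, tokenized columns can still be joined and counted. The field names and the hard-coded key are hypothetical placeholders, and a real deployment would load the key from a secret store.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable, List, Set


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed, one-way HMAC-SHA256 token (64 hex chars)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records: Iterable[Dict[str, Any]], sensitive_fields: Set[str],
                         key: bytes) -> List[Dict[str, Any]]:
    """Return copies of the records with the named sensitive fields tokenized."""
    return [
        {name: pseudonymize(str(value), key) if name in sensitive_fields else value
         for name, value in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    # Placeholder key for illustration only; a real key would come from a secret store.
    key = b"example-key-do-not-use"
    rows = [{"customer_id": "C-1001", "email": "pat@example.com", "total": 42.0}]
    print(pseudonymize_records(rows, {"customer_id", "email"}, key))
```

Using a keyed HMAC rather than a plain hash means someone without the key cannot rebuild the tokens by hashing guessed identifiers.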


Tips for proactively managing big data privacy

To keep big data secure, it's important to understand the security practices of any cloud services you use and the compliance rules that govern personal data.


Big data governance issues

Two consulting firms acknowledge that with big data come growing privacy concerns, and a plan for data governance is a must.


With big data, expect compliance concerns

Lawyer Kim Walker explains why organizations making use of big data need to understand data protection and intellectual property laws to ensure their data sets are compliant.


Big data video

Our big data video section offers a behind-the-scenes glimpse into what users and analysts think of what vendors are doing in the space.


Is there real business value in big data analytics?

Hear one expert's take on how much of the big data boom is just hype, and whether there are real insights to be gained from big data analytics.


EMC's big data strategy

Analysts discuss EMC's strategy to focus on big data at its annual EMC World conference in Las Vegas.


Eckerson: Big data is something old, something new

Big data doesn't necessarily require a completely new set of tools in order to perform analytics; your existing technologies could get the job done.

8. Terms to know

Big data glossary

With so many technologies and products associated with big data and analytics, there are a lot of terms to know. Use this glossary for a quick take on some of the most common terms you'll likely come across.
