Essential Guide


Complete guide to Hadoop technology and storage

Use this guide to get a grasp on Hadoop technology basics, some of its benefits and drawbacks, and what it means for big data and cloud storage.


Hadoop technology has been discussed hand in hand with big data for some time now, but IT professionals still don't know the full extent of what the technology can do or how to use it.

The open source Hadoop framework is modeled on Google's MapReduce programming model and can process large data sets at a granular level. It offers analytics at a low cost and high speed that some analysts say can't be achieved any other way. Essential to the effectiveness of Hadoop is the Hadoop Distributed File System (HDFS), which enables parallel processing by spreading data across the nodes of a cluster and provides fault tolerance.
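
As a rough illustration of the MapReduce model mentioned above, the following sketch counts words in plain Python. It does not use any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample splits are hypothetical, and the map calls run one after another on a single machine, whereas Hadoop would run them in parallel on the nodes that hold each split.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # Each split stands in for data held on a different node (an assumption
    # made for illustration); here they are processed sequentially.
    mapped = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two splits of a small text data set.
splits = [
    ["big data big cluster"],
    ["data node", "big node"],
]
counts = word_count(splits)
print(counts)
```

The point of the split between map and reduce is that each map call needs only its own split, so the work can be spread across machines without coordination until the grouping step.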

However, HDFS is the source of one of the main issues users see with Hadoop technology: expanded capacity requirements, because Hadoop by default stores three copies of each piece of data in case a DataNode fails or is taken offline. A separate concern is the NameNode, which controls how data is copied and distributed across the cluster and has been a single point of failure. Other complaints point to the complexity of the technology, which stems from Hadoop's Java framework.
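
The capacity cost of keeping three copies can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the function names and the 100 TB figure are hypothetical, and it ignores file system overhead and the temporary space that jobs need.

```python
def raw_capacity_needed(logical_tb, replication=3):
    """Raw disk (TB) needed to hold logical_tb of data at a replication factor."""
    return logical_tb * replication


def usable_capacity(raw_tb, replication=3):
    """Logical data (TB) that fits in raw_tb of disk at a replication factor."""
    return raw_tb / replication


def tolerated_copy_losses(replication=3):
    """Copies of a piece of data that can be lost while one copy remains,
    assuming every copy sits on a different DataNode."""
    return replication - 1


# Hypothetical example: a 100 TB data set stored with three copies.
print(raw_capacity_needed(100))   # 300 TB of raw disk
print(usable_capacity(300))       # 100.0 TB of logical data
print(tolerated_copy_losses())    # 2
```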

Despite the hurdles with Hadoop technology, analysts and users say the benefits are worth it. To help you determine that for yourself, this guide will walk you through the basics of what Hadoop technology can achieve, lay out the main concerns about the technology, and outline how it works with storage and the cloud.

1. Getting started

Understanding the basics of Hadoop technology

Hadoop technology is not a single entity -- it consists of different open source products such as HDFS and MapReduce. While Hadoop software is free, some vendors also offer their own Hadoop distributions with support and maintenance add-ons. To find out how all the components work and what they can do, take a look at the links below.


Considerations for deploying Hadoop technology

Learn three ways to decide if deploying a Hadoop infrastructure is the right move for your enterprise big data needs.


Debunking common Hadoop myths

Philip Russom, director of research at The Data Warehousing Institute, deconstructs the 12 most misconstrued Hadoop characteristics and replaces rumor with fact.


Comparing MapReduce and Hadoop technologies

Hadoop and MapReduce are two technologies that are often discussed together. Find out how they compare and relate to each other.


Exploring the future of Apache Hadoop

In this podcast, SearchDataManagement News Editor Jack Vaughan shares his thoughts and observations on Hadoop developments from the Hadoop Summit.


Uses for Hadoop technology extend beyond big data

There are multiple ways an organization can benefit from implementing Hadoop technology, according to analyst John Webster.


Understanding improvements to Apache Hadoop technology

John Webster describes how recent changes to HDFS and the NameNode can help to improve Hadoop technology.


Dealing with Hadoop pain points

Despite its popularity, criticism of Hadoop ranges from the requirement for a specialized skill set to several single points of failure in the Hadoop cluster. In the following links, you'll find explanations of these and other Hadoop issues, and learn how to confront them.


Hadoop technology creates problems for big data analytics

Described as cutting-edge, hot, niche and hard to use, Hadoop, like all celebrities, has its shining moments and dismal displays.


Hadoop benefits come at low cost, but only with high-level understanding

The enterprise can achieve inexpensive big data analytics with the Apache Hadoop framework, but only with qualified data scientists and appropriate applications.


Anticipating the results of an HDFS infrastructure

Analyst John Webster details issues with Hadoop technology and what users can expect from Hadoop Version 2.0.


Why Hadoop isn't the most popular big data technology

Research reveals businesses' relative lack of big data maturity and shows the hurdles they need to overcome in both Hadoop technology and analytics techniques.


Dealing with problems in Hadoop and MapReduce

Big data users are facing challenges when using Hadoop and its MapReduce programming model. But taking some good first steps can help avoid problems.


Why Hadoop isn't essential to big data

Expert Jon Toigo explains why Hadoop technology and big data are frequently used together, but argues that Hadoop has a number of drawbacks.


Understanding Hadoop technology and storage

Because Hadoop stores three copies of each piece of data, storage in a Hadoop cluster must accommodate considerably more capacity than the original data set alone. Traditional storage systems may not always suit the Hadoop architecture. The links below explain how Hadoop clusters and HDFS work with various storage systems, including network-attached storage (NAS), SANs and object storage.


The effect of Hadoop technology on storage

Storage Switzerland analyst Colm Keegan explains how to determine whether a SAN or NAS should be used as primary storage with Hadoop.


Can Hadoop technology be used with shared storage?

Storage expert John Webster discusses three ways to use shared storage with Hadoop technology in this Ask the Expert answer.


Working with Hadoop connector software for big data clusters

Various software vendors have begun offering connectors designed to help users bridge the gap between Hadoop clusters and relational databases.


Benefits and challenges when using Hadoop clusters

Brien Posey explains how Hadoop clusters can be extremely beneficial for analyzing large amounts of unstructured data -- but they aren't ideal for all environments.


Storage vendors work to integrate Hadoop technology

John Webster discusses how vendors of Hadoop technology are faring when faced with the challenge of meeting enterprise big data demands.


Storage uses for Hadoop technology

Learn how data can best be stored with Hadoop and what it will take to drive wider adoption of the technology.

4. Cloud considerations

How Hadoop technology works with the cloud

Hadoop can be useful for analytics across cloud storage because of its parallel-processing capability. Because Hadoop can process data across many servers, large amounts of data stored in the cloud can be searched and analyzed at high speeds. From the links below, you'll learn how using Hadoop in the cloud works and how cloud storage can help address some common Hadoop problems.
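
To make the parallel-processing idea concrete, the sketch below searches several partitions of data at the same time and merges the results. It is an analogy, not Hadoop code: threads on one machine stand in for servers, and the names `search_partition` and `parallel_search` and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(partition, term):
    """Return the records in one partition that contain the search term."""
    return [record for record in partition if term in record]


def parallel_search(partitions, term, workers=4):
    # Each partition stands in for data held on a separate server
    # (an assumption for illustration); each worker scans one partition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = list(
            pool.map(lambda part: search_partition(part, term), partitions)
        )
    # pool.map keeps the partition order, so the merged list is stable.
    return [hit for hits in per_partition for hit in hits]


# Hypothetical log records spread over three partitions.
partitions = [
    ["error disk full", "ok"],
    ["ok", "error timeout"],
    ["ok"],
]
hits = parallel_search(partitions, "error")
print(hits)
```

Because each partition is scanned independently, adding servers (here, workers) shortens the scan without changing the search logic.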


How big data processing across clouds is made possible with Hadoop

Hadoop technology enables distributed big data processing across servers, which can improve application performance and offer redundancy.


Using private cloud storage to mitigate Hadoop issues

Learn how private cloud storage providers can help solve common Hadoop problems relating to availability, capacity and migration.


Why Hadoop is so popular with cloud applications

Although Apache Hadoop has been described as a "perfect" cloud application framework, many enterprises still don't know how to deploy the technology or recognize its pitfalls.


Experts discuss Hadoop technology

Now that you have a better understanding of how Hadoop technology works with big data, watch the videos below to get experts' takes on how well Hadoop works and the best ways to use it.


Webster on storage for a Hadoop Cluster

Of all the ways to handle the storage requirements of big data analytics, Hadoop technology is receiving the most attention. Find out why.


Understanding HDFS and NameNode

Learn why you should be on the lookout for issues with NameNode and HDFS when using Hadoop technology for big data storage.


Improving performance with Hadoop technology

The chief technology officer at Sears explains how to reduce data latency troubles by turning to the Hadoop framework and data science analytics.


Benefits of using a Hadoop architecture in big data environments

In a video interview, TechTarget's Wayne Eckerson discusses the benefits and challenges of deploying Hadoop-based systems in big data environments.


Storage alternatives for a Hadoop infrastructure

Hadoop storage systems traditionally call for the use of embedded direct-attached storage (DAS) for hardware-based storage within the Hadoop MapReduce framework. But alternatives exist.
