This content is part of the Essential Guide: Using big data platforms for data management, access and analytics

Hadoop as a service (HaaS)

Contributor(s): Sarah Wilson

Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a big data analytics offering that stores and analyzes data in the cloud using Hadoop. Users do not have to invest in or install additional on-premises infrastructure, because a third-party vendor provides and manages the service.

The open source Hadoop big data analytics framework allows large, unstructured data sets to be analyzed. Its storage layer, the Hadoop Distributed File System (HDFS), splits data into blocks and distributes them across multiple nodes so the data can be processed in parallel. One drawback of the open source framework, however, is that it requires a specialized set of skills that many organizations do not have in-house or cannot afford. HaaS providers integrate proprietary programs with the Hadoop framework to make it easier for organizations to use, and typically include management and support capabilities. Most HaaS offerings are cloud-based, and pricing is most often on a per-cluster, per-hour basis.
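To make the split-and-process-in-parallel idea concrete, here is a minimal local sketch in Python. It is not Hadoop code and uses no Hadoop API: `split_into_blocks`, `count_words` and `parallel_word_count` are hypothetical helpers, the block size is deliberately tiny, and threads from Python's standard `concurrent.futures` module stand in for cluster nodes, so the sketch shows the pattern rather than any real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Illustrative only: real HDFS blocks are measured in megabytes, not lines.
BLOCK_SIZE = 2  # lines per block (hypothetical, chosen to be tiny)


def split_into_blocks(lines, block_size=BLOCK_SIZE):
    """Split the input into fixed-size blocks, loosely mimicking HDFS blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def count_words(block):
    """Work one 'node' does on its own block, independent of all others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=4):
    """Process every block in parallel, then merge the partial results."""
    blocks = split_into_blocks(lines)
    total = Counter()
    # Each worker thread stands in for a cluster node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, blocks):
            total.update(partial)
    return total


if __name__ == "__main__":
    data = ["big data in the cloud", "hadoop in the cloud", "the cluster scales"]
    print(parallel_word_count(data).most_common(3))
```

On a real cluster, blocks are far larger and Hadoop tries to schedule each block's work on a node that already holds a copy of it; the sketch only mirrors the split, process independently, then merge structure.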

HaaS providers offer a variety of features and support, including:

  • Hadoop framework deployment support
  • Hadoop cluster management
  • Alternative programming languages
  • Data transfer between clusters
  • Customizable and user-friendly dashboards and data manipulation
  • Security features

This video by VMware's Andrew Nelson and Adobe's Chris Mutchler at the 2014 Hadoop Summit discusses some of the operational and technical benefits of Hadoop as a service.

Features to look for in a HaaS provider include:

  • Persistent data storage in HDFS, which avoids the issues associated with translating data stored in other formats into HDFS.
  • Elasticity to accommodate a wide variety of workloads.
  • Non-stop operations: the ability to recover from processing failures without restarting the entire process.
  • A self-configuring environment that adjusts its configuration automatically based on workload.
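The non-stop operations point can be sketched the same way. In this minimal Python sketch, which again uses no Hadoop API, `run_with_task_retry` and `FlakyNode` are hypothetical names: the job keeps the results of blocks that succeed and reruns only the blocks whose processing failed, up to a fixed number of attempts, instead of restarting the whole job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_task_retry(blocks, process, max_attempts=3, workers=4):
    """Run `process` on every block, rerunning only the blocks that fail.

    Results of successful blocks are kept between attempts, so one failure
    never forces the whole job to start over.
    """
    results = {}
    pending = list(range(len(blocks)))
    for _ in range(max_attempts):
        if not pending:
            break
        failed = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {i: pool.submit(process, blocks[i]) for i in pending}
            for i, future in futures.items():
                try:
                    results[i] = future.result()
                except Exception:
                    failed.append(i)  # retry just this block next round
        pending = failed
    if pending:
        raise RuntimeError(f"blocks {pending} still failing after {max_attempts} attempts")
    return [results[i] for i in range(len(blocks))]


class FlakyNode:
    """Hypothetical worker that fails the first time it sees each listed block."""

    def __init__(self, fail_once):
        self.fail_once = set(fail_once)
        self.calls = []

    def __call__(self, block):
        self.calls.append(block)
        if block in self.fail_once:
            self.fail_once.discard(block)
            raise OSError(f"node lost while processing {block!r}")
        return sum(block)


if __name__ == "__main__":
    node = FlakyNode(fail_once=[(3, 4)])
    print(run_with_task_retry([(1, 2), (3, 4), (5, 6)], node))  # prints [3, 7, 11]
```

Only the block `(3, 4)` is processed twice; the other two blocks run once. The retry limit of three is an arbitrary illustrative value.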

Amazon was the first major provider of Hadoop as a service. Providers currently in the market include:

  • Amazon Elastic MapReduce
  • Microsoft HDInsight
  • IBM InfoSphere BigInsights
  • Oracle Big Data Discovery Tool
  • OpenStack Savanna (since renamed Sahara)
  • Google Cloud Dataproc

This was last updated in April 2016
