Hadoop as a service (HaaS)

Contributor(s): Sarah Wilson

Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a cloud-based offering in which a third-party vendor provides and manages the Hadoop big data analytics framework for storing and analyzing data. Because the vendor hosts and operates the service, users do not have to buy or install additional infrastructure on premises.

The open source Hadoop big data analytics framework allows large, unstructured data sets to be analyzed. Hadoop's storage mechanism, the Hadoop Distributed File System (HDFS), distributes data across multiple nodes so that it can be processed in parallel. One drawback of the open source Hadoop framework, however, is that it requires a specialized set of skills that many organizations do not have in-house or cannot afford to hire. HaaS providers integrate proprietary programs with the Hadoop framework to make it easier for organizations to use, and typically include management and support capabilities. Most HaaS offerings are cloud-based, and pricing is most often on a per-cluster, per-hour basis.
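The idea of splitting data across nodes and processing the pieces in parallel can be illustrated with a small, single-machine sketch. This is not Hadoop code: the function names (`split_into_blocks`, `count_words`, `parallel_word_count`) are hypothetical, the "blocks" are plain Python lists, and a thread pool stands in for cluster nodes. It only shows the general pattern of working on each block independently and then merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks (a loose analogy to HDFS blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def count_words(block):
    """Map step: count the words in one block, independently of other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, block_size=2, workers=3):
    """Hand each block to a worker, then merge the partial counts."""
    blocks = split_into_blocks(lines, block_size)
    # Threads are an assumption standing in for separate cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, blocks))
    total = Counter()
    for partial in partial_counts:  # reduce step: combine per-block results
        total.update(partial)
    return total
```

Calling `parallel_word_count(["big data", "big cluster"])` returns a `Counter` of word frequencies across all blocks. In a real cluster the blocks would live on different machines and the computation would be sent to the data, which is the part a HaaS provider sets up and manages.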

HaaS providers offer a variety of features and support, including:

  • Hadoop framework deployment support
  • Hadoop cluster management
  • Alternative programming languages
  • Data transfer between clusters
  • Customizable, user-friendly dashboards and data manipulation tools
  • Security features

This video by VMware's Andrew Nelson and Adobe's Chris Mutchler at the 2014 Hadoop Summit discusses some of the operational and technical benefits of Hadoop as a service.

Features to look for in a HaaS provider include:

  • Persistent data storage in HDFS, which avoids the issues associated with translating data stored in other formats into HDFS.
  • Elasticity to accommodate a wide variety of workloads.
  • Ability to recover from processing failures without restarting the entire process (known as non-stop operations).
  • A self-configuring environment that adjusts its configuration automatically based on workload.
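Recovering from a processing failure without restarting the entire process can be sketched as retrying only the tasks that failed while keeping the results of those that succeeded. The example below is a simplified illustration under that assumption, not how any particular HaaS provider implements non-stop operations; `run_job`, `run_task` and `max_attempts` are hypothetical names.

```python
def run_job(tasks, run_task, max_attempts=3):
    """Run every task, retrying only the ones that fail.

    tasks: dict mapping a task id to its input.
    run_task: callable that processes one input and may raise an exception.
    Returns (results, failed), where failed lists the ids that never succeeded.
    """
    results = {}
    pending = dict(tasks)
    for _ in range(max_attempts):
        still_failing = {}
        for task_id, payload in pending.items():
            try:
                results[task_id] = run_task(payload)
            except Exception:
                # Keep completed work; only this task is queued for a retry.
                still_failing[task_id] = payload
        pending = still_failing
        if not pending:
            break
    return results, sorted(pending)
```

Tasks that complete on the first pass are never rerun, so a single failure costs one retry of one task rather than a restart of the whole job.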

Amazon was the first major provider of Hadoop as a service. Offerings currently in the market include:

  • Amazon Elastic MapReduce
  • Microsoft HDInsight
  • IBM InfoSphere BigInsights
  • Oracle Big Data Discovery Tool
  • OpenStack Savanna
  • Google Cloud Dataproc

This was last updated in April 2016
