High-performance-computing storage specialist DataDirect Networks Inc. today unveiled hScaler, an appliance designed to speed the deployment of Apache Hadoop-based big data analytics.
The hScaler from DataDirect Networks (DDN) is built on its Storage Fusion Architecture (SFA) 12K clustered network-attached storage (NAS) array. DDN claims the Hadoop storage and compute system can scale up to approximately 7 PB of capacity, uses 40 GBps networking through InfiniBand, and performs at about 1.5 million IOPS. The hScaler runs the Hortonworks Data Platform Hadoop distribution.
Jean-Luc Chatelain, DDN's executive vice president of strategy and technology, said the goal with hScaler is to enable Hadoop adoption in large enterprises.
"The reality is enterprise Hadoop has not gotten into data centers. The majority of enterprises are not using Hadoop," Chatelain said. "Building a system [for Hadoop] is a science project. The customer has to cobble together a system the hard way. Some enterprise customers take up to six months to deploy a Hadoop infrastructure. Our appliance reduces implementation to about eight hours."
The hScaler storage and compute nodes can scale independently. An appliance supports at least two 4U SFA12K storage enclosures with 84 SAS drives per enclosure. A two-enclosure system includes up to 345 TB of usable capacity with 3 TB drives. The hScaler also supports solid-state drives. A rack can hold 32 compute nodes, each equipped with two eight-core CPUs and 64 GB of RAM.
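For readers who want to sanity-check those figures, the short Python sketch below totals the quoted numbers for a two-enclosure rack. It assumes every one of the 84 slots per enclosure holds a 3 TB drive. The gap between raw and usable capacity is presumably data protection and formatting overhead, which the article does not break down. The function and constant names are illustrative only and are not part of any DDN tooling.

```python
# Figures quoted in the article for a two-enclosure hScaler rack.
ENCLOSURES = 2
DRIVES_PER_ENCLOSURE = 84
DRIVE_TB = 3            # assumption: every slot holds a 3 TB drive
USABLE_TB = 345         # DDN's quoted "up to" usable capacity

COMPUTE_NODES = 32
CPUS_PER_NODE = 2
CORES_PER_CPU = 8
RAM_GB_PER_NODE = 64


def rack_summary():
    """Return raw capacity, usable fraction, and compute totals for one rack."""
    raw_tb = ENCLOSURES * DRIVES_PER_ENCLOSURE * DRIVE_TB
    return {
        "raw_tb": raw_tb,
        "usable_fraction": round(USABLE_TB / raw_tb, 2),
        "cores": COMPUTE_NODES * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": COMPUTE_NODES * RAM_GB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions, the quoted 345 TB of usable capacity is roughly two-thirds of the raw drive capacity, and a full rack of compute nodes adds up to 512 cores and 2 TB of RAM.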
"The way you grow Hadoop is to add servers and direct-attached drives," Chatelain said. "The drawback is every time you need more compute, you have to add more drives. That leads to an imbalance of storage and compute. We take the approach that there is a more efficient way to do that. We have the flexibility to scale compute from the storage."
There are two nodes per appliance that run the Hadoop Distributed File System (HDFS) and the extract, transform and load (ETL) process that gathers data from the application servers. The system includes a NameNode that runs the software to orchestrate and manage Hadoop functions, supporting 12 cores and 128 GB of RAM per node. The NameNode manages the data copy and distribution process for Hadoop.
Each appliance has 48 ports for 10 Gigabit Ethernet switches, 44 ports for Gigabit Ethernet switches, and 36 ports for Mellanox Technologies InfiniBand switches.
Evan Quinn, senior principal analyst for data management and analytics at Enterprise Strategy Group, said most organizations follow the "do-it-yourself path" when designing a Hadoop system. They tend to start with a proof-of-concept implementation that works well at small scale but becomes harder to manage as it grows. That leads companies to build server and storage farms for big data.
"First, they buy commodity hardware and it works well," Quinn said. "But when it grows to the enterprise level, they are back to the server- and storage-farm business that companies are trying to move away from. Hadoop is supposed to cost less, but at some point it becomes more expensive. So, now it's beginning to shift to appliances."
Quinn said integrated appliances are becoming the preferred approach for big data and Hadoop storage because "the original design of Hadoop with storage associated with each node is problematic."
DDN's hScaler will ship at the end of March. The company has not released pricing details. The system competes primarily with IBM's Netezza and Teradata Corp. appliances.