At this week's Supercomputing Conference in New Orleans, Seagate Technology PLC announced upgrades to its ClusterStor storage systems, including a new Hadoop Workflow Accelerator, a Lustre distributed file system update and a secure data appliance.
Best known for hard disk drives (HDDs), Seagate expanded its portfolio into high-performance computing (HPC) and storage systems this year with its acquisition of U.K.-based Xyratex Ltd. The Xyratex roster included the Lustre-based ClusterStor system, as well as HDD testing equipment and storage enclosures.
Seagate sells ClusterStor through a network of OEMs and resellers including HP, Dell and Cray Inc. This week, Seagate added a new reseller partnership agreement with SGI, which will offer the ClusterStor 1500, 6000 and 9000 appliances, as well as the ClusterStor Secure Data Appliance (SDA) that launched this week.
Steve Conway, a research vice president at IDC, said the HPC market has been on a "real growth tear," and storage is the fastest-growing area of that market. Demand is escalating particularly for systems that can handle big data, he said.
IDC forecasts the converging big data-HPC market will exceed $4 billion by 2018. Conway said 29% of all HPC sites already use Hadoop data analytics, and Seagate is targeting that market with its ClusterStor Hadoop Workflow Accelerator.
"Hadoop is not designed to communicate with file systems. That's what they're addressing," Conway said. "They're making it so that Hadoop fits with HPC environments in government, academia and in industry, which is a very fast growing part of the market."
Lustre Connector allows computing, analytics on one system
The ClusterStor Hadoop Workflow Accelerator includes optimization tools, services, support and a Hadoop on Lustre Connector to enable Hadoop clients to read and write data from a ClusterStor HPC storage subsystem that's running the Lustre file system, according to Steve Paulhus, director of strategic business development at Seagate.
"What we discovered talking to customers is that they have two separate data repositories. One is for their technical computing, and that's typically ClusterStor with Lustre. Then they have a completely different data repository for data that they want to run big data analytics on," Paulhus said. "It duplicates the cost."
Paulhus said the Hadoop on Lustre Connector will allow customers to use the same file system and ClusterStor storage architecture to do both technical computing and big data analytics. ClusterStor users will not only save money with the new tools, but they will also see improved performance and achieve better data efficiency, according to Paulhus.
He said the Hadoop Distributed File System (HDFS) requires "triplication," which renders only a third of the raw capacity usable after the system replicates and stores three copies of the data. ClusterStor uses standard RAID or Seagate's new GridRAID, leaving up to 80% of the raw capacity usable, Paulhus said. According to Paulhus, Seagate's recently announced GridRAID technology performs better than standard RAID and rebuilds a failed drive up to 400% faster.
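To put the capacity comparison in concrete terms, the short Python sketch below works through the arithmetic. The 1,000 TB raw capacity is a hypothetical figure chosen for illustration and does not come from Seagate. The one-third and 80% usable fractions are the numbers Paulhus cited, and the 80% figure is an upper bound ("up to"), so real deployments may see less.

```python
def usable_capacity_tb(raw_tb: float, usable_fraction: float) -> float:
    """Return the usable capacity in TB for a given raw capacity and usable fraction."""
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0 and 1")
    return raw_tb * usable_fraction


# Hypothetical raw capacity, for illustration only (not a Seagate figure).
RAW_TB = 1000.0

# Fractions cited in the article: three full copies under HDFS leave one third
# of raw capacity usable; ClusterStor RAID or GridRAID leaves up to 80%.
HDFS_USABLE_FRACTION = 1.0 / 3.0
RAID_USABLE_FRACTION = 0.80

hdfs_usable = usable_capacity_tb(RAW_TB, HDFS_USABLE_FRACTION)
raid_usable = usable_capacity_tb(RAW_TB, RAID_USABLE_FRACTION)
ratio = raid_usable / hdfs_usable

print(f"HDFS triplication: {hdfs_usable:.1f} TB usable")
print(f"RAID (best case):  {raid_usable:.1f} TB usable")
print(f"Best-case ratio:   {ratio:.1f}x")
```

Under those assumptions, the same raw capacity yields roughly 2.4 times as much usable space with RAID as with three-way replication. The calculation considers capacity only and ignores other trade-offs such as data locality and rebuild behavior.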
Distinct product bundles featuring the Hadoop Workflow Accelerator will be generally available in January. The accelerator will initially support distributions based on open source Apache Hadoop, and future releases will provide tighter integration with additional Hadoop distributions, according to Seagate.
Seagate contributed an Apache Hadoop on Lustre Connector to the open source community. The company also released source code for a patch to Hadoop that will allow Map and Reduce processes to share files and enable diskless Hadoop compute clusters.
ClusterStor update supports more metadata, HSM
At Supercomputing Conference 14, Seagate also rolled out the 2.0 version of its ClusterStor Engineered Solution for Lustre, which the company claims will deliver a significant improvement in metadata performance.
The updated ClusterStor software is based on the Lustre 2.5 open source community release and gives customers the option to add up to 16 metadata servers and support up to 16 billion files under a single file system. The new release also supports a new hierarchical storage management (HSM) framework to enable users to manage multi-tier, disk-to-disk and disk-to-tape data migrations, Paulhus said.
Seagate this week also rolled out its ClusterStor SDA, a variation of the standard ClusterStor system that offers multilevel security options for data access. The product is compliant with Intelligence Community Directive (ICD) 503 policies, according to Seagate.
"Obviously, the government markets are attracted to this product, but we see this product going beyond the government and defense markets into health care and life sciences because of the privacy and security concerns there," Paulhus said.
Seagate said ClusterStor SDA supports the Kerberos network authentication protocol to enable symmetric-key cryptography. The product provides the file-system framework to facilitate Kerberos-based encryption key management, which can secure network data traffic between compute clients and the storage system, and help to guard against insider threats. ClusterStor SDA with Kerberos enablement is scheduled for general availability in December, according to Seagate.