Open source big data startup Alluxio has bolstered its storage software to help analytics teams securely move data across multiple hybrid cloud and physical environments.
Alluxio Enterprise Edition 1.8 centers on improved translation of semantics between Hadoop Distributed File System and object stores in AWS, Google Cloud Platform and Microsoft Azure. Alluxio also added analytics aimed at helping development teams sidestep design issues related to the integration of third-party cloud tools.
Alluxio came out of stealth in late 2016. The vendor's eponymous data management software sits between applications and data stores. The Alluxio distributed file system is layered atop an existing file system to virtualize underlying storage. A unified namespace enables an application to consume storage as a mountable file folder. Alluxio places frequently accessed data in RAM and sends less active data to object storage or scale-out NAS on the back end.
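The hot-and-cold placement described above can be illustrated with a minimal sketch. This is not Alluxio's implementation: the `TieredStore` class, its method names and the least-recently-used eviction rule are all assumptions chosen for illustration, with a plain dictionary standing in for object storage or scale-out NAS.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded memory tier over a slower backing tier.

    Hypothetical illustration only; names and eviction rule are assumptions.
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        # Hot tier, ordered from least to most recently used.
        self.memory = OrderedDict()
        # Stand-in for object storage or scale-out NAS.
        self.backing = {}

    def write(self, path, data):
        # New or updated data always lands in the memory tier first.
        self.backing.pop(path, None)
        self.memory[path] = data
        self.memory.move_to_end(path)
        self._evict()

    def read(self, path):
        if path in self.memory:
            # Mark the entry as recently used.
            self.memory.move_to_end(path)
            return self.memory[path]
        # Promote cold data back into memory (raises KeyError if unknown).
        data = self.backing.pop(path)
        self.memory[path] = data
        self._evict()
        return data

    def tier_of(self, path):
        if path in self.memory:
            return "memory"
        if path in self.backing:
            return "backing"
        return None

    def _evict(self):
        # Push the least recently used entries down to the backing tier.
        while len(self.memory) > self.memory_capacity:
            cold_path, cold_data = self.memory.popitem(last=False)
            self.backing[cold_path] = cold_data
```

With a capacity of two, writing a third path pushes the least recently used one to the backing tier, and reading it again promotes it back into memory. Applications see one namespace of paths either way.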
The vendor claims about 40 large customers in financial services, internet and telecommunications verticals, including Alibaba, Cray, IBM, Intel and Microsoft.
More automation and flexibility
Alluxio's cloud ingests data from multiple sources and presents it as a data lake. Customers use Alluxio to automate data services for machine learning and big data applications, including TensorFlow and Apache's Hive, Spark and Hadoop MapReduce.
Alluxio CEO Haoyuan Li said customers requested a more optimized version that connects storage from different cloud vendors and makes the storage more portable.
The latest version provides optimized connectors for Amazon Simple Storage Service (S3), Google Cloud Platform and Microsoft Azure, said Jack O'Brien, Alluxio's interim vice president of marketing. Alluxio sought to "close the gap" between the Hadoop Distributed File System (HDFS) used for big data and object storage, O'Brien said.
"As they migrate data to the cloud, customers are struggling with the interfaces and semantic differences between HDFS and object. We've done things to close the gap and to run workloads in both places and to migrate over time to [disaggregated] cloudlike compute and storage architecture," he said.
A new policy engine lets customers specify automatic data placement for individual applications based on cost, data availability and performance requirements. Similarly, customers can tier data flexibly among different availability zones to govern data locality and the type of required storage media, O'Brien said.
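One way to picture policy-driven placement is as a filter-then-minimize rule: discard tiers that miss the application's performance or availability requirements, then pick the cheapest of what remains. The sketch below assumes that rule; `StorageTier`, `PlacementPolicy`, `place` and all the numbers are hypothetical and do not reflect Alluxio's actual policy engine or its configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    cost_per_gb: float    # hypothetical cost, arbitrary units
    availability: float   # fraction of time reachable, 0.0 to 1.0
    latency_ms: float     # typical read latency


@dataclass(frozen=True)
class PlacementPolicy:
    max_latency_ms: float
    min_availability: float


def place(policy, tiers):
    """Return the cheapest tier that satisfies the policy, or None."""
    eligible = [
        tier for tier in tiers
        if tier.latency_ms <= policy.max_latency_ms
        and tier.availability >= policy.min_availability
    ]
    return min(eligible, key=lambda tier: tier.cost_per_gb, default=None)


# Illustrative tiers; the figures are made up for the example.
TIERS = [
    StorageTier("ram", cost_per_gb=10.0, availability=0.99, latency_ms=0.1),
    StorageTier("ssd", cost_per_gb=2.0, availability=0.999, latency_ms=1.0),
    StorageTier("object", cost_per_gb=0.02, availability=0.9999, latency_ms=50.0),
]
```

Under these made-up figures, a latency-sensitive policy lands on the SSD tier because it is the cheapest tier that is fast enough, while a policy that tolerates slower reads but demands high availability falls through to object storage.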
Assistance for developers
DevOps features in Alluxio 1.8 include a command-line interface for detailed queries on health and utilization metrics across a cluster. Alluxio records all remote procedure calls to generate API-based statistics for integrating open source analytics and monitoring.
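The general idea of recording every call and exposing machine-readable statistics can be sketched with a small wrapper. Everything here is an assumption for illustration: `RpcMetrics`, the `get_status` function and the JSON layout are hypothetical and are not Alluxio's API or metrics format.

```python
import json
import time
from collections import defaultdict
from functools import wraps


class RpcMetrics:
    """Counts calls and accumulates elapsed time per wrapped function."""

    def __init__(self):
        self._calls = defaultdict(int)
        self._seconds = defaultdict(float)

    def record(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the call even if the handler raised.
                self._calls[func.__name__] += 1
                self._seconds[func.__name__] += time.perf_counter() - start
        return wrapper

    def to_json(self):
        # Machine-readable snapshot that a monitoring tool could scrape.
        return json.dumps({
            name: {
                "calls": self._calls[name],
                "total_seconds": self._seconds[name],
            }
            for name in sorted(self._calls)
        })


metrics = RpcMetrics()


@metrics.record
def get_status(path):
    # Hypothetical RPC handler used only to exercise the wrapper.
    return {"path": path, "in_memory": True}
```

Calling `get_status` a few times and then `metrics.to_json()` yields a per-call summary that third-party monitoring tools could parse.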
"We're using our own data and extracting value from it for the developer. It extends beyond the Alluxio platform into the ecosystem, both up into the application side and down the persistent storage side," Li said. The main thing Alluxio has done with metrics is generate a lot of new machine-readable statistics that can be attached to third-party tools, Li added.
Alluxio offers a community-developed version of its software and claims to have more than 800 users worldwide. Its enterprise edition is sold directly to hyperscale data centers or through channel partners.
Dell EMC bundles Alluxio Enterprise on its Atmos-based Elastic Cloud Storage appliances. Alluxio also is integrated on Chinese vendor Huawei's FusionStorage block storage appliances.
Legacy storage vendors also are independently turning out big data storage products to address the burgeoning need for data analytics at the edge for inference-based AI and deep learning.
Alluxio automatically encrypts data before it moves to the cloud to guard against virtual machine corruption. Li said the company plans to add backup and data protection in future Enterprise Edition rollouts.
Alluxio, formerly known as Tachyon, changed its name in 2015, the same year it pulled in $7.5 million in venture funding from Andreessen Horowitz. Researchers at the University of California, Berkeley, and the Massachusetts Institute of Technology jointly developed the Alluxio technology and nurtured it at Berkeley's AMPLab.