
Keep tabs on the following data storage startup vendors in 2017


Alluxio storage platform wants a slice of the big data pie

Source:  miakievy/istock
Designer: Linda Koury

Fledgling software vendor Alluxio Inc. is trying to make a splash in big data storage.

Alluxio is a virtual distributed storage layer developed by researchers at the Massachusetts Institute of Technology and University of California at Berkeley. The vendor (formerly Tachyon) claims its open source platform allows any computing framework to access application data at memory speed across disparate storage systems.

Alluxio converts idle server memory to storage capacity for processing Apache Spark and other big data workloads. Enterprises use it to extract greater value from high-performance applications on distributed computing frameworks.

The goal is a big data architecture that offers an alternative to the disk-based batch approach. Alluxio software is installed between the compute layer and the underlying storage to virtualize file and object stores. A columnar data format is created in memory to overlay traditional batch processing.

Different storage systems are virtualized under a unified namespace. Data is read and written in memory, with hot files cached in memory and less active data tiered to back-end storage.
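To make the idea concrete, here is a minimal Python sketch of a unified namespace with an in-memory cache for hot files. It is a toy model, not Alluxio's code or API: the `UnifiedNamespace` class, its method names, the use of plain dictionaries as back-end stores, the least-recently-used eviction rule and the write-through behavior are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class UnifiedNamespace:
    """Toy model: several backing stores mounted under one path tree,
    fronted by a small in-memory LRU cache for hot files.
    Hypothetical illustration only; not Alluxio's actual API."""

    def __init__(self, cache_capacity):
        self.mounts = {}            # mount prefix -> dict standing in for a back-end store
        self.cache = OrderedDict()  # path -> bytes, ordered least to most recently used
        self.cache_capacity = cache_capacity

    def mount(self, prefix, store):
        self.mounts[prefix.rstrip("/")] = store

    def _resolve(self, path):
        # The longest matching prefix wins, so nested mounts behave predictably.
        matches = [p for p in self.mounts if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        return self.mounts[prefix], path[len(prefix):]

    def read(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # mark as recently used
            return self.cache[path]
        store, key = self._resolve(path)
        data = store[key]                   # slow path: fetch from the back-end store
        self._cache_put(path, data)
        return data

    def write(self, path, data):
        store, key = self._resolve(path)
        store[key] = data                   # assumed write-through, for simplicity
        self._cache_put(path, data)

    def _cache_put(self, path, data):
        self.cache[path] = data
        self.cache.move_to_end(path)
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used file


# Two different "storage systems" appear under one path tree.
ns = UnifiedNamespace(cache_capacity=2)
ns.mount("/s3", {"/logs/a.txt": b"A"})
ns.mount("/hdfs", {"/warehouse/b.txt": b"B"})
first = ns.read("/s3/logs/a.txt")          # fetched from the back end, then cached
second = ns.read("/hdfs/warehouse/b.txt")  # same interface, different store
```

An application sees only paths such as `/s3/...` and `/hdfs/...`; which system holds the bytes, and whether they are currently in memory, is hidden behind `read` and `write`.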

Two versions are available: a paid Alluxio Enterprise Edition and a free Alluxio Community Edition. The Enterprise Edition is priced by the number of nodes and includes Kerberos authentication and data replication to ensure high availability.

Alluxio does not replicate file storage across a cluster. Instead, changes to data (and metadata) are logged and retained in memory. That allows unused processors to immediately inherit ongoing calculations if a primary server ceases to function.
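The general technique of logging changes so another machine can take over can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Alluxio's implementation: the `Journal` and `Master` classes, the `put`/`delete` operations and the replay-from-the-start recovery are all hypothetical.

```python
class Journal:
    """Append-only log of state changes, held in memory in this toy model."""

    def __init__(self):
        self.entries = []

    def append(self, op, path, value=None):
        self.entries.append((op, path, value))


class Master:
    """Hypothetical primary server that logs every change before applying it."""

    def __init__(self, journal):
        self.journal = journal
        self.files = {}

    def put(self, path, value):
        self.journal.append("put", path, value)  # log first, then apply
        self.files[path] = value

    def delete(self, path):
        self.journal.append("delete", path)
        self.files.pop(path, None)

    @classmethod
    def recover(cls, journal):
        """Build a replacement by replaying every logged change in order."""
        standby = cls(journal)
        for op, path, value in journal.entries:
            if op == "put":
                standby.files[path] = value
            elif op == "delete":
                standby.files.pop(path, None)
        return standby


journal = Journal()
primary = Master(journal)
primary.put("/jobs/run1", "stage-1")
primary.put("/jobs/run1", "stage-2")
primary.delete("/tmp/scratch")

# If the primary stops, a replacement rebuilds the same state from the log.
replacement = Master.recover(journal)
```

Because the log records every change in order, the replacement ends up with the same state as the failed primary without a second full copy of the data having been kept.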

Alluxio software includes object storage interfaces for Amazon Simple Storage Service (S3) and OpenStack Swift. Aside from Apache databases, file storage support encompasses Hadoop Distributed File System/MapReduce and Red Hat GlusterFS scale-out NAS.

For optimal performance, the vendor recommends installing its software on the same computational nodes that process big data jobs. The system scales out as additional hardware nodes join the compute cluster.
