Latin American e-commerce specialist MercadoLibre blazed largely uncharted trails with its adoption of the open source OpenStack cloud technology platform. At the heart of the company's revamped data storage environment is an object store designed to handle petabytes of data.
OpenStack Object Storage, code-named "Swift," uses clusters of commodity servers for long-term storage of mostly static data, such as documents and photo images. The system uses a hash algorithm to generate a unique identifier for each object or file, and the object is then stored on a data node. Adding nodes lets the system scale horizontally.
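The hash-based placement described above can be sketched in a few lines of Python. This is a simplified illustration, not Swift's actual ring implementation: the node names, the `object_id` and `pick_node` helpers, and the modulo-based mapping are all assumptions made for the example. MD5 is used because the ring is described in terms of MD5 hashes.

```python
import hashlib

# Hypothetical storage nodes; a real cluster would list actual servers.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def object_id(account, container, name):
    """Return a hex MD5 digest that identifies an object by its full path."""
    path = f"/{account}/{container}/{name}"
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def pick_node(obj_id, nodes):
    """Map an object identifier to one node (naive modulo placement)."""
    return nodes[int(obj_id, 16) % len(nodes)]


oid = object_id("acct", "photos", "item-123.jpg")
print(oid, "->", pick_node(oid, NODES))
```

The same path always hashes to the same identifier, so any server can work out where an object lives without a central lookup. A plain modulo mapping like this one would relocate most objects whenever a node is added, which is why production systems generally use a partitioned or consistent-hashing scheme instead.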
Metadata for each object resides on all servers in the cluster, and OpenStack software handles data replication. File requests go to Swift proxy servers, which translate the requests, locate the objects by their hashes and metadata, and retrieve them.
Administrators assign one or more servers to a zone, and each zone has one copy of every object. The system requires a minimum of three zones to balance cost-effectiveness against data-loss prevention, according to Beth Cohen, a senior cloud architect at Boston-based Cloud Technology Partners Inc. (cloudTP). The firm helps companies implement Rackspace Hosting Inc.’s Rackspace Cloud: Private Edition, which is based on OpenStack.
Cohen, however, recommends at least five zones for performance and access purposes, because a three-zone system goes into read-only mode if one of its zones goes offline. A group of zones forms a ring, and each ring shares the same database of MD5 hashes, so every object in a ring is treated as a group, she said.
MercadoLibre has four zones and stores three copies of each object. That setup ensures that each of the company’s two data centers will have at least one copy of every object. MercadoLibre’s data centers are located in Virginia to reduce latency, since most requests from users pass through U.S.-based Internet service providers, explained Alejandro Comisario, a senior cloud engineer at MercadoLibre.
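One way to see why this layout guarantees a copy in each data center is to enumerate the possibilities. The sketch below assumes the four zones are split evenly, two per data center, and that the three copies always land in three different zones. The article states neither detail explicitly, and the zone and data center names are hypothetical.

```python
from itertools import combinations

# Assumed layout: four zones split evenly across two data centers.
ZONE_TO_DC = {"z1": "dc1", "z2": "dc1", "z3": "dc2", "z4": "dc2"}
REPLICAS = 3  # copies of each object, per the article


def placements(zone_to_dc, replicas):
    """Return every way to put `replicas` copies in distinct zones."""
    return combinations(sorted(zone_to_dc), replicas)


def covers_all_dcs(placement, zone_to_dc):
    """True if the placement leaves at least one copy in every data center."""
    return {zone_to_dc[z] for z in placement} == set(zone_to_dc.values())


all_placements = list(placements(ZONE_TO_DC, REPLICAS))
for p in all_placements:
    print(p, covers_all_dcs(p, ZONE_TO_DC))
```

With only two zones per data center, no single site can hold all three copies, so every placement passes the check. Under an uneven three-to-one split, the placement that uses only the three co-located zones would fail it.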
“You don’t have to have expensive hardware. Every data node can have its own inexpensive disk,” Comisario said. He said MercadoLibre purchased 24 servers, each with three 2 TB disks, from Hewlett-Packard Co.
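As a rough back-of-the-envelope check, the hardware figures above imply the capacity computed below. The calculation assumes that all 24 servers act as storage nodes and that all three disks in each server hold object data, and it ignores filesystem overhead and free-space headroom. None of that is stated in the article.

```python
SERVERS = 24          # servers purchased, per the article
DISKS_PER_SERVER = 3  # disks in each server
DISK_TB = 2           # capacity of each disk, in TB
REPLICAS = 3          # copies kept of each object


def raw_capacity_tb(servers, disks_per_server, disk_tb):
    """Total disk capacity across the cluster, in TB."""
    return servers * disks_per_server * disk_tb


def usable_capacity_tb(raw_tb, replicas):
    """Capacity left for unique data once every object is stored `replicas` times."""
    return raw_tb / replicas


raw = raw_capacity_tb(SERVERS, DISKS_PER_SERVER, DISK_TB)
usable = usable_capacity_tb(raw, REPLICAS)
print(f"raw: {raw} TB, usable with {REPLICAS} copies: {usable:.0f} TB")
```

Under those assumptions the purchase amounts to 144 TB of raw disk, or about 48 TB of unique data once three copies of each object are kept.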