The earliest adopters of Red Hat Storage Server reported promising results when using the software for limited-scale projects, which they said have the potential to lead to larger deployments and substantial cost savings.
Capabilities that drew customers such as Amadeus Data Processing, Intuit and Saskatchewan Telecommunications (SaskTel) to Red Hat Inc.'s year-old storage server product include high availability, replication and ease of scalability through the addition of commodity server hardware.
"If you want to support scalability, you need to go away from a SAN," said Paul Hubert, a system architect at Amadeus' operations and data center in Erding-Aufhausen, Germany.
Hubert said the software-based Red Hat Storage Server presents a significantly less expensive option than scale-out network-attached storage (NAS) products from major storage vendors, and stands to benefit from the pace of innovation in the open source development community.
Red Hat Storage is based on technology acquired in October 2011 from Gluster Inc., which developed and supported the open source GlusterFS scale-out file system. The storage product consists of the Red Hat Enterprise Linux operating system, the single-node XFS file system (for each Linux box), the GlusterFS distributed file system (which runs over the local Linux file system and pools storage), and a console management station. Users supply server hardware.
Red Hat's storage server supports files, object storage, and virtual block and virtual file storage. It's best suited for unstructured data such as documents, images, audio and video files, email, virtual machine images and log files. The product is not intended for use with high-transaction relational databases, according to Sayandeb (Sayan) Saha, manager of product management for Red Hat Storage.
But Amadeus' Hubert said the small project to implement Red Hat Storage provided an important proof point for the use of scale-out, software-based storage in tandem with NoSQL databases that remove the notion of a transaction. NoSQL engines work mainly in memory, but they need persistency on storage, he noted.
"NoSQL is all about local storage. Gluster has the same approach, this elasticity," Hubert said. "You replicate and you partition, and it's the same principle as any NoSQL engine like Couchbase or MongoDB."
Kicking off its Red Hat Storage project last year, Amadeus used six Hewlett-Packard servers with JBOD. This year, the company is transitioning from its near-line SAS drives to PCI Express-based NAND flash cards for added performance, he said.
Hubert said the data in Red Hat Storage represents only a small part of the petabytes that Amadeus stores for the complex processing associated with searches for flight availability. But he said he can foresee software-based storage and NoSQL databases leading to a slow phase-out of Oracle databases and EMC SANs over time as part of product refresh cycles.
Ana-Paula Ribeiro, Amadeus' director of data storage services, said Red Hat Storage eliminates the need to buy high-end storage each time the company needs to add capacity to its massive computation platform.
"Definitely a very attractive total cost of ownership," said Ribeiro. "Redundancy and scalability; it's a lot easier. [If] you have a cluster of six servers [and] you want to add some storage tomorrow, you just add another one or two [servers], or you take one out."
Intuit requires scalable, available storage
A year before the Red Hat acquisition, Gluster-based storage caught the attention of Intuit's software architects when they created a common platform for company developers to build new services and websites. Their goal was to put in place an underlying technology stack that would free developers to concentrate on the presentation tier and speed the delivery of new offerings.
Mohit Anchlia, a software architect at Intuit, said the team needed highly scalable and available storage that could deliver consistent response times during peak demand periods, and they looked for alternatives to the expensive SAN and NAS systems used in the past.
Gluster-based storage worked well during the first year or so, until Intuit decided to add another data center approximately 800 miles away. Prior to going live, the team noticed problems with the active-active geo-replication between the data centers. They hoped for bandwidth in the range of 20 MBps to 50 MBps, but instead got an unacceptable 2 MBps, according to Anchlia.
Anchlia said the problem stemmed from the GlusterFS approach of fetching data from all nodes with multiple network hops. The Intuit team called in Red Hat to observe the problem, and they worked together to revamp the architecture. The system currently delivers 20 MBps, he said.
"We are able to meet our SLAs [service-level agreements]," he said.
Intuit started with only 10 TB to 15 TB of data in the Gluster-based system, but it now stores about 140 TB in Red Hat Storage. The total could hit a petabyte later this year, as the company expands its tax, finance and other services, Anchlia said.
Red Hat Storage runs on about 150 servers at Intuit. The company uses multiple clusters and stores all the file metadata on a NoSQL database.
Anchlia said the software architecture team deals with capacity planning without needing to contact a storage administrator, which they had to do in the past with NAS and SAN systems. They monitor data usage and estimate future needs based on past figures.
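Estimating future needs from past figures can be as simple as fitting a straight line through historical usage samples. The sketch below is illustrative only and is not Intuit's method; the function name, the sample figures and the choice of a linear trend are all assumptions.

```python
from typing import Sequence


def forecast_usage(history_tb: Sequence[float], periods_ahead: int) -> float:
    """Extrapolate storage usage with a least-squares straight line.

    history_tb holds one usage sample per period (oldest first).
    The return value is the projected usage periods_ahead periods
    after the last sample. A linear trend is an assumption; real
    growth may be faster or slower.
    """
    n = len(history_tb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history_tb) / n

    # Ordinary least-squares slope and intercept.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history_tb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x

    return intercept + slope * (n - 1 + periods_ahead)


# Hypothetical monthly usage samples in TB (not figures from the article).
samples = [100.0, 108.0, 118.0, 125.0, 133.0]
print(f"Projected usage in 6 months: {forecast_usage(samples, 6):.1f} TB")
```

A team would likely compare such a projection against the capacity of the current clusters to decide when to add servers.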
The team wrote scripts to tier the Red Hat Storage between high-end SAS and low-end SATA disks. There are no flash drives and the servers use only local disks.
SaskTel uses Red Hat Storage for system logs
SaskTel turned to Red Hat Storage for the centralized storage of approximately 250 GB of secondary system logs that the company gathers each day. The Canadian telecom company wanted to make the syslogs searchable and easy to manage while restricting access to authorized individuals.
"Our growth rate is insane because we're not only talking about standard, wired-line networks, we're [also] talking about wireless networks," said David Yaffe, a systems specialist at SaskTel. "As we deploy new services for our customers, we have new applications that run multiple servers, producing not only system logs, but application logs."
Yaffe said the price tag for storage hardware from some of the major vendors would have exceeded the budget for the entire project. Red Hat Storage not only afforded significant cost savings over traditional SANs, but it was a natural fit for an IT shop highly skilled in using and maintaining Red Hat software, he said.
Yaffe is a Red Hat Certified Engineer (RHCE), and he and his main colleague on the project have deep knowledge of Linux. Their primary responsibility is keeping performance management applications running. Prior to the syslog project, they had little involvement with storage.
"Because of the simplicity of Red Hat Storage," Yaffe said, "I see it as just another application that is in our toolset."
The biggest decision point for SaskTel was planning out the architecture, including the network configuration and storage volume configuration. The company did a proof of concept and technical trial in March, and started with production data in April.
Yaffe said the team followed Red Hat best practices for setting up its four-server cluster with a pool of 34 TB of usable capacity on high-speed enterprise SAS disk drives. Each server has 25 drives in two RAID 6 configurations, and the servers are replicated for data protection, he noted.
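The relationship between drive count, RAID 6 parity and replication can be worked through with a little arithmetic. The sketch below assumes that all 25 drives in each server belong to the two RAID 6 groups (no hot spares) and that data is kept in two replicas; the article states neither detail, and the drive size is a hypothetical value. File system overhead and unit rounding are not modeled, so the result should not be expected to match the 34 TB figure exactly.

```python
def usable_capacity_tb(
    servers: int,
    drives_per_server: int,
    raid6_groups_per_server: int,
    drive_tb: float,
    replica_count: int,
) -> float:
    """Estimate usable capacity of a replicated pool built on RAID 6 groups."""
    # RAID 6 spends two drives' worth of capacity on parity in every group.
    parity_drives = 2 * raid6_groups_per_server
    data_drives = drives_per_server - parity_drives
    if data_drives <= 0:
        raise ValueError("not enough drives for the requested RAID 6 groups")

    capacity_after_raid = servers * data_drives * drive_tb
    # Each file is stored replica_count times across the servers.
    return capacity_after_raid / replica_count


# Four servers, 25 drives each, two RAID 6 groups per server (from the article).
# The 0.9 TB drive size and the replica count of 2 are assumptions.
estimate = usable_capacity_tb(
    servers=4,
    drives_per_server=25,
    raid6_groups_per_server=2,
    drive_tb=0.9,
    replica_count=2,
)
print(f"Estimated usable capacity: {estimate:.1f} TB")
```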
The most important lesson Yaffe learned was the need to automate storage provisioning. He and his colleague drew up basic scripts to automate as much of the server setup as possible, and they executed them to add more volumes from the central management server.
"Unlike a lot of the higher-end tools, there's a lot of typing involved, and the typing has to be 100% correct. It's better to create a script to do everything" so the typo won't affect all the servers, Yaffe said. "It's a lot of work to correct the typographical error."
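SaskTel's scripts were not published, but the idea of generating commands rather than typing them can be sketched as follows. The example builds `gluster volume create` and `gluster volume start` command lines from a single list of servers, so a host name or brick path is typed once. The host names, brick path and volume name are hypothetical, and the script only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_volume_commands(
    volume: str, servers: Sequence[str], brick_root: str, replica: int
) -> List[List[str]]:
    """Return the gluster CLI commands to create and start a replicated volume."""
    if replica < 1 or len(servers) % replica != 0:
        raise ValueError("server count must be a multiple of the replica count")

    # One brick per server, named after the volume so volumes do not collide.
    bricks = [f"{host}:{brick_root}/{volume}" for host in servers]
    return [
        ["gluster", "volume", "create", volume, "replica", str(replica), *bricks],
        ["gluster", "volume", "start", volume],
    ]


def apply(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical four-node pool; adjust names and paths for a real cluster.
    nodes = ["rhs1", "rhs2", "rhs3", "rhs4"]
    apply(build_volume_commands("syslog01", nodes, "/bricks/b1", replica=2))
```

Keeping the server list in one place means a correction is made once and the commands are regenerated, rather than being fixed by hand on every node.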
GlusterFS handles the distribution and replication of files. SaskTel uses the Gluster native client on its Linux clients to connect to the Gluster daemon running on the server. The Gluster daemon ensures all the nodes are up and talking to each other, and the Gluster client determines where the individual files are stored on the servers, Yaffe said.
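The notion of a client working out file placement on its own, without asking a central metadata server, can be illustrated with a simplified hash-based sketch. This is not GlusterFS's actual hashing algorithm: the MD5 hash, the equal-sized hash ranges and the server names are all assumptions chosen to show the principle.

```python
import hashlib
from typing import Sequence, Tuple

ReplicaSet = Tuple[str, ...]


def pick_replica_set(filename: str, replica_sets: Sequence[ReplicaSet]) -> ReplicaSet:
    """Map a file name to the group of servers that holds its replicas."""
    if not replica_sets:
        raise ValueError("at least one replica set is required")

    # Reduce the file name to a 32-bit number (MD5 is a stand-in hash here).
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:4], "big")

    # Split the 32-bit hash space into equal ranges, one per replica set.
    range_size = 2**32 // len(replica_sets)
    index = min(hash_value // range_size, len(replica_sets) - 1)
    return replica_sets[index]


# Hypothetical four-server pool arranged as two replicated pairs.
pool = [("rhs1", "rhs2"), ("rhs3", "rhs4")]
for name in ["router-a.log", "router-b.log", "billing-app.log"]:
    print(name, "->", pick_replica_set(name, pool))
```

Because every client runs the same calculation, each one arrives at the same answer for a given file name.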
The only problem SaskTel encountered was monitoring. The staff set up basic SNMP monitoring, but wasn't able to probe Gluster internals to see how its trusted storage pool is performing.
"You can see all your disks and how much disk space is being used on the bricks, but what the cluster is doing overall in terms of aggregated IOPS or aggregated storage doesn't happen yet," Yaffe said.
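One way to approximate the aggregated storage view Yaffe describes is to sum the per-brick figures that basic monitoring already collects. The sketch below is a hypothetical illustration, not a Red Hat or SaskTel tool; the record layout, the sample numbers and the replica count of 2 are assumptions, and it does not address aggregated IOPS.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BrickUsage:
    host: str
    path: str
    used_tb: float
    size_tb: float


def pool_summary(bricks: Iterable[BrickUsage], replica: int = 2) -> Dict[str, float]:
    """Roll per-brick disk usage up into pool-wide totals."""
    brick_list = list(bricks)
    raw_used = sum(b.used_tb for b in brick_list)
    raw_size = sum(b.size_tb for b in brick_list)
    return {
        "raw_used_tb": raw_used,
        "raw_size_tb": raw_size,
        # Replication stores each file `replica` times, so divide raw figures.
        "logical_used_tb": raw_used / replica,
        "logical_size_tb": raw_size / replica,
        "percent_full": 100.0 * raw_used / raw_size if raw_size else 0.0,
    }


# Hypothetical per-brick readings, for example gathered over SNMP.
readings = [
    BrickUsage("rhs1", "/bricks/b1", 3.2, 8.5),
    BrickUsage("rhs2", "/bricks/b1", 3.2, 8.5),
    BrickUsage("rhs3", "/bricks/b1", 2.9, 8.5),
    BrickUsage("rhs4", "/bricks/b1", 2.9, 8.5),
]
for key, value in pool_summary(readings).items():
    print(f"{key}: {value:.2f}")
```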
Should SaskTel need to expand the capacity of Red Hat Storage, the IT staff can either add external drive cages to each server, or add nodes/servers to the pool. At this point, they favor the latter approach because the servers already have a fair number of drive bays, Yaffe said.
"This flexibility fulfills our immediate storage requirements and simplifies future planning for the platform," Yaffe said.