Published: 12 Jul 2007
Few businesses have grown as rapidly as eBay. Its founder and chairman, Pierre Omidyar, built the online trading bazaar in his living room in one weekend in September 1995 with hardware that could be purchased at Fry's Electronics. Every sale item was a separate file, generated by a Perl script, and the system maxed out at 50,000 active items. Search functionality was nonexistent and storage was internal, directly attached.
Fast forward to 2007 and eBay now has 212 million users and 584 million listings. On a typical day, there are 1 billion page views and 26 billion SQL queries, which averages out to 6,000 SQL transactions per second, per database.
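As a rough check on those figures, the daily query count can be converted to a per-second rate. The sketch below assumes the load is spread evenly over a 24-hour day and evenly across databases, and treats a query and a transaction as the same unit. None of that is stated in the article, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the query figures quoted in the article.
# Assumptions (not from the article): load is spread evenly over a 24-hour
# day and across databases, and one query counts as one transaction.
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 26_000_000_000   # from the article
per_database_rate = 6_000          # transactions per second, per database (from the article)

site_wide_rate = queries_per_day / SECONDS_PER_DAY
implied_databases = site_wide_rate / per_database_rate

print(f"Site-wide rate: about {site_wide_rate:,.0f} queries per second")
print(f"Databases implied by the per-database figure: about {implied_databases:.0f}")
```

Under those assumptions the site-wide rate works out to roughly 300,000 queries per second, which the per-database figure would spread across about 50 databases. Real traffic peaks and troughs, so these are averages only.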
William Crosby-Lundin, eBay's SAN and high-availability infrastructure manager, says only 12 people manage eBay's storage, which consists of 13 discrete SANs that each offer a different quality of service for performance, availability and cost. Approximately 2 petabytes (PB) of raw storage supports 1,000 SAN-connected hosts and more than 600 database clusters. Each week, eBay adds 10TB of storage and 75 LUNs, and performs six database moves.
To reach its 99.94% availability objective, eBay built redundant paths (sometimes as many as four per LUN) to each device in its SANs; it takes 8,000 ports to tie everything together. To increase performance and remove hot spots, Crosby-Lundin says, there are multiple layers of aggregation and slicing of storage, both at the host level and in the array. Ninety-five percent of its storage is devoted to databases, batch processing and File Transfer Protocol (FTP) operations, while 5% is NAS-based and used for email and fixed content. Crosby-Lundin won't name the brand of storage eBay uses and, for security reasons, declined to talk about the company's disaster recovery policies and infrastructure. He did say the company keeps several copies of some databases, as many as four, at multiple locations.
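To put the 99.94% objective in concrete terms, it helps to convert it into permitted downtime. The sketch below assumes the objective is measured over a 365-day year and a seven-day week; the article does not say how eBay measures it.

```python
# Downtime permitted by an availability objective.
# Assumption (not from the article): availability is measured over a
# 365-day year and a 7-day week.
HOURS_PER_YEAR = 365 * 24
MINUTES_PER_WEEK = 7 * 24 * 60

availability = 0.9994              # 99.94%, from the article
unavailable_fraction = 1 - availability

downtime_hours_per_year = unavailable_fraction * HOURS_PER_YEAR
downtime_minutes_per_week = unavailable_fraction * MINUTES_PER_WEEK

print(f"Permitted downtime: about {downtime_hours_per_year:.1f} hours per year")
print(f"Permitted downtime: about {downtime_minutes_per_week:.1f} minutes per week")
```

On those assumptions, 99.94% leaves a budget of a little over five hours of downtime a year, or about six minutes a week.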
The complexity of eBay's storage environment requires heavy customization. "We wish we could, but we just can't buy all the stuff we need," says Crosby-Lundin, adding that in "most cases we have to write the code that makes the things we need happen." While "most standards-based tools deliver the lowest common denominator of functionality," he says, "we want more specific information for our relatively homogeneous storage environment." eBay uses a homegrown storage resource management tool to provision, monitor and manage its storage.
eBay isn't a big user of storage virtualization. "It's vital [for us] to know what disk groups are working on behalf of which databases, and in which clusters they're in so we can deliver on our SLOs [service-level objectives] and diagnose problems," says Crosby-Lundin.
To keep up with anticipated growth, the company plans to add more automation to the process of storage provisioning and is working on "ways to better map storage performance and cost to the value it provides to the business," says Paul Strong, a distinguished engineer at eBay Research Labs.
What's on eBay's storage wish list? Pretty much what every storage manager wants: a cheaper SAN, improved availability and improved interoperability.