Petabyte-level file storage needs compelled Ancestry.com to shift from traditional scale-up network-attached storage (NAS), with its capacity and performance limitations, to the scale-out NAS offerings of Isilon Systems Inc. and BlueArc Corp. long before their respective acquisitions by EMC Corp. and Hitachi Data Systems Corp.
Provo, Utah-based Ancestry.com once stored more than 200 TB in NetApp Inc.’s conventional scale-up filers. But NetApp’s “idiot-simple” interface, ease of administration and “top-notch” Windows integration ultimately weren’t enough to offset the scaling limitations of traditional NAS systems, which require disruptive “forklift upgrades,” said Travis Smith, senior manager of storage operations at Ancestry.com.
“You could only have a handful of terabytes behind each one and still get decent performance,” Smith said of the company’s aging NetApp boxes. “That drove us to look at the Isilon.”
Ancestry.com’s data includes 7 billion records from 1.7 million subscribers gathered over the last 15 years. The company now stores only 30 TB in its older-model NetApp systems, and has 3.8 PB of photographs, document images and other file data in Isilon’s X-Series scale-out NAS and 140 TB of performance-sensitive data in BlueArc’s Titan 3000 Series. Ancestry also has 1.3 PB of block data in storage-area network (SAN) systems from Hewlett-Packard (HP) Co./3PAR Inc., Hitachi Data Systems and Nexsan Corp.
Expanding Isilon capacity
Boosting capacity and performance in Isilon’s modular environment requires adding a storage node, which is essentially a server equipped with a specialized operating system, a distributed file system, and hard disks or solid-state drives (SSDs). The system handles load balancing across the nodes automatically.
Ancestry.com is expanding its Isilon capacity to 4.5 PB to accommodate its growing file storage needs. The organization already has 84 nodes of Isilon’s 12000X platform and 84 12000EX expansion nodes, and recently added 34 of the larger, faster 36000X nodes and 12 of the 32000X nodes equipped with SSDs for metadata caching. Smith said the SSDs had a dramatic impact on the speed of the roughly 3.5 million searches that Ancestry.com’s customers run each day.
Another significant benefit of an Isilon cluster is its ability to manage petabytes of data under a single file system, although Smith said he’s not comfortable going beyond 500 TB in one file system. Ancestry.com’s largest cluster currently stores approximately 300 TB.
Another plus of the Unix-based Isilon is the ability to write command-line programs and daemons for a wide range of functions, such as checking for files of a certain size or age, or distributing the indexing process across all nodes of a cluster, Smith said.
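Smith doesn’t describe his scripts, so what follows is only a minimal sketch, under our own assumptions, of the two kinds of jobs he mentions. It uses nothing but the Python standard library: it walks a directory tree, flags files that exceed a size threshold or are older (by modification time) than an age threshold, and then splits the resulting paths into per-node buckets with a stable CRC32 hash as one possible way to spread indexing work. The function names `find_files` and `assign_to_nodes` are hypothetical, and no Isilon-specific commands or APIs are involved.

```python
import os
import stat
import time
import zlib
from typing import Iterator, List, Optional, Tuple


def find_files(root: str,
               min_size: Optional[int] = None,
               min_age_days: Optional[float] = None,
               now: Optional[float] = None) -> Iterator[Tuple[str, int, float]]:
    """Walk `root` and yield (path, size_bytes, age_days) for every regular
    file that is at least `min_size` bytes OR was last modified at least
    `min_age_days` days ago. A threshold left as None is ignored."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices, etc.
            age_days = (now - st.st_mtime) / 86400.0
            too_big = min_size is not None and st.st_size >= min_size
            too_old = min_age_days is not None and age_days >= min_age_days
            if too_big or too_old:
                yield path, st.st_size, age_days


def assign_to_nodes(paths: List[str], num_nodes: int) -> List[List[str]]:
    """Split `paths` into `num_nodes` buckets using CRC32 of the encoded path.
    Unlike Python's built-in hash(), CRC32 is stable across runs, so a given
    path always maps to the same (hypothetical) worker node."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(num_nodes)]
    for p in paths:
        buckets[zlib.crc32(os.fsencode(p)) % num_nodes].append(p)
    return buckets
```

For example, `find_files("/ifs/data", min_size=10**9, min_age_days=365)` would list files of 1 GB or more, or untouched for a year, under a hypothetical `/ifs/data` path; feeding those paths to `assign_to_nodes(paths, 8)` yields eight work lists, one per node. Hashing rather than round-robin keeps assignments stable when the scan is rerun, at the cost of slightly uneven bucket sizes.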
“They give you enough rope to hang yourself,” he cautioned, “so you need to be careful with what you’re doing.”
No system is perfect, and Ancestry.com has encountered its share of kinks and bugs in the early Isilon technology, Smith said. Isilon systems weren’t able to handle Ancestry.com’s highest performance needs in 2009, so the company bought a pair of BlueArc Titans to provide fast access to small files, such as the reduced-size, low-resolution thumbnails of user-uploaded photos.
“The way our website works, somebody will hit a page and grab a bunch of different thumbnails to stick up there,” Smith said. “You’re authenticating thousands and thousands of times a minute, and the Isilons just couldn’t keep up at the time.”
With custom ASICs driving its hardware acceleration, the BlueArc Titan “blew the pants off the Isilon,” Smith observed. He has since heard that the newer-model Isilon S-Series competes favorably, but Smith said Ancestry.com has no plans to swap out its BlueArc Titans. In fact, he expects to expand each of Ancestry.com’s Titan 3210s from 70 TB to 120 TB this year.
Ancestry originally plugged disk shelves from LSI Corp.’s Engenio (now owned by NetApp) into the BlueArc NAS heads, but Smith said the company is replacing some of them with Hitachi’s Adaptable Modular Storage (AMS) array, equipped with SAS disks, to gain the benefit of nondisruptive upgrades.
“The pricing wasn’t particularly great,” Smith said, commenting on the initial Titan purchase, “but it was what we needed. It did the job. We’ve left it in that niche, but most of our stuff doesn’t need quite that level of performance.”