Amazon.com Inc.'s Web indexing unit, Alexa, recently found a way to let software developers build customizable search engines on top of its data, but giving those developers simultaneous access to potentially millions of files was a challenge.
For instance, a developer could use Alexa's Web Search Platform to build a podcast search engine that requests only audio files. However, according to Niall O'Driscoll, Alexa's vice president of engineering, those developers could be manipulating files as small as 1 KB or as large as several terabytes (TB). Their access requirements would vary, too: some of the small files would be deleted almost immediately, while others would need to stay in place for months at a time.
"We looked at about a half dozen vendors," O'Driscoll said. He declined to name them but said that "most had built-in limitations, either for the file system size as a whole or the size of the file that could be written. Ibrix allowed us to put in as much data as we wanted, write to any file size and have any number of people accessing the file system at one time."
Ibrix's Fusion file system can scale up to 16 petabytes (PB) of capacity in a single namespace and provides up to 1 TB per second of aggregate I/O throughput.
Alexa set up the Ibrix system with a Hewlett-Packard Co. (HP) MSA 1000 SAN that currently holds 12 TB of data. The file system will allow customers of the Alexa Web Search Platform to create text indexes covering anywhere from millions to billions of Web pages, which can then be searched using fast-lookup algorithms. It will also allow the data to be sorted sequentially or randomly, and it will let many different users access the same files at the same time.
All in all, the infrastructure took a year to develop and test. Still, O'Driscoll advised other users evaluating similar products to do more planning than he did.
"We couldn't know exactly how our system was going to be used," he said. "We were building something hoping people would come. Other storage admins will know more about their users' tendencies and how they'll use the system. My advice would be -- if you want to implement one of these systems, to do it on a small scale and test it before bringing it up to size, which is the opposite of how we did it."
Meanwhile, the Alexa Web Search Platform is already signing on beta customers.
O'Driscoll said that so far the Ibrix file system is performing well. The one thing he has asked for is support for 64-bit multiprocessor Linux machines, and, he said, Ibrix has already sent him beta software that provides it.
"We're happy with Ibrix," he said. "But we're going to take a hard look at how it performs when we have a full load of users, of course."
Besides Ibrix, a number of companies make network file systems for storage, including Acopia Networks Inc., Advanced Digital Information Corp., IBM, HP and Microsoft. Most recently, Ibrix has been paired with EMC Corp. and Dell Inc. in packaged sales deals for massive file systems. (See "EMC wins deals with Ibrix file system," Jan. 13.)