
Linux file system ‘firestorm’ fizzles

I was intrigued when a colleague sent me a link to an article by Henry Newman referring to a “firestorm” touched off by some remarks he recently made in another article he wrote. The first article addressed the scalability of standalone Linux file systems vs. standalone symmetric multiprocessing (SMP) file systems such as IBM’s GPFS or Sun’s ZFS. His point was that in high-performance computing environments requiring a single file system to handle large files and provide streaming performance, an SMP file system that pools CPU and memory components yields the best performance.

Newman begins his followup piece by writing, “My article three weeks ago on Linux file systems set off a firestorm unlike any other I’ve written in the decade I’ve been writing on storage and technology issues.” He refers later on to “emotional responses and personal attacks.” I’m no stranger to such responses myself, so it’s not that I doubt they occurred, but in poking around on message boards and the various places Newman’s article was syndicated I haven’t been able to uncover any of that controversy in a public place. And I’m not sure why there would be any firestorm.

I hit up StorageMojo blogger and Data Mobility Group analyst Robin Harris yesterday for an opinion on whether what Newman wrote was really that incendiary. Harris answered that while he disagreed with Newman’s contention that Linux was invented as a desktop replacement for Windows, he didn’t see what was so crazy about Newman’s ultimate point: a single, standalone Linux file system (Newman is explicit in the article that he is not referring to file systems clustered by another application) does not offer the characteristics ideal for a high-performance computing environment. “It seems he made a reasonable statement about a particular use case,” was Harris’s take. “I’m kind of surprised at the response that he says he got.”

That said, how do you define the use case Newman is referring to? What exactly is HPC, and how do you draw the line between HPC and high-end OLTP environments in the enterprise? Harris conceded that those lines are blurring and that, moreover, image processing is something more and more companies are discovering in fields, such as medicine, that didn’t consider such applications 15 years ago. So isn’t the problem Newman is describing headed for the enterprise anyway?

“Not necessarily,” Harris said, because Newman is also referring to applications requiring a single standalone large file system. “The business of aggregating individual bricks with individual file systems is a fine way to build reliable systems,” he said.

But what about another point Newman raised–that general-purpose Linux file systems often have difficulty with large numbers of file requests? Just a little while ago I was speaking with a user who was looking for streaming performance from a file system, and an overload of small random requests brought an XFS system down. “Well, someone worried about small files does have a problem,” Harris said, though it’s a tangential point to the original point Newman raised. “But everybody has this problem–there is no single instance file system that does everything for everybody.” He added, “this may be an area where Flash drives have a particular impact going forward.”

Join the conversation



The file system space has always been controversial - only some of the recent debates around dedupe (inline versus post-processing) have been as incendiary as the file system wars. The file system wars have been going on for a lot longer, though. Individual files are - on average - getting bigger. As you point out, image processing is headed for the mainstream enterprise, in medicine, in the application of social networks to business, in energy, etc. That puts pressure on a file system to support large file system sizes, large file sizes, and good large file streaming performance. On the flipside, though, it's not just OLTP that does a lot of small file I/O. Almost every image application also uses thumbnails, so the same file system you asked to do that giant image file system thing, you're also going to ask to handle a bunch of 1K small file thumbnails, and a bunch of metadata lookups while people try to find the image they need. The free single-node file systems that you get with every Linux distro - ext3, Reiser, etc - may not be as good as big Unix file systems like ZFS. But there are lots of scalable clustered and distributed file systems on Linux that outperform any single node file system. Those would include Ibrix, HP/PolyServe, Lustre, GPFS, and a bunch of "get/put" Web 2.0 file systems like GoogleFS, Hadoop, and Mogile. Sun's ZFS is the wild card here. They keep trying to give people the impression that it is a scalable multi-node file system. I do not believe that it is. It is a very sophisticated, feature-rich single node Unix file system. As far as those kinds of file systems go, it's got everything you could want, but ultimately, the clustered / distributed file systems offer more of everything. There's no one right answer, but I think a firestorm is a reasonable response for a simplistic conclusion like Mr. Newman's.
Out of curiosity: I've frequently read that Hollywood's render farms are run on Linux. What filesystems do they employ?
Perhaps the 'firestorm' was simply the result of the fact that the initial article was incompetent. Mr. Newman is clearly not a file system developer or even an informed observer, so should not attempt to play one on TV without much better ghostwriters. He began with the assertion that "The NTFS file system layout, allocation methodology and structure do not allow it to efficiently support multi-terabyte file systems", which is simply false (though what can one expect from someone whose acquaintance with this nearly 15-year-old file system is so superficial that he claimed "it was released almost 10 years ago"?). After immediately following that with the ridiculous assertion (which just for good measure he repeats later in the article) that the target of Linux was "A Microsoft desktop replacement, of course", he moved on to the suggestion that Linux file systems could not support I/O even at a paltry 240 MB/sec for backup/restore operations: one might suspect that his reporting background consists primarily of writing for supermarket tabloids, where accuracy is not near the top of the priority list. Without bothering to check whether any of the Linux file systems that he limits his comments to in fact *do* support automatic direct I/O, it's worth noting that the breaking-up of large requests into 128 KB segments that he claims to have seen has nothing to do with whether the data is then sent directly to the application but was more likely due to the request-size limitations of ATA drives until relatively recently. And that applications where direct I/O is of significant importance tend to request it explicitly anyway.
Tossing "the lack of someone to take charge or responsibility" into a discussion of file system performance was utterly random (leaving aside its debatable validity as a more general issue: for example, even though Linux's long-standing failure to address some dubious behavior in fsync does qualify as cause for concern, such negligence is not without parallel in proprietary systems). Even assuming that he's correct in the assertion that most Linux file systems pay no attention to RAID stripe alignment, for sensible - fairly coarse - stripe configurations this has little performance impact save for writes that just happen to be exactly one stripe in size (the impact if they're some integral number of stripes in size, while perhaps still measurable, is proportionally less) - and there's virtually no impact at all for mirrored rather than parity-protected data. And despite his Carl-Sagan-like admonition about Billions And Billions of allocations, 1) ext may use 4 KB allocation *units* but usually allocates a lot more than one at a time for efficiency (a strategy that goes back at least to the early '90s with UFS at Sun) and 2) at one bit per allocation unit the bitmap for the 200 TB file system he uses as an example would fit in less than 8 GB - an easily-justifiable amount of RAM to handle storage of that size even if one didn't page the bitmap (which is eminently reasonable to do). His charmingly naive observation that "The bitmap or allocation map could even fit in memory for this number of allocations!" if ridiculous 8 MB allocation units were used would be true even if the bitmap had to fit in a modern cell phone...
In the end, while there's some truth in his broad-brush assertion that Linux and some of its file systems were not originally designed to scale up to the levels of some of the large-Unix-system competition (though even there it's worth noting that XFS *was* designed exactly for that purpose), the bottom line is that save for the most demanding very-large-system configurations they'll do just fine (as will NTFS, for that matter). So for almost all server use (presumably the primary concern of this publication's readership) the answer to his question "Are Linux File Systems Right for You?" is a fairly definite "Yes". (And no, George, ZFS, while interesting, does not even come close to having "everything you could want" in a single-node file system: its copy-on-write behavior can result in devastating file fragmentation and resulting order-of-magnitude performance degradation in fairly common situations like database tables where data is updated at fine grain but later read in bulk, and its brain-damaged 'RAID-Z' implementation can reduce small-random-read throughput by a factor of the array size minus 1 compared to RAID-5. Despite Sun's hype, ZFS is nothing like "The Last Word In File Systems".) - bill
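As a rough check of the bitmap arithmetic in the comment above, here is a minimal Python sketch. It assumes one bit per allocation unit, a 200 TB file system counted in decimal terabytes, and 4 KB and 8 MB allocation units counted in binary; the helper name bitmap_bytes is hypothetical and is not taken from any file system's code.

```python
def bitmap_bytes(fs_bytes: int, unit_bytes: int) -> int:
    """Bytes needed for an allocation bitmap with one bit per allocation unit."""
    units = -(-fs_bytes // unit_bytes)  # ceiling division: number of allocation units
    return -(-units // 8)               # ceiling division: eight units per byte

TB = 10**12        # decimal terabyte (assumption)
KIB = 1024         # binary kilobyte (assumption)
MIB = 1024 ** 2    # binary megabyte (assumption)

fs_size = 200 * TB

small_units = bitmap_bytes(fs_size, 4 * KIB)   # 4 KB allocation units
large_units = bitmap_bytes(fs_size, 8 * MIB)   # 8 MB allocation units

print(f"4 KB units: {small_units / 10**9:.2f} GB of bitmap")
print(f"8 MB units: {large_units / 10**6:.2f} MB of bitmap")
```

Under these assumptions the 4 KB case needs roughly 6.1 GB of bitmap, which is under the 8 GB bound quoted in the comment, and the 8 MB case needs roughly 3 MB. Counting the file system in binary terabytes instead raises the first figure only slightly and does not change either conclusion.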
"[...] general-purpose Linux file systems often have difficulty with large numbers of file requests?" OMFG. I had a project recently where I spent two weeks repeatedly explaining to people that hard drives just can't spin all that fast. No, you can't make a baby in less than nine months. Also, to all the HPC fans, please keep in mind that modern CPUs and buses can pump tens of GB/s (gigabytes per second) compared to the measly 50-80 MB/s (*mega*bytes per second) of hard drives (and do not forget seek times!). Guys, you are looking for the bottleneck in the wrong place. What about optimizing your own applications first, before blaming everything on the OS?
eh? This description of the article is faulty. You're confusing SMP (one copy of the OS with multiple processors) with distributed processing (aka clusters, in this case). Neither GPFS nor ZFS is mentioned in the article, as far as I can tell. That said, the original article is pretty high in the BS dept. Probably just trying to generate ad clicks.